BY
ABAD ALI
DEPARTMENT OF STATISTICS
G C UNIVERSITY, LAHORE (PAKISTAN)
DECLARATION
I, Abad Ali, Roll No. 0563-M.Phil-STAT-2011, student of M.Phil. in the subject of Statistics,
Session 2011-13, hereby declare that the matter printed in the thesis entitled "A COHERENT
EXAMINATION OF RAINFALL AND FLOOD DATA IN SOME SELECTED SITES OF
PAKISTAN" is my own work and has not been printed, published or submitted in the form of a
thesis in any University or Research Institute, etc. in Pakistan or abroad.
Dated: ______________
ABAD ALI
Supervisor
Dated: _____________
__________________________
Prof. Dr. Saleha Naghmi Habibullah
Visiting Professor
Department of Statistics,
GC University Lahore
Submitted Through:
________________
___________________
Controller of Examination
GC University Lahore
GC University Lahore
ACKNOWLEDGEMENT
None is worthy of praise except gracious ALLAH, Who created the worlds of numerous
creatures in the capacity of Absolute Authority. Almighty Allah has opened new dimensions
of knowledge for me and has led me to complete this task. All my respects to Almighty Allah's
last Prophet HAZARAT MUHAMMAD (peace be upon him), who is the great mentor of the
world. He enabled us to recognize the Creator of the world and to understand the philosophy of
life.
I feel great pleasure in expressing, from the core of my heart, gratitude to my Supervisor, who has
been cooperative in all circumstances. I am extremely appreciative of her keen interest,
motivational behavior, tolerance and inspiring guidance that enabled me to surmount this uphill
task. It has been a great honor for me to work under her supervision. Her comments and valuable
suggestions have played a vitally important role.
I am also very thankful to Prof. Sam C. Saunders, Professor Emeritus, Washington State University, USA,
for his valuable comments, suggestions and guidance throughout the period of my research.
I gratefully acknowledge Mr. Jaffer Hussain, Chairperson, Department of Statistics, GC
University, Lahore, for his polite, helpful, encouraging and motivational behavior in completing this
task. Last but not least, I would like to thank my dearest mother and my family members.
ABBREVIATIONS
AEP     Annual Exceedence Probability
MOM     Method of Moments
ECDF
MLE
GOF     Goodness of Fit
EVT
GEVD
IWD
PPT
MAPE
MAD
MSD
TABLE OF CONTENTS
Chapter 1 Introduction
1.1 Preliminary Remarks
1.2 Global warming
1.3 Extreme Events
1.4 Precipitation
1.5 Rainfall
1.5.1 Intensity Of Rainfall
1.5.2 Rainfall measurement
1.5.3
1.5.4 Effects on agriculture
1.5.5
1.5.6
1.6
1.6.1 Winter
1.6.2 Monsoon
1.6.3 Pre Monsoon
1.6.4 Post Monsoon
1.7
1.7.1 Hydraulic Structure
1.8 Flood
1.8.1 Flood damaging
1.8.2 Health Hazard
1.8.3
1.8.4 Education
1.8.5 Energy
1.8.6
1.8.7 Environment
1.9
Chapter 2 Literature Review
2.1 Introduction
Chapter 3 Methodology
3.1 Introduction
3.2 Quantile
3.3 Exceedence Probability
3.4 Method of Estimations
3.5 Method of moments
3.6
3.7
3.8 Q-Statistics
3.9 Autocorrelation
3.10 Class of Distributions
3.10.1
3.10.2
3.10.3 Exponential Distribution
3.10.4 Gamma Distribution
3.10.5 Normal Distribution
3.10.6
3.10.7 Logistic Distribution
3.10.8 Nakagami Distribution
3.10.9 Weibull Distribution
3.10.10
3.10.11 Rayleigh Distribution
3.10.12 Frechet Distribution
3.11
3.12 Probability Plots
3.13 Utilization of Software
Chapter 4
4.1 Introduction
4.2 Linear Models
4.3
4.4
4.5
4.6
4.7
4.8
4.9
4.10
4.11 Dir Rainfall
4.12 Kohat Rainfall
4.13
4.14
4.15
4.16 Muzafarabad
4.17
4.17.1 Lag Correlation
4.17.2 Cross correlation
Chapter 5 Record values
5.1 Introduction
5.2
5.3
5.4
5.5
5.6
5.7
5.8
5.9
Chapter 6 Stationary Models
6.1 Introduction
6.2
6.3
6.4
6.5
6.4
6.4.1
6.4.2 Tarbela Site
6.4.3 Shahdara Site
6.4.4 Mangla Site
6.4.4 Muzafarabad Site
6.4.4 Balakot Site
6.4.4 Risalpur Site
6.4.4 Kohat Site
6.4.4 Dir Site
Chapter 7 Probability distributions
7.1 Introduction
7.2
7.2.1 Exponential Distribution
7.2.2 Gamma Distribution
7.2.3 Normal Distribution
7.2.4 Log-Normal Distribution
7.2.5 Logistic Distribution
7.2.6 Nakagami Distribution
7.2.7 Weibull Distribution
7.2.8 Burr Distribution
7.2.9
7.2.10 Rayleigh Distribution
7.2.11
7.2.12 Frechet Distribution
7.3
7.4
7.5 Kohat
7.6
Chapter 8 Expected loss
8.1 Introduction
8.2
8.3 Pareto Distribution
8.4
8.5
8.6 Method of Moments
8.7
8.8
REFERENCES
CHAPTER 1
INTRODUCTION
1.1
Preliminary Remarks
Climate change is one of the hottest topics among the scientific community of the world today.
In particular, the phenomenon of global warming is a cause for grave concern for
meteorologists, oceanographers, and many other categories of scientists. United Nations bodies
(such as the World Health Organization) and national organizations (such as the US
Environmental Protection Agency) are investigating the risks to the inhabitants of this world due
to global warming.
One very important area of concern linked with the phenomenon of global warming is the
occurrence of flooding that is liable to cause loss of life and heavy damages to property. In
particular, people in the developing countries suffer heavily due to the damages caused by
excessive rain and flooding. Pakistan, for example, has experienced a number of floods during
the past few decades some of which caused excessive damage to life and property.
1.2
Global warming
The phenomenon of global warming has been taking place on planet earth for the past 15,000
years. Global warming is the gradual increase in the temperature of the planet. It is closely
linked to the greenhouse effect. Gases such as carbon monoxide, carbon dioxide and sulphur
dioxide are regarded as its main causes. These gases are released by human beings, industries
and vehicles. According to scientists, these gases severely affect the atmosphere. They are also
held responsible for the hole in the ozone layer, which protects the earth against the ultraviolet
rays released by the sun. As a result, the earth receives more heat, which leads to global warming.
According to the opinion of a vast majority of the scientists today, this undesirable phenomenon
is the result of industrialization, significant increase in the human population and other factors
for which we ourselves are responsible.
Global warming exerts a variety of negative effects on the planet and its inhabitants, such as
reduction of territories, damage to marine ecologies, destruction of seasonal insects and many
others. Extreme events such as severe storms, cyclones and hurricanes can cause enormous
damage and destruction of infrastructure. People may experience increased water loss from
reservoirs due to dryness, long summers and short winters, as well as extreme temperatures both in
summer and winter. Sometimes the situation may become very grave, as in the case of
severe famines.
1.3
Extreme Events:
An event is considered an extreme event if its magnitude differs greatly from its normal value,
for example floods, droughts and earthquakes. Extreme events affect human life
and property to a large extent. As such, attempts aimed at accurate modeling of extreme events
carry great significance for protecting human life and property.
During the past few decades, researchers have been greatly interested in studying extreme events,
including both lower and upper extremes. Both the smallest and the largest
values of a data-set are studied as extreme events. Record values (lower and
upper) have also been analyzed on the basis of several probability distributions.
1.5
Rainfall:
Precipitation in liquid form is called rain. Surface water on the earth's crust is evaporated by the sun's rays
and forms clouds, then returns to the earth's surface in the form of
drops. This type of precipitation is called rainfall.
1.5.1
Intensity of rainfall:
Rainfall has a broad effect on socio-economic life and human culture, so it is necessary to
measure it. Its intensity levels are classified as follows. Light rain is rated as less than 0.098
inches per hour.
Rain is considered moderate when it lies between 0.098 and 0.3 inches per hour. Heavy rain is
rated as 0.3 to 2 inches per hour. Extreme rain is rated as more than 2 inches per hour.
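The intensity classes above can be expressed as a simple lookup. The following is a minimal sketch; the function name `classify_rainfall` and the treatment of the boundary values (which class a rate of exactly 0.098, 0.3 or 2 inches per hour falls into) are assumptions made for illustration, since the text does not specify them.

```python
def classify_rainfall(rate_in_per_hr: float) -> str:
    """Return the intensity class for a rainfall rate in inches per hour."""
    if rate_in_per_hr < 0:
        raise ValueError("rainfall rate cannot be negative")
    if rate_in_per_hr < 0.098:      # light: below 0.098 in/h
        return "light"
    if rate_in_per_hr < 0.3:        # moderate: 0.098 to 0.3 in/h
        return "moderate"
    if rate_in_per_hr <= 2.0:       # heavy: 0.3 to 2 in/h (boundary handling assumed)
        return "heavy"
    return "extreme"                # extreme: above 2 in/h


# Hypothetical rates, used only to show the four classes
for rate in (0.05, 0.2, 1.0, 3.0):
    print(rate, classify_rainfall(rate))
```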
1.5.2
Rainfall measurement:
Sectors like industry, forestry and agriculture require prompt updates of rainfall measurements.
A standardized rain gauge is used to measure rain and snow. A rain gauge is a device that measures
the depth of precipitation over a unit area (m²), expressed in millimeters. One millimeter of rainfall
over one square meter corresponds to one litre of water. Another unit in use is inches per square foot.
A rain gauge consists of a funnel, open at its upper end, that drains into a storage beaker in which the
depth of stored water is measured.
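The relation between rainfall depth and water volume follows from unit conversion: 1 mm = 0.001 m, so 1 mm over 1 m² is 0.001 m³, which is 1 litre. A minimal sketch of this arithmetic is given below; the function name `rain_volume_litres` and the example catchment area are hypothetical.

```python
def rain_volume_litres(depth_mm: float, area_m2: float) -> float:
    """Volume of water, in litres, for a rainfall depth over a given area.

    1 mm of depth over 1 m^2 equals 0.001 m^3, i.e. exactly 1 litre,
    so the volume in litres is simply depth (mm) times area (m^2).
    """
    return depth_mm * area_m2


# 1 mm over 1 m^2, and a hypothetical 25 mm event over a 10 m^2 roof
print(rain_volume_litres(1, 1))
print(rain_volume_litres(25, 10))
```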
The Pakistan Meteorological Department measures rainfall at different stations, and records are
kept at different intervals: daily, weekly, monthly and annual.
1.5.3
Rainfall is a natural consequence of climate and is especially important for those areas
which are far from irrigation systems. Rainfall has a considerable effect on human beings, including
their moods, celebrations, socio-economic sectors, poetry etc.
1.5.4
Effects on agriculture:
Rainfall is a natural phenomenon that has a significant impact on human existence. Rainfall at
regular intervals is necessary for plants to survive and grow. Excessive
irregularity in rainfall patterns adversely affects the agriculture sector and its allied socio-economic sectors, such as irrigation, grain storage and fodder.
1.5.5
Rainfall has a great effect on the socio-cultural aspect of life. The socio-cultural aspect is closely
related to the economy. A society with strong economic foundations has a more developed
culture. Agricultural economies are largely affected by rainfall patterns, so their lifestyles
and culture are affected as well. Rainfall has a direct influence on the behavior and moods of
people. A country that receives excessive rain considers sunshine a blessing, while a dry land devoid of rain
considers a single drop of rain a great blessing. Absence of rain, as well as floods, has great
psychological and social effects. Poetry, art objects, music, attitudes and literature are
also affected by rain. In short, rain as a weather event has a great impact on human life.
1.5.6
Pakistan is an agricultural country lying between latitudes 24°N and 37°N and longitudes 61°E and 71°E. Some areas receive
heavy rainfall, some moderate and some light rainfall. Monsoon areas
get heavy rainfall during the monsoon season, and floods are a common occurrence due to the lack of
proper management of resources.
1.6
Pakistan has a varied climate covering every type of weather. It has the following four seasons.
Winter
Pre-Monsoon
Monsoon
Post-Monsoon
1.6.1
Winter
The winter season in Pakistan generally covers December, January, February and March, but it
varies across different areas of Pakistan. Some areas are very cold and some are
only slightly cold. The Himalayan region receives heavy snowfall in the winter season.
1.6.2
Monsoon
The monsoon is well known in Asia for the change in climate and the rainfall it brings.
The word monsoon is derived from the Arabic word (Mawsam) and the Portuguese word (moncao).
In the monsoon season the winds enter Pakistan from the Indian Ocean and the Arabian Sea. These
monsoon winds cause heavy rain and storms in the affected areas of Pakistan and are also
a cause of floods there. The monsoon reaches its peak in August. It
lasts from July to September.
1.6.3
Pre Monsoon
Pakistan usually has dry and hot weather in summer. In one respect, this hot season is
harmful for Pakistan, because it melts the ice on glaciers, which causes heavy floods like the flood
of 2005 in Pakistan. The monsoon starts at the end of the summer season, which is why the
summer season is also called the pre-monsoon. The summer season lasts from
April to June.
1.6.4
Post Monsoon
The monsoon ends by the end of September. The post-monsoon season is very short, covering October
and November. In this period a little rain is received, which is very useful from an agricultural
point of view.
1.7
Proper planning for any kind of policy formulation and structure is based on proper forecasting of
the event. The Pakistan Meteorological Department is responsible for weather records and
forecasting. In 2010 the highest temperature in Pakistan was recorded on 26
May at Mohenjo-daro, in the province of Sindh. It was the most reliable measurement of the
hottest temperature ever recorded in Asia.
1.7.1
Hydraulic Structure
Pakistan possesses many kinds of land, including mountains, rivers, deserts and cultivated
land. It is classified into two main regions according to geographical importance. The
first is based on the Indus basin and the second on the dry areas. The irrigation system of Pakistan
is based on different rivers, including (i) Satluj (ii) Jehlam (iii) Chenab (iv) Indus (v) Ravi. The
hydraulic structure of Pakistan depends upon dams, weirs, barrages, rivers, lakes etc.
Pakistan faces a dual problem: scarcity of rainfall as well as excessive floods and
rainfall. In some cultivated areas of Pakistan, irrigation through rainfall is the best irrigation
system. But the lack of methods for storing excess rainwater causes severe
floods.
Climate change associated with global warming has two main categories of causes: (i) natural
variability and (ii) human activities. Greenhouse gases, the use of fossil fuels, properties of
the land surface, aerosols and natural phenomena are the major causes of
global warming. Due to the large rise in temperature, glaciers are melting rapidly,
which causes the overflow of water known as floods.
1.8
Floods
Floods are among the most destructive natural catastrophes that occur in many parts of the world.
They are recognized as among the most costly hazards, with a high tendency to destroy
property as well as human life. They are very hard to predict due to the involvement of
many natural and unnatural factors in their occurrence.
Excessive rainfall can create an extreme situation that leads to heavy losses of life and property,
as the excess water turns into a flood. Pakistan has been facing high-intensity
floods and flood damage for the last few decades.
1.8.1
Flood damaging
The monsoon in Pakistan can be very severe. Even when forecasts indicate a very low average
rainfall, it can arrive in large amounts. In mid-August, heavy rainfall is observed every year in the
southern areas of Pakistan. The maximum rainfall appears at the beginning of July
and continues until the last week of September.
1.8.2
Health Hazard:
The health infrastructure in rural areas serves every kind of health need and
provides basic first aid. This infrastructure is damaged by rainfall. Basic health
units and rural health centers suffer the most damage. Millions of dollars invested in the health sector
are lost. The excess of rainfall and flood therefore needs to be investigated along with
measures of health hazards.
1.8.3
Agriculture has a central role in the growth of the economy. Being a primary activity, it engages a
large labor force in manual work. The livelihood of most of Pakistan's
population depends directly or indirectly on agriculture. The main Rabi crop is
wheat, which is the staple food of the major portion of the population. Fruits include
grapes, citrus and mangoes, and vegetables include potato, tomato, chilies and onion. Livestock is an
integral part of the agricultural scenario. Buffalo and cattle are the main sources of milk, meat and hides,
along with draft power. Fodder crops include wheat straw and maize thinnings.
1.8.4
Education
The education sector has a significant effect on the economy and the development of any
country. The educational institutions in those areas (schools, madrasas and colleges) are spread over
large distances. These institutions were constructed without regard to safety against hazards like
floods, heavy rainfall and earthquakes. A large proportion of such institutions have been affected
completely or partially by floods. In the flood of 2011, 4096 educational institutions were
damaged in Sindh and Baluchistan. In Sindh, of the 3892 damaged institutions,
1032 girls' schools were damaged completely or partially.
1.8.5
Energy
The energy sector is considered the backbone of any country. Many large and small
units are working to fulfill the requirements of energy for multiple uses. In Pakistan the
following units are working for this purpose
(i) Thermal Plants
(ii) Hydro-electric plants
(iii) Small Nuclear Plants
Most of the energy depends on the hydrological sector. Heavy rainfall and floods cause
huge damage to these units as well. Well-planned policies can save many assets and
reduce the cost of damage.
1.8.6
Communication and transportation have been a need in every period, recent as well as past.
The modern world requires a global network to improve the basic factors involved in the
infrastructure of any community. A large number of modern means of communication and
transportation exist. The fundamental ones in Pakistan are as follows
(i) Roads
(ii) Railway Lines
(iii) Airports
The total area of Pakistan is 796095 sq km. It is served by 259618 km of roads and
7791 km of railway lines, and there are 42 airports in Pakistan. The flood of
2011 destroyed communication and transport infrastructure including coastal highways,
roads, the railway network etc. In two provinces of Pakistan, five districts of Baluchistan and
eighteen districts of Sindh suffered extensive destruction in this field.
1.8.7
Environment
The environment provides the basis for every society and it is highly affected by extreme
changes. Pakistan is already facing complex problems from different kinds of disease which
are related to the environment.
The objective of this study is to assess the appropriate statistical distributions for
forecasting extreme rainfall and floods in the coming years. The main objectives of this
research are
i.
ii.
iii.
1.9
Pakistan receives a very small amount of rainfall in most areas, especially in those
located below the latitude of 32 degrees. The study area consists mostly of the
northern sites of Pakistan and Azad Jammu & Kashmir. The data have been gathered from the
Meteorological Department of Pakistan. We have selected different rainfall and flood sites for
this analysis. The rainfall sites are not very close to the flood sites, but on the geographical map
they lie within the region covering both the rainfall and flood areas.
This study is useful for prediction of the worst rainfall in coming years. It is based on
rainfall data collected and compiled with the contribution of the Meteorological
Department of Pakistan.
Pakistan is an agronomic country. It lies between
24°N and 37°N latitude and 61°E and 71°E longitude. There
is a vast diversity in its climate. Some areas of Pakistan receive high rainfall, some
moderate and some a very small amount of rainfall. Pakistan's monsoon regions usually
receive heavy rainfall during the monsoon period, which results in floods due to the lack of proper
water resource management and planning.
1.10
S/N  Site         Latitude (N)  Longitude (E)  Elevation  Years      Length  Description
1    Balakot      34.33         72.21          995.40m    1977-2012  36      Rainfall
2    Dir          35.12         71.51          1375.0m    1977-2012  36      Rainfall
3    Kohat        33.35         71.26          489m       1977-2012  36      Rainfall
4    Mangla       33.14         73.64          147m       1925-2013  89      Flood
5    Marala       32.67         74.46          250m       1925-2012  88      Flood
6    Muzafarabad  34.22         73.29          702m       1955-2010  56      Rainfall
7    Risalpur     34.04         71.58          1014m      1977-2012  36      Rainfall
8    Shahdara     34.15         73.49          820m       1925-2012  88      Flood
9    Terbela      34.74         72.48          148m       1977-2012  36      Flood
Table 1.3
The route map runs from Terbela dam to Mangla dam through Dir, Risalpur, Kohat and
Balakot, where point A indicates the Terbela flood site, point B the Dir rainfall site, point C the
Risalpur rainfall site, point D the Kohat rainfall site, point E the Balakot rainfall site and
point F the Mangla flood site.
CHAPTER 2
LITERATURE REVIEW
2.1 Introduction
Global warming has recently become a major issue all around the world. Glaciers are melting
year by year due to warmer temperatures. The change in temperature causes
extreme events such as extreme temperatures, heavy rainfall and high floods. Increasing intensity of rainfall
causes high levels of floods.
In the first half of the 20th century Professor Gumbel first suggested the application of extreme
value distributions. Gumbel (1941) used the extreme value distribution for empirical analysis of
meteorological variables (annual flood flow, maximum precipitation etc.). Statisticians
and engineers have used it frequently since then.
Huff and Neill (1959) used maximum rainfall magnitudes in Illinois and compared five
different statistical distributions. They analyzed annual and seasonal maxima for periods of 1 to 10
days. They used 30 stations with 40 years of data and computed useful
statistical results. The method of moments and the least squares method were compared, and the difference
between the results was found to be insignificant.
Hershfield (1962) investigated the annual maximum series (AMS) of 24-hour rainfall
in the USA. He used the Gumbel distribution, which appeared to fit well and gave significant results.
Alexander (1963) used the method of storm transposition for estimating the frequency of rare
events.
Markovic (1965) applied five probability distributions to annual precipitation and
river flow data in Canada and the USA based on 2506 gauge stations. The five candidate
distributions were the normal, two-parameter log-normal, two-parameter gamma,
three-parameter log-normal and three-parameter gamma. He found no significant difference between the
gamma and log-normal results. The one-parameter gamma distribution also showed no
significant difference from the three-parameter one (Markovic, 1965).
Dickinson (1976) applied extreme value distributions to rainfall data to develop useful
extreme value distributions for rainfall. He used data from three stations in Southern Ontario. He
also suggested the analysis of seasonal patterns to estimate rainfall runoff.
Landwehr et al. (1979) compared three methods of parameter estimation that are of great interest
in inferential statistics. The method of moments, maximum likelihood and the probability weighted
moments (PWM) method were applied to the Gumbel distribution and compared.
They found that PWM performed best of the three methods.
In 1984, Stern and Coe fitted non-stationary Markov chains to the occurrence of rainfall.
They applied the gamma distribution, with parameter values varying with the time
of year, to rainfall totals. They obtained useful results from the models for
rainfall prediction and planning.
In 1992, Haktanir applied thirteen different distributions to annual flood peak series of more
than 30 observations taken from 45 unregulated streams in Anatolia. The parameters of these
distributions were mostly estimated by the method of maximum likelihood, the method of
moments and probability weighted moments (Haktanir, 1992).
The World Meteorological Organization (1989) published a summary report for
policy makers and engineers. In this report, different methodologies for estimating
extreme events and for fitting different distributions to data were discussed.
Aksoy (2000) suggested that the gamma distribution is an appropriate distribution for
daily rainfall analysis. He used Markov chains to determine dry and wet days and
the gamma distribution to generate sequences of
daily rainfall data.
Koutsoyiannis & Baloutsos (2000) investigated a long record of 136 years
of annual maximum rainfall data in Greece. They
further advised that the (type 1) extreme value distribution was not suitable as
conventionally used, and suggested that the generalized extreme value distribution is much better for
record values analysis and is a good predictor for return periods.
Park et al. (2001) studied the maximum summer rainfall in South Korea from sixty-one gauge
stations. They estimated the L-moments of the Wakeby distribution and the quantiles for
different return periods. They also drew isopluvial maps of the estimated design values for
different return periods.
Kuczera (2001) carried out a comprehensive study of at-site flood frequency data. He used
a Monte Carlo Bayesian method to estimate the confidence limits of quantiles and the expected
probability distribution for any kind of flood frequency distribution.
Pathak (2001) carried out a frequency analysis of short-duration rainfall in the South Florida Water
Management District. The data used in this study covered January 1, 1900 to
December 31, 1999. He used one-day, three-day and five-day periods for highest rainfall.
Zalina et al. (2002) compared eight candidate distributions to find the
best and most reliable estimator of maximum annual rainfall in Peninsular Malaysia. They applied extrapolation
of quantiles and a goodness of fit test and found that the generalized extreme value distribution is
a good fit to the data.
Coles et al. (2003) used Bayesian inference in the modeling of daily rainfall
data. They found a distribution which can predict extreme rainfall in the coming years.
Ware & Lad (2003) used two different methods (frequentist and conventional) to
compare the precision and accuracy of regional and at-site analysis of flood quantiles for the
Waimakariri River. They found that the frequentist method performed well compared to
the estimates from the conventional method.
Ahsanullah, Chan and Balakrishnan (1993) discussed recurrence relations between the
product moments of the extreme value distribution based upon record values. They
also obtained the product and single moments using these relations in a very simple way.
Kamps (1995) investigated order statistics and record values. He found some useful
results which are applied to obtain relations, including explicit expressions and recurrence
relations, for the moments of generalized order statistics of the Pareto, power function and Weibull
distributions. He introduced the idea of generalized extreme values and their properties, and derived
the joint density function of the first r and the nth uniform generalized order statistics. He proposed
some necessary and fundamental conditions for the existence of moments of the generalized
order statistics.
Brabson and Palutikof (1999) used extreme wind speeds from five
Scottish islands for frequency analysis. They applied the Generalized Pareto Distribution to the data and found
that the distribution fails in the presence of non-stationarity in the wind speeds.
Pawlas and Szynal (2000) discussed some necessary conditions characterizing the
inverse Weibull and generalized extreme value distributions with the help of the moments of kth
record values.
In 2006 Thompson et al. first introduced a global index for earthquakes similar to the index flood
method. They used data from 46 regions around the globe and showed that GPA and GEV are the
best fit to the magnitude and annual maximum series, using a goodness of fit test and the L-moments diagram.
Soliman, Abd Ellah and Sultan (2006) investigated the Bayesian analysis of the two-parameter Weibull
distribution on the basis of record values and obtained the Bayes and
maximum likelihood estimators. The hazard and reliability functions were also
discussed.
(Change, 2007) reported that the eleven years from 1996 to 2006 were the warmest years.
The report compared two intervals of years to examine the change in the
linear trend. For the years 1906-2005 the linear trend was 0.74°C (0.56°C to 0.92°C),
whereas for 1901-2000, the interval used in the third assessment report, the change in
temperature was 0.6°C. The linear warming trend over the last half century is double
that measured over the last century.
Kao (2008) conducted an at-site frequency analysis of rainfall using hourly precipitation data
for 53 gauging stations in Indiana. A combination of the generalized extreme value distribution and
the extreme value type-1 distribution was used to find at-site estimates, and the at-site
and regional estimates were compared. He found that the regional estimates were not better than
the at-site estimates.
Khan et al. (2008) found that the Frechet distribution is very useful and flexible, having the
property of converging to different distributions. They used Monte Carlo simulation to compare
the shape and scale parameters. They also gave some important results on the relation of
the shape parameter to the mode, mean, median, variance, coefficient of variation, kurtosis and
skewness. They also used mathematical and graphical techniques for theoretical analysis of the
Frechet distribution.
Sultan (2008) used Bayesian and maximum likelihood methods to estimate the
parameters of the Frechet distribution. He worked out two different cases: one in which
both parameters (shape and scale) are unknown, and the other in which the location parameter is
known. He estimated the hazard function and survival rate and compared the mean
square errors, which were estimated by simulation.
Kwon et al. (2008) used the Gumbel mixed model for bivariate storm frequency
analysis. They used hourly rainfall data collected over 34 years at Jecheon station in Korea. They
estimated the bivariate return periods, joint return periods and conditional return periods of storm
events.
Jakob et al. (2009) investigated the pattern of extreme rainfall in Sydney, Australia in the absence
of stationarity in the data. The data covered the years 1921-2005 at Sydney Observatory
Hill. They found that the rainfall pattern in Australia showed a large amount of variation both
seasonally and spatially.
In 2009, the NOAA National Climatic Data Center compiled a climate report indicating that the
temperature in the last few decades had increased. This report was based on the work of 300 scientists from all
over the world, from 160 research groups. They used ten indicators to examine the behavior of the
weather and temperature. Among these ten indicators, seven were found to be increasing, which
indicates increasing temperature during the last few decades.
Table 1.2
Indicators of Warming World
Sr #  Indicators                    Increasing  Decreasing
1     Air temperature near surface  Yes         No
2     Humidity                      Yes         No
3     Snow cover                    No          Yes
4     Temperature over oceans       Yes         No
5     Sea surface temperature       Yes         No
6     Sea ice                       No          Yes
7                                   Yes         No
8                                   Yes         No
9                                   No          Yes
10                                  Yes         No
CHAPTER 3
METHODOLOGY
3.1
Introduction
The purpose of this analysis is to evaluate the relationship between the duration and intensity of
extreme events (like heavy rainfall and severe floods) and the chances of these events
occurring. In this study we have taken the annual flood peaks of four dams of Pakistan and
annual rainfall data from different stations. Most of these rainfall stations are located near
the dams. The empirical data analysis has been done using descriptive summaries
and graphical techniques. The graphical techniques (probability plots, histograms,
density graphs, empirical distribution functions, etc.) can be used to find the probability
density function suitable for the relevant data.
3.2
Quantile
Quantiles are of vital importance in the field of statistics. They split a distribution into as many parts as a researcher needs. They are particularly important in hydrology for estimating the intensity of rainfall, floods, droughts, storms, etc., and they provide the basis for future planning policies and hydraulic designs. A quantile is the value of an extreme event that has a particular probability of exceedence and, hence, a specified return period.
3.3
Exceedence Probability
It is simply the probability that a particular event will exceed a specific value. In frequency analysis of flood peaks, the probability that the flood height crosses the available capacity is known as the exceedence probability.
The Annual Exceedence Probability (AEP) is the chance that a natural hazard event (such as a rainfall or flooding event) occurs within a year; it is mostly expressed as a percentage. Extreme flood events are exceeded rarely, so such events have a smaller annual probability. It is given by
$$AEP = \frac{k}{n+1} \times 100$$
where k is the rank of the observed value and n is the total number of observations.
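As a minimal sketch of the AEP formula, assuming that rank 1 is given to the largest observation (the text does not state the ranking direction), the AEP of each observed peak can be computed as follows. The function name and the sample peaks are hypothetical and are not taken from the thesis data.

```python
def annual_exceedence_probability(values):
    """Return (value, rank k, AEP in percent) for each observation.

    Assumption: rank k = 1 is the largest value, so AEP = 100 * k / (n + 1).
    """
    n = len(values)
    ordered = sorted(values, reverse=True)
    return [(x, k, 100.0 * k / (n + 1)) for k, x in enumerate(ordered, start=1)]


# Hypothetical annual peaks, for illustration only.
peaks = [320.0, 150.0, 410.0, 275.0]
table = annual_exceedence_probability(peaks)
for value, rank, aep in table:
    print(value, rank, aep)
```

With four observations, the largest peak receives an AEP of 100 x 1/5 = 20 percent and the smallest an AEP of 80 percent.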
3.4
Method of Estimations
There are many methods and advanced techniques available for estimating parameters. Different researchers have used different techniques according to their convenience. The following methods are commonly used:
Method of moments
Maximum likelihood method
Probability weighted moments
3.4.1
Method of moments
The method of moments (MOM) is one of the oldest estimation techniques. It was developed and used by Karl Pearson (1857-1936). In this method the sample (raw) moments are equated to the corresponding population moments.
The non-central rth population moment is calculated as
$$\mu'_r = \int_{-\infty}^{\infty} x^r f(x)\,dx$$
and the rth sample moment as
$$m_r = \frac{1}{n}\sum_{i=1}^{n} X_i^r, \qquad r = 1, 2, \ldots$$
3.4.2
Maximum likelihood method
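The maximum likelihood (ML) method is a well-known and most frequently used method for estimating unknown parameters. Let $L(x_1, x_2, \ldots, x_n; \theta)$ be the likelihood function of the sample $x_1, x_2, \ldots, x_n$. If $\hat{\theta}$ is the value of $\theta$ that maximizes $L(x_1, x_2, \ldots, x_n; \theta)$, then
$$\frac{\partial L(\cdot)}{\partial \theta} = 0 \quad \text{or} \quad \frac{\partial \log L(\cdot)}{\partial \theta} = 0, \qquad \frac{\partial^2 L(\cdot)}{\partial \theta^2} < 0 \qquad (3.4.1)$$
The first-order condition in (3.4.1) locates the maxima or minima of the likelihood function, and the second-order condition selects the maximum. The functions $L(\cdot)$ and $\log L(\cdot)$ attain their maximum at the same value of $\theta$, and it usually proves much easier to find the value of $\hat{\theta}$ from the log-likelihood function $\log L(\cdot)$ than from the likelihood function.
The Cramer-Rao inequality can be used to find the variance of the MLE:
$$V(\hat{\theta}) \ge \left[-E\left(\frac{\partial^2 \log L(\cdot)}{\partial \theta^2}\right)\right]^{-1}$$
The ML estimator has the following properties:
When the sample size becomes large enough, i.e. $n \to \infty$, a unique consistent MLE exists.
ML estimators are asymptotically normal when n becomes very large.
MLEs are efficient and consistent as n increases.
A sufficient estimator is always found by the ML method if a sufficient estimator exists.
The ML estimator can be biased, and moderately so, when the sample size is quite small.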
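To illustrate the maximum likelihood steps, the following sketch applies them to the exponential distribution with density $f(x) = \lambda e^{-\lambda x}$, for which the score equation $n/\lambda - \sum x_i = 0$ has the closed-form solution $\hat{\lambda} = n / \sum x_i$. The sample values and the function names are hypothetical; this is an illustration under those assumptions, not the estimation code used in the thesis.

```python
import math


def exp_log_likelihood(lam, xs):
    """Log-likelihood log L(lam) of an exponential sample xs."""
    return len(xs) * math.log(lam) - lam * sum(xs)


def exp_mle(xs):
    """Solve d log L / d lam = n / lam - sum(xs) = 0 for lam."""
    return len(xs) / sum(xs)


def exp_mle_variance(xs):
    """Cramer-Rao bound lam^2 / n, evaluated at the MLE."""
    lam_hat = exp_mle(xs)
    return lam_hat ** 2 / len(xs)


# Hypothetical sample, for illustration only.
sample = [0.5, 1.2, 0.3, 2.0, 1.0]
lam_hat = exp_mle(sample)
var_hat = exp_mle_variance(sample)
print(lam_hat, var_hat, exp_log_likelihood(lam_hat, sample))
```

Evaluating the log-likelihood slightly below and above `lam_hat` gives smaller values, which is a quick numerical check that the stationary point is a maximum.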
In time series analysis, the long-term movement is called a trend. A trend can be of two types: (i) increasing with the passage of time or (ii) decreasing with time. A time plot is used to see whether the series is increasing or decreasing. A time plot is a simple graph of the observed values against time.
3.6
Q-Statistics
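The Ljung-Box Q-statistic at lag k is a test statistic for the null hypothesis that there is no autocorrelation up to order k. It is calculated as
$$Q_{LB} = n(n+2)\sum_{j=1}^{k}\frac{r_j^2}{n-j}$$
where $r_j$ is the j-th autocorrelation and n is the number of observations. If the series is not based on the results of a previously fitted model, then under the null hypothesis
$$Q_{LB} \sim \chi^2(k)$$
that is, $Q_{LB}$ is asymptotically distributed as a $\chi^2$ with degrees of freedom equal to the number of autocorrelations.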
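The Q-statistic can be sketched in a few lines. The autocorrelation estimator used here (deviations about the overall mean, divided by the total sum of squares) is an assumption, since the text does not fix the estimator at this point; the series and function names are hypothetical.

```python
def autocorrelation(xs, j):
    """Sample autocorrelation at lag j about the overall mean (assumed estimator)."""
    n = len(xs)
    mean = sum(xs) / n
    num = sum((xs[t] - mean) * (xs[t + j] - mean) for t in range(n - j))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den


def ljung_box_q(xs, k):
    """Q_LB = n (n + 2) * sum_{j=1..k} r_j^2 / (n - j)."""
    n = len(xs)
    return n * (n + 2) * sum(
        autocorrelation(xs, j) ** 2 / (n - j) for j in range(1, k + 1)
    )


# Hypothetical series, for illustration only.
series = [1.0, 2.0, 3.0, 4.0, 5.0]
q1 = ljung_box_q(series, 1)
print(q1)
```

The value of Q is then compared with the upper critical value of the chi-square distribution with k degrees of freedom; a large Q leads to rejection of the null hypothesis of no autocorrelation.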
3.7
Autocorrelation
Three types of data are used for empirical analysis:
Cross section
Time series
Pooled
Pooled data is a combination of time series and cross-sectional data. Time series data is an important type of data for any empirical analysis. A number of assumptions have been made in developing models for these types of data.
The examination of the relation between two or more sets of variables has always been of great interest to investigators. This kind of relationship is considered as the correlation among those variables.
The correlation between two random variables (X, Y) is the interdependence between these random variables.
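The correlation is measured by using the formula:
$$\rho = \frac{E[(X - E(X))(Y - E(Y))]}{\sqrt{E[X - E(X)]^2\,E[Y - E(Y)]^2}}$$
and its sample counterpart is
$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})/n}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2/n \;\sum_{i=1}^{n}(y_i - \bar{y})^2/n}} = \frac{\mathrm{cov}(x, y)}{S_x S_y}$$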
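Autocorrelation provides a good lead for investigating the properties of a time series. The autocorrelation is the simple correlation between the pairs of observations $(x_1, x_{1+k}), (x_2, x_{2+k}), \ldots, (x_{n-k}, x_n)$, $k \ge 0$:
$$r(k) = \frac{\sum_{t=1}^{n-k}(x_t - \bar{x}_{(1)})(x_{t+k} - \bar{x}_{(2)})/(n-k)}{\sum_{t=1}^{n}(x_t - \bar{x})^2/n}$$
where
$$\bar{x}_{(1)} = \sum_{t=1}^{n-k} x_t/(n-k), \quad \bar{x}_{(2)} = \sum_{t=k+1}^{n} x_t/(n-k) \quad \text{and} \quad \bar{x} = \sum_{t=1}^{n} x_t/n$$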
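The lag-k autocorrelation defined here uses separate means for the first n - k and the last n - k observations. A minimal sketch of that definition, with a hypothetical series and function name, is:

```python
def lag_autocorrelation(xs, k):
    """r(k) with separate means for the leading and trailing n - k values."""
    n = len(xs)
    m = n - k
    mean_all = sum(xs) / n
    mean_head = sum(xs[:m]) / m   # mean of x_1 .. x_{n-k}
    mean_tail = sum(xs[k:]) / m   # mean of x_{k+1} .. x_n
    cov = sum((xs[t] - mean_head) * (xs[t + k] - mean_tail) for t in range(m)) / m
    var = sum((x - mean_all) ** 2 for x in xs) / n
    return cov / var


# Hypothetical series, for illustration only.
series = [1.0, 2.0, 3.0, 4.0, 5.0]
r0 = lag_autocorrelation(series, 0)
r1 = lag_autocorrelation(series, 1)
print(r0, r1)
```

For a long series the three means are nearly equal, so this estimator is close to the simpler one that uses the overall mean throughout.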
Cross-sectional data is collected through a random sample of cross-sectional units. For example, in household consumption data collected through a sample survey, one cannot assume in advance that the random error term of one household is correlated with that of another household. If such correlation exists among the cross-sectional units, it is called spatial autocorrelation.
3.10
Class of Distributions
Probability distributions are used in various fields of research (hydrology, economics, civil engineering designs and models, weather forecasting and flood risk management).
The following distributions are used to carry out the analysis of rainfall and flood data.
3.10.1
Extreme Value Distribution
Extreme events have very negative impacts in some fields. These events are rare but have great consequences; examples are large amounts of snowfall, extreme floods, high temperatures, storms and high wind speeds. Most researchers and analysts use extreme value theory (EVT) to develop suitable models for evaluating the loss and risk due to extreme events.
The probability density function of the extreme value distribution is as follows:
$$f(x) = \frac{1}{\sigma}\exp\left[\left(\frac{x-\mu}{\sigma}\right) - \exp\left\{\left(\frac{x-\mu}{\sigma}\right)\right\}\right], \qquad -\infty < x < \infty$$
where $\sigma > 0$ is a scale parameter and $\mu$ is a location parameter. If Z follows a Weibull distribution with scale parameter $\alpha$ and shape parameter $\beta$, then log(Z) follows the extreme value distribution with $\mu = \log\alpha$ and $\sigma = 1/\beta$.
3.10.2
1
x k 1
x k
[1 k
] exp[{1 k
]
k 0
k 0
k 0
k 0
3.10.3
Exponential Distribution
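$$f(x) = \lambda e^{-\lambda x}$$
where $0 \le x < \infty$ and $\lambda > 0$. The cumulative distribution function is
$$F(x) = 1 - e^{-\lambda x}$$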
3.10.4
Gamma Distribution
$$f(x) = \frac{x^{a-1} e^{-x/b}}{b^a\,\Gamma(a)}, \qquad x > 0, \; a, b > 0$$
where
$$\Gamma(a) = \int_0^\infty x^{a-1} e^{-x}\,dx$$
3.10.5
Normal Distribution
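The Normal distribution is a very well-known and frequently used continuous distribution. It is also known as the Gaussian distribution. Its pdf is
$$f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right], \qquad -\infty < x < \infty$$
where $-\infty < \mu < \infty$ and $\sigma > 0$.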
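3.10.6
Log Normal (3P) Distribution
The log normal (3P) distribution belongs to the class of continuous distributions. It has three parameters $\mu$, $\sigma$ and $\gamma$. The probability density function is
$$f(x; \mu, \sigma, \gamma) = \frac{1}{(x-\gamma)\,\sigma\sqrt{2\pi}}\exp\left[-\frac{\{\log(x-\gamma) - \mu\}^2}{2\sigma^2}\right]$$
where $\gamma < x < \infty$, $-\infty < \mu < \infty$ and $\sigma > 0$.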
3.10.7
Logistic Distribution
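The logistic distribution also belongs to the family of continuous distributions. The shape of the logistic distribution is similar to that of the normal distribution, but its peak is higher than that of the normal. The logistic distribution is useful for hydrologic records (discharge in rivers, rainfall, etc.).
$$f(x) = \frac{\exp\left[-\left(\frac{x-\mu}{\sigma}\right)\right]}{\sigma\left[1 + \exp\left\{-\left(\frac{x-\mu}{\sigma}\right)\right\}\right]^2}, \qquad -\infty < \mu < \infty, \; \sigma > 0$$
3.10.8
Nakagami Distribution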
The Nakagami distribution was proposed by Nakagami (1960). It is useful for modeling the fading of radio signals. Applications of this distribution span many fields such as communications, hydrology, multimedia analysis, network traffic and ultrasound data.
$$f(x) = \frac{2m^m}{\Gamma(m)\,\Omega^m}\,x^{2m-1}\exp\left(-\frac{m x^2}{\Omega}\right), \qquad x > 0$$
where m is the shape parameter and $\Omega$ is the spread parameter.
3.10.9
Weibull Distribution
Among the well-known distributions, the Weibull distribution is very useful for analyzing lifetime data. The Inverse Weibull distribution also plays a vital role in the prediction and analysis of many extreme events such as earthquakes, rainfall, sea currents, floods and wind speeds. Applications of the Inverse Weibull distribution in many fields are given in Harlow (2002), who found this distribution important for modeling the statistical behavior of material properties in engineering applications. Nadarajah and Kotz (2008) pointed out sociological models based on Inverse Weibull random variables. The scale form of the Inverse Weibull distribution has its density function given by
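$$f(x) = c\,\theta\,x^{-c-1} e^{-\theta x^{-c}}, \qquad x > 0, \; \theta > 0, \; c > 0$$
while the location-scale form of the Inverse Weibull distribution has its density function given by
$$f(x) = \frac{c}{\sigma}\left(\frac{x-\mu}{\sigma}\right)^{-c-1} e^{-\left(\frac{x-\mu}{\sigma}\right)^{-c}}, \qquad x > \mu, \; \sigma > 0, \; c > 0$$
where c is the shape parameter, $\mu$ is the location parameter and $\sigma$ is the scale parameter.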
3.10.10
Inverse Gaussian Distribution
The inverse Gaussian distribution was derived in 1915 by Schrodinger. In 1945 Tweedie proposed the name inverse Gaussian for this distribution. In 1947 Wald re-derived this distribution as a limiting form for the sample size in a sequential probability ratio test. That is why the inverse Gaussian distribution is also called the Wald distribution. Its pdf is as follows:
$$f(x) = \left[\frac{\lambda}{2\pi x^3}\right]^{1/2}\exp\left\{-\frac{\lambda(x-\mu)^2}{2\mu^2 x}\right\}, \qquad x > 0$$
where $\lambda > 0$ and $\mu > 0$.
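3.10.11
Rayleigh Distribution
$$f(x) = \frac{x}{\sigma^2}\exp\left(-\frac{x^2}{2\sigma^2}\right), \qquad x \ge 0, \; \sigma > 0$$
$$F(x) = 1 - \exp\left(-\frac{x^2}{2\sigma^2}\right)$$
3.10.12
Frechet Distribution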
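The French mathematician Maurice Frechet (1878-1973) gave a limiting distribution for the sequence of maxima under scale normalization (Frechet, 1927).
$$f(x) = \frac{\alpha}{\beta}\left(\frac{\beta}{x-\gamma}\right)^{\alpha+1}\exp\left[-\left(\frac{\beta}{x-\gamma}\right)^{\alpha}\right], \qquad \alpha > 0, \; \beta > 0 \text{ and } x > \gamma$$
where $\beta$ is a scale parameter, $\alpha$ is a shape parameter and $\gamma$ is a location parameter. The cumulative distribution function is
$$F(x) = \exp\left[-\left(\frac{\beta}{x-\gamma}\right)^{\alpha}\right]$$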
3.11
The probability distributions have been fitted to the rainfall and flood data of the different sites as an intermediate step. After that, a goodness of fit test is carried out to see whether the distribution is a good fit for the available data. For this purpose the following tests are used.
Chi Square goodness of fit test
The chi-square test is widely applied in the literature and is commonly used for investigating the fit of a particular distribution to the data.
The hypotheses of the chi-square test are
H0 : Distribution is a good fit for the data
H1 : Distribution is not a good fit for the data
Test statistic:
$$\chi^2 = \sum_{i=1}^{n}\frac{(f_o - f_e)^2}{f_e}$$
where $f_o$ are known as the observed frequencies, whereas $f_e$ are the expected frequencies, and
$$\chi^2 \sim \chi^2(v)$$
where v is the degrees of freedom.
The conclusion is based on the critical region and the calculated value of chi-square. If the calculated value of chi-square is greater than the critical value, then one can reject the null hypothesis; otherwise one fails to reject it.
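The chi-square decision rule can be sketched as follows. The observed and expected class frequencies are hypothetical, and the degrees of freedom are taken as the number of classes minus one minus the number of estimated parameters, which is the usual convention and is assumed here. The critical value 7.815 is the standard tabulated 5% point of the chi-square distribution with 3 degrees of freedom.

```python
def chi_square_statistic(observed, expected):
    """Sum over classes of (f_o - f_e)^2 / f_e."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(n_classes, n_estimated_params=0):
    """Assumed convention: classes - 1 - number of estimated parameters."""
    return n_classes - 1 - n_estimated_params


# Hypothetical frequencies, for illustration only.
observed = [18, 22, 20, 40]
expected = [25, 25, 25, 25]

stat = chi_square_statistic(observed, expected)
df = degrees_of_freedom(len(observed))
critical_5pct_df3 = 7.815  # tabulated chi-square critical value, 3 d.f., 5% level
reject_h0 = stat > critical_5pct_df3
print(stat, df, reject_h0)
```

Here the statistic exceeds the critical value, so the hypothesized distribution would be rejected for these illustrative frequencies.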
Kolmogorov-Smirnov test
The Kolmogorov-Smirnov test is another tool for testing the goodness of fit to a specified distribution. The hypotheses under this test are
H0 : the selected sample is drawn from the specified distribution.
H1 : the selected sample is not drawn from the specified distribution.
The test statistic used is
$$D_n = \sup_x |F_n(x) - F(x)|$$
where $\sup_x$ is the supremum over x, $F_n(x)$ is the empirical distribution function of the sample and $F(x)$ is the specified distribution function. The decision is based on the value of $D_n$: if the value of $D_n$ is close to zero, the distribution is considered a good fit to the data.
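A minimal sketch of the statistic $D_n$ follows. Because the empirical distribution function jumps at each ordered observation, the supremum is found by checking the gap just before and just after every jump. The sample, the uniform reference distribution and the function names are hypothetical choices made only for illustration.

```python
def ks_statistic(sample, cdf):
    """D_n = sup_x |F_n(x) - F(x)| for a fully specified continuous cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        # F_n equals (i - 1)/n just below x and i/n at x.
        d = max(d, i / n - fx, fx - (i - 1) / n)
    return d


def uniform_cdf(x):
    """cdf of the uniform distribution on [0, 1] (illustrative reference)."""
    return min(max(x, 0.0), 1.0)


# Hypothetical sample, for illustration only.
sample = [0.7, 0.1, 0.4]
d_n = ks_statistic(sample, uniform_cdf)
print(d_n)
```

In practice $D_n$ is compared with a tabulated critical value that depends on n and the significance level.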
3.12
Probability Plots
Probability plots are commonly used as a graphical technique for checking basic assumptions about the nature of the data. The given data is plotted against the theoretical distribution, and the position of the points around the straight line is examined. If most of the points lie around the straight line, then the observed data follows the theoretical distribution. We have used some probability plots to see the behavior of the data.
3.13
Utilization of Software
The work has been done using different statistical software, including MATLAB 5, SPSS 16, MINITAB 15 and EASYFIT. Some graphical analysis was obtained using the R language.
CHAPTER 4
LEAST SQUARE ANALYSIS
4.1
Introduction
Researchers are always interested in the nature of the relation between variables. For instance, a researcher may want to determine the relationship between disasters and extreme events such as rainfall, storms, hurricanes, earthquakes and floods.
A number of works in recent years have sought better and more precise methods for estimating linear models and fitting data, but the least squares method is still dominant and is used as an important tool for estimating parameters.
The least squares method is perhaps the most widespread technique in the field of statistics. There are several factors behind this fact. Mathematically, the use of squares makes the least squares method very tractable because, by the Pythagorean theorem, when the error term is independent of an estimated quantity, the squared error and the squared estimated quantity can be added. Another mathematical aspect is that the tools involved in constructing the least squares method (eigen-decomposition, derivatives and singular value decomposition) have been well studied for a relatively long period of time.
As its name shows, the least squares estimate is obtained by minimizing the sum of squares of the deviations of the observations from their fitted values. The idea behind the method of least squares, that a combination of different observations gives the best estimate of the true value because errors decrease with aggregation rather than increase, goes back to Roger Cotes (1722).
4.2
A preliminary examination of the data has been done by fitting a straight line and using some graphical techniques to see what kind of variation exists. For this purpose, we fit a straight line to the data to see whether the slope is positive and to what degree. The least squares method is commonly used to find the estimates of the parameters. Here a similar technique is used to find the estimates, which are then used to predict the variation of rainfall in coming years. This approach was suggested by Sam C. Saunders, Prof. Emer., Washington State University.
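Consider the yearly (maximum) flood height data over a period of, say, n years, where there may be missing observations. The data is $\{y_i : i \in j\}$, where $j \subseteq \{1, 2, 3, \ldots, n\}$ and $y_i$ is the measurement of the recorded yearly flood height in the ith year of the sample sequence.
A preliminary examination of the data can be done using some graphical techniques. Perhaps the following simple types of examination can be completed using elementary procedures. Consider the expected model for the rainfall and flood data
$$E(X_k) = \alpha + \beta k \qquad \text{for } k = 1, 2, 3, \ldots, n \qquad (4.2.1)$$
for the observations $\{X_k\}_{k=1}^{n}$, with the sum of squares
$$S = \sum_{k=1}^{n}(X_k - \alpha - \beta k)^2 \qquad (4.2.2)$$
In terms of the yearly flood heights $y_i$, $i \in j$, let
$$S(\alpha, \beta) = \frac{1}{|j|}\sum_{i \in j}(y_i - \alpha - \beta i)^2 \qquad (4.2.3)$$
where $|j|$ is the number of years in j. We are to obtain the LS-estimators, say $\hat{\alpha}$ and $\hat{\beta}$, found by solving the simultaneous equations obtained by setting the partial derivatives equal to zero, i.e.
$$\frac{\partial S}{\partial \alpha} = 0 \quad \text{and} \quad \frac{\partial S}{\partial \beta} = 0$$
First assume that $j = \{1, 2, 3, \ldots, n\}$, and recall that
$$\sum_{i=1}^{n} i = \frac{n(n+1)}{2} \quad \text{and} \quad \sum_{i=1}^{n} i^2 = \frac{n(n+1)(2n+1)}{6} \qquad (4.2.4)$$
Now we have
$$\frac{\partial S}{\partial \alpha} = -\frac{2}{n}\sum_{i=1}^{n}(y_i - \alpha - \beta i) = -2\left[\bar{y} - \alpha - \beta\,\frac{n+1}{2}\right] \qquad (4.2.5)$$
and
$$\frac{\partial S}{\partial \beta} = -\frac{2}{n}\sum_{i=1}^{n} i\,(y_i - \alpha - \beta i) = -(n+1)\left[y^* - \alpha - \beta\,\frac{2n+1}{3}\right] \qquad (4.2.6)$$
where we define
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$
and for later use
$$y^* = \frac{2}{n(n+1)}\sum_{i=1}^{n} i\,y_i$$
Thus the two equations, solved simultaneously for $\alpha$ and $\beta$, give
$$\hat{\beta} = \frac{6\,(y^* - \bar{y})}{n-1} \qquad (4.2.7)$$
and
$$\hat{\alpha} = \frac{(4n+2)\,\bar{y} - 3(n+1)\,y^*}{n-1}$$
where
$$\bar{y} = \hat{\alpha} + \hat{\beta}\,\frac{n+1}{2} \quad \text{and} \quad y^* = \hat{\alpha} + \hat{\beta}\,\frac{2n+1}{3}$$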
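These estimates can also be written in a more convenient form, which is more useful for numerical calculations:
$$\hat{\alpha} = \frac{\sum X_k}{n} - \frac{3\left[2\sum kX_k - (n+1)\sum X_k\right]}{n(n-1)} \qquad (4.2.8)$$
$$\hat{\beta} = \frac{6\left[2\sum kX_k - (n+1)\sum X_k\right]}{n(n-1)(n+1)} \qquad (4.2.9)$$
where the sums run over k = 1, 2, 3, ..., n.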
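If the observations $\{Y_i\}_{i=1}^{n}$ have a common mean $\nu$, then
$$E(\hat{\alpha}) = \nu \quad \text{and} \quad E(\hat{\beta}) = 0$$
This shows that these estimators give the true answer in expectation, i.e. they are unbiased. More generally, if $E(Y_i) = \nu + i\kappa$, namely a linear trend, then
$$E(\bar{y}) = \nu + \kappa\,\frac{n+1}{2}$$
so that $E(\hat{\alpha}) = \nu$ and $E(\hat{\beta}) = \kappa$. Using $\hat{\alpha}$ and $\hat{\beta}$, one can estimate the rainfall and flood peaks after ten years by using the formula
$$\hat{\alpha} + (10 + n)\,\hat{\beta} \qquad (4.2.10)$$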
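Now one might ask "What would the worst rainfall look like?" To answer this question, we could then compute the reduced values
$$Z_k = X_k - \hat{\alpha} - \hat{\beta} k \qquad (4.2.11)$$
$$Z_{max} = \max_{k=1,\ldots,n} Z_k \qquad (4.2.12)$$
and estimate the worst value after ten years as
$$\hat{\alpha} + (10 + n)\,\hat{\beta} + Z_{max} \qquad (4.2.13)$$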
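The closed-form estimators (4.2.8) and (4.2.9) and the worst-case forecast (4.2.13) can be sketched as below for a complete series coded k = 1, 2, ..., n. The function names and the data are hypothetical; the series is chosen to lie exactly on the line 5 + 2k so that the output is easy to check by hand.

```python
def ls_line(xs):
    """Least squares intercept and slope for xs observed at k = 1..n."""
    n = len(xs)
    s1 = sum(xs)                                         # sum of X_k
    sk = sum(k * x for k, x in enumerate(xs, start=1))   # sum of k * X_k
    core = 2 * sk - (n + 1) * s1
    beta = 6 * core / (n * (n - 1) * (n + 1))            # equation (4.2.9)
    alpha = s1 / n - 3 * core / (n * (n - 1))            # equation (4.2.8)
    return alpha, beta


def worst_case_forecast(xs, years_ahead=10):
    """alpha + (years_ahead + n) * beta + Z_max, as in (4.2.13)."""
    alpha, beta = ls_line(xs)
    n = len(xs)
    z_max = max(x - alpha - beta * k for k, x in enumerate(xs, start=1))
    return alpha + (years_ahead + n) * beta + z_max


# Hypothetical series lying exactly on 5 + 2k, for illustration only.
data = [7.0, 9.0, 11.0, 13.0, 15.0]
alpha_hat, beta_hat = ls_line(data)
forecast = worst_case_forecast(data, years_ahead=10)
print(alpha_hat, beta_hat, forecast)
```

For real data the reduced values are not all zero, so $Z_{max}$ lifts the trend forecast by the largest positive departure from the fitted line seen in the record.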
4.3
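The results of any analysis mostly depend on the availability and accuracy of the data. Unfortunately, missing observations are a real problem for every researcher. We now discuss how one can tackle the issue of missing values.
When $j \subset \{1, 2, 3, \ldots, n\}$, with $|j|$ denoting the number of years in j, we have
$$S(\alpha, \beta) = \frac{1}{|j|}\sum_{i \in j}(y_i - \alpha - \beta i)^2 \qquad (4.3.1)$$
and so
$$\frac{\partial S}{\partial \alpha} = -\frac{2}{|j|}\sum_{i \in j}(y_i - \alpha - \beta i) \quad \text{and} \quad \frac{\partial S}{\partial \beta} = -\frac{2}{|j|}\sum_{i \in j}(y_i - \alpha - \beta i)\,i \qquad (4.3.2)$$
Now we equate both of these to zero and, with some acknowledged ambiguity, we write
$$\bar{y} = \frac{1}{|j|}\sum_{i \in j} y_i \quad \text{and} \quad y^* = \frac{1}{|j|}\sum_{i \in j} i\,y_i \qquad (4.3.3)$$
$$a = \frac{1}{|j|}\sum_{i \in j} i \quad \text{and} \quad b = \frac{1}{|j|}\sum_{i \in j} i^2 \qquad (4.3.4)$$
The two estimators are the solutions to the pair of equations
$$\bar{y} = \alpha + a\beta \quad \text{and} \quad y^* = a\alpha + b\beta$$
which are
$$\hat{\beta} = \frac{y^* - a\bar{y}}{b - a^2} \quad \text{and} \quad \hat{\alpha} = \frac{b\bar{y} - a y^*}{b - a^2} \qquad (4.3.5)$$
It should also be noted that no error is made by writing the ith year instead of the calendar year, say m + i, where perhaps m = 1997. But if there is no data for two years, then the next coded entry would be the appropriate i value plus 2.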
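The estimators in (4.3.5) only need the coded years that were actually observed. A minimal sketch, using a hypothetical dictionary that maps each observed coded year i to its value $y_i$ (years 3 and 4 are missing in the example), is:

```python
def ls_line_with_gaps(obs):
    """LS intercept and slope from {coded year i: y_i}, observed years only."""
    m = len(obs)                                      # |j|, number of observed years
    y_bar = sum(obs.values()) / m
    y_star = sum(i * y for i, y in obs.items()) / m
    a = sum(obs) / m                                  # mean of the observed indices
    b = sum(i * i for i in obs) / m                   # mean of the squared indices
    beta = (y_star - a * y_bar) / (b - a * a)
    alpha = (b * y_bar - a * y_star) / (b - a * a)
    return alpha, beta


# Hypothetical record on the line 5 + 2i with years 3 and 4 missing.
observed = {1: 7.0, 2: 9.0, 5: 15.0, 6: 17.0}
alpha_gap, beta_gap = ls_line_with_gaps(observed)
print(alpha_gap, beta_gap)
```

Keeping the original coded year as the dictionary key is what preserves the spacing across the gap; renumbering the observed years consecutively would distort the slope.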
4.4
The historigram of the Mangla flood peaks data for the 89 years from 1925 to 2013 shows a decreasing line with a very small value of r² = 0.011 in Fig. (4.4.1). The R-square value of the least squares line is close to zero, which indicates that the line is almost horizontal. This means there is neither an upward nor a downward secular trend at Mangla.
Fig. (4.4.1) The historigram of the flood peaks at Mangla site (actual values and least squares fits plotted against the index)
Accuracy measures: MAPE = 8.46726E+01, MAD = 1.36120E+05, MSD = 4.27873E+10
$$Z_k = X_k - 271525.5 + 829.500511\,k$$
The estimates are $\hat{\alpha} = 271525.5$ and $\hat{\beta} = -829.500511$. The maximum value of $Z_k$ is 474880.5, corresponding to the year 1992. The worst flood peaks after some years may also be estimated as
$$\hat{\alpha} + (10 + n)\,\hat{\beta} + Z_{max}$$
Table (4.5.1)
Years
2015
2016
2017
The above table contains the forecasts of the flood height in the coming years.
4.6
The historigram of the Shahdara flood peaks data for the 88 years from 1925 to 2012 shows a slightly decreasing line with a very small value of r² = 0.008. The R-square value of the least squares line is close to zero, which indicates that the line is almost horizontal. This means there is neither an upward nor a downward secular trend at Shahdara.
Fig. (4.6.1) The historigram of the flood peaks at Shahdara site (actual values and least squares fits plotted against the index)
Accuracy measures: MAPE = 70, MAD = 51600, MSD = 7494464103
$$Z_k = X_k - 102514.7 + 300.04\,k$$
The estimates are $\hat{\alpha} = 102514.7$ and $\hat{\beta} = -300.04$. The maximum value of $Z_k$ is 492685.3, corresponding to the year 1988. The worst flood peaks after some years may also be estimated as
$$\hat{\alpha} + (10 + n)\,\hat{\beta} + Z_{max}$$
Table (4.6.1) Forecasting of worst flood peaks
Years   Estimated flood peaks
2015    567900
2016    567600
2017    567300
Table (4.6.1) gives the forecasts of the flood height for three years at Shahdara.
4.7
The historigram of the Balakot rainfall data for the 36 years from 1977 to 2012 shows an increasing line with r² = 0.28. The amount of rainfall may increase in coming years and cause floods. There may be other reasons along with global warming. The forecasting of the worst rainfall can play a vital role in decision making and hydrological engineering. It provides the basis for developing design values for rainfall and for flood protection structures (dikes).
[Figure: historigram of the Balakot rainfall, actual values and least squares fits plotted against the index]
Accuracy measures: MAPE = 20.4, MAD = 73.3, MSD = 10481.3
$$Z_k = X_k - 378.2527 - 2.223\,k$$
The estimates are $\hat{\alpha} = 378.2527$ and $\hat{\beta} = 2.223$. The maximum value of $Z_k$ is 319.81, corresponding to the year 2010.
Table (4.7.1) Forecasting of worst rainfall
Years   Estimated rainfall
2015    784.75
2016    786.98
2017    789.44
4.8
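Let the series $X_k$ be an independent time series that contains a parabolic trend. Then we have
$$E(X_k) = \alpha + \beta k + \gamma k^2 \qquad (4.8.1)$$
and
$$S = \sum_{k=1}^{n}(X_k - \alpha - \beta k - \gamma k^2)^2 \qquad (4.8.2)$$
Setting
$$\frac{\partial S}{\partial \alpha} = 0, \quad \frac{\partial S}{\partial \beta} = 0 \quad \text{and} \quad \frac{\partial S}{\partial \gamma} = 0$$
gives the normal equations
$$\sum_{k=1}^{n} X_k - n\alpha - \beta\,\frac{n(n+1)}{2} - \gamma\,\frac{n(n+1)(2n+1)}{6} = 0 \qquad (4.8.3)$$
$$\sum_{k=1}^{n} kX_k - \alpha\sum_{k=1}^{n} k - \beta\sum_{k=1}^{n} k^2 - \gamma\sum_{k=1}^{n} k^3 = 0 \qquad (4.8.4)$$
$$\sum_{k=1}^{n} k^2X_k - \alpha\sum_{k=1}^{n} k^2 - \beta\sum_{k=1}^{n} k^3 - \gamma\sum_{k=1}^{n} k^4 = 0 \qquad (4.8.5)$$
4.9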
The forecasting of time series data is based on the selection of an appropriate model. We used some measures of the accuracy of a model, which are useful for comparing different fits to the sampled data and their forecasts. The three measures of the accuracy of the specified models are:
Mean Absolute Percentage Error (MAPE)
Mean Absolute Deviation (MAD)
Mean Squared Deviation (MSD)
Outliers have a significant effect on these measures; the MAD is less affected by outliers than the MSD. Generally, the model with the smallest values of these measures is considered the better model.
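The three accuracy measures can be sketched as follows, assuming the usual definitions: MAPE is the mean of |actual - fitted| / |actual| times 100, MAD is the mean absolute error and MSD is the mean squared error. The text does not spell these formulas out, so the definitions, the function name and the data below are assumptions made for illustration.

```python
def accuracy_measures(actual, fitted):
    """Return (MAPE, MAD, MSD) under the assumed standard definitions."""
    n = len(actual)
    errors = [a - f for a, f in zip(actual, fitted)]
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    mad = sum(abs(e) for e in errors) / n
    msd = sum(e * e for e in errors) / n
    return mape, mad, msd


# Hypothetical actual and fitted values, for illustration only.
actual = [100.0, 200.0, 400.0]
fitted = [110.0, 180.0, 400.0]
mape, mad, msd = accuracy_measures(actual, fitted)
print(mape, mad, msd)
```

Because MSD squares each error, a single large outlier raises it much more than it raises MAD, which is why two models can be ranked differently by the two measures.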
4.10
A visual investigation of the data suggests that a quadratic model could be useful for explaining the trend in the observed data. The value of R-square (only 11.9% of the variation explained) shows that the rainfall changes very little with the passage of time. The estimated quadratic model is as follows
R-Sq = 11.9%
The R-square statistic is a measure of the strength of association between the observed and model-predicted values of the dependent variable. Large R-square values indicate a strong relationship. The R-square for the quadratic model is larger, though it is not clear whether this is due to the quadratic model capitalizing on chance with an extra parameter.
The R square value
The scatter plot of the Risalpur site shows a parabolic trend in the observed data. A visual clue from the figure indicates that some outliers are present in the data. For a more precise examination, one can investigate the reasons for these outliers.
Trend Methods   Linear Method   Quadratic   Exponential
MAPE            24.4            23.19       23.7
MAD             78.0            72.97       75.4
MSD             10543.1         9547.50     10740.8
[Figure: (a) fitted quadratic line for Risalpur against k; (b) residuals against fitted values. S = 101.733, R-Sq = 11.9%, R-Sq(adj) = 6.5%]
By the accuracy measures, the quadratic model provides the best prediction of the amount of rainfall among the three methods. From table (4.10.2), one can see the difference between the values estimated by the different methods for the same year, which clearly shows that misspecification of the model will mislead the results.
Table (4.10.2)
Years   Forecasting
        Linear     Quadratic   Exponential
2013    377.408    454.100     359.241
2014    378.838    467.966     360.518
2015    380.269    482.488     361.800
2016    381.699    497.664     363.086
2017    383.130    513.494     364.377
A huge difference is observed between the forecasts of the selected models. A large amount of rainfall is expected according to the second degree curve.
[Figure: trend analysis plots for Risalpur under the three models, showing actual values, fits and forecasts against the index]
The quadratic model gives the smallest values of all three accuracy measures, so the quadratic trend is the better choice for forecasting the rainfall in coming years.
4.11
Dir Rainfall
The rainfall data at the Dir site exhibit a quadratic trend. Figure (4.11.1) contains the fitted line plot for the Dir site and the residuals against the fitted values. The value of R-square is also very small, i.e. 18.5%.
Figure (4.11.1) (a) Fitted line plot of Dir against k; (b) residuals plot against fitted values. S = 25.4476, R-Sq = 18.5%, R-Sq(adj) = 13.5%
Measures of Accuracy   Trend Methods
                       Linear Method   Quadratic   Exponential
MAPE                   11.689          11.594      11.327
MAD                    16.964          16.853      16.695
MSD                    560.513         558.224     563.690
The values of two accuracy measures, MAPE and MAD, are smaller for the exponential model than for the other models. From figure (4.11.1 a) one can see that the presence of outliers can change the true picture of the model.
Table (4.11.2) Forecasting of worst rainfall
Years   Forecasting
        Linear     Quadratic   Exponential
2013    163.282    166.959     160.698
2014    164.528    168.801     162.030
2015    165.774    170.675     163.374
2016    167.020    172.580     164.728
2017    168.265    174.516     166.093
Graph of three models
From the accuracy measures we can see that the Exponential model receives the smaller values
of MAPE and MAD while the quadratic model has the minimum value of MSD.
4.12
Kohat Rainfall
The Kohat site has a parabolic trend. The visual investigation of the graph suggests the presence
of outliers in the data. The estimated model is as follows.
Figure (4.12.1) Fitted line plot of Kohat against k and residuals against fitted values. S = 22.4230, R-Sq = 4.2%, R-Sq(adj) = 0.0%
Fig. (4.12.1) shows that there is no linear trend in the data. Using the measures of accuracy, we can observe that the quadratic model gives the smallest MSD for the Kohat rainfall site.
Table (4.12.1)
Measures of Accuracy   Trend Methods
                       Linear Method   Quadratic   Exponential
MAPE                   12.006          11.960      11.674
MAD                    17.475          17.368      17.183
MSD                    474.968         470.556     477.505
Table (4.12.2) Forecasting of rainfall
Years   Forecasting
        Linear     Quadratic   Exponential
2013    148.110    143.005     146.023
2014    148.346    142.413     146.233
2015    148.582    141.777     146.444
2016    148.818    141.098     146.655
2017    149.054    140.376     146.866
Graphical Comparison of three models
Figure (4.12.2) Trend analysis plots for Kohat under the three models, showing actual values, fits and forecasts against the index
The exponential model has smaller values of MAPE and MAD than the other models. We can also observe from the scatter plot in Figure (4.12.1) that the pattern does not follow a linear or quadratic trend. The best choice for this data is the exponential model.
4.13
The Marala flood site has a parabolic trend. The visual investigation of the graph suggests the
presence of outliers in the data. The estimated model is as follows.
R-Sq = 4.4%
[Figure: fitted line plot of Marala against the index and residuals against fitted values. S = 207623, R-Sq = 4.4%, R-Sq(adj) = 2.2%]
Table (4.13.1)
Measures of Accuracy   Trend Methods
                       Linear Method   Quadratic      Exponential
MAPE                   56.9958         55.3978        56.9169
MAD                    159975          156370         157535
MSD                    4.35181E+10     4.16378E+10    4.59399E+10
Table (4.13.2) Forecasting of worst flood
Years   Forecasting
        Linear    Quadratic   Exponential
2013    322730    222405      266671
2014    322465    215376      266263
2015    322199    208196      265855
2016    321933    200867      265448
2017    321668    193387      265041
[Figure: trend analysis plots for Marala under the three models, showing actual values, fits and forecasts against the index]
From table (4.13.1) we can see that the quadratic model has the smallest values of MAPE, MAD and MSD. By these measures of accuracy, the best model is the quadratic model.
4.14
Figure (4.14.1) Fitted line plot of D.G. Khan against k and residuals against fitted values. S = 9.78139, R-Sq = 16.3%, R-Sq(adj) = 8.4%
Table (4.14.1)
Measures of Accuracy   Trend Methods
                       Linear Method   Quadratic   Exponential
MAPE                   36.1637         33.3424     32.4880
MAD                    7.6410          7.2195      7.4172
MSD                    92.9969         83.7162     96.9935
Table (4.14.2)
Years   Forecasting
        Linear     Quadratic   Exponential
2012    28.2392    20.5184     26.8601
2013    28.6230    19.0494     27.3407
2014    29.0069    17.4377     27.8298
2015    29.3908    15.6836     28.3277
2016    29.7747    13.7869     28.8345
Figure (4.14.2) Trend analysis plots for D.G. Khan under the three models, showing actual values, fits and forecasts against the index
The quadratic model is the better fit for forecasting at the D.G. Khan rainfall site.
4.15
The quadratic model seems to be a good model for the Terbela flood site. There is an indication of outliers in the data, which needs to be investigated in order to make better predictions of future flood amounts. The R-square value is only about 0.6%, which shows that very little of the variation is accounted for. The estimated model is
R-Sq = 0.6%
Figure (4.15.1) Fitted line plot of Terbela against k and residuals against fitted values. S = 100191, R-Sq = 1.0%, R-Sq(adj) = 0.0%
Table (4.15.1)
Measures of Accuracy   Trend Methods
                       Linear Method   Quadratic     Exponential
MAPE                   16              16            18
MAD                    63210           62645         63444
MSD                    9254221795      9238780213    9386600743
Table (4.15.2)
Years   Forecasting
        Linear    Quadratic   Exponential
2013    395152    385601      373603
2014    395752    384652      373533
2015    396351    383621      373463
2016    396951    382509      373394
2017    397550    381315      373324
[Figure: trend analysis plots for Terbela under the three models, showing actual values, fits and forecasts against the index]
4.16
Muzafarabad
The linear model seems to be a good model for the Muzafarabad rainfall site. There is an indication of outliers in the data, which needs to be investigated in order to make better predictions of future rainfall amounts. The R-square value of about 70% shows that a large amount of the variation is accounted for. The estimated model is
$$Z_k = X_k - \hat{\alpha} - \hat{\beta} k - \hat{\gamma} k^2$$
$$Z_k = X_k - 129.562 + 0.053\,k + 0.004\,k^2$$
R-Square= 70.2%
[Figure: fitted line plot of Muzafarabad against k and residuals against fitted values. S = 22.0272, R-Sq = 0.7%, R-Sq(adj) = 0.0%]
Table (4.16.1)
Measures of Accuracy   Trend Methods
                       Linear Method   Quadratic   Exponential
MAPE                   14.434          14.450      14.476
MAD                    17.705          17.726      17.841
MSD                    459.259         459.205     462.689
Figure (4.16.2) Graphical comparison of three models: trend analysis plots for Muzafarabad, showing actual values, fits and forecasts against the index
The scatter plot of Muzafarabad suggests a linear trend. From the accuracy table, the linear trend is found to be a good model.
Table (4.16.2) Forecasting of rainfall
Years   Forecasting
        Linear     Quadratic   Exponential
2011    123.855    123.311     121.395
2012    123.745    123.144     121.265
2013    123.635    122.975     121.135
2014    123.526    122.804     121.006
2015    123.416    122.631     120.876
The above estimation methods are used as a preliminary analysis of the rainfall and flood data. Forecasting floods and issuing warnings about them requires some further analysis.
CHAPTER 5
ANALYSIS OF RECORD VALUES
5.1
Introduction
A record is a specific value or entry which is smaller or larger than all the previous values. In a sequence $\{X_j\}_{j=1}^{n}$, $j = 1, 2, 3, \ldots, n$, a value which is larger in magnitude than all the previous values is called an upper record value, and a value which is smaller in magnitude than all the previous values is called a lower record value.
Let $X_{kj}$ be the level of flood in the river on the kth day at the jth site. If we are interested in the maximum $X_{kj}$, then these local maxima are known as upper record values, and the local minima are known as lower record values.
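The definition of record values can be sketched as a single pass through the sequence. The convention that the first observation counts as both the first upper and the first lower record is an assumption made here; the function names and the data are hypothetical.

```python
def upper_records(xs):
    """Values that exceed every earlier value (first value counted as a record)."""
    records = []
    for x in xs:
        if not records or x > records[-1]:
            records.append(x)
    return records


def lower_records(xs):
    """Values that fall below every earlier value (first value counted as a record)."""
    records = []
    for x in xs:
        if not records or x < records[-1]:
            records.append(x)
    return records


# Hypothetical sequence of daily levels, for illustration only.
levels = [3, 1, 4, 1, 5, 9, 2, 6]
upper = upper_records(levels)
lower = lower_records(levels)
print(upper, lower)
```

Only the most recent record needs to be kept for the comparison, because each record already dominates all values before it.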
Let $x_1, x_2, x_3, \ldots, x_n$ be independent and identically distributed random variables from any distribution having probability density function f(x) and cumulative distribution function F(x), for a sample of specified size.
If $X_{U(1)}, X_{U(2)}, X_{U(3)}, \ldots, X_{U(r)}$ are the upper record values, then the probability density function of $X_{U(r)}$ is
$$f_r(x) = \frac{1}{\Gamma(r)}[R(x)]^{r-1} f(x), \qquad -\infty < x < \infty \qquad (5.2.1)$$
where
$$R(x) = -\ln[1 - F(x)]$$
and
$$r(x) = \frac{d}{dx}R(x) = \frac{f(x)}{1 - F(x)}$$
The joint probability density function of the r upper record values is written as follows:
$$f_{1,2,\ldots,r}(x_{U(1)}, x_{U(2)}, \ldots, x_{U(r)}) = f(x_{U(r)})\prod_{i=1}^{r-1}\frac{f(x_{U(i)})}{1 - F(x_{U(i)})} \qquad (5.2.2)$$
The joint probability density function of $X_{U(r)}$ and $X_{U(s)}$ is as follows:
$$f_{r,s}(x, y) = \frac{1}{\Gamma(r)\,\Gamma(s-r)}[R(x)]^{r-1}[R(y) - R(x)]^{s-r-1}\, r(x)\, f(y) \qquad (5.2.3)$$
where r < s and $-\infty < x < y < \infty$.
5.3
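Let $x_1, x_2, x_3, \ldots, x_n$ be independent and identically distributed random variables from any distribution having probability density function f(x) and cumulative distribution function F(x), for a sample of specified size.
If $X_{L(1)}, X_{L(2)}, X_{L(3)}, \ldots, X_{L(r)}$ are the lower record values, then the probability density function of $X_{L(r)}$ is
$$f_r(x) = \frac{1}{\Gamma(r)}[H(x)]^{r-1} f(x), \qquad -\infty < x < \infty \qquad (5.3.1)$$
where
$$H(x) = -\ln[F(x)]$$
and
$$h(x) = -\frac{d}{dx}H(x) = \frac{f(x)}{F(x)}$$
The joint probability density function of $X_{L(1)}, X_{L(2)}, X_{L(3)}, \ldots, X_{L(r)}$ is given as
$$f_{1,2,\ldots,r}(x_{L(1)}, x_{L(2)}, \ldots, x_{L(r)}) = f(x_{L(r)})\prod_{i=1}^{r-1}\frac{f(x_{L(i)})}{F(x_{L(i)})} \qquad (5.3.2)$$
The joint probability density function of $X_{L(r)}$ and $X_{L(s)}$ is
$$f_{r,s}(x, y) = \frac{1}{\Gamma(r)\,\Gamma(s-r)}[H(x)]^{r-1}[H(y) - H(x)]^{s-r-1}\, h(x)\, f(y), \qquad -\infty < y < x < \infty \qquad (5.3.3)$$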
5.4
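Let $X_1, X_2, X_3, \ldots, X_n$ be independent and identically distributed random variables from
any distribution having the probability density function $f(x)$ and cumulative distribution
function $F(x)$.
The properties of record values are as follows:
(i) If $X_{U(1)}, X_{U(2)}, X_{U(3)}, \ldots, X_{U(r)}$ are the upper record values, their joint
probability density function is given as
\[ f_{1,2,\ldots,r}(x_{U(1)}, \ldots, x_{U(r)}) = f(x_{U(r)}) \prod_{i=1}^{r-1} f(x_{U(i)}) [1 - F(x_{U(i)})]^{-1}. \tag{5.4.1} \]
(ii) The probability density function of $X_{U(r)}$ is
\[ f_r(x) = \frac{1}{\Gamma(r)} [R(x)]^{r-1} f(x), \qquad -\infty < x < \infty, \tag{5.4.2} \]
where
\[ R(x) = -\ln[1 - F(x)] \quad \text{and} \quad r(x) = \frac{d}{dx} R(x) = f(x)[1 - F(x)]^{-1}. \]
(iii) The joint probability density function of $X_{U(r)}$ and $X_{U(s)}$ is
\[ f_{r,s}(x, y) = \frac{1}{\Gamma(r)\,\Gamma(s-r)} [R(x)]^{r-1} [R(y) - R(x)]^{s-r-1} r(x) f(y), \qquad -\infty < x < y < \infty \text{ and } r < s. \tag{5.4.3} \]
(iv) The $n$th moment of the $r$th upper record value is
\[ \mu^{n}_{(r)} = \frac{1}{\Gamma(r)} \int_{-\infty}^{\infty} x^{n} [R(x)]^{r-1} f(x)\, dx. \tag{5.4.4} \]
(v) The product moment of the $n$th and $m$th powers of the upper record values is as follows:
\[ \mu^{n,m}_{r,s} = E\left(X^{n}_{U(r)} X^{m}_{U(s)}\right). \tag{5.4.5} \]
(vi) The covariance of $X_{U(r)}$ and $X_{U(s)}$ is
\[ \sigma_{(r),(s)} = \mu^{1,1}_{(r),(s)} - \mu^{1}_{(r)} \mu^{1}_{(s)}. \tag{5.4.6} \]
5.5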
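(i) The joint probability density function of the lower record values
$X_{L(1)}, X_{L(2)}, \ldots, X_{L(r)}$ is given as
\[ f_{1,2,\ldots,r}(x_{L(1)}, x_{L(2)}, \ldots, x_{L(r)}) = f(x_{L(r)}) \prod_{i=1}^{r-1} f(x_{L(i)}) [F(x_{L(i)})]^{-1}. \tag{5.5.1} \]
(ii) The probability density function of $X_{L(r)}$ is
\[ f_r(x) = \frac{1}{\Gamma(r)} [H(x)]^{r-1} f(x), \qquad -\infty < x < \infty, \tag{5.5.2} \]
where
\[ H(x) = -\ln[F(x)] \quad \text{and} \quad h(x) = -\frac{d}{dx} H(x) = f(x)[F(x)]^{-1}. \]
(iii) The joint probability density function of $X_{L(r)}$ and $X_{L(s)}$ is
\[ f_{r,s}(x, y) = \frac{1}{\Gamma(r)\,\Gamma(s-r)} [H(x)]^{r-1} [H(y) - H(x)]^{s-r-1} h(x) f(y), \qquad -\infty < y < x < \infty. \tag{5.5.3} \]
(iv) The $n$th moment of the $r$th lower record value is
\[ \mu^{n}_{(r)} = \frac{1}{\Gamma(r)} \int_{-\infty}^{\infty} x^{n} [H(x)]^{r-1} f(x)\, dx. \tag{5.5.4} \]
(v) The product moment of the $n$th and $m$th powers of the lower record values is defined as
\[ \mu^{n,m}_{r,s} = E\left(X^{n}_{L(r)} X^{m}_{L(s)}\right). \tag{5.5.5} \]
(vi) The covariance of $X_{L(r)}$ and $X_{L(s)}$ is
\[ \sigma_{(r),(s)} = \mu^{1,1}_{(r),(s)} - \mu^{1}_{(r)} \mu^{1}_{(s)}. \tag{5.5.6} \]
5.6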
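Let $X_{L(1)}, X_{L(2)}, \ldots, X_{L(r)}$ be the first $r$ lower record values from the Inverse
Weibull distribution with probability density function
\[ f(x) = \alpha x^{-(\alpha+1)} e^{-x^{-\alpha}}, \qquad x > 0,\ \alpha > 0, \tag{5.6.1} \]
and cumulative distribution function
\[ F(x) = e^{-x^{-\alpha}}. \tag{5.6.2} \]
The distribution of the $r$th lower record value is
\[ f_r(x) = \frac{1}{\Gamma(r)} [H(x)]^{r-1} f(x), \qquad \text{where } H(x) = x^{-\alpha}, \]
that is,
\[ f_r(x) = \frac{\alpha}{\Gamma(r)} x^{-(\alpha r+1)} e^{-x^{-\alpha}}, \tag{5.6.3} \]
where
\[ H(x) = x^{-\alpha} \quad \text{and} \quad h(x) = -\frac{d}{dx} H(x) = \alpha x^{-(\alpha+1)}. \]
The mean of the LRV from the inverse Weibull distribution can be found from the definition of the
expectation of a random variable $X$:
\[ E(X) = \int x f_r(x)\, dx, \]
\[ E(X) = \frac{\alpha}{\Gamma(r)} \int_0^{\infty} x\, (x^{-\alpha})^{r-1} x^{-(\alpha+1)} e^{-x^{-\alpha}}\, dx. \tag{5.6.4} \]
After substituting $z = x^{-\alpha}$ we get
\[ E(X) = \frac{\Gamma(r - 1/\alpha)}{\Gamma(r)}, \]
\[ E(X^2) = \frac{\Gamma(r - 2/\alpha)}{\Gamma(r)}, \tag{5.6.5} \]
\[ \mathrm{Var}(X) = \frac{\Gamma(r - 2/\alpha)}{\Gamma(r)} - \left[\frac{\Gamma(r - 1/\alpha)}{\Gamma(r)}\right]^2. \tag{5.6.6} \]
For the two-parameter form of the distribution,
\[ f(x; \alpha, \beta) = \alpha\beta x^{-(\alpha+1)} e^{-\beta x^{-\alpha}}, \qquad \alpha, \beta > 0, \tag{5.6.7} \]
\[ f_r(x) = \frac{\alpha\beta^{r}}{\Gamma(r)} x^{-(\alpha r+1)} e^{-\beta x^{-\alpha}}, \tag{5.6.8} \]
\[ E(X) = \frac{\alpha\beta^{r}}{\Gamma(r)} \int_0^{\infty} x^{-\alpha r} e^{-\beta x^{-\alpha}}\, dx, \tag{5.6.9} \]
\[ E(X) = \beta^{1/\alpha}\, \frac{\Gamma(r - 1/\alpha)}{\Gamma(r)}, \]
\[ E(X^2) = \beta^{2/\alpha}\, \frac{\Gamma(r - 2/\alpha)}{\Gamma(r)}, \tag{5.6.10} \]
\[ \mathrm{Var}(X) = \beta^{2/\alpha} \left\{ \frac{\Gamma(r - 2/\alpha)}{\Gamma(r)} - \left[\frac{\Gamma(r - 1/\alpha)}{\Gamma(r)}\right]^2 \right\}. \tag{5.6.11} \]
5.7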
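Let $X_{L(1)}, X_{L(2)}, \ldots, X_{L(r)}$ be the first $r$ lower record values from the Frechet
distribution with
\[ f(x) = \frac{\alpha}{\beta} \left(\frac{x-\gamma}{\beta}\right)^{-(\alpha+1)} \exp\left[-\left(\frac{x-\gamma}{\beta}\right)^{-\alpha}\right], \qquad \alpha > 0,\ \beta > 0 \text{ and } x > \gamma, \tag{5.7.1} \]
\[ F(x) = \exp\left[-\left(\frac{x-\gamma}{\beta}\right)^{-\alpha}\right]. \tag{5.7.2} \]
The probability density function of the $r$th lower record value is
\[ f_r(x) = \frac{\alpha}{\beta\,\Gamma(r)} \left(\frac{x-\gamma}{\beta}\right)^{-(\alpha r+1)} \exp\left[-\left(\frac{x-\gamma}{\beta}\right)^{-\alpha}\right], \tag{5.7.3} \]
and its moments are
\[ E(X) = \gamma + \beta\, \frac{\Gamma(r - 1/\alpha)}{\Gamma(r)}, \tag{5.7.4} \]
\[ E(X^2) = \gamma^2 + 2\gamma\beta\, \frac{\Gamma(r - 1/\alpha)}{\Gamma(r)} + \beta^2\, \frac{\Gamma(r - 2/\alpha)}{\Gamma(r)}, \tag{5.7.5} \]
\[ \mathrm{Var}(X) = \beta^2 \left\{ \frac{\Gamma(r - 2/\alpha)}{\Gamma(r)} - \left[\frac{\Gamma(r - 1/\alpha)}{\Gamma(r)}\right]^2 \right\}. \tag{5.7.6} \]
5.8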
The likelihood function of the first $n$ lower record values is given by (Arnold et al.)
\[ L = f(x_{L(n)}) \prod_{i=1}^{n-1} \frac{f(x_{L(i)})}{F(x_{L(i)})}. \tag{5.8.1} \]
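For the lower record values of the inverse Weibull distribution with a single unknown parameter
$\theta$ (and known $c$), the likelihood function is
\[ L(\theta, x) = \theta^{n} c^{n} \prod_{i=1}^{n} x_{L(i)}^{-(c+1)} \exp\left(-\theta\, x_{L(n)}^{-c}\right), \tag{5.8.2} \]
\[ \log L = n \log\theta + n \log c - (c+1) \sum_{i=1}^{n} \log x_{L(i)} - \theta\, x_{L(n)}^{-c}. \tag{5.8.3} \]
After setting
\[ \frac{\partial \log L}{\partial \theta} = 0, \tag{5.8.4} \]
we obtain
\[ \hat{\theta} = \frac{n}{x_{L(n)}^{-c}}. \tag{5.8.5} \]
Since
\[ \frac{\partial^2 \log L}{\partial \theta^2} = -\frac{n}{\theta^2}, \qquad \frac{\partial^2 \log L}{\partial \theta^2} < 0, \tag{5.8.6} \]
the variance of the ML estimator, using the Rao-Cramer lower bound, is
\[ \mathrm{var}(\hat{\theta}) = \frac{\theta^2}{n}. \tag{5.8.7} \]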
n 1 x c L n
(5.8.8)
of MLE
E
that is
x
f ( x)
n n x
Lf ( x) n i
i 1
i 1
(5.8.9)
i 1
i 1
n
(5.8.10)
LogLf ( x )
n
n
1
1
(5.8.11)
xL n n
LogLf ( x) n n
1
log
i1
(5.8.12)
When is known but and are unknown then the ML estimators are
n
x
n 1 log
i 1
n xL n
(5.8.13)
(5.8.14)
Z
in2
n
where Z i
X L n
(5.8.15)
E Z i
E
n n2
(5.8.16)
As we have
E Zi
H x
n 1
f x dx
(5.8.17)
E Z i
n xL n
n 0
After substituting
1n 1
xL n
xL n
t
dx
(5.8.18)
we get
E Zi n n1
(5.8.18)
Putting the value of equation (5.8.18) in equation (5.8.16) we get
E
5.9
Table (5.9.1)
Means    Variance
1.2599   1.3386
1.1373   0.1238
0.9477   0.0349
0.8424   0.0252
0.7722   0.0161
0.7207   0.0113
0.7791   0.0111
Table (5.9.1) contains the mean and variance of lower record values from the inverse Weibull
distribution with known values of the shape and scale parameters. As the table shows, the mean
and variance decrease as the value of r increases.
Mean and variance table of LRV from the Frechet distribution.
Table (5.9.2) shape = 3, scale = 3, location = 2
r   Means    Variance
1   6.0623   2.7846
2   4.7084   0.7016
3   4.2569   0.2643
4   4.0062   0.1424
5   3.8390   0.0900
6   3.7164   0.0589
7   3.6211   0.0470
8   3.5439   0.0366
Table (5.9.3) shape = 5, scale = 5, location = 4
r   Means    Variance
1   9.8210   3.3460
2   8.6570   0.649
3   8.1913   0.3026
4   7.9183   0.1290
5   7.7162   0.0957
6   7.5676   0.0827
7   7.4487   0.0749
8   7.3501   0.0615
Tables (5.9.2) and (5.9.3) contain the mean and variance of lower record values from the
Frechet distribution with different known values of the shape, scale and location parameters. As
the tables show, the mean and variance decrease as the value of r increases.
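The tabulated means can be checked numerically. The sketch below assumes the three-parameter Frechet form F(x) = exp(-((x - gamma)/beta)^(-alpha)), with shape alpha, scale beta and location gamma; these symbol names and the function name `frechet_lrv_moments` are assumptions made for illustration. Under that assumption the r-th lower record value has mean gamma + beta * Gamma(r - 1/alpha) / Gamma(r).

```python
import math
from typing import Tuple


def frechet_lrv_moments(r: int, alpha: float, beta: float, gamma: float) -> Tuple[float, float]:
    """Mean and variance of the r-th lower record value.

    Assumes F(x) = exp(-((x - gamma) / beta) ** (-alpha)) with shape alpha,
    scale beta and location gamma. The variance exists only if r > 2 / alpha.
    """
    if r <= 2.0 / alpha:
        raise ValueError("the variance requires r > 2 / alpha")
    # Ratios Gamma(r - k/alpha) / Gamma(r), computed on the log scale.
    g1 = math.exp(math.lgamma(r - 1.0 / alpha) - math.lgamma(r))
    g2 = math.exp(math.lgamma(r - 2.0 / alpha) - math.lgamma(r))
    mean = gamma + beta * g1
    var = beta ** 2 * (g2 - g1 ** 2)
    return mean, var


if __name__ == "__main__":
    for r in range(1, 4):
        m, v = frechet_lrv_moments(r, alpha=3.0, beta=3.0, gamma=2.0)
        print(r, round(m, 4), round(v, 4))
```

With shape 3, scale 3 and location 2 the computed means for r = 1 and r = 2 agree with the tabulated 6.0623 and 4.7084 to about three decimal places.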
Chap no 6 Stationary models
6.1 Introduction
Data collected over time pose a problem of time series analysis. Any group of observations
arranged in chronological order is called time series data.
A large number of excellent texts on time series are available. Most authors focus on stationary
time series, and some make a good contribution on globally non-stationary series, which are used
for financial time series.
Most authors suggest the book of Chatfield (2003) as a preliminary introduction to time series.
There are many other useful books on time series, e.g. Priestley (1983), Diggle (1990), Brockwell
and Davis (1991), and Hamilton (1994). The book by Hannan (1960) is concise (but concentrated),
and Pole et al. (1994) is a good introduction to a Bayesian way of doing time series analysis.
Data can be recorded hourly, daily, monthly, quarterly, yearly etc. Examples are the temperature
recorded every hour at some specific locations, weekly prices of rice or wheat in Okara, monthly
production and consumption of electricity in a certain area, monthly or annual rainfall, and
yearly flood peaks at different dams in Pakistan. All the above data are related to time and are
treated as time series data. A large amount of data is available for research in the field of
social sciences, but the problem is that the data often lack quality. We should never ignore the
fact that the results obtained from data are only as good as the quality of the data.
Time series data exhibit a natural order over time, so there is a high chance that successive
observations are inter-correlated, especially when the interval between observations is short,
such as an hour, day, week or month instead of years.
Before discussing the further analysis, I would like to give a beautiful quotation:
Experience with real-world data, however, soon convinces one that both stationarity and
Gaussianity are fairy tales invented for the amusement of undergraduates.
(Thomson 1994)
Keeping this in mind, we observe in the literature that stationary models provide the basis for a
great portion of time series analysis.
Time series forecasting and modeling have become very popular nowadays in economic and
environmental topics. George Box and Gwilym Jenkins in 1976 used the form ARIMA (p,d,q)
to describe a large class of models which could describe the behavior of many observed time
series, where d indicates the number of times the series must be differenced to make it
stationary.
6.2
In practice, any researcher faces two important questions: (i) how can one find out whether the
given time series is stationary, and (ii) how can a time series be made stationary if it is found
to lack stationarity. A variety of techniques are available for the detection of stationarity.
Examining how the mean and variance of a time series change with respect to time gives a visual
clue for this purpose (a constant mean and variance suggest stationarity in the data).
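The visual check described above can be complemented by a simple numerical one: split the series into consecutive blocks and compare the block means and variances. This is only an informal sketch; the function name `block_summary` and the toy data are hypothetical, and the comparison does not replace a formal test.

```python
from statistics import mean, variance
from typing import List, Sequence, Tuple


def block_summary(series: Sequence[float], blocks: int = 2) -> List[Tuple[float, float]]:
    """Return (mean, sample variance) for consecutive blocks of the series."""
    n = len(series)
    size = n // blocks
    if size < 2:
        raise ValueError("each block needs at least two observations")
    summary = []
    for b in range(blocks):
        start = b * size
        stop = n if b == blocks - 1 else start + size  # last block takes the remainder
        chunk = list(series[start:stop])
        summary.append((mean(chunk), variance(chunk)))
    return summary


if __name__ == "__main__":
    trending = [10, 12, 11, 13, 20, 22, 21, 23]  # toy series with a level shift
    for i, (m, v) in enumerate(block_summary(trending, 2), start=1):
        print(f"block {i}: mean={m:.2f}, variance={v:.2f}")
```

Block means that drift apart, or block variances that differ by a large factor, are a hint that the series may not be stationary.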
6.3
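A stationary time series provides a basis for the inferential analysis of time series data. The
unit root test is the most popular test for this purpose and has been widely used in the
literature over the last several decades. It is based on the following model:
\[ X_t = \rho X_{t-1} + \varepsilon_t, \qquad -1 \le \rho \le 1. \]
In the above model, if $\rho = 1$ we obtain the random walk model
\[ X_t = X_{t-1} + \varepsilon_t. \tag{6.3.1} \]
Subtracting $X_{t-1}$ from both sides of the model gives
\[ X_t - X_{t-1} = \rho X_{t-1} - X_{t-1} + \varepsilon_t, \tag{6.3.2} \]
\[ \Delta X_t = (\rho - 1) X_{t-1} + \varepsilon_t, \tag{6.3.3} \]
\[ \Delta X_t = \delta X_{t-1} + \varepsilon_t, \tag{6.3.4} \]
where $\delta = \rho - 1$ and $\Delta$ is the first-difference operator. As we can see from
equation (6.3.3), if $\rho = 1$, that is if $\delta = 0$, then the given time series is
non-stationary. We therefore test the null hypothesis $H_0: \delta = 0$. If $\delta = 0$, then
\[ \Delta X_t = (X_t - X_{t-1}) = \varepsilon_t. \]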
6.4
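Dickey and Fuller (1979) overcame the problem that occurs in the unit root test. They developed
the $\tau$ (tau) statistic to test the value of the coefficient of $X_{t-1}$, and proposed three
versions of the regression that are used for the test, as follows:
1. $\Delta Y_t = \delta Y_{t-1} + \varepsilon_t$   (6.4.1)
2. $\Delta Y_t = \beta_1 + \delta Y_{t-1} + \varepsilon_t$   (6.4.2)
3. $\Delta Y_t = \beta_1 + \beta_2 t + \delta Y_{t-1} + \varepsilon_t$   (6.4.3)
The third version describes a random walk with drift around a stochastic trend. In all the above
models the authors assumed that the random error term is uncorrelated. Dickey and Fuller (1979)
also developed the Augmented Dickey Fuller (ADF) test for a unit root, which is most appropriate
when the error terms are correlated.
The model of the test is
\[ \Delta Y_t = \beta_1 + \beta_2 t + \delta Y_{t-1} + \sum_{i=1}^{m} \alpha_i \Delta Y_{t-i} + \varepsilon_t, \tag{6.4.4} \]
where $\beta_1$ is the constant (intercept) and $\beta_2$ is the coefficient on the time trend.
The ADF test is performed in Excel with the following hypotheses:
$H_0: \delta = 0$ (the process has a unit root, i.e. it is non-stationary)
$H_1: \delta < 0$ (the process does not have a unit root, i.e. it is stationary)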
The Dickey-Fuller test necessarily assumes that the random error terms are independently and
identically distributed. The Augmented Dickey Fuller (ADF) test is designed to overcome the
problem of serial correlation in the random error term. Phillips and Perron (1988) suggested
nonparametric unit root tests that allow for serially correlated error terms without adding
lagged difference terms. The distribution of the PP test statistic is asymptotically the same as
that of the ADF test, but the PP test differs from the ADF test in how it deals with serial
correlation and heteroscedasticity of the random error term.
It also tests the null and alternative hypotheses as follows:
H0 : The data have a unit root (non-stationary process)
H1 : The data do not have a unit root (stationary process)
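The simplest Dickey-Fuller regression (no constant, no trend) can be sketched in a few lines of plain Python. This is a minimal illustration, not the Excel procedure used in this thesis; the function names are hypothetical. The tau statistic is the ordinary t-ratio of the estimated delta, but it must be compared with Dickey-Fuller critical values rather than with Student t critical values.

```python
import math
import random
from typing import Sequence, Tuple


def dickey_fuller_tau(y: Sequence[float]) -> Tuple[float, float]:
    """Estimate delta in  dY_t = delta * Y_{t-1} + e_t  and return (delta, tau)."""
    lagged = list(y[:-1])
    diffs = [b - a for a, b in zip(y[:-1], y[1:])]
    sxx = sum(v * v for v in lagged)
    if len(diffs) < 2 or sxx == 0.0:
        raise ValueError("series too short or identically zero")
    delta = sum(l * d for l, d in zip(lagged, diffs)) / sxx   # OLS slope through the origin
    resid = [d - delta * l for l, d in zip(lagged, diffs)]
    s2 = sum(e * e for e in resid) / (len(diffs) - 1)         # residual variance
    se = math.sqrt(s2 / sxx)                                  # standard error of delta
    if se == 0.0:
        raise ValueError("perfect fit: tau is undefined")
    return delta, delta / se


def reject_unit_root(tau: float, critical_value: float) -> bool:
    """Reject H0 (unit root) when tau lies below the (negative) critical value."""
    return tau < critical_value


if __name__ == "__main__":
    random.seed(1)
    noise = [random.gauss(0.0, 1.0) for _ in range(200)]      # stationary series
    walk = [sum(noise[: i + 1]) for i in range(len(noise))]   # random walk built from it
    print("white noise:", dickey_fuller_tau(noise))
    print("random walk:", dickey_fuller_tau(walk))
```

The decision rule is the one applied in the site-by-site analysis: a statistic smaller (more negative) than the critical value leads to rejection of the unit root hypothesis.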
6.4 Analysis of the stationary time series data at different sites
In this section we use the tests mentioned above to see whether the data satisfy the assumption
of stationarity. For this purpose, the data from all sites are analyzed in sequence, and the data
found to be stationary will be taken forward for further analysis.
6.4.1 MARALA SITE
From the following table (a) we see that the Augmented Dickey Fuller unit root test statistic
takes the value -2.0725, which is smaller than the critical values at the different levels of
significance (i.e. 5%, 10%), so we can reject H0 and conclude that the flood peaks at Marala
do not have a unit root, i.e. the series is stationary. The PP test also rejects the null
hypothesis and gives the same result.
Table (a)
Statistic     p value   Alpha   Critical value
-2.07253248   0.03983   5%      -1.9694528
                        10%     -1.6336516
The autocorrelation function is another important tool for diagnosing the stationarity of a
series. It indicates whether the time series is going anywhere; a stationary series should keep
a constant level over time (Abbas Keshvani, 2013).
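For reference, the sample autocorrelation function can be computed as in the sketch below. The function names are hypothetical, and the band of plus or minus 1.96 divided by the square root of n is the usual large-sample approximation for white noise, stated here as an assumption rather than as the limits used by the plotting software.

```python
import math
from typing import List, Sequence


def sample_acf(series: Sequence[float], max_lag: int) -> List[float]:
    """Sample autocorrelations r_0, r_1, ..., r_max_lag of a series."""
    n = len(series)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the series length")
    m = sum(series) / n
    dev = [x - m for x in series]
    c0 = sum(d * d for d in dev)
    if c0 == 0.0:
        raise ValueError("constant series has no autocorrelation")
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]


def white_noise_band(n: int, z: float = 1.96) -> float:
    """Approximate 95% limit for the autocorrelations of white noise."""
    return z / math.sqrt(n)


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(sample_acf(data, 2))        # lag 0 is always 1.0
    print(white_noise_band(88))       # band for a series of 88 observations
```

Autocorrelations that stay inside the band at all non-zero lags are consistent with a purely random series.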
Figure 1: Autocorrelation Function for Marala; (a) autocorrelation and (b) partial autocorrelation against lag.
As the graph above shows, the ACF in fig (a) and the PACF in fig (b) decay slightly exponentially
and remain within the significance limits, which is an initial indication of a stationary series.
In theory, the ACF of a purely random process should be close to zero at all non-zero lags. So
the ACF and PACF suggest that the Marala flood peaks form a stationary series.
Figure 2: Residual plots for Marala; panels: Percent against Residual, Versus Fits (Residual against Fitted Value), Histogram (Frequency against Residual), Versus Order (Residual against Observation Order).
6.4.2 TARBELA SITE
From the following table (b) we see that the Augmented Dickey Fuller unit root test statistic
takes the value -0.9024, which is greater than the critical values at the different levels of
significance (i.e. 1%, 2%, 5%, 10%), so we cannot reject H0 and conclude that the flood peaks
at Tarbela have a unit root, i.e. the series is non-stationary.
Table (b)
Statistic   p value    Alpha   Critical value
-0.9024     0.329485   1%      -2.72488
                       2%      -2.43209
                       5%      -2.01518
                       10%     -1.66269
Figure 3: (a) autocorrelation and (b) partial autocorrelation of the Tarbela flood peaks against lag.
As the graph above shows, the ACF in fig (a) and the PACF in fig (b) depart from exponential
decay. Even though they remain within the significance limits, they vary widely across the
different lags. In theory, the ACF of a purely random process should be close to zero at all
non-zero lags. So the ACF and PACF suggest that the Tarbela flood peaks form a non-stationary
series.
Figure: Residual plots for Tarbela; panels: Percent against Residual, Versus Fits (Residual against Fitted Value), Histogram (Frequency against Residual), Versus Order (Residual against Observation Order).
6.4.3 SHAHDARA SITE
From the following table (c) we see that the Augmented Dickey Fuller unit root test statistic
takes the value -3.030007, which is smaller than the critical values at the different levels of
significance (i.e. 1%, 2%, 5%, 10%), so we can reject H0 and conclude that the flood peaks at
Shahdara do not have a unit root, i.e. the series is stationary.
Table (6.4.3a)
Statistic   p value    Alpha   Critical value
-3.030007   0.003859   1%      -2.628747
                       2%      -2.360146
                       5%      -1.969453
                       10%     -1.633652
Figure (6.4.3b): Partial Autocorrelation Function for Shahdara; (a) autocorrelation and (b) partial autocorrelation against lag.
As the graph above shows, the ACF in fig (a) and the PACF in fig (b) decay slightly exponentially
and remain within the significance limits, which is an initial indication of a stationary series.
In theory, the ACF of a purely random process should be close to zero at all non-zero lags. So
the ACF and PACF suggest that the Shahdara flood peaks form a stationary series.
Figure (6.4.3c): Residual plots for Shahdara; panels: Percent against Residual, Versus Fits (Residual against Fitted Value), Histogram (Frequency against Residual), Versus Order (Residual against Observation Order).
6.4.4 MANGLA SITE
From the following table (d) we see that the Augmented Dickey Fuller unit root test statistic
takes the value -3.74878, which is smaller than the critical values at the different levels of
significance (i.e. 1%, 2%, 5%, 10%), so we can reject H0 and conclude that the flood peaks at
Mangla do not have a unit root, i.e. the series is stationary.
Table (6.4.4a)
Statistic   p value   Alpha   Critical value
-3.74878    0.001     1%      -2.62768
                      2%      -2.35935
                      5%      -1.96888
                      10%     -1.63328
Figure (6.4.4b): (a) Graph of ACF for Mangla flood peaks, (b) Graph of PACF for Mangla flood peaks, both against lag.
As the graph above shows, the ACF in fig (a) and the PACF in fig (b) decay slightly exponentially
and remain within the significance limits, which is an initial indication of a stationary series.
In theory, the ACF of a purely random process should be close to zero at all non-zero lags. So
the ACF and PACF suggest that the Mangla flood peaks form a stationary series.
Figure: Residual plots for Mangla; panels: Percent against Residual, Versus Fits (Residual against Fitted Value), Histogram (Frequency against Residual), Versus Order (Residual against Observation Order).
6.4.5 MUZAFARABAD SITE
From the following table (e) we see that the Augmented Dickey Fuller unit root test statistic
takes the value -3.45, which is smaller than the critical values at the different levels of
significance (i.e. 1%, 2%, 5%, 10%), so we can reject H0 and conclude that the flood peaks at
the Muzafarabad site do not have a unit root, i.e. the series is stationary.
Table (6.4.5a)
Statistic   p value   Alpha   Critical value
-3.45       0.003     1%      -2.96
                      2%      -2.4325
                      5%      -1.8765
                      10%     -1.4532
Figure (6.4.5b): (a) autocorrelation and (b) partial autocorrelation for the Muzafarabad site against lag.
Figure: Residual plots for Muzafarabad; panels: Percent against Residual, Versus Fits (Residual against Fitted Value), Histogram (Frequency against Residual), Versus Order (Residual against Observation Order).
6.4.6 BALAKOT SITE
From the following table (f) we see that the Augmented Dickey Fuller unit root test statistic
takes the value -1.02467, which is larger than the critical values at the different levels of
significance (i.e. 1%, 2%, 5%, 10%), so we cannot reject H0 and conclude that the rainfall data
at the Balakot site have a unit root, i.e. the series is non-stationary.
Table (6.4.6a)
Statistic   p value   Alpha   Critical value
-1.02467    0.35998   1%      -2.99243
                      2%      -2.7204
                      5%      -2.3129
                      10%     -1.4568
Figure: (a) autocorrelation and (b) partial autocorrelation for the Balakot site against lag.
Figure: Residual plots for Balakot; panels: Percent against Residual, Versus Fits (Residual against Fitted Value), Histogram (Frequency against Residual), Versus Order (Residual against Observation Order).
6.4.7 RISALPUR SITE
From the following table (g) we see that the Augmented Dickey Fuller unit root test statistic
takes the value -1.6875, which is larger than the critical values at the different levels of
significance (i.e. 1%, 2%, 5%, 10%), so we cannot reject H0 and conclude that the rainfall at
the Risalpur site has a unit root, i.e. the series is non-stationary.
Table (6.4.7a)
Statistic   p value   Alpha   Critical value
-1.6875     0.2973    1%      -3.9926
                      2%      -3.7209
                      5%      -2.3124
                      10%     -1.9563
Figure (6.4.7b): Partial Autocorrelation Function for Risalpur; (a) autocorrelation and (b) partial autocorrelation against lag.
Figure (6.4.7c): Residual plots for Risalpur; panels: Percent against Residual, Versus Fits (Residual against Fitted Value), Histogram (Frequency against Residual), Versus Order (Residual against Observation Order).
6.4.8 KOHAT SITE
From the following table (h) we see that the Augmented Dickey Fuller unit root test statistic
takes the value -2.6875, which is larger than the critical values at the different levels of
significance (i.e. 1%, 2%, 5%, 10%), so we cannot reject H0 and conclude that the rainfall data
at the Kohat site have a unit root, i.e. the series is non-stationary.
Table (6.4.8a)
Statistic   p value   Alpha   Critical value
-2.6875     0.2754    1%      -4.7584
                      2%      -4.0234
                      5%      -3.3542
                      10%     -2.9867
Figure: (a) autocorrelation and (b) partial autocorrelation for the Kohat site against lag.
Figure: Residual plots for Kohat; panels: Percent against Residual, Versus Fits (Residual against Fitted Value), Histogram (Frequency against Residual), Versus Order (Residual against Observation Order).
6.4.9 DIR SITE
From the following table (i) we see that the Augmented Dickey Fuller unit root test statistic
takes the value -0.45634, which is larger than the critical values at the different levels of
significance (i.e. 1%, 2%, 5%, 10%), so we cannot reject H0 and conclude that the rainfall data
at the Dir site have a unit root, i.e. the series is non-stationary.
Table (6.4.9a)
Statistic   p value   Alpha   Critical value
-0.45634    0.2254    1%      -2.7432
                      2%      -1.9857
                      5%      -1.4562
                      10%     -0.9346
Figure (6.4.9b): (a) autocorrelation and (b) partial autocorrelation for the Dir site against lag.
Figure: Residual plots for Dir; panels: Percent against Residual, Versus Fits (Residual against Fitted Value), Histogram (Frequency against Residual), Versus Order (Residual against Observation Order).
Chap No 7
Probability distribution
7.1
Introduction
In this section we apply the selected distributions to the flood and rainfall data of the
different sites, to see which distribution is the most appropriate and gives a good fit. For this
purpose, the chi-square test is used as a goodness of fit test. The method of maximum likelihood
is used for estimating the parameters of the selected distributions.
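As an illustration of the fitting step, the sketch below estimates the rate of an exponential distribution by maximum likelihood (the reciprocal of the sample mean) and computes the Kolmogorov-Smirnov distance between the sample and the fitted distribution. It is a minimal sketch in plain Python with hypothetical function names and made-up data, not the fitting software used for the tables of this chapter.

```python
import math
from typing import Sequence


def fit_exponential(sample: Sequence[float]) -> float:
    """Maximum likelihood estimate of the exponential rate: n / sum(x)."""
    total = sum(sample)
    if total <= 0:
        raise ValueError("sample must contain positive values")
    return len(sample) / total


def ks_statistic_exponential(sample: Sequence[float], rate: float) -> float:
    """Kolmogorov-Smirnov distance between the sample and Exp(rate)."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = 1.0 - math.exp(-rate * x)
        # Compare the fitted CDF with the empirical CDF just after and just before x.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


if __name__ == "__main__":
    peaks = [120.0, 340.0, 95.0, 410.0, 230.0, 180.0, 560.0, 75.0]  # made-up values
    rate = fit_exponential(peaks)
    print("rate:", rate)
    print("K-S statistic:", ks_statistic_exponential(peaks, rate))
```

For instance, a sample mean of 334550 gives an estimated rate of about 2.989E-6. The computed distance is then compared with the critical value for the chosen level of significance.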
7.2
The following table contains the descriptive statistics summary of the Marala flood peaks.
Table 7.1
Statistic                  Values        Percentiles    Values
Size                       88            Min            109780
Range                      990220        5%             130730
Mean                       334550        10%            145650
Variance                   44064000000   25% (Q1)       194640
Standard deviation         209920        50% (Median)   255450
Coefficient of Variation   0.62745       75% (Q3)       411760
Standard error             22377         90%            721300
Skewness                   1.5196        95%            800040
Excess Kurtosis            1.8119        Max            1100000
7.2.1 Exponential Distribution
Table (7.2.1a)
Parameters   Values
Rate λ       2.9891E-6
Graph (7.2.1b)
Figure: histogram of the data with the fitted Exponential density (f(x) against x) and Q-Q plot (Quantile (Model) against x).
Test                 Statistic   P value     Critical value   Conclusion
Kolmogorov-Smirnov   0.28981     0.0000005   0.14274          Reject at different levels of significance
Anderson-Darling     9.2815                  2.5018           Reject at different levels of significance
Chi-Squared          43.038      0.0000001   12.592           Reject at different levels of significance
7.2.2 Gamma Distribution
Table (7.2.2a)
Parameters   Values
Shape α      2.5401
Scale β      1.3171E+5
Table (7.2.2b)
Figure: histogram of the data with the fitted Gamma density (f(x) against x) and Q-Q plot (Quantile (Model) against x).
Test                 Statistic   P value     Critical value   Status
Kolmogorov-Smirnov   0.11674     0.16777     0.14274          Reject at different levels of significance
Anderson-Darling     9.2815                  2.5018           Reject at different levels of significance
Chi-Squared          43.038      1.1466E-7   12.592           Reject at different levels of significance
At the 20% level of significance the Kolmogorov-Smirnov test is accepted.
7.2.3 Normal Distribution
Table (7.2.3a)
Parameters   Values
σ            2.0992E+5
μ            3.3455E+5
Table (7.2.3b)
Figure: histogram of the data with the fitted Normal density (f(x) against x) and Q-Q plot (Quantile (Model) against x).
Test                 Statistic   P value     Critical value
Kolmogorov-Smirnov   0.19741     0.00176     0.14274
Anderson-Darling     5.5215                  2.5018
Chi-Squared          34.169      0.0000007   9.4877
7.2.4 Log-Normal Distribution
Table (7.2.4a)
Parameters   Values
σ            0.5427
μ            12.562
Table (7.2.4b)
Figure: histogram of the data with the fitted Lognormal density (f(x) against x) and Q-Q plot (Quantile (Model) against x).
Test                 Statistic   P value   Critical value               Status
Kolmogorov-Smirnov   0.10937     0.2261    0.14274                      Accepted at different levels of significance
Anderson-Darling     1.2029                2.5018                       Accepted at different levels of significance
Chi-Squared          10.572      0.10253   8.558 (20%), 10.645 (<20%)   Rejected only at the 20% level of significance and accepted at 1%, 2%, 5% and 10%
7.2.5 Logistic Distribution
Table (7.2.5a)
Parameters   Values
σ            1.1573E+5
μ            3.3455E+5
Table (7.2.5b)
Figure: histogram of the data with the fitted Logistic density (f(x) against x) and Q-Q plot (Quantile (Model) against x).
Test                 Statistic   P value     Critical value
Kolmogorov-Smirnov   0.20718     0.000862    0.14274
Anderson-Darling     5.4775                  2.5018
Chi-Squared          24.34       0.0000683   9.4877
7.2.6 Nakagami Distribution
Table (7.2.6a)
Parameters   Values
Ω            1.5549E+11
m            0.53603
Table (7.2.6b)
Figure: Probability Density Function (histogram of the data with the fitted Nakagami density, f(x) against x) and Q-Q plot (Quantile (Model) against x).
Test                 Statistic   P value    Critical value
Kolmogorov-Smirnov   0.2089      0.007578   0.14274
Anderson-Darling     4.8927                 2.5018
Chi-Squared          27.636      0.0011     12.592
7.2.7 Weibull Distribution
Table (7.2.7a)
Parameters   Values
Shape α      2.1248
Scale β      3.6554E+5
Table (7.2.7b)
Figure: histogram of the data with the fitted Weibull density (f(x) against x) and Q-Q plot (Quantile (Model) against x).
Test                 Statistic   P value   Critical value              Status
Kolmogorov-Smirnov   0.1608      0.01867   0.17126 (1%), 0.15961 (2%)  Only accepted at the 1% level of significance and rejected at the other levels
Anderson-Darling     4.1265                2.5018
Chi-Squared          14.914      0.01074   15.086 (1%), 11.388 (2%)    Only accepted at the 1% level of significance and rejected at the other levels
7.2.8 Inverse Gaussian Distribution
Table (7.2.8a)
Parameters   Values
λ            8.4978E+5
μ            3.3455E+5
Table (7.2.8b)
Figure: histogram of the data with the fitted Inv. Gaussian density (f(x) against x) and Q-Q plot (Quantile (Model) against x).
Test                 Statistic   P value   Critical value
Kolmogorov-Smirnov   0.09508     0.38004   0.14274
Anderson-Darling     0.98598               2.5018
Chi-Squared          8.3096      0.21629   12.592
7.2.10 Rayleigh Distribution
Table (7.2.10a)
Parameters   Values
σ            2.6693E+5
Figure: histogram of the data with the fitted Rayleigh density (f(x) against x) and Q-Q plot (Quantile (Model) against x).
Table (7.2.10b)
Test                 Statistic   P value   Critical value             Status
Kolmogorov-Smirnov   0.16973     0.01099   0.17126 (1%), 0.1596 (2%)  Only accepted at the 1% level of significance; rejected at 2%, 5%, 10% etc.
Anderson-Darling     3.7393                3.9074 (1%), 3.2892 (2%)   Only accepted at the 1% level of significance; rejected at 2%, 5%, 10% etc.
Chi-Squared          15.19       0.0188    16.812 (1%), 15.033 (2%)   Only accepted at the 1% level of significance; rejected at 2%, 5%, 10% etc.
7.2.11 Generalized Extreme Value Distribution
Table (7.2.11a)
Parameters   Values
Shape k      0.27275
Scale σ      1.1212E+5
Location μ   2.2895E+5
Table (7.2.11b)
Figure: histogram of the data with the fitted Generalized Extreme Value density (f(x) against x) and Q-Q plot (Quantile (Model) against x).
Test                 Statistic   P value   Critical value
Kolmogorov-Smirnov   0.07407     0.69163   0.14274
Anderson-Darling     0.58447               2.5018
Chi-Squared          2.3871      0.88088   12.592
7.2.12 Frechet Distribution
Table (7.2.12a)
Parameters   Values
Shape α      2.3386
Scale β      2.3344E+5
Location γ   11395.0
Table (7.2.12b)
Figure: histogram of the data with the fitted Frechet (3P) density (f(x) against x) and Q-Q plot (Quantile (Model) against x).
Test                 Statistic   P value   Critical value
Kolmogorov-Smirnov   0.05098     0.9556    0.1427
Anderson-Darling     0.33501               2.5018
Chi-Squared          4.3167      0.7184    12.59
Conclusion
The log-normal and inverse Gaussian distributions are found to be a good fit for the Marala
site. For the log-normal distribution, the chi-square GOF test rejects the fit only at the 20%
level of significance and accepts it at levels below 20%. The Rayleigh and Weibull
distributions are accepted only at the one percent level of significance. The generalized
extreme value, inverse Gaussian and Frechet distributions seem to be a good fit to the Marala
site.
7.3
Statistic                  Values       Percentiles    Values
Size                       88           Min            22205
Range                      553800       5%             28450.0
Mean                       89196.0      10%            32609.0
Variance                   7639100000   25% (Q1)       45555
Standard deviation         87402.0      50% (Median)   60000
Coefficient of Variation   0.97988      75% (Q3)       97875.0
Standard error             9317.1       90%            182440
Skewness                   3.8177       95%            213750
Excess Kurtosis            17.989       Max            576000
7.3.1 Exponential Distribution
Parameters   Values
Rate λ       1.1211E-5
Figure: histogram of the data with the fitted Exponential density (f(x) against x) and Q-Q plot (Quantile (Model) against x).
Test                 Statistic   P value     Critical value
Kolmogorov-Smirnov   0.23883     6.6876E-5   0.14274
Anderson-Darling     7.2808                  2.5018
Chi-Squared          53.271      1.0346E-9   12.592
7.3.2 Gamma Distribution
Table (7.3.2a)
Parameters   Values
Shape α      1.0415
Scale β      85644.0
Table (7.3.2b)
Figure: histogram of the data with the fitted Gamma density (f(x) against x) and Q-Q plot (Quantile (Model) against x).
Test                 Statistic   P value     Critical value   Status
Kolmogorov-Smirnov   0.23068     1.3389E-4   0.14274          Reject at different levels of significance
Anderson-Darling     6.8804                  2.5018           Reject at different levels of significance
Chi-Squared          50.92       3.0741E-9   12.592           Reject at different levels of significance
7.3.3 Normal Distribution
Table (7.3.3a)
Parameters   Values
σ            87402.0
μ            89196.0
Table (7.3.3b)
Figure: histogram of the data with the fitted Normal density (f(x) against x) and Q-Q plot (Quantile (Model) against x).
Test                 Statistic   P value     Critical value   Status
Kolmogorov-Smirnov   0.24227     4.9499E-5   0.14274          Reject at different levels of significance
Anderson-Darling     9.6237                  2.5018           Reject at different levels of significance
Chi-Squared          48.39       2.9566E-9   11.07            Reject at different levels of significance
7.3.4 Log-Normal Distribution
Table (7.3.4a)
Parameters   Values
σ            0.63479
μ            11.151
Table (7.3.4b)
Figure: histogram of the data with the fitted Lognormal density (f(x) against x) and Q-Q plot (Quantile (Model) against x).
Test                 Statistic   P value   Critical value          Status
Kolmogorov-Smirnov   0.12515     0.11646   0.11248 (20%), 0.1285   Only rejected at the 20% level of significance and accepted at the 1%, 2%, 5% and 10% levels of significance
Anderson-Darling     1.5027                1.9286 (20%)            Only rejected at the 20% level of significance and accepted at the 1%, 2%, 5% and 10% levels of significance
Chi-Squared          19.857      0.00294   12.592                  Rejected at different levels of significance
7.3.5 Logistic Distribution
Table (7.3.5a)
Parameters   Values
σ            48187.0
μ            89196.0
Table (7.3.5b)
[Figure: Q-Q plot (Quantile (Model) against x) and histogram with fitted Logistic density f(x).]
Test | Test Statistic | P value | Critical value | Status
Kolmogorov-Smirnov | 0.24277 | 4.7379E-? (exponent not legible) | 0.14274 | Rejected (statistic exceeds the critical value)
Anderson-Darling | 8.7112 | - | 2.5018 | Rejected (statistic exceeds the critical value)
Chi-Squared | 40.643 | 3.4048E-7 | 12.592 | Rejected (statistic exceeds the critical value)
7.3.6 Weibull Distribution
Table (7.3.6a)
Parameters | Values
α | 1.8529
β | 91883.0
Table (7.3.6b)
[Figure: Q-Q plot (Quantile (Model) against x) and histogram with fitted Weibull density f(x).]
Test | Test Statistic | P value | Critical value | Status
Kolmogorov-Smirnov | 0.17122 | 0.01003 | 0.17126 (1%), 0.159 (2%) | Rejected at the 2% level of significance; accepted at the 1% level
Anderson-Darling | 4.8786 | - | 2.5018 | Rejected at different levels of significance
Chi-Squared | 16.301 | 0.00604 | 11.07 | Rejected at different levels of significance
7.3.7 Inverse Gaussian Distribution
Table (7.3.7a)
Parameters | Values
λ | 92896.0
μ | 89196.0
Table (7.3.7b)
[Figure: Q-Q plot (Quantile (Model) against x) and (a) histogram of the Inverse Gaussian distribution with fitted density f(x).]
Table (7.3.7c)
Test | Test Statistic | P value | Critical value | Status
Kolmogorov-Smirnov | 0.18289 | 0.00477 | 0.14274 | Rejected at different levels of significance
Anderson-Darling | 3.7881 | - | 3.9074 (1%), 3.2892 (2%) | Rejected at the 2% level of significance; accepted at the 1% level
Chi-Squared | 22.227 | 0.0011 | 12.592 | Rejected at different levels of significance
7.3.8 Rayleigh Distribution
Table (7.3.8a)
Parameters | Values
σ | 71168.0
Table (7.3.8b)
[Figure: Q-Q plot (Quantile (Model) against x) and histogram with fitted Rayleigh density f(x).]
Test | Test Statistic | P value | Critical value | Status
Kolmogorov-Smirnov | 0.23978 | 6.1559E-? (exponent not legible) | 0.14274 | Rejected (statistic exceeds the critical value)
Anderson-Darling | 8.0591 | - | 2.5018 | Rejected (statistic exceeds the critical value)
Chi-Squared | 29.91 | 4.0891E-5 | 12.592 | Rejected (statistic exceeds the critical value)
7.3.9 Generalized Extreme Value Distribution
Table (7.3.9a)
Parameters | Values
k | 0.45104
σ | 26675.0
μ | 52585.0
Table (7.3.9b)
[Figure: Q-Q plot (Quantile (Model) against x) and histogram with fitted Generalized Extreme Value density f(x).]
Test | Test Statistic | P value | Critical value | Status
Kolmogorov-Smirnov | 0.06397 | 0.8413 | 0.14274 | Accepted (statistic below the critical value)
Anderson-Darling | 0.38742 | - | 2.5018 | Accepted (statistic below the critical value)
Chi-Squared | 6.0328 | 0.41953 | 12.592 | Accepted (statistic below the critical value)
7.3.10 Frechet Distribution
Table (7.3.10a)
Parameters | Values
α | 2.1416
β | 57449.0
γ | 4775.4
Table (7.3.10b)
[Figure: Q-Q plot (Quantile (Model) against x) and histogram with fitted Frechet (3P) density f(x).]
Test | Test Statistic | P value | Critical value | Status
Kolmogorov-Smirnov | 0.06598 | 0.75404 | 0.1427 | Accepted (statistic below the critical value)
Anderson-Darling | 0.38143 | - | 2.5018 | Accepted (statistic below the critical value)
Chi-Squared | 4.8019 | 0.5497 | 12.592 | Accepted (statistic below the critical value)
Conclusion
For the log-normal and Inverse Gaussian distributions the conclusion depends on the level of significance: the Anderson-Darling and Kolmogorov-Smirnov goodness-of-fit tests are significant only at the 20% level of significance, while the chi-square test rejects these distributions at the different levels of significance. The GEV and Frechet distributions are good fits to the data.
7.4 Mangla
Table (7.4.1)
Statistic | Values
Size | 89
Range | 1067600
Mean | 234200
Variance | 43733000000

Percentiles | Values
Min | 22400
5% | 76400.0
10% | 88190
25% (Q1) | 117230
50% (Median) | 157000
75% (Q3) | 274350
90% | 440680
95% | 760250
Max | 1090000

7.4.1 Exponential Distribution
Table (7.4.1 a)
Parameters | Values
λ | 4.2699E-6
Table (7.4.1b)
[Figure: probability density function (histogram with fitted Exponential density f(x)) and Q-Q plot (Quantile (Model) against x).]
Test | Test Statistic | P value | Critical value | Status
Kolmogorov-Smirnov | 0.24434 | 3.6524E-5 | 0.14195 | Rejected at different levels of significance
Anderson-Darling | 6.1768 | - | 2.5018 | Rejected at different levels of significance
Chi-Squared | 34.051 | 2.3256E-6 | 11.07 | Rejected at different levels of significance
7.4.2 Gamma Distribution
MLE = 1.0e+05=1.1245
Parameters | Values
4.2699E-6
1.8673E+5
[Figure: histogram with fitted Gamma density f(x) and Q-Q plot (Quantile (Model) against x).]
Test | Test Statistic | P value | Critical value | Status
Kolmogorov-Smirnov | 0.19605 | 0.0018 | 0.14195 | Rejected at different levels of significance
Anderson-Darling | 4.2518 | - | 2.5018 | Rejected at different levels of significance
Chi-Squared | 22.805 | 3.6774E-4 | 11.07 | Rejected at different levels of significance
7.4.3 Normal Distribution
Parameters | Values
σ | 2.0912E+5
μ | 2.3420E+5
MLE: μ = 2.3420E+5, σ = 2.0795E+5
[Figure: histogram with fitted Normal density f(x) and Q-Q plot (Quantile (Model) against x).]
Test | Test Statistic | P value | Critical value | Status
Kolmogorov-Smirnov | 0.2252 | 1.9015E-4 | 0.14195 | Rejected at different levels of significance
Anderson-Darling | 8.5233 | - | 2.5018 | Rejected at different levels of significance
Chi-Squared | 45.1 | 1.3845E-8 | 11.07 | Rejected at different levels of significance
7.4.4 Log-Normal Distribution
Parameters | Values
σ | 0.68641
μ | 12.105
MLE: μ = 12.1051, σ = 0.6903
[Figure: histogram with fitted Lognormal density f(x) and Q-Q plot (Quantile (Model) against x).]
Test | Test Statistic | P value | Critical value | Status
Kolmogorov-Smirnov | 0.08973 | 0.44483 | 0.14195 | Accepted at different levels of significance
Anderson-Darling | 0.93497 | - | 2.5018 | Accepted at different levels of significance
Chi-Squared | 11.882 | 0.06466 | 12.592 | Accepted at different levels of significance
7.4.5 Logistic Distribution
Parameters | Values
σ | 1.1530E+5
μ | 2.3420E+5
1.9489 0.8898
[Figure: histogram with fitted Logistic density f(x) and Q-Q plot (Quantile (Model) against x).]
Test | Test Statistic | P value | Critical value | Status
Kolmogorov-Smirnov | 0.22296 | 2.2843E-4 | 0.14195 | Rejected at different levels of significance
Anderson-Darling | 7.5814 | - | 2.5018 | Rejected at different levels of significance
Chi-Squared | 40.742 | 1.0578E-7 | 11.07 | Rejected at different levels of significance
7.4.6 Weibull Distribution
Parameters | Values
α | 1.7178
β | 2.4515E+5
MLE 2.5814E+5
[Figure: histogram with fitted Weibull density f(x) and Q-Q plot (Quantile (Model) against x).]
Test | Test Statistic | P value | Critical value | Status
Kolmogorov-Smirnov | 0.13959 | 0.0564 | 0.11186 (20%), 0.12779 (10%), 0.14195 (5%), 0.15873 (2%), 0.17031 (1%) | Rejected at the 10% and 20% levels of significance; accepted at the 1%, 2% and 5% levels
Anderson-Darling | 3.5537 | - | 2.5018 | Rejected at different levels of significance
Chi-Squared | 15.287 | 0.01814 | 15.033 (2%), 16.812 (1%) | Only accepted at the 1% level of significance
7.4.7 Inverse Gaussian Distribution
Parameters | Values
λ | 2.9373E+5
μ | 2.3420E+5
[Figure: histogram with fitted Inverse Gaussian density f(x) and Q-Q plot (Quantile (Model) against x).]
Test | Test Statistic | P value | Critical value | Status
Kolmogorov-Smirnov | 0.12089 | 0.1365 | - | Accepted at the 5%, 2% and 1% levels of significance
Anderson-Darling | 1.7242 | - | 1.3749 (20%), 1.9286 | Only rejected at the 20% level; accepted at the 5%, 2% and 1% levels of significance
Chi-Squared | 4.7111 | 0.45214 | 11.07 (5%) | Accepted at different levels of significance
7.4.8 Rayleigh Distribution
Parameters | Values
σ | 1.8686E+5
MLE = 2.2146e+05
[Figure: histogram with fitted Rayleigh density f(x) and Q-Q plot (Quantile (Model) against x).]
Test | Test Statistic | P value | Critical value | Status
Kolmogorov-Smirnov | 0.21416 | 4.6195E-4 | 0.14195 | Rejected at different levels of significance
Anderson-Darling | 7.3189 | - | 2.5018 | Rejected at different levels of significance
Chi-Squared | 15.801 | 0.00744 | 11.07 | Rejected at different levels of significance
7.4.9 Generalized Extreme Value Distribution
Parameters | Values
k | 0.3965
σ | 78946.0
μ | 1.3838E+5
[Figure: histogram with fitted Generalized Extreme Value density f(x) and Q-Q plot (Quantile (Model) against x).]
Test | Test Statistic | P value | Critical value | Status
Kolmogorov-Smirnov | 0.05547 | 0.93281 | 0.17031 (1%) | Accepted at different levels of significance
Anderson-Darling | 0.36255 | - | 1.3749 (20%), 1.9286 (10%), 2.5018 (5%), 3.2892 (2%), 3.9074 (1%) | Accepted at different levels of significance
Chi-Squared | 0.61399 | 0.99616 | 8.5581 (20%), 10.645 (10%), 12.592 (5%), 15.033 (2%), 16.812 (1%) | Accepted at different levels of significance
7.4.10 Frechet Distribution
Parameters | Values
α | 1.3727
β | 1.0992E+5
γ | 21478.0
[Figure: histogram with fitted Frechet density f(x) and Q-Q plot (Quantile (Model) against x).]
Test | Test Statistic | P value | Critical value | Status
Kolmogorov-Smirnov | 0.08345 | 0.737 | 0.1419 | Accepted at different levels of significance
Anderson-Darling | 5.0444 | - | 2.5018 | Accepted at different levels of significance
Chi-Squared | 3.4969 | 0.7444 | 12.592 | Accepted at different levels of significance
Conclusion
The log-normal distribution seems to be a good fit for the Mangla site. For the Inverse Gaussian distribution the goodness-of-fit tests reject only at the 20% level of significance, and the chi-square test accepts it at all levels of significance, so the Inverse Gaussian distribution is also a good fit to the data. The GEV and Frechet distributions are good fits for the Mangla site as well.
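The Status entries in the tables above come from comparing one test statistic with the critical values at several levels of significance. The following is a minimal Python sketch of that comparison for the Kolmogorov-Smirnov test. It assumes the large-sample approximation sqrt(-0.5 ln(α/2)) / sqrt(n) for the critical value, so its critical values can differ slightly from the tabulated ones; the function names are illustrative only.

```python
import math

# Significance levels that appear in the status columns of this chapter.
ALPHAS = (0.20, 0.10, 0.05, 0.02, 0.01)


def ks_critical_value(alpha, n):
    """Large-sample critical value of the two-sided Kolmogorov-Smirnov statistic."""
    return math.sqrt(-0.5 * math.log(alpha / 2.0)) / math.sqrt(n)


def ks_decisions(statistic, n, alphas=ALPHAS):
    """Map each significance level to True when the fitted distribution is rejected."""
    return {alpha: statistic > ks_critical_value(alpha, n) for alpha in alphas}


if __name__ == "__main__":
    # Weibull fit for the Mangla site: statistic 0.13959, sample size 89.
    for alpha, rejected in ks_decisions(0.13959, 89).items():
        print(f"alpha = {alpha:.2f}: {'rejected' if rejected else 'accepted'}")
```

With the Weibull statistic for Mangla this approximation leads to the same decisions as the table: rejection at the 20% and 10% levels and acceptance at the 5%, 2% and 1% levels.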
7.5 Kohat
Statistic | Values
Size | 36
Range | 94.89
Mean | 143.74
Variance | 494.72
Standard deviation | 22.242
Coefficient of Variation | 0.15474
Standard error | 3.7071
Skewness | 1.0118
Excess Kurtosis | 0.85693

Percentiles | Values
Min | 115.45
5%, 10%, 25% (Q1), 50% (Median), 75% (Q3), 90%, 95% | 116.53, 126.57, 139.5, 155.25, 175.86, 188.38 (six values are given for these seven percentiles)
Max | 210.34

7.5.1 Exponential Distribution
Parameters | Values
λ | 0.00696
[Figure: histogram with fitted Exponential density f(x) and Q-Q plot (Quantile (Model) against x).]
Test | Test Statistic | P value | Critical value | Status
Kolmogorov-Smirnov | 0.55209 | 2.0907E-10 | 0.22119 | Rejected at different levels of significance
Anderson-Darling | 12.19 | - | 2.5018 | Rejected at different levels of significance
Chi-Squared | 81.348 | 1.1102E-16 | 9.4877 | Rejected at different levels of significance
7.5.2 Gamma Distribution
Parameters | Values
α | 41.766
β | 3.4417
[Figure: histogram with fitted Gamma density f(x) and Q-Q plot (Quantile (Model) against x).]
Test | Test Statistic | P value | Critical value | Status
Kolmogorov-Smirnov | 0.13474 | 0.48888 | 0.22119 | Accepted at different levels of significance
Anderson-Darling | 0.61054 | - | 2.5018 | Accepted at different levels of significance
Chi-Squared | 0.72191 | 0.9486 | 9.4877 | Accepted at different levels of significance
7.5.3 Normal Distribution
Parameters | Values
μ | 143.74
σ | 22.242
[Figure: Q-Q plot (Quantile (Model) against x) and histogram with fitted Normal density f(x).]
Test | Test Statistic | P value | Critical value | Status
Kolmogorov-Smirnov | 0.15494 | 0.31941 | 0.22119 | Accepted at different levels of significance
Anderson-Darling | 0.8509 | - | 2.5018 | Accepted at different levels of significance
Chi-Squared | 4.9146 | 0.29618 | 9.4877 | Accepted at different levels of significance
7.5.4 Log-Normal Distribution
Parameters | Values
μ | 4.9572
σ | 0.14529
[Figure: Q-Q plot (Quantile (Model) against x) and histogram with fitted Lognormal (3P) density f(x).]
Test | Test Statistic | P value | Critical value | Status
Kolmogorov-Smirnov | 0.085523 | 0.93628 | 0.22119 | Accepted at different levels of significance
Anderson-Darling | 0.28257 | - | 2.5018 | Accepted at different levels of significance
Chi-Squared | 1.8966 | 0.59413 | 7.8147 | Accepted at different levels of significance
7.5.5 Logistic Distribution
Parameters | Values
σ | 12.265
μ | 143.74
[Figure: Q-Q plot (Quantile (Model) against x) and histogram with fitted Logistic density f(x).]
Test | Test Statistic | P value | Critical value | Status
Kolmogorov-Smirnov | 0.0852 | 0.93628 | 0.22119 | Accepted at different levels of significance
Anderson-Darling | 0.28257 | - | 2.5018 | Accepted at different levels of significance
Chi-Squared | 1.8966 | 0.59413 | 7.8147 | Accepted at different levels of significance
7.5.6 Nakagami Distribution
Parameters | Values
m | 9.4017
Ω | 21143.0
[Figure: Q-Q plot (Quantile (Model) against x) and histogram with fitted Nakagami density f(x).]
Test | Test Statistic | P value | Critical value | Status
Kolmogorov-Smirnov | 0.14202 | 0.42288 | 0.22119 | Accepted at different levels of significance
Anderson-Darling | 0.7219 | - | 2.5018 | Accepted at different levels of significance
Chi-Squared | 4.7522 | 0.31368 | 9.4877 | Accepted at different levels of significance
7.5.7 Weibull Distribution
Parameters | Values
α | 8.0821
β | 150.33
[Figure: Q-Q plot (Quantile (Model) against x) and histogram with fitted Weibull density f(x).]
Test | Test Statistic | P value | Critical value | Status
Kolmogorov-Smirnov | 0.14942 | 0.36137 | 0.22119 | Accepted at different levels of significance
Anderson-Darling | 1.6686 | - | 1.3749 | Accepted at different levels of significance
Chi-Squared | 3.27 | 0.35184 | 7.8147 | Accepted at different levels of significance
7.5.8 Inverse Gaussian Distribution
Parameters | Values
λ | 6003.5
μ | 143.74
[Figure: probability density function (histogram with fitted Inverse Gaussian density f(x)) and Q-Q plot (Quantile (Model) against x).]
Test | Test Statistic | P value | Critical value | Status
Kolmogorov-Smirnov | 0.15524 | 0.31724 | 0.22119 | Accepted at different levels of significance
Anderson-Darling | 0.73821 | - | 2.5018 | Accepted at different levels of significance
Chi-Squared | 3.6163 | 0.46042 | 9.4877 | Accepted at different levels of significance
7.5.9 Rayleigh Distribution
Parameters | Values
σ | 114.69
[Figure: histogram with fitted Rayleigh density f(x) and Q-Q plot (Quantile (Model) against x).]
Test | Test Statistic | P value | Critical value | Status
Kolmogorov-Smirnov | 0.39748 | 1.2129E-5 | 0.22119 | Rejected at different levels of significance
Anderson-Darling | 7.1506 | - | 2.5018 | Rejected at different levels of significance
Chi-Squared | 33.736 | 2.2525E-7 | 7.8147 | Rejected at different levels of significance
7.5.10
Parameters | Values
16.71
133.0
Test | Test Statistic | P value | Critical value | Status
Kolmogorov-Smirnov | 0.07291 | 0.98347 | 0.22119 | Accepted at different levels of significance
Anderson-Darling | 0.2401 | - | 2.5018 | Accepted at different levels of significance
Chi-Squared | 0.21649 | 0.99455 | 9.4877 | Accepted at different levels of significance
7.5.11 Frechet Distribution
Parameters | Values
α | 7.0341
β | 107.41
γ | 25.242
[Figure: histogram with fitted Frechet (3P) density f(x) and Q-Q plot (Quantile (Model) against x).]
Test | Test Statistic | P value | Critical value | Status
Kolmogorov-Smirnov | 0.08548 | 0.9348 | 0.2212 | Accepted at different levels of significance
Anderson-Darling | 0.2981 | - | 2.5018 | Accepted at different levels of significance
Chi-Squared | 1.6054 | 0.6581 | 7.8147 | Accepted at different levels of significance
Conclusion
The Gamma distribution gives insignificant results on all three goodness-of-fit tests (Anderson-Darling, Kolmogorov-Smirnov and chi-square) and is therefore concluded to be a good fit for the rainfall data of the Kohat site. The Normal, log-normal, logistic, Nakagami and Inverse Gaussian distributions also support the null hypothesis in all three goodness-of-fit tests.
7.6 Muzaffarabad
Statistic | Values
Size | 56
Range | 98.59
Mean | 126.98
Variance | 470.81
Standard deviation | 21.698
Coefficient of Variation | 0.17088
Standard error | 2.8995
Skewness | 0.24919
Excess Kurtosis | -0.25919

Percentiles | Values
Min | 80.84
5% | 90.476
10% | 99.366
25% (Q1) | 111.61
50% (Median) | 124.37
75% (Q3) | 143.27
90% | 152.87
95% | 168.92
Max | 179.43

7.6.1 Exponential Distribution
Parameters | Values
λ | 0.00788
[Figure: histogram with fitted Exponential density f(x) and Q-Q plot (Quantile (Model) against x).]
Test | Test Statistic | P value | Critical value | Status
Kolmogorov-Smirnov | 0.47721 | 6.9056E-12 | 0.17823 | Rejected at different levels of significance
Anderson-Darling | 17.902 | - | 2.5018 | Rejected at different levels of significance
Chi-Squared | 117.14 | (not given) | 7.8147 | Rejected at different levels of significance
7.6.2 Gamma Distribution
Parameters | Values
α | 34.247
β | 3.7077
[Figure: histogram with fitted Gamma density f(x) and Q-Q plot (Quantile (Model) against x).]
Test | Test Statistic | P value | Critical value | Status
Kolmogorov-Smirnov | 0.07178 | 0.9152 | 0.17823 | Accepted at different levels of significance
Anderson-Darling | 0.27545 | - | 2.5018 | Accepted at different levels of significance
Chi-Squared | 3.103 | 0.68411 | 11.07 | Accepted at different levels of significance
7.6.3 Normal Distribution
Parameters | Values
σ | 21.698
μ | 126.98
[Figure: histogram with fitted Normal density f(x) and Q-Q plot (Quantile (Model) against x).]
Test | Test Statistic | P value | Critical value | Status
Kolmogorov-Smirnov | 0.07918 | 0.8466 | 0.17823 | Accepted at different levels of significance
Anderson-Darling | 0.36388 | - | 2.5018 | Accepted at different levels of significance
Chi-Squared | 3.7319 | 0.4435 | 9.4877 | Accepted at different levels of significance
7.6.4 Log-Normal Distribution
Parameters | Values
σ | 0.17118
μ | 4.8295
[Figure: histogram with fitted Lognormal density f(x) and Q-Q plot (Quantile (Model) against x).]
Test | Test Statistic | P value | Critical value | Status
Kolmogorov-Smirnov | 0.07666 | 0.87204 | 0.17823 | Accepted at different levels of significance
Anderson-Darling | 0.2812 | - | 2.5018 | Accepted at different levels of significance
Chi-Squared | 2.5752 | 0.76513 | 11.07 | Accepted at different levels of significance
7.6.5 Logistic Distribution
Parameters | Values
μ | 126.98
[Figure: histogram with fitted Logistic density f(x) and Q-Q plot (Quantile (Model) against x).]
Test | Test Statistic | P value | Critical value | Status
Kolmogorov-Smirnov | 0.10072 | 0.5856 | 0.17823 | Accepted at different levels of significance
Anderson-Darling | 0.55964 | - | 2.5018 | Accepted at different levels of significance
Chi-Squared | 5.4454 | 0.24457 | 9.4877 | Accepted at different levels of significance
7.6.6 Weibull Distribution
Parameters | Values
α | 6.9575
β | 134.57
[Figure: histogram with fitted Weibull density f(x) and Q-Q plot (Quantile (Model) against x).]
Test | Test Statistic | P value | Critical value | Status
Kolmogorov-Smirnov | 0.09036 | 0.71625 | 0.17823 | Accepted at different levels of significance
Anderson-Darling | 0.77165 | - | 2.5018 | Accepted at different levels of significance
Chi-Squared | 6.3995 | 0.17123 | 9.4877 | Accepted at different levels of significance
7.6.7 Inverse Gaussian Distribution
Parameters | Values
λ | 4348.7
μ | 126.98
[Figure: histogram with fitted Inverse Gaussian density f(x) and Q-Q plot (Quantile (Model) against x).]
Test | Test Statistic | P value | Critical value | Status
Kolmogorov-Smirnov | 0.08824 | 0.74246 | 0.17823 | Accepted at different levels of significance
Anderson-Darling | 0.42201 | - | 2.5018 | Accepted at different levels of significance
Chi-Squared | 5.2269 | 0.38882 | 11.07 | Accepted at different levels of significance
7.6.8 Rayleigh Distribution
Parameters | Values
σ | 101.32
[Figure: histogram with fitted Rayleigh density f(x) and Q-Q plot (Quantile (Model) against x).]
Test | Test Statistic | P value | Critical value | Status
Kolmogorov-Smirnov | 0.30846 | 3.1424E-5 | 0.17823 | Rejected at different levels of significance
Anderson-Darling | 9.6572 | - | 2.5018 | Rejected at different levels of significance
Chi-Squared | 49.919 | 8.3130E-11 | 7.8147 | Rejected at different levels of significance
7.6.9 Generalized Extreme Value Distribution
Parameters | Values
k | 0.18291
σ | 20.578
μ | 118.3
[Figure: histogram with fitted Generalized Extreme Value density f(x) and Q-Q plot (Quantile (Model) against x).]
Test | Test Statistic | P value | Critical value | Status
Kolmogorov-Smirnov | 0.06799 | 0.94239 | 0.17823 | Accepted at different levels of significance
Anderson-Darling | 0.2588 | - | 2.5018 | Accepted at different levels of significance
Chi-Squared | 3.0613 | 0.69054 | 11.07 | Accepted at different levels of significance
7.6.10 Frechet Distribution
Parameters | Values
α | 7.0341
β | 107.41
γ | 25.242
[Figure: histogram with fitted Frechet density f(x) and Q-Q plot (Quantile (Model) against x).]
Test | Test Statistic | P value | Critical value | Status
Kolmogorov-Smirnov | 0.1117 | 0.4537 | 0.1782 | Accepted at different levels of significance
Anderson-Darling | 1.188 | - | 2.5018 | Accepted at different levels of significance
Chi-Squared | 2.402 | 0.8912 | 11.07 | Accepted at different levels of significance
Conclusion
For the Gamma distribution the goodness-of-fit tests are in favour of the null hypothesis, so it seems to be a good fit for the Muzaffarabad site. The Normal and log-normal distributions also fit this data, since the tests do not reject them. The Generalized Extreme Value, Inverse Gaussian and Weibull distributions are also found to be good fits for the Muzaffarabad rainfall station.
Overall Conclusion
Thirteen distributions were selected for the analysis of the average annual rainfall and the annual flood peak data. Among these distributions, some are found to be appropriate for the rainfall data and some for the flood peaks. Three distributions, the Inverse Gaussian, Generalized Extreme Value and Frechet distributions, are appropriate for the flood peak data, while the Gamma, Inverse Gaussian, log-normal and Weibull distributions are found to be good for the rainfall data. The Inverse Gaussian distribution can be used for both the rainfall and the flood data.
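Each goodness-of-fit table in this chapter reports a Kolmogorov-Smirnov statistic for a fitted distribution. The sketch below shows, under stated assumptions, how such a statistic can be computed and used to compare two candidate distributions. The data are synthetic (drawn from a log-normal distribution with made-up parameters), the fitting rules are the simple closed-form ones, and all function names are illustrative; it does not reproduce the software output reported in the thesis.

```python
import math
import random


def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF of `sample` and the model `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def fit_exponential(sample):
    """Exponential CDF with rate 1 / sample mean."""
    rate = len(sample) / sum(sample)
    return lambda x: 1.0 - math.exp(-rate * x)


def fit_lognormal(sample):
    """Log-normal CDF with mu, sigma taken as mean and s.d. of the logs."""
    logs = [math.log(x) for x in sample]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
    return lambda x: 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


random.seed(0)
# Synthetic stand-in for annual flood peaks (hypothetical parameters).
synthetic_peaks = [random.lognormvariate(12.1, 0.69) for _ in range(200)]

d_exponential = ks_statistic(synthetic_peaks, fit_exponential(synthetic_peaks))
d_lognormal = ks_statistic(synthetic_peaks, fit_lognormal(synthetic_peaks))

if __name__ == "__main__":
    print(f"KS statistic, exponential fit: {d_exponential:.4f}")
    print(f"KS statistic, log-normal fit:  {d_lognormal:.4f}")
```

The smaller statistic indicates the closer fit; the decision itself still requires a comparison with the critical value for the chosen level of significance.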
Chapter 8
8.1
Introduction
In Section 3 we estimated the worst rainfall and flood to be expected after n years. On the basis of these estimates one can make decisions to guard against such disasters: raising the levee along the river, renovating dams, constructing buildings and providing emergency protection can all be planned on the basis of this estimation.
8.2
Let us consider the governmental decision as to whether or not to fund building a levee, or raising it to a higher level, along a river which is prone to annual flooding. There are three options of building or raising the levee by an amount \(\eta_i\) at a total cost of $\(C_i\), where
\[ 0 < \eta_1 < \eta_2 < \eta_3 \quad\text{and}\quad C_1 < C_2 < C_3 . \]
The annual maximum flood heights for n years in the future are represented by the random variables \(\{X_j\}_{j=1}^{n}\), where \(X_j\) is the maximum height of the flood in the jth year. For the ith selection, \(\eta_i\) is the flood level the levee can control, thereby preventing damage to farm land and residences. During the jth year the damage that will be sustained is a function of the excess
\[ (X_j-\eta_i)^{+} = \begin{cases} X_j-\eta_i, & X_j > \eta_i \\ 0, & X_j \le \eta_i \end{cases} \tag{8.2.1} \]
and of the number of hectares inundated, which for simplicity we assume to be proportional to the water excess over the levee:
\[ \text{Hectares inundated} = a\,(X_j-\eta_i)^{+} . \]
However, there is an annual pay-off to the populace and the government when no homes have been destroyed and there has been no flood damage, so that harvesting is accomplished. The value when no damage occurs, again assuming proportionality for simplicity, is $h per hectare. Here h is the average monetary crop value, considering the variety of crops of different worth. This gives the loss
\[ h\,a\,(X_j-\eta_i)^{+} . \]
This loss must be measured against the cost of preventing flooding, namely $\(C_i\). For the designated levee height \(\eta_i\) we have
\[ h\,a\sum_{j=1}^{n}(X_j-\eta_i)^{+} \quad\text{against}\quad C_i . \tag{8.2.2} \]
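Assumption I
Let the distribution of the spring floods \(\{X_j\}_{j=1}^{n}\) be stationary, so that it does not change over time. In this case we write X ~ F, and then we must compare values to see if the value of the crops and homes saved exceeds the amortized cost, namely,
\[ E(X-\eta_i)^{+} > \frac{C_i}{n\,h\,a}, \quad i = 1, 2, 3. \tag{8.2.3} \]
Writing \(\bar F = 1 - F\) and taking \(X \ge 0\), the expected excess over a levee height \(\eta\) is
\[ \begin{aligned}
E(X-\eta)^{+} &= \int_{\eta}^{\infty} (x-\eta)\,dF(x) \\
&= \int_{\eta}^{\infty} x\,dF(x) - \eta\int_{\eta}^{\infty} dF(x) \\
&= E(X) - \int_{0}^{\eta} x\,dF(x) - \eta\,\bar F(\eta) \\
&= E(X) - \int_{0}^{\eta} \bar F(x)\,dx .
\end{aligned} \tag{8.2.4} \]
A closed form for equation (8.2.4) is difficult to obtain for distributions which can truly reflect the annual flood distribution, and one must often resort to numerical integration for its evaluation. If no flood exceeds the levee for a number of years, then we have a positive return to the state economy when, for some period of n years,
\[ E(X) > \frac{C_i}{n\,a\,h}, \quad i = 1, 2, 3, \tag{8.2.5} \]
and
\[ \max_{1\le j\le n} X_j \le \eta_i, \quad i = 1, 2, 3. \]
Let \(Y_n = \max_{1\le j\le n} X_j\). Since, for independent years with common distribution F,
\[ P(Y_n \le \eta_i) = F^{n}(\eta_i), \]
this is the probability that the return on the construction of the levee will be positive. Thus, based on records of flood heights over the preceding m years, we can estimate the parameters of a chosen representative distribution, such as perhaps the Pareto distribution, first used by the Italian economist Vilfredo Pareto, which can reflect a lot of variation.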
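Since the text notes that the expected excess usually has to be evaluated by numerical integration, here is a minimal Python sketch of that step. It uses the tail form E(X - η)^+ = ∫ from η to ∞ of (1 - F(x)) dx, which for a non-negative X is equivalent to (8.2.4), and it takes a Pareto survival function as the example. The parameter values, levee height and number of years are hypothetical, and the function names are illustrative only.

```python
import math


def pareto_survival(x, alpha, k):
    """P(X > x) for a Pareto distribution with shape alpha and scale k."""
    return 1.0 if x < k else (k / x) ** alpha


def expected_excess(survival, eta, steps=20000):
    """Approximate E(X - eta)^+ as the integral of survival(x) over (eta, infinity).

    The substitution x = eta / u maps the range to 0 < u <= 1, and the
    midpoint rule is applied on that interval.
    """
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        total += survival(eta / u) * eta / (u * u)
    return total * h


def prob_no_exceedance(survival, eta, n_years):
    """P(Y_n <= eta) = F(eta)^n for independent, identically distributed years."""
    return (1.0 - survival(eta)) ** n_years


# Hypothetical values, chosen only to make the example concrete.
ALPHA, K = 3.0, 100.0
ETA, N_YEARS = 150.0, 10


def survival(x):
    return pareto_survival(x, ALPHA, K)


excess = expected_excess(survival, ETA)
no_flood_prob = prob_no_exceedance(survival, ETA, N_YEARS)

if __name__ == "__main__":
    print(f"E(X - eta)^+ ~ {excess:.4f}")
    print(f"P(no exceedance in {N_YEARS} years) ~ {no_flood_prob:.4f}")
```

For the Pareto case the integral has the closed form k^α η^(1-α) / (α - 1) when η ≥ k and α > 1, which gives a convenient check on the numerical routine before it is applied to a distribution without a closed form.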
8.3
Pareto Distribution
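The Pareto distribution is widely used in the literature as a model for the distribution of income and for the population of cities within specific years. A random variable X is said to have a Pareto distribution if its density is
\[ f(x) = \frac{\alpha k^{\alpha}}{x^{\alpha+1}}; \quad x \ge k; \quad \alpha, k > 0. \tag{8.3.1} \]
The distribution function and the survival function are
\[ F(x) = 1 - \left(\frac{k}{x}\right)^{\alpha}, \qquad \bar F(x) = e^{-\alpha \ln (x/k)}, \tag{8.3.2} \]
so that
\[ \ln \bar F(x) = -\alpha \ln\frac{x}{k} . \tag{8.3.3} \]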
8.4
The estimates of \(\alpha\) and \(k\) by the ML method are those values which maximize the likelihood function \(L(x, \alpha, k)\):
\[ L(x,\alpha,k) = \prod_{i=1}^{n} \frac{\alpha k^{\alpha}}{x_i^{\alpha+1}} . \tag{8.4.4} \]
Calculus methods are mostly used for maximizing functions. Here, however, no calculus is needed to observe that the likelihood function increases as k increases. Because the value of k can never be larger than the minimum of \(X_j\) for all j = 1, 2, ..., n, the best way to maximize \(L(x,\alpha,k)\) in k is to take
\[ \hat k = \min_{1\le j\le n} X_j . \tag{8.4.5} \]
Since \(L(x,\alpha,k)\) and \(\log L(x,\alpha,k)\) are maximized at the same value of \(\alpha\), we differentiate \(\log L(x,\alpha,k)\). The log-likelihood function is
\[ \log L(x,\alpha,k) = n\log\alpha + n\alpha\log k - (\alpha+1)\sum_{i=1}^{n}\log x_i . \tag{8.4.6} \]
Equating the derivative to zero,
\[ \frac{\partial \log L(x,\alpha,k)}{\partial\alpha} = 0, \]
we get
\[ \frac{n}{\alpha} + n\log k - \sum_{i=1}^{n}\log(x_i) = 0 \tag{8.4.7} \]
\[ \frac{n}{\alpha} = \sum_{i=1}^{n}\log(x_i) - n\log k = \sum_{i=1}^{n}\log\frac{x_i}{k} \]
\[ \hat\alpha = \frac{n}{\sum_{i=1}^{n}\log (x_i/\hat k)} \tag{8.4.8} \]
where \(\hat k = \min_i x_i\).
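The maximum likelihood estimators (8.4.5) and (8.4.8) are simple enough to compute directly. The following Python sketch applies them to a synthetic Pareto sample generated by inverting the distribution function; the true parameter values, the sample size and the function names are assumptions made only for this illustration.

```python
import math
import random


def pareto_mle(sample):
    """Return (alpha_hat, k_hat) for a Pareto sample.

    k_hat is the sample minimum and alpha_hat = n / sum(log(x_i / k_hat)).
    """
    n = len(sample)
    k_hat = min(sample)
    log_sum = sum(math.log(x / k_hat) for x in sample)
    if log_sum == 0.0:
        raise ValueError("all observations are equal; alpha cannot be estimated")
    return n / log_sum, k_hat


def simulate_pareto(alpha, k, size, rng):
    """Draw Pareto variates by inverting F(x) = 1 - (k / x)**alpha."""
    # 1 - rng.random() lies in (0, 1], which avoids division by zero.
    return [k / (1.0 - rng.random()) ** (1.0 / alpha) for _ in range(size)]


rng = random.Random(1)
synthetic = simulate_pareto(3.0, 100.0, 5000, rng)  # hypothetical alpha and k
alpha_hat, k_hat = pareto_mle(synthetic)

if __name__ == "__main__":
    print(f"alpha_hat = {alpha_hat:.4f}, k_hat = {k_hat:.4f}")
```

Because the estimate of k is the sample minimum, it can never fall below the true k and approaches it from above as the sample grows.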
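8.5
The variances and the covariance of the estimates can be found by using the information matrix. The information matrix is the expectation of the negative Hessian matrix (Munir et al., 2012), i.e.
\[ I = -E\left[\frac{\partial^{2}\log L}{\partial\theta_i\,\partial\theta_j}\right], \quad\text{where } i, j = 1, 2, 3, \ldots \]
For the two parameters \(\alpha\) and \(k\),
\[ I = -E\begin{bmatrix} \dfrac{\partial^{2}\log L}{\partial\alpha^{2}} & \dfrac{\partial^{2}\log L}{\partial\alpha\,\partial k} \\[2ex] \dfrac{\partial^{2}\log L}{\partial k\,\partial\alpha} & \dfrac{\partial^{2}\log L}{\partial k^{2}} \end{bmatrix} \tag{8.5.1} \]
\[ I = \begin{bmatrix} \dfrac{n}{\alpha^{2}} & -\dfrac{n}{k} \\[2ex] -\dfrac{n}{k} & \dfrac{n\alpha}{k^{2}} \end{bmatrix} \tag{8.5.2} \]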
We have
var I 1
Where
n
2
n
2
I
n
2
n 1 3
and
2
n 1 3
n 1
COV ,
(8.5.3)
(8.5.4)
(8.5.5)
8.6
Method of Moments
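\[ \mu'_r = \int x^{r} f(x)\,dx \tag{8.6.1} \]
\[ m_r = \frac{1}{n}\sum_{i=1}^{n} X_i^{r}, \quad r = 1, 2, \ldots \tag{8.6.2} \]
To find the estimates through the method of moments we equate the sample moments to the population moments. The rth moment is
\[ \mu'_r = E(X^{r}) = \int_{k}^{\infty} \alpha k^{\alpha} x^{r-\alpha-1}\,dx . \tag{8.6.3} \]
After solving equation (8.6.3) we get
\[ \mu'_r = \frac{\alpha k^{r}}{\alpha - r}, \quad \alpha > r, \tag{8.6.4} \]
so that
\[ \mu'_1 = \frac{\alpha k}{\alpha-1} \quad\text{and}\quad \mu'_2 = \frac{\alpha k^{2}}{\alpha-2} . \tag{8.6.5} \]
We equate \(\mu'_1 = m_1\) and \(\mu'_2 = m_2\). After some simplification we get
\[ \hat\alpha = 1 + \sqrt{\frac{m_2}{m_2 - m_1^{2}}} \tag{8.6.6} \]
\[ \hat k = \frac{m_1(\hat\alpha - 1)}{\hat\alpha} \tag{8.6.7} \]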
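The moment equations can be solved in closed form, which makes the method of moments easy to sketch in code. Equating the first two population moments in (8.6.5) to m_1 and m_2 gives (α² - 2α)(m_2 - m_1²) = m_1², whose root larger than 2 is α = 1 + sqrt(m_2 / (m_2 - m_1²)); k then follows from the first moment. This arrangement is derived here for the sketch and may be written differently from the thesis's own expressions; the function names are illustrative.

```python
import math


def sample_moments(sample):
    """First two raw sample moments m1 and m2."""
    n = len(sample)
    m1 = sum(sample) / n
    m2 = sum(x * x for x in sample) / n
    return m1, m2


def pareto_mom(m1, m2):
    """Method-of-moments estimates (alpha_hat, k_hat) from raw moments m1, m2."""
    spread = m2 - m1 * m1
    if spread <= 0.0:
        raise ValueError("m2 must exceed m1**2")
    alpha_hat = 1.0 + math.sqrt(m2 / spread)
    k_hat = m1 * (alpha_hat - 1.0) / alpha_hat
    return alpha_hat, k_hat


if __name__ == "__main__":
    # Exact moments of a Pareto distribution with alpha = 5 and k = 100
    # (hypothetical values): m1 = alpha*k/(alpha-1), m2 = alpha*k**2/(alpha-2).
    print(pareto_mom(125.0, 50000.0 / 3.0))
```

The estimator needs α > 2 for the second moment to exist, and in practice a noticeably larger α for the sample second moment to be stable.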
8.7
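Let us consider a matrix R of order 2x2 to find the asymptotic variances and covariance of the estimates:
\[ R = \begin{bmatrix} r_{11} & r_{12} \\ r_{21} & r_{22} \end{bmatrix} \]
where
\[ r_{11} = \frac{\partial\mu'_1}{\partial\alpha} = -\frac{k}{(\alpha-1)^{2}}, \qquad r_{12} = \frac{\partial\mu'_1}{\partial k} = \frac{\alpha}{\alpha-1} \tag{8.7.1} \]
and
\[ r_{21} = \frac{\partial\mu'_2}{\partial\alpha} = -\frac{2k^{2}}{(\alpha-2)^{2}}, \qquad r_{22} = \frac{\partial\mu'_2}{\partial k} = \frac{2\alpha k}{\alpha-2} \tag{8.7.2} \]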
1
R
2
2
1
1
and
1 2
2 2
2
2 2
2 2
2
2
2
(8.7.3)
(8.7.4)
(8.7.5)
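Now we consider
\[ R\,V\,R^{T} = \Sigma \quad\Rightarrow\quad V = R^{-1}\,\Sigma\,(R^{T})^{-1} \tag{8.7.6} \]
where
\[ V = \begin{bmatrix} \operatorname{var}(\hat\alpha) & \operatorname{cov}(\hat\alpha,\hat k) \\ \operatorname{cov}(\hat\alpha,\hat k) & \operatorname{var}(\hat k) \end{bmatrix} \tag{8.7.7} \]
and
\[ \Sigma = \begin{bmatrix} \operatorname{var}(m_1) & \operatorname{cov}(m_1,m_2) \\ \operatorname{cov}(m_1,m_2) & \operatorname{var}(m_2) \end{bmatrix} \tag{8.7.8} \]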
1 2
2 2
var m1
cov m , m
1
2
cov m1 , m2
var m2
2 2
2
(8.7.9)
var
1 2
2
4 4
(8.7.10)
var
4 2 2
(8.7.11)
Cov ,
1 2 4 2 var m1 1
4 3
(8.7.12)
8.8
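The method of probability weighted moments (MPWM) is also used to estimate the parameters of the Pareto distribution. The probability weighted moment of order r is
\[ M_r = E\left[X\{1 - F(X)\}^{r}\right] . \tag{8.8.1} \]
For the Pareto distribution \(1 - F(X) = (k/X)^{\alpha}\), so
\[ M_r = E\left[X\left(\frac{k}{X}\right)^{\alpha r}\right] = k^{\alpha r}\,E\left(X^{1-\alpha r}\right) \tag{8.8.2} \]
where
\[ E\left(X^{1-\alpha r}\right) = \frac{\alpha k^{1-\alpha r}}{\alpha(r+1) - 1} \tag{8.8.3} \]
and therefore
\[ M_r = \frac{\alpha k}{\alpha(r+1) - 1} . \]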
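As a check on the probability weighted moment of the Pareto distribution, the expectation in (8.8.1) can be evaluated numerically and compared with the closed form αk / (α(r + 1) - 1), which follows from the moment formula (8.6.4) with exponent 1 - αr. The sketch below does this with the substitution v = 1 - F(X), so that X = k v^(-1/α) with v uniform on (0, 1). The parameter values and function names are hypothetical.

```python
def pareto_pwm_closed_form(r, alpha, k):
    """Closed form of E[X (1 - F(X))**r] for a Pareto(alpha, k) variable."""
    return alpha * k / (alpha * (r + 1) - 1.0)


def pareto_pwm_numeric(r, alpha, k, steps=200000):
    """Midpoint-rule value of the same expectation.

    With v = 1 - F(X) uniform on (0, 1) and X = k * v**(-1/alpha), the
    expectation is the integral of k * v**(r - 1/alpha) over (0, 1).
    The rule is accurate for r >= 1; for r = 0 the integrand is unbounded
    at v = 0 and convergence is slow.
    """
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        v = (i + 0.5) * h
        total += k * v ** (r - 1.0 / alpha)
    return total * h


if __name__ == "__main__":
    alpha, k = 3.0, 100.0  # hypothetical parameter values
    for r in (1, 2):
        print(r, pareto_pwm_closed_form(r, alpha, k), pareto_pwm_numeric(r, alpha, k))
```

Agreement between the two columns for several orders r supports the closed form before it is used to build estimating equations.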
Site        alpha (MOM)   k (MOM)        alpha (MLE)   k (MLE)
Tarbela     5.108         308867.051     3.1168        272000
Mangla      2.1262        140747.95      0.47887       22400
Shahdara    2.433         52535.04       0.87469       22205
Marala      4.8892        218760.1144    1.0463        109780
Site                  alpha (MOM)   k (MOM)     alpha (MLE)   k (MLE)
Muzaffarabad          0.1430        130.566     2.288         80.84
[site name missing]   3.4601        16.8431     1.7071        10.43
Risalpur              4.3830        268.8135    1.1092        130.67
Dir                   10.004        120.1379    3.8409        106.32
Kohat                 7.6423        124.9403    4.8004        115.45
Balakot               5.0974        333.8674    1.6041        230.67
The figure is taken from Pakistan Flood Impact Assessment report 2011-2012.
REFERENCES
Gumbel, E.J. (1941). The return period of flood flows. Annals of Mathematical Statistics,
12, 163-190.
Huff, F.A. and Neill, J.C. (1959). Comparison of several methods for rainfall frequency
paper 1543-A, U.S. Geological Survey, Reston, Va.
Markovic, R.D. (1965). Probability function of best fit distribution of annual precipitation
Landwehr, J.M., Matalas, N.C. and Wallis, J.R. (1979a). Probability weighted moments
compared with some traditional techniques in estimating Gumbel parameters and
quantiles. Water Resour. Res., 15(5), 1055-1064, doi:10.1029/WR015i005p01055
flood peaks data of rivers in Anatolia." Journal of Hydrology 136.1 (1992): 1-31.
10 Aksoy, H. (2000). Use of gamma distribution in hydrological analysis. Turkish Journal of
Engineering and Environmental Sciences, 24, 419-428.
11 Koutsoyiannis, D. and Baloutsos, G. (2000). Analysis of a long record of annual
inferences. Natural Hazards, 29, 29-48.
12 Park, J.S., Jung, H.S., Kim, R.S. and Oh, J.H. (2001). Modelling summer extreme rainfall over
the Korean peninsula using Wakeby distribution. International Journal of Climatology,
21(11), 1371-1384.
13 Kuczera, G. (2001). Comprehensian inference. Journal of Hydrology, Volume 248, Issues 1-4, Pages 152-167.
14 Pathak, C.S. (2001). Frequency analysis of short duration rainfall for central and south
Florida. Section 1, page 178, Bridging the Gap: Meeting the World's Water and
Environmental Resources Challenges.
15 Zalina, M.D., Desa, M.N.M., Nguyen, V.T.V. and Kassim, A.H.M. (2002). Selecting a
probability distribution for
21 Brabson, B.B. and Palutikof, J.P. (1999). Tests of the generalized Pareto distribution for
predicting extreme wind speeds. Journal of Applied Meteorology, 39(9), 1627-1640.
22 Thompson, Eric M., Laurie G. Baise, and Richard M. Vogel. "A global index earthquake
approach to probabilistic assessment of extremes." Journal of Geophysical Research:
Solid Earth (1978-2012) 112.B6 (2007).
23 Soliman, A.A., Abd Ellah, A.H. & Sultan, K.S. (2006). Comparison of estimates using
record statistics from Weibull model: Bayesian and non-Bayesian approaches.
Computational Statistics and Data Analysis, 51, 2065-2077.
24 Khan, M.S., Pasha, G.R. & Pasha, A.H. (2008). Theoretical analysis of inverse Weibull
distribution. WSEAS Transactions on Mathematics, 7(2).
25 Kao, Shih-Chieh, and Rao S. Govindaraju. "Trivariate statistical analysis of extreme
rainfall events via the Plackett family of copulas." Water Resources Research 44.2
(2008).
26 Sultan, K.S. (2008). Bayesian estimates based on record values from the inverse Weibull
lifetime model. Quality Technology & Quantitative Management, 5(4), 363-374.
27 Kwon, Young-Moon, Jeong-Woo Han, and Tae-Woong Kim. "Application of bivariate
frequency analysis for estimating design rainfalls." World Environmental & Water
Resources Congress. 2008.
28 Weiss, Jérôme, and Pietro Bernardara. "Comparison of local indices for regional
frequency analysis with an application to extreme skew surges." Water Resources
Research 49.5 (2013): 2940-2951.
29 Hamdi, Y., et al. "Extreme storm surges: a comparative study of frequency analysis
approaches." Natural Hazards and Earth System Science 14.8 (2014): 2053-2067.
30 Meeyaem, K. & Polpinit (2014). Mathematical Model for Flood Forecasting of the Chi
River Basin. DOI: 10.7763/IPCBEE.2014.V63.2
Intergovernmental Panel on Climate Change (2007). Climate change 2007: The physical science
basis. Agenda, 6(07), 333.