Statistics For Spatial Analysis: Slides Are Based On Notes of Shri. S.K. Mittal

Statistics for Spatial Analysis
Slides are based on Notes of Shri. S.K. Mittal
STATISTICS
The word `Statistics' has been derived from the Latin word
`Status', the Italian word `Statista' and the German word
`Statistik'. Each of these words means a `political state' or a
`Government'.
Presently, the word statistics is used in two different, but interrelated, ways, viz. (i) as a plural noun, and (ii) as a singular
noun.
As a Plural noun - When used as a plural noun, the word
`statistics' means statistical data. Prof. Horace Secrist defines
statistics in this sense as given below :
"By Statistics, we mean aggregates of facts affected to a
marked extent by multiplicity of causes, numerically
expressed, enumerated or estimated according to reasonable
standards of accuracy, collected in a systematic manner for a
pre-determined purpose and placed in relation to each other."
From the definition of statistics, we observe the following
characteristics:
CHARACTERISTICS OF STATISTICAL DATA
1. Aggregates of facts
2. Numerically expressed
3. Affected by multiplicity of causes
4. Estimated according to reasonable standards of accuracy
5. Collected in a systematic manner
6. Collected for a pre-determined purpose
7. Placed in relation to each other
As a singular noun - When used as a singular noun, statistics
refers to a science which deals with the methods of collecting,
classifying, presenting, comparing and interpreting
numerical data.
In this sense, statistics is also known as `statistical
methods'. The important statistical methods are as follows:
STATISTICAL METHODS
1. Collection of data
2. Classification of data
3. Presentation of data
4. Analysis of data
5. Interpretation of data
6. Forecasting of data
We conclude the following:
When used in the sense of data, `statistics' are
numerical statements of facts, capable of further analysis and
interpretation; when used in the sense of a science, `statistics'
is concerned with the principles and methods used in the
collection, presentation, analysis and interpretation of
numerical data in a sphere of enquiry.
Statistical methods are growing in popularity and are
being widely used in every branch of knowledge. But they
cannot be applied to all kinds of phenomena and cannot
answer all our doubts. They also suffer from various
limitations.
LIMITATIONS OF STATISTICS
1. Does not deal with individual facts
2. Ignores the qualitative aspects
3. Is not an end in itself
4. Can be misused
5. Good understanding is required
COLLECTION OF DATA
A sound structure of statistical investigation is based on a
systematic collection of data. Data is generally classified in two
groups, viz.
(a) internal data and
(b) external data.
Internal data come from the internal records related to the
operations of a business firm: records of production, purchase
and the accounting system. Such data are generally associated
with the organizational and functional activities of the firm.
The internal data can be either insufficient or inappropriate
for the problem under investigation; in that case we need
external data to make decisions. External data are collected
and published by agencies external to the enterprise. They can
be collected either from a primary or from a secondary source.
Primary and Secondary Data

Primary data are data collected by the investigator himself
for the first time. In India there are various agencies which
collect primary data; the National Sample Survey is one of them.
Secondary data are data which have already been collected
by a source other than the present investigator.
We may collect the data ourselves, and somebody else may later
decide to make use of them. The same data will be primary data
for us but secondary data for the others who make use of them.
Similarly, in order to compare the cost of living in Delhi and
Bombay, we may decide to make use of the data published in `The
Economic Times'; here we will be making use of secondary
data.
DISTINCTION BETWEEN PRIMARY AND SECONDARY DATA

| S.No | Basis | Primary Data | Secondary Data |
|------|-------|--------------|----------------|
| 1. | Originality | It is original, because the investigator himself collects the data. | It is not original. The investigator makes use of the data collected by other agencies. |
| 2. | Collection | It involves large expenses in terms of time, energy and money. | It is relatively a less costly method. |
| 3. | Suitability | If the data has been collected in a systematic manner its suitability will be positive. | It may or may not suit the objects of enquiry. |
| 4. | Precautions | No extra precautions need be taken in making use of this data. | It should be used with care. |
Methods of collecting Primary data

I. Direct Personal Investigation
II. Indirect Oral Investigation
III. Information through correspondents, and
IV. The Questionnaire Method
Sources of secondary data

The chief sources of secondary data can be classified
into two groups, viz.
(a) Published and
(b) Unpublished
Precautions in the use of secondary data
1. Whether the data are reliable. In order to know the
reliability of data, the integrity and experience of the
collecting organization, the purpose, method of
collection, degree of accuracy and test-checking
must be ascertained.
2. Whether the data are suitable for the purpose?
3. Whether the data are adequate?
Tabular form of data

105  93  97 101 115 149 135 120 130 140
110  93 109 113  98 111 100 102 107 103
 90 142 111 108 102 109 107 119 113  96
120 135  91 110 117 104 105 120 114  92
110 120 102  92 114  99 112 107  99 100
115 115  90 136 110 106 123 109 114 109
117 114  98 106 110 104 134 109 127 113
119 113 116 124 123 110 136 132 116 108
121 112 141 109 116 109 141 117 134  98
 92 110 109 122 109  97  93 107 104 108
 87  89 121 111 110 103 114 113 150 156
104 117 114 110 121 107 106 114 142 114
120 112 116 109 111 113 114  98 113 112
121  99 109 123 111 116 104  99 109 117
109 109 110  97 105 102 109 101  97 103
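Grouped Frequency Distribution Table

| Class Interval | f |
|----------------|---|
| 87-91 | |
| 92-96 | |
| 97-101 | 15 |
| 102-106 | 18 |
| 107-111 | 38 |
| 112-116 | 28 |
| 117-121 | 16 |
| 122-126 | 5 |
| 127-131 | 2 |
| 132-136 | 7 |
| 137-141 | 3 |
| 142-146 | 2 |
| 147-151 | 2 |
| 152-156 | 1 |

N = 150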
Note that in computations involving a classified distribution, the
midpoint will be used to substitute for each score in the interval. For
this reason, we recommend the choice of an odd number for the class
width i whenever possible. Nothing is sacred about this suggestion; it
just makes the midpoint a whole number of units, thus simplifying
computation. For example, with i = 5 the interval 87-91 has the midpoint 89.
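The grouping step and the midpoint rule can be sketched in Python. This is only an illustration and not part of the original notes: the function name `group_scores` is hypothetical, the lowest limit 87 and the width i = 5 are assumed from the class intervals of the table, and the ten scores are simply the first ten values of the raw data.

```python
def group_scores(scores, lowest, width):
    """Count scores per class interval and report each interval's midpoint.

    Intervals run lowest..lowest+width-1, then the next `width` integers,
    and so on until the highest score is covered.
    """
    highest = max(scores)
    table = []
    lower = lowest
    while lower <= highest:
        upper = lower + width - 1
        freq = sum(1 for s in scores if lower <= s <= upper)
        midpoint = (lower + upper) / 2  # a whole number when width is odd
        table.append((lower, upper, midpoint, freq))
        lower += width
    return table


# First ten scores of the raw data (assumed sample, for illustration only).
sample = [105, 93, 97, 101, 115, 149, 135, 120, 130, 140]
grouped = group_scores(sample, lowest=87, width=5)
for lower, upper, midpoint, freq in grouped:
    print(f"{lower}-{upper}  midpoint={midpoint}  f={freq}")
```

With an even width the midpoints would fall on half units, which is the inconvenience the recommendation above avoids.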
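Cumulative Distribution Table

| Class interval | f | Cum f | Cum % |
|----------------|---|-------|-------|
| 87-91 | | | |
| 92-96 | | 13 | 9 |
| 97-101 | 15 | 28 | 19 |
| 102-106 | 18 | 46 | 31 |
| 107-111 | 38 | 84 | 56 |
| 112-116 | 28 | 112 | 75 |
| 117-121 | 16 | 128 | 85 |
| 122-126 | 5 | 133 | 89 |
| 127-131 | 2 | 135 | 90 |
| 132-136 | 7 | 142 | 95 |
| 137-141 | 3 | 145 | 97 |
| 142-146 | 2 | 147 | 98 |
| 147-151 | 2 | 149 | 99 |
| 152-156 | 1 | 150 | 100 |

N = 150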
The Cumulative Distribution

Arranging data into a cumulative distribution is really helpful. It
allows us to obtain the number (or the proportion) of cases in a distribution below
or above each class interval (or boundary).
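A cumulative distribution is obtained by keeping a running total of the class frequencies. The following is a minimal sketch under stated assumptions: the function name `cumulative_table` and the four frequencies are hypothetical, and percentages are rounded to whole numbers.

```python
from itertools import accumulate


def cumulative_table(frequencies):
    """Return (cumulative frequency, cumulative percentage) for each class."""
    total = sum(frequencies)
    running = list(accumulate(frequencies))  # running totals, class by class
    return [(cum, round(100 * cum / total)) for cum in running]


# Hypothetical frequencies for four consecutive class intervals.
freqs = [4, 6, 8, 2]
cum_rows = cumulative_table(freqs)
for f, (cum, pct) in zip(freqs, cum_rows):
    print(f"f={f}  cum={cum}  cum%={pct}")
```

The last cumulative frequency always equals N, and the last cumulative percentage is always 100.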
Graphic Techniques
There are always some people who would rather not read
tables, who could understand the information better if it were
presented in pictorial form.
Our prehistoric ancestors
undoubtedly knew this when they made the first cave drawings.
Similarly, the Egyptians, Greeks and Romans used drawings and
sculptures to convey information about their respective societies.
Thus, art was used to carry information throughout the ages. Art
is also valuable to us in describing information.
Graphs, the pictorial forms that follow, are not meant to
substitute for tabular construction. Rather they are meant as
visual aids that help us to describe and think about the shape
of the distribution. In fact, you cannot plan or construct a graph
until you have prepared the corresponding table. The graphic
forms shown here correspond to both qualitative and quantitative
distributions.
The Histogram
The histogram is the graphic equivalent of the grouped distribution for
interval-level data. It consists of a set of adjacent bars whose heights are
proportional to either the absolute frequencies or to the proportions of
cases in each interval of the variable.
The most noticeable feature of the histogram is its structural
simplicity. Bars are understood more easily than numbers. The
histogram shows the relative concentration of data in each interval as
well as the shape of the distribution.
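The idea of bars proportional to frequencies can be tried without any plotting library. The sketch below is an assumption-laden illustration: the helper `text_histogram` is hypothetical, it draws the bars horizontally as rows of characters rather than as vertical bars, and it uses the five central classes of the grouped table.

```python
def text_histogram(intervals, frequencies, symbol="#"):
    """Return one line per class interval; bar length equals the frequency."""
    lines = []
    for label, freq in zip(intervals, frequencies):
        lines.append(f"{label:>8} | {symbol * freq} ({freq})")
    return lines


# Five central classes and their frequencies from the grouped table.
intervals = ["97-101", "102-106", "107-111", "112-116", "117-121"]
frequencies = [15, 18, 38, 28, 16]
bars = text_histogram(intervals, frequencies)
print("\n".join(bars))
```

Even in this crude form the concentration of cases in the 107-111 interval is visible at a glance.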
The Polygon
It is easy to convert a histogram into the much-used
polygon. All we need to do is to connect the midpoints of
the tops of the bars with straight lines.
Polygons are particularly useful when we wish to present a
comparison of two or more distributions on the same graph.
They do not blur their respective outlines, as histograms
do.
The Ogive
When a graph is used to present a cumulative
percentage distribution, it is called an ogive. The ogive
is constructed on a pair of perpendicular axes, just like
the polygon.
The horizontal axis represents the values for the
upper true limits of each class interval, and the vertical
axis indicates the cumulative percentage of observations
up to each interval.
A dot is then placed directly above the upper true limit
of each class interval, at whatever height is
appropriate, to indicate the proportion of cases less
than the upper true limit of the interval. After plotting
all interval values with their corresponding
percentages, the dots are joined by straight lines.
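The points of an ogive can be computed before any drawing is done. The following sketch rests on assumptions that are not stated in the notes: scores are whole numbers, so the upper true limit is taken as the stated upper limit plus 0.5; the function name `ogive_points` and the three-class distribution are hypothetical.

```python
def ogive_points(upper_limits, frequencies):
    """Return (upper true limit, cumulative percentage) pairs for plotting."""
    total = sum(frequencies)
    points = []
    running = 0
    for upper, freq in zip(upper_limits, frequencies):
        running += freq
        # Assumption: whole-number scores, so the true limit lies half a
        # unit above the stated upper limit of the interval.
        points.append((upper + 0.5, 100 * running / total))
    return points


# Hypothetical distribution with stated upper limits 91, 96 and 101.
points = ogive_points([91, 96, 101], [5, 10, 5])
print(points)
```

Joining these points with straight lines gives the ogive; the curve never falls, because cumulative percentages can only stay level or rise.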
MEASUREMENTS
MEASURES OF CENTRAL TENDENCY
A measure of central tendency is a single figure that represents
the whole of a distribution. Individual observations in a
distribution show a tendency to concentrate at certain values,
usually somewhere in the centre of the distribution.
Thus, we talk of the average per capita income of
India, average size of holdings in India, average
productivity of labour in India, average cost of production
of cloth, average life of an Indian, etc.
Three important measures of central tendency are mean,
median and mode.
Arithmetic Mean
Arithmetic mean, or simply known as
`mean', is the most commonly used of all
averages, e.g., we frequently talk of average
monthly income, average monthly expenditure,
average marks secured by the students, average
petrol consumption of car or scooter in a day,
average productivity per farm, average bonus
paid, etc.
Arithmetic mean is defined as the sum of the values of a group
of items divided by the number of items:
$\bar{X} = \frac{\sum X}{N}$
Median
The effect of an extreme value can be avoided if we
take a measure of central position in a given series. This
position measure is called the median.
Median is a value which divides the series into two
equal parts. Thus if we have the median value, the number
of items less than this value and the number of items more
than this value will be equal.
To get the median value, we make use of the
following formula :
M = size of the ((N + 1)/2)th item
where M stands for median, and N for the number of items
in the series.
Arithmetic mean is a good measure of central tendency when
we are interested in finding the average value of any variate,
e.g., average revenue, average cost, average productivity etc.
Similarly, median is a good measure when the spread of items
may be more on one side of the distribution. Median is also
useful in those cases where the items are not capable of
measurement in definite units, e.g. qualities like intelligence,
health etc.
Mode
A third important measure of central tendency is
called mode, which is denoted as Z. Mode is the most
common value found in a series.
For example, the daily wages of labourers employed in
Defence Colony are Rs. 80, 85, 86, 86, 86, 87, 89, 90. The
modal wage will be Rs.86 because it is most commonly
found or it occurs most frequently.
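The three averages can be checked on the wage example with Python's standard `statistics` module. This is a small illustrative sketch; only the wage figures come from the notes.

```python
from statistics import mean, median, mode

# Daily wages in Rs. from the example above.
wages = [80, 85, 86, 86, 86, 87, 89, 90]

average = mean(wages)      # sum of the values divided by N
middle = median(wages)     # N is even, so the two middle items are averaged
most_common = mode(wages)  # the value that occurs most frequently

print(average, middle, most_common)
```

Here N = 8, so (N + 1)/2 = 4.5: the median lies halfway between the 4th and 5th items of the ordered series, both of which are Rs. 86.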
Relationship between Mean, Median and Mode

Mean, median and mode have their distinct roles in statistical
analysis. In no case can they be substituted for one another. In a
moderately asymmetrical distribution, the following approximate
relationship exists:
Mode = 3 Median - 2 Mean

Comparative Evaluation of Characteristics of Mean, Median and Mode

| S.No. | Characteristics | Mean | Median | Mode |
|-------|-----------------|------|--------|------|
| 1. | It is rigidly defined | Yes | Yes | No |
| 2. | It is situated in the centre of the distribution | No | Yes | Yes |
| 3. | It is easily understandable | Yes | Yes | Yes |
| 4. | Its calculation is easy | Yes | Yes | No |
| 5. | It is based on all the observations | Yes | No | No |
| 6. | It is capable of further mathematical treatment | Yes | No | No |
| 7. | It is affected by the choice of sample | Yes | No | No |
| 8. | It is affected by extreme values | Yes | No | No |
| 9. | It can be represented graphically | No | Yes | Yes |
MEAN DEVIATION
Mean deviation shows the scatter around an average.
It is like measuring the scatter of the population of a city.
Some people live close to the centre of the city and others at
varying distances. Their average distance from the centre
indicates how scattered or dispersed they are.
Mean deviation is defined as an average or mean of
the deviations of the values from the central tendency. The
central tendency used can be either arithmetic mean or
median. Here we take mean for the calculation of mean
deviations.
$\mathrm{M.D.} = \frac{\sum |d_x|}{N}$
where $d_x = X - \bar{X}$ is the deviation of an item from the mean,
taken without regard to its sign.
Coefficient of Mean Deviation = M.D. / Mean (or M.D. / Median when the
deviations are taken from the median)
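The calculation of mean deviation can be sketched as follows. The function name `mean_deviation` and the five observations are hypothetical; deviations are taken from the arithmetic mean and their signs are ignored.

```python
def mean_deviation(values):
    """Average absolute deviation of the values from their arithmetic mean."""
    n = len(values)
    centre = sum(values) / n
    return sum(abs(x - centre) for x in values) / n


# Hypothetical observations; the series is symmetric, so its mean and
# median coincide (both are 6).
data = [2, 4, 6, 8, 10]
md = mean_deviation(data)
coefficient = md / (sum(data) / len(data))  # M.D. relative to the average used
print(md, coefficient)
```

The deviations are 4, 2, 0, 2 and 4, which sum to 12, so the mean deviation is 12/5 and the coefficient is that figure divided by 6.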
STANDARD DEVIATION
It is another related measure of variation. In mean
deviation we take the sum of deviations after ignoring their
plus and minus signs. In standard deviation we achieve the same
effect in another way. We square all the deviations; the
squared deviations can never be negative.
Standard deviation is the square root of the arithmetic mean of
the squared deviations.
Standard deviation is generally denoted by $\sigma$ (read sigma).
$\sigma = \sqrt{\frac{\sum d_x^2}{N}}$
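A direct translation of this definition into Python might look like the sketch below. The function name `standard_deviation` and the eight observations are hypothetical; the divisor is N, as in the definition above.

```python
from math import sqrt


def standard_deviation(values):
    """Square root of the arithmetic mean of the squared deviations."""
    n = len(values)
    centre = sum(values) / n
    squared = [(x - centre) ** 2 for x in values]  # signs vanish on squaring
    return sqrt(sum(squared) / n)


# Hypothetical observations with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
sigma = standard_deviation(data)
print(sigma)
```

The squared deviations sum to 32, their mean is 4, and the standard deviation is therefore 2. The standard library function `statistics.pstdev` uses the same divisor N.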
CORRELATION
Measures of central tendency, dispersion and skewness
describe the nature of a distribution relating to a single variable.
One may also be interested in studying the relationship between two
or more variables, e.g., income and consumption; price and
demand; quantity of input and output;
productivity and wages.
Two variables may be positively related or negatively
related:
price and supply are positively correlated.
price and demand are negatively correlated.
price index and dearness allowance are positively correlated.
strikes and rate of production are negatively correlated.
Methods of Measuring Correlation

It is not sufficient only to know that there exists
correlation between two variables; it is also necessary to
quantify the extent of correlation. For the first purpose we
make use of scatter diagrams, and for the second purpose
we need to define the value of the co-efficient of correlation.
Scatter Diagram :
A simple measure of correlation between two
variables is obtained by the use of scatter diagrams.
Values of the independent variable are measured on X-axis
in a graph, and values of the dependent variable are
measured on Y-axis. The two values are then plotted in the
graph in the form of dots. When every dot representing a
pair of figures has been plotted, we get a scatter diagram.
Coefficient of Correlation
The mathematical technique which describes the
covariance in ratio terms is known as co-efficient of correlation.
The co-efficient of correlation was initially conceived by the
statistician Karl Pearson.
Karl Pearson's coefficient of
correlation (also known as product-moment co-efficient),
generally denoted by `r', is expressed as follows:
$r = \frac{\sum d_x d_y}{N \sigma_x \sigma_y}$
where
$\sum d_x d_y$ is the sum of the products of deviations of
respective observations in x and y series,
$N$ is the number of items,
$\sigma_x$ is the standard deviation of x series, and
$\sigma_y$ is the standard deviation of y series.
The values of r determine the degree of correlation
between two variables.
r always lies between minus one and plus one.

| Value of r | Degree of correlation between two variables |
|------------|---------------------------------------------|
| -1 | Perfectly negative |
| +1 | Perfectly positive |
| 0 | No relation |
| 0.10 to 0.25 | Low degree of correlation |
| 0.30 to 0.55 | Moderate correlation |
| 0.60 to 0.99 | High correlation |

If the sign before r is minus, it will be negative correlation,
and if the sign is plus, it will be positive correlation.
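Karl Pearson's coefficient can be computed step by step as in the sketch below. The function name `pearson_r` and the three small series are hypothetical; standard deviations are computed with the divisor N.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment coefficient: sum(dx*dy) / (N * sigma_x * sigma_y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]  # deviations of the x series
    dy = [y - mean_y for y in ys]  # deviations of the y series
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sigma_x = sqrt(sum(a * a for a in dx) / n)
    sigma_y = sqrt(sum(b * b for b in dy) / n)
    return sum_dxdy / (n * sigma_x * sigma_y)


# Hypothetical series.
x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2, 4, 6, 8, 10])    # y rises exactly with x
r_down = pearson_r(x, [10, 8, 6, 4, 2])  # y falls exactly as x rises
r_mixed = pearson_r(x, [2, 1, 4, 3, 5])  # y mostly rises with x
print(r_up, r_down, r_mixed)
```

The first two series give perfect positive and perfect negative correlation (up to floating-point rounding), while the third gives a high positive value of about 0.8.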
RANK CORRELATION
Prof. Charles Spearman conceived another coefficient of correlation.
This co-efficient is expressed as R and is based on the ranking
of the various values of the two variables.
$R = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}$
where $D$ is the difference between the two ranks of an item and
$N$ is the number of items.
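The rank correlation formula is easy to apply once the ranks are known. The sketch below assumes that ranks have already been assigned and that there are no tied ranks; the function name `spearman_rank` and the two judges' rankings are hypothetical.

```python
def spearman_rank(ranks_x, ranks_y):
    """Spearman's R = 1 - 6 * sum(D^2) / (N * (N^2 - 1)), no tied ranks."""
    n = len(ranks_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))


# Hypothetical ranks given to five items by two judges.
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]
R = spearman_rank(judge_a, judge_b)
R_same = spearman_rank(judge_a, judge_a)            # identical rankings
R_reversed = spearman_rank(judge_a, [5, 4, 3, 2, 1])  # opposite rankings
print(R, R_same, R_reversed)
```

Identical rankings give R = +1 and exactly opposite rankings give R = -1, so R is read on the same scale as r.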
REGRESSION
The term regression was first used by Sir Francis Galton
in his studies of Inheritance of Stature. He, along with his
friend, Karl Pearson, studied the heights of 1,078 sons along with
the heights of their fathers. It was found that tall fathers
tend to have tall sons and short fathers tend to have short sons,
but the average height of the sons of tall fathers was less than
the height of their fathers, and the average height of the sons of
short fathers was more than the average height of their fathers.
Galton named this tendency `regression'.
It is used to explain the value of one variable with respect to the
value of other variable. It explains the functional relationship
between the two variables.
The relationship is explained with the help of regression lines.
Regression Lines
The line which shows the functional relationship between the two
variables is known as the `line of best fit'. Since there are two variables,
X and Y, there are two regression lines.
Regression line of X on Y explains the functional relationship of X when
the value of Y variable is given, whereas, the regression line of Y on X
explains the functional relationship of Y when the value of X variable is
given.
Regression lines and the Coefficient of Correlation

The regression lines help in estimating the nature and the type of
correlation between the two variables.
If the two lines of regression overlap each other the correlation is said to be
perfect correlation. If both the lines intersect at right angles, there is no
correlation at all.
The slope of the lines determines the nature of correlation, if the slope of
the lines is positive the correlation is said to be positive and vice versa. The
degree of correlation can be ascertained with the help of the angles formed
by the two lines.
Regression equations
The regression equations explain the functional relationship
between the two variables. As there are two regression lines, there are
two regression equations.
i) Regression equation of X on Y:
In this equation the probable values of X are estimated with the
help of independent variable Y. Plotting these values on the graph paper
we get the line known as regression of X on Y.
ii) Regression equation of Y on X:

This equation is used in order to estimate the values of Y when
the values of X are given, here the values of Y are dependent on the
values of X. The line showing this relationship is known as regression
line of Y on X.
Method of Least Squares

This method is the most useful technique of estimation. It
gives the best linear unbiased estimate. The values of the two
unknown constants are determined with the help of two normal
equations.
Regression equation of X on Y: $X = a + bY$
Regression equation of Y on X: $Y = a + bX$
The values of constants `a' and `b' can be estimated from
the following normal equations:
Regression of X on Y:
$\sum X = Na + b \sum Y$
$\sum XY = a \sum Y + b \sum Y^2$
Regression of Y on X:
$\sum Y = Na + b \sum X$
$\sum XY = a \sum X + b \sum X^2$
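Solving the two normal equations for a and b can be sketched as follows. The function name `least_squares_line` and the data are hypothetical; the function fits the second series on the first, so swapping the arguments gives the other regression line.

```python
def least_squares_line(xs, ys):
    """Solve the normal equations for ys = a + b * xs and return (a, b)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Eliminating a from
    #   sum_y  = n * a     + b * sum_x
    #   sum_xy = a * sum_x + b * sum_x2
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


# Hypothetical data lying exactly on the line Y = 1 + 2X.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
a_yx, b_yx = least_squares_line(x, y)  # regression of Y on X
a_xy, b_xy = least_squares_line(y, x)  # regression of X on Y
print(a_yx, b_yx, a_xy, b_xy)
```

Because these points lie exactly on one line, the two regression lines coincide: Y = 1 + 2X is the same line as X = -0.5 + 0.5Y, which is the case of perfect correlation.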
MULTIVARIATES
When you study a single variable, the case is called a
univariate case. When there are two variables, it is called a
bivariate case. When there are more than two variables, it
is called a multivariate case.
Suppose there are n variables. Then the parameters
which we studied earlier, such as mean, variance, covariance and
correlation, now become
a mean vector,
a variance-covariance matrix, and
a correlation matrix.
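These three quantities can be computed with plain Python lists. The sketch below is illustrative only: the function names and the four observations on three variables are hypothetical, and variances and covariances use the divisor N.

```python
from math import sqrt


def mean_vector(rows):
    """Mean of each variable (column) across all observations (rows)."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows):
    """Variance-covariance matrix with divisor N."""
    n = len(rows)
    means = mean_vector(rows)
    cols = list(zip(*rows))
    k = len(cols)
    return [
        [
            sum((a - means[i]) * (b - means[j])
                for a, b in zip(cols[i], cols[j])) / n
            for j in range(k)
        ]
        for i in range(k)
    ]


def correlation_matrix(rows):
    """Each covariance divided by the product of the two standard deviations."""
    cov = covariance_matrix(rows)
    k = len(cov)
    return [
        [cov[i][j] / sqrt(cov[i][i] * cov[j][j]) for j in range(k)]
        for i in range(k)
    ]


# Hypothetical data: four observations on three variables.
observations = [(1, 2, 9), (2, 4, 7), (3, 6, 5), (4, 8, 3)]
means = mean_vector(observations)
cov = covariance_matrix(observations)
corr = correlation_matrix(observations)
print(means)
print(cov)
print(corr)
```

The diagonal of the variance-covariance matrix holds the variances of the individual variables, and the diagonal of the correlation matrix is always 1.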
