
Contents

1 Introduction
1.1 Introduction
1.2 Significance and Summary of the Study
2 Construction Performance
2.1 Introduction
2.2 Test Using Hotelling's T² Statistic
2.3 Wilcoxon's Signed Rank Test for the Median Specific Gravity of Cement Grout
2.4 Analysis Using ANOVA Technique
3 Probability Models
3.1 Introduction
3.2 Distribution of Idle days
3.3 Distribution of Age
4 Discriminant And Profile Analysis Of The Data
4.1 Introduction
4.2 Discriminant Analysis
4.3 Profile Analysis
5 Correlation And Regression Analysis
5.1 Introduction
5.2 Multiple Correlation
5.3 Linear Relationship between Weight of mixture, Diameter of well, Width of protection wall and Depth of well
5.4 Canonical Correlation
5.5 Relation Between Idle Hours and Cost
6 Conclusion
References
Chapter 1
Introduction
1.1 Introduction
Jalanidhi, a World Bank-aided Rural Water Supply and Environmental Sanitation Project, was formulated in mid-1999. The project implementation plan was prepared and appraised in mid-2000, and an agreement with the World Bank was signed on 4th January 2001. The Government also created an autonomous institution, viz. the Kerala Rural Water Supply and Sanitation Agency (KRWSA), to implement the project. The project was expected to cover 3 lakh households, benefiting a population of more than 15 lakh in the selected Grama Panchayats. Communities in the project areas were expected to benefit from improved and sustainable water supply and environmental sanitation services. The stakeholders enjoyed time savings in collecting water, better health from more and cleaner water, and improved sanitation and hygiene practices. Women are considered the group that benefits most, and the project made efforts to mainstream women users into planning and decision-making activities. The Grama Panchayats involved in the project benefited from panchayat strengthening programmes and from the mobilisation of internal resources from beneficiaries, and the project improved the institutional capacity of the Government of Kerala to facilitate water supply and sanitation services in the state. The project was designed as a demand-responsive project with a community-driven development approach in its implementation. It integrated water supply with sanitation, health promotion, environmental management and groundwater recharge measures.
Eighteen water supply projects implemented by KRWSA in Mutholy Grama Panchayat were considered for this study.
1.2 Significance and Summary of the Study
In modern times statistics is viewed not as a mere device for collecting numerical data but as a science of developing sound techniques for handling and analysing data and drawing useful and reasonable inferences from them. It now finds wide application in almost all sciences, including the social sciences, psychology, education, economics and business management. The industrial world continues to change at an amazing pace, and many of the major problems in industry and in the social sectors that need serious work are statistical in nature. The driving force behind the development of modern statistics has been the need to solve practical problems; broadly speaking, there is a need to transform data into information and intelligence, and this need cuts across much of modern industrial life. In industry, statistics is very widely used in quality control.
Summary of the work
In the second chapter, we check whether the specifications given by the company are met, using Hotelling's T² technique. A factorial design is also used to estimate the effect of each factor and their interaction effects. We also test whether the water acidity contents before and after the construction of the well are significantly different.
In the third chapter, we fit distributions for idle days and for the age of employees. To check the goodness of fit we use the Kolmogorov-Smirnov and chi-square tests.
In the fourth chapter, we present a discriminant function based on the extra cost due to the change in price of raw material, the cost of sinking and idle days, and use Mahalanobis' D² statistic to test the equality of the mean vectors of the two populations. Profile analysis is also used to compare the two populations on these characteristics, and F-statistics are used to test the parallelism and the average level of the profiles.
The fifth chapter studies the relationship between some quantitative characteristics of the quality of the well. We use regression analysis, obtain multiple correlation coefficients and test their significance. In this chapter we also obtain the canonical correlation between two sets of measurements. Finally, we fit a linear model for total cost and machine production, and study the relation between idle days and cost.
Chapter 2
Construction Performance
2.1 Introduction
The quality of construction of a well depends on several factors, such as the diameter of the well, the width of the protection wall, the depth of the well, the weight of the mixture, country blasting powder and Tor steel. Since certain standard norms have been laid down for these quality aspects of the well, it is essential to test whether the specifications are met.
In this chapter we examine whether the specifications given by the P.S.W.S are met, and we try to estimate the effect of each of the factors mentioned above and their interaction effects.
The statistical techniques used for this purpose are Hotelling's T² test, Wilcoxon's signed rank test, ANOVA based on a general factorial design, and the Mann-Whitney test.
2.2 Test Using Hotelling's T² Statistic
In this section we examine statistically whether the standard norms are met with respect to some quality characteristics. For this we use Hotelling's T² statistic, introduced by Hotelling in 1931. Hotelling's T² can be applied to test for an assigned value of a mean vector when the dispersion matrix is unknown; it is the multivariate analogue of the square of W. S. Gosset's t statistic. The T² test is as follows.
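Let $H_0: \mu = \mu_0$ in $N_p(\mu, \Sigma)$, where $\Sigma$ is unknown. By definition

$$T^2 = N(\bar{x} - \mu_0)' S^{-1} (\bar{x} - \mu_0)$$

where $\bar{x}$ is the mean vector of a sample of size $N$ and $S^{-1}$ is the inverse of the sample covariance matrix, with

$$S = (N-1)^{-1} A, \qquad \bar{x} = N^{-1} \sum_{\alpha=1}^{N} x_\alpha, \qquad A = \sum_{\alpha=1}^{N} (x_\alpha - \bar{x})(x_\alpha - \bar{x})'.$$

Then

$$F = \frac{T^2}{N-1} \cdot \frac{N-p}{p}$$

is distributed as $F_{(p,\,N-p)}$. The critical region is $F \ge F_\alpha$, where $P[F \ge F_\alpha] = \alpha$, $\alpha$ being the significance level.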
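The T² statistic and its F transformation above can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names `solve`, `hotelling_t2` and `t2_to_f` are hypothetical, and since the raw well measurements are not reproduced in this report, the usage note only converts the reported T² values to F.

```python
# Minimal sketch: one-sample Hotelling T^2 test (standard library only).
from typing import List, Sequence


def solve(matrix: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the caller's data are left untouched.
    m = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def hotelling_t2(data: Sequence[Sequence[float]], mu0: Sequence[float]) -> float:
    """T^2 = N (xbar - mu0)' S^{-1} (xbar - mu0) with S = A / (N - 1)."""
    n, p = len(data), len(mu0)
    xbar = [sum(row[j] for row in data) / n for j in range(p)]
    # S is the unbiased sample covariance matrix A / (N - 1).
    s = [[sum((row[i] - xbar[i]) * (row[j] - xbar[j]) for row in data) / (n - 1)
          for j in range(p)] for i in range(p)]
    diff = [xbar[j] - mu0[j] for j in range(p)]
    w = solve(s, diff)  # w = S^{-1}(xbar - mu0), without forming S^{-1}
    return n * sum(d * wj for d, wj in zip(diff, w))


def t2_to_f(t2: float, n: int, p: int) -> float:
    """F = T^2 (N - p) / ((N - 1) p), distributed as F(p, N - p) under H0."""
    return t2 * (n - p) / ((n - 1) * p)
```

For instance, `t2_to_f(87.9382344, 18, 4)` gives about 18.1049 and `t2_to_f(246.7103, 18, 3)` about 72.5619, the F values of parts (i) and (ii). Solving S w = (x̄ - μ0) instead of inverting S is the usual numerically safer choice.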


(i) Test for some quality specifications of the well
Using observations from 18 well construction projects, the following quality aspects of the well are tested:
$X_1$: Diameter
$X_2$: Width
$X_3$: Depth
$X_4$: Weight
The standard norms for the above quality aspects are

$$\mu_0 = (6,\ 0.2,\ 10,\ 10788)'$$

Thus we want to test the hypothesis $H_0: \mu = \mu_0$.
The sample mean vector obtained is

$$\bar{X} = (\bar{x}_1, \bar{x}_2, \bar{x}_3, \bar{x}_4)' = (5.1111,\ 0.18278,\ 8.6667,\ 10066.56)'$$

and

$$A = \begin{pmatrix} 11.777778 & 0.004444 & 5.333333 & 312.111111 \\ 0.004444 & 0.004561 & 0.056667 & 34.632222 \\ 5.333333 & 0.056667 & 18 & 6039.333333 \\ 312.11111 & 34.632222 & 6039.333333 & 3774008.444 \end{pmatrix}$$
Then $T^2 = 87.9382344$. With $N = 18$ and $p = 4$,

$$F = \frac{T^2}{N-1} \cdot \frac{N-p}{p} = 18.104930.$$

The critical value of $F$ with $(4, 14)$ degrees of freedom at the 5% level of significance is $F_\alpha = 3.11$. Here $F > F_\alpha$, therefore we reject $H_0$.
Conclusion.
The construction process does not conform to the standard values set by KRWSA.
(ii) Test for the consumption specifications of some inputs
In construction, the PSWS uses several inputs and wishes to keep the values of these inputs at certain levels. The inputs considered are the following:
$X_1$: River sand (m³)
$X_2$: Country blasting powder (kg)
$X_3$: Tor steel (qtl)
We collected data from 18 construction projects. The vector of target values of the inputs given by the PSWS is

$$\mu_0 = (17,\ 6,\ 23.5)'$$

Also

$$\bar{X} = (\bar{x}_1, \bar{x}_2, \bar{x}_3)' = (5.1111,\ 0.18278,\ 8.6667)'$$

We have to test $H_0: \mu = \mu_0$ vs $H_1: \mu \ne \mu_0$, with

$$A = \begin{pmatrix} 11.777778 & 0.004444 & 5.333333 & 312.111111 \\ 0.004444 & 0.004561 & 0.056667 & 34.632222 \\ 5.333333 & 0.056667 & 18 & 6039.333333 \\ 312.11111 & 34.632222 & 6039.333333 & 3774008.444 \end{pmatrix}$$
Then $T^2 = 246.7103$. With $N = 18$ and $p = 3$,

$$F = \frac{T^2}{N-1} \cdot \frac{N-p}{p} = 72.56186.$$

The critical value of $F$ with $(3, 15)$ degrees of freedom at the 5% level of significance is $F_\alpha = 3.29$. Here $F > F_\alpha$, therefore we reject $H_0$.
Conclusion.
The target consumption specification given by KRWSA is not achieved.
2.3 Wilcoxon's Signed Rank Test for the Median Specific Gravity of Cement Grout
In this section we check whether the sample values of the specific gravity of cement grout meet the standard value 1.75 set by KRWSA. For this purpose we use the Wilcoxon signed rank test.
Let $M$ be the median of a population with an absolutely continuous d.f. $F(x)$. We have to test $H_0: M = M_0$ vs $H_1: M = M_1$ based on a random sample of size $n$. To carry out the test we compute the deviations $X_i - M_0$ for $i = 1, 2, \ldots, n$ and assign ranks to these deviations based on their absolute magnitudes. Define
$T^+$ = sum of ranks assigned to positive deviations,
$T^-$ = sum of ranks assigned to negative deviations,
$T = \min(T^+, T^-)$.
$T^+$ and $T^-$ are called the one-sided Wilcoxon signed rank statistics, and $T$ is called the two-sided Wilcoxon signed rank statistic. The decision rules are:
If $M_1 > M_0$, reject $H_0$ if $T^+ > c_1$.
If $M_1 < M_0$, reject $H_0$ if $T^+ < c_2$.
If $M_1 \ne M_0$, reject $H_0$ if $T < c$.
The values of $c_1$, $c_2$ and $c$ can be obtained from the Wilcoxon signed rank tables.
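A minimal sketch of how $T^+$, $T^-$ and $T$ can be computed, assuming that zero deviations are dropped (as in the text) and that tied absolute deviations receive the average of the ranks they occupy. `signed_rank_stats` is a hypothetical helper, and because the 18 specific-gravity readings are not reproduced here, the examples use made-up numbers.

```python
# Minimal sketch: Wilcoxon signed rank statistics T+, T- and T = min(T+, T-).
from typing import Sequence, Tuple


def signed_rank_stats(sample: Sequence[float], m0: float) -> Tuple[float, float, float, int]:
    # Deviations X_i - M0, rounded so that floating-point noise does not break ties.
    devs = [round(x - m0, 10) for x in sample]
    devs = [d for d in devs if d != 0]          # drop zero deviations
    n = len(devs)
    order = sorted(range(n), key=lambda i: abs(devs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of equal absolute deviations (one tie group).
        while j + 1 < n and abs(devs[order[j + 1]]) == abs(devs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1                    # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    t_plus = sum(r for r, d in zip(ranks, devs) if d > 0)
    t_minus = sum(r for r, d in zip(ranks, devs) if d < 0)
    return t_plus, t_minus, min(t_plus, t_minus), n
```

For example, `signed_rank_stats([6, 2, 9, 4, 5], 4)` drops the zero deviation and returns $T^+ = 7.5$, $T^- = 2.5$, $T = 2.5$ with $n = 4$. Whatever the data, $T^+ + T^- = n(n+1)/2$, which is a convenient check on hand calculations.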
We have taken values of the specific gravity of cement grout from 18 well construction projects. Let this continuous variable be denoted by $X$. The hypothesis to be tested is $H_0: M = 1.75$ vs $H_1: M \ne 1.75$.
Here one deviation is zero, so the sample size is reduced to $N = 17$, and

$$T^+ = 83, \qquad T^- = 86, \qquad T = \min(83, 86) = 83.$$

From the table, for $\alpha = 0.05$ and $N = 17$, $T_\alpha = 28$. Since $T > T_\alpha$, we accept $H_0$ at the 5% level of significance.
Conclusion.
The median specific gravity of cement grout does not differ significantly from the standard value 1.75.
2.4 Analysis Using ANOVA Technique
In this section we attempt to estimate the effects of some factors which affect the quality of the well, and their interactions, that is, the variation in the effect of one factor across different levels of the other factors.
Here we consider three factors, clay content in river sand, autoclave expansion or contraction, and steel expansion or contraction, and wish to estimate the effect of each factor and their interactions. For this purpose the three-factor analysis of variance model is used.
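The corresponding analysis of variance model is given by

$$Y_{ijkl} = \mu + \tau_i + \beta_j + \gamma_k + (\tau\beta)_{ij} + (\tau\gamma)_{ik} + (\beta\gamma)_{jk} + (\tau\beta\gamma)_{ijk} + \epsilon_{ijkl}$$

$$i = 1, 2, \ldots, a; \quad j = 1, 2, \ldots, b; \quad k = 1, 2, \ldots, c; \quad l = 1, 2, \ldots, n$$

where $\mu$ is a parameter common to all treatments, called the overall mean;
$\tau_i$ is the $i$th treatment effect of factor A;
$\beta_j$ is the $j$th treatment effect of factor B;
$\gamma_k$ is the $k$th treatment effect of factor C;
$(\tau\beta)_{ij}$ is the effect of the interaction between $\tau_i$ and $\beta_j$;
$(\tau\gamma)_{ik}$ is the effect of the interaction between $\tau_i$ and $\gamma_k$;
$(\beta\gamma)_{jk}$ is the effect of the interaction between $\beta_j$ and $\gamma_k$;
$(\tau\beta\gamma)_{ijk}$ is the effect of the interaction between $\tau_i$, $\beta_j$ and $\gamma_k$;
$\epsilon_{ijkl}$ are i.i.d. random variables distributed as $N(0, \sigma^2)$.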
Assuming that the factors A, B and C are fixed, the analysis of variance table is shown below. The F tests on the main effects and interactions follow directly from the expected mean squares. The computing formulas for the sums of squares in the ANOVA table are as follows.
The total sum of squares is

$$SS_T = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}\sum_{l=1}^{n} Y_{ijkl}^2 - \frac{Y_{....}^2}{abcn}$$
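The sums of squares for the main effects are found from the totals for the factors as follows:

$$SS_A = \sum_{i=1}^{a} \frac{Y_{i...}^2}{bcn} - \frac{Y_{....}^2}{abcn}, \qquad SS_B = \sum_{j=1}^{b} \frac{Y_{.j..}^2}{acn} - \frac{Y_{....}^2}{abcn}, \qquad SS_C = \sum_{k=1}^{c} \frac{Y_{..k.}^2}{abn} - \frac{Y_{....}^2}{abcn}$$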
To compute the two-factor interaction sums of squares, the totals for the A×B, A×C and B×C cells are needed. It is helpful to collapse the original data table into three two-way tables in order to compute these quantities. The sums of squares are found from

$$SS_{AB} = \sum_{i=1}^{a}\sum_{j=1}^{b} \frac{Y_{ij..}^2}{cn} - \frac{Y_{....}^2}{abcn} - SS_A - SS_B$$

$$SS_{AC} = \sum_{i=1}^{a}\sum_{k=1}^{c} \frac{Y_{i.k.}^2}{bn} - \frac{Y_{....}^2}{abcn} - SS_A - SS_C$$

$$SS_{BC} = \sum_{j=1}^{b}\sum_{k=1}^{c} \frac{Y_{.jk.}^2}{an} - \frac{Y_{....}^2}{abcn} - SS_B - SS_C$$
The three-factor interaction sum of squares is computed from the three-way cell totals $\{Y_{ijk.}\}$ as

$$SS_{ABC} = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c} \frac{Y_{ijk.}^2}{n} - \frac{Y_{....}^2}{abcn} - SS_A - SS_B - SS_C - SS_{AB} - SS_{AC} - SS_{BC}$$

The error sum of squares may be found by subtracting the sum of squares for each main effect and interaction from the total sum of squares, that is,

$$SS_E = SS_T - \left[\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c} \frac{Y_{ijk.}^2}{n} - \frac{Y_{....}^2}{abcn}\right]$$
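The Analysis of Variance Table for the Three-Factor Fixed Effects Model
Table 2.1
Source of variation | Sum of squares | Degrees of freedom | Mean square | Expected mean square | $F_0$
A | $SS_A$ | $a-1$ | $MS_A$ | $\sigma^2 + \frac{bcn\sum_i \tau_i^2}{a-1}$ | $MS_A/MS_E$
B | $SS_B$ | $b-1$ | $MS_B$ | $\sigma^2 + \frac{acn\sum_j \beta_j^2}{b-1}$ | $MS_B/MS_E$
C | $SS_C$ | $c-1$ | $MS_C$ | $\sigma^2 + \frac{abn\sum_k \gamma_k^2}{c-1}$ | $MS_C/MS_E$
AB | $SS_{AB}$ | $(a-1)(b-1)$ | $MS_{AB}$ | $\sigma^2 + \frac{cn\sum_i\sum_j (\tau\beta)_{ij}^2}{(a-1)(b-1)}$ | $MS_{AB}/MS_E$
AC | $SS_{AC}$ | $(a-1)(c-1)$ | $MS_{AC}$ | $\sigma^2 + \frac{bn\sum_i\sum_k (\tau\gamma)_{ik}^2}{(a-1)(c-1)}$ | $MS_{AC}/MS_E$
BC | $SS_{BC}$ | $(b-1)(c-1)$ | $MS_{BC}$ | $\sigma^2 + \frac{an\sum_j\sum_k (\beta\gamma)_{jk}^2}{(b-1)(c-1)}$ | $MS_{BC}/MS_E$
ABC | $SS_{ABC}$ | $(a-1)(b-1)(c-1)$ | $MS_{ABC}$ | $\sigma^2 + \frac{n\sum_i\sum_j\sum_k (\tau\beta\gamma)_{ijk}^2}{(a-1)(b-1)(c-1)}$ | $MS_{ABC}/MS_E$
Error | $SS_E$ | $abc(n-1)$ | $MS_E$ | $\sigma^2$ |
Total | $SS_T$ | $abcn-1$ | | |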
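Table 2.2 (two observations per cell; cell totals in parentheses)
Clay (A) | Autoclave (B) less than 0.8, Steel (C) 2-4 | Autoclave less than 0.8, Steel 4-6 | Autoclave greater than 0.8, Steel 2-4 | Autoclave greater than 0.8, Steel 4-6 | Total
4-8 | 23, 11 (34) | 7, 5 (12) | 7, 3.5 (10.5) | 15, 8 (23) | 79.5
8-12 | 10, 3 (13) | 37, 19 (56) | 31, 15.5 (46.5) | 8, 4 (12) | 127.5
BC totals | 47 | 68 | 57 | 35 |
B totals: less than 0.8: 115; greater than 0.8: 92

AB Totals
A \ B | less than 0.8 | greater than 0.8
4-8 | 46 | 33.5
8-12 | 69 | 58.5

AC Totals
A \ C | 2-4 | 4-6
4-8 | 44.5 | 35
8-12 | 59.5 | 68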
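$SS_A = 144$, $SS_B = 33.0625$, $SS_C = 0.0625$, $SS_{AB} = 0.25$, $SS_{AC} = 20.25$, $SS_{BC} = 115.5625$, $SS_{ABC} = 784$, $SS_{Total} = 1516.4375$.
From the cell totals, $\sum\sum\sum Y_{ijk.}^2/n - Y_{....}^2/(abcn) = 3775.25 - 2678.0625 = 1097.1875$, so

$$SS_{Error} = SS_{Total} - SS_{Subtotal} = 1516.4375 - 1097.1875 = 419.25$$

ANOVA TABLE
Table 2.3
Source | S.S | D.f | M.S.S | F
A | 144 | 1 | 144 | 2.748
B | 33.0625 | 1 | 33.0625 | 0.631
C | 0.0625 | 1 | 0.0625 | 0.001
AB | 0.25 | 1 | 0.25 | 0.005
AC | 20.25 | 1 | 20.25 | 0.386
BC | 115.5625 | 1 | 115.5625 | 2.205
ABC | 784 | 1 | 784 | 14.960
Error | 419.25 | 8 | 52.4063 |
Total | 1516.4375 | 15 | |
The tabled value is $F_{(1,8)}(0.05) = 5.32$.
Conclusion.
Since the calculated F values for clay content (A), autoclave expansion or contraction (B) and steel expansion or contraction (C) are less than the tabled value 5.32, the main effects of these factors on the quality of the well construction are not significant, and none of the two-factor interactions is significant either. The three-factor interaction ABC, however, has F = 14.96 > 5.32 and is significant, so the effect of each factor depends on the levels of the other two factors.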
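The sums of squares can be reproduced directly from the observations in Table 2.2. The sketch below assumes that each (A, B, C) cell of Table 2.2 contains two replicate observations whose sum is the cell total shown, and applies the totals-based computing formulas of this section; the variable and function names are illustrative only.

```python
# Three-factor fixed-effects ANOVA sums of squares from cell totals.
# Assumption: each (A, B, C) cell of Table 2.2 holds n = 2 replicates.
# Level 0 = first listed level: A (clay) 4-8, B (autoclave) < 0.8, C (steel) 2-4.
cells = {
    (0, 0, 0): [23, 11], (0, 0, 1): [7, 5], (0, 1, 0): [7, 3.5], (0, 1, 1): [15, 8],
    (1, 0, 0): [10, 3], (1, 0, 1): [37, 19], (1, 1, 0): [31, 15.5], (1, 1, 1): [8, 4],
}
A_LEVELS = B_LEVELS = C_LEVELS = 2
N_REP = 2
N_TOTAL = A_LEVELS * B_LEVELS * C_LEVELS * N_REP

grand = sum(sum(obs) for obs in cells.values())
cf = grand ** 2 / N_TOTAL                     # correction term Y....^2 / (abcn)


def ss_from_totals(keep):
    """Sum of (marginal total)^2 / (obs per total) over factors in `keep`, minus cf."""
    totals = {}
    for idx, obs in cells.items():
        key = tuple(idx[f] for f in keep)
        totals[key] = totals.get(key, 0.0) + sum(obs)
    per_total = N_TOTAL // len(totals)        # bcn, cn, n, ... as appropriate
    return sum(t ** 2 for t in totals.values()) / per_total - cf


ss = {}
ss["A"], ss["B"], ss["C"] = (ss_from_totals([f]) for f in range(3))
ss["AB"] = ss_from_totals([0, 1]) - ss["A"] - ss["B"]
ss["AC"] = ss_from_totals([0, 2]) - ss["A"] - ss["C"]
ss["BC"] = ss_from_totals([1, 2]) - ss["B"] - ss["C"]
subtotal = ss_from_totals([0, 1, 2])          # subtotal SS from the cell totals
ss["ABC"] = subtotal - sum(ss[k] for k in ("A", "B", "C", "AB", "AC", "BC"))
ss["T"] = sum(y ** 2 for obs in cells.values() for y in obs) - cf
ss["E"] = ss["T"] - subtotal

ms_e = ss["E"] / (A_LEVELS * B_LEVELS * C_LEVELS * (N_REP - 1))
# Every effect has 1 d.f. here because each factor has two levels.
f_values = {k: ss[k] / ms_e for k in ("A", "B", "C", "AB", "AC", "BC", "ABC")}
```

With these data the sketch gives SS_A = 144, SS_B = 33.0625, SS_C = 0.0625, SS_AB = 0.25, SS_AC = 20.25, SS_BC = 115.5625, SS_ABC = 784, SS_E = 419.25 and SS_T = 1516.4375; the error sum of squares is simply the pooled within-cell variation of the replicates.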
Testing the equality of water acidity content before and after the construction of the well
In this section we study whether there is any significant difference in the water acidity content before and after the construction of the well. The test used for this purpose is the Mann-Whitney test.
Let $F_1(x)$ and $F_2(x)$ be the distribution functions of the two populations. We have to test $H_0: F_1(x) = F_2(x)$ vs $H_1: F_1(x) \ne F_2(x)$. For this we take two independent random samples of sizes $n_1$ and $n_2$, say $(x_1, x_2, \ldots, x_{n_1})$ and $(y_1, y_2, \ldots, y_{n_2})$. We arrange the combined observations in ascending order of magnitude and assign ranks, so that the sum of all ranks is $(n_1 + n_2)(n_1 + n_2 + 1)/2$. Let
$R_1$ = sum of ranks assigned to the observations in the first sample,
$R_2$ = sum of ranks assigned to the observations in the second sample,
$U_{12}$ = total number of times a $y$ observation precedes an $x$ observation,
$U_{21}$ = total number of times an $x$ observation precedes a $y$ observation,
with ties counted as one half. The Mann-Whitney U statistic is then defined as $U = \min(U_{12}, U_{21})$. The U statistics are related to the Wilcoxon rank sums by

$$U_{12} = R_1 - \frac{n_1(n_1+1)}{2}, \qquad U_{21} = R_2 - \frac{n_2(n_2+1)}{2}.$$

The null hypothesis $H_0: M_1 = M_2$ vs $H_1: M_1 \ne M_2$ is rejected if $U < U_\alpha$.
The table given below gives the observations of the two samples and their ranks in the combined sample.
Table 2.4
x 74 70 73 79 74 68 70 79 72 74 70 72 71 68 76 71 70 73
Rank 15.5 4.5 12 21 15.5 1.5 4.5 21 9.5 15.5 4.5 9.5 7.5 1.5 18 7.5 4.5 12
Y 82 80 81 87 85 77 79 85 80 89 86 81 83 74 81 81 73 80
Rank 30 24 27.5 35 32.5 19 21 32.5 24 36 34 27.5 31 15.5 27.5 27.5 12 24
Here $R_2 = 480.5$ and $n_2 = 18$, so

$$U_{21} = R_2 - \frac{n_2(n_2+1)}{2} = 480.5 - 171 = 309.5.$$

Similarly $R_1 = 185.5$ and $n_1 = 18$, so

$$U_{12} = R_1 - \frac{n_1(n_1+1)}{2} = 185.5 - 171 = 14.5.$$

Hence $U = \min(U_{12}, U_{21}) = 14.5$. At the 5% level the critical value is $U_\alpha = 99$. Since $U < U_\alpha$, we reject $H_0$.
Conclusion.
Thus we may conclude that the water acidity contents before and after the construction of the well are significantly different.
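The ranks, rank sums and U statistics for Table 2.4 can be recomputed with a short sketch. Tied values get average (mid) ranks, as in the table; the function names `average_ranks` and `mann_whitney` are hypothetical.

```python
# Minimal sketch: combined-sample ranks and Mann-Whitney U for Table 2.4.
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks 1..N in ascending order; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1   # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def mann_whitney(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float, float, float]:
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    r1, r2 = sum(ranks[:n1]), sum(ranks[n1:])
    u12 = r1 - n1 * (n1 + 1) / 2
    u21 = r2 - n2 * (n2 + 1) / 2
    return r1, r2, u12, u21, min(u12, u21)


# Observations of Table 2.4
x = [74, 70, 73, 79, 74, 68, 70, 79, 72, 74, 70, 72, 71, 68, 76, 71, 70, 73]
y = [82, 80, 81, 87, 85, 77, 79, 85, 80, 89, 86, 81, 83, 74, 81, 81, 73, 80]
r1, r2, u12, u21, u = mann_whitney(x, y)
```

It returns $R_1 = 185.5$, $R_2 = 480.5$, $U_{12} = 14.5$, $U_{21} = 309.5$ and $U = 14.5$; the identity $U_{12} + U_{21} = n_1 n_2 = 324$ is a useful arithmetic check.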
Chapter 3
Probability Models
3.1 Introduction
This chapter examines the distributions of different characteristics related to the KRWSA projects. Since it is virtually never possible to collect all the data, we take a sample from the population. Using the sample observations, we fit probability models for characteristics such as idle days and the age of employees.
3.2 Distribution of Idle days
We assume that the number of idle days, X (days), is exponentially distributed. To test the validity of this assumption, the idle days of 18 well construction projects are taken. The data collected are presented in the following table.
Table 3.1
Class 0-2 2-4 4-6 6-8 8-10 10-12 12-14 14-16 16-18 18-20
Freq 5 2 3 0 3 0 1 1 2 1
The mean of the distribution is estimated as $\hat{\theta} = \bar{x} = 8$. Therefore, the assumed probability density function is

$$f(x) = \frac{1}{8} e^{-x/8}, \qquad x > 0.$$
The Kolmogorov-Smirnov Goodness of Fit Test
This is a statistical technique for testing the goodness of fit of a set of sample observations to an assumed theoretical distribution. The procedure is named after the Russian mathematicians A. N. Kolmogorov and N. V. Smirnov, who were primarily responsible for its development. We have to test $H_0: X \sim F_0(x)$ vs $H_1: X \not\sim F_0(x)$. The test statistic is

$$D = \sup_x |F_0(x) - S_N(x)|$$

where $F_0(x)$ is the distribution function specified by the null hypothesis and $S_N(x)$ is the empirical cumulative distribution function. If $D$ is large, it is reasonable to reject the null hypothesis. Here

$$H_0: X \sim F_0(x) = 1 - e^{-0.125x}, \qquad x > 0.$$
Table 3.2
X | $F_0(X)$ | $S_N(X)$ | $|F_0(X) - S_N(X)|$
2 | 0.221199 | 0.277777 | 0.056578
4 | 0.393469 | 0.388888 | 0.004581
6 | 0.527633 | 0.555555 | 0.027922
8 | 0.632121 | 0.555555 | 0.076566
10 | 0.713495 | 0.722222 | 0.008727
12 | 0.776869 | 0.722222 | 0.054648
14 | 0.826226 | 0.777777 | 0.048448
16 | 0.864665 | 0.833333 | 0.031331
18 | 0.894601 | 0.944444 | 0.049844
20 | 0.917915 | 1.0 | 0.082085
$$D_n = \max |F_0(X) - S_N(X)| = 0.082085$$

For $n = 18$, $D_{n,\alpha} = 0.279$. Since $D_n < D_{n,\alpha}$, we accept the hypothesis that the fitted distribution is a good fit.
Conclusion
The distribution of idle days is exponential with mean 8. Thus if X (days) denotes the idle days for a chosen project, the probability density function of X is

$$f(x) = \frac{1}{8} e^{-x/8}, \qquad x > 0.$$
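The entries of Table 3.2 follow from the exponential distribution function $F_0(x) = 1 - e^{-x/8}$ and the cumulative class frequencies of Table 3.1, evaluated at the class upper limits. A minimal sketch, taking the estimated mean 8 from the text:

```python
# Minimal sketch: Kolmogorov-Smirnov comparison for the exponential fit.
import math

upper = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]   # class upper limits (days), Table 3.1
freq = [5, 2, 3, 0, 3, 0, 1, 1, 2, 1]           # observed frequencies
mean = 8.0                                       # estimated mean from the text
n = sum(freq)

rows = []
cum = 0
for x, f in zip(upper, freq):
    cum += f
    f0 = 1 - math.exp(-x / mean)   # F0(x) = 1 - e^{-x/8}
    sn = cum / n                   # empirical CDF S_N(x) at the class upper limit
    rows.append((x, f0, sn, abs(f0 - sn)))

d_max = max(r[3] for r in rows)    # D_n = max |F0(x) - S_N(x)|
```

This gives $D_n \approx 0.082085$, attained at x = 20, with $F_0(8) \approx 0.632121$. Evaluating only at the class limits is the usual compromise for grouped data, since the individual observations within a class are not available.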
3.3 Distribution of Age
We assume that the age of employees working in the well construction project is normally distributed. To check the validity of this assumption, the ages of 724 employees are taken. The data collected are presented in the following table.
Table 3.3
Age 15-20 20-25 25-30 30-35 35-40 40-45 45-50
Freq 5 48 152 240 198 70 11
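The estimated mean ($\mu$) and standard deviation ($\sigma$) are

$$\hat{\mu} = 28.9917, \qquad \hat{\sigma} = 5.974102$$

Therefore the assumed probability density function is

$$f(x) = \frac{1}{5.974102\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left(\frac{x - 28.9917}{5.974102}\right)^2\right\}, \qquad -\infty < x < \infty.$$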
Chi-square goodness of fit test
The test statistic is

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

where $O_i$ is the observed frequency and $E_i$ is the expected frequency of the $i$th class. If the calculated $\chi^2$ is greater than the tabled value $\chi^2_\alpha$, we reject the hypothesis.
To calculate the expected frequencies we first find the standard normal variate corresponding to the lower limit of each class interval.
Table 3.4
Age | Observed frequency | Lower limit (x) | $Z = (x - \mu)/\sigma$ | $\Phi(Z)$ | N(Z) | Expected frequency
- | - | - | - | 0 | 0 | 0
15-20 | 5 | 15 | -2.34206 | 0.0096 | 6.9504 | 7
20-25 | 48 | 20 | -1.50511 | 0.0655 | 40.4716 | 40
25-30 | 152 | 25 | -0.66817 | 0.2514 | 134.5916 | 135
30-35 | 240 | 30 | 0.168779 | 0.5675 | 228.8564 | 229
35-40 | 198 | 35 | 1.005724 | 0.8438 | 202.72 | 203
40-45 | 70 | 40 | 1.84267 | 0.9671 | 89.2692 | 89
45-50 | 11 | 45 | 2.679616 | 0.9963 | 21.408 | 21
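All expected frequencies are at least 5, so no classes need to be pooled; there are 7 classes, and two parameters ($\mu$ and $\sigma$) have been estimated. Hence d.f. $= 7 - 2 - 1 = 4$, and

$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} = 13.78179, \qquad \chi^2_4(0.05) = 9.488.$$

Since $\chi^2 > \chi^2_4(0.05)$, we reject the hypothesis that the age of employees working in the well construction project is normally distributed with mean 28.9917 and standard deviation 5.974102.
Conclusion.
At the 5% level of significance, the normal distribution with mean 28.9917 and standard deviation 5.974102, whose probability density function is

$$f(x) = \frac{1}{5.974102\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left(\frac{x - 28.9917}{5.974102}\right)^2\right\}, \qquad -\infty < x < \infty,$$

does not provide an adequate fit to the age of the employees working in the well construction project.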
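The chi-square statistic can be recomputed from the observed and rounded expected frequencies of Table 3.4. This sketch only evaluates the statistic and its degrees of freedom; it takes the expected frequencies as given rather than recomputing them from the normal distribution.

```python
# Minimal sketch: chi-square goodness-of-fit statistic from Table 3.4.
observed = [5, 48, 152, 240, 198, 70, 11]
expected = [7, 40, 135, 229, 203, 89, 21]   # rounded expected frequencies, Table 3.4

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 2 - 1                   # classes - estimated parameters - 1
```

It gives $\chi^2 \approx 13.7818$ with 4 degrees of freedom; the largest contributions come from the two upper classes (40-45 and 45-50).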
Chapter 4
Discriminant And Profile Analysis Of The Data
4.1 Introduction
In this chapter we try to find a linear discriminant function based on observations on three characteristics from two populations. The first population $\pi_1$ consists of well construction projects costing less than Rs.916000/- and the second population $\pi_2$ of projects costing more than Rs.916000/-. The three characteristics considered are the extra cost due to the change in price of raw material ($X_1$), the cost of sinking ($X_2$) and idle days ($X_3$). Based on these observations we want to find a function which assigns an observation to one of the two populations. We also construct profiles of these measurements and test the hypotheses of parallelism and of equal average level.
4.2 Discriminant Analysis
The problem of classification arises when an investigator makes a number of observations on an individual and wishes to classify the individual into one of several populations on the basis of these measurements. In many cases it can be assumed that there are only a finite number of categories or populations from which an individual may have originated, and that each population is characterized by a probability distribution of the measurements. The observation on the individual is then considered as a random observation from that population.
In the present problem there are only two categories. Prof. R. A. Fisher gave the first clear statement and solution of the discriminant analysis problem; he introduced the discriminant function for distinguishing between two multivariate normal populations with a common dispersion matrix.
Here the two populations are taken such that the first population $\pi_1$ has cost less than Rs.916000/- and the second population $\pi_2$ has cost greater than Rs.916000/-. Let $X = (X_1, X_2, \ldots, X_p)'$ be a vector of observations on an individual. Here the vector is $X = (X_1, X_2, X_3)'$, where $X_1$ is the extra cost due to the change in price of raw material, $X_2$ the cost of sinking and $X_3$ the idle days. Suppose that there are only two populations $\pi_1$ and $\pi_2$ with probability laws $N_p(\mu^{(1)}, \Sigma)$ and $N_p(\mu^{(2)}, \Sigma)$, where $\mu^{(1)}$ and $\mu^{(2)}$ are the mean vectors of $\pi_1$ and $\pi_2$ respectively and $\Sigma$ is the common unknown dispersion matrix. Even though an exponential distribution was fitted for idle days, we assume that for sufficiently large $n$ the distribution of idle days can be approximated by a normal distribution. Thus $\pi_1$ and $\pi_2$ are taken to be trivariate normal distributions with mean vectors $\mu^{(1)}$ and $\mu^{(2)}$ respectively. The problem can now be summarized as testing
$H_0$: the given individual with variate value $X$ belongs to $\pi_1$
vs
$H_1$: the given individual with variate value $X$ belongs to $\pi_2$.
For discriminating an observation between these two populations we consider a linear function of the form $L = b'X$, called the discriminant function. In choosing this function, the constants $b_1, b_2, \ldots, b_p$ are determined so that the chance of making a wrong decision is minimum; that is, $b'X$ is chosen so that $L$ is as efficient as possible in discriminating between the two populations.
Based on the Neyman-Pearson construction of a most powerful critical region, Wald (1944) developed a method for constructing the linear discriminant function. Using it, we classify an observation $X$ into $\pi_1$ if $b'X > c$ and into $\pi_2$ if $b'X \le c$, where

$$c = \tfrac{1}{2} b'[\mu^{(1)} + \mu^{(2)}] \qquad \text{and} \qquad b = \Sigma^{-1} d = \Sigma^{-1}[\mu^{(1)} - \mu^{(2)}].$$

Since $\mu^{(1)}$, $\mu^{(2)}$ and $\Sigma$ are unknown, we use their estimates $\bar{x}^{(1)}$, $\bar{x}^{(2)}$ and $S$:

$$b = S^{-1} d = S^{-1}(\bar{x}^{(1)} - \bar{x}^{(2)}).$$
The sample dispersion matrix $S$ is given by

$$S = \begin{pmatrix} 23759.38 & 5429.688 & 267.6563 \\ 5429.688 & 17699.22 & 123.8281 \\ 267.6563 & 123.8281 & 15.77344 \end{pmatrix}$$

$$\bar{x}^{(1)} = (1342.5,\ 2756.25,\ 1.375)', \qquad \bar{x}^{(2)} = (2180,\ 2980,\ 11.5)'$$

$$d = \bar{x}^{(1)} - \bar{x}^{(2)} = (-837.5,\ -223.75,\ -10.125)'$$

$$b = S^{-1} d = (-0.03433,\ -0.00179,\ -0.0453)'$$

$$c = \tfrac{1}{2} b'[\bar{x}^{(1)} + \bar{x}^{(2)}] = -65.8978$$
The discriminant function is given by

$$L = b'X = -0.03433X_1 - 0.00179X_2 - 0.0453X_3$$

Conclusion.
An observation $x$ belongs to the first population $\pi_1$ if $L = b'x > c$, i.e. if

$$-0.03433X_1 - 0.00179X_2 - 0.0453X_3 > -65.8978$$

where
$X_1$: Extra cost of raw material
$X_2$: Cost of sinking
$X_3$: Idle days
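The sample discriminant coefficients, the cut-off $c$ and Mahalanobis' $D^2$ can be reproduced from $S$ and the two mean vectors. The sketch below solves $S b = d$ by Gaussian elimination instead of inverting $S$ explicitly; `solve` and `classify` are hypothetical helper names.

```python
# Minimal sketch: sample linear discriminant function b = S^{-1} d and cut-off c.
from typing import List, Sequence


def solve(matrix: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


S = [[23759.38, 5429.688, 267.6563],
     [5429.688, 17699.22, 123.8281],
     [267.6563, 123.8281, 15.77344]]
XBAR1 = [1342.5, 2756.25, 1.375]   # projects costing less than Rs.916000/-
XBAR2 = [2180.0, 2980.0, 11.5]     # projects costing more than Rs.916000/-

d = [u - v for u, v in zip(XBAR1, XBAR2)]
b = solve(S, d)
c = 0.5 * sum(bi * (u + v) for bi, u, v in zip(b, XBAR1, XBAR2))
d2 = sum(bi * di for bi, di in zip(b, d))    # Mahalanobis D^2 = b'd


def classify(x: Sequence[float]) -> int:
    """Return 1 (pi_1) if b'x > c, otherwise 2 (pi_2)."""
    return 1 if sum(bi * xi for bi, xi in zip(b, x)) > c else 2
```

It gives $b \approx (-0.03433, -0.00179, -0.0453)'$, $c \approx -65.898$ and $D^2 \approx 29.61$, and each sample mean vector is assigned to its own population, as it must be since $b'\bar{x}^{(1)} - c = D^2/2 > 0$.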
Test for equality of mean vectors
Now we test whether the mean vectors of the two populations are significantly different. Here we want to test
$H_0$: the mean vectors of the two populations $\pi_1$ and $\pi_2$ are equal
vs
$H_1$: the mean vectors of the two populations $\pi_1$ and $\pi_2$ are not equal.
The test statistic used here is

$$F = \frac{N_1 N_2}{N_1 + N_2} \cdot \frac{N_1 + N_2 - p - 1}{p(N_1 + N_2 - 2)}\, D^2 \sim F(p,\ N_1 + N_2 - p - 1)$$

where $D^2 = b'd = 29.61055$ is Mahalanobis' $D^2$, $N_1 = 8$, $N_2 = 10$ and $p = 3$. This gives $F = 38.384046$.
At the 5% level of significance the tabled value is $F_{(3,14)}(0.05) = 3.34$. The critical region is $F > F_\alpha$. Here $F > F_\alpha$, and hence we reject the null hypothesis $H_0: \mu^{(1)} = \mu^{(2)}$.
Conclusion.
Hence we conclude that the mean vectors are significantly different.
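The F statistic for the equality of mean vectors is a simple function of Mahalanobis' $D^2$. A one-function sketch (the name `mahalanobis_f` is hypothetical), using the $D^2$ value reported above:

```python
# Minimal sketch: two-sample F statistic from Mahalanobis' D^2.
def mahalanobis_f(d2: float, n1: int, n2: int, p: int) -> float:
    """F = [n1 n2/(n1+n2)] (n1+n2-p-1) / [p (n1+n2-2)] * D^2 ~ F(p, n1+n2-p-1)."""
    return n1 * n2 / (n1 + n2) * (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * d2


f_stat = mahalanobis_f(29.61055, 8, 10, 3)
```

This gives $F \approx 38.384$ on $(3, 14)$ degrees of freedom. The factor $N_1 N_2/(N_1 + N_2)$ converts $D^2$ into the two-sample Hotelling $T^2$.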
4.3 Profile Analysis
Profile analysis is a graphical tool for discriminating between two or more populations. Suppose that a test is conducted based on a sample of $n$ observations from a given population with mean vector $\mu^{(1)} = (\mu^{(1)}_1, \mu^{(1)}_2, \ldots, \mu^{(1)}_p)'$. Then the graph obtained by joining the points $(1, \mu^{(1)}_1), (2, \mu^{(1)}_2), \ldots, (p, \mu^{(1)}_p)$ successively is called the profile of the population. If the population mean vector is unknown, it is estimated by the sample mean vector.
Now we consider two populations, $N_p(\mu^{(1)}, \Sigma)$ and $N_p(\mu^{(2)}, \Sigma)$, having the same dispersion matrix $\Sigma$. By plotting the profiles of the two populations, hypotheses regarding the mean vectors can be tested easily. Here we consider three characteristics, namely the extra cost due to the change in price of raw material ($X_1$), the cost of sinking ($X_2$) and idle days ($X_3$).
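Test concerning parallelism of profiles
We have to test whether the profiles of the two populations are parallel (similar):

$$H_0: \mu^{(1)}_{k+1} - \mu^{(1)}_k = \mu^{(2)}_{k+1} - \mu^{(2)}_k, \qquad k = 1, 2, \ldots, p-1,$$

that is, $H_0: C[\mu^{(1)} - \mu^{(2)}] = 0$, where $C$ is the $(p-1) \times p$ matrix

$$C = \begin{pmatrix} -1 & 1 & 0 & \cdots & 0 & 0 \\ 0 & -1 & 1 & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & -1 & 1 \end{pmatrix}_{(p-1) \times p}$$

The test statistic used here is

$$F = \frac{T^2}{N_1 + N_2 - 2} \cdot \frac{N_1 + N_2 - p}{p - 1} \sim F(p-1,\ N_1 + N_2 - p)$$

where

$$T^2 = \frac{N_1 N_2}{N_1 + N_2}(N_1 + N_2 - 2)(\bar{X} - \bar{Y})' A^{-1} (\bar{X} - \bar{Y}),$$

with $\bar{X} = C\bar{x}^{(1)}$, $\bar{Y} = C\bar{x}^{(2)}$ and

$$A = \begin{pmatrix} 380150 & 86875 \\ 86875 & 2831875 \end{pmatrix}$$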
$$T^2 = 285.9293 \qquad \text{and} \qquad F = 134.0293$$

The critical region is $F > F_\alpha$. From tables of the F distribution, for $\alpha = 0.05$, $F_{(2,15)}(0.05) = 3.68$. Since $F > F_\alpha$, the decision is to reject $H_0$.
Conclusion.
The profiles are not parallel.
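The contrast matrix $C$ and the conversion from $T^2$ to $F$ for the parallelism test can be sketched as follows. The helper names are hypothetical, and $T^2 = 285.9293$ is taken from the text rather than recomputed.

```python
# Minimal sketch: successive-difference contrasts and the parallelism F statistic.
from typing import List, Sequence


def contrast_matrix(p: int) -> List[List[int]]:
    """(p-1) x p matrix whose k-th row picks out mu_{k+1} - mu_k."""
    return [[-1 if j == i else 1 if j == i + 1 else 0 for j in range(p)]
            for i in range(p - 1)]


def apply(cmat: List[List[int]], v: Sequence[float]) -> List[float]:
    """Matrix-vector product C v."""
    return [sum(cij * vj for cij, vj in zip(row, v)) for row in cmat]


def parallelism_f(t2: float, n1: int, n2: int, p: int) -> float:
    """F = T^2 (n1+n2-p) / [(n1+n2-2)(p-1)] ~ F(p-1, n1+n2-p)."""
    return t2 * (n1 + n2 - p) / ((n1 + n2 - 2) * (p - 1))


C = contrast_matrix(3)
xbar_c = apply(C, [1342.5, 2756.25, 1.375])   # C xbar^(1)
ybar_c = apply(C, [2180.0, 2980.0, 11.5])     # C xbar^(2)
f_par = parallelism_f(285.9293, 8, 10, 3)
```

Here $C\bar{x}^{(1)} = (1413.75, -2754.875)'$ and $C\bar{x}^{(2)} = (800, -2968.5)'$ are the successive differences of the two sample profiles, and the F value is about 134.03 on $(2, 15)$ degrees of freedom.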
Test concerning average level of profiles
We have to test whether the profiles are at the same average level:

$$H_0: \frac{\mu^{(1)}_1 + \mu^{(1)}_2 + \cdots + \mu^{(1)}_p}{p} = \frac{\mu^{(2)}_1 + \mu^{(2)}_2 + \cdots + \mu^{(2)}_p}{p},$$

that is, $H_0: L'\mu^{(1)} = L'\mu^{(2)}$, where $L = (1, 1, \ldots, 1)'$. Here $L'\mu^{(1)}$ and $L'\mu^{(2)}$ are the means of the two univariate normal populations obtained by the transformations $X = L'X^{(1)}$ and $Y = L'X^{(2)}$, and the variance of each population is $L'\Sigma L$.
The test statistic used here is

$$T^2 = \frac{N_1 N_2}{N_1 + N_2} \cdot \frac{(N_1 + N_2 - 2)(\bar{X} - \bar{Y})^2}{L'AL} \sim F(1,\ N_1 + N_2 - 2)$$

Now $\bar{X} = 4100.125$, $\bar{Y} = 5171.5$ and $L'AL = 849867.375$, so

$$T^2 = \frac{8 \times 10}{18} \cdot \frac{16 \times (4100.125 - 5171.5)^2}{849867.375} = 96.04.$$

The critical region is given by $T^2 > F_\alpha$. From tables of the F distribution, for $\alpha = 0.05$, $F_{(1,16)}(0.05) = 4.49$. Since $T^2 > F_\alpha$, the test leads to rejection of $H_0$.
Conclusion.
The population profiles are not at the same average level.
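The average-level statistic needs only the totals $L'\bar{x}^{(1)}$ and $L'\bar{x}^{(2)}$ and the scalar $L'AL$. A minimal sketch using the values reported in this section (the function name `level_t2` is hypothetical):

```python
# Minimal sketch: T^2 for equal average level of two profiles.
from typing import Sequence


def level_t2(xbar1: Sequence[float], xbar2: Sequence[float], lal: float,
             n1: int, n2: int) -> float:
    """T^2 = [n1 n2/(n1+n2)] (n1+n2-2) (L'xbar1 - L'xbar2)^2 / (L'AL), L = (1,...,1)'."""
    diff = sum(xbar1) - sum(xbar2)      # L'xbar1 - L'xbar2
    return n1 * n2 / (n1 + n2) * (n1 + n2 - 2) * diff ** 2 / lal


t2 = level_t2([1342.5, 2756.25, 1.375], [2180.0, 2980.0, 11.5], 849867.375, 8, 10)
```

This gives $L'\bar{x}^{(1)} = 4100.125$, $L'\bar{x}^{(2)} = 5171.5$ and $T^2 \approx 96.04$, well above $F_{(1,16)}(0.05) = 4.49$.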
Chapter 5
Correlation And Regression Analysis
5.1 Introduction
This chapter is concerned with the study of the relationship between some characteristics related to the quality of the well. The characteristics considered are the diameter of the well, the width of the protection wall, the depth of the well and the weight of the mixture. We find the multiple correlation between the weight of the mixture ($Y$) and the other three characteristics, namely the diameter of the well ($X_1$), the width of the protection wall ($X_2$) and the depth of the well ($X_3$). We also establish a linear relationship between $Y$ and $X_1$, $X_2$ and $X_3$ using regression analysis.
Correlation is a measure of the degree of relationship between two or more variables, and correlation analysis is one of the most widely used techniques in applied statistics.
5.2 Multiple Correlation
The concept of multiple correlation is very important in statistical analysis. Whenever we are interested in studying the joint effect of a group of variables upon a variate not included in that group, we adopt the technique of multiple correlation.
Here we calculate the multiple correlation coefficient of $Y$ on $X_1$, $X_2$ and $X_3$, given by

$$R_{Y(X_1X_2X_3)} = \sqrt{1 - \frac{|C|}{|C_{44}|}}$$

where $|C_{44}|$ is the cofactor of the (4, 4) element of the correlation matrix $C$. The correlation matrix is

$$C = \begin{pmatrix} 1 & 0.0467835 & 0.26393 & 0.7327363 \\ 0.0467835 & 1 & 0.0191825 & 0.2976026 \\ 0.26393 & 0.0191825 & 1 & 0.197750522 \\ 0.7327363 & 0.2976026 & 0.197750522 & 1 \end{pmatrix}$$

with $|C| = 0.350339$ and $|C_{44}| = 0.927311$, so

$$R = R_{Y(X_1X_2X_3)} = 0.788796$$
Test for Significance of Multiple Correlation
Here we have to test $H_0: R = 0$ vs $H_1: R \ne 0$. The test statistic is

$$F = \frac{R^2 (N - k - 1)}{(1 - R^2)\,k} \sim F(k,\ N - k - 1).$$

The critical region is $F > F_\alpha$. Here $F = 7.685517$, and from tables of the F distribution $F_{(3,14)}(0.05) = 3.34$. Since $F > F_{0.05}$, we reject the null hypothesis.
Conclusion.
There is a significant relationship between the weight of mixture of the well and the other three characteristics, namely diameter of well, width of protection wall and depth of well.
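The multiple correlation coefficient and its F test follow directly from the two determinants. A short sketch with hypothetical function names:

```python
# Minimal sketch: multiple correlation from determinants and its F test.
import math


def multiple_r(det_c: float, cof_44: float) -> float:
    """R = sqrt(1 - |C| / |C44|)."""
    return math.sqrt(1 - det_c / cof_44)


def multiple_r_f(r: float, n: int, k: int) -> float:
    """F = R^2 (N - k - 1) / [(1 - R^2) k] ~ F(k, N - k - 1) under H0: R = 0."""
    return r ** 2 * (n - k - 1) / ((1 - r ** 2) * k)


r = multiple_r(0.350339, 0.927311)
f_stat = multiple_r_f(r, 18, 3)
```

With $|C| = 0.350339$ and $|C_{44}| = 0.927311$ it gives $R \approx 0.788796$ and $F \approx 7.6855$ on $(3, 14)$ degrees of freedom.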
5.3 Linear Relationship between Weight of mixture, Diameter of well, Width of protection wall and Depth of well
Since there is a significant relation between weight of mixture ($X_4$) and the other three characteristics, namely diameter of well ($X_1$), width of protection wall ($X_2$) and depth of well ($X_3$), it is meaningful to derive a linear relationship between them. To find this linear relationship we use regression analysis.
In statistics, the regression technique is applicable in all fields where two or more related variables tend to go back to the mean. Regression analysis refers to the methods by which the values of a variable are estimated from knowledge of one or more other variables, together with the measurement of the errors involved in this estimation process.
Here we obtain the multiple linear regression equation of $X_4$ on $X_1$, $X_2$ and $X_3$. The regression equation is given by
$$x_4 = b_{41.23}\, x_1 + b_{42.13}\, x_2 + b_{43.12}\, x_3$$

where $x_i = X_i - \bar{X}_i$ ($i = 1, 2, 3, 4$) and

$$b_{41.23} = \frac{\sigma_4}{\sigma_1}\cdot\frac{|C_{41}|}{|C_{44}|}, \qquad b_{42.13} = \frac{\sigma_4}{\sigma_2}\cdot\frac{|C_{42}|}{|C_{44}|}, \qquad b_{43.12} = \frac{\sigma_4}{\sigma_3}\cdot\frac{|C_{43}|}{|C_{44}|}.$$

Here
$\sigma_1 = 0.8089011$, $\sigma_2 = 0.01591839$, $\sigma_3 = 1$, $\sigma_4 = 457.8942$
$|C_{44}| = 0.92731058$, $|C_{41}| = 0.664667$, $|C_{42}| = 0.24511$, $|C_{43}| = 0.012652$
which give
$b_{41.23} = 405.7407434$, $b_{42.13} = 7603.511752$, $b_{43.12} = 6.247397$.
The linear regression equation of $X_4$ on $X_1$, $X_2$ and $X_3$ is therefore

$$x_4 = 405.7407434\, x_1 + 7603.511752\, x_2 + 6.247397\, x_3$$
Test for the signicance of Regression coecients
We have to test
H
0
: b
41.23
= 0 against H
1
: b
41.23
= 0
H
0
: b
42.23
= 0 against H
1
: b
42.13
= 0
H
0
: b
43.23
= 0 against H
1
: b
43.12
= 0
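The test statistic used is

$$T_i = \frac{b_{4i}}{S_i} \sim t_{(N-k-1)}$$

where $b_{4i}$ denotes the partial regression coefficient of $X_4$ on $X_i$ ($b_{41.23}$, $b_{42.13}$ or $b_{43.12}$) and

$$S_i^2 = \frac{\sigma_{4.123}^2}{\sigma_i^2\,(N-k-1)}$$

Here,

$$\sigma_{4.123} = \sigma_4\sqrt{\frac{|C|}{|C_{44}|}} = 287.2107$$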
$S_1^2 = 9004.971512$, $t_1 = 4.275681$
$S_2^2 = 23252800$, $t_2 = 1.576801$
$S_3^2 = 5892.146433$, $t_3 = 0.081388376$
From tables of the t distribution with 14 d.f., $t_{\alpha/2} = 2.145$ ($\alpha = 0.05$).

Hypothesis               Calculated |t| value   Table value $t_{\alpha/2}$   Decision
$H_0: b_{41.23} = 0$     4.2756809              2.145                        Reject $H_0$
$H_0: b_{42.13} = 0$     1.576801               2.145                        Accept $H_0$
$H_0: b_{43.12} = 0$     0.081388               2.145                        Accept $H_0$
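A short sketch of the coefficient tests, assuming the variance formula $S_i^2 = \sigma_{4.123}^2/[\sigma_i^2(N-k-1)]$ exactly as stated in this section, with $N = 18$ and $k = 3$ (so 14 d.f.). The helper name is hypothetical and the inputs are the rounded values printed above, so the $t$ values match to roughly four significant figures.

```python
import math


def coefficient_t_statistics(
    b: dict[int, float], sigma: dict[int, float], sigma_4_123: float, n: int, k: int
) -> dict[int, float]:
    """t_i = b_i / S_i with S_i^2 = sigma_{4.123}^2 / (sigma_i^2 * (n - k - 1))."""
    df = n - k - 1
    return {
        i: b_i / math.sqrt(sigma_4_123 ** 2 / (sigma[i] ** 2 * df))
        for i, b_i in b.items()
    }


b = {1: 405.7407434, 2: 7603.511752, 3: 6.247397}
sigma = {1: 0.8089011, 2: 0.01591839, 3: 1.0}
t = coefficient_t_statistics(b, sigma, sigma_4_123=287.2107, n=18, k=3)

t_crit = 2.145  # tabulated t_{alpha/2} with 14 d.f. (alpha = 0.05)
for i, t_i in t.items():
    decision = "reject H0" if abs(t_i) > t_crit else "accept H0"
    print(f"b_4{i}: |t| = {abs(t_i):.4f} -> {decision}")
```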
Conclusion.
Only the coefficient $b_{41.23}$ is significant. Retaining only this coefficient, the regression equation of Weight of mixture on Diameter of well, Width of protection wall and Depth of well is given by

$$x_4 = 405.7407434\,x_1 + 0\,x_2 + 0\,x_3$$
5.4 Canonical Correlation
Multiple correlation is used to measure the association between one variable and a set of other variables: it is the maximum correlation between one variable and a linear function of the other variables. This concept was generalised by Hotelling (1935) to study the association between two sets of variables, and the generalisation is known as canonical correlation.
Let $X = (X_1, \ldots, X_p)'$ be a p-variate vector partitioned into two vectors $X^{(1)} = (X_1, \ldots, X_r)'$ and $X^{(2)} = (X_{r+1}, \ldots, X_p)'$. The canonical correlation between $X^{(1)}$ and $X^{(2)}$ can be used to measure the correlation between these two sets of variables.
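Let $\Sigma$ be the dispersion matrix corresponding to the above p-variate vector. We partition $\Sigma$ as follows:

$$\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$$

Then the canonical correlation $\rho$ between $X^{(1)}$ and $X^{(2)}$ is obtained from the equation

$$\left|\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12} - \rho^2 I\right| = 0$$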
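Let $\rho_1^2 \ge \rho_2^2 \ge \cdots$ be the roots of the above equation. Then $\rho_1$ is known as the first canonical correlation, $\rho_2$ as the second canonical correlation, and so on, and the linear combinations $(L_1'X^{(1)}, M_1'X^{(2)}), (L_2'X^{(1)}, M_2'X^{(2)}), \ldots$ are called the canonical variables.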
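The equations used are

$$\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}\,M = \rho^2\,\Sigma_{22}\,M, \quad \text{i.e.} \quad \Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}\,M = \rho^2 M,$$

and

$$\Sigma_{21}L - \rho\,\Sigma_{22}M = 0.$$

If $\Sigma$ is unknown, it is replaced by its MLE $(1/N)A$, where $A$ is the matrix of sums of squares and cross products of deviations from the sample means.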
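The determinantal equation can be solved directly when, as in this study, each set contains two variables. The sketch below is a hypothetical helper written for that 2 + 2 case only: it forms $\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$ and takes its two eigenvalues from the trace and determinant, which gives $\rho_1^2 \ge \rho_2^2$. It is checked on a synthetic matrix whose answer is known, not on the study's data.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def inverse_2x2(m: Matrix) -> Matrix:
    """Inverse of a 2 x 2 matrix via the adjugate."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0.0:
        raise ValueError("singular 2 x 2 block")
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def canonical_correlations_2x2(sigma: Matrix) -> tuple[float, float]:
    """Canonical correlations between (X1, X2) and (X3, X4) from a 4 x 4 dispersion matrix.

    Solves |S22^-1 S21 S11^-1 S12 - rho^2 I| = 0; for a 2 x 2 matrix the
    roots follow from its trace and determinant.
    """
    s11 = [row[:2] for row in sigma[:2]]
    s12 = [row[2:] for row in sigma[:2]]
    s21 = [row[:2] for row in sigma[2:]]
    s22 = [row[2:] for row in sigma[2:]]
    k = matmul(matmul(matmul(inverse_2x2(s22), s21), inverse_2x2(s11)), s12)
    trace = k[0][0] + k[1][1]
    det = k[0][0] * k[1][1] - k[0][1] * k[1][0]
    disc = math.sqrt(max(trace * trace - 4.0 * det, 0.0))
    roots = sorted(((trace + disc) / 2.0, (trace - disc) / 2.0), reverse=True)
    # rho_i is the non-negative square root of the i-th root rho_i^2
    return (math.sqrt(max(roots[0], 0.0)), math.sqrt(max(roots[1], 0.0)))


# Synthetic check: cross-set correlations 0.6 (X1, X3) and 0.3 (X2, X4), all else 0
example = [
    [1.0, 0.0, 0.6, 0.0],
    [0.0, 1.0, 0.0, 0.3],
    [0.6, 0.0, 1.0, 0.0],
    [0.0, 0.3, 0.0, 1.0],
]
print(canonical_correlations_2x2(example))
```

Multiplying $\Sigma$ by a positive constant leaves the product matrix unchanged, so the canonical correlations are the same whether $A/N$ or another multiple of $A$ is used as the estimate.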
Here one set consists of Diameter of well ($X_1$) and Width of protection wall ($X_2$), and the other set consists of Depth of well ($X_3$) and Weight of mixture ($X_4$). The first set is denoted by $X^{(1)}$ and the second set by $X^{(2)}$.
The dispersion matrix is given by

$$\Sigma = \begin{pmatrix} 0.654321 & 0.000247 & 0.296296 & 17.339506 \\ 0.000247 & 0.0002531 & 0.003148 & 1.924012 \\ 0.296296 & 0.003148 & 1 & 335.518519 \\ 17.339506 & 1.924012 & 335.518519 & 209667.36 \end{pmatrix}$$
The first canonical correlation is $\rho_1 = 0.242546$.
The second canonical correlation is $\rho_2 = 0.0654509$.
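The first canonical variables are

$$M_1 = \begin{pmatrix} 1.230652 \\ 7.292942 \end{pmatrix}, \qquad L_1 = \begin{pmatrix} 8.810967 \times 10^{2} \\ 22.93690 \times 10^{3} \end{pmatrix}$$

The second canonical variables are

$$M_2 = \begin{pmatrix} 0.882743 \\ 0.000333 \end{pmatrix}, \qquad L_2 = \begin{pmatrix} 7.82127 \\ 0.011401 \end{pmatrix}$$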
$$\mathrm{Corr}(L_1'X^{(1)}, M_1'X^{(2)}) = \rho_1 = 0.242546$$
$$\mathrm{Corr}(L_2'X^{(1)}, M_2'X^{(2)}) = \rho_2 = 0.0654509$$
5.5 Relation Between Idle Hours and Cost
From the data on idle hours (in min) and costs (in 1000s), we can observe a linear relationship between these two variables.
The model is

$$Y = A + BX$$

The normal equations are

$$\sum Y = nA + B\sum X$$
$$\sum XY = A\sum X + B\sum X^2$$
The estimated values are $A = 82.30704$ and $B = 1.13685$.
The fitted model is

$$Y = 82.30704 + 1.13685X$$
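Solving the two normal equations by Cramer's rule gives the least-squares $A$ and $B$. The sketch below is a minimal illustration; the function name and the data are hypothetical (the study's raw idle-hour and cost observations are not reproduced here).

```python
def fit_simple_regression(x: list[float], y: list[float]) -> tuple[float, float]:
    """Least-squares A and B for Y = A + BX via the normal equations.

    sum(Y)  = n*A      + B*sum(X)
    sum(XY) = A*sum(X) + B*sum(X^2)
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    sum_xx = sum(v * v for v in x)
    sum_xy = sum(u * v for u, v in zip(x, y))
    det = n * sum_xx - sum_x * sum_x  # determinant of the 2 x 2 system
    if det == 0:
        raise ValueError("all x values are equal; B is not identifiable")
    b = (n * sum_xy - sum_x * sum_y) / det  # Cramer's rule for B
    a = (sum_y - b * sum_x) / n  # back-substitute into the first equation
    return a, b


# Hypothetical data lying exactly on Y = 1 + 2X
a_hat, b_hat = fit_simple_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(a_hat, b_hat)
```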
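To test $H_0: B = 0$ vs $H_1: B \neq 0$, the appropriate test statistic is

$$T = \frac{\hat{B} - B}{\sqrt{a_{ii}\, e'e/(n-k)}} \sim t_{n-k} \quad \text{under } H_0,$$

where $a_{ii}$ is the $(i, i)$th element of $(X'X)^{-1}$ and $e'e$ is the residual sum of squares.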
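The critical region is given by $|t| > t_{\alpha/2}$.

$$X'X = \begin{pmatrix} n & \sum x \\ \sum x & \sum x^2 \end{pmatrix} = \begin{pmatrix} 18 & 126 \\ 126 & 1590 \end{pmatrix}$$

$$(X'X)^{-1} = \begin{pmatrix} 0.124765 & -0.00989 \\ -0.00989 & 0.001412 \end{pmatrix}$$

$a_{22} = 0.001412$, $n = 18$, $k = 2$, $e'e = 26260.326$
$t = 2.361544$, $|t| = 2.361544$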
From tables of the t distribution for $\alpha = 0.05$ and 16 d.f., $t_{\alpha/2} = 2.12$.
Since $|t| > t_{\alpha/2}$, the hypothesis that $B = 0$ is rejected.
The model is

$$Y = 82.30704 + 1.13685X$$
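The matrix step of this test can be sketched as follows. The helper names are hypothetical: `inverse_xtx` recomputes the $(X'X)^{-1}$ entries from $n = 18$, $\sum x = 126$ and $\sum x^2 = 1590$, and `slope_t_statistic` evaluates $T$ under $H_0: B = 0$ for a given $\hat{B}$, $a_{22}$ and $e'e$.

```python
import math


def inverse_xtx(n: int, sum_x: float, sum_x2: float) -> list[list[float]]:
    """(X'X)^-1 for the design matrix with columns (1, x)."""
    det = n * sum_x2 - sum_x * sum_x
    if det == 0:
        raise ValueError("X'X is singular")
    return [[sum_x2 / det, -sum_x / det], [-sum_x / det, n / det]]


def slope_t_statistic(b_hat: float, a22: float, sse: float, n: int, k: int = 2) -> float:
    """t = (B_hat - 0) / sqrt(a22 * e'e / (n - k)) for H0: B = 0, with n - k d.f."""
    return b_hat / math.sqrt(a22 * sse / (n - k))


inv = inverse_xtx(18, 126, 1590)  # X'X entries from the text
print([[round(v, 6) for v in row] for row in inv])
```

The off-diagonal entries are negative because $\sum x > 0$; only $a_{22}$ enters the test for the slope.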
Chapter 6
Conclusion
The important findings of the study are presented in this chapter.
With reference to section 2.2, one can verify that the production specifications and consumption specifications set by the median value of Cement Grout of Specific Gravity met the standard values set by the KRWSA. From the next section, with the help of the ANOVA technique, it is found that there is no significant interaction effect. From this section we also noticed that the water acidity contents in the two shifts, before and after the construction of the well, are significantly different.
From section 3.2 one can see that the exponential distribution is a good fit for the idle hours of production. We then studied the age of employees working at the well construction site in section 3.3 and found that it is normally distributed. The goodness of fit of the various distributions to the respective variables was tested using the Kolmogorov-Smirnov test or the Chi-Square test.
In section 4.2 we found the discriminant function for discriminating observations from populations having costs < 916000 and > 91600, using the extra cost because of change in price of raw material ($X_1$), cost of sinking ($X_2$) and idle days ($X_3$). The discriminant function so obtained is

$$L = 0.03433X_1 - 0.00179X_2 - 0.0453X_3$$

The discriminant function is then used to construct the Mahalanobis $D^2$ statistic for testing the equality of the mean vectors of the two populations. This test supports the hypothesis that the mean vectors are significantly different. In section 4.3 the profile analysis technique is illustrated. The parallelism and average effect of the profiles are then tested using the F statistic, and we conclude that the profiles are neither parallel nor have the same average effects.
From section 5.2 one gets an idea of the multiple correlation between characteristics such as Weight of mixture, Diameter of well, Width of protection wall and Depth of well. We conclude that there is a strong correlation between Weight of mixture and the other three characteristics. In the next section, the linear relationship between Weight of mixture and Diameter of well, Width of protection wall and Depth of well is established with the help of regression analysis.
In section 5.4 we obtained the canonical correlations between the set of measurements on Diameter of well ($X_1$) and Width of protection wall ($X_2$) and the other set, Depth of well ($X_3$) and Weight of mixture ($X_4$). We got

$$\rho_1 = 0.242546, \qquad \rho_2 = 0.0654509$$

as the first and second canonical correlations respectively, and the corresponding canonical variables are $(L_1'X^{(1)}, M_1'X^{(2)})$ and $(L_2'X^{(1)}, M_2'X^{(2)})$.
In section 5.5 we determined the following relationship between idle hours and costs:

$$Y = 82.30704 + 1.13685X$$