Consider the ideal or perfect gas law that connects the pressure (P), volume (V) and temperature (T) of a gas, and
is given by
$PV = RT$,
where R is the gas constant and is fixed for a given gas. So given any two of the three variables P, V and T of the gas,
we can determine the third one. Such an approach is deterministic. However, gases may not follow the ideal gas
law at extreme temperatures. Similarly, there are gases which do not follow the perfect gas law even at moderate
temperatures. The point is that a given physical phenomenon need not follow a deterministic formula or
model. Where the deterministic approach fails, we collect data related to the physical process and
analyse it using statistical methods. These methods enable us to understand the nature of the physical process
to a certain degree of certainty.
Classification of Statistics
2.1 Descriptive Statistics
Descriptive statistics applies when the data set is of manageable size and can be analysed analytically
or graphically. It is the statistics that you studied in school: recall histograms, pie charts, etc.
2.2 Inferential Statistics
Inferential statistics is applied where the entire data set (population) cannot be analysed in one go or as a whole. So we draw a
sample (a small or manageable portion) from the population. Then we analyse the sample for the characteristic of
interest and try to infer the same about the population. For example, when you cook rice, you take out a few grains
and crush them to see whether the rice is properly cooked. Similarly, survey polls prior to voting in elections, TRP
ratings of TV channel shows, etc. are sample based and therefore belong to inferential statistics.
2.3 Model Building
Constructing a physical law or model or formula based on observational data pertains to model building. Such an
approach is called empirical. For example, Kepler's laws of planetary motion belong to this approach.
Statistics is fundamentally based on the theory of probability. So first we discuss the theory of probability, and
then we shall move to the statistical methods.
Theory of Probability
Depending on the nature of event, the probability is usually assigned/calculated in the following three ways.
3.1 Personal Approach
An oil spill has occurred from a ship carrying oil near a sea beach. A scientist is asked to find the probability that
this oil spill can be contained before it causes widespread damage to the beach. The scientist assigns probability
to this event considering the factors such as the amount of oil spilled, wind direction, distance of sea beach from
the spot of oil spill etc. Naturally such a probability may not be accurate since it depends on the expertise of the
scientist as well as the information available to the scientist. Similarly, the percentage of crops destroyed due to
heavy rains in a particular area as estimated by an agriculture scientist belongs to the personal approach.
3.2 Relative Frequency Approach
An electrical engineer employed at a power house observes that on 80 days out of 100, the peak demand of power
supply occurs between 6 PM and 7 PM. One can then estimate that on any other day there is an 80% chance
of peak demand of power supply between 6 PM and 7 PM.
The probability assigned to an event (such as in the above example) after repeated experimentation and observation belongs to the relative frequency approach.
3.3 Classical Approach
Combination of Events
If A and B are any two events in a sample space S, then the event $A \cup B$ implies either A or B or both; $A \cap B$
implies both A and B; $A - B$ implies A but not B; $\bar{A}$ implies not A, that is, $\bar{A} = S - A$.
e.g. Let S be the sample space in a roll of a fair die. Then S = {1, 2, 3, 4, 5, 6}. Let A be the event of getting an
even number and B be the event of getting a number greater than 3. Then A = {2, 4, 6} and B = {4, 5, 6}. So
$A \cup B$ = {2, 4, 5, 6}, $A \cap B$ = {4, 6}, $A - B$ = {2} and $\bar{A}$ = {1, 3, 5}.
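The combinations of events in the die example can be checked with Python's built-in set operations. This is only an illustrative sketch; the variable names are chosen here and are not part of the notes.

```python
# Events for one roll of a fair die, modelled as Python sets.
S = {1, 2, 3, 4, 5, 6}                # sample space
A = {x for x in S if x % 2 == 0}      # even number
B = {x for x in S if x > 3}           # number greater than 3

union = A | B           # either A or B or both
intersection = A & B    # both A and B
difference = A - B      # A but not B
complement_A = S - A    # not A

print(sorted(union), sorted(intersection), sorted(difference), sorted(complement_A))
```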
Classical Formula of Probability
Let S be the sample space of a random experiment, where all the possible outcomes are equally likely. If A is any event
in S, then the probability of A, denoted by P[A], is defined as
\[ P[A] = \frac{\text{Number of elements in } A}{\text{Number of elements in } S} = \frac{n(A)}{n(S)}. \]
e.g. If S is the sample space for a toss of two fair coins, then S = {HH, HT, TH, TT}. The coins being fair, all
the four outcomes are equally likely. Let A be the event of getting two heads. Then A = {HH}, and therefore
P[A] = 1/4.
The classical approach is applicable in the cases (such as the above example) where it is reasonable to assume
that all possible outcomes are equally likely. Under that assumption, the probability assigned to an event through the classical approach is
exact.
Axioms of Probability
The classical formula of probability as discussed above suggests the following:
(i) For any event A, $0 \le P[A] \le 1$.
(ii) $P[\emptyset] = 0$ and $P[S] = 1$.
(iii) If A and B are mutually exclusive events, then $P[A \cup B] = P[A] + P[B]$.
These are known as axioms of the theory of probability.
Deductions from Classical Formula
One may easily deduce the following from the classical formula:
(i) If A and B are any two events, then $P[A \cup B] = P[A] + P[B] - P[A \cap B]$. This is called the law of addition of
probabilities.
(ii) $P[\bar{A}] = 1 - P[A]$. It follows from the fact that A and $\bar{A}$ are mutually exclusive and $A \cup \bar{A} = S$ with P[S] = 1.
(iii) If A is a subset of B, then $P[A] \le P[B]$.
Ex. From a well-shuffled pack of cards, one card is drawn. Find the probability that the card is either a king or an
ace. [Ans. 4/52 + 4/52 = 2/13]
Ex. Two dice are tossed once. Find the probability of getting an even number on the first die or a total of 8. [Ans.
18/36 + 5/36 - 3/36 = 5/9]
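The two-dice answer can be verified by enumerating the 36 equally likely outcomes and applying the classical formula directly. The sketch below uses exact fractions; the helper name `prob` is an assumption made for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space for a toss of two dice: 36 equally likely ordered pairs.
S = list(product(range(1, 7), repeat=2))

def prob(event):
    """Classical probability n(A)/n(S) for an event given as a predicate."""
    return Fraction(sum(1 for outcome in S if event(outcome)), len(S))

def even_first(o):
    return o[0] % 2 == 0

def total_8(o):
    return o[0] + o[1] == 8

# Direct count of the union versus the law of addition of probabilities.
p_union = prob(lambda o: even_first(o) or total_8(o))
p_addition = prob(even_first) + prob(total_8) - prob(lambda o: even_first(o) and total_8(o))
print(p_union, p_addition)
```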
Conditional Probability and Independent Events
Suppose a bag contains 10 Blue, 15 Yellow and 20 Green balls where all balls are identical except for the color. Let
A be the event of drawing a Blue ball from the bag. Then P [A] = 10/45 = 2/9. Now suppose we are told after
the ball has been drawn that the ball drawn is not Green. Because of this extra information/condition, we need to
change the value of P [A]. Now, since the ball is not Green, the total number of balls can be considered as 25 only.
Hence, P [A] = 10/25 = 2/5.
The extra information given in the above example can be considered as another event. Thus, if after the
experiment has been conducted we are told that a particular event has occurred, then we need to revise the value
of the probability of the previous event(s) accordingly. In other words, we find the probability of an event A under
the condition that an event B has occurred. We call this changed probability of A as the conditional probability
of A when B has occurred. We denote this conditional probability by P [A/B]. Mathematically, it is given by
\[ P[A/B] = \frac{n(A \cap B)}{n(S \cap B)} = \frac{n(A \cap B)}{n(B)} = \frac{n(A \cap B)/n(S)}{n(B)/n(S)} = \frac{P[A \cap B]}{P[B]}. \]
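Theorem of Total Probability
Let $B_1, B_2, \ldots, B_n$ be exhaustive and mutually exclusive events in a sample space S of a random experiment,
each with nonzero probability. If A is any event in S, then
\[ P[A] = \sum_{i=1}^{n} P[B_i]\,P[A/B_i]. \]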
Proof: Since $B_1, B_2, \ldots, B_n$ are exhaustive and mutually exclusive events in the sample space S, we have
$S = B_1 \cup B_2 \cup \cdots \cup B_n$. It follows that
\[ A = A \cap S = (A \cap B_1) \cup (A \cap B_2) \cup \cdots \cup (A \cap B_n). \]
Now $B_1, B_2, \ldots, B_n$ are mutually exclusive events. Therefore, $A \cap B_1, A \cap B_2, \ldots, A \cap B_n$ are mutually
exclusive events. So we have
\[ P[A] = P[A \cap B_1] + P[A \cap B_2] + \cdots + P[A \cap B_n] = \sum_{i=1}^{n} P[A \cap B_i] = \sum_{i=1}^{n} P[B_i]\,P[A/B_i]. \]
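Bayes' Theorem
Let $B_1, B_2, \ldots, B_n$ be exhaustive and mutually exclusive events in a sample space S of a random experiment,
each with nonzero probability. Let A be any event in S with $P[A] \neq 0$. Then
\[ P[B_i/A] = \frac{P[B_i]\,P[A/B_i]}{\sum_{j=1}^{n} P[B_j]\,P[A/B_j]}. \]
Proof: By the definition of conditional probability,
\[ P[B_i/A] = \frac{P[A \cap B_i]}{P[A]} = \frac{P[B_i]\,P[A/B_i]}{P[A]}, \]
and the denominator P[A] equals $\sum_{j=1}^{n} P[B_j]\,P[A/B_j]$ by the result proved above.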
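A small numerical sketch may help fix the roles of the terms in Bayes' theorem. The prior and conditional probabilities below are hypothetical numbers chosen only for illustration (they do not come from the notes), and `bayes` is an illustrative helper name.

```python
from fractions import Fraction

def bayes(priors, likelihoods):
    """Return the posteriors P[B_i/A] from priors P[B_i] and likelihoods P[A/B_i]."""
    joint = [p * l for p, l in zip(priors, likelihoods)]   # P[B_i] P[A/B_i]
    p_a = sum(joint)                                       # P[A] by total probability
    if p_a == 0:
        raise ValueError("P[A] must be nonzero")
    return [j / p_a for j in joint]

# Hypothetical data: three exhaustive, mutually exclusive events B_1, B_2, B_3.
priors = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
likelihoods = [Fraction(1, 100), Fraction(2, 100), Fraction(3, 100)]

posteriors = bayes(priors, likelihoods)
print(posteriors)
```

Note that the posteriors always sum to 1, since the events B_i are exhaustive and mutually exclusive.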
If a variable X takes real values x corresponding to each outcome of a random experiment, it is called a random
variable. The random variable is said to be discrete if it assumes finitely many or countably infinitely many real values. The
behavior of a random variable is studied in terms of its probabilities. Suppose a random variable X takes real values
x with probabilities P[X = x]. Then a function f defined by f(x) = P[X = x] is called the density function of
X provided $f(x) \ge 0$ for all x and $\sum_{X=x} f(x) = 1$. Further, a function F defined by $F(x) = \sum_{X \le x} f(x)$ is called
the cumulative distribution function of X.
X = x                  0      1      2
f(x) = P[X = x]        1/4    1/2    1/4
F(x) = P[X <= x]       1/4    3/4    1
Ex. A fair coin is tossed again and again till a head appears. If X denotes the number of tosses in this experiment,
then X = 1, 2, 3, ..., since the head can appear in the first toss, second toss, third toss and so on. So here the discrete
random variable X assumes countably infinite values. The function f given by

X = x                  1      2          3          ...
f(x) = P[X = x]        1/2    (1/2)^2    (1/2)^3    ...

or $f(x) = (1/2)^x$, x = 1, 2, 3, ..., is the density function of X since $f(x) \ge 0$ for all x and
\[ \sum_{x=1}^{\infty} f(x) = \sum_{x=1}^{\infty} \left(\frac{1}{2}\right)^{x} = \frac{1/2}{1 - 1/2} = 1 \]
(the sum of the infinite G.P. $a + ar + ar^2 + \cdots = a/(1 - r)$ for $|r| < 1$).
5
5.1 Expectation
Let X be a random variable with density function f. Then the expectation of X, denoted by E(X), is defined as
\[ E(X) = \sum_{X=x} x\,f(x). \]
Ex. Let X denote the number of heads in a toss of two fair coins. Then X assumes the values 0, 1 and 2 with
probabilities 1/4, 1/2 and 1/4 respectively. So $E(X) = 0 \cdot \frac{1}{4} + 1 \cdot \frac{1}{2} + 2 \cdot \frac{1}{4} = 1$.
Note: (i) The expectation E(X) of the random variable X is its theoretical average. In a statistical setting, the
average value, mean value (see footnote 1) and expected value are synonyms. The mean value is denoted by $\mu$. So $E(X) = \mu$.
(ii) If X is a random variable and c is a constant, then it is easy to verify that E(c) = c and E(cX) = cE(X).
Also, E(X + Y) = E(X) + E(Y), where Y is another random variable.
(iii) The expected or the mean value of the random variable X is a measure of the location of the center of values
of X.
5.2 Variance
Let X and Y be two random variables assuming the values X = 1, 9 and Y = 4, 6, each value with probability 1/2. We observe that both the
variables have the same mean values given by $\mu_X = \mu_Y = 5$. However, the values of X are farther
from the mean or the central value 5 in comparison to the values of Y. Thus, the mean value of a random variable
does not account for its variability. In this regard, we define a new parameter known as variance. It is defined as
follows.
If X is a random variable with mean $\mu$, then its variance, denoted by Var(X), is defined as the expectation of
$(X - \mu)^2$. So, we have
\[ \mathrm{Var}(X) = E[(X - \mu)^2] = E(X^2) + \mu^2 - 2\mu E(X) = E(X^2) + E(X)^2 - 2E(X)E(X) = E(X^2) - E(X)^2. \]
Ex. Let X denote the number of heads in a toss of two fair coins. Then X assumes the values 0, 1 and 2 with
probabilities 1/4, 1/2 and 1/4 respectively. So
$E(X) = 0 \cdot \frac{1}{4} + 1 \cdot \frac{1}{2} + 2 \cdot \frac{1}{4} = 1$,
$E(X^2) = (0)^2 \cdot \frac{1}{4} + (1)^2 \cdot \frac{1}{2} + (2)^2 \cdot \frac{1}{4} = \frac{3}{2}$,
$\mathrm{Var}(X) = \frac{3}{2} - 1 = \frac{1}{2}$.
Note: (i) The variance Var(X) of the random variable X is also denoted by $\sigma^2$. So $\mathrm{Var}(X) = \sigma^2$.
(ii) If X is a random variable and c is a constant, then it is easy to verify that Var(c) = 0 and $\mathrm{Var}(cX) = c^2\,\mathrm{Var}(X)$.
Also, Var(X + Y) = Var(X) + Var(Y), where X and Y are independent (see footnote 2) random variables.
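The expectation and variance of the two-coin example can be reproduced with exact fractions. This is a minimal sketch; the function name `expectation` and the dictionary representation of the density are assumptions made for illustration.

```python
from fractions import Fraction

# Density of X = number of heads in a toss of two fair coins.
f = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def expectation(density, g=lambda x: x):
    """E[g(X)] = sum of g(x) f(x) over all values x of X."""
    return sum(g(x) * p for x, p in density.items())

mean = expectation(f)                            # E(X)
second_moment = expectation(f, lambda x: x * x)  # E(X^2)
variance = second_moment - mean ** 2             # Var(X) = E(X^2) - E(X)^2
print(mean, second_moment, variance)
```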
5.3 Standard Deviation
The variance of a random variable, by definition, is the expected value of the squared differences of the values of the random
variable from the mean value. So variance carries squared units of the original data, and hence
often lacks a direct physical meaning. To overcome this problem, a second measure of variability, known
as standard deviation, is employed. It is defined as follows.
Let X be a random variable with variance $\sigma^2$. Then the standard deviation of X, denoted by $\sigma$, is the
nonnegative square root of Var(X), that is,
\[ \sigma = \sqrt{\mathrm{Var}(X)}. \]
Note: A large standard deviation implies that the random variable X is rather inconsistent and somewhat hard
to predict. On the other hand, a small standard deviation is an indication of consistency and stability.
1 From your high school mathematics, you know that if we have n distinct values $x_1, x_2, \ldots, x_n$ with frequencies $f_1, f_2, \ldots, f_n$
respectively and $\sum_{i=1}^{n} f_i = N$, then the mean value is
\[ \mu = \frac{\sum_{i=1}^{n} f_i x_i}{N} = \sum_{i=1}^{n} \left(\frac{f_i}{N}\right) x_i = \sum_{i=1}^{n} f(x_i)\,x_i, \]
where $f(x_i) = f_i/N$ is the probability of occurrence of $x_i$ in the given data set. Obviously, the final expression for $\mu$ is the expectation of
a random variable X assuming the values $x_i$ with probabilities $f(x_i)$.
2 Independent random variables will be discussed later on.
5.4 Moments
Let X be a random variable and k be any positive integer. Then $E(X^k)$ defines the kth ordinary moment of X.
Obviously, $E(X) = \mu$ is the first ordinary moment, $E(X^2)$ is the second ordinary moment and so on. Further,
the ordinary moments can be obtained from the function $E(e^{tX})$, for the ordinary moments $E(X^k)$ are the coefficients
of $t^k/k!$ in the expansion
\[ E(e^{tX}) = 1 + tE(X) + \frac{t^2}{2!}E(X^2) + \cdots \]
Also, we have
\[ E(X^k) = \left[\frac{d^k}{dt^k} E(e^{tX})\right]_{t=0}. \]
Thus, the function $E(e^{tX})$ generates all the ordinary moments. That is why it is known as the moment generating
function and is denoted by $m_X(t)$. Thus, $m_X(t) = E(e^{tX})$.
Geometric Distribution
Suppose a random experiment consists of a series of independent trials to obtain a success, where each trial results
in one of two outcomes, namely success (s) and failure (f), which have constant probabilities p and 1 - p respectively in
each trial. Then the sample space of the random experiment is S = {s, fs, ffs, ...}. If X denotes the number
of trials in the experiment, then X is a discrete random variable with countably infinite values given by X =
1, 2, 3, ... Trials being independent, we have P[X = 1] = P[s] = p, P[X = 2] = P[fs] = P[f]P[s] = (1 - p)p,
$P[X = 3] = P[ffs] = P[f]P[f]P[s] = (1 - p)^2 p$, ... Consequently, the density function of X is given by
\[ f(x) = (1 - p)^{x-1} p, \quad x = 1, 2, 3, \ldots \]
The random variable X with this density function is called a geometric (see footnote 3) random variable. Given the value of the
parameter p, the probability distribution of the geometric random variable X is uniquely described.
For the geometric random variable X, we have (please try the proofs)
(i) $m_X(t) = \dfrac{p e^{t}}{1 - q e^{t}}$, where q = 1 - p and $t < -\ln q$,
(ii) $E(X) = 1/p$, $E(X^2) = (1 + q)/p^2$,
(iii) $\mathrm{Var}(X) = q/p^2$.
Ex. A fair coin is tossed again and again till a head appears. If X denotes the number of tosses in this experiment,
then X is a geometric random variable with the density function $f(x) = (1/2)^x$, x = 1, 2, 3, ... Here p = 1/2.
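The stated mean and variance of the geometric distribution can be checked numerically for the fair-coin case p = 1/2 by truncating the infinite sums. This is a rough sketch, not a proof; the cutoff of 200 terms is an assumption that works because the tail is negligible for this p.

```python
def geometric_pmf(x, p):
    """f(x) = (1 - p)^(x - 1) * p for x = 1, 2, 3, ..."""
    if x < 1:
        return 0.0
    return (1 - p) ** (x - 1) * p

p = 0.5
q = 1 - p
xs = range(1, 201)  # truncation point chosen for illustration

total = sum(geometric_pmf(x, p) for x in xs)
mean = sum(x * geometric_pmf(x, p) for x in xs)
second_moment = sum(x * x * geometric_pmf(x, p) for x in xs)
variance = second_moment - mean ** 2

# Compare with the closed forms E(X) = 1/p and Var(X) = q/p^2.
print(total, mean, 1 / p, variance, q / p ** 2)
```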
Binomial Distribution
Suppose a random experiment consisting of a finite number n of independent trials is performed, where each
trial results in one of two outcomes, namely success (s) and failure (f), which have constant probabilities p and 1 - p
respectively in each trial. Let X denote the number of successes in the n trials. Then X is a discrete random
variable with values X = 0, 1, 2, ..., n. Now, corresponding to X = 0, there is only one point in the sample space,
namely fff...f (where f repeats n times), with probability $(1 - p)^n$. Therefore, $P[X = 0] = (1 - p)^n$. Next,
corresponding to X = 1 there are ${}^{n}C_{1}$ points sff...f, fsf...f, ffs...f, ..., fff...s (where s appears once and f
repeats n - 1 times) in the sample space, each with probability $(1 - p)^{n-1} p$. Therefore, $P[X = 1] = {}^{n}C_{1}(1 - p)^{n-1} p$.
Likewise, $P[X = 2] = {}^{n}C_{2}(1 - p)^{n-2} p^2$, ..., $P[X = n] = p^n$. Consequently, the density function of X is given by
\[ f(x) = {}^{n}C_{x}\,(1 - p)^{n-x} p^{x}, \quad x = 0, 1, 2, \ldots, n. \]
The random variable X with this density function is called a binomial (see footnote 4) random variable. Once the values of the
parameters n and p are given or determined, the density function uniquely describes the binomial distribution of X.
3 The name geometric is used because the probabilities $p, (1 - p)p, (1 - p)^2 p, \ldots$ in succession constitute a geometric progression.
4 The name binomial is used because the probabilities $(1 - p)^n, {}^{n}C_{1}(1 - p)^{n-1} p, \ldots, p^n$ in succession are the terms in the binomial expansion
of $((1 - p) + p)^n$.
For the binomial random variable X, we have (please try the proofs)
(i) $m_X(t) = (q + p e^{t})^n$, where q = 1 - p,
(ii) $E(X) = np$,
(iii) $\mathrm{Var}(X) = npq$.
Ex. Suppose a die is tossed 5 times. What is the probability of getting exactly 2 fours? (Here n = 5, p = 1/6,
x = 2, and therefore $P[X = 2] = {}^{5}C_{2}\,(1 - 1/6)^{5-2}(1/6)^2 = 0.161$.)
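The die example can be evaluated with `math.comb` from the Python standard library. The sketch below also checks E(X) = np and Var(X) = npq by direct summation; the function name `binomial_pmf` is chosen here for illustration.

```python
from math import comb

def binomial_pmf(x, n, p):
    """f(x) = nCx * (1 - p)^(n - x) * p^x for x = 0, 1, ..., n."""
    return comb(n, x) * (1 - p) ** (n - x) * p ** x

n, p = 5, 1 / 6
q = 1 - p

p_two_fours = binomial_pmf(2, n, p)
total = sum(binomial_pmf(x, n, p) for x in range(n + 1))
mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
variance = sum(x * x * binomial_pmf(x, n, p) for x in range(n + 1)) - mean ** 2

print(round(p_two_fours, 3), mean, variance)
```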
Hypergeometric Distribution
Suppose a random experiment consists of choosing n objects without replacement from a lot of N objects, given
that r objects in the lot possess a trait of interest. Let X denote the number of objects possessing
the trait in the selected sample of size n. Then X is a discrete random variable and assumes values in the range
$\max[0, n - (N - r)] \le x \le \min(n, r)$. Further, X = x implies that there are x objects possessing the trait in
the selected sample of size n, which should come from the r objects possessing the trait. On the other hand, the
remaining n - x objects in the selected sample are without the trait. So these should come from the N - r
objects without the trait available in the entire lot of N objects. It follows that the number of ways to select n objects,
where x objects with the trait are chosen from r objects and n - x objects without the trait are chosen from
N - r objects, is ${}^{r}C_{x} \cdot {}^{N-r}C_{n-x}$. Also, the number of ways to select n objects from the lot of N objects is ${}^{N}C_{n}$. Therefore,
\[ P[X = x] = \frac{{}^{r}C_{x} \cdot {}^{N-r}C_{n-x}}{{}^{N}C_{n}}. \]
The random variable X with the density function $f(x) = {}^{r}C_{x} \cdot {}^{N-r}C_{n-x} / {}^{N}C_{n}$, where $\max[0, n - (N - r)] \le x \le \min(n, r)$,
is called a hypergeometric random variable. The hypergeometric distribution is characterized by the three parameters
N, r and n.
For the hypergeometric random variable X, it can be shown that
\[ E(X) = n\left(\frac{r}{N}\right) \quad \text{and} \quad \mathrm{Var}(X) = n\left(\frac{r}{N}\right)\left(\frac{N - r}{N}\right)\left(\frac{N - n}{N - 1}\right). \]
The hypergeometric probabilities can be approximated satisfactorily by the binomial distribution provided
$n/N \le 0.05$.
Ex. Suppose we randomly select 5 cards without replacement from a deck of 52 playing cards. What is the
probability of getting exactly 2 red cards ? (Here N = 52, r = 26, n = 5, x = 2, and therefore P [X = 2] = 0.3251.)
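The card example follows directly from the counting argument above and can be evaluated with `math.comb`. This is a minimal sketch; `hypergeom_pmf` is an illustrative name, not a library function.

```python
from math import comb

def hypergeom_pmf(x, N, r, n):
    """P[X = x] = rCx * (N-r)C(n-x) / NCn on the admissible range of x."""
    lo, hi = max(0, n - (N - r)), min(n, r)
    if not lo <= x <= hi:
        return 0.0
    return comb(r, x) * comb(N - r, n - x) / comb(N, n)

N, r, n = 52, 26, 5   # deck size, red cards, cards drawn

p_two_red = hypergeom_pmf(2, N, r, n)
mean = sum(x * hypergeom_pmf(x, N, r, n) for x in range(n + 1))

print(round(p_two_red, 4), mean, n * r / N)
```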
Poisson Distribution
Observing discrete occurrences of an event in a continuous region or interval (see footnote 5) is called a Poisson process or Poisson
experiment. For example, observing the white blood cells in a sample of blood, or observing the number of BITS Pilani
students placed with a package of more than one crore in five years, are Poisson experiments.
Let $\lambda$ denote the number of occurrences of the event of interest per unit measurement of the region or interval.
Then the expected number of occurrences in a given region or interval of size s is $k = \lambda s$. If X denotes the number
of occurrences of the event in the region or interval of size s, then X is called a Poisson random variable. Its
probability density function can be proved to be
\[ f(x) = \frac{e^{-k} k^{x}}{x!}, \quad x = 0, 1, 2, \ldots \]
We see that the Poisson distribution is characterized by the single parameter k while a Poisson process or experiment is characterized by the parameter $\lambda$.
It can be shown that
\[ m_X(t) = e^{k(e^{t} - 1)}, \quad E(X) = k = \mathrm{Var}(X). \]
5 Note that the specified region could take many forms. For instance, it could be a length, an area, a volume, a period of time, etc.
Ex. A healthy person is expected to have 6000 white blood cells per ml of blood. A person is tested for white
blood cell count by collecting a blood sample of size 0.001 ml. Find the probability that the collected blood sample will carry exactly 3 white blood cells. (Here $\lambda = 6000$, s = 0.001, $k = \lambda s = 6$ and x = 3, and therefore
$P[X = 3] = \dfrac{e^{-6}\,6^{3}}{3!}$.)
Ex. In the last 5 years, 10 students of BITS Pilani were placed with a package of more than one crore. Find the
probability that exactly 7 students will be placed with a package of more than one crore in the next 3 years. (Here
$\lambda = 10/5 = 2$, s = 3, $k = \lambda s = 6$ and x = 7, and therefore $P[X = 7] = \dfrac{e^{-6}\,6^{7}}{7!}$.)
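Both Poisson examples leave the answer in closed form; the sketch below evaluates them numerically using only the standard library. The function name `poisson_pmf` is an illustrative choice.

```python
from math import exp, factorial

def poisson_pmf(x, k):
    """f(x) = e^(-k) * k^x / x! for x = 0, 1, 2, ..."""
    return exp(-k) * k ** x / factorial(x)

# White blood cells: lambda = 6000 per ml, sample size s = 0.001 ml.
k_blood = 6000 * 0.001
p_three_cells = poisson_pmf(3, k_blood)

# Placements: lambda = 10/5 = 2 per year, interval s = 3 years.
k_placed = (10 / 5) * 3
p_seven_students = poisson_pmf(7, k_placed)

print(round(p_three_cells, 4), round(p_seven_students, 4))
```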
10 Uniform Distribution
A random variable X is said to follow the uniform distribution if it assumes a finite number of values, all with the same chance
of occurrence, that is, with equal probabilities. For instance, if the random variable X assumes n values $x_1, x_2, \ldots, x_n$ with
equal probabilities $P[X = x_i] = 1/n$, then it is a uniform random variable with density function given by
\[ f(x) = \frac{1}{n}, \quad x = x_1, x_2, \ldots, x_n. \]
The moment generating function, mean and variance of the uniform random variable respectively read as
\[ m_X(t) = \frac{1}{n}\sum_{i=1}^{n} e^{t x_i}, \quad \mu = \frac{1}{n}\sum_{i=1}^{n} x_i, \quad \sigma^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)^2. \]
Ex. Suppose a fair die is thrown once. Let X denote the number appearing on the die. Then X is a discrete
random variable assuming the values 1, 2, 3, 4, 5, 6. Also, P [X = 1] = P [X = 2] = P [X = 3] = P [X = 4] = P [X =
5] = P [X = 6] = 1/6. Thus, X is a uniform random variable.
11
A continuous random variable is a variable X that takes all values x in an interval or intervals of real numbers, and
its probability for a particular value is 0. For example, if X denotes the time of peak power demand in a power
house, then it is a continuous random variable because the peak power demand happens over a continuous period
of time, no matter how small or big it is. In other words, it does not happen at an instant of time or at a particular
value of time variable.
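A function f is called the density function of a continuous random variable X provided $f(x) \ge 0$ for all x,
$\int_{-\infty}^{\infty} f(x)\,dx = 1$ and $P[a \le X \le b] = \int_{a}^{b} f(x)\,dx$. Further, a function F defined by $F(x) = \int_{-\infty}^{x} f(x)\,dx$ is called
the cumulative distribution function of X.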
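The kth ordinary moment ($E(X^k)$), mean ($\mu$) and variance ($\sigma^2$) of X are, respectively, given by
\[ E(X^k) = \int_{-\infty}^{\infty} x^k f(x)\,dx = \left[\frac{d^k}{dt^k} m_X(t)\right]_{t=0}, \]
\[ \mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx, \]
\[ \sigma^2 = E(X^2) - E(X)^2 = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \left(\int_{-\infty}^{\infty} x f(x)\,dx\right)^2. \]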
Remarks (i) The condition $f(x) \ge 0$ implies that the graph of y = f(x) lies on or above the x-axis.
Further,
\[ F(x) = \begin{cases} 0, & x < 0.1 \\ 6.25x^2 - 1.25x + 0.0625, & 0.1 \le x \le 0.5 \\ 1, & x > 0.5 \end{cases} \]
$P[0.2 \le X \le 0.3] = 0.1875$.
$\mu = 0.3667$.
$\sigma^2 = 0.00883$.
12
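A continuous random variable X is said to have uniform distribution if its density function reads as
\[ f(x) = \begin{cases} \dfrac{1}{b - a}, & a < x < b \\ 0, & \text{elsewhere} \end{cases} \]
In this case, the area under the curve is in the form of a rectangle, which is why it is also called the rectangular distribution.
You may easily derive the following for the uniform distribution.
\[ F(x) = \begin{cases} 0, & x \le a \\ \dfrac{x - a}{b - a}, & a < x < b \\ 1, & x \ge b \end{cases} \]
\[ m_X(t) = \frac{e^{bt} - e^{at}}{(b - a)t} \ (t \neq 0), \quad \mu = \frac{b + a}{2}, \quad \sigma^2 = \frac{(b - a)^2}{12}. \]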
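13 Gamma Distribution
A continuous random variable X is said to have gamma distribution with parameters $\alpha$ and $\beta$ if its density function
reads as
\[ f(x) = \frac{1}{\Gamma(\alpha)\,\beta^{\alpha}}\, x^{\alpha - 1} e^{-x/\beta}, \quad x > 0, \ \alpha > 0, \ \beta > 0, \]
where $\Gamma(\alpha) = \int_{0}^{\infty} e^{-x} x^{\alpha - 1}\,dx$ is the gamma function (see footnote 6).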
The moment generating function, mean and variance of the gamma random variable can be derived as
\[ m_X(t) = (1 - \beta t)^{-\alpha} \ \left(t < \frac{1}{\beta}\right), \quad \mu = \alpha\beta, \quad \sigma^2 = \alpha\beta^2. \]
Note: The special case of gamma distribution with $\alpha = 1$ is called exponential distribution. Therefore, the density
function of exponential distribution reads as
\[ f(x) = \frac{1}{\beta}\, e^{-x/\beta}, \quad x > 0, \ \beta > 0. \]
On the other hand, the special case of gamma distribution with $\beta = 2$ and $\alpha = \gamma/2$, $\gamma$ being some positive integer,
is named the Chi-Squared ($\chi^2$) distribution. Its density function is
\[ f(x) = \frac{1}{\Gamma(\gamma/2)\, 2^{\gamma/2}}\, x^{\gamma/2 - 1} e^{-x/2}, \quad x > 0. \]
Since the $\chi^2$ distribution arises so often in practice, extensive tables for its cumulative distribution function have
been derived.
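14 Normal Distribution
A continuous random variable X is said to follow Normal distribution (see footnote 7) with parameters $\mu$ and $\sigma$ if its density
function is given by
\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}, \quad -\infty < x < \infty. \]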
For the normal random variable X, one can verify the following:
\[ \int_{-\infty}^{\infty} f(x)\,dx = 1, \quad m_X(t) = e^{\mu t + \frac{1}{2}\sigma^2 t^2}, \quad \text{Mean} = \mu, \quad \text{Variance} = \sigma^2. \]
This shows that the two parameters $\mu$ and $\sigma$ in the density function of the normal random variable X are its mean
and standard deviation, respectively.
Note: If X is a normal random variable with mean $\mu$ and variance $\sigma^2$, then we write $X \sim N(\mu, \sigma^2)$.
6 One should remember that $\Gamma(1) = 1$, $\Gamma(\alpha) = (\alpha - 1)\Gamma(\alpha - 1)$, $\Gamma(1/2) = \sqrt{\pi}$ and $\Gamma(\alpha) = (\alpha - 1)!$ when $\alpha$ is a positive integer.
7 Normal distribution was first described by De Moivre in 1733 as the limiting case of Binomial distribution when the number of trials is
infinite. This discovery did not get much attention. Around fifty years later, Laplace and Gauss rediscovered normal distribution while
dealing with astronomical data. They found that the errors in astronomical measurements are well described by normal distribution.
The normal distribution is also known as Gaussian distribution.
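The standard normal variable $Z = (X - \mu)/\sigma$ has mean 0 and variance 1, and its density function is
\[ \phi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \quad -\infty < z < \infty. \]
The corresponding cumulative distribution function is given by
\[ \Phi(z) = \int_{-\infty}^{z} \phi(z)\,dz = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-z^2/2}\,dz. \]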
The normal probability curve is symmetric about the line $X = \mu$ or Z = 0. Therefore, we have
$P[Z < 0] = P[Z > 0] = 0.5$, $P[-a < Z < 0] = P[0 < Z < a]$.
The probabilities of the standard normal variable Z in the probability table of normal distribution are given in
terms of the cumulative distribution function $\Phi(z) = F(z) = P[Z \le z]$ (See Table 5 on page 697 in the text book). So
we have
$P[a < Z < b] = P[Z < b] - P[Z < a] = F(b) - F(a)$.
From the normal table, it can be found that
$P[|X - \mu| < \sigma] = P[\mu - \sigma < X < \mu + \sigma] = P[-1 < Z < 1] = F(1) - F(-1) = 0.8413 - 0.1587 = 0.6826$.
This shows that there is approximately 68% probability that the normal variable X lies in the interval $(\mu - \sigma, \mu + \sigma)$.
We call this interval the $1\sigma$ confidence interval of X. Similarly, the probabilities of X in the $2\sigma$ and $3\sigma$ confidence
intervals are, respectively, given by
$P[|X - \mu| < 2\sigma] = P[\mu - 2\sigma < X < \mu + 2\sigma] = P[-2 < Z < 2] = 0.9544$,
$P[|X - \mu| < 3\sigma] = P[\mu - 3\sigma < X < \mu + 3\sigma] = P[-3 < Z < 3] = 0.9973$.
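The table-based probabilities for the 1σ, 2σ and 3σ intervals can be cross-checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is only a sketch; small differences in the fourth decimal place relative to the values above come from rounding in the printed normal table.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def prob_within(k):
    """P[-k < Z < k] = F(k) - F(-k) for the standard normal variable Z."""
    return Z.cdf(k) - Z.cdf(-k)

within = {k: prob_within(k) for k in (1, 2, 3)}
for k, value in within.items():
    print(f"P[|X - mu| < {k} sigma] = {value:.4f}")
```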
Ex. A random variable X is normally distributed with mean 9 and standard deviation 3. Find $P[X \le 15]$,
$P[X \ge 15]$ and $P[0 \le X \le 9]$.
Sol. We have $Z = \dfrac{X - 9}{3}$.
Note: If X is a normal random variable with mean $\mu$ and variance $\sigma^2$, then $P[|X - \mu| < k\sigma] = P[|Z| < k] =
F(k) - F(-k)$. However, if X is not a normal random variable, then a rule of thumb for the required probability is given
by Chebyshev's inequality.
Chebyshev's Inequality
If X is a random variable with mean $\mu$ and variance $\sigma^2$, then for any k > 0,
\[ P[|X - \mu| < k\sigma] \ge 1 - \frac{1}{k^2}. \]
Note that Chebyshev's inequality does not yield the exact probability of X lying in the interval $(\mu - k\sigma, \mu + k\sigma)$;
rather, it gives the minimum probability for the same. However, in the case of a normal random variable, the exact probability
can be obtained. For example, consider the $2\sigma$ interval $(\mu - 2\sigma, \mu + 2\sigma)$ for X. Then, Chebyshev's inequality gives
$P[|X - \mu| < 2\sigma] \ge 1 - \frac{1}{4} = 0.75$. In case X is a normal variable, we get the exact probability $P[|X - \mu| < 2\sigma] = 0.9544$.
However, the advantage of Chebyshev's inequality is that it applies to any random variable of known mean and
variance.
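The gap between the Chebyshev lower bound and the exact normal probability can be tabulated for a few values of k. The sketch below assumes Python 3.8 or later for `statistics.NormalDist`; the function names are illustrative.

```python
from statistics import NormalDist

def chebyshev_lower_bound(k):
    """Lower bound 1 - 1/k^2 on P[|X - mu| < k sigma], valid for any X with finite variance."""
    if k <= 0:
        raise ValueError("k must be positive")
    return 1 - 1 / k ** 2

def normal_exact(k):
    """Exact P[|X - mu| < k sigma] when X is a normal random variable."""
    Z = NormalDist()
    return Z.cdf(k) - Z.cdf(-k)

for k in (1.5, 2, 3):
    print(k, round(chebyshev_lower_bound(k), 4), round(normal_exact(k), 4))
```

The bound is deliberately conservative: it must hold for every distribution with the given mean and variance, so it is never larger than the normal value.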
Chapter 5
So far we have studied a single random variable either discrete or continuous. Such random variables are called
univariate. Problems do arise where we need to study two random variables simultaneously. For example, we may
wish to study the heights and weights of a group of students up to the age of 20 years. Typical questions to ask are,
"What is the average height of students of age less than or equal to 18 years?" or, "Is the height independent of
weight?". To answer questions of this type, we need to study what are called two-dimensional or bivariate random
variables.
\[ \sum_{X=x} \sum_{Y=y} f_{XY}(x, y) = 1. \]
Distribution function
The distribution function of (X, Y) is given by
\[ F(x, y) = \sum_{X \le x} \sum_{Y \le y} f_{XY}(x, y). \]
Expectation
The expectation or mean of X is defined as
\[ E[X] = \sum_{X=x} \sum_{Y=y} x\, f_{XY}(x, y) = \mu_X. \]
Covariance
If $\mu_X$ and $\mu_Y$ are the means of X and Y respectively, then the covariance of X and Y, denoted by Cov(X, Y), is defined
as
\[ \mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - E[X]E[Y]. \]
Ex. In an automobile plant, two tasks are performed by robots: the welding of two joints and the tightening of three
bolts. Let X denote the number of defective welds and Y denote the number of improperly tightened bolts produced
per car. The probabilities of (X, Y) are given in the following table.

Y \ X      0       1       2       fY(y)
0          0.84    0.06    0.01    0.91
1          0.03    0.01    0.005   0.045
2          0.02    0.008   0.004   0.032
3          0.01    0.002   0.001   0.013
fX(x)      0.9     0.08    0.02    1
\[ \sum_{X=0}^{2} \sum_{Y=0}^{3} f_{XY}(x, y) = f_{XY}(0,0) + f_{XY}(0,1) + f_{XY}(0,2) + f_{XY}(0,3) + f_{XY}(1,0) + f_{XY}(1,1) + f_{XY}(1,2) + f_{XY}(1,3) + f_{XY}(2,0) + f_{XY}(2,1) + f_{XY}(2,2) + f_{XY}(2,3) \]
= 0.84 + 0.03 + 0.02 + 0.01 + 0.06 + 0.01 + 0.008 + 0.002 + 0.01 + 0.005 + 0.004 + 0.001
= 1.
\[ \sum_{X=0}^{2} f_{XY}(x, 0) = f_{XY}(0,0) + f_{XY}(1,0) + f_{XY}(2,0) = 0.84 + 0.06 + 0.01 = 0.91. \]
(iv) From the given table, we notice that $f_{XY}(0,0) = 0.84$, $f_X(0) = 0.9$ and $f_Y(0) = 0.91$. So we have
$f_X(0) f_Y(0) = 0.819 \neq f_{XY}(0,0)$.
This shows that X and Y are not independent.
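The marginal densities, the independence check, and the expectations and covariance defined earlier can all be computed directly from the joint table of the robots example. The sketch below stores the table as a dictionary keyed by (x, y); this layout and the variable names are assumptions made for illustration.

```python
# joint[(x, y)] = f_XY(x, y): x = defective welds, y = improperly tightened bolts.
joint = {
    (0, 0): 0.84, (1, 0): 0.06,  (2, 0): 0.01,
    (0, 1): 0.03, (1, 1): 0.01,  (2, 1): 0.005,
    (0, 2): 0.02, (1, 2): 0.008, (2, 2): 0.004,
    (0, 3): 0.01, (1, 3): 0.002, (2, 3): 0.001,
}
xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})

# Marginal densities: sum the joint density over the other variable.
f_X = {x: sum(joint[(x, y)] for y in ys) for x in xs}
f_Y = {y: sum(joint[(x, y)] for x in xs) for y in ys}

# Independence requires f_XY(x, y) = f_X(x) f_Y(y) for every pair (x, y).
independent = all(abs(joint[(x, y)] - f_X[x] * f_Y[y]) < 1e-12 for x in xs for y in ys)

E_X = sum(x * p for (x, y), p in joint.items())
E_Y = sum(y * p for (x, y), p in joint.items())
E_XY = sum(x * y * p for (x, y), p in joint.items())
cov = E_XY - E_X * E_Y

print(f_X, f_Y, independent)
print(round(E_X, 4), round(E_Y, 4), round(E_XY, 4), round(cov, 4))
```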
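(v) We find
\[ E[X] = \sum_{X=0}^{2} \sum_{Y=0}^{3} x\, f_{XY}(x, y) = 0.12, \]
\[ E[Y] = \sum_{X=0}^{2} \sum_{Y=0}^{3} y\, f_{XY}(x, y) = 0.148, \]
\[ E[XY] = \sum_{X=0}^{2} \sum_{Y=0}^{3} xy\, f_{XY}(x, y) = 0.064. \]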
Distribution function
The distribution function of (X, Y) is given by
\[ F(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{XY}(u, v)\,dv\,du. \]
Expectation
The expectation or mean of X is defined as
\[ E[X] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x\, f_{XY}(x, y)\,dx\,dy = \mu_X. \]
Covariance
If $\mu_X$ and $\mu_Y$ are the means of X and Y respectively, then the covariance of X and Y, denoted by Cov(X, Y), is defined
as
\[ \mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - E[X]E[Y]. \]
Ex. Let X denote a person's blood calcium level and Y the blood cholesterol level. The joint density function of
(X, Y) is
$f_{XY}(x, y) = k$, $8.5 \le x \le 10.5$, $120 \le y \le 240$.
(i) Find the value of k.
(ii) Find the marginal densities of X and Y.
(iii) Find the probability that a healthy person has a cholesterol level between 150 and 200.
(iv) Are the variables X and Y independent?
(v) Find Cov(X, Y).
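Sol. (i) $f_{XY}(x, y)$ being the joint density function, we have
\[ 1 = \int_{120}^{240} \int_{8.5}^{10.5} k\,dx\,dy = 240k, \]
so k = 1/240.
(ii) The marginal densities are
\[ f_X(x) = \int_{120}^{240} \frac{1}{240}\,dy = \frac{1}{2}, \quad 8.5 \le x \le 10.5, \]
\[ f_Y(y) = \int_{8.5}^{10.5} \frac{1}{240}\,dx = \frac{1}{120}, \quad 120 \le y \le 240. \]
(iii) The probability that a healthy person has a cholesterol level between 150 and 200 is
\[ P[150 \le Y \le 200] = \int_{150}^{200} f_Y(y)\,dy = \frac{5}{12}. \]
(iv) We have
\[ f_X(x) f_Y(y) = \frac{1}{2} \cdot \frac{1}{120} = \frac{1}{240} = f_{XY}(x, y), \]
so X and Y are independent.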
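(v) We find
\[ E[X] = \int_{120}^{240} \int_{8.5}^{10.5} \frac{x}{240}\,dx\,dy = 9.5, \]
\[ E[Y] = \int_{120}^{240} \int_{8.5}^{10.5} \frac{y}{240}\,dx\,dy = 180, \]
\[ E[XY] = \int_{120}^{240} \int_{8.5}^{10.5} \frac{xy}{240}\,dx\,dy = 1710. \]
Hence Cov(X, Y) = E[XY] - E[X]E[Y] = 1710 - (9.5)(180) = 0.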
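\[ E[XY] = \int\!\!\int xy\, f_{XY}(x, y)\,dx\,dy = \int\!\!\int xy\, f_X(x) f_Y(y)\,dx\,dy \]
($f_{XY}(x, y) = f_X(x) f_Y(y)$ as X and Y are given independent)
\[ = \int y f_Y(y) \left( \int x f_X(x)\,dx \right) dy = \int y f_Y(y)\, E[X]\,dy = E[X] \int y f_Y(y)\,dy = E[X]E[Y]. \]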
Note. The converse of the above result need not be true, that is, if E[XY] = E[X]E[Y], then X and Y need not
be independent. For instance, see the following table for the joint density function of a two-dimensional discrete
random variable (X, Y).

Y \ X      1      4      fY(y)
-2         0      1/4    1/4
-1         1/4    0      1/4
1          1/4    0      1/4
2          0      1/4    1/4
fX(x)      1/2    1/2    1
We find that E[X] = 5/2, E[Y] = 0 and E[XY] = 0. So E[XY] = E[X]E[Y]. Next, we see that $f_X(1) = 1/2$,
$f_Y(1) = 1/4$ and $f_{XY}(1, 1) = 1/4$. So $f_X(1) f_Y(1) \neq f_{XY}(1, 1)$, and hence X and Y are not independent.
In fact, we can easily observe the dependency $X = Y^2$. Thus, covariance between X and Y gives only a rough
indication of any association that may exist between X and Y. Also, it does not describe the type or strength of
the association. The strength of the linear relationship between X and Y can be measured by the Pearson
coefficient of correlation.
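The counterexample can be verified exactly: with Y taking the values -2, -1, 1, 2 with probability 1/4 each and X = Y^2, the covariance vanishes although the product of the marginals differs from the joint density. This is a short sketch using exact fractions; the helper name `expect` is illustrative.

```python
from fractions import Fraction

quarter = Fraction(1, 4)
# Joint density concentrated on the curve x = y^2: keys are (x, y) pairs.
joint = {(y * y, y): quarter for y in (-2, -1, 1, 2)}

def expect(g):
    """E[g(X, Y)] under the joint density."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

E_X = expect(lambda x, y: x)
E_Y = expect(lambda x, y: y)
E_XY = expect(lambda x, y: x * y)
cov = E_XY - E_X * E_Y

# Marginals at x = 1 and y = 1, compared with the joint density at (1, 1).
f_X1 = sum(p for (x, y), p in joint.items() if x == 1)
f_Y1 = sum(p for (x, y), p in joint.items() if y == 1)
print(E_X, E_Y, E_XY, cov, f_X1 * f_Y1, joint[(1, 1)])
```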
\[ \rho_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}. \]
It can be proved that $\rho_{XY}$ lies in the range [-1, 1]. Further, $|\rho_{XY}| = 1$ if and only if $Y = \beta_0 + \beta_1 X$ for some
real numbers $\beta_0$ and $\beta_1 \neq 0$.
Note that if $\rho_{XY} = 0$, we say that X and Y are uncorrelated. It does not imply that X and Y are unrelated.
Of course, the relationship, if it exists, would not be linear.
In the robots example, $\sigma_X^2 = 0.146$, $\sigma_Y^2 = 0.268$, Cov(X, Y) = 0.046 and therefore $\rho_{XY} = 0.23$.
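\[ f_{X/y} = \frac{f_{XY}(x, y)}{f_Y(y)}, \quad (f_Y(y) > 0). \]
\[ f_{Y/x} = \frac{f_{XY}(x, y)}{f_X(x)}, \quad (f_X(x) > 0). \]
\[ \mu_{X/y} = E[X/Y = y] = \int_{-\infty}^{\infty} x\, f_{X/y}\,dx. \]
The graph of $\mu_{X/y}$ versus y is called the regression line of X on Y. Similarly, the graph of
\[ \mu_{Y/x} = E[Y/X = x] = \int_{-\infty}^{\infty} y\, f_{Y/x}\,dy \]
versus x is called the regression line of Y on X.
(i) The condition $\int_{27}^{33} \int_{y}^{33} \frac{c}{x}\,dx\,dy = 1$ gives
\[ c = \frac{1}{6 - 27\ln(33/27)}. \]
(ii) $f_X(x) = \int_{27}^{x} \frac{c}{x}\,dy = c\,(1 - 27/x)$, $27 \le x \le 33$,
$f_Y(y) = \int_{y}^{33} \frac{c}{x}\,dx = c\,(\ln 33 - \ln y)$, $27 \le y \le 33$.
We observe that $f_{XY}(x, y) = c/x \neq f_X(x) f_Y(y)$. So X and Y are not independent.
(iii) $P[X \le 30, Y \le 28] = \int_{27}^{28} \int_{y}^{30} \frac{c}{x}\,dx\,dy = 0.15$.
(iv) We have
\[ f_{X/y} = \frac{f_{XY}(x, y)}{f_Y(y)} = \frac{1}{x(\ln 33 - \ln y)}, \quad y \le x \le 33, \]
\[ f_{Y/x} = \frac{f_{XY}(x, y)}{f_X(x)} = \frac{1}{x - 27}, \quad 27 \le y \le x. \]
\[ P[X > 32 \mid y = 30] = \int_{32}^{33} f_{X/y=30}\,dx = \int_{32}^{33} \frac{1}{x(\ln 33 - \ln 30)}\,dx = 0.32. \]
\[ \mu_{X/y=30} = \int_{30}^{33} x\, f_{X/y=30}\,dx = \int_{30}^{33} \frac{1}{\ln 33 - \ln 30}\,dx = 31.48. \]
Curve of regression of X on Y is
\[ \mu_{X/y} = \int_{y}^{33} x\, f_{X/y}\,dx = \int_{y}^{33} \frac{1}{\ln 33 - \ln y}\,dx = \frac{33 - y}{\ln 33 - \ln y}. \]
Curve of regression of Y on X is
\[ \mu_{Y/x} = \int_{27}^{x} y\, f_{Y/x}\,dy = \int_{27}^{x} \frac{y}{x - 27}\,dy = \frac{1}{2}(x + 27). \]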
Chapter 6
The inferential statistics is essentially based on random sampling from the population. So it is important to
understand the meaning of random sample.
Random Sample
A random sample of size n from the distribution of X is a collection of n independent random variables, each with
the same distribution as X.
It may be noted that the term random sample is used in three different but closely related ways in applied
statistics. It may refer to objects for study or to the random variables associated with the selected objects for study
or to the numerical values assumed by the associated random variables as illustrated in the following example.
Suppose we wish to find the mean effective life of lithium batteries used in a particular model of pocket calculator
so that a limited warranty can be placed on the product. For this purpose, we randomly choose n batteries from
the population of batteries. Here, prior to the actual selection of the batteries, the life span $X_i$ (i = 1, 2, ..., n) of
the ith battery is a random variable. It has the same distribution as X, the life span of batteries in the population.
The random variables $X_i$ are independent in the sense that the value assumed by one has no effect on the value
assumed by any other variable. Thus, the random variables $X_1, X_2, \ldots, X_n$ constitute a random sample. For the
selected sample of n batteries, the random variables $X_1, X_2, \ldots, X_n$ shall assume n real values $x_1, x_2, \ldots, x_n$.
In the above example, the selected n batteries, the associated n random variables X1 , X2 , ......, Xn and the
values x1 , x2 , ......, xn assumed by the n random variables, all refer to random sample in the context under
consideration.
Statistics
A statistic is a random variable whose numerical value can be determined from the random sample. In other words,
a statistic is a random variable that is a function of the variables X1 , X2 , ......, Xn in the random sample. Some
statistics are described in the following.
Sample Mean
Let $X_1, X_2, \ldots, X_n$ be a random sample from the distribution of X. Then the statistic $\sum_{i=1}^{n} X_i/n$ is called the sample mean and is denoted by $\bar{X}$. So
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i.$$
Sample Median
Let $x_1, x_2, \ldots, x_n$ be a random sample of observations arranged in order from the smallest to the largest. The sample median is the middle observation if n is odd; otherwise it is the average of the two middle observations.
Sample Variance
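Let $X_1, X_2, \ldots, X_n$ be a random sample of size n from the distribution of X. Then the statistic
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
is called the sample variance. Its positive square root S is the sample standard deviation.
Important Remark: It can be shown that the statistic $\sum_{i=1}^{n} (X_i - \bar{X})^2/n$ tends, on the average, to underestimate $\sigma^2$, the population variance. To improve the situation, $\sum_{i=1}^{n} (X_i - \bar{X})^2$ is divided by n - 1 in place of n. In this way, $S^2$ is unbiased for $\sigma^2$, that is, centred at the right spot. In case $X_1, X_2, \ldots, X_n$ constitute the entire population, then
$$S^2 = \sigma^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}.$$
For computation, the sample variance can also be written as
$$S^2 = \frac{n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n - 1)}.$$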
Sample Range
The sample range is defined as the difference between the largest and the smallest observations.
Ex. A random sample of 9 observations is given as follows:
390 400 406 410 450 395 401 408 415
Find the sample mean, median, variance, standard deviation and range. (Ans. Sample mean = 408.3, Median = 406, Variance = 303.25, Standard deviation = 17.4, Range = 60).
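The answers to this exercise can be reproduced with Python's standard `statistics` module. This is only a sketch: it takes the first observation as 390, the value consistent with the stated mean, variance and range, and `statistics.variance` uses the divisor n - 1 as in the definition of $S^2$.

```python
import statistics

# First observation taken as 390, which matches the stated answers
data = [390, 400, 406, 410, 450, 395, 401, 408, 415]

mean = statistics.mean(data)
median = statistics.median(data)
variance = statistics.variance(data)   # divisor n - 1
std_dev = statistics.stdev(data)       # square root of the sample variance
sample_range = max(data) - min(data)

print(round(mean, 1), median, variance, round(std_dev, 1), sample_range)
```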
Chapter 7
Unbiased point estimator
An estimator $\hat{\theta}$ is an unbiased estimator for a parameter $\theta$ if and only if $E[\hat{\theta}] = \theta$. For example, if $X_1, X_2, \ldots, X_n$ is a random sample of size n from a distribution with mean $\mu$, then the sample mean $\bar{X}$ is an unbiased estimator for $\mu$. For,
$$E[\bar{X}] = E[(X_1 + X_2 + \cdots + X_n)/n] = (E[X_1] + E[X_2] + \cdots + E[X_n])/n = (\mu + \mu + \cdots + \mu)/n = (n\mu)/n = \mu,$$
since $X_1, X_2, \ldots, X_n$ constitute a random sample from the distribution having mean $\mu$, so each of the random variables $X_i$ has mean $\mu$.
It is desirable that the unbiased estimator has a small variance for large sample sizes.
It can be shown that the sample variance
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
is an unbiased estimator for the population variance $\sigma^2$. Also, it can be shown that S is not unbiased for $\sigma$. This emphasizes the fact that unbiasedness is desirable but not essential in an estimator.
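The unbiasedness of $S^2$ can be illustrated exactly on a toy case. The sketch below assumes a hypothetical population {1, 2, 3} with equally likely values and enumerates every ordered sample of size 2 drawn with replacement. Averaging over all samples gives the expected value of each estimator, so no simulation is needed.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]   # hypothetical population, each value equally likely
n = 2                    # sample size

mu = Fraction(sum(population), len(population))
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)

def sum_sq_dev(sample):
    """Sum of squared deviations of the sample values from the sample mean."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample)

# All equally likely ordered samples of size n (sampling with replacement)
samples = list(product(population, repeat=n))

expected_div_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)
expected_div_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)

print(sigma2, expected_div_n_minus_1, expected_div_n)
```

With the divisor n - 1 the average equals the population variance, while the divisor n gives a smaller value, in line with the remark that it underestimates $\sigma^2$ on the average.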
$$\sum_{i=1}^{5} X_i/5 = 20\hat{p} \quad \text{or} \quad 17.8 = 20\hat{p}$$
$$L(\theta) = \prod_{i=1}^{n} f(x_i),$$
known as the likelihood function for the sample. Find the expression for $\theta$ that maximizes the likelihood function.
Note that the likelihood function gives the probability of getting the sample $X_1, X_2, \ldots, X_n$ from the distribution of the random variable X. So we find the value of $\theta$ that maximizes the value of the likelihood function. This value of $\theta$ serves as an estimate for the parameter $\theta$.
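Ex. Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$. The density for X is
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}.$$
The likelihood function is
$$L(\mu, \sigma) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{x_i - \mu}{\sigma}\right)^2} = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{n} e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n} (x_i - \mu)^2},$$
so that
$$\ln L(\mu, \sigma) = -n \ln \sqrt{2\pi} - n \ln \sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{n} (x_i - \mu)^2.$$
Setting the partial derivatives of $\ln L$ with respect to $\mu$ and $\sigma$ equal to zero and solving, we get
$$\mu = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2.$$
Thus, the maximum likelihood estimators for the parameters $\mu$ and $\sigma^2$ are
$$\hat{\mu} = \bar{X} \quad \text{and} \quad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2.$$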
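As a numerical sanity check of these estimators, the sketch below evaluates the normal log-likelihood at the closed-form estimates and at a few nearby parameter values. The data values are hypothetical and chosen only for illustration; the function name `log_likelihood` is likewise an assumption.

```python
import math

def log_likelihood(data, mu, sigma):
    """ln L(mu, sigma) for a normal sample, as in the formula above."""
    n = len(data)
    return (-n * math.log(math.sqrt(2.0 * math.pi))
            - n * math.log(sigma)
            - sum((x - mu) ** 2 for x in data) / (2.0 * sigma ** 2))

data = [8.0, 12.5, 13.4, 14.2, 13.6]   # hypothetical sample
n = len(data)

mu_hat = sum(data) / n                                  # sample mean
sigma2_hat = sum((x - mu_hat) ** 2 for x in data) / n   # divisor n, not n - 1
sigma_hat = math.sqrt(sigma2_hat)

best = log_likelihood(data, mu_hat, sigma_hat)
# Log-likelihood on a small grid of parameter values around the estimates
nearby = [log_likelihood(data, mu_hat + dm, sigma_hat + ds)
          for dm in (-0.1, 0.0, 0.1) for ds in (-0.1, 0.0, 0.1)]

print(best >= max(nearby))
```

Note that the maximum likelihood estimator of $\sigma^2$ uses the divisor n, so it differs from the unbiased sample variance $S^2$.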
Note: The estimator obtained from the method of moments often agrees with the one obtained from the method
of maximum likelihood. If it does not happen in some case, then the maximum likelihood estimator is preferred.
Theorem: Let $X_1$ and $X_2$ be independent random variables with mgf $m_{X_1}(t)$ and $m_{X_2}(t)$ respectively. Let $Y = X_1 + X_2$. Then the mgf of Y is given by
$$m_Y(t) = m_{X_1}(t)\, m_{X_2}(t).$$
Proof: We have
$$m_Y(t) = E[e^{tY}] = E[e^{tX_1 + tX_2}] = E[e^{tX_1}]\,E[e^{tX_2}] = m_{X_1}(t)\, m_{X_2}(t),$$
since $e^{tX_1}$ and $e^{tX_2}$ are independent as $X_1$ and $X_2$ are independent.
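Ex. The mgf of a normal random variable with mean $\mu$ and variance $\sigma^2$ is $m_X(t) = e^{\mu t + \frac{1}{2}\sigma^2 t^2}$. Let $X_1, X_2, \ldots, X_n$ be independent normal variables with means $\mu_1, \mu_2, \ldots, \mu_n$ and variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2$, respectively. Let $Y = X_1 + X_2 + \cdots + X_n$. Then, we have
$$m_Y(t) = \prod_{i=1}^{n} m_{X_i}(t) = e^{\left(\sum_{i=1}^{n} \mu_i\right) t + \frac{1}{2}\left(\sum_{i=1}^{n} \sigma_i^2\right) t^2},$$
which is the mgf of a normal random variable with mean $\sum_{i=1}^{n} \mu_i$ and variance $\sum_{i=1}^{n} \sigma_i^2$.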
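Theorem: Let X be a random variable with mgf $m_X(t)$, and $Y = \alpha + \beta X$. Then the mgf of Y is given by
$$m_Y(t) = e^{\alpha t}\, m_X(\beta t).$$
Proof: We have
$$m_Y(t) = E[e^{tY}] = E[e^{\alpha t + \beta t X}] = E[e^{\alpha t} e^{\beta t X}] = e^{\alpha t} E[e^{\beta t X}] = e^{\alpha t}\, m_X(\beta t).$$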
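Theorem: Let $X_1, X_2, \ldots, X_n$ be a random sample of size n from a normal distribution with mean $\mu$ and variance $\sigma^2$. Then $\bar{X}$ is normally distributed with mean $\mu$ and variance $\sigma^2/n$.
Proof: We know that
$$m_{X_i}(t) = e^{\mu t + \frac{1}{2}\sigma^2 t^2},$$
so the mgf of $X_i/n$ is
$$m_{X_i/n}(t) = m_{X_i}(t/n) = e^{\frac{\mu}{n} t + \frac{1}{2}\left(\frac{\sigma^2}{n^2}\right) t^2}.$$
It follows that
$$m_{\bar{X}}(t) = m_{\frac{X_1}{n} + \frac{X_2}{n} + \cdots + \frac{X_n}{n}}(t) = m_{X_1/n}(t)\, m_{X_2/n}(t) \cdots m_{X_n/n}(t) = e^{\left(\frac{\mu}{n} + \cdots + \frac{\mu}{n}\right) t + \frac{1}{2}\left(\frac{\sigma^2}{n^2} + \cdots + \frac{\sigma^2}{n^2}\right) t^2} = e^{\mu t + \frac{1}{2}\left(\frac{\sigma^2}{n}\right) t^2},$$
which is the mgf of a normal random variable with mean $\mu$ and variance $\sigma^2/n$.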
Confidence interval
A $100(1 - \alpha)\%$ confidence interval for a parameter $\theta$ is a random interval $[L_1, L_2]$ such that $P[L_1 \le \theta \le L_2] = 1 - \alpha$, regardless of the value of $\theta$.
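Let $X_1, X_2, \ldots, X_n$ be a random sample of size n from a normal distribution with mean $\mu$ and variance $\sigma^2$. Then $\bar{X}$ is normally distributed with mean $\mu$ and variance $\sigma^2/n$. Therefore,
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
follows a standard normal distribution. We utilize this fact to find confidence intervals for the unknown $\mu$. Let us find the 95% confidence interval for $\mu$. From the normal probability distribution table, we have
$$P[-1.96 \le Z \le 1.96] = F(1.96) - F(-1.96) = 0.95,$$
or
$$P\left[-1.96 \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le 1.96\right] = 0.95,$$
or
$$P[\bar{X} - 1.96\,\sigma/\sqrt{n} \le \mu \le \bar{X} + 1.96\,\sigma/\sqrt{n}] = 0.95.$$
In general, the $100(1 - \alpha)\%$ confidence interval for $\mu$ is $[L_1, L_2] = [\bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}]$. Here $z_{\alpha/2}$ is the value of $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ such that $P[Z > z_{\alpha/2}] = P[Z < -z_{\alpha/2}] = \alpha/2$. Obviously, $P[-z_{\alpha/2} \le Z \le z_{\alpha/2}] = 1 - \alpha$.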
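The general interval can be sketched in a few lines using `statistics.NormalDist` from the Python standard library to obtain $z_{\alpha/2}$ instead of a table. The function name and the numbers in the usage line (sample mean 50, $\sigma$ = 4, n = 16) are hypothetical and serve only to illustrate the formula.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for the mean of a normal population with known sigma."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # z_{alpha/2}
    half_width = z * sigma / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

z_95 = NormalDist().inv_cdf(0.975)   # close to the table value 1.96

# Hypothetical inputs, for illustration only
lower, upper = z_confidence_interval(sample_mean=50.0, sigma=4.0, n=16)
print(round(z_95, 2), round(lower, 2), round(upper, 2))
```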
Note: If the sample is drawn from a nonnormal distribution, then the following theorem helps us in getting the
confidence intervals for .
Ex. Find the 95% confidence interval for the mean of the population given a sample
8.0 12.5 13.4 14.2 13.6 14.2 8.6 19.0
13.2 14.9 11.5 17.9 13.6 14.5 16.0 17.0
Chapter 8
We have seen how to estimate both mean and variance of a distribution via point estimation. We have also seen
how to construct a confidence interval for the mean of a normal distribution when its variance is assumed to be
known. Unfortunately, in most statistical studies, the assumption that $\sigma^2$ is known is unrealistic. If it is
necessary to estimate the mean of a distribution, then its variance is usually unknown. In what follows we shall
learn how to make inferences on the mean and variance when both of these parameters are unknown.
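The statistic
$$\frac{(n - 1)S^2}{\sigma^2} = \sum_{i=1}^{n} (X_i - \bar{X})^2/\sigma^2$$
follows a chi-square distribution with n - 1 degrees of freedom. Let $\chi^2_{a}$ denote the point of this distribution with area a to its right.
Thus, the 95% confidence interval for $\sigma^2$ is $[L_1, L_2] = [(n - 1)S^2/\chi^2_{0.025},\ (n - 1)S^2/\chi^2_{0.975}]$.
In general, the $100(1 - \alpha)\%$ confidence interval for $\sigma^2$ is $[L_1, L_2] = [(n - 1)S^2/\chi^2_{\alpha/2},\ (n - 1)S^2/\chi^2_{1 - \alpha/2}]$.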
Ex. Find the 95% confidence interval for $\sigma^2$ of a normal population based on the following sample:
3.4 3.0 1.4 3.5 4.2
3.6 3.1 2.0 2.5 1.5
4.0 4.1 3.1 1.7 3.0
0.4 1.4 1.8 5.1 3.9
2.0 2.5 1.6 0.7 3.0
Sol. Here n = 25 and $S^2 = 1.408$. From the $\chi^2$ probability distribution table, for 24 degrees of freedom, we have $\chi^2_{0.025} = 39.4$ and $\chi^2_{0.975} = 12.4$. So the 95% confidence limits are given by
$$L_1 = (n - 1)S^2/\chi^2_{0.025} = 24(1.408)/39.4 = 0.858,$$
$$L_2 = (n - 1)S^2/\chi^2_{0.975} = 24(1.408)/12.4 = 2.725.$$
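The solution can be reproduced with a short sketch. The sample variance is computed from the data with the standard `statistics` module, while the two chi-square critical values are taken from the table as quoted in the solution, since the standard library has no chi-square quantile function.

```python
import statistics

data = [3.4, 3.0, 1.4, 3.5, 4.2, 3.6, 3.1, 2.0, 2.5, 1.5, 4.0, 4.1, 3.1,
        1.7, 3.0, 0.4, 1.4, 1.8, 5.1, 3.9, 2.0, 2.5, 1.6, 0.7, 3.0]

n = len(data)
s2 = statistics.variance(data)   # sample variance with divisor n - 1

# Table values for 24 degrees of freedom, as quoted in the solution
chi2_0025 = 39.4
chi2_0975 = 12.4

lower = (n - 1) * s2 / chi2_0025
upper = (n - 1) * s2 / chi2_0975
print(n, round(s2, 3), round(lower, 3), round(upper, 3))
```

The limits differ from the hand calculation only in the third decimal place, because the solution rounds $S^2$ to 1.408 before dividing.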
1. A random variable X is said to follow a chi-square distribution with $\gamma$ degrees of freedom if its density function is given by
$$f(x) = \frac{1}{\Gamma(\gamma/2)\,2^{\gamma/2}}\, x^{\gamma/2 - 1} e^{-x/2}, \quad x > 0.$$
The sum of independent chi-square random variables is also a chi-square random variable with degrees of freedom equal to the sum of the degrees of freedom of all the independent random variables. Since the square of a standard normal variable is a chi-square random variable with 1 degree of freedom, it follows that if $X_1, X_2, \ldots, X_n$ is a random sample of size n from a normal distribution with mean $\mu$ and variance $\sigma^2$, then $\sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2$ is a chi-square random variable with n degrees of freedom.
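If $X_1, X_2, \ldots, X_n$ is a random sample of size n from a normal distribution with mean $\mu$, then $\frac{\bar{X} - \mu}{S/\sqrt{n}}$ follows a T distribution (see note 2) with n - 1 degrees of freedom.
Denoting $T_{n-1} = \frac{\bar{X} - \mu}{S/\sqrt{n}}$, let us find the 95% confidence interval for $\mu$. Let $-t_{0.025}$ and $t_{0.025}$ denote the values of $T_{n-1}$ such that $P[T_{n-1} \le -t_{0.025}] = 0.025 = P[T_{n-1} \ge t_{0.025}]$. Because of the symmetry of the T distribution, the lower critical point is the negative of the upper one. Obviously, we have
$$P\left[-t_{0.025} \le \frac{\bar{X} - \mu}{S/\sqrt{n}} \le t_{0.025}\right] = 0.95,$$
so the 95% confidence interval for $\mu$ is $[\bar{X} - t_{0.025}\,S/\sqrt{n},\ \bar{X} + t_{0.025}\,S/\sqrt{n}]$.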
Ex. Find the 95% confidence interval for $\mu$ of a normal population based on the following sample:
52.7 62.2 45.3 52.4 43.9 56.5
63.4 38.6 41.7 33.4 53.9 46.1
71.5 61.8 65.5 44.4 47.6 54.3
66.6 60.7 55.1 50.0 70.0 56.4
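Sol. Here n = 24, $\bar{X} = 53.92$ and S = 10.07. From the T probability distribution table, for 23 degrees of freedom, we have $t_{0.025} = 2.069$. So the 95% confidence limits are given by
$$L_1 = \bar{X} - t_{0.025}\,S/\sqrt{n} = 53.92 - 2.069(10.07)/\sqrt{24} = 49.67,$$
$$L_2 = \bar{X} + t_{0.025}\,S/\sqrt{n} = 53.92 + 2.069(10.07)/\sqrt{24} = 58.17.$$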
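A minimal sketch of this calculation follows. The sample mean and standard deviation are computed from the data, and the critical value 2.069 is the table value quoted in the solution, since the standard library offers no T quantile function.

```python
import math
import statistics

data = [52.7, 62.2, 45.3, 52.4, 43.9, 56.5, 63.4, 38.6, 41.7, 33.4, 53.9, 46.1,
        71.5, 61.8, 65.5, 44.4, 47.6, 54.3, 66.6, 60.7, 55.1, 50.0, 70.0, 56.4]

n = len(data)
mean = statistics.mean(data)
s = statistics.stdev(data)   # sample standard deviation, divisor n - 1

t_0025 = 2.069               # table value for 23 degrees of freedom

half_width = t_0025 * s / math.sqrt(n)
lower, upper = mean - half_width, mean + half_width
print(n, round(mean, 2), round(s, 2), round(lower, 2), round(upper, 2))
```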
2. If Z is a standard normal variable and $\chi^2_{\gamma}$ is an independent chi-square random variable with $\gamma$ degrees of freedom, then the random variable $T = Z/\sqrt{\chi^2_{\gamma}/\gamma}$ is said to follow a T distribution with $\gamma$ degrees of freedom.
The density function of a T random variable reads as
$$f(t) = \frac{\Gamma((\gamma + 1)/2)}{\Gamma(\gamma/2)\sqrt{\pi\gamma}} \left(1 + \frac{t^2}{\gamma}\right)^{-(\gamma + 1)/2}, \quad -\infty < t < \infty.$$
The graph of this density function is symmetric about the line t = 0 and tends to the standard normal curve as the number of degrees of freedom increases.
Hypothesis testing
In the theory of hypothesis testing, the experimenter/researcher proposes a hypothesis on a population parameter $\theta$. The hypothesis proposed by the experimenter/researcher is known as the alternative or research hypothesis and is denoted by $H_1$. The negation of $H_1$ is called the null hypothesis and is denoted by $H_0$. While testing a hypothesis on a population parameter $\theta$, the statement of equality $\theta = \theta_0$ (known as the null value of $\theta$) is always included in $H_0$. Further, $H_1$ being the research hypothesis, it is expected that the evidence leads us to reject $H_0$ and thereby to accept $H_1$.
Ex. Highway engineers think that the reflective highway signs do not perform properly because more than 50%
of the automobiles on the road have misaimed headlights. If this contention is supported statistically, a tougher
inspection program will be put into operation. Let p denote the proportion of automobiles with misaimed headlights.
Since the engineers wish to support p > 0.5, the research hypothesis $H_1$ and the null hypothesis $H_0$ are
$H_1: p > 0.5$
$H_0: p \le 0.5$
Note that p = 0.5, the null value of p, is included in $H_0$.
Power: Suppose a researcher puts a great deal of time, effort and money into designing and carrying out an experiment to gather evidence to support a research theory. The researcher would therefore like to have a high probability of rejecting the null hypothesis when the research theory is true. This probability is called the power of the test.
Note that both the probabilities, power and $\beta$, are calculated under the assumption that the research theory is true. The researcher will either fail to reject the null hypothesis with probability $\beta$ or will reject the null hypothesis with probability power. So
$\beta$ + power = 1 or power = 1 - $\beta$.
In the previous example, we found $\beta = 0.392$ under the assumption that the research theory is true (p = 0.7). Therefore, power = 1 - 0.392 = 0.608.
Significance testing
Suppose we want to test
$H_0: p \le 0.1$
$H_1: p > 0.1$
based on a sample of size 20. Let the test statistic be X, the number of successes observed in 20 trials. If p = 0.1, the null value of p, then X follows a binomial distribution with mean E[X] = 20(0.1) = 2. So values of X somewhat greater than 2 will lead to the rejection of the null hypothesis. Suppose we want $\alpha$ to be very small, say 0.0001. From the binomial probability distribution table, we have
$$P[X \ge 9 : p = 0.1] = 1 - P[X \le 8 : p = 0.1] = 1 - 0.9999 = 0.0001.$$
So the critical region of the test is C = {9, 10, ..., 20}. Now suppose we conduct the test and observe 8 successes. This value does not fall into C. So via our rigid rule of hypothesis testing we are unable to reject $H_0$. However, a little thought should make us a bit uneasy with this decision. We find
$$P[X \ge 8 : p = 0.1] = 1 - P[X \le 7 : p = 0.1] = 1 - 0.9996 = 0.0004.$$
It means we are willing to tolerate 1 chance in 10000 of making a Type I error, but we declare 4 chances in 10000 of making such an error too large to risk. There is so little difference between these probabilities that it seems a bit silly to insist on our original cutoff value 9.
Such a problem can be avoided by adopting a technique known as significance testing, where we do not preset $\alpha$ and hence do not specify a rigid critical region. Rather, we evaluate the test statistic and then determine the probability of observing a value of the test statistic at least as extreme as the value noted, under the assumption $\theta = \theta_0$. This probability is known as the critical level, or descriptive level of significance, or P value of the test. We reject $H_0$ if we consider this P value to be small. In case an $\alpha$ level has been preset to ensure that a traditional or industry maximum acceptable level is met, we compare the P value with the preset $\alpha$ value. If $P \le \alpha$, then we can reject the null hypothesis at least at the stated level of significance.
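The two tail probabilities read from the binomial table can be computed directly. The sketch below uses `math.comb` from the standard library; the function name `binomial_upper_tail` is an illustrative choice.

```python
import math

def binomial_upper_tail(k, n, p):
    """Return P[X >= k] for X following a binomial distribution with parameters n and p."""
    return sum(math.comb(n, j) * p ** j * (1 - p) ** (n - j)
               for j in range(k, n + 1))

# P values for observing at least 9 and at least 8 successes when p = 0.1
p_at_least_9 = binomial_upper_tail(9, 20, 0.1)
p_at_least_8 = binomial_upper_tail(8, 20, 0.1)
print(f"{p_at_least_9:.6f} {p_at_least_8:.6f}")
```

Rounded to four decimal places, these agree with the table values 0.0001 and 0.0004 used above.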
Ex. Automotive engineers are using more and more aluminium in manufacturing automobiles in the hope of reducing the cost and improving the petrol mileage. For a particular model, the mileage on the highway has a mean of 26 kmpl with a standard deviation of 5 kmpl. It is hoped that a new design manufactured by using more aluminium will increase the mean petrol mileage on the highway while maintaining the standard deviation of 5 kmpl. So we test the hypothesis
$H_0: \mu \le 26$
$H_1: \mu > 26$ (the new design increases the petrol mileage on the highway)
Suppose 36 vehicles with the new design are tested on the highway and the mean petrol mileage is found to be 28.04 kmpl. Here, n = 36 and the sample mean is $\bar{X} = 28.04$. We choose $\bar{X}$ as the test statistic since $\bar{X}$ is an unbiased estimator for the population mean $\mu$. We know $\bar{X}$ is approximately normally distributed with mean $\mu = 26$ and
I    $H_0: \mu \le \mu_0$    $H_1: \mu > \mu_0$    (Right-tailed test)
II   $H_0: \mu \ge \mu_0$    $H_1: \mu < \mu_0$    (Left-tailed test)
III  $H_0: \mu = \mu_0$    $H_1: \mu \neq \mu_0$    (Two-tailed test)
Tests of hypothesis on $\mu$ are actually conducted by testing $H_0: \mu = \mu_0$ against one of the alternatives $\mu > \mu_0$, $\mu < \mu_0$ and $\mu \neq \mu_0$. In particular, the values of the test statistic that lead us to reject $\mu_0$ and to conclude that $\mu > \mu_0$ will also lead us to reject any value less than $\mu_0$. Similarly, the values of the test statistic that lead us to reject $\mu_0$ and to conclude that $\mu < \mu_0$ will also lead us to reject any value greater than $\mu_0$. For this reason many statisticians prefer to write the above three tests as
I    $H_0: \mu = \mu_0$    $H_1: \mu > \mu_0$    (Right-tailed test)
II   $H_0: \mu = \mu_0$    $H_1: \mu < \mu_0$    (Left-tailed test)
III  $H_0: \mu = \mu_0$    $H_1: \mu \neq \mu_0$    (Two-tailed test)
This emphasizes the fact that while performing a hypothesis test on $\mu$, $\alpha$ is computed assuming that $\mu = \mu_0$. Similarly, while performing a significance test on $\mu$, the P value is computed under the assumption $\mu = \mu_0$.
Ex. The maximum acceptable level for exposure to microwave radiation in Mumbai is an average of 10 microwatts per square centimeter. It is feared that a large television transmitter may be polluting the air nearby by pushing the level of microwave radiation above the safe limit. So we want to test
$H_0: \mu \le 10$
$H_1: \mu > 10$ (unsafe)
Obviously, a right-tailed test is applicable here. Suppose a sample of 25 readings is to be obtained. Then our test statistic $(\bar{X} - 10)/(S/\sqrt{25})$ follows a $T_{24}$ distribution when $\mu = 10$. Let us preset $\alpha$. If we make a Type I error, we shall shut down the transmitter unnecessarily. On the other hand, if we make a Type II error, we shall fail to detect a potential health hazard. We want $\alpha$ small but not so small as to force $\beta$ very large. Let us choose $\alpha = 0.1$. From the T distribution probability table, we find that the critical point of the test is 1.318. Suppose the sample of 25 readings gives $\bar{X} = 10.3$ and S = 2. So the observed value of the test statistic is $(\bar{X} - 10)/(S/\sqrt{25}) = (10.3 - 10)/(2/5) = 0.75$, which is less than the critical value 1.318. Therefore, we are unable to reject $H_0$, and we conclude that the observed data do not support the contention that the transmitter is forcing the average microwave level above the safe limit.
Now, let us find the P value of the test, that is, $P[T_{24} \ge 0.75]$. From the T distribution probability table, we find that $P[T_{24} > 0.685] = 1 - P[T_{24} \le 0.685] = 1 - 0.75 = 0.25$. Also, $P[T_{24} > 1.318] = 1 - P[T_{24} \le 1.318] = 1 - 0.9 = 0.1$. Next, the observed value of the test statistic is 0.75, which lies between 0.685 and 1.318. It follows that the P value of the test, given by $P[T_{24} \ge 0.75]$, is greater than 0.1 but less than 0.25. Since the P value of the test is greater than the preset $\alpha$ value 0.1, we are unable to reject $H_0$ in favor of $H_1$ at the stated level of significance.
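The decision in the microwave example can be sketched as follows. The critical value 1.318 is the table value quoted in the example; the variable names are illustrative only.

```python
import math

n = 25               # number of readings
sample_mean = 10.3   # observed mean
sample_sd = 2.0      # observed standard deviation S
mu_0 = 10.0          # null value of the mean

# Observed value of the test statistic (X-bar - mu_0) / (S / sqrt(n))
t_observed = (sample_mean - mu_0) / (sample_sd / math.sqrt(n))

critical_value = 1.318   # T table, 24 degrees of freedom, alpha = 0.1

# Right-tailed test: reject H0 only if the statistic exceeds the critical value
reject_h0 = t_observed > critical_value
print(round(t_observed, 2), reject_h0)
```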
Ex. See example 8.5.5 from the text book for a twotailed test on mean.
$$\hat{p} = \frac{X}{n}.$$
It serves as an estimator for the population proportion p. Note that $\hat{p} = \bar{X}$, that is, $\hat{p}$ is the mean of the selected random sample. Therefore, by the Central Limit Theorem, $\hat{p}$ is approximately normally distributed with the same mean as each $X_i$ and variance equal to $\mathrm{Var}(X_i)/n$. Now, the density of $X_i$ is given by
$x_i$ :       1      0
$f(x_i)$ :    p      1 - p
So the mean of $X_i$ is $E[X_i] = p$ and $\mathrm{Var}(X_i) = E[X_i^2] - (E[X_i])^2 = p - p^2 = p(1 - p)$. Hence, $\hat{p}$ is approximately normal with mean p and variance p(1 - p)/n. It implies that $Z = (\hat{p} - p)/\sqrt{p(1 - p)/n}$ is approximately a standard normal variable, and the $100(1 - \alpha)\%$ confidence interval on p is $[L_1, L_2]$, where
$$L_1 = \hat{p} - z_{\alpha/2}\sqrt{p(1 - p)/n},$$
$$L_2 = \hat{p} + z_{\alpha/2}\sqrt{p(1 - p)/n}.$$
Since p is unknown, the standard error $\sqrt{p(1 - p)/n}$ is estimated in practice by
$$\sqrt{\hat{p}(1 - \hat{p})/n}.$$
Ex. In a randomly selected sample of 100 bulbs from the output of a factory, 91 bulbs are found to be working fine
without any defect. Find 95% confidence interval on the population proportion of nondefective bulbs.
Sol. Here, n = 100 and $\hat{p} = 91/100 = 0.91$. Also, from the normal table, $z_{0.05/2} = 1.96$. So the 95% confidence limits on the population proportion of non-defective bulbs are
$$L_1 = \hat{p} - z_{\alpha/2}\sqrt{\hat{p}(1 - \hat{p})/n} = 0.91 - 1.96\sqrt{0.91(1 - 0.91)/100} = 0.91 - 0.056 = 0.854,$$
$$L_2 = \hat{p} + z_{\alpha/2}\sqrt{\hat{p}(1 - \hat{p})/n} = 0.91 + 1.96\sqrt{0.91(1 - 0.91)/100} = 0.91 + 0.056 = 0.966.$$
So with 95% confidence, we expect the proportion of non-defective bulbs produced by the factory to lie between 85.4% and 96.6%.
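The bulb example can be reproduced with a short sketch. It obtains $z_{\alpha/2}$ from `statistics.NormalDist` rather than the table and estimates the standard error from the sample proportion; the function name is an illustrative choice.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence=0.95):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)   # z_{alpha/2}
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# 91 non-defective bulbs in a sample of 100
lower, upper = proportion_confidence_interval(successes=91, n=100)
print(round(lower, 3), round(upper, 3))
```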
$$n = \frac{z_{\alpha/2}^2\,\hat{p}(1 - \hat{p})}{d^2}$$
Note that this formula can be used if a prior estimate $\hat{p}$ for p is available. Otherwise, we use the formula
$$n = \frac{z_{\alpha/2}^2}{4d^2}.$$
In case the prior estimate $\hat{p}$ for p is not available, then for the 95% confidence interval with length 0.02 (that is, d = 0.01), we need to select a sample of size
$$n = \frac{(1.96)^2}{4(0.01)^2} = 9604.$$
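Both sample-size formulas can be wrapped in one small function. The sketch below uses exact rational arithmetic (`fractions.Fraction`) so that rounding up to the next integer is not disturbed by floating-point error. The function name and the prior estimate 0.91 in the second call are illustrative assumptions.

```python
import math
from fractions import Fraction

def sample_size(z, d, p_prior=None):
    """Sample size needed to estimate a proportion to within d.

    Without a prior estimate, p(1 - p) is replaced by its maximum value 1/4.
    """
    z = Fraction(str(z))
    d = Fraction(str(d))
    if p_prior is None:
        pq = Fraction(1, 4)
    else:
        p = Fraction(str(p_prior))
        pq = p * (1 - p)
    return math.ceil(z ** 2 * pq / d ** 2)

n_no_prior = sample_size(1.96, 0.01)            # case worked out above
n_with_prior = sample_size(1.96, 0.01, 0.91)    # hypothetical prior estimate
print(n_no_prior, n_with_prior)
```

A prior estimate far from 0.5 reduces the required sample size, since p(1 - p) is then smaller than 1/4.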
$$P[Z \ge 1.70] = 0.0446.$$
It implies that the P value of our test lies between 0.0446 and 0.0455. Considering this small P value, we reject $H_0$ and conclude that p > 0.7.
Note. In the above example, the hypothesis testing on p does not assume that the sample size is large. In fact, the criteria $p_0 = 0.7 > 0.5$ and $n(1 - p_0) = 200(1 - 0.7) = 60 > 5$ are met. So the binomial distribution is approximated by the normal distribution.