
Probability and Stochastic Processes


1. Basic Probability

1.1 Probability can be defined in many ways. It is the chance of occurrence of an outcome,
measured as the ratio of the number of favorable outcomes to the total number of outcomes. It can
also be understood as a measure of the uncertainty in a system. Understanding this uncertainty
helps us understand the system, yielding better and more accurate predictions.

Usually, probability can be interpreted in two ways.

In the frequency interpretation, the probability of an outcome is considered a property of the
outcome in that system, i.e., if the experiment is repeated n times, the relative frequencies give a
good idea of the chances of occurrence of the outcomes. This interpretation is highly favored by
scientists.

In the subjective interpretation, the probability of occurrence is the belief of the person
quoting it. Hence it is more of a personal concept and is favored by philosophers.

Regardless of the interpretation, the mathematics of probability doesn't change.

1.2 Key concepts:

1. A random experiment is one whose outcomes are random, i.e., the outcome cannot be
predicted with certainty prior to experimentation. Rolling a die is a random experiment.
Random variables are quantities determined by the outcome of a random experiment.

2. Performing a random experiment is called a trial. A trial can be performed an infinite number
of times and has a well-defined set of possible outcomes. The result of a trial is termed an event.

3. Sample space is the set of all possible outcomes of an experiment. For the determination of
the sex of a newborn the sample space S={girl,boy}. Hence an event is a subset of S.

4. If E and F are two events, their union event EUF is defined as the set containing all outcomes
that are either in E or in F, or in both E and F, i.e., EUF occurs if E occurs, F occurs, or both E and F
occur. This can be remembered with the term at least. If a die is rolled once, S={1,2,3,4,5,6}.
Let E={multiples of 2}={2,4,6} and F={multiples of 3}={3,6}; then EUF={2,3,4,6}

5. If E and F are two events, their intersection event EF is defined as the set containing all
outcomes present in both E and F, i.e., EF occurs if and only if both E and F occur. This can be
remembered with the term all. For the above sample space S and events E and F, EF={6}
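As a quick check, the union and intersection above can be sketched with Python sets (a minimal illustration mirroring the die-roll example):

```python
# Sample space for a single die roll, with events E and F from the text.
S = {1, 2, 3, 4, 5, 6}
E = {x for x in S if x % 2 == 0}   # multiples of 2
F = {x for x in S if x % 3 == 0}   # multiples of 3

print(E | F)  # union EUF -> {2, 3, 4, 6}
print(E & F)  # intersection EF -> {6}
```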


6. E and F are said to be mutually exclusive events if EF=Φ, i.e., E and F cannot both occur in the
same trial. For the above sample space S, if E={even numbers} and F={odd numbers}, EF=Φ.

7. E and F are said to be collectively exhaustive if EUF=S, i.e., if E and F together cover the
entire sample space. Subsets E1,E2,…En are collectively exhaustive if E1UE2U…UEn=S, i.e., UEi=S
for i=1,2,..n. For the above sample space S, if E={1,2,3,4} and F={4,5,6}, then EUF=S (note that E
and F overlap, so events can be collectively exhaustive without being mutually exclusive).

If, for the above sample space S, E={odd numbers} and F={even numbers}, then E and F are both
mutually exclusive and collectively exhaustive.

8. For an event E, the complement of E, Ec, consists of all outcomes in the sample space that are
not present in E. Hence, EUEc = S

9. Equally likely outcomes have equal chance of occurrence and no outcome occurs in
preference to others.

10. Favourable outcome is the outcome of interest in the random event.

11. Independent events are events whose outcomes don't affect the outcomes of the other
events, i.e., the occurrence of one does not change the probability of the other.

1.3 Algebra of Events

Unions, intersections and complements follow the commutative, associative and distributive laws.

In short,

1. Commutative: E U F = F U E ; EF = FE

2. Associative: E U (F U G) = (E U F) U G ; E(FG) = (EF)G

3. Distributive: (E U F)G = EG U FG ; EF U G = (E U G)(F U G)

DeMorgan's laws:

(E U F)c = Ec Fc

(EF)c = Ec U Fc
1.4 Definitions of probability

Classical definition:

Consider a random experiment that results in n exhaustive, mutually exclusive and equally likely
outcomes, of which m are favourable to event E. Then the probability of E is given by


P(E) = number of favourable outcomes/total number of outcomes

= m/n

Statistical definition:

Suppose an event E occurs r times in N repetitions of a random experiment. The ratio r/N
gives the relative frequency of the event E, and for large N it does not vary appreciably from one
series of trials to another. In the limiting case, as N becomes sufficiently large, it more or less
settles to a number, which is called the probability of E.

P(E) = lim N->inf (r/N)
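The statistical definition can be illustrated by simulation. This is a minimal sketch assuming a fair coin (so the true probability is 0.5), showing r/N settling as N grows:

```python
import random

random.seed(1)

# Estimate P(heads) for a fair coin by the relative frequency r/N.
# As N grows, r/N settles near the true probability 0.5.
for N in (100, 10_000, 1_000_000):
    r = sum(random.random() < 0.5 for _ in range(N))
    print(N, r / N)
```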

1.5 Axioms of probability

Given a sample space S of a random experiment, the probability of the occurrence of any event E
is defined as a set function P(E) satisfying the following axioms

1. 0 ≤ P(E) ≤ 1

2. P(S) = 1

3. For any sequence of mutually exclusive events E1, E2,…

P(U i=1 to inf Ei) = ∑ i=1 to inf P(Ei)


1. Since m, n ≥ 0 and m ≤ n:

P(E) = 0 when m=0 (impossible/null event)

P(E) = 1 when m=n (definite/certain event)

Hence, 0 ≤ P(E) ≤ 1

2. The probability of non-occurrence of E, denoted Ē or Ec, is given by

P(Ē) = (n-m)/n

= 1 - (m/n)

= 1 - P(E)



P(E) + P(Ē) = 1

P(E U Ē) = 1

P(S) = 1

3. P(E U F) = P(E) + P(F) – P(EF)

For mutually exclusive events, P(EF) = 0, so

P(UEi) = P(E1) + P(E2) + …

= ∑ i=1 to n P(Ei)


1. For any 2 events A & B, the probability of occurrence of at least one of them is given by
P(AUB) = P(A) + P(B) – P(AB). This is the addition theorem of probability.

2. If 2 events A & B are independent, P(AB) = P(A) * P(B)

3. If A & B are disjoint, P(AUB) = P(A) + P(B), as P(AB) = 0

4. For events A, B & C, P(AUBUC) = P(A) + P(B) + P(C) – P(AB) – P(BC) – P(CA) + P(ABC)

Formulae to remember:

1. P(EUF) = P(E) + P(F) – P(EF)

If mutually exclusive, P(EF) = 0

2. P(EF) = P(E)*P(F) if E and F are independent.

3. P(E) + P(Ē) = 1

4. nCr represents the number of different groups of size r that can be selected from a set of size n
where the order of selection is not considered relevant.
nCr = n! / (r!*(n-r)!)
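Formula 4 can be sketched directly (the `nCr` helper below is illustrative; Python's standard library already provides the same count as `math.comb`):

```python
from math import comb, factorial

# nCr = n! / (r! * (n-r)!)
def nCr(n, r):
    return factorial(n) // (factorial(r) * factorial(n - r))

print(nCr(5, 2))                 # 10
print(nCr(5, 2) == comb(5, 2))   # True: matches the stdlib implementation
```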


Q1. A total of 28% of American males smoke cigarettes, 7% smoke cigars, and 5% smoke both
cigars and cigarettes. What percentage of males smoke neither cigarettes nor cigars?

Ans. Let E be the event that a randomly chosen male is a cigarette smoker.

Let F be the event that a randomly chosen male is a cigar smoker.

The probability that this person smokes either cigarettes or cigars (or both) is

P(EUF) = P(E) + P(F) – P(EF)

= 0.28 + 0.07 - 0.05

= 0.3

Thus the probability that this person smokes neither is the complement, 1 - 0.3 = 0.7.

Hence, 70% of American males smoke neither cigarettes nor cigars.

Q2. Flip a fair coin three times. What is the probability that you get

a. exactly 3 heads

b. exactly 2 heads

c. at most 2 heads

d. at least 2 heads


a. P(exactly 3 H) = 1/8 (by counting)

or, since the flips are independent,

P(3H) = P(H) * P(H) * P(H)

= (1/2)^3

= 1/8

b. P(2H) = 3/8 (by counting)


or, P(2H) = (number of ways to choose which 2 flips are heads) * P(H) * P(H) * P(T)

= 3 * (1/2) * (1/2) * (1/2) (3C2 = 3 ways to place the two heads among three flips)

= 3/8



c. P(at most 2H) = P(0H) + P(1H) + P(2H)

= 1 – P(3H)

= 1- (1/8)

= 7/8

d. P(at least 2H) = P(2H) + P(3H)

= (3/8) + (1/8)

= 1/2
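The counting arguments above can be checked by brute-force enumeration of the 8 equally likely outcomes (a sketch; the `p` helper is illustrative):

```python
from itertools import product

# Enumerate all 2^3 equally likely outcomes of three coin flips.
outcomes = list(product("HT", repeat=3))

def p(event):
    # Probability of an event = favorable outcomes / total outcomes.
    return sum(1 for o in outcomes if event(o)) / len(outcomes)

print(p(lambda o: o.count("H") == 3))   # exactly 3 heads -> 1/8
print(p(lambda o: o.count("H") == 2))   # exactly 2 heads -> 3/8
print(p(lambda o: o.count("H") <= 2))   # at most 2 heads -> 7/8
print(p(lambda o: o.count("H") >= 2))   # at least 2 heads -> 1/2
```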


The above numerical will be solved using probability distribution functions in later exercises.

Also, refer to Khan Academy videos for further explanation.

Q3. A committee of size 5 is to be selected from a group of 6 men and 9 women. If the selection
is random, what is the probability that the committee consists of 3 men and 2 women?

Ans. Randomly selected means each of the 15C5 combinations is equally likely.

Men: 6C3 choices for 3 men

Women: 9C2 choices for 2 women

Hence, P = (6C3*9C2) / 15C5 (we multiply the counts because both selections must occur together, i.e., 'and')

= 240/1001
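A quick sketch verifying the arithmetic with exact fractions:

```python
from math import comb
from fractions import Fraction

# P(3 men and 2 women) = 6C3 * 9C2 / 15C5
p = Fraction(comb(6, 3) * comb(9, 2), comb(15, 5))
print(p)  # 240/1001
```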

Q4. The birthday problem

If n people are present in a room, what is the probability that at least one pair share the same
birthday? How large must n be for this probability to exceed ½?

Ans. P(At least two people share a birthday) = 1 – P(all unique birthdays)

The first person can have any of the 365 days as their birthday, i.e., (365-0) choices. The second
person can only have 364 days, i.e., (365-1). Hence, the nth person has (365-n+1) days. The total
sample space is 365^n, as each person can have their birthday on any one of the 365 days.

Hence, P(at least one shared birthday) = 1 – ((365*364*…*(365-n+1))/365^n)

We can approximate the product using a Taylor series.

For n = 23, P ≈ 1 – 0.4995

= 0.5005 > 1/2

This shows that when n ≥ 23, P > 1/2. We need to remember that each specific pair of individuals
shares a birthday with probability 1/365. In a room of 23 people, there are 23C2 = 253 different
pairs of individuals. Hence the result is no longer surprising.
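The birthday probability can also be computed exactly rather than via the Taylor approximation (a sketch; the exact value for n = 23 is about 0.507, slightly above the 0.5005 Taylor estimate):

```python
# P(at least one shared birthday) = 1 - 365*364*...*(365-n+1) / 365^n
def p_shared(n):
    p_unique = 1.0
    for k in range(n):
        p_unique *= (365 - k) / 365
    return 1 - p_unique

print(p_shared(22))  # just under 1/2
print(p_shared(23))  # about 0.5073, the first n where P exceeds 1/2
```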

1.6 Conditional probability

Conditional probability deals with the probability associated with an event, given that a related
event has occurred. We recalculate the probabilities when additional information is provided.

We deal here with dependent events.

The sample points in the sample space reduce when conditional probability is applied.

Consider a die rolled twice. Its sample space has 36 outcomes. Given that the first
roll showed 3, what is the probability that the sum of the two rolls equals 8?

Here, the sample space reduces to S = {(3,1) , (3,2) , (3,3) , (3,4) , (3,5) , (3,6)}

As all the outcomes are equally likely, our desired probability will be 1/6.

This conditional probability of E given that F has occurred is denoted by P(E|F)

Within this reduced sample space, the outcomes favorable to E are exactly those in the intersection of our two events E and F.

Hence, P(E|F) = P(EF)/P(F)


Q5. There are 5 defective (fails immediately), 10 partially defective (fails after few hours) and
25 acceptable transistors. A transistor is chosen at random and tested. If it does not immediately
fail, what is the probability that it is acceptable?

Ans. P(acceptable | not defective) = P(acceptable, not defective) / P(not defective)

= P(acceptable) / P(not defective)

(Acceptable and not defective implies acceptable)

= 25/35

Q6. There is a father-daughter dinner to be held at Sharma's workplace for all employees having
at least 1 daughter. If Sharma (who has two children) is invited, what is the probability that he has 2 daughters?

Ans. Since Sharma is invited, we know that he has at least 1 daughter. Here, the sample space
reduces from S = {(g,g) , (g,b) , (b,g) , (b,b)} to S' = {(g,g) , (g,b) , (b,g)}

Let A denote the event that at least one child is a girl and B the event that both are girls.


P(B|A) = P(BA) / P(A)

= (1/4) / (3/4)

= 1/3


1.7 Bayes’ Formula

P(E) = P(E|F)P(F) + P(E|Fc)[1-P(F)]

This identity (the law of total probability) states that the probability of an event E is a weighted
average of the conditional probability of E given that F has occurred and the conditional
probability of E given that F has not occurred; Bayes' formula follows from it together with the
definition of conditional probability.


Let E and F be two events.

E can be written as E = EF U EFc, and these two pieces are mutually exclusive. Hence

P(E) = P(EF) + P(EFc)

= P(E|F)P(F) + P(E|Fc)P(Fc)

= P(E|F)P(F) + P(E|Fc)[1-P(F)]

Formulae to remember:

1. P(E|F) = P(EF)/P(F)

2. P(E) = P(E|F)P(F) + P(E|Fc)[1-P(F)]

3. P(E|F) = P(F|E)P(E) / (P(F|E)P(E) + P(F|Ec)P(Ec))
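Formula 3 can be sketched as a small helper (the function name `bayes` is illustrative); the numbers fed in below are the accident-prone figures from Q7:

```python
# P(E|F) = P(F|E)P(E) / (P(F|E)P(E) + P(F|Ec)P(Ec))
def bayes(p_e, p_f_given_e, p_f_given_not_e):
    num = p_f_given_e * p_e
    return num / (num + p_f_given_not_e * (1 - p_e))

# Q7 figures: P(accident prone) = 0.3, P(accident | prone) = 0.4,
# P(accident | not prone) = 0.2
print(bayes(0.3, 0.4, 0.2))  # 0.12 / 0.26 ≈ 0.4615
```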

Q7. An insurance company has statistics that show that an accident-prone person will have an
accident at some time within a fixed 1-year period with probability .4, whereas this probability
decreases to .2 for a non-accident-prone person. If we assume that 30 percent of the population is
accident prone, what is the probability that a new policy holder will have an accident within a
year of purchasing a policy? If he has an accident within a year of purchasing his policy, what is the
new probability that he is accident prone?

Ans. Let A denote the event that the person has an accident within a year. Let B denote that he is
accident prone. Hence Bc denotes he isn't accident prone.

P(A) = P(A|B)P(B) + P(A| BC)P(BC)

= (0.4)(0.3) + (0.2)(0.7)

= 0.26


P(B|A) = P(BA) / P(A)

= P(A|B)P(B) / P(A)

= (0.4)(0.3) / 0.26

= 6/13 ≈ 0.46


Q8. A laboratory blood test is 99 percent effective in detecting a certain disease when it is, in
fact, present. However, the test also yields a “false positive” result for 1 percent of the healthy
persons tested. If .5 percent of the population actually has the disease, what is the probability a
person has the disease given that his test result is positive?

Ans. Let D be the event that the tested person has the disease and E the event that his test result
is positive. The desired probability is

P(D|E) = P(DE) / P(E)

= P(E|D)P(D) / (P(E|D)P(D) + P(E|Dc)P(Dc) )

= (.99)(.005) / ((.99)(.005) + (.01)(.995))

= .3322

2. Probability Distribution

2.1 Introduction to probability distribution

A probability distribution is a table or an equation that links each outcome of a statistical
experiment with its probability of occurrence.

Let us first consider a random variable. Those quantities of interest that are determined by the
result of the experiment are known as random variables.

 Random variables whose set of possible values can be written either as a finite sequence
x1, ... , xn, or as an infinite sequence x1, ... are said to be discrete. Eg. Number of errors
in a page.
 If the random variables take on a continuum of possible values, these are known as
continuous random variables. Eg. Lifetime of a car.
 For a discrete random variable X , we define the probability mass function p(a) of X by

p(a) = P{X = a}



p(xi) > 0, i = 1, 2, ... and,

∑i=1 to inf p(xi) = 1

 For a continuous random variable X, we define the probability density function f(x) by

P{X ∈ B} = ∫B f(x) dx, for any set B of real numbers.


1 = P{X ∈ (-inf, +inf)} = ∫-inf to +inf f(x) dx and

P{X = a} = ∫a to a f(x) dx = 0, i.e., the probability of any single exact value is 0.

2.2 Expectation

If X is a discrete random variable taking on the possible values x1, x2, ... , then the expectation
or expected value of X , denoted by E[X ], is defined by

E[X] = ∑i xiP{X = xi}

The expected value of X is a weighted average of the possible values that X can take on, each
value being weighted by the probability that X assumes it.

 For instance, if the probability mass function of X is given by p(0) = 1/3 , p(1) = 2/3,

E[X] = 0*(1/3) + 1*(2/3)

= 2/3

 Even though we call E[X] the expectation of X , it should not be interpreted as the value
that we expect X to have but rather as the average value of X in a large number of
repetitions of the experiment. The concept of expectation is analogous to the physical
concept of the center of gravity of a distribution of mass.
 If we roll a fair die and calculate the expectation,

E[X] = 1*(1/6) + 2*(1/6) + 3*(1/6) + 4*(1/6) + 5*(1/6) + 6*(1/6)

= 7/2

This means that as the number of rolls tends to infinity, the average of the outcomes approaches 7/2.
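The die expectation above can be verified exactly and by simulation (a sketch; `fractions.Fraction` keeps the arithmetic exact):

```python
from fractions import Fraction
import random

# E[X] = sum of x_i * P(X = x_i) for a fair die
E = sum(x * Fraction(1, 6) for x in range(1, 7))
print(E)  # 7/2

# The long-run average of many rolls approaches E[X].
random.seed(0)
rolls = [random.randint(1, 6) for _ in range(100_000)]
print(sum(rolls) / len(rolls))  # close to 3.5
```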


2.2.1 Properties of expectation

 E[aX + b] = aE[X] + b

 E[X + Y ] = E[X] + E[Y]
 E[XY] = E[X]E[Y] , if X & Y are independent.

2.3 Variance

The possible variation of X could be determined by looking at how far apart X is from its mean
on the average.

If X is a random variable with mean μ, then the variance of X, denoted by Var(X), is defined by

Var(X) = E[(X − μ)^2]

Expanding the square gives the equivalent form

Var(X) = E[X^2] − (E[X])^2

i.e., the variance of X is equal to the expected value of the square of X minus the square of the
expected value of X.

For the roll of a fair die, the variance is computed as follows

E[X^2] = 1^2*(1/6) + 2^2*(1/6) + 3^2*(1/6) + 4^2*(1/6) + 5^2*(1/6) + 6^2*(1/6)

= 91/6

We know that E[X] = 7/2. Hence, (E[X])^2 = 49/4

Var(X) = E[X^2] − (E[X])^2

= (91/6) – (49/4)

= 35/12
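A sketch verifying the die-roll variance with exact fractions:

```python
from fractions import Fraction

# Var(X) = E[X^2] - (E[X])^2 for a fair die
EX  = sum(x * Fraction(1, 6) for x in range(1, 7))       # 7/2
EX2 = sum(x * x * Fraction(1, 6) for x in range(1, 7))   # 91/6
print(EX2 - EX ** 2)  # 35/12
```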

2.3.1 Properties of variance

 Var(aX + b) = a2 Var(X )
 The quantity sqrt(Var(X )) is called the standard deviation of X .

 The covariance of two random variables X and Y , written Cov(X, Y), is defined by
Cov(X, Y) = E[(X − μx )(Y − μy)]
where μx and μy are the means of X and Y , respectively.
 Cov(X, Y) = E[XY] − E[X]E[Y]
 Cov(X, X) = Var(X)
 Cov(X + Z, Y) = Cov(X, Y) + Cov(Z, Y)
 Var(X + Y) = Var(X) + Var(Y) + 2Cov(X , Y)
 The strength of the relationship between X and Y is indicated by the correlation between
X and Y, a dimensionless quantity obtained by dividing the covariance by the product of
the standard deviations of X and Y.

Corr(X,Y) = Cov(X,Y) / sqrt(Var(X)Var(Y))
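The covariance and correlation definitions can be sketched on a small data set (illustrative values; ys is an exact linear function of xs, so the correlation should come out to 1):

```python
from math import sqrt

# Population covariance/correlation from paired data (dividing by n).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # ys = 2*xs, so correlation is exactly 1

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
var_x = sum((x - mx) ** 2 for x in xs) / n
var_y = sum((y - my) ** 2 for y in ys) / n

print(cov)                          # 2.5
print(cov / sqrt(var_x * var_y))    # 1.0
```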

2.4 Few probability distributions

As mentioned, a probability distribution is a table or an equation that links each outcome of a

statistical experiment with its probability of occurrence. We shall go through 6 probability
distributions essential for data analysis.

2.4.1 Bernoulli Distribution

A Bernoulli distribution has only two possible outcomes, namely 1 (success) and 0 (failure), and
a single trial. So the random variable X which has a Bernoulli distribution can take value 1 with
the probability of success, say p, and the value 0 with the probability of failure, say q or 1-p.

 The probability mass function is given by: p(x) = p^x * (1-p)^(1-x), where x ∈ {0, 1}.

 The probabilities of success and failure need not be equal

 The expectation is given by: E(X) = 1*p + 0*(1-p) = p

 The variance is given by: V(X) = E(X²) – [E(X)]² = p – p² = p(1-p)


Eg. Winning or losing a match.

2.4.2 Uniform Distribution

When the probabilities of all the possible outcomes are equally likely, the random variable is said
to follow a uniform distribution.
 The probability density function is f(x) = 1 / (b-a) for a ≤ x ≤ b, with -inf < a < b < +inf
 a and b are the parameters
 Expectation is given by E(X) = (a+b) / 2
 Variance is given by V(X) = (b-a)2 / 12
 The standard uniform distribution has parameters a = 0 and b = 1.

Eg. Rolling a die

Q1. The number of bouquets sold daily at a flower shop is uniformly distributed with a
maximum of 40 and a minimum of 10. Calculate the probability that the daily sales will fall
between 15 and 30. What is the probability that the daily sales will be greater than 20?

Ans. a = 10, b = 40

P(between 15 and 30) = (30-15) * 1/(40-10)

= 15/30 = 0.5

P(greater than 20) = (40-20) * 1/(40-10)

= 20/30 ≈ 0.67

2.4.3 Binomial Distribution

A distribution where only two outcomes are possible, such as success or failure, and where the
probabilities of success and failure are the same for all trials is called a Binomial Distribution.


 The outcomes need not be equally likely.

 Each trial is independent

 The probability mass function is given by: P(X = x) = nCx * p^x * q^(n-x), where q = 1 - p

 The expectation is given by E(X) = n*p

 The variance is given by V(X) = n*p*q

Q2. Let's consider the same coin toss numerical from the last chapter.

Flip a fair coin three times. What is the probability that you get

a. exactly 3 heads

b. exactly 2 heads

c. at most 2 heads

d. at least 2 heads


There are 2 possible outcomes. Let H denote success and T denote failure. Here, p = q = 1/2

a. P(exactly 3H) = nCx * p^x * q^(n-x)

= 3C3 (1/2)^3 (1/2)^0

= 1/8

b. P(exactly 2H) = 3C2 (1/2)^2 (1/2)^1

= 3/8


c. P(at most 2H) = P(0H) + P(1H) + P(2H)

= 3C0 (1/2)^0 (1/2)^3 + 3C1 (1/2)^1 (1/2)^2 + 3C2 (1/2)^2 (1/2)^1

= 1/8 + 3/8 + 3/8

= 7/8


or, equivalently, P(at most 2H) = 1 – P(3H)

= 1 - 3C3 (1/2)^3 (1/2)^0

= 1 – 1/8

= 7/8


d. P(at least 2H) = P(2H) + P(3H)

= 3C2 (1/2)^2 (1/2)^1 + 3C3 (1/2)^3 (1/2)^0

= 3/8 + 1/8

= 1/2
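The binomial pmf used in Q2 can be sketched as a helper (the name `binom_pmf` is illustrative; exact fractions keep the answers in the same form as above):

```python
from math import comb
from fractions import Fraction

# Binomial pmf: P(X = x) = nCx * p^x * q^(n-x), here n = 3 and p = q = 1/2
def binom_pmf(n, x, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

half = Fraction(1, 2)
print(binom_pmf(3, 3, half))                          # 1/8  (exactly 3 heads)
print(binom_pmf(3, 2, half))                          # 3/8  (exactly 2 heads)
print(sum(binom_pmf(3, x, half) for x in (0, 1, 2)))  # 7/8  (at most 2 heads)
```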

2.4.4 Normal Distribution

The normal distribution describes the behavior of many situations in the universe. A
distribution is a normal distribution if it has the following characteristics:

1. The mean, median and mode of the distribution coincide.

2. The curve of the distribution is bell-shaped and symmetrical about the line x=μ.
3. Exactly half of the values are to the left of the center and the other half to the right.

 The probability density function is given by

f(x) = (1 / (σ * sqrt(2π))) * e^(-(x-μ)^2 / (2σ^2)), for -inf < x < +inf

 The expectation is given by μ, the mean.

 The variance is given by σ^2, where σ is the standard deviation.
 µ and σ are the parameters.
 A standard normal distribution is defined as the distribution with mean 0 and standard
deviation 1.
 The empirical rule tells you what percentage of your data falls within a certain number
of standard deviations from the mean:
• 68% of the data falls within one standard deviation of the mean.
• 95% of the data falls within two standard deviations of the mean.
• 99.7% of the data falls within three standard deviations of the mean.
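The empirical rule can be checked with the normal CDF: for a standard normal Z, P(-k ≤ Z ≤ k) = erf(k/√2). A sketch using only the standard library:

```python
from math import erf, sqrt

# P(|X - mu| <= k*sigma) for a normal distribution equals erf(k / sqrt(2)).
for k in (1, 2, 3):
    print(k, round(erf(k / sqrt(2)), 4))  # 0.6827, 0.9545, 0.9973
```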

2.4.5 Poisson Distribution

Poisson Distribution is applicable in situations where events occur at random points of time and
space wherein our interest lies only in the number of occurrences of the event.

A distribution is called Poisson distribution when the following assumptions are valid:

1. Any successful event should not influence the outcome of another successful event.
2. The probability of success in an interval is proportional to the length of the interval.
3. The probability of more than one success in a very small interval is negligible (it approaches
zero as the interval shrinks).

Let µ denote the mean number of events in an interval of length t. Then, µ = λ*t, where λ denotes
the rate at which event occurs. When t = 1 unit, µ = λ

 The probability mass function is given by P(X = x) = e^(-µ) * µ^x / x!, for x = 0, 1, 2, …

 The expectation is given by µ

 The variance is given by µ
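A sketch of the Poisson pmf (the name `poisson_pmf` is illustrative), checking that the probabilities sum to 1 and that the mean equals µ:

```python
from math import exp, factorial

# Poisson pmf: P(X = x) = e^(-mu) * mu^x / x!
def poisson_pmf(mu, x):
    return exp(-mu) * mu**x / factorial(x)

mu = 2.0
probs = [poisson_pmf(mu, x) for x in range(50)]  # tail beyond 50 is negligible
print(sum(probs))                               # ≈ 1
print(sum(x * p for x, p in enumerate(probs)))  # mean ≈ mu = 2
```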


2.4.6 Exponential distribution

Exponential distribution is widely used for survival analysis.

 The probability density function is given by f(x) = λ * e^(-λx), x ≥ 0, with parameter λ > 0,
which is called the rate.
 For survival analysis, λ is called the failure rate of a device at any time t, given that it has
survived up to t.
 The expectation is given by 1/ λ
 The variance is given by (1/ λ)2

2.4.7 Geometric Distribution

It is a discrete analog of the exponential distribution.

 The geometric distribution is a discrete distribution on x = 0, 1, 2, … with probability

mass function P(X = x) = p * (1-p)^x

 The expectation is given by (1-p) / p

 The variance is given by (1-p) / p2


Topics to be covered:

Hypergeometric Distribution
Central Limit Theorem: Refer YouTube videos
Least square method
Chi-square statistic: Refer ''
Hypothesis testing, Type I and Type II errors: Refer ''
Markov chains: Refer '', visually explained; also, PBS Infinite series, YouTube
Numericals: Refer Statistics text books, Khan Academy, YouTube


Bayesian vs frequentist statistics: YouTube

Continuous and Discrete infinities: One Two Three... Infinity by George Gamow
Conditional Probability: Balls in a box lecture, B V Rao, YouTube
Random walk: PBS Infinite series, YouTube
Sample Selection: ''


1. Introduction to Probability and Statistics, Sheldon M Ross

2., a blog

3. Khan Academy, YouTube


5. PBS Infinite series, YouTube