Probability for CS
Lecture 1: Probability - REVISION
Logistics
Dr. Van Khanh Nguyen
MS. Quoc Le
Textbooks:
A First Course in Probability (Sheldon Ross)
Probability and Computing: Randomized Algorithms and Probabilistic Analysis (Mitzenmacher & Upfal)
Probability
Mathematical tools to deal with uncertain events.
Applications include:
Web search engines: Markov chain theory
Data mining, machine learning: stochastic gradient descent, Markov chain Monte Carlo
Image processing: Markov random fields
Design of wireless communication systems: random matrix theory
Optimization of engineering processes: simulated annealing, genetic algorithms
Finance (option pricing, volatility models): Monte Carlo, dynamic models
Design of the atomic bomb (Los Alamos): Markov chain Monte Carlo
Plan of the course
Probability
Combinatorial analysis; i.e. counting
Axioms of probability
Conditional probability and inference
Discrete & continuous random variables
Multivariate random variables
Properties of expectation, generating function
Additional topics: Poisson and Markov processes
Simulation and Monte Carlo methods
Applications
Application: Verifying Polynomial Identities
Computers can make mistakes:
Incorrect programming
Hardware failures
Sometimes we can use randomness to check the output.
Example: we want to check a program that multiplies together monomials.
E.g.: (x+1)(x-2)(x+3)(x-4)(x+5)(x-6) ?= x^6 - 7x^3 + 25
Probability for CS
How to use randomness
Assume the max degree of F and G is d. Use this algorithm:
Pick a uniform random number r from {1, 2, 3, …, 100d}
If F(r) = G(r) then output “equivalent”, otherwise output “non-equivalent”
Note: this is much faster than the previous way – O(d) vs. O(d^2)
One-sided error:
“non-equivalent” is always correct
“equivalent” can be wrong
How it can be wrong:
if we accidentally pick a root of F(x) - G(x) = 0
This can occur with probability at most 1/100, since F - G has at most d roots among the 100d candidate values
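The one-shot test can be sketched as follows (the helper names and polynomial representations are illustrative, not from the slides):

```python
import random

def poly_eval(coeffs, x):
    """Evaluate a polynomial from low-to-high coefficients (Horner's rule)."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def prod_eval(offsets, x):
    """Evaluate a product of monomials (x + a) for each offset a."""
    result = 1
    for a in offsets:
        result *= (x + a)
    return result

def check_identity(offsets, coeffs, d):
    """Randomized one-sided test: 'non-equivalent' is always correct,
    'equivalent' is wrong with probability at most 1/100."""
    r = random.randint(1, 100 * d)
    if prod_eval(offsets, r) == poly_eval(coeffs, r):
        return "equivalent"
    return "non-equivalent"

# F(x) = (x+1)(x-2)(x+3)(x-4)(x+5)(x-6) vs. G(x) = x^6 - 7x^3 + 25
offsets = [1, -2, 3, -4, 5, -6]
g = [25, 0, 0, -7, 0, 0, 1]   # coefficients of G, lowest degree first
print(check_identity(offsets, g, 6))
```

Evaluating at one point costs O(d) multiplications, versus O(d^2) for expanding the product and comparing coefficients.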
Axioms of probability
We need a formal mathematical setting for analyzing randomized processes.
Any probabilistic statement must refer to the underlying probability space.
Definition 1: A probability space has three components:
A sample space Ω, which is the set of all possible outcomes of the random process modeled by the probability space
A family of sets Φ representing the allowable events, where each set in Φ is a subset of the sample space Ω
A probability function Pr: Φ → R satisfying Definition 2 below
An element of Ω is called a simple or elementary event.
In the randomized algorithm for verifying polynomial identities, the sample space is the set of integers {1,…,100d}.
How to improve the algorithm for smaller failure probability?
Can increase the sample space
E.g. {1,…, 1000d}
Repeat the algo multiple times, using
different random values to test
If F(r) ≠ G(r) in even one of these rounds then output “non-equivalent”; output “equivalent” only if all rounds agree
Can sample from {1,…,100d} many times, with or without replacement
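A sketch of the repeated, with-replacement version; with k independent rounds, a wrong “equivalent” requires hitting a root of F - G in every round, so the failure probability drops to at most (1/100)^k:

```python
import random

def check_identity_repeated(f, g, d, k):
    """k independent rounds of sampling with replacement from {1,…,100d}.
    Answer 'equivalent' only if every round agrees; one-sided error
    probability at most (1/100)**k."""
    for _ in range(k):
        r = random.randint(1, 100 * d)
        if f(r) != g(r):
            return "non-equivalent"   # always correct
    return "equivalent"               # wrong with prob. <= (1/100)**k

F = lambda x: (x + 1) * (x - 2) * (x + 3) * (x - 4) * (x + 5) * (x - 6)
G = lambda x: x**6 - 7 * x**3 + 25
print(check_identity_repeated(F, G, 6, 10))
```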
Notion of independence
Def 3: Two events E and F are independent iff (if and only if)
Pr(E ∩ F) = Pr(E) · Pr(F)
Notion of conditional probability
Def 4: The conditional probability that event E occurs given that event F occurs is
Pr(E|F) = Pr(E ∩ F) / Pr(F)
Note this conditional probability is only defined if Pr(F) > 0.
When E and F are independent and Pr(F) > 0, then
Pr(E|F) = Pr(E ∩ F) / Pr(F) = Pr(E)·Pr(F) / Pr(F) = Pr(E)
Intuitively, if two events are independent then information about one event should not affect the probability of the other event.
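As a quick numeric illustration (the dice events chosen here are my own example, not from the slides), the definition can be checked by direct enumeration:

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of two fair dice.
omega = list(product(range(1, 7), repeat=2))

def pr(event):
    """Probability of an event (a predicate on outcomes) by enumeration."""
    return Fraction(sum(1 for s in omega if event(s)), len(omega))

E = lambda s: s[0] + s[1] == 8      # sum of the dice is 8
F = lambda s: s[0] == 3             # first die shows 3
EF = lambda s: E(s) and F(s)

# Pr(E|F) = Pr(E ∩ F) / Pr(F)
print(pr(EF) / pr(F))   # -> 1/6: given the first die is 3, the second must be 5
```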
Sampling without replacement
Again assume F ≢ G.
We repeat the algorithm k times: perform k iterations of random sampling from {1,…,100d} without replacement.
What is the probability that all k iterations yield roots of F - G, resulting in a wrong output by our algorithm?
Need to bound Pr(E1 ∩ E2 ∩ … ∩ Ek)
Pr(E1 ∩ E2 ∩ … ∩ Ek) = Pr(Ek | E1 ∩ … ∩ Ek-1) · Pr(E1 ∩ E2 ∩ … ∩ Ek-1)
= Pr(E1) · Pr(E2|E1) · Pr(E3|E1 ∩ E2) · … · Pr(Ek | E1 ∩ … ∩ Ek-1)
Need to bound Pr(Ej | E1 ∩ … ∩ Ej-1) ≤ (d-(j-1)) / (100d-(j-1))
So Pr(E1 ∩ E2 ∩ … ∩ Ek) ≤ Π_{j=1,…,k} (d-(j-1)) / (100d-(j-1)) ≤ (1/100)^k,
slightly better than with replacement
Use d+1 iterations: always gives the correct answer. Why? Is it efficient?
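The d+1 remark can be illustrated directly: two polynomials of degree at most d that agree on d+1 distinct points are identical, so d+1 distinct samples give a deterministic (though (d+1)-evaluation) test. A sketch, with illustrative helper names:

```python
import random

def check_identity_no_replacement(f, g, d, k):
    """k rounds of sampling without replacement from {1,…,100d}.
    With k = d+1 the answer is always correct: F - G has at most d roots,
    so d+1 distinct points cannot all be roots."""
    points = random.sample(range(1, 100 * d + 1), k)  # distinct values
    for r in points:
        if f(r) != g(r):
            return "non-equivalent"
    return "equivalent"

F = lambda x: (x + 1) * (x - 2) * (x + 3) * (x - 4) * (x + 5) * (x - 6)
G = lambda x: x**6 - 7 * x**3 + 25
d = 6
print(check_identity_no_replacement(F, G, d, d + 1))  # always "non-equivalent"
```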
Random variables
Def 5: A random variable X on a sample space Ω is a real-valued function on Ω; that is, X: Ω → R. A discrete random variable is a random variable that takes on only a finite or countably infinite number of values.
So, “X = a” represents the set {s | X(s) = a}
Pr(X = a) = Σ_{s: X(s)=a} Pr(s)
E.g.: Let X be the random variable representing the sum of two dice. What is the probability that X = 4?
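The dice question can be answered by enumerating the 36 outcomes, a minimal sketch:

```python
from fractions import Fraction
from itertools import product

# Sample space: 36 equally likely outcomes of two fair dice.
omega = list(product(range(1, 7), repeat=2))

# X(s) = sum of the two dice; Pr(X = 4) sums Pr(s) over {s | X(s) = 4}.
X = lambda s: s[0] + s[1]
pr_X_equals_4 = Fraction(sum(1 for s in omega if X(s) == 4), len(omega))
print(pr_X_equals_4)   # -> 1/12, from the outcomes (1,3), (2,2), (3,1)
```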
Random variables
Def 6: Two random variables X and Y are independent iff for all values x and y:
Pr((X = x) ∩ (Y = y)) = Pr(X = x) · Pr(Y = y)
Expectation
Def 7: The expectation of a discrete random variable X, denoted by E[X], is given by
E[X] = Σ_i i · Pr(X = i)
where the summation is over all values i in the range of X.
E.g.: Compute the expectation of the random variable X representing the sum of two dice.
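By enumeration the expected sum is 7 (equivalently, by linearity, 2 × 3.5); a quick check:

```python
from fractions import Fraction
from itertools import product

# Sample space: 36 equally likely outcomes of two fair dice.
omega = list(product(range(1, 7), repeat=2))

# E[X] = sum over i of i * Pr(X = i), computed here outcome by outcome.
expectation = sum(Fraction(s[0] + s[1], len(omega)) for s in omega)
print(expectation)   # -> 7
```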
Linearity of expectation
Theorem:
E[Σ_{i=1,…,n} X_i] = Σ_{i=1,…,n} E[X_i]
E[cX] = c·E[X] for any constant c
Bernoulli and Binomial random variables
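The body of this slide is missing from the extraction. As a standard sketch (definitions as in the course textbooks, not recovered from the slide): a Bernoulli(p) variable is 1 with probability p and 0 otherwise; a Binomial(n, p) variable counts successes in n independent Bernoulli(p) trials, so by linearity E[X] = np:

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p, k):
    """Pr(X = k) for X ~ Binomial(n, p): choose which k trials succeed."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, Fraction(1, 4)
# E[X] = sum_k k * Pr(X = k); by linearity this equals n * p.
expectation = sum(k * binomial_pmf(n, p, k) for k in range(n + 1))
print(expectation)   # -> 5/2, i.e. n * p
```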
The hiring problem
HIRE-ASSISTANT(n)
1 best←0
2 for i←1 to n
3 do interview candidate i
4 if candidate i is better than candidate best
5 then best←i
6 hire candidate i
Cost Analysis
We are not concerned with the running time
of HIRE-ASSISTANT, but instead with the
cost incurred by interviewing and hiring.
Interviewing has low cost, say c_i, whereas hiring is expensive, costing c_h. Let m be the number of people hired. Then the cost associated with this algorithm is O(n·c_i + m·c_h). No matter how many people we hire, we always interview n candidates and thus always incur the cost n·c_i associated with interviewing.
Worst-case analysis
In the worst case, we actually hire every candidate that we interview. This situation occurs if the candidates come in increasing order of quality, in which case we hire n times, for a total hiring cost of O(n·c_h).
Probabilistic analysis
Probabilistic analysis is the use of
probability in the analysis of problems.
In order to perform a probabilistic
analysis, we must use knowledge of the
distribution of the inputs.
For the hiring problem, we can assume
that the applicants come in a random
order.
Randomized algorithm
Indicator random variables
The indicator random variable I[A] associated with event A is defined as
I[A] = 1 if A occurs, 0 if A does not occur
Lemma: Given a sample space Ω and an event A in the sample space, let X_A = I[A]. Then E[X_A] = Pr(A), since E[X_A] = 1·Pr(A) + 0·(1 - Pr(A)) = Pr(A).
Analysis of the hiring problem
using indicator random variables
Let X be the random variable whose value equals the number of times we hire a new office assistant, and let X_i be the indicator random variable associated with the event in which the ith candidate is hired. Thus,
X = X_1 + X_2 + … + X_n
RANDOMIZED-HIRE-ASSISTANT(n)
1 randomly permute the list of candidates
2 best←0
3 for i←1 to n
4 do interview candidate i
5 if candidate i is better than candidate best
6 then best←i
7 hire candidate i
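By linearity, E[X] = Σ E[X_i] = Σ 1/i = H_n ≈ ln n, since under a uniformly random order candidate i is hired iff it is the best of the first i, which has probability 1/i. This is a standard fact (not stated on the extracted slides); a small exact check by enumerating all orders:

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def hires(order):
    """Number of hires HIRE-ASSISTANT makes on a given quality order."""
    best, count = 0, 0
    for q in order:
        if q > best:
            best, count = q, count + 1
    return count

n = 6
# Average number of hires over all n! equally likely candidate orders.
avg = Fraction(sum(hires(p) for p in permutations(range(1, n + 1))),
               factorial(n))
# By linearity, E[X] = sum_i 1/i = H_n, the nth harmonic number.
H_n = sum(Fraction(1, i) for i in range(1, n + 1))
print(avg == H_n)   # -> True
```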
PERMUTE-BY-SORTING(A)
1 n←length[A]
2 for i←1 to n
3 do P[i] ←RANDOM(1, n^3)
4 sort A, using P as sort keys
5 return A
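PERMUTE-BY-SORTING can be sketched in Python as follows (the range {1,…,n^3} makes all priorities distinct with high probability):

```python
import random

def permute_by_sorting(a):
    """Randomly permute a by sorting under random priorities in {1,…,n^3}."""
    n = len(a)
    priorities = [random.randint(1, n**3) for _ in range(n)]
    # Sort the elements of a using the random priorities as sort keys.
    return [x for _, x in sorted(zip(priorities, a))]

print(permute_by_sorting([1, 2, 3, 4, 5]))
```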
RANDOMIZE-IN-PLACE(A)
1 n←length[A]
2 for i←1 to n
3 do swap A[i] ↔ A[RANDOM(i, n)]
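RANDOMIZE-IN-PLACE is the Fisher–Yates shuffle; a Python sketch (0-based indexing replaces the pseudocode's 1-based indexing):

```python
import random

def randomize_in_place(a):
    """Permute a uniformly at random in place (Fisher–Yates):
    swap each position with a uniformly chosen later-or-same position."""
    n = len(a)
    for i in range(n):
        j = random.randint(i, n - 1)   # RANDOM(i, n) in the pseudocode
        a[i], a[j] = a[j], a[i]
    return a

print(randomize_in_place([1, 2, 3, 4, 5]))
```

Unlike PERMUTE-BY-SORTING, this runs in O(n) time and needs no auxiliary priority array.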