
Lecture 2

Sample Space

The set S is a sample space for an experiment if every physical outcome of the experiment refers to a unique element of S.

In effect two requirements are embodied in the foregoing definition.


(1) Every physical outcome of the experiment must refer to some element in
the sample space.
(2) The uniqueness condition means that each physical outcome must refer
to only one element in the sample space.

# Suppose a box contains 100 items of a particular sort, say 100 capacitors, and each capacitor has a unique production number running from 1101 to 1200. If an experiment consists of randomly selecting a single capacitor from the box, then an appropriate sample space would be
S₁ = {1101, 1102, … , 1200}
It would also be appropriate to employ the sample space
S₂ = {1100, 1101, 1102, … , 1200}
One might argue that S₂ is less suitable from a modeling perspective, since no physical observation will correspond to the element 1100; nevertheless, S₂ still satisfies the two necessary modeling requirements. Insofar as probability is concerned, the probability of choosing 1100 will eventually be set to zero. A set that cannot serve as a sample space is
S₃ = {1101, 1102, … , 1199}
since no element in S₃ corresponds to the selection of the capacitor with production number 1200.
The elements of a sample space are called outcomes. An outcome is a logical entity and refers only to the manner in which the phenomena are viewed by the experimenter. For instance, in the foregoing example, if
S₄ = {c₁ μF, c₂ μF},
where c₁ and c₂ denote the only two capacitance values that the experimenter can record, then only two outcomes are realized. While there might be all sorts of information available regarding the chosen capacitor, once S₄ has been chosen as the sample space, only the measured capacitance is relevant, since only its observation will result in an outcome (relative to S₄).

Events
In most probability problems, the investigator is interested not merely in the collection of outcomes
but in some subset of the sample space. A subset of a sample space is known as an event. Two
events that do not intersect are said to be mutually exclusive (disjoint). More generally, the events E₁, E₂, …, Eₙ are said to be mutually exclusive if
Eᵢ ∩ Eⱼ = ∅
for any i ≠ j, ∅ denoting the empty set.
Probability (Modeling Random Processes for Engineers and Managers, James J. Solberg, John Wiley & Sons Inc., 2009)
When the “probability of an event” is spoken of in everyday language, almost everyone has a rough
idea of what is meant. It is fortunate that this is so, because it would be quite difficult to introduce the
concept to someone who had never considered it before. There are at least three distinct ways to
approach the subject, none of which is wholly satisfying.

The first to appear, historically , was the frequency concept. If an experiment were to be repeated many
times, then the number of times that event was observed to occur, divided by the number of times that
the experiment was conducted, would approach a number that was defined to be the probability of the
event. The ratio of the number of chances for success to the total number of possibilities is the concept with which most elementary treatments of probability start. This definition proved to be somewhat
limiting, however, because circumstances frequently prohibit repetition of an experiment under
precisely the same conditions, even conceptually. Imagine trying to determine the probability of global
annihilation from meteor collision.

To extend the notion of probability to a wider class of applications, a second approach involving the idea of “subjective” probabilities emerged. According to this idea, the probability of an event need not
relate to the frequency with which it would occur in an infinite number of trials; it is just a measure of
the degree of likelihood we believe the event to possess. This definition covers even the hypothetical
events, but seems a bit too loose for engineering applications. Different people could attach different
probabilities to the same event.

Most modern texts use the third concept, which relies upon axiomatic definition. According to this
notion, probabilities are just elements of an abstract mathematical system obeying certain axioms. This
notion is at once the most powerful and the most devoid of real world meaning. Of course, the axioms
are not purely arbitrary; they were selected to be consistent with the earlier concepts of probabilities
and to provide them with all of the properties everyone would agree they should have.
We will go with the formal axiomatic system , so that we can be rigorous in the mathematics. We want
to be able to calculate probabilities to assist in making good decisions. At the same time, we want to
bear in mind the real world interpretation of probabilities as measures of the likelihood of events in the
world. The whole point of learning the mathematics is to be able to use it in everyday life.

A probability is a function, P, that maps events onto real numbers and satisfies:

1. 0 ≤ P(A) ≤ 1, for any event A.

2. P(S) = 1, where S is the whole sample space, or the certain event.

3. If E₁, E₂, …, Eᵢ, … is any finite or infinitely countable collection of mutually exclusive events, then
P(E₁ ∪ E₂ ∪ ⋯) = P(E₁) + P(E₂) + ⋯

Once 𝑺 has been endowed with a probability measure , 𝑺 is called a probability space.
Some of the additional basic laws of probability (which could be proved from the foregoing ) are:
4. P(∅) = 0, where ∅ is the empty set or impossible event.
5. P(Ā) = 1 − P(A). In other words, the probability that an event does not occur is 1 minus the probability that it does occur.
6. P(A ∪ B) = P(A) + P(B) − P(A ∩ B), for any two events A and B. When the events are not mutually exclusive (when there is some possibility for both A and B to occur), one has to subtract off the probability that they both occur.
7. P(A|B) = P(A ∩ B)/P(B), provided P(B) ≠ 0. This “basic law” is, in reality, a definition of the conditional probability of an event A, given that another event B has occurred.
8. P(A|B) = P(A) if and only if A and B are independent. This rule can be taken as the formal definition of independence.
9. P(A ∩ B) = P(A)P(B) if and only if A and B are independent.

A set of events B₁, B₂, …, Bₙ constitutes a partition of the sample space S if they are mutually exclusive and collectively exhaustive, that is,
Bᵢ ∩ Bⱼ = ∅ for every pair i ≠ j
and
B₁ ∪ B₂ ∪ ⋯ ∪ Bₙ = S
In simple terms, a partition is just any way of grouping and listing all possible outcomes such that no outcome appears in more than one group. When the experiment is performed, one and only one of the Bᵢ will occur.
10. P(A) = Σᵢ P(A|Bᵢ)P(Bᵢ) for any partition Bᵢ, i = 1, 2, 3, …, n. This is one of the most useful relationships in modeling applications. It is one expression of the so-called law of total probability.
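As a quick numerical illustration of the law of total probability (not part of the original text), the Python sketch below uses made-up values: three events B₁, B₂, B₃ thought of as suppliers of a part, with assumed shares P(Bᵢ) and defect rates P(A|Bᵢ), and A the event that a randomly chosen part is defective.

```python
# Law of total probability: P(A) = sum_i P(A|B_i) * P(B_i).
# Hypothetical numbers: B_i are three suppliers, A = "part is defective".
p_B = [0.5, 0.3, 0.2]             # P(B_i): assumed shares (a partition, so they sum to 1)
p_A_given_B = [0.01, 0.02, 0.05]  # P(A | B_i): assumed defect rate of each supplier

p_A = sum(pa * pb for pa, pb in zip(p_A_given_B, p_B))
print(p_A)  # 0.5*0.01 + 0.3*0.02 + 0.2*0.05 = 0.021
```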
Counting
Given a finite sample space
S = {w₁, w₂, … , wₙ}
of cardinality n, the hypothesis of equal probability is the assumption that the physical conditions are such that each outcome in S possesses equal probability:
P(w₁) = P(w₂) = ⋯ = P(wₙ) = 1/n
In such a case, the probability space is said to be equiprobable.

Given an urn containing n numbered balls, we consider three selection protocols:
1. Selection with replacements, order counts: For 𝒌 > 𝟎, 𝒌 balls are selected one at a
time, each chosen ball is returned to the urn, and the numbers of the chosen balls are recorded
in the order of selection.
2. Selection without replacement, order counts: For 𝟎 < 𝒌 ≤ 𝒏, 𝒌 balls are selected,
a chosen ball is not returned to the urn, and the numbers of the chosen balls are recorded in
the order of selection.
3. Selection without replacement, order does not count: For 0 < k ≤ n, k balls are selected, a chosen ball is not returned to the urn, and the numbers of the chosen balls are recorded without respect to the order of selection. Note that this process is equivalent to selecting all k balls at once.
1. Ordered Selection with Replacement
If N = {1, 2, 3, … , n} gives the numbers of the balls in the urn, then each outcome resulting from the selection process is of the form
(n₁, n₂, … , nₖ), where each nᵢ is an integer between 1 and n identifying the ball chosen on the i-th selection.
The number of possible selections is n^k.

#In deciding the format for a memory word in a new computer, the designer decides on a length of 16
bits. Since each bit can be 0 or 1, the problem of deciding on the number of possible words can be
modeled as making 16 selections from an urn containing 2 balls. Thus there are 2^16 = 65,536 possible words.

2. Ordered Selection without Replacement- (Permutation)


In counting the number of possible selection processes without replacement for
which order counts, we are counting permutations.
# To get an idea of the counting problem, consider the set A = {1, 2, 3, 4}. The set of permutations of two objects from A is
Q = {(1,2), (1,3), (1,4), (2,1), (2,3), (2,4), (3,1), (3,2), (3,4), (4,1), (4,2), (4,3)}
so that there are 12 permutations. Because of the non-replacement requirement, there are exactly three ordered pairs in Q for each choice of a first-component value. Thus, there is a total of 12 = 4 × 3 permutations.
If S is a set containing n elements and 0 < k ≤ n, then there exist
n(n − 1)(n − 2) ⋯ (n − k + 1)
permutations of k elements from S. Letting P(n, k) denote the number of permutations and employing factorials,
P(n, k) = n! / (n − k)!
#Consider an alphabet consisting of 9 distinct symbols from which strings of length 4 that do not use the
same symbol twice are to be formed. Each string is a permutation of 9 objects taken 4 at a time, and thus
there are
P(9, 4) = 9 × 8 × 7 × 6 = 3024
possible passwords.

Now suppose the 4 symbols are chosen uniformly at random with replacement. What is the probability that a string will be formed in which no symbol is used more than once? Let E denote the event consisting of all strings with no symbol appearing more than once; then the desired probability is
P(E) = P(9, 4) / 9^4 = 3024/6561 = 0.461
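The probability above is easy to reproduce with a short computation; the sketch below uses Python's standard math.perm (available in Python 3.8+):

```python
from math import perm

# P(E) = P(9, 4) / 9**4: the chance that 4 symbols drawn uniformly with
# replacement from a 9-symbol alphabet are all distinct.
n, k = 9, 4
p_no_repeat = perm(n, k) / n**k   # perm(9, 4) = 9*8*7*6 = 3024
print(round(p_no_repeat, 3))      # 0.461
```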

Fundamental Principle of Counting


Consider the problem of counting k-tuples formed according to the following scheme:
1. The first component of the k-tuple can be occupied by any one of r₁ objects.
2. No matter which object has been chosen to occupy the first component, any one of r₂ objects can occupy the second component.
3. Proceeding recursively, no matter which objects have been chosen to occupy the first j − 1 components, any one of rⱼ objects can occupy the j-th component.
The fundamental principle of counting states that there are r₁r₂r₃⋯rₖ possible k-tuples that can result from application of the selection scheme. Equivalently, if E denotes the set of all such k-tuples, then
|E| = r₁r₂r₃⋯rₖ
Following Fig.1 illustrates the fundamental principle of counting using a tree
diagram.

Fig.1

Here 4 possible branches can be chosen for the first selection, 2 for the second, and 2 for the third. As a result, the tree contains 4 × 2 × 2 = 16 final nodes. It is crucial to note that at each of the three stages (selections) of the tree, the number of branches emanating from the nodes is the same; otherwise, as illustrated in Fig. 2, the multiplication technique of the fundamental principle does not apply. The requirement that there be a constant number of emanating branches at each stage corresponds to the condition in the selection protocol that, at each component, the number of possible choices for the component is fixed and does not depend on the particular objects chosen to fill the preceding components.

Fig.2

3. Unordered Selection without Replacement (Combination)


When selecting balls without replacement from an urn without regard to the
order of selection, the result of the procedure consists of a list of non-repeated
elements, the ordering in the list being irrelevant. A list of elements for which
the order of the listing can be interchanged without affecting the list is simply a
set. Thus the outcomes are subsets of the set from which the elements are
being chosen. Employing the urn model terminology , each outcome is a subset
consisting of 𝒌 balls from the original collection of 𝒏 balls in the urn.
Using the set A = {1, 2, 3, 4}, the set of all possible subsets that result from selecting two elements according to the unordered, without-replacement protocol is the set of sets
{{1,2}, {1,3}, {1,4}, {2,3}, {2,4}, {3,4}}
Each subset resulting from the unordered, without-replacement protocol is called a combination (of n objects taken k at a time), denoted nCk or C(n, k).

Again consider the set A. It can be readily seen that there are two 2-tuple permutations for each 2-element combination. Thus, each 2-element subset from A yields 2! = 2 permutations. This reasoning results in
P(n, k) = k! C(n, k),
or
C(n, k) = P(n, k)/k! = n! / (k!(n − k)!)
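As a small check of these counting formulas, note that the Python standard library already provides math.perm and math.comb; the sketch below verifies the relations P(n, k) = n!/(n − k)! and C(n, k) = P(n, k)/k! for the set A = {1, 2, 3, 4} used above:

```python
from math import comb, factorial, perm

n, k = 4, 2
# P(n, k) = n! / (n - k)!  permutations; C(n, k) = P(n, k) / k!  combinations
assert perm(n, k) == factorial(n) // factorial(n - k)   # 12 ordered pairs
assert comb(n, k) == perm(n, k) // factorial(k)         # 6 two-element subsets
print(perm(4, 2), comb(4, 2))  # 12 6
```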
DISCRETE RANDOM VARIABLES AND THEIR DISTRIBUTIONS
(Probability and Statistics for Computer Scientists, Michael Baron, Chapman & Hall/CRC, 2007)
A random variable is a function of an outcome,
X = f(ω).
In other words , it is a quantity that depends on chance. The domain of the random
variable is the sample space. Its range can be the set of all real numbers 𝑹, or only
the positive numbers [0, +∞), or the integers Z, or the interval (0, 1), etc.,
depending on what possible values the random variable can potentially take.

Once an experiment is completed, and the outcome ω is known, the value of the random variable X(ω) becomes determined.

# Consider an experiment of tossing 3 fair coins and counting the number of heads. Certainly, the same
model suits the number of girls in a family with 3 children, the number of 1’s in a random binary code
consisting of 3 characters, etc.

Let X be the number of heads (girls, 1's). Prior to the experiment, its value is not known. All we can say is that X has to be an integer between 0 and 3. Since assuming a value is an event, we can compute
probabilities,
P{X = 0} = P{three tails} = P{TTT} = (1/2)(1/2)(1/2) = 1/8
P{X = 1} = P{HTT} + P{THT} + P{TTH} = 3/8
P{X = 2} = P{HHT} + P{HTH} + P{THH} = 3/8
P{X = 3} = P{HHH} = 1/8
Summarizing,

𝒙 𝑷{𝑿 = 𝒙}
0 1/8
1 3/8
2 3/8
3 1/8
Total 1

This table contains everything that is known about the random variable X prior to the experiment. Before we know the outcome ω, we cannot tell what X equals. However, we can list all the possible values of X and determine the corresponding probabilities.
Definition

The collection of all probabilities related to X is the distribution of X. The function
P(x) = P{X = x}
is the probability mass function (pmf). The cumulative distribution function (cdf) is defined as
F(x) = P{X ≤ x} = Σ_{y ≤ x} P(y)

Recall that one way to compute the probability of an event is to add the probabilities of all the outcomes in it. Hence, for any set A,
P{X ∈ A} = Σ_{x ∈ A} P(x).
When A is an interval, its probability can be computed directly from the cdf F(x):
P{a < X ≤ b} = F(b) − F(a).
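A minimal sketch of how a cdf is built from a pmf, using the 3-coin pmf computed earlier; the interval formula P{a < X ≤ b} = F(b) − F(a) is checked at the end:

```python
# Build F(x) = P{X <= x} from the pmf of the 3-coin example,
# then use F to get an interval probability.
pmf = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}

def cdf(x, pmf=pmf):
    return sum(p for y, p in pmf.items() if y <= x)

# P{0 < X <= 2} = F(2) - F(0) = 7/8 - 1/8 = 6/8
print(cdf(2) - cdf(0))  # 0.75
```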
# (Errors in independent modules) . A program consists of two modules. The number of
errors 𝑋1 in the first module has the pmf 𝑃1(𝑥), and the number of errors 𝑋2 in the second
module has the pmf 𝑃2(𝑥), independently of 𝑋1 , where

𝒙 𝑃1(𝑥) 𝑃2(𝑥)
0 0.5 0.7
1 0.3 0.2
2 0.1 0.1
3 0.1 0

Find the pmf and cdf of 𝑌 = 𝑋1 + 𝑋2 , the total number of errors.

Sol.: We break the problem into steps. First, determine all possible values of Y, then compute the probability of each value. Clearly, the number of errors Y is an integer that can be as low as 0 + 0 = 0 and as high as 3 + 2 = 5. Since P₂(3) = 0, the second module has at most 2 errors. Next,
P_Y(0) = P{Y = 0} = P{X₁ = X₂ = 0} = P₁(0)P₂(0) = 0.5 · 0.7 = 0.35
P_Y(1) = P{Y = 1} = P₁(0)P₂(1) + P₁(1)P₂(0) = 0.5 · 0.2 + 0.3 · 0.7 = 0.31
P_Y(2) = P{Y = 2} = P₁(0)P₂(2) + P₁(1)P₂(1) + P₁(2)P₂(0) = 0.5 · 0.1 + 0.3 · 0.2 + 0.1 · 0.7 = 0.18
P_Y(3) = P{Y = 3} = P₁(0)P₂(3) + P₁(1)P₂(2) + P₁(2)P₂(1) + P₁(3)P₂(0) = 0.5 · 0 + 0.3 · 0.1 + 0.1 · 0.2 + 0.1 · 0.7 = 0.12
P_Y(4) = P{Y = 4} = P₁(2)P₂(2) + P₁(3)P₂(1) = 0.1 · 0.1 + 0.1 · 0.2 = 0.03
P_Y(5) = P{Y = 5} = P₁(3)P₂(2) = 0.1 · 0.1 = 0.01
The cumulative distribution function F(y) can be computed similarly, by accumulating these probabilities.
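The same answer can be obtained mechanically as a discrete convolution of the two pmfs; a short Python sketch (the dictionaries simply restate the table above):

```python
# pmf of Y = X1 + X2 for independent X1, X2: sum P1(i)*P2(j)
# over all pairs with i + j = y (discrete convolution).
P1 = {0: 0.5, 1: 0.3, 2: 0.1, 3: 0.1}
P2 = {0: 0.7, 1: 0.2, 2: 0.1}

PY = {}
for i, p1 in P1.items():
    for j, p2 in P2.items():
        PY[i + j] = PY.get(i + j, 0.0) + p1 * p2

print({y: round(p, 2) for y, p in sorted(PY.items())})
# {0: 0.35, 1: 0.31, 2: 0.18, 3: 0.12, 4: 0.03, 5: 0.01}
```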
Families of Discrete Distributions

We now consider the most commonly used families of discrete distributions.


Amazingly, absolutely different phenomena can be adequately described by the
same mathematical model, or a family of distributions. Say, the number of
virus attacks , received e-mails, error messages, network blackouts, telephone
calls, traffic accidents, earthquakes, and so on can all be modeled by the
Poisson family of distributions.

Bernoulli Distribution
The simplest random variable (excluding non-random ones!) takes just two
possible values. Call them 0 and 1.

Definition: A random variable with two possible values, 0 and 1, is called a Bernoulli variable, its distribution is the Bernoulli distribution, and any experiment with a binary outcome is called a Bernoulli trial.
Good or defective components, parts that pass or fail tests, transmitted or lost signals, working or
malfunctioning hardware, sites that contain or do not contain a key word etc., are examples of
Bernoulli trials. All these experiments fit the same Bernoulli model, where we shall use generic names
for the two outcomes: “successes” and “failures”.
If P(1) = p is the probability of success, then P(0) = q = 1 − p is the probability of a failure. We can then compute the expectation and variance as:
E[X] = x̄ = Σ_x x P(x) = 0(1 − p) + 1(p) = p
Var(X) = σ² = Σ_x (x − x̄)² P(x) = p(1 − p) = pq

Bernoulli distribution:
p = probability of success
P(x) = q = 1 − p if x = 0, and P(x) = p if x = 1
E[X] = p, Var(X) = pq

In fact, we see that there is a whole family of Bernoulli distributions, indexed by a parameter p. Every p between 0 and 1 defines another Bernoulli distribution. The distribution with p = q = 0.5 carries the highest level of uncertainty because Var(X) = pq is maximized by p = q = 0.5. Distributions with lower or higher p have lower variances. The extreme parameters p = 0 and p = 1 define non-random variables 0 and 1, respectively; their variance is 0.
Binomial distribution
Consider a sequence of independent Bernoulli trials and count the number of
successes in it. This may be the number of defective computers in a shipment,
the number of updated files in a folder, the number of e-mails with attachments
etc.,
Definition
A variable described as the number of successes in a sequence of independent Bernoulli trials has a Binomial distribution. Its parameters are n, the number of trials, and p, the probability of success.

The Binomial probability mass function is
P(x) = P{X = x} = C(n, x) p^x q^(n−x),   x = 0, 1, 2, …, n
which is the probability of exactly x successes in n trials. In this formula, p^x is the probability of x successes, the probabilities being multiplied due to independence of the trials. Also, q^(n−x) is the probability of the remaining (n − x) trials being failures. Finally,
C(n, x) = n! / (x!(n − x)!)
is the number of elements of the sample space S that form the event {X = x}. This is the number of possible orderings of x successes and (n − x) failures among the n trials, and it is computed as C(n, x).
The expectation, E[X], is thus given as:
E[X] = x̄ = Σ_{x=0}^n x P(x) = Σ_{x=0}^n x C(n, x) p^x q^(n−x) = np
Similarly, the variance is given by
Var(X) = σ² = Σ_{x=0}^n (x − x̄)² C(n, x) p^x q^(n−x) = npq
# An exciting computer game is released. Sixty percent of players complete all the levels.
Thirty percent of them will then buy an advanced version of the game. Among 15 users ,
what is the expected number of people who will buy the advanced version? What is the
probability that at least two people will buy it?

Sol: Let X be the number of people (successes), among the mentioned 15 users (trials), who will buy the advanced version of the game. It has a Binomial distribution with n = 15 trials and probability of success
p = P{buy advanced}
  = P{buy advanced | complete all levels} P{complete all levels}
  = 0.30 × 0.60 = 0.18
E[X] = 15 × 0.18 = 2.7
and
P{X ≥ 2} = 1 − P{X < 2} = 1 − P(0) − P(1) = 1 − (1 − p)^n − np(1 − p)^(n−1) = 0.7813
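A short Python check of this computation, using the Binomial pmf built from math.comb (the values n = 15 and p = 0.18 come from the problem statement):

```python
from math import comb

# Binomial(n=15, p=0.18): probability that at least 2 of 15 users buy
# the advanced version. P{X >= 2} = 1 - P(0) - P(1).
n, p = 15, 0.18

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

p_at_least_2 = 1 - binom_pmf(0, n, p) - binom_pmf(1, n, p)
print(round(n * p, 2), round(p_at_least_2, 4))  # 2.7 0.7813
```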
Geometric distribution

Consider a sequence of independent Bernoulli trials. Each trial results in a “success” or a “failure”.

Definition

The number of Bernoulli trials needed to get the first success has Geometric distribution.

# A search engine goes through a list of sites looking for a given key phrase. Suppose the
search terminates as soon as the key phrase is found. The number of sites visited is
Geometric.

# A hiring manager interviews candidates , one by one, to fill a vacancy. The number of
candidates interviewed until one candidate receives an offer has Geometric distribution.

Geometric random variables can take any integer value from 1 to infinity , because one
needs at least 1 trial to have the first success, and the number of trials needed is not limited
by any specific number. ( For example, there is no guarantee that among the first 10 coin
tosses there will be at least one head.) The only parameter is 𝒑, the probability of a
“success”.
Geometric probability mass function has the form

P(x) = P{the first success occurs on the x-th trial} = (1 − p)^(x−1) p,   x = 1, 2, …

which is the probability of (x − 1) failures on the first (x − 1) trials and a success on the last trial.

Observe that
Σ_x P(x) = Σ_{x=1}^∞ (1 − p)^(x−1) p = p / (1 − (1 − p)) = 1
The mean and variance are given as:
x̄ = E[X] = Σ_{x=1}^∞ x(1 − p)^(x−1) p = p (d/dq) Σ_{x=0}^∞ q^x = p (d/dq) [1/(1 − q)] = 1/p
Var(X) = (1 − p)/p²
Here we have defined q = 1 − p.
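A quick numerical sanity check of these formulas (a sketch only, truncating the infinite sums at a point where the geometric tail is negligible):

```python
# Check numerically that the Geometric pmf sums to 1 and that
# E[X] = 1/p and Var(X) = (1 - p)/p**2, using truncated sums.
p = 0.2
xs = range(1, 2000)                       # truncation; the tail beyond is negligible
pmf = [(1 - p) ** (x - 1) * p for x in xs]

total = sum(pmf)
mean = sum(x * q for x, q in zip(xs, pmf))
var = sum((x - mean) ** 2 * q for x, q in zip(xs, pmf))
print(round(total, 6), round(mean, 3), round(var, 3))  # 1.0  5.0  20.0
```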
# (St. Petersburg Paradox). This paradox was noticed by a Swiss mathematician
Daniel Bernoulli (1700-1782), a nephew of Jacob. It describes a gambling strategy
that enables one to win any desired amount of money with probability one.

Consider a game that can be played any number of times. Rounds are independent, and each time your
winning probability is 𝑝. The game does not have to be favorable to you or even fair. This 𝑝 can be any
positive probability. For each round , you bet some amount 𝑥. In case of a success , you win 𝑥. If you lose the
round , you lose 𝑥.

The strategy is simple . Your initial bet is the amount that you desire to win eventually. Then, if you win a
round, stop. If you lose a round , double your bet and continue.

Say the desired profit is $100. The game will progress as follows:

Round   Bet     Balance if lose   Balance if win
1       100     −100              +100 and stop
2       200     −300              +100 and stop
3       400     −700              +100 and stop
…       …       …                 …
Sooner or later, the game will stop, and at this moment, your balance will be $100. Guaranteed! But this is not what D. Bernoulli called a paradox.

How many rounds should be played? Since each round is a Bernoulli trial, the number of them , 𝑋 , until the
first win is a Geometric random variable with parameter 𝑝.

Is the game endless? No; on the average, it will last E[X] = 1/p rounds. In a fair game with p = 1/2, one will need 2 rounds, on the average, to win the desired amount. In an “unfair” game, with p < 1/2, it will take longer to win, but still a finite number of rounds. For example, with p = 0.2, i.e., one win in 5 rounds, one stops, on the average, after 1/p = 5 rounds. This is not a paradox yet.

Finally, how much money does one need to have in order to be able to follow this strategy? Let Y be the amount of the last bet. According to the strategy, Y = 100 · 2^(X−1). It is a discrete random variable whose expectation is
E[Y] = Σ_x 100 · 2^(x−1) P_X(x) = 100 Σ_{x=1}^∞ 2^(x−1) (1 − p)^(x−1) p = 100p Σ_{x=1}^∞ [2(1 − p)]^(x−1)
     = 100p / (1 − 2(1 − p))   if p > 1/2
     = +∞                      if p ≤ 1/2
This is the St. Petersburg Paradox! A random variable that is always finite has an infinite expectation! Even when the game is fair (a 50-50 chance to win), one has to be (on the average!) infinitely rich to follow this strategy.
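A simulation sketch of the doubling strategy (p = 0.5 and the $100 target are the values used in the example; the seed is arbitrary). The profit per game is always +$100, but the size of the last bet varies wildly between games, which is the practical face of the infinite expectation:

```python
import random

# Simulate the doubling strategy: bet the target, double after every loss,
# stop at the first win. Returns (number of rounds, size of the last bet).
def play_game(p=0.5, target=100):
    bet, rounds = target, 0
    while True:
        rounds += 1
        if random.random() < p:      # win: stop with the desired profit
            return rounds, bet
        bet *= 2                     # lose: double the bet and continue

random.seed(1)
results = [play_game() for _ in range(10_000)]
print(sum(r for r, _ in results) / len(results))   # average rounds, close to 1/p = 2
print(max(b for _, b in results))                  # occasionally a huge last bet
```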
Negative Binomial distribution (Pascal)

In the foregoing, we played the game until the first win. Now we keep playing until we reach a certain number of wins. The number of games played is then Negative Binomial.

Definition

In a sequence of independent Bernoulli trials, the number of trials needed to obtain k successes has Negative Binomial distribution.

The Negative Binomial probability mass function is
P(x) = P{the x-th trial is the k-th success}
     = P{k − 1 successes in the first x − 1 trials, and the last trial is a success}
     = C(x − 1, k − 1) (1 − p)^(x−k) p^k
This formula accounts for the probability of k successes, the remaining (x − k) failures, and the number of outcomes, i.e., sequences with the k-th success coming on the x-th trial.
The Negative Binomial distribution has two parameters, k and p. With k = 1, it becomes Geometric. Also, each Negative Binomial variable can be represented as a sum of independent Geometric variables,
X = X₁ + X₂ + ⋯ + Xₖ,
with the same probability of success p. Indeed, the number of trials until the k-th success consists of a Geometric number of trials X₁ until the first success, an additional Geometric number of trials X₂ until the second success, etc.

Therefore
E[X] = E[X₁ + X₂ + ⋯ + Xₖ] = k/p
Var(X) = Var(X₁ + X₂ + ⋯ + Xₖ) = k(1 − p)/p²
#(Sequential testing). In a recent production 5% of certain electronic components are defective. We
need to find 12 non-defective components for our 12 new computers. Components are tested until 12
non defective ones are found. What is the probability that more than 15 components will have to be
tested?
Sol.: Let X be the number of components tested until 12 non-defective ones are found. It is the number of trials needed to see 12 successes, hence X has a Negative Binomial distribution with k = 12 and p = 0.95. We need
P{X > 15} = Σ_{x=16}^∞ P(x) = 1 − F(15).
Evaluating this directly would require a table of the Negative Binomial distribution; however, one may compute the left-hand side using the following argument:
P{X > 15} = P{more than 15 trials are needed to get 12 successes}
          = P{15 trials are not sufficient}
          = P{there are fewer than 12 successes in 15 trials}
          = P{Y < 12}
where Y is the number of successes (non-defective components) in 15 trials, which is a Binomial variable with parameters n = 15 and p = 0.95. Therefore
P{X > 15} = P{Y < 12} = P{Y ≤ 11} = F(11) = 0.0055.
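The value F(11) = 0.0055 can be verified directly from the Binomial pmf; a short sketch using only the standard library:

```python
from math import comb

# P{X > 15} for Negative Binomial(k=12, p=0.95), computed via the Binomial
# argument above: fewer than 12 successes in 15 trials.
n, p, k = 15, 0.95, 12
p_more_than_15 = sum(comb(n, y) * p**y * (1 - p)**(n - y) for y in range(k))
print(round(p_more_than_15, 4))  # 0.0055
```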

Poisson distribution
This distribution is related to a concept of rare events, or Poissonian events. Essentially
it means that two such events are extremely unlikely to occur within a very short time
or simultaneously. Arrivals of jobs, telephone calls , e-mail messages , traffic accidents,
network blackouts, virus attacks, error in software, floods, earthquakes are example of
rare events.
This distribution bears the name of a famous French mathematician, Siméon Denis Poisson (1781-1840).

λ = frequency, the average number of events

Poisson distribution:
P(x) = e^(−λ) λ^x / x!,   x = 0, 1, 2, …
E[X] = λ
Var(X) = λ
# ( New accounts) . Customers of an internet service provider initiate new accounts at the average
rate of 10 accounts per day. (a) What is the probability that more than 8 new accounts will be initiated
today? (b) What is the probability that more than 16 accounts will be initiated within 2 days?
Sol.: (a) Here λ = 10. The probability that more than 8 new accounts will be initiated today is
P{X > 8} = 1 − F_X(8) = 1 − Σ_{x=0}^8 e^(−10) 10^x / x! = 1 − 0.333 = 0.667
(b) The number of accounts, Y, opened within 2 days is not simply twice the one-day count; rather, Y is another Poisson random variable whose parameter equals 20. Therefore
P{Y > 16} = 1 − F_Y(16) = 1 − 0.221 = 0.779.
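Both answers can be reproduced from the Poisson pmf; a minimal sketch:

```python
from math import exp, factorial

# Poisson tail probabilities for the new-accounts example.
def poisson_cdf(x, lam):
    return sum(exp(-lam) * lam**k / factorial(k) for k in range(x + 1))

print(round(1 - poisson_cdf(8, 10), 3))    # (a) P{X > 8},  lambda = 10 -> 0.667
print(round(1 - poisson_cdf(16, 20), 3))   # (b) P{Y > 16}, lambda = 20 -> 0.779
```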

Binomial: b(k; n, p) ≈ Poisson(λ)
Poisson approximation to the Binomial, applicable when n ≥ 30 and p ≤ 0.05, with np = λ.
Here b(k; n, p) = C(n, k) p^k q^(n−k).

Mathematically, it means closeness of the Binomial and Poisson pmfs,
lim_{n→∞, p→0, np→λ} C(n, k) p^k (1 − p)^(n−k) = e^(−λ) λ^k / k!

* When p ≥ 0.95, the Poisson approximation is applicable too. The probability of a failure, q = 1 − p, is small in this case. Then, we can approximate the number of failures, which is also Binomial.
#(Birthday problem). Consider a class with N ≥ 10 students. Compute the probability that at least two of them have their birthdays on the same day. How many students should be in the class in order to have this probability above 0.5?

Sol.: Let n = C(N, 2) = N(N − 1)/2 be the number of pairs of students in this class. In each pair, both students are born on the same day with probability p = 1/365. Each pair is a Bernoulli trial because the two birthdays either match or don't match. Besides, matches in two different pairs are “nearly” independent. Therefore, X, the number of pairs sharing birthdays, is “almost” Binomial. For N ≥ 10, n ≥ 45 is large, and p is small, thus we shall use the Poisson approximation with λ = np = N(N − 1)/730:
P{there are two students sharing a birthday} = 1 − P{no matches}
= 1 − P{X = 0} ≈ 1 − e^(−λ) ≈ 1 − e^(−N²/730)
Solving the inequality 1 − e^(−N²/730) > 0.5, we obtain N > √(730 ln 2) = 22.5. That is, in a class of at least N = 23 students, there is a more than 50% chance that at least two students were born on the same day of the year.
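A sketch comparing this Poisson approximation with the exact birthday probability 1 − (365/365)(364/365)⋯((365 − N + 1)/365) for a few class sizes; the exact product formula is standard but not derived in the text above:

```python
from math import exp, prod

# Exact birthday probability vs. the Poisson approximation 1 - exp(-N(N-1)/730).
def exact(N):
    return 1 - prod((365 - i) / 365 for i in range(N))

def poisson_approx(N):
    return 1 - exp(-N * (N - 1) / 730)

for N in (10, 22, 23, 30):
    print(N, round(exact(N), 3), round(poisson_approx(N), 3))
# N = 23 is the smallest class size for which the probability exceeds 0.5
```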
