
Manual PQRS

(Probabilities, Quantiles and Random Samples)

Sytse Knypstra

      
Contents
Working with PQRS
  What can PQRS do?
  Specify a probability distribution
  Find probabilities
  Find quantiles
  Drawing random samples, randomly assign
Probability distributions
  Concepts
  Specifications
References
  Literature

Discrete distributions:
With finite support:
• Bernoulli
• binomial
• discrete uniform
• hypergeometric
With a countably infinite support:
• geometric
• negative binomial
• Poisson
Distributions under the null hypothesis of the test statistics for the following tests:
• Wilcoxon signed rank test
• Wilcoxon rank sum test
• Mann-Whitney test
• Kruskal-Wallis test
• Friedman test

Continuous distributions:
• normal
• gamma
• exponential
• chi-square
• Student’s t
• F
• Cauchy
• folded normal
• beta
• uniform
• non-central beta
• non-central chi-square
• non-central t
• non-central F
• logistic
• log-normal
• Pareto
• Weibull
• Gumbel
• inverse Gaussian
• double exponential

      
What can PQRS do?
For a large number of distributions you can:
a. find probabilities,
b. find quantiles (in a sense the inverse of probabilities),
c. draw random samples.

In addition PQRS can:
d. show the graph of the probability function (for discrete distributions) or the graph of the probability density function (for continuous distributions),
e. show the graph of the cumulative distribution function (cdf),
f. randomly assign units to treatments, or, more specifically: assign the numbers 1, 2, . . . , n to a number of groups.

For the options a through e you first have to specify a probability distribution:
1. Directly after the start of PQRS the selected distribution is the (standard) normal distribution (see the window below Distribution). You may select a different probability distribution from the list that is shown after clicking on the arrow at its right.
2. Specify the values of the parameter(s) behind the = sign.
3. Press Enter on the keyboard or click Apply New Distribution.

At the right-hand side, on the tabsheet pdf or pmf, you will see the probability density function (if the distribution is continuous) or the probability mass function (if the distribution is discrete). On the tabsheet cdf the cumulative distribution function is shown, and on the tabsheet formulas the formulas that apply to the new probability distribution are shown.
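PQRS is a point-and-click program, but the core operations a through c are easy to mirror in code. Below is a minimal Python sketch for the standard normal distribution, using only the standard library; the function names are our own, not part of PQRS:

```python
import math

# Standard-normal cdf via the error function (math.erf is in the stdlib).
def norm_cdf(x, mu=0.0, sigma=1.0):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Quantile by bisection: the x with cdf(x) = p (continuous case).
def norm_quantile(p, mu=0.0, sigma=1.0, tol=1e-10):
    lo, hi = mu - 10 * sigma, mu + 10 * sigma
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if norm_cdf(mid, mu, sigma) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

print(norm_cdf(1.96))        # P(X < 1.96), about 0.975
print(norm_quantile(0.975))  # about 1.96
```

Random samples (operation c) would then follow from `random.gauss(mu, sigma)` in the same spirit.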

      
Find probabilities
If probabilities have to be found for the selected distribution, and the selected tabsheet is pdf or pmf, then fill in an x-value in the slide window directly below the graph (first delete the old value). Then press Enter on the keyboard or click with the left mouse button on Compute Probabilities.
The slide window will now move and other values appear in the windows below it. The left-hand window shows the probability P(X < x), the right-hand window shows the probability P(X > x), and if the probability of the value x is non-zero (only for discrete distributions, when the tabsheet pmf is selected) another window right below the slide window is shown which contains the probability P(X = x).
It is also possible to drag the slide window (click on one of the two side arrows with the left mouse button, keep it pressed and move the mouse to the left or to the right). If you release the button, the probability P(X < x), the probability P(X > x) and possibly the probability P(X = x) are shown.
On the tabsheet cdf you will also find a slide window directly below the graph. You can enter a new x-value in this window. Then press Enter on the keyboard or click on Compute Probabilities. In the window to the left of the graph the probability P(X ≤ x) is shown. Its value corresponds to the value of the cumulative distribution function at the point x, often denoted by F(x).
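The three quantities PQRS displays on the pmf tabsheet, P(X < x), P(X = x) and P(X > x), can be checked by hand for a discrete distribution. A small sketch for the binomial distribution (the helper names are our own, not PQRS code):

```python
from math import comb

def binom_pmf(x, n, p):
    # P(X = x) for a binomial(n, p) variable
    return comb(n, x) * p**x * (1 - p)**(n - x)

def tail_probs(x, n, p):
    """Return P(X < x), P(X = x), P(X > x) for a binomial(n, p) variable."""
    below = sum(binom_pmf(k, n, p) for k in range(0, x))
    at = binom_pmf(x, n, p)
    return below, at, 1.0 - below - at

lo, eq, hi = tail_probs(3, 10, 0.5)
print(lo, eq, hi)  # 0.0546875 0.1171875 0.828125
```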

      
Find quantiles
On the previous page it was explained how to find probabilities associated with a certain x-value. The opposite can also be done: you can find an x-value given a certain probability. Quantiles are x-values such that the probability of x or of a value smaller than x is equal to a specified probability.
Quantiles are easily found if the selected tabsheet is cdf. For example, if the selected distribution is the standard normal distribution and the quantile for the probability 0.975 is wanted, then enter 0.975 in the small window to the left of the graph. Press Enter on the keyboard or click on Compute Quantile, and in the slide window below the graph the corresponding x-value appears: 1.96.
If the selected tabsheet is pdf, it is just as easy: fill in the probability 0.975 in the left probability window and press Enter or click on Compute Quantile.
If the selected tabsheet is pmf, in the case of discrete distributions, there is usually no x-value for which the probability P(X ≤ x) exactly equals the prescribed probability. In that case an x-value is presented for which the probability P(X ≤ x) is slightly larger.
If the selected tabsheet is pdf or pmf, the probability P(X > x) can also be specified; after pressing Enter or clicking on Compute Quantile the corresponding x-value is shown. If the selected tabsheet is pmf, there is usually no x-value for which the right-hand probability exactly meets the specification. In that case the x-value is shown for which the probability P(X > x) is slightly smaller.
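The discrete-quantile rule described above (take the smallest x for which P(X ≤ x) reaches the prescribed probability) can be sketched in a few lines of Python; the Poisson distribution and the function names here are our own illustration, not PQRS internals:

```python
import math

def poisson_cdf(x, lam):
    # P(X <= x) for a Poisson(lam) variable
    return sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(x + 1))

def poisson_quantile(p, lam):
    """Smallest x with P(X <= x) >= p, as PQRS presents it for discrete distributions."""
    x = 0
    while poisson_cdf(x, lam) < p:
        x += 1
    return x

print(poisson_quantile(0.95, 4))  # 8, since F(7) is about 0.949 and F(8) about 0.979
```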

      
Drawing random samples, randomly assign
For each selected probability distribution it is possible to draw a random sample. Select from the menu: Sample and then: Draw random sample. A dialog window appears in which the desired sample size (n) has to be specified. Press OK and the outcomes of the sample are shown on a separate tabsheet Sample.
These sample values can be copied to the clipboard (right-click and choose Select All, then Ctrl-C) and saved to a file: from the menu select: Sample and then Save.
In order to erase the tabsheet Sample: select from the menu: Sample and then Clear.

When doing experiments it may be necessary to randomly assign experimental units to treatments. Suppose 75 subjects have to be assigned to three groups. Select from the menu: Sample and then: Randomly assign. A dialog window appears in which the number of experimental units and the number of groups have to be specified. Press OK and a tabsheet Sample appears with the number of columns equal to the number of groups; the numbers (in our case 1 – 75) are randomly assigned to three groups, each of size 25.
This assignment can be copied to the clipboard (right-click and choose Select All, then Ctrl-C) and saved to a file: from the menu select: Sample and then Save.
In order to erase the tabsheet Sample: select from the menu: Sample and then Clear.
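The random assignment of 75 subjects to three equal groups can be reproduced with a shuffle; this is a sketch of the idea, not the algorithm PQRS itself uses:

```python
import random

def randomly_assign(n_units, n_groups, seed=None):
    """Shuffle unit numbers 1..n_units and deal them round-robin into groups."""
    rng = random.Random(seed)
    units = list(range(1, n_units + 1))
    rng.shuffle(units)
    return [sorted(units[g::n_groups]) for g in range(n_groups)]

groups = randomly_assign(75, 3)  # three groups of 25, as in the example above
print([len(g) for g in groups])  # [25, 25, 25]
```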

      
Concepts
A random variable is a variable that can take different values, depending on chance. A random variable is usually denoted by a capital letter, for example X. A random variable is always associated with a probability distribution.
A probability distribution specifies probabilities of values x or probabilities of intervals a < x < b. Probabilities have a value between 0 and 1. The probability distributions in PQRS are either discrete or continuous.
In a discrete distribution probabilities are given for certain values x; these probabilities are positive and their sum is 1. The function f(x) = P(X = x) that gives the probabilities for each real value x is called the probability (mass) function (pmf). Examples of discrete distributions are the binomial distribution and the Poisson distribution.
In a continuous distribution the probabilities of intervals are given by means of an auxiliary function, the probability density function (pdf). The probability of an interval (a, b) is the area below the graph of the probability density function and above this interval. The total probability, the probability of the interval (−∞, ∞), is 1. Examples are the normal distribution and the exponential distribution.
The (cumulative) distribution function F(x) (cdf) gives for each real number x the probability of a value at most equal to x, so F(x) = P(X ≤ x). The distribution function is a non-decreasing function of x with values between 0 and 1. As a consequence of the given definition the function is right-continuous: if F(x) makes a jump at some point c, then its value at c is the limit of the values from the right of c.

      
Concepts
A quantile for a given probability p is the x-value for which F(x) = p. This is true for continuous distributions. For discrete distributions it is usually not possible to find an x-value that meets this condition exactly. In that case PQRS chooses x such that F(x) is slightly larger than p.
The support of a distribution is, for a discrete distribution, the set of x-values for which the corresponding probability is strictly positive. For a continuous distribution it can be described as the set of x-values for which the probability density function is positive.
The expected value is a measure for the centre of the distribution. If the probability distribution is represented as a mass distribution on a beam, then the expected value is the point of balance where the beam has to be supported in order to keep it in balance. For discrete distributions the expected value is not necessarily in the support. The expected value of the random variable X is denoted by E(X).
The variance is a measure for the spread (width) of the distribution. Another measure for the spread is the standard deviation, the square root of the variance.
A probability distribution can be specified by its probability density function, its probability (mass) function or the cumulative distribution function. In some cases the probability distribution can also be specified by means of the moment generating function (mgf). Some properties of probability distributions can easily be derived by means of moment generating functions. Formally its definition is (in a neighbourhood of the point t = 0): M(t) = E(e^tX).
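These concepts can be computed directly for a small discrete distribution. The sketch below (our own example, not PQRS code) also illustrates two facts from this page: the expected value need not lie in the support, and the derivative of the mgf at t = 0 equals E(X):

```python
import math

# A small discrete distribution: values with their probabilities.
dist = {0: 0.2, 1: 0.5, 3: 0.3}

E = sum(x * p for x, p in dist.items())               # expected value: 1.4
Var = sum((x - E) ** 2 * p for x, p in dist.items())  # variance: 1.24

def M(t):  # moment generating function M(t) = E(e^{tX})
    return sum(math.exp(t * x) * p for x, p in dist.items())

# M'(0) = E(X); check with a central difference.
h = 1e-6
deriv = (M(h) - M(-h)) / (2 * h)

print(E, Var, deriv)       # 1.4 is not in the support {0, 1, 3}
```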

      
Specifications
Bernoulli
  parameter(s): 0 < p < 1
  support: x = 0, 1
  probability function: P(X = 0) = 1 − p and P(X = 1) = p
  expected value: p
  variance: p(1 − p)
  mgf: 1 − p + pe^t

binomial
  parameter(s): n = 1, 2, . . .; 0 < p < 1
  support: x = 0, 1, . . . , n
  probability function: C(n, x) p^x (1 − p)^(n−x)
  expected value: np
  variance: np(1 − p)
  mgf: (1 − p + pe^t)^n

Poisson
  parameter(s): λ > 0
  support: x = 0, 1, . . .
  probability function: e^(−λ) λ^x / x!
  expected value: λ
  variance: λ
  mgf: e^(λ(e^t − 1))

discrete uniform
  parameter(s): M integer; N integer, M < N
  support: x = M, M + 1, . . . , N
  probability function: 1/(N − M + 1)
  expected value: (M + N)/2
  variance: (N − M)(N − M + 2)/12
  mgf: (e^(Mt) + e^((M+1)t) + · · · + e^(Nt))/(N − M + 1)

(Here C(n, x) = n!/(x!(n − x)!) denotes the binomial coefficient.)

The Bernoulli distribution is a special case of the binomial distribution (n = 1).
The probability distribution of the number of ’successes’ in n independent experiments, each with probability p of ’success’, is binomial.
For the probability distribution of the number of times a certain event occurs in a certain time interval the Poisson distribution is often used.
An example of the discrete uniform distribution: the outcome of a throw of a fair die (M = 1, N = 6).
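The die example makes the discrete uniform formulas easy to verify by direct summation; exact rational arithmetic shows E(X) = 7/2 and Var(X) = 35/12:

```python
from fractions import Fraction

M, N = 1, 6                      # a fair die
values = range(M, N + 1)
p = Fraction(1, N - M + 1)       # each outcome has probability 1/6

E = sum(p * x for x in values)               # (M + N)/2
Var = sum(p * (x - E) ** 2 for x in values)  # (N - M)(N - M + 2)/12

print(E, Var)  # 7/2 35/12
```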

      
Specifications
hypergeometric
  parameter(s): N = 2, 3, . . .; N₁ = 1, . . . , N − 1; n = 1, . . . , N − 1
  support: x = max(0, n − N + N₁), . . . , min(n, N₁)
  probability function: C(N₁, x) C(N − N₁, n − x) / C(N, n)
  expected value: nN₁/N
  variance: nN₁(N − N₁)(N − n) / (N²(N − 1))

geometric
  parameter(s): 0 < p < 1
  support: x = 0, 1, . . .
  probability function: p(1 − p)^x
  expected value: (1 − p)/p
  variance: (1 − p)/p²
  mgf: p/(1 − (1 − p)e^t)

negative binomial
  parameter(s): r = 1, 2, . . .; 0 < p < 1
  support: x = 0, 1, . . .
  probability function: C(r + x − 1, x) p^r (1 − p)^x
  expected value: r(1 − p)/p
  variance: r(1 − p)/p²
  mgf: [p/(1 − (1 − p)e^t)]^r

The hypergeometric distribution is used if we have a set consisting of N objects, N₁ of which have a certain property. If n objects are randomly chosen without replacement from the set, the number of objects drawn having the property has a hypergeometric distribution.
The geometric distribution is a special case of the negative binomial distribution (r = 1).
The distribution of the number of ’failures’ before the r-th ’success’ in a series of independent Bernoulli experiments with probability p of success is negative binomial.
Attention: sometimes the negative binomial distribution is defined as the number of experiments up to and including the r-th ’success’ in the same setting.
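The negative binomial pmf and its mean r(1 − p)/p can be checked by summing the pmf far into the tail (a sketch with illustrative values r = 3, p = 0.4; the truncation point 500 is generous):

```python
from math import comb

def negbin_pmf(x, r, p):
    """P(X = x): probability of x 'failures' before the r-th 'success'."""
    return comb(r + x - 1, x) * p**r * (1 - p)**x

r, p = 3, 0.4
# Truncated sum; the neglected tail is negligible at x = 500.
mean = sum(x * negbin_pmf(x, r, p) for x in range(0, 500))
print(mean)  # close to r(1 - p)/p = 4.5
```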

      
Specifications
Wilcoxon signed rank test
  parameter(s): n = 1, 2, . . .
  support: x = 0, 1, . . . , n(n + 1)/2
  expected value: n(n + 1)/4
  variance: n(n + 1)(2n + 1)/24

Wilcoxon rank sum test
  parameter(s): m = 1, 2, . . .; n = 1, 2, . . .
  support: x = m(m + 1)/2, . . . , m(m + 1)/2 + mn
  expected value: m(m + n + 1)/2
  variance: mn(m + n + 1)/12

Mann-Whitney test
  parameter(s): m = 1, 2, . . .; n = 1, 2, . . .
  support: x = 0, . . . , mn
  expected value: mn/2
  variance: mn(m + n + 1)/12

The Wilcoxon signed rank test is used for testing the null hypothesis of symmetry in a population. If 0 is the point of symmetry, then the test statistic is the sum of the ranks of the positive observations.
The Wilcoxon rank sum test and the Mann-Whitney test are both used to test the null hypothesis of equality of location of two populations.
For the Wilcoxon rank sum test the test statistic is R, the sum of the ranks of the first sample (with m elements). In the Mann-Whitney test the test statistic U is the number of pairs (xᵢ, yⱼ) of observations xᵢ from the first sample and yⱼ from the second sample for which xᵢ > yⱼ.
The two tests are equivalent: Wilcoxon’s rank sum R = Mann-Whitney’s U + m(m + 1)/2.
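The equivalence R = U + m(m + 1)/2 is easy to verify on a small example without ties (our own helper, assuming distinct observations):

```python
def rank_sum_and_U(xs, ys):
    """Wilcoxon rank sum R of the first sample and Mann-Whitney U (no ties assumed)."""
    pooled = sorted(xs + ys)
    R = sum(pooled.index(x) + 1 for x in xs)           # ranks start at 1
    U = sum(1 for x in xs for y in ys if x > y)        # pairs with x > y
    return R, U

xs, ys = [1.3, 2.7, 4.1], [0.8, 3.5, 5.0, 2.0]
R, U = rank_sum_and_U(xs, ys)
m = len(xs)
print(R, U)  # 12 6, and indeed 12 = 6 + 3*4/2
```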

      
Specifications
Kruskal-Wallis test
  parameter(s): k = number of groups; n₁ = 1, 2, . . . , n₂ = 1, 2, . . . , . . . , n_k = 1, 2, . . .
  support: x ≥ 0

Friedman test
  parameter(s): k = number of treatments; b = number of blocks
  support: x ≥ 0

The Kruskal-Wallis test is used for testing whether the treatment effects differ between k > 2 groups if no parametric distribution is assumed for the observations. The test statistic H is based on the ranks assigned to the observations (if they are ranked together in increasing order):

  H = [12/(N(N + 1))] Σᵢ₌₁ᵏ nᵢ (R̄ᵢ − (N + 1)/2)²

Here N is the total number of observations, nᵢ is the number of observations in group i, and R̄ᵢ is the average rank of the observations in group i.

Friedman’s test is used to test whether the effects of treatments differ in a randomised complete block design. The test statistic is

  Q = [12b/(k(k + 1))] Σᵢ₌₁ᵏ (R̄ᵢ − (k + 1)/2)²

(same notation as in Kruskal-Wallis, with ranks assigned within each block).

The exact distribution of both test statistics H and Q can be computed in PQRS within reasonable time only if the number of groups and the number of observations are small. In other cases the distributions of both H and Q can be approximated by a chi-square distribution with k − 1 degrees of freedom.
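The Kruskal-Wallis statistic H can be computed by hand for a tiny data set; this sketch assumes no ties among the observations (the function and data are our own illustration):

```python
def kruskal_wallis_H(groups):
    """H = 12/(N(N+1)) * sum of n_i * (mean rank of group i - (N+1)/2)^2, no ties."""
    pooled = sorted(x for g in groups for x in g)
    rank = {x: i + 1 for i, x in enumerate(pooled)}    # ranks 1..N
    N = len(pooled)
    total = 0.0
    for g in groups:
        Rbar = sum(rank[x] for x in g) / len(g)        # average rank in the group
        total += len(g) * (Rbar - (N + 1) / 2) ** 2
    return 12.0 / (N * (N + 1)) * total

H = kruskal_wallis_H([[1.2, 3.4], [2.2, 5.6, 7.1], [0.5, 6.3]])
print(H)  # 33/28, about 1.179
```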

      
Specifications
normal
  parameter(s): µ; σ² > 0 (or σ > 0)
  support: −∞ < x < ∞
  probability density: (1/√(2πσ²)) e^(−(x−µ)²/(2σ²))
  expected value: µ
  variance: σ²
  mgf: e^(µt + σ²t²/2)

gamma
  parameter(s): α > 0; λ > 0
  support: x > 0
  probability density: (λ^α/Γ(α)) x^(α−1) e^(−λx)
  expected value: α/λ
  variance: α/λ²
  mgf: (λ/(λ − t))^α

exponential
  parameter(s): λ > 0
  support: x > 0
  probability density: λe^(−λx)
  expected value: 1/λ
  variance: 1/λ²
  mgf: λ/(λ − t)

chi-square
  parameter(s): ν > 0
  support: x > 0
  probability density: (2^(−ν/2)/Γ(ν/2)) x^((ν−2)/2) e^(−x/2)
  expected value: ν
  variance: 2ν
  mgf: (1 − 2t)^(−ν/2)

The Gaussian or normal distribution plays an important role in statistics. One of the reasons is the Central Limit Theorem. A special case is the standard normal distribution, with µ = 0 and σ² = 1.
In the gamma distribution sometimes a parameter θ = 1/λ is taken as its second parameter instead of λ.
The exponential distribution is a special case of the gamma distribution (α = 1) and a special case of the Weibull distribution (b = 1); it is used in modelling waiting times in queues.
The chi-square distribution is also a special case of the gamma distribution (α = ν/2 and λ = 1/2). If the number of degrees of freedom is very large, it is approximated in PQRS by a normal distribution.
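The special-case relationship between the chi-square and gamma densities can be confirmed pointwise in a few lines (our own check, comparing the two formulas from the table above):

```python
import math

def gamma_pdf(x, alpha, lam):
    return lam**alpha / math.gamma(alpha) * x**(alpha - 1) * math.exp(-lam * x)

def chi2_pdf(x, nu):
    return 2**(-nu / 2) / math.gamma(nu / 2) * x**(nu / 2 - 1) * math.exp(-x / 2)

# chi-square(nu) equals gamma(alpha = nu/2, lam = 1/2) at every x:
nu = 5
same = all(abs(chi2_pdf(x, nu) - gamma_pdf(x, nu / 2, 0.5)) < 1e-12
           for x in [0.3, 1.0, 4.2])
print(same)  # True
```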

      
Specifications
Student’s t
  parameter(s)/df: ν > 0
  support: −∞ < x < ∞
  density: [Γ((ν+1)/2) / (Γ(ν/2)√(νπ))] (1 + x²/ν)^(−(ν+1)/2)
  expected value: 0 if ν > 1; does not exist if ν ≤ 1
  variance: ν/(ν − 2) if ν > 2; does not exist if ν ≤ 2

F
  parameter(s)/df: m > 0; n > 0
  support: x > 0
  density: [Γ((m+n)/2) / (Γ(m/2)Γ(n/2))] (m/n)^(m/2) x^((m−2)/2) (1 + mx/n)^(−(m+n)/2)
  expected value: n/(n − 2) (if n > 2)
  variance: 2n²(m + n − 2) / (m(n − 2)²(n − 4)) (if n > 4)

Cauchy
  parameter(s): α; β > 0
  support: −∞ < x < ∞
  density: 1 / (πβ[1 + ((x − α)/β)²])
  expected value: does not exist
  variance: does not exist
The t-distribution is used when testing the null hypothesis of equality of means based on random samples from normal distribution(s). If the number of degrees of freedom (df) is very large, PQRS will approximate the t-distribution by a standard normal distribution.
The F-distribution is used for tests of equality of two variances when we have samples from normal distributions, and also in the framework of linear models (linear regression and analysis of variance). If the numbers of degrees of freedom (df) are very large, PQRS will approximate the F-distribution by a chi-square distribution.
The Cauchy distribution is a special case of the Student’s t-distribution (ν = 1). A special property of the Cauchy distribution is that the expected value and the variance do not exist.
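That the Cauchy distribution is the t-distribution with ν = 1 follows from Γ(1/2) = √π, and can be checked numerically (our own comparison of the two density formulas from the table above, with α = 0 and β = 1):

```python
import math

def t_pdf(x, nu):
    return (math.gamma((nu + 1) / 2) / (math.gamma(nu / 2) * math.sqrt(nu * math.pi))
            * (1 + x * x / nu) ** (-(nu + 1) / 2))

def cauchy_pdf(x, alpha=0.0, beta=1.0):
    return 1.0 / (math.pi * beta * (1 + ((x - alpha) / beta) ** 2))

# The two densities coincide for nu = 1:
match = all(abs(t_pdf(x, 1) - cauchy_pdf(x)) < 1e-12
            for x in [-3.0, -0.5, 0.0, 1.0, 2.5])
print(match)  # True
```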

      
Specifications
folded normal
  parameter(s): µ; σ² > 0
  support: x ≥ 0
  probability density: f(x) + f(−x)
  expected value: σ√(2/π) e^(−µ²/(2σ²)) + µ{1 − 2F(0)}
  variance: µ² + σ² − {E(X)}²

beta
  parameter(s): α > 0; β > 0
  support: 0 < x < 1
  probability density: (1/B(α, β)) x^(α−1)(1 − x)^(β−1)
  expected value: α/(α + β)
  variance: αβ / ((α + β + 1)(α + β)²)

uniform
  parameter(s): α; β > α
  support: α < x < β
  probability density: 1/(β − α)
  expected value: (α + β)/2
  variance: (β − α)²/12
  mgf: (e^(βt) − e^(αt)) / ((β − α)t)

non-central beta
  parameter(s): α > 0; β > 0; λ
  support: 0 < x < 1

The folded normal distribution is the distribution of X = |Y| when Y has a normal distribution with parameters µ and σ². The functions f(·) and F(·) in this column are the density function and the cumulative distribution function of Y.
The beta distribution has a positive probability density only on the interval 0 < x < 1.
The beta distribution gives the uniform distribution on (0, 1) as a special case (α = 1 and β = 1).
The non-central beta distribution is closely related to the non-central F-distribution.
The algorithm for the non-central beta distribution is based on a paper by Frick (1990).

      
Specifications
non-central chi-square
  parameter(s): ν > 0; λ ≥ 0
  support: x > 0
  expected value: ν + λ
  variance: 2ν + 4λ

non-central F
  parameter(s): m > 0; n > 0; λ ≥ 0
  support: x > 0
  expected value: (m + λ)n / (m(n − 2))
  variance: (n/m)² [(2m + 4λ)(n − 2) + 2(m + λ)²] / ((n − 2)²(n − 4))

non-central t
  parameter(s): ν > 0; δ
  support: −∞ < x < ∞
  expected value: δ √(ν/2) Γ((ν − 1)/2) / Γ(ν/2)
  variance: ν(1 + δ²)/(ν − 2) − (νδ²/2) [Γ((ν − 1)/2) / Γ(ν/2)]²
If X₁, . . . , X_k are independent and Xᵢ ∼ N(µᵢ, 1), then U = Σᵢ₌₁ᵏ Xᵢ² has a non-central chi-square distribution with k degrees of freedom and non-centrality parameter λ = Σᵢ₌₁ᵏ µᵢ².
If, independent of U, a variable V is defined which has a (central) chi-square distribution with m degrees of freedom, then F = (U/k)/(V/m) has a non-central F distribution with k and m degrees of freedom and non-centrality parameter λ.
If X ∼ N(δ, 1), and V has, independent of X, a (central) chi-square distribution with m degrees of freedom, then T = X/√(V/m) has a non-central t distribution with m degrees of freedom and non-centrality parameter δ.
If the non-centrality parameter is 0, then the distribution is the corresponding central distribution.
The algorithms for the non-central distributions are based on ideas from a paper by Frick (1990).

      
Specifications
logistic
  parameter(s): α; β > 0
  support: −∞ < x < ∞
  cdf: 1 / (1 + e^(−(x−α)/β))
  expected value: α
  variance: β²π²/3
  mgf: e^(αt) Γ(1 − βt)Γ(1 + βt)

Gumbel
  parameter(s): α; β > 0
  support: −∞ < x < ∞
  cdf: exp[−exp((α − x)/β)]
  expected value: α + 0.577216β
  variance: π²β²/6
  mgf: e^(αt) Γ(1 − βt)

Pareto
  parameter(s): α > 0; θ > 0
  support: x > α
  cdf: 1 − (α/x)^θ
  expected value: θα/(θ − 1) (if θ > 1)
  variance: θα² / ((θ − 1)²(θ − 2)) if θ > 2; does not exist if θ ≤ 2

      
Specifications
Weibull
  parameter(s): a > 0; b > 0
  support: x > 0
  probability density: abx^(b−1) e^(−ax^b)
  expected value: a^(−1/b) Γ(1 + 1/b)
  variance: a^(−2/b) [Γ(1 + 2/b) − Γ²(1 + 1/b)]

log-normal
  parameter(s): µ; σ² > 0
  support: x > 0
  probability density: (1/(x√(2πσ²))) e^(−(ln x − µ)²/(2σ²))
  expected value: e^(µ + σ²/2)
  variance: e^(2µ + 2σ²) − e^(2µ + σ²)

inverse Gaussian
  parameter(s): µ > 0; λ > 0
  support: x > 0
  probability density: √(λ/(2πx³)) e^(−λ(x − µ)²/(2µ²x))
  expected value: µ
  variance: µ³/λ

double exponential
  parameter(s): α; β > 0
  support: −∞ < x < ∞
  probability density: (1/(2β)) e^(−|x − α|/β)
  expected value: α
  variance: 2β²
  mgf: e^(αt) / (1 − β²t²)

The double-exponential distribution is also called the Laplace distribution.

      
Literature
For a nice scheme showing the relationships between the various distributions see:
Casella, G., Berger, R.L. (2002) ’Statistical Inference’ (2nd ed.).

For detailed information about probability distributions see the series ’Distributions in Statistics’ by Johnson and Kotz:
N.L. Johnson, S. Kotz (1969) ’Distributions in Statistics: Discrete Distributions’.
N.L. Johnson, S. Kotz (1970) ’Distributions in Statistics: Continuous Univariate Distributions-1’.
N.L. Johnson, S. Kotz (1970) ’Distributions in Statistics: Continuous Univariate Distributions-2’.

For the programming of PQRS the following sources were used:
W.H. Press, B.P. Flannery, S.A. Teukolsky, W.T. Vetterling (1989) ’Numerical Recipes in Pascal’.
P. L’Ecuyer (1988) ’Efficient and Portable Combined Random Number Generators’, Communications of the ACM, Vol. 31, Nr. 6 (June 1988).
L. Devroye (1986) ’Non-Uniform Random Variate Generation’.
H. Frick (1990) ’Algorithm AS R84: A remark on Algorithm AS 226: computing non-central beta probabilities’, Appl. Stat. 39, pp. 311-312.

PQRS can be downloaded from
http://members.home.nl/sytse.knypstra/PQRS/
The program PQRS was designed and written by Sytse Knypstra in Delphi. His e-mail address is: Sytse.Knypstra@home.nl.

      