I need a clear and detailed derivation of the mean and variance of a hypergeometric distribution.
If a box contains $N$ balls, $a$ of them black and $N-a$ white, and $n$ balls are drawn at random without replacement, then the probability of getting $x$ black balls (and obviously $n-x$ white balls) is given by the following pmf.
The pmf is
$$p(x) = \frac{\binom{a}{x}\binom{N-a}{n-x}}{\binom{N}{n}}.$$
The mean is given by:
$$\mu = E(X) = np = \frac{na}{N},$$
and the variance is
$$\sigma^2 = \frac{na(N-a)(N-n)}{N^2(N-1)} = npq\,\frac{N-n}{N-1},$$
where $p = a/N$ and $q = 1 - p = (N-a)/N$.
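A quick way to convince yourself that the two formulas above are right, before deriving them, is to sum the pmf directly with exact rational arithmetic. The sketch below is a minimal check, assuming Python 3.8+ (for `math.comb`); the helper names `hypergeom_pmf`, `mean_and_variance` and `closed_form` are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(x, N, a, n):
    """P(x black) when n balls are drawn without replacement from N balls, a of them black."""
    return Fraction(comb(a, x) * comb(N - a, n - x), comb(N, n))


def mean_and_variance(N, a, n):
    """Mean and variance by summing over the support max(0, n-(N-a)) <= x <= min(n, a)."""
    support = range(max(0, n - (N - a)), min(n, a) + 1)
    mean = sum(x * hypergeom_pmf(x, N, a, n) for x in support)
    second_moment = sum(x * x * hypergeom_pmf(x, N, a, n) for x in support)
    return mean, second_moment - mean ** 2


def closed_form(N, a, n):
    """The quoted formulas: mean np, variance npq(N-n)/(N-1), with p = a/N (needs N >= 2)."""
    p = Fraction(a, N)
    q = 1 - p
    return n * p, n * p * q * Fraction(N - n, N - 1)


# N = 10 balls, a = 4 black, n = 3 drawn
print(mean_and_variance(10, 4, 3))  # (Fraction(6, 5), Fraction(14, 25))
print(closed_form(10, 4, 3))        # (Fraction(6, 5), Fraction(14, 25))
```

Exact `Fraction` values are used instead of floats so that the two results can be compared with `==`.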
I want the step-by-step procedure to derive the mean and variance. Thank you.

This is a rather old question, but it is worth revisiting this computation. Let
$$\Pr[X = x] = \frac{\binom{m}{x}\binom{N-m}{n-x}}{\binom{N}{n}},$$
where I have used $m$ instead of $a$. We can ignore the details of specifying the support if we use the convention that binomial coefficients evaluate to zero outside their usual range; e.g., $\binom{m}{k} = 0$ if $k \notin \{0, \ldots, m\}$. Then we observe the identity
$$x\binom{m}{x} = \frac{x\,m!}{x!\,(m-x)!} = \frac{m\,(m-1)!}{(x-1)!\,\big((m-1)-(x-1)\big)!} = m\binom{m-1}{x-1},$$
whenever both binomial coefficients exist. Thus
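$$x\Pr[X = x] = \frac{m\binom{m-1}{x-1}\binom{N-m}{n-x}}{\binom{N}{n}} = \frac{mn}{N}\cdot\frac{\binom{m-1}{x-1}\binom{(N-1)-(m-1)}{(n-1)-(x-1)}}{\binom{N-1}{n-1}},$$
since $\binom{N}{n} = \frac{N}{n}\binom{N-1}{n-1}$,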
and we see that
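$$E[X] = \sum_x x\Pr[X = x] = \frac{mn}{N}\sum_x \frac{\binom{m-1}{x-1}\binom{(N-1)-(m-1)}{(n-1)-(x-1)}}{\binom{N-1}{n-1}},$$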
and the sum is simply the sum of probabilities for a hypergeometric distribution with parameters $N-1$, $m-1$, $n-1$, and is equal to 1. Therefore, the expectation is $E[X] = mn/N$. To get the second moment, consider
$$x(x-1)\binom{m}{x} = m(m-1)\binom{m-2}{x-2},$$
which is just an iteration of the first identity we used. Consequently
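$$E[X(X-1)] = \sum_x x(x-1)\Pr[X = x] = \frac{m(m-1)n(n-1)}{N(N-1)}\sum_x \frac{\binom{m-2}{x-2}\binom{(N-2)-(m-2)}{(n-2)-(x-2)}}{\binom{N-2}{n-2}},$$
where this time $\binom{N}{n} = \frac{N(N-1)}{n(n-1)}\binom{N-2}{n-2}$ was used,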
and again by the same reasoning, we find
$$E[X(X-1)] = \frac{m(m-1)n(n-1)}{N(N-1)}.$$
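Both expectations above rest on the same move: multiplying the pmf by $x$ (or by $x(x-1)$) turns it into a constant times a shifted hypergeometric pmf with parameters $(N-1, m-1, n-1)$ (or $(N-2, m-2, n-2)$). The sketch below checks those term-by-term identities with exact fractions, deliberately including values of $x$ outside the support, where the zero convention for binomial coefficients does the bookkeeping. It is an illustration only: `binom`, `pmf` and `check_shift_steps` are hypothetical names, and the guard in `binom` is there because Python's `math.comb` raises `ValueError` for negative arguments instead of returning 0.

```python
from fractions import Fraction
from math import comb


def binom(n, k):
    """Binomial coefficient with the zero convention: 0 unless 0 <= k <= n."""
    return comb(n, k) if 0 <= k <= n else 0


def pmf(x, N, m, n):
    """Pr[X = x] for X ~ Hypergeometric(N, m, n); zero off the support."""
    return Fraction(binom(m, x) * binom(N - m, n - x), binom(N, n))


def check_shift_steps(N, m, n):
    """Check, for every integer x in a range wider than the support:
         x Pr[X=x]      = (mn/N) Pr[Y = x-1],                     Y ~ Hyp(N-1, m-1, n-1)
         x(x-1) Pr[X=x] = (m(m-1)n(n-1)/(N(N-1))) Pr[Z = x-2],    Z ~ Hyp(N-2, m-2, n-2)
    Requires 2 <= n <= N so that both shifted pmfs have a nonzero denominator."""
    c1 = Fraction(m * n, N)
    c2 = Fraction(m * (m - 1) * n * (n - 1), N * (N - 1))
    for x in range(-2, n + 3):
        p = pmf(x, N, m, n)
        assert x * p == c1 * pmf(x - 1, N - 1, m - 1, n - 1)
        assert x * (x - 1) * p == c2 * pmf(x - 2, N - 2, m - 2, n - 2)
    # Summing the identities over x gives the two moments derived above.
    mean = sum(x * pmf(x, N, m, n) for x in range(n + 1))
    second_factorial = sum(x * (x - 1) * pmf(x, N, m, n) for x in range(n + 1))
    return mean == c1 and second_factorial == c2


print(check_shift_steps(10, 4, 3))  # True
```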
It is now quite easy to see that the "factorial moment" is
$$E[X(X-1)\cdots(X-k+1)] = \frac{m(m-1)\cdots(m-k+1)\;n(n-1)\cdots(n-k+1)}{N(N-1)\cdots(N-k+1)}.$$
In fact, we can write this in terms of binomial coefficients as well:
$$E\left[\binom{X}{k}\right] = \frac{\binom{m}{k}\binom{n}{k}}{\binom{N}{k}}.$$
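The binomial-moment formula is easy to spot-check numerically. The sketch below, assuming Python 3.8+ for `math.comb` and `math.perm`, compares a direct sum of $\binom{x}{k}\Pr[X=x]$ with $\binom{m}{k}\binom{n}{k}/\binom{N}{k}$, and the factorial moment with the ratio of falling factorials, for every $k$ from $0$ to $n$. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial, perm


def binomial_moment(N, m, n, k):
    """E[C(X, k)] for X ~ Hypergeometric(N, m, n), by direct summation over the support."""
    lo, hi = max(0, n - (N - m)), min(n, m)
    return sum(
        Fraction(comb(x, k) * comb(m, x) * comb(N - m, n - x), comb(N, n))
        for x in range(lo, hi + 1)
    )


def check_moment_formulas(N, m, n):
    """Compare the direct sums with the closed forms for k = 0, 1, ..., n."""
    for k in range(n + 1):
        direct = binomial_moment(N, m, n, k)
        # E[C(X, k)] = C(m, k) C(n, k) / C(N, k)
        assert direct == Fraction(comb(m, k) * comb(n, k), comb(N, k))
        # Factorial moment: E[X(X-1)...(X-k+1)] = k! E[C(X, k)]
        #   = m(m-1)...(m-k+1) n(n-1)...(n-k+1) / (N(N-1)...(N-k+1))
        assert factorial(k) * direct == Fraction(perm(m, k) * perm(n, k), perm(N, k))
    return True


print(binomial_moment(10, 4, 3, 2))     # 2/5, so E[X(X-1)] = 2 * 2/5 = 4/5
print(check_moment_formulas(10, 4, 3))  # True
```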
This gives us a way to recover raw and central moments; e.g.,
$$\operatorname{Var}[X] = E[X^2] - E[X]^2 = E[X(X-1) + X] - E[X]^2 = E[X(X-1)] + E[X]\,\big(1 - E[X]\big).$$

The trials are not independent, but they are identically distributed and, indeed, exchangeable, so the covariance between two of them doesn't depend on which two they are. The expected number of black balls on any one trial is $a/N$, so just add that up $n$ times to get $na/N$.
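The exchangeability claim, that every trial has the same chance $a/N$ of producing a black ball no matter where it sits in the sequence, can be confirmed by brute force for a small box. The sketch below labels the balls $0, \ldots, N-1$ (taking the first $a$ labels to be black, an arbitrary choice for illustration), enumerates every equally likely ordered draw of $n$ distinct balls with `itertools.permutations`, and returns the exact probability of black at each position. `position_probabilities` is a hypothetical name, and full enumeration is only practical for small $N$.

```python
from fractions import Fraction
from itertools import permutations


def position_probabilities(N, a, n):
    """Exact Pr(trial i gives a black ball) for i = 1..n.

    Balls are labelled 0..N-1 and labels 0..a-1 are taken to be black.
    Every ordered draw of n distinct balls is equally likely.
    """
    draws = list(permutations(range(N), n))
    return [
        Fraction(sum(draw[i] < a for draw in draws), len(draws))
        for i in range(n)
    ]


probs = position_probabilities(6, 2, 3)
print(probs)       # [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)]
print(sum(probs))  # 1, which equals n * a / N
```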
For the variance, you need the variance of each single trial, but you also need the covariance between two trials. The probability of getting a black ball on both of the first two trials is
$$\frac{a(a-1)}{N(N-1)}.$$
The variance for one trial is $pq = p(1-p) = \frac{a}{N}\left(1 - \frac{a}{N}\right)$.
So the covariance is
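$$\begin{aligned}
\operatorname{cov}(X_1, X_2) &= E(X_1X_2) - (E X_1)(E X_2)\\
&= \Pr(X_1 = X_2 = 1) - \big(\Pr(X_1 = 1)\big)^2\\
&= \frac{a(a-1)}{N(N-1)} - \left(\frac{a}{N}\right)^2.
\end{aligned}$$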
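The covariance can be checked the same way: enumerate every ordered draw, count how often two given trials are both black, and subtract $(a/N)^2$. In the sketch below (hypothetical name `pairwise_covariances`, brute force, small boxes only, with labels $0, \ldots, a-1$ again assumed black), every pair of positions gives the same value, and that value also equals $-\frac{a(N-a)}{N^2(N-1)}$; for $N = 5$, $a = 2$, $n = 3$ it is $-3/50$.

```python
from fractions import Fraction
from itertools import combinations, permutations


def pairwise_covariances(N, a, n):
    """Exact Cov(X_i, X_j) for every pair of trials i < j, where X_i = 1 if the
    i-th ball drawn is black (labels 0..a-1 are black, as an illustrative choice)."""
    draws = list(permutations(range(N), n))
    p = Fraction(a, N)  # Pr(X_i = 1), the same for every trial
    covariances = {}
    for i, j in combinations(range(n), 2):
        both_black = Fraction(sum(d[i] < a and d[j] < a for d in draws), len(draws))
        covariances[(i, j)] = both_black - p * p  # E(X_i X_j) - (E X_i)(E X_j)
    return covariances


covs = pairwise_covariances(5, 2, 3)
print(set(covs.values()))  # {Fraction(-3, 50)}
```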
Add up $n$ variances and $n(n-1)$ covariances to get the variance:
$$\operatorname{var}(X_1 + \cdots + X_n) = \sum_i \operatorname{var}(X_i) + \sum_{i \ne j} \operatorname{cov}(X_i, X_j).$$
(You'll need to do a bit of routine algebraic simplification.)
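For reference, one way the routine simplification can go: first put the covariance over a common denominator, then factor the single-trial variance $\frac{a(N-a)}{N^2}$ out of the sum.

$$\begin{aligned}
\operatorname{cov}(X_1, X_2) &= \frac{a(a-1)}{N(N-1)} - \frac{a^2}{N^2}
= \frac{a\,\big[(a-1)N - a(N-1)\big]}{N^2(N-1)}
= -\frac{a(N-a)}{N^2(N-1)},\\
\operatorname{var}(X_1 + \cdots + X_n) &= n\,\frac{a(N-a)}{N^2} - n(n-1)\,\frac{a(N-a)}{N^2(N-1)}
= \frac{na(N-a)}{N^2}\left(1 - \frac{n-1}{N-1}\right)
= \frac{na(N-a)(N-n)}{N^2(N-1)}.
\end{aligned}$$

This is the $npq\,\frac{N-n}{N-1}$ quoted in the question.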
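Returning to the factorial-moment approach, substituting the expressions for $E[X(X-1)]$ and $E[X]$ into $\operatorname{Var}[X] = E[X(X-1)] + E[X]\,(1 - E[X])$ gives
$$\operatorname{Var}[X] = \frac{m(m-1)n(n-1)}{N(N-1)} + \frac{mn}{N}\left(1 - \frac{mn}{N}\right) = \frac{mn(N-m)(N-n)}{N^2(N-1)}.$$
What is nice about the factorial-moment derivation is that the formula for the expectation of $\binom{X}{k}$ is very simple to remember.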
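As a final sanity check on the algebra, the sketch below evaluates $E[X(X-1)] + E[X]\,(1 - E[X])$ from the two closed-form moments and compares it with $\frac{mn(N-m)(N-n)}{N^2(N-1)}$ over a grid of small parameter values. With $m$ in place of $a$, this is the question's $npq\,\frac{N-n}{N-1}$. The function names are hypothetical, and exact `Fraction` arithmetic is used so the equality test is meaningful.

```python
from fractions import Fraction


def variance_from_factorial_moments(N, m, n):
    """Var[X] = E[X(X-1)] + E[X](1 - E[X]), using only the closed-form
    factorial moments of X ~ Hypergeometric(N, m, n) (needs N >= 2)."""
    mean = Fraction(m * n, N)
    second_factorial = Fraction(m * (m - 1) * n * (n - 1), N * (N - 1))
    return second_factorial + mean * (1 - mean)


def closed_form_variance(N, m, n):
    """mn(N-m)(N-n) / (N^2 (N-1))."""
    return Fraction(m * n * (N - m) * (N - n), N * N * (N - 1))


print(variance_from_factorial_moments(10, 4, 3))  # 14/25
print(closed_form_variance(10, 4, 3))             # 14/25
```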