
Confidence Intervals

The 95% CI for the population mean is:

$$ \bar{Y} \pm 1.96\,\frac{\sigma}{\sqrt{n}} $$

This can be generalized to any parameter which is normally distributed. Chebyshev's rule gives a more general form of this formula.

Standard Error: $ \mathrm{SE} = \dfrac{\sigma}{\sqrt{n}} $
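As a quick sketch (the function name and the numbers below are my own, assuming a known population σ):

```python
import math

def ci_95(ybar, sigma, n):
    # 95% CI for the population mean: ybar +/- 1.96 * sigma / sqrt(n)
    se = sigma / math.sqrt(n)  # standard error of the mean
    return ybar - 1.96 * se, ybar + 1.96 * se

lo, hi = ci_95(ybar=50.0, sigma=10.0, n=100)
print(lo, hi)  # 48.04 51.96
```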

The t-test
Used to compare quantitative data between two samples, to see whether the difference between them is just random variance within a single population, or the samples truly come from two populations w/ different mean values. It simply calculates the chance of seeing the observed difference if the two samples came from the same population; the critical cut-off value is usually 5%. Like other tests, it involves a summary statistic distributed over a characteristic distribution.
The subtracted 0 below is there because our null hypothesis states that the difference between the two samples is 0; it reminds us of what we're testing. The "standard" t-test procedure is for n < 30 and two samples w/ the same variance. It is as follows:

$$ t_s = \frac{(\bar{y}_1 - \bar{y}_2) - 0}{\mathrm{SE}_{(\bar{Y}_1 - \bar{Y}_2)}} $$
Unpooled SE:

$$ \mathrm{SE}_{(\bar{Y}_1 - \bar{Y}_2)} = \sqrt{\mathrm{SE}_1^2 + \mathrm{SE}_2^2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} $$

Remember, SEs add like Pythagoras.


The pooled variance is the weighted average of $s_1^2$ & $s_2^2$, with the weights equal to the degrees of freedom from each sample:

$$ s^2_{\mathrm{pooled}} = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 + n_2) - 2} $$

The pooled SE is formulated as:

$$ \mathrm{SE}_{\mathrm{pooled}} = \sqrt{s^2_{\mathrm{pooled}}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} $$

The unpooled & pooled formulas are quite similar:

$$ \mathrm{SE}_{(\bar{Y}_1 - \bar{Y}_2)} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \qquad \& \qquad \mathrm{SE}_{\mathrm{pooled}} = \sqrt{\frac{s^2_{\mathrm{pooled}}}{n_1} + \frac{s^2_{\mathrm{pooled}}}{n_2}} $$
In analyzing data w/ unequal n, one needs to decide between the two methods. With $\sigma_1 = \sigma_2$, the pooled method should be used; however, the unpooled method yields pretty similar results in this case. With unequal population standard deviations, pooling is wrong and the unpooled method should be used. Therefore, the pooled method is never necessary and offers no extra benefit; that's why everyone prefers the unpooled method, which is easier. The confidence interval for the difference between the two means is as follows:

$$ (\bar{y}_1 - \bar{y}_2) \pm t_{0.05}\,\mathrm{SE}_{(\bar{Y}_1 - \bar{Y}_2)} $$

Note: use $t_{0.025}$ in one-tail t-distribution tables (a two-sided 5% level puts 2.5% in each tail).
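A minimal sketch of the two SE formulas (the function names and numbers are my own); note that w/ equal n the two methods coincide exactly:

```python
import math

def unpooled_se(s1, n1, s2, n2):
    # SEs add like Pythagoras: sqrt(SE1^2 + SE2^2)
    return math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

def pooled_se(s1, n1, s2, n2):
    # weighted average of the sample variances, weights = degrees of freedom
    s2_pooled = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return math.sqrt(s2_pooled * (1 / n1 + 1 / n2))

# With equal n the two SEs agree exactly:
print(unpooled_se(3.0, 10, 4.0, 10))
print(pooled_se(3.0, 10, 4.0, 10))
```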


The t value can be found from the associated tables; the df can be obtained by 3 methods (6.7.1 Myra): the smaller sample's n − 1 gives a conservative estimate, while n₁ + n₂ − 2 gives a liberal estimate. The exact df lies in between; its formula (Welch–Satterthwaite) is just too long!
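In practice the whole procedure is one call: scipy's `ttest_ind` does the pooled test w/ `equal_var=True` and the unpooled (Welch) test w/ `equal_var=False`. The data below are made up:

```python
from scipy import stats

sample1 = [4.8, 5.2, 5.0, 4.9, 5.1]  # mean 5.0
sample2 = [7.1, 6.9, 7.0, 7.2, 6.8]  # mean 7.0

t_pooled, p_pooled = stats.ttest_ind(sample1, sample2, equal_var=True)
t_welch, p_welch = stats.ttest_ind(sample1, sample2, equal_var=False)  # Welch
print(t_pooled, p_pooled)  # t = -20.0, p far below 0.05
```

With equal n and near-equal sample variances, the two versions give essentially the same answer, as noted above.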

One sample t-test for the mean of a normal distribution w/ unknown variance
Used when we want to, for example, compare a subpopulation w/ the whole population. The population variance is unknown, but the sample SD is known. A one sample t-test is used, w/ the denominator being the sample's SE:

$$ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} $$

It's just like the z statistic, but for a small sample size (n). The t distribution is different for every value of n − 1, but as n → ∞ it becomes the normal distribution.
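A sketch checking scipy's result against the formula above (the data and null mean are made up):

```python
import math
from scipy import stats

data = [2.9, 3.1, 3.0, 3.2, 2.8]  # hypothetical measurements, mean 3.0
mu0 = 3.5                         # hypothetical null-hypothesis mean

t, p = stats.ttest_1samp(data, popmean=mu0)

# The same statistic by hand, straight from the formula:
n = len(data)
mean = sum(data) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))  # sample SD
t_manual = (mean - mu0) / (sd / math.sqrt(n))
print(t, t_manual)  # both about -7.07
```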

The Z-test
One sample z-test for the mean of a normal distribution w/ known variance
Same as the previous, but w/ the denominator being the population's σ divided by √n:

$$ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} $$

It uses the normal distribution for any value of n. For a 5% two-sided significance level, use the critical value z* satisfying P(Z ≤ z*) = 97.5% (z* ≈ 1.96).
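scipy has no dedicated one-sample z-test, so here is a small helper of my own using the normal CDF (the numbers are made up):

```python
import math
from scipy.stats import norm

def z_test(xbar, mu0, sigma, n):
    # z statistic, with the *population* sigma in the denominator
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    p = 2 * norm.sf(abs(z))  # two-sided p-value
    return z, p

z, p = z_test(xbar=103.0, mu0=100.0, sigma=15.0, n=100)
print(z, p)             # z = 2.0, p about 0.0455
print(norm.ppf(0.975))  # about 1.96, the two-sided 5% critical value
```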

The F-test
Can be interpreted as a global comparison of mean values, not one between just two mean values. This test involves two df values, one for the numerator and one for the denominator.

$$ H_0: \mu_1 = \mu_2 = \cdots = \mu_I, \qquad H_1: \text{the } \mu_i \text{ are not all equal} $$
The test statistic for the f-test:

Mean square between groups [MS(between)]:

$$ \mathrm{MS(between)} = \frac{\sum_{i=1}^{I} n_i\,(\bar{y}_i - \bar{\bar{y}})^2}{I - 1} $$

It is the weighted variance among the group means (i.e. take the group means as data and calculate the ordinary variance, but take each group's size into account).

The pooled variance [MS(within)]:

$$ s^2_{\mathrm{pooled}} = \mathrm{MS(within)} = \frac{\sum_{i=1}^{I} (n_i - 1)\, s_i^2}{\sum_{i=1}^{I} (n_i - 1)} $$

It is the weighted average of each group's variance.

Finally, the test statistic!

$$ F_s = \frac{\mathrm{MS(between)}}{\mathrm{MS(within)}} $$

There is a close relation between the f-test and the t-test: a t-test w/ pooling is actually an f-test w/ I = 2, and $t_s^2 = F_s$.
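A sketch w/ scipy's one-way ANOVA, using two made-up samples to confirm that $t_s^2 = F_s$ when I = 2:

```python
from scipy import stats

g1 = [4.8, 5.2, 5.0, 4.9, 5.1]
g2 = [7.1, 6.9, 7.0, 7.2, 6.8]

F, p_f = stats.f_oneway(g1, g2)                    # F-test with I = 2 groups
t, p_t = stats.ttest_ind(g1, g2, equal_var=True)   # pooled t-test
print(F, t ** 2)  # equal: F = t^2
```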

Linear Regression, Correlation & Covariance


Correlation coefficient formula:

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{\sigma_x}\right)\left(\frac{y_i - \bar{y}}{\sigma_y}\right) $$

Or, writing out the formulas for the sd values:

$$ r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \,\sum_{i=1}^{n}(y_i - \bar{y})^2}} $$
Least-squares regression line of Y on X:

Slope: $ a = r\left(\dfrac{\sigma_y}{\sigma_x}\right) $

Intercept: $ b = \bar{y} - a\bar{x} $

Giving the line: $ \hat{y} = ax + b $
$r^2$ describes the portion of the variance in Y that is explained by the linear relationship between X & Y.
The relation between covariance and correlation:

$$ \mathrm{Cov}(X, Y) = E[(X - \mu_x)(Y - \mu_y)] $$

$$ r = \mathrm{Corr}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_x \sigma_y} $$

The r value is dimensionless and normalized, and is therefore more meaningful.
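The formulas above can be checked numerically w/ numpy (the data are made up, roughly y = 2x):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

r = np.corrcoef(x, y)[0, 1]            # correlation coefficient
a = r * y.std(ddof=1) / x.std(ddof=1)  # slope = r * (s_y / s_x)
b = y.mean() - a * x.mean()            # intercept
print(r, a, b)  # a = 1.99, b = 0.05
```

`np.polyfit(x, y, 1)` returns the same slope and intercept, since it minimizes the same squared error.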


The Normal Approximation
To a binomial distribution: a normal distribution w/ mean np and variance np(1 − p) can be used to approximate a binomial distribution w/ parameters n & p when np(1 − p) > 5.
To a Poisson distribution: a Poisson distribution w/ parameter λ is approximated by a normal distribution w/ mean and variance both λ (this works well when λ is large).
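A quick numerical check of both approximations (the parameter values are my own); the 0.5 shifts are the usual continuity correction for approximating a discrete CDF by a continuous one:

```python
from scipy import stats

# Binomial: n*p*(1-p) = 24 > 5, so the approximation should hold.
n, p = 100, 0.4
exact = stats.binom.cdf(45, n, p)
approx = stats.norm.cdf(45.5, loc=n * p, scale=(n * p * (1 - p)) ** 0.5)

# Poisson w/ large lambda: mean and variance both lambda.
lam = 50.0
exact_pois = stats.poisson.cdf(55, lam)
approx_pois = stats.norm.cdf(55.5, loc=lam, scale=lam ** 0.5)

print(exact, approx)            # close to each other
print(exact_pois, approx_pois)  # close to each other
```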
