
Mean and standard deviation

Population (basic):
$\mu = \frac{\sum x_i}{N}$
$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}} = \sqrt{\frac{N \sum x_i^2 - \left(\sum x_i\right)^2}{N^2}}$

Sample (basic):
$\bar{x} = \frac{\sum x_i}{n}$
$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}} = \sqrt{\frac{n \sum x_i^2 - \left(\sum x_i\right)^2}{n(n-1)}}$

Population (frequency):
$\mu = \frac{\sum f_i x_i}{N}$
$\sigma = \sqrt{\frac{N \sum f_i x_i^2 - \left(\sum f_i x_i\right)^2}{N^2}}$

Sample (frequency):
$\bar{x} = \frac{\sum f_i x_i}{n}$
$s = \sqrt{\frac{n \sum f_i x_i^2 - \left(\sum f_i x_i\right)^2}{n(n-1)}}$
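As a quick check of the formulas above, a minimal Python sketch (with made-up data) computing the basic and frequency versions:

```python
import math

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical sample

n = len(data)
mean = sum(data) / n

# Population standard deviation: divide by N
sigma = math.sqrt(sum((x - mean) ** 2 for x in data) / n)

# Sample standard deviation: divide by n - 1 (Bessel's correction)
s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

# Frequency version: values x_i with counts f_i, using the shortcut form
values = [1.0, 2.0, 3.0]
freqs = [5, 3, 2]
n_f = sum(freqs)
mean_f = sum(f * x for f, x in zip(freqs, values)) / n_f
s_f = math.sqrt(
    (n_f * sum(f * x * x for f, x in zip(freqs, values))
     - sum(f * x for f, x in zip(freqs, values)) ** 2)
    / (n_f * (n_f - 1))
)
print(mean, sigma, s, mean_f, s_f)
```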

Binomial (specific number of samples)
Use formula ($n$ samples with $x$ successes) or check the table of the formula:
$P(x) = \frac{n!}{x!(n-x)!}\, p^x q^{n-x}$, with $\mu = np$ and $\sigma = \sqrt{npq}$
Becomes Normal when $np > 5$ and $nq > 5$:
$z = \frac{x \pm 0.5 - \mu}{\sigma}$ (test statistic corrected by 0.5 for discreteness!)

Poisson (specific duration of test period)
Use formula ($\lambda$ is the expected rate per period; $x$ = amount tested):
$P(x) = \frac{\lambda^x e^{-\lambda}}{x!}$, with $\mu = \lambda = np$ and $\sigma = \sqrt{\lambda}$
(go to the tables if you approximate the Binomial with the Poisson!)
Becomes Normal when $\lambda > 5$:
$z = \frac{x \pm 0.5 - \mu}{\sigma}$ (test statistic corrected by 0.5 for discreteness!)
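A minimal sketch of both distributions and the continuity-corrected normal approximation; the numbers (n = 100, p = 0.4, x = 45) are assumed for illustration only:

```python
import math

def binom_pmf(x, n, p):
    # P(x) = n! / (x! (n-x)!) * p^x * q^(n-x)
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, lam):
    # P(x) = lam^x e^(-lam) / x!
    return lam**x * math.exp(-lam) / math.factorial(x)

def phi(z):
    # Standard normal CDF via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Normal approximation to P(X <= x) with the 0.5 continuity correction,
# valid here since np = 40 > 5 and nq = 60 > 5
n, p, x = 100, 0.4, 45
mu, sigma = n * p, math.sqrt(n * p * (1 - p))
approx = phi((x + 0.5 - mu) / sigma)
exact = sum(binom_pmf(k, n, p) for k in range(x + 1))
print(approx, exact)  # both come out around 0.87
```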

Estimating the Standard Deviation
Chebyshev's theorem (always holds): the amount of data within $k\sigma$ of the mean is at least $1 - \frac{1}{k^2}$; outside: at most $\frac{1}{k^2}$.
Empirical rule of thumb (random, i.e. normal, sample): the amount of data within $k\sigma$ of the mean is 68% for $1\sigma$, 95% for $2\sigma$, 99% for $3\sigma$.

Estimating Risk
To measure risk, one often takes the coefficient of variation (CV) to compare, for example, stocks: $CV = \sigma / \mu$ (higher = riskier). An alternative way to measure risk is to plot $\sigma$ (risk) versus $r$ (return) to find the efficient frontier (the upper-left tangent of all data points in the graph, representing the highest return at the lowest risk: the optimum).
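A small sketch contrasting the Chebyshev bound with the CV comparison; the two return series are hypothetical:

```python
import math

# Chebyshev: at least 1 - 1/k^2 of ANY data lies within k sigma of the mean
for k in (1.5, 2, 3):
    print(f"k = {k}: at least {1 - 1 / k**2:.2%} within k sigma")

# Coefficient of variation for two hypothetical stocks (higher = riskier)
returns_a = [0.02, 0.05, -0.01, 0.03, 0.04]
returns_b = [0.10, -0.08, 0.15, -0.05, 0.12]

def cv(xs):
    m = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))
    return sd / m

print(cv(returns_a), cv(returns_b))  # stock b has the higher CV: riskier
```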

Functions of Random Variables
What is the behaviour of a function $W$ when it is a function of random variables $X$ and $Y$?

$W = a + bX + cY$
Then $E(W)$ is: $\mu_w = a + b\mu_x + c\mu_y$
And $VAR(W)$ is: $\sigma_w^2 = b^2 \sigma_x^2 + c^2 \sigma_y^2 + 2bc\,\sigma_{xy}$
Where: $\sigma_{xy} = COV[X,Y] = E[XY] - \mu_x \mu_y$ (a higher covariance does not mean a closer relationship, since it is influenced by units; $Cov(X,Y) < 0$ means that $X$ and $Y$ move in opposite directions; $> 0$ means the same direction; $= 0$ means $X$ and $Y$ are uncorrelated, as independent variables are)

$\rho_{xy} = CORR[X,Y] = \frac{COV[X,Y]}{STDDEV[X] \cdot STDDEV[Y]} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$ (if $|\rho_{xy}| \gtrsim 0.7$ -> strong)
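The $E(W)$ and $VAR(W)$ formulas can be verified by simulation; the means, variances, and covariance below are assumed values, not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, c = 1.0, 2.0, -3.0

# Correlated X and Y with assumed mu_x=5, mu_y=10, var_x=4, var_y=9, cov=1.5
mu = np.array([5.0, 10.0])
cov = np.array([[4.0, 1.5],
                [1.5, 9.0]])
x, y = rng.multivariate_normal(mu, cov, size=1_000_000).T
w = a + b * x + c * y

# Theory: mu_w = a + b*mu_x + c*mu_y
#         var_w = b^2 var_x + c^2 var_y + 2 b c cov_xy
print(w.mean(), a + b * mu[0] + c * mu[1])             # both ~ -19
print(w.var(), b**2 * 4 + c**2 * 9 + 2 * b * c * 1.5)  # both ~ 79
```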

Common critical values:
$\alpha = 0.10$: $z_\alpha = 1.28$
$\alpha = 0.05$: $z_\alpha = 1.645$, $z_{\alpha/2} = 1.960$
$\alpha = 0.02$: $z_\alpha = 2.054$, $z_{\alpha/2} = 2.326$
$\alpha = 0.01$: $z_\alpha = 2.326$, $z_{\alpha/2} = 2.576$

These formulas hold for all distributions; normality (~N) has to be assumed only when a probability is calculated for the combined variables.

Central Limit Theorem: if the sample size is large (n > 30), the sampling distribution of the sample mean $\bar{x}$ is approximately normal. The standard deviation of the sampling distribution of the sample mean is $\sigma / \sqrt{n}$ and is called the standard error of the mean (it becomes smaller with bigger sample size).
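A quick simulation of the standard-error claim, assuming a deliberately skewed exponential population with $\sigma = 1$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Skewed (exponential) population: the CLT says the sample means are
# still approximately normal for n > 30, with std sigma / sqrt(n)
sigma, n = 1.0, 50
means = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)
print(means.std(), sigma / np.sqrt(n))  # both ~ 0.141
```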

Hypothesis testing

Null hypothesis: $H_0: \mu = \mu_0$
Alternative hypothesis and rejection rule:
$H_A: \mu < \mu_0$ -> reject $H_0$ if $z \leq -z_\alpha$
$H_A: \mu > \mu_0$ -> reject $H_0$ if $z \geq z_\alpha$
$H_A: \mu \neq \mu_0$ -> reject $H_0$ if $z \leq -z_{\alpha/2}$ or $z \geq z_{\alpha/2}$
(For $\alpha$ = 5%: $z = \pm 1.645$; for $\alpha$ = 1%: $z = \pm 2.326$, one-tailed.)

Decision errors:
Accept $H_0$ | $H_0$ is true: OK | $H_0$ is false: Type II error
Reject $H_0$ | $H_0$ is true: Type I error ($\alpha$) | $H_0$ is false: OK

p-value: one-tailed $p = F(z)$ (tail probability from the table); double it for a two-tailed test.

Confidence Intervals

$P\!\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \leq \mu \leq \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$

Interval and test statistic per case:
n > 30 and $\sigma$ known: $\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$; $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
n > 30 and $\sigma$ unknown: $\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$; $z = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
n < 30 and $\sigma$ known: $\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$; $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
n < 30 and $\sigma$ unknown: $\bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$; $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
For proportions: $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}$; $z = \frac{\hat{p} - p}{\sqrt{pq/n}}$
Regression coefficients: $b \pm t_{\alpha/2,\,n-k-1}\,SE_b$; $t = \frac{b - b_0}{SE_b}$
Regression forecast: $\hat{Y}_f \pm t_{\alpha/2,\,n-k-1}\,s_f$; $t = \frac{\hat{Y}_f - Y_0}{s_f}$

The standard error is the term to the right of the $\pm$ sign ($\sigma/\sqrt{n}$, $s/\sqrt{n}$, $\sqrt{\hat{p}\hat{q}/n}$, $SE_b$, or $s_f$).

Error bound: choose the sample size so that the error $e \leq E$ (E as a percentage x% if estimating a proportion p):
$E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ -> $n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$
For a proportion: $n = \frac{z_{\alpha/2}^2\,\hat{p}(1-\hat{p})}{E^2}$; conservative ($\hat{p} = 0.5$): $n = \frac{z_{\alpha/2}^2}{4E^2}$

Interpretation: "an xx% confidence interval for the parameter is [0.5; 0.7]" means the parameter lies between 0.5 and 0.7 with xx% confidence.
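A sketch tying the table together: a two-sided 95% interval, a one-tailed z-test, and the conservative sample-size formula, all on made-up numbers:

```python
import math

def phi(z):
    # Standard normal CDF
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Two-sided 95% CI for the mean, n > 30 with s standing in for sigma
x_bar, s, n = 102.3, 8.5, 64   # hypothetical sample
z_half = 1.960                 # z_{alpha/2} for alpha = 0.05
half_width = z_half * s / math.sqrt(n)
print(x_bar - half_width, x_bar + half_width)

# z-test of H0: mu = 100 against HA: mu > 100
z = (x_bar - 100) / (s / math.sqrt(n))
p_value = 1 - phi(z)           # one-tailed
print(z, p_value, z >= 1.645)  # reject H0 at alpha = 5%?

# Conservative sample size (p-hat = 0.5) for error bound E = 3%
E = 0.03
print(math.ceil(1.960**2 / (4 * E**2)))  # ~1068
```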

Building a Regression Model
1) Select the variables for the regression analysis and decide whether there should be a constant or the line should go through the origin
2) Study scatter plots and remove extreme points
3) Study the correlation matrix. For multicollinearity, check the variables in the regression analysis (neither significant => remove the variable with the highest p-value; one is significant => take out the other; both are significant => leave both in, unless the correlation is > 0.99 or the signs of the coefficients are distorted)
4) Run the regression (again)
5) Take out variables one by one using t-test or p-value significance and run the regression again
6) Check the signs of the coefficients against the theoretical outcome

Regression Residuals
1) Check autocorrelation with the Durbin-Watson test, as sketched below (DW: 0 = pos. autocorr < ?? < OK < ?? < neg. autocorr = 4) => transformation / missing variable / measurement error?
2) Check homoskedasticity (positive residual correlation) and heteroskedasticity (increasing residual error) => transformation / isolate the source / a significant variable may look insignificant?
3) Check whether the residuals are normally distributed (histogram) => not so important (get more observations)
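Steps 3)-5) and the Durbin-Watson check can be run, for instance, with statsmodels; the data below are simulated purely for illustration:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=100)  # hypothetical data

X = sm.add_constant(x)   # keep a constant unless theory says "through origin"
model = sm.OLS(y, X).fit()
print(model.summary())   # coefficients, t-tests, p-values, R^2

# DW near 2: no autocorrelation; near 0: positive; near 4: negative
print(durbin_watson(model.resid))
```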

Regression Analysis
Postulated: $Y_i = a + bX_i$
Real: $Y_i = a + bX_i + u_i$

$b = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} = \frac{\frac{\sum XY}{n} - \bar{X}\bar{Y}}{\sigma_x^2}$

$a = \bar{Y} - b\bar{X}$
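The $b$ and $a$ formulas applied directly to hypothetical data:

```python
import numpy as np

# Hypothetical data
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(X)

# b = (sum XY - n X-bar Y-bar) / (sum X^2 - n X-bar^2), a = Y-bar - b X-bar
b = (np.sum(X * Y) - n * X.mean() * Y.mean()) / (np.sum(X**2) - n * X.mean()**2)
a = Y.mean() - b * X.mean()
print(a, b)  # close to the intercept and slope underlying the data
```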

Regression error measures

$r = \frac{\sum XY - n\bar{X}\bar{Y}}{n\,\sigma_x \sigma_y}$

$R^2 = \frac{SS_R}{SS_T} = 1 - \frac{SS_E}{SS_T} = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2}$

$F = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)}$

n = number of samples
k = number of variables (2 for simple regression)
t-value = coefficient / standard error
p-value = $2 \cdot F$(t-value)
SS = sum of squares

$R^2_{adj} = 1 - \frac{n-1}{n-k-1} \cdot \frac{SS_E}{SS_T}$

Standard error of regression (for n > 100, $s_e$ approximates $s_f$):

$s_e = \sqrt{\frac{\sum (Y_i - \hat{Y}_i)^2}{n - k}}$

$SS_T = SS_R + SS_E$

Standard error of forecast:

$s_f = s_e \sqrt{1 + \frac{1}{n} + \frac{(X_f - \bar{X})^2}{\sum (X_i - \bar{X})^2}}$
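A sketch computing the error measures and the forecast standard error for the same hypothetical data as above, following the sheet's convention k = 2 for simple regression:

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n, k = len(X), 2  # k = 2 for simple regression

b = (np.sum(X * Y) - n * X.mean() * Y.mean()) / (np.sum(X**2) - n * X.mean()**2)
a = Y.mean() - b * X.mean()
Y_hat = a + b * X

SSE = np.sum((Y - Y_hat) ** 2)     # sum of squared errors
SST = np.sum((Y - Y.mean()) ** 2)  # total sum of squares
SSR = SST - SSE                    # SST = SSR + SSE
R2 = SSR / SST

se = np.sqrt(SSE / (n - k))        # standard error of regression
X_f = 6.0                          # hypothetical forecast point
sf = se * np.sqrt(1 + 1/n + (X_f - X.mean())**2 / np.sum((X - X.mean())**2))
print(R2, se, sf)
```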
