
[QMM] Statistical formulas

1. Mean
The mean, or average, of a collection of numbers x_1, x_2, ..., x_N is
\[
\bar{x} = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{1}{N} \sum_{i=1}^{N} x_i.
\]
2. Standard deviation
The standard deviation is defined as
\[
S = \sqrt{\frac{(x_1 - \bar{x})^2 + \cdots + (x_N - \bar{x})^2}{N - 1}} = \sqrt{\frac{1}{N - 1} \sum_{i=1}^{N} (x_i - \bar{x})^2}.
\]
One may find in some textbooks an alternative version, with N in the denominator. When the author wishes to distinguish between both versions, the N version is presented as the population standard deviation, while the N − 1 version is the sample standard deviation.
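The handout contains no code; as a quick sketch, both formulas (and the N versus N − 1 distinction) can be checked in plain Python. The data values below are invented for illustration.

```python
import math

def mean(xs):
    # x-bar = (x_1 + ... + x_N) / N
    return sum(xs) / len(xs)

def population_sd(xs):
    # N in the denominator (population version)
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def sample_sd(xs):
    # N - 1 in the denominator (sample version, as in the formula above)
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # invented example data
print(mean(data))           # 5.0
print(population_sd(data))  # 2.0
print(sample_sd(data))      # slightly larger, about 2.14
```

For this data set the sample version is larger than the population version, as it always is; the gap shrinks as N grows.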
3. The normal distribution
The normal density curve is given by a function of the form
\[
f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right).
\]
In this formula, μ and σ are two parameters which are different for each application of the model. A normal density curve has a bell shape (Figure 1). The parameter μ, called the population mean, has a straightforward interpretation: the density curve peaks at x = μ. The parameter σ, called the population standard deviation, measures the spread of the distribution: the higher σ, the flatter the bell. The case μ = 0, σ = 1 is called the standard normal.
Probabilities for the normal distribution are calculated as (numerical) integrals of the density. For
most people, the only probability needed is
\[
P\left( \mu - 1.96\,\sigma < X < \mu + 1.96\,\sigma \right) = 0.95.
\]
This formula provides us with an interval which contains 95% of the population. The tails
contain the remaining 5%.
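The 0.95 value above can be verified numerically: the standard normal CDF has a closed form in terms of the error function, Φ(z) = (1 + erf(z/√2))/2, which Python's math module provides. A sketch, with arbitrary invented values for μ and σ:

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    # P(X <= x) for X normal with mean mu and standard deviation sigma,
    # via the error-function form of the standard normal CDF
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

mu, sigma = 10.0, 2.0  # invented parameter values
p = normal_cdf(mu + 1.96 * sigma, mu, sigma) - normal_cdf(mu - 1.96 * sigma, mu, sigma)
print(p)  # roughly 0.95, for any choice of mu and sigma
```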
4. Confidence limits for the mean
The formula for the 95% confidence limits for the mean is
\[
\bar{x} \pm 1.96\, \frac{S}{\sqrt{N}}.
\]
[QMM] Statistical formulas 1 20120301
[Figure 1. Three normal density curves]
Here, N is the number of data points, x̄ the sample mean, and S the sample standard deviation.
Textbooks recommend replacing the factor 1.96, derived from the normal distribution, by a factor
taken from the Student t distribution, but the correction becomes irrelevant when N is high.
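As a sketch of the formula in plain Python (data values invented for illustration, and the normal-based factor 1.96 used as in the handout):

```python
import math

def ci95(xs):
    # x-bar +/- 1.96 * S / sqrt(N), S being the sample standard deviation
    n = len(xs)
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    half_width = 1.96 * s / math.sqrt(n)
    return m - half_width, m + half_width

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # invented example data
lo, hi = ci95(data)
print(lo, hi)  # an interval centered at the sample mean, 5.0
```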
5. Correlation
For two-dimensional data (x_1, y_1), (x_2, y_2), ..., (x_N, y_N), the (linear) correlation is
\[
R = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}}.
\]
Always −1 ≤ R ≤ 1.
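A direct translation of the correlation formula into Python might look as follows (the data are invented; points lying exactly on a line give R = ±1):

```python
import math

def correlation(xs, ys):
    # R = sum((x_i - x-bar)(y_i - y-bar)) / sqrt(sum((x_i - x-bar)^2) * sum((y_i - y-bar)^2))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented, nearly linear data: R should be close to 1
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 10.0]
print(correlation(xs, ys))
```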
6. Coefficients of the regression line
Given N data points (x_1, y_1), (x_2, y_2), ..., (x_N, y_N), the regression line has an equation y = b_0 + b_1 x, in which b_0 and b_1 are the regression coefficients: b_1 is the slope, and b_0 the intercept. The formulas are
\[
b_1 = R\, \frac{S_Y}{S_X}, \qquad b_0 = \bar{y} - b_1 \bar{x}.
\]
R is the linear correlation. ȳ and x̄ are the means of Y and X, respectively. S_Y and S_X are the corresponding standard deviations.
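The two coefficient formulas can be sketched in Python along the same route the handout takes, via R and the two standard deviations; the data below are invented and lie exactly on y = 1 + 2x, so the recovered coefficients should match:

```python
import math

def regression_line(xs, ys):
    # b1 = R * S_Y / S_X  (slope), b0 = y-bar - b1 * x-bar  (intercept)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)   # linear correlation R
    sx = math.sqrt(sxx / (n - 1))    # S_X
    sy = math.sqrt(syy / (n - 1))    # S_Y
    b1 = r * sy / sx
    b0 = my - b1 * mx
    return b0, b1

# Invented data lying exactly on y = 1 + 2x
b0, b1 = regression_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(b0, b1)  # intercept close to 1, slope close to 2
```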
7. R square statistic
In a linear regression equation, the R² statistic is the proportion of the total variability of the dependent variable explained by the equation:
\[
R^2 = \frac{\text{Explained variability}}{\text{Total variability}}.
\]
[Figure 2. Regression lines with R = 0.8 and R = 0.2]
More explicitly, if y_1, y_2, ..., y_N are the observed values of the dependent variable Y, with mean ȳ, and ŷ_1, ŷ_2, ..., ŷ_N are the values predicted by the equation,
\[
R^2 = \frac{\sum \left( \hat{y}_i - \bar{y} \right)^2}{\sum \left( y_i - \bar{y} \right)^2}.
\]
Always 0 ≤ R² ≤ 1. In simple regression (a single independent variable), R² coincides with the square of the correlation.
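That coincidence can be checked numerically. The sketch below fits a least-squares line to invented data, computes R² as explained over total variability, and compares it with the squared correlation:

```python
import math

# Invented, nearly linear data
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
sxx = sum((x - mx) ** 2 for x in xs)
syy = sum((y - my) ** 2 for y in ys)

# Least-squares line y = b0 + b1 x and its predicted values
b1 = sxy / sxx
b0 = my - b1 * mx
y_hat = [b0 + b1 * x for x in xs]

# R^2 = explained variability / total variability
r2 = sum((yh - my) ** 2 for yh in y_hat) / syy

# In simple regression, R^2 equals the squared correlation
r = sxy / math.sqrt(sxx * syy)
print(r2)      # close to 1 for nearly linear data
print(r ** 2)  # numerically the same value
```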
8. Adjusted R square
An adjusted R² statistic, defined as
\[
\text{Adjusted } R^2 = 1 - \frac{(1 - R^2)(N - 1)}{N - p - 1},
\]
is sometimes used to compare regression equations. N is the number of data points and p the number of independent variables in the equation. The adjustment becomes irrelevant when N is high.
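A one-line sketch of the adjustment; the R², N, and p values below are invented, and the second call shows the correction fading for large N:

```python
def adjusted_r2(r2, n, p):
    # Adjusted R^2 = 1 - (1 - R^2)(N - 1) / (N - p - 1)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Invented values: the same R^2 = 0.90 with p = 3 predictors
print(adjusted_r2(0.90, 25, 3))    # noticeably below 0.90 for small N
print(adjusted_r2(0.90, 1000, 3))  # nearly 0.90: the adjustment fades as N grows
```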