
Statistical Inference!

Now with 33% more pages!

created by Sherrinford (r/APStudents) April 2018/2019 | Reddit | Discord: Sherrinford#0290


Probability Study Guide

Common Confidence & Significance Levels

Confidence Level   Significance Level   z* Value   When to Use
90%                0.10                 1.645      Not very serious topic / Girl Scout cookies
95%                0.05                 1.96       When in doubt, use this / Court of law
99%                0.01                 2.576      Very serious topic
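The z* values in the table can be recomputed from the standard normal distribution. The sketch below is one way to do it with Python's standard library; the helper name `z_star` is made up for illustration.

```python
from statistics import NormalDist

def z_star(confidence_level):
    """Two-sided critical value z* for a confidence level such as 0.95."""
    alpha = 1 - confidence_level            # significance level
    # Leave alpha/2 in each tail of the standard normal curve.
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}  alpha={1 - level:.2f}  z*={z_star(level):.3f}")
```

The printed values should agree with the table to three decimal places.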

Errors

Error     Definition
Type I    Rejecting H0 when it is actually true
Type II   Failing to reject H0 when it is actually false

Other Things to Remember

Thing What’s important about it

Confidence Intervals
- *ARE NOT PROBABILITY!
- The confidence level describes how the method performs in the long run, not the chance that one particular interval is right: “In the long run, X% of all confidence intervals constructed with this method will contain the true parameter.”
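The long-run reading of a confidence level can be checked by simulation. This is a minimal sketch, assuming a normal population with known σ and made-up values for the mean, σ, and sample size; the function name `coverage_rate` is hypothetical.

```python
import random
from statistics import NormalDist, mean

def coverage_rate(mu=50.0, sigma=10.0, n=25, level=0.95, trials=2000, seed=1):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sigma / n ** 0.5           # same margin for every sample
    hits = 0
    for _ in range(trials):
        xbar = mean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / trials

rate = coverage_rate()
print(f"{rate:.1%} of the simulated intervals contain the true mean")
```

Each individual interval either contains the true mean or it does not; the percentage only shows up across many repetitions.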

Sx vs. σ: t-distributions are used because Sx only estimates the population standard deviation σ, which adds extra variability to the test statistic (a known σ is exact, so z is used instead)

Power: The formal definition of power is 1 - β, where β is the probability of a Type II error. You want the power of a significance test to be high. To increase power, increase the sample size. If you can’t, increase alpha (a larger alpha makes it easier to reject H0, at the cost of more Type I errors).
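To see how sample size and alpha move power, here is a rough sketch for a one-sided z test (H0: μ = μ0 against Ha: μ > μ0) with known σ. The function name and all numbers are made up for illustration.

```python
from statistics import NormalDist

def power_one_sided_z(effect, sigma, n, alpha):
    """Power of a one-sided z test when the true mean is mu0 + effect."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)  # reject H0 when z > z_crit
    shift = effect * n ** 0.5 / sigma       # how far the true mean moves z
    return 1 - std_normal.cdf(z_crit - shift)

base = power_one_sided_z(effect=2, sigma=10, n=25, alpha=0.05)
bigger_n = power_one_sided_z(effect=2, sigma=10, n=100, alpha=0.05)
bigger_alpha = power_one_sided_z(effect=2, sigma=10, n=25, alpha=0.10)
print(f"base={base:.3f}  n x4={bigger_n:.3f}  alpha x2={bigger_alpha:.3f}")
```

Both a larger sample and a larger alpha raise the power in this setup; the larger alpha does so by accepting more Type I errors.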

S (Standard error of the residuals): Estimate of the typical size of a residual, i.e. how far the observed y values typically fall from the y values predicted from x

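Z test vs. T test
- Z test is used when σ is known. T test is used when σ is unknown. Pick z or t first, then check the normality condition separately.
- t procedures are robust against moderate departures from normality, especially with larger samples. They are not resistant to outliers, because the sample mean and Sx are not.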

Degrees of freedom (df)
- For one-sample t-tests/intervals (including paired): n-1
- For chi-square goodness of fit: # of categories - 1
- For chi-square homogeneity and association/independence: (r-1)(c-1)
- For linear regression inference: n-2
- If the exact df is not listed in Table D, round down to the nearest df that is

Margin of error and sample size
- “Margin of error” includes z* or t*
- The square root part is the standard error (standard deviation adjusted for sample size)
- Widths of confidence intervals are inversely proportional to the square root of the sample size (1/root(n))
- For example, if the sample size increased by a factor of 9, the confidence interval would be ⅓ as wide as it was before
- As a general rule, increasing the sample size decreases the variability of the sampling distribution
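The 1/root(n) rule can be verified with a short calculation. This sketch assumes a z-interval for a mean with known σ; the function name and the numbers are illustrative only.

```python
from statistics import NormalDist

def margin_of_error(sigma, n, level=0.95):
    """Margin of error = z* times the standard error sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    standard_error = sigma / n ** 0.5
    return z * standard_error

small = margin_of_error(sigma=12, n=40)
large = margin_of_error(sigma=12, n=360)    # sample size multiplied by 9
print(f"n=40: {small:.3f}   n=360: {large:.3f}   ratio: {small / large:.2f}")
```

Multiplying the sample size by 9 divides the margin of error, and so the interval width, by 3.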
List of Tests & Intervals

Method When to Use Conditions Test Hypotheses Test Conclusion Interval Conclusion

1-Sample Z Test / 1-Sample Z Interval
  When to Use:
  - σ is known
  - counts are involved
  - one population
  Conditions:
  - Random: Given
  - Independence: N ≥ 10n (the population is at least 10 times the sample size)
  - Normality: Given OR approximate normality of the sampling distribution (n ≥ 30 / CLT)
  Test Hypotheses:
  - H0: μ = some number
  - Ha: μ ≠, <, > some number
  Test Conclusion: “Since a p-value of [p-value] is [greater/less] than an alpha of [significance level]...”
  - “...we fail to reject the null hypothesis that [context]” OR
  - “...we have significant evidence to reject the null hypothesis [context] in favor of the alternative [context]”
  Interval Conclusion: “We are [confidence level]% confident that the true mean [context] is between [lower bound] and [upper bound]”
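A one-sample z test and interval can be sketched as follows. The function name, argument names, and example numbers are assumptions made for illustration, and the conditions still have to be checked by hand.

```python
from statistics import NormalDist

def one_sample_z(xbar, mu0, sigma, n, alternative="two-sided", level=0.95):
    """Return (z, p-value, confidence interval) for a mean with known sigma."""
    std_normal = NormalDist()
    se = sigma / n ** 0.5                   # standard error of the sample mean
    z = (xbar - mu0) / se
    if alternative == "less":
        p_value = std_normal.cdf(z)
    elif alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    else:                                   # two-sided: double the tail area
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    z_star = std_normal.inv_cdf(1 - (1 - level) / 2)
    return z, p_value, (xbar - z_star * se, xbar + z_star * se)

z, p_value, interval = one_sample_z(xbar=103, mu0=100, sigma=15, n=50)
print(f"z={z:.3f}  p={p_value:.4f}  interval=({interval[0]:.2f}, {interval[1]:.2f})")
```

With these made-up numbers the p-value is above 0.05, so the conclusion would be the "fail to reject" wording.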

2-Sample Z Test / 2-Sample Z Interval
  When to Use:
  - σ is known
  - counts are involved
  - two populations (comparing two means)
  Conditions:
  - Random: Given
  - Independence: Read context to determine if both samples are independent; N ≥ 10n
  - Normality: Given OR approximate normality of the sampling distributions (n ≥ 30 / CLT)
  Test Hypotheses:
  - H0: μ1 = μ2 or μ1-μ2 = 0
  - Ha: μ1 ≠, <, > μ2 or μ1-μ2 ≠, <, > 0
  Test Conclusion: “Since a p-value of [p-value] is [greater/less] than an alpha of [significance level]...”
  - “...we fail to reject the null hypothesis that [context]” OR
  - “...we have significant evidence to reject the null hypothesis [context] in favor of the alternative [context]”
  Interval Conclusion: “We are [confidence level]% confident that the true mean [context 1] is between [lower bound] and [upper bound] [higher/lower] than the mean [context 2]”

1-Proportion Z Test / 1-Proportion Z Interval
  When to Use:
  - σ is known
  - proportions are involved
  - one population
  Conditions:
  - Random: Given
  - Independence: N ≥ 10n
  - Normality: Given OR approximate normality of the sampling distribution (np̂ ≥ 10 and n(1-p̂) ≥ 10)
  Test Hypotheses:
  - H0: P = some number
  - Ha: P ≠, <, > some number
  Test Conclusion: “Since a p-value of [p-value] is [greater/less] than an alpha of [significance level]...”
  - “...we fail to reject the null hypothesis that [context]” OR
  - “...we have significant evidence to reject the null hypothesis [context] in favor of the alternative [context]”
  Interval Conclusion: “We are [confidence level]% confident that the true proportion of [context] is between [lower bound] and [upper bound]”
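Here is a rough sketch of a two-sided one-proportion z test and interval. It assumes the usual convention of using the hypothesized proportion p0 in the test's standard error and p̂ in the interval's; the function name and the counts are made up.

```python
from statistics import NormalDist

def one_proportion_z(successes, n, p0, level=0.95):
    """Return (z, two-sided p-value, interval, large-counts check)."""
    std_normal = NormalDist()
    p_hat = successes / n
    se_null = (p0 * (1 - p0) / n) ** 0.5        # test: SE under H0
    z = (p_hat - p0) / se_null
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    se_hat = (p_hat * (1 - p_hat) / n) ** 0.5   # interval: SE from p-hat
    z_star = std_normal.inv_cdf(1 - (1 - level) / 2)
    interval = (p_hat - z_star * se_hat, p_hat + z_star * se_hat)
    # Normality check as written in the guide: both counts at least 10.
    large_counts = n * p_hat >= 10 and n * (1 - p_hat) >= 10
    return z, p_value, interval, large_counts

z, p_value, interval, large_counts = one_proportion_z(successes=58, n=100, p0=0.5)
print(f"z={z:.2f}  p={p_value:.4f}  large counts ok: {large_counts}")
```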

2-Proportion Z Test / 2-Proportion Z Interval
  When to Use:
  - σ is known
  - proportions are involved
  - two populations
  Conditions:
  - Random: Given
  - Independence: Read context to determine if both samples are independent; N ≥ 10n
  - Normality: Given OR approximate normality of the sampling distributions (n1p̂1 ≥ 5, n1(1-p̂1) ≥ 5, n2p̂2 ≥ 5, and n2(1-p̂2) ≥ 5)
  Test Hypotheses:
  - H0: P1 = P2 or P1-P2 = 0
  - Ha: P1 ≠, <, > P2 or P1-P2 ≠, <, > 0
  Test Conclusion: “Since a p-value of [p-value] is [greater/less] than an alpha of [significance level]...”
  - “...we fail to reject the null hypothesis that [context]” OR
  - “...we have significant evidence to reject the null hypothesis [context] in favor of the alternative [context]”
  Interval Conclusion: “We are [confidence level]% confident that the true proportion of [context 1] is between [lower bound] and [upper bound] [higher/lower] than the proportion of [context 2]”
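A two-proportion z test might be sketched like this. The guide does not say how the standard error is formed, so the pooled proportion for the test and the unpooled standard error for the interval are assumptions (a common textbook convention); the function name and the counts are made up.

```python
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2, level=0.95):
    """Return (z, two-sided p-value, interval for p1 - p2)."""
    std_normal = NormalDist()
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)              # assumes H0: p1 = p2
    se_pooled = (pooled * (1 - pooled) * (1 / n1 + 1 / n2)) ** 0.5
    z = (p1 - p2) / se_pooled
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    se_unpooled = (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5
    z_star = std_normal.inv_cdf(1 - (1 - level) / 2)
    diff = p1 - p2
    return z, p_value, (diff - z_star * se_unpooled, diff + z_star * se_unpooled)

z, p_value, interval = two_proportion_z(x1=45, n1=100, x2=30, n2=100)
print(f"z={z:.2f}  p={p_value:.4f}  interval=({interval[0]:.3f}, {interval[1]:.3f})")
```

Here the interval for p1 - p2 lies entirely above 0, which matches the small p-value.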

1-Sample T Test / 1-Sample T Interval
  When to Use:
  - Sx is known / σ is unknown
  - counts are involved
  - one population
  Conditions:
  - Random: Given
  - Independence: N ≥ 10n
  - Normality: Given OR approximate normality of the sampling distribution (n ≥ 30 / CLT) OR approximate normality of the population (n < 30, check graph)
  Test Hypotheses:
  - H0: μ = some number
  - Ha: μ ≠, <, > some number
  Test Conclusion: “Since a p-value of [p-value] is [greater/less] than an alpha of [significance level]...”
  - “...we fail to reject the null hypothesis that [context]” OR
  - “...we have significant evidence to reject the null hypothesis [context] in favor of the alternative [context]”
  Interval Conclusion: “We are [confidence level]% confident that the true mean [context] is between [lower bound] and [upper bound]”

Paired T Test / Paired T Interval
  When to Use:
  - Sx of the differences is known / σ of the differences is unknown
  - counts are involved
  - one population, two treatments (Twins? One test subject type, two treatments?)
  Conditions:
  - Random: Given
  - *PAIRED T TESTS ARE NOT INDEPENDENT! (the two measurements in each pair are related, so analyze the differences as a single sample)
  - Normality: Given OR approximate normality of the sampling distribution of the mean difference (n ≥ 30 / CLT) OR approximate normality of the population of differences (n < 30, check graph)
  Test Hypotheses:
  - H0: μ(diff) = some number
  - Ha: μ(diff) ≠, <, > some number
  Test Conclusion: “Since a p-value of [p-value] is [greater/less] than an alpha of [significance level]...”
  - “...we fail to reject the null hypothesis that [context]” OR
  - “...we have significant evidence to reject the null hypothesis [context] in favor of the alternative [context]”
  Interval Conclusion: “We are [confidence level]% confident that the true mean difference [context] is between [lower bound] and [upper bound]”
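A paired t procedure is a one-sample t procedure run on the differences. The sketch below computes only the t statistic and df, since Python's standard library has no t-distribution; the p-value or t* would still come from Table D or a calculator. The function name and the data are made up.

```python
from statistics import mean, stdev

def paired_t_statistic(before, after, mu_diff0=0.0):
    """Return (t, df) for the mean of the paired differences after - before."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    se = stdev(diffs) / n ** 0.5            # Sx of the differences / sqrt(n)
    t = (mean(diffs) - mu_diff0) / se
    return t, n - 1

before = [12, 15, 11, 14, 13]
after = [14, 18, 12, 15, 16]
t, df = paired_t_statistic(before, after)
print(f"t={t:.3f} with df={df}")
```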

2-Sample T Test / 2-Sample T Interval
  When to Use:
  - Sx1 and Sx2 are known / σ1 and σ2 are unknown
  - counts are involved
  - two populations
  Conditions:
  - Random sampling/assignment: Given
  - Independence: Read context to determine if both samples are independent
  - Normality: Given OR approximate normality of the sampling distributions (n ≥ 30 / CLT) OR approximate normality of the populations (n < 30, check graph)
  Test Hypotheses:
  - H0: μ1 = μ2 or μ1-μ2 = 0
  - Ha: μ1 ≠, <, > μ2 or μ1-μ2 ≠, <, > 0
  Test Conclusion: “Since a p-value of [p-value] is [greater/less] than an alpha of [significance level]...”
  - “...we fail to reject the null hypothesis that [context]” OR
  - “...we have significant evidence to reject the null hypothesis [context] in favor of the alternative [context]”
  Interval Conclusion: “We are [confidence level]% confident that the true mean [context 1] is between [lower bound] and [upper bound] [higher/lower] than the mean [context 2]”

Chi-Square: Goodness of Fit (GOF)
  When to Use:
  - Extension of the 1-Proportion Z Test
  - One population
  - One row/column
  - Is there a significant difference between the expected and observed proportions?
  Conditions:
  - Random
  - All expected values are greater than 1
  - Less than 20% of expected values are less than 5
  - *CHI-SQUARE DISTRIBUTIONS ARE NOT NORMAL!
  Test Hypotheses:
  - H0: P1 = X, P2 = Y, P3 = Z…
  - Ha: There is a difference between the [observed] and the [expected] in at least one category.
  Test Conclusion: “Since a p-value of [p-value] is [greater/less] than an alpha of [significance level]...”
  - “...we fail to reject the null hypothesis that there is no difference [context]” OR
  - “...we have significant evidence to reject the null hypothesis [context] in favor of the alternative there is a difference in at least one category [context]”
  Interval Conclusion: N/A
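The goodness-of-fit statistic and the expected-count conditions can be sketched as below. The p-value is left to a chi-square table or calculator because the standard library has no chi-square distribution; the function name and the counts are made up.

```python
def chi_square_gof(observed, expected_proportions):
    """Return (chi-square statistic, df, whether the count conditions hold)."""
    total = sum(observed)
    expected = [total * p for p in expected_proportions]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1                  # number of categories - 1
    all_above_1 = all(e > 1 for e in expected)
    share_below_5 = sum(e < 5 for e in expected) / len(expected)
    conditions_ok = all_above_1 and share_below_5 < 0.20
    return statistic, df, conditions_ok

statistic, df, conditions_ok = chi_square_gof(
    observed=[18, 22, 20, 40],
    expected_proportions=[0.25, 0.25, 0.25, 0.25],
)
print(f"chi-square={statistic:.2f}  df={df}  conditions ok: {conditions_ok}")
```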

Chi-Square: Homogeneity
  When to Use:
  - Extension of the 2-Proportion Z Test
  - Multi-population
  - Multiple rows/columns
  - Is there a significant difference in at least one proportion between the categories?
  Conditions:
  - Random
  - All expected values are greater than 1
  - Less than 20% of expected values are less than 5
  Test Hypotheses:
  - H0: P1 = P2 = P3…
  - Ha: There is a difference in at least one category between [populations].
  Test Conclusion: “Since a p-value of [p-value] is [greater/less] than an alpha of [significance level]...”
  - “...we fail to reject the null hypothesis that there is no difference [context]” OR
  - “...we have significant evidence to reject the null hypothesis [context] in favor of the alternative there is a difference in at least one category [context]”
  Interval Conclusion: N/A

Chi-Square: Association/Independence
  When to Use:
  - Is there a significant association between categorical variables?
  Conditions:
  - Random
  - All expected values are greater than 1
  - Less than 20% of expected values are less than 5
  Test Hypotheses:
  - H0: There is no association between [variable 1] and [variable 2].
  - Ha: There is an association.
  Test Conclusion: “Since a p-value of [p-value] is [greater/less] than an alpha of [significance level]...”
  - “...we fail to reject the null hypothesis that there is no association [context]” OR
  - “...we have significant evidence to reject the null hypothesis [context] in favor of the alternative that there is an association [context]”
  Interval Conclusion: N/A
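For a two-way table (homogeneity or association), the expected counts come from the row and column totals. A minimal sketch, with a hypothetical function name and made-up counts:

```python
def chi_square_two_way(table):
    """Return (chi-square statistic, df, expected counts) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count = (row total * column total) / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)     # (r-1)(c-1)
    return statistic, df, expected

statistic, df, expected = chi_square_two_way([[20, 30], [30, 20]])
print(f"chi-square={statistic:.2f}  df={df}  expected={expected}")
```

The expected counts are what the conditions refer to, so they are worth printing before looking at the statistic.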

Linear Regression (Slope) T Test / Linear Regression (Slope) T Interval
  When to Use:
  - Is there a linear relationship between quantitative variables?
  Conditions:
  - Random
  - Observations are independent
  - Linearity/Residuals: Standard deviation of y is the same about the true line (scattered); plot of the residuals is approximately normal
  Test Hypotheses:
  - H0: The slope of the true line of regression between [x + context] and [y + context] is 0.
  - Ha: The slope is ≠, <, > 0
  Test Conclusion: “Since a p-value of [p-value] is [greater/less] than an alpha of [significance level]...”
  - “...we fail to reject the null hypothesis that the slope of the true line of regression [context] is 0” OR
  - “...we have significant evidence to reject the null hypothesis that the slope of the true line of regression [context] is 0 in favor of the alternative that there is [a/a positive/a negative] relationship [context]”
  Interval Conclusion: “We are [confidence level]% confident that the true slope of the regression line [context] is between [lower bound] and [upper bound]”
  Formula: b ± t* × SEb
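The pieces of the slope formula b ± t* × SEb can be computed by hand as in this sketch. It assumes the usual least-squares definitions (S from the residuals with n-2 degrees of freedom, SEb = S / sqrt(Σ(x - x̄)²)); the function name and the data are made up, and t* still has to come from Table D.

```python
from statistics import mean

def slope_inference(xs, ys):
    """Return (slope b, SE of b, t statistic for H0: slope = 0, df)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar                   # intercept of the fitted line
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    s = (sum(r ** 2 for r in residuals) / (n - 2)) ** 0.5
    se_b = s / sxx ** 0.5
    return b, se_b, b / se_b, n - 2

b, se_b, t, df = slope_inference(xs=[1, 2, 3, 4, 5], ys=[2, 4, 5, 4, 5])
print(f"b={b:.3f}  SEb={se_b:.4f}  t={t:.3f}  df={df}")
```

Given a t* for the reported df, the interval is simply `(b - t_star * se_b, b + t_star * se_b)`.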
