Addition rule: P(A ∪ B) = P(A) + P(B) − P(A ∩ B). If P(A ∩ B) = 0 we say A and B are mutually exclusive.
Independence: A and B are independent if P(A ∩ B) = P(A)·P(B). X, Y are independent if P(X = x, Y = y) = P(X = x)·P(Y = y) for every x and y.
Probability table: a table of the joint probabilities P(X = x, Y = y); all entries sum to 1.
Probability tree: multiply the (conditional) probabilities along the branches of a path to get the probability of that path.
Expected value: E[X] = Σ x·P(X = x)
Variance: Var(X) = E[(X − E[X])²] = E[X²] − (E[X])²  (always non-negative)
Joint distribution: P(X = x, Y = y) for every pair (x, y)
Marginal distribution: P(X = x) = Σ_y P(X = x, Y = y)
Covariance: Cov(X, Y) = E[(X − E[X])·(Y − E[Y])] = E[XY] − E[X]·E[Y]
Correlation coefficient: Corr(X, Y) = Cov(X, Y) / (σ_X·σ_Y)
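A minimal sketch of the covariance and correlation formulas above, computed by hand for a small made-up paired data set (the numbers are hypothetical, chosen only for illustration):

```python
# Covariance and correlation for a small hypothetical data set.
from math import sqrt

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Cov(X, Y) = E[(X - E[X])(Y - E[Y])]
cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n

# Corr(X, Y) = Cov(X, Y) / (sd_X * sd_Y)
sd_x = sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)
sd_y = sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)
corr_xy = cov_xy / (sd_x * sd_y)
```

Note that corr_xy always lands in [−1, 1], as the next block of the sheet states.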
If Y = aX + b, then E[Y] = a·E[X] + b and Var(Y) = a²·Var(X).
If X, Y are independent: Var(X + Y) = Var(X) + Var(Y).
If X, Y are not necessarily independent: Var(X + Y) = Var(X) + Var(Y) + 2·Cov(X, Y).
Continuous random variables: described by a density function f(x). P(a ≤ X ≤ b) is the area under f between a and b; the total area under f is 1.
−1 ≤ Corr(X, Y) ≤ 1: negative values indicate a negative linear relationship (−1 is perfect negative), 0 indicates no linear relationship, positive values indicate a positive linear relationship (1 is perfect positive). Corr is a measure of linear association; it does not imply causality!
Uniform distribution: X ~ U[a, b]; X can take any value between a and b, with f(x) = 1/(b − a) for every x in [a, b]. E[X] = (a + b)/2, Var(X) = (b − a)²/12.
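A quick simulation sketch checking the U[a, b] mean and variance formulas (the interval [2, 10] and sample size are arbitrary choices for illustration):

```python
# Check E[X] = (a+b)/2 and Var(X) = (b-a)^2/12 for U[a, b] by simulation.
import random

random.seed(0)
a, b = 2.0, 10.0
samples = [random.uniform(a, b) for _ in range(100_000)]

emp_mean = sum(samples) / len(samples)
emp_var = sum((s - emp_mean) ** 2 for s in samples) / len(samples)

theo_mean = (a + b) / 2           # E[X] = (a + b) / 2 = 6
theo_var = (b - a) ** 2 / 12      # Var(X) = (b - a)^2 / 12 = 16/3
```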
Normal distribution: X ~ N(μ, σ²), with density f(x) = (1/(σ√(2π)))·exp(−(x − μ)²/(2σ²)). E[X] = μ, Var(X) = σ².
Binomial distribution: X is the number of successes in n independent trials; the probability of success is p (of failure, 1 − p). P(X = k) = C(n, k)·p^k·(1 − p)^(n−k); E[X] = np, Var(X) = np(1 − p).
Standardization: if X ~ N(μ, σ²), then Z = (X − μ)/σ ~ N(0, 1); Φ(z) = P(Z ≤ z) is given in a table.
Steps for solution: 1. Look for P(X ≤ x). 2. Transform to P(Z ≤ (x − μ)/σ). 3. Look up the answer in the table.
If X, Y are from normal distributions, aX + bY is also normal.
A simple random sample: a sample of size n from a population of N objects, where
- every object has an equal probability of being selected
- objects are selected independently
The observations X₁, …, X_n follow the same probability distribution and are independent. Two ways to think about it:
- sample with replacement, or
- sample only a small fraction of the whole population.
Sample mean distribution: population of size N with mean μ and standard deviation σ; sample of size n with sample mean X̄. Then E[X̄] = μ and SD(X̄) = σ/√n, and for large n (n > 30), X̄ ≈ N(μ, σ²/n).
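A simulation sketch of the sample-mean distribution: draws come from a skewed (exponential) population, yet the means of samples of size n = 50 have mean μ and standard deviation σ/√n, as stated above. Population and sample sizes are arbitrary illustration choices.

```python
# Simulate the distribution of the sample mean (exponential population,
# mu = sigma = 1), and compare to E[xbar] = mu, SD(xbar) = sigma/sqrt(n).
import random
from math import sqrt

random.seed(1)
n, reps = 50, 20_000

means = []
for _ in range(reps):
    sample = [random.expovariate(1.0) for _ in range(n)]
    means.append(sum(sample) / n)

grand_mean = sum(means) / reps
sd_of_means = sqrt(sum((m - grand_mean) ** 2 for m in means) / reps)
# Theory: grand_mean ~ 1.0, sd_of_means ~ 1/sqrt(50) ~ 0.1414
```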
Sample proportion distribution: p = the proportion of the population that has the characteristic of interest; p̂ = the proportion of the sample that has it. For large n (np ≥ 5 and n(1 − p) ≥ 5), p̂ ≈ N(p, p(1 − p)/n).
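The same idea for the sample proportion, as a simulation sketch with hypothetical values p = 0.3 and n = 200 (so np and n(1 − p) are both well above 5):

```python
# Simulate phat and compare to E[phat] = p, SD(phat) = sqrt(p(1-p)/n).
import random
from math import sqrt

random.seed(2)
p, n, reps = 0.3, 200, 20_000

phats = [sum(1 for _ in range(n) if random.random() < p) / n
         for _ in range(reps)]

mean_phat = sum(phats) / reps
sd_phat = sqrt(sum((ph - mean_phat) ** 2 for ph in phats) / reps)
# Theory: mean_phat ~ 0.3, sd_phat ~ sqrt(0.3*0.7/200) ~ 0.0324
```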
Confidence interval for the sample mean when σ is known: need to make sure that the Xᵢ are normal or n > 30. z-values: z_{α/2} (look for 1 − α/2 in the normal table). The margin of error: ME = z_{α/2}·σ/√n. Confidence interval: (1 − α) confident that μ ∈ (x̄ − ME, x̄ + ME). For a given ME, α, and σ, the required sample size is n = (z_{α/2}·σ/ME)².
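A worked sketch of this interval with made-up numbers (x̄ = 52, σ = 8, n = 64, 95% confidence, so z_{α/2} = 1.96):

```python
# 95% CI for the mean with sigma known, and the sample size needed
# for a target margin of error.  All inputs are hypothetical.
from math import sqrt

xbar, sigma, n = 52.0, 8.0, 64
z = 1.96                      # z_{alpha/2} for 95% confidence

me = z * sigma / sqrt(n)      # margin of error = 1.96
ci = (xbar - me, xbar + me)   # (50.04, 53.96)

# Sample size for a target margin of error ME = 1 (round up in practice):
target_me = 1.0
n_needed = (z * sigma / target_me) ** 2
```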
Confidence interval for the sample mean when σ is unknown: need to make sure that the Xᵢ are normal or n > 30; estimate σ by the sample standard deviation s. If the population is normally distributed, then (X̄ − μ)/(s/√n) has a t distribution with n − 1 degrees of freedom (values in table). Confidence interval: (1 − α) confident that μ ∈ (x̄ − t_{n−1, α/2}·s/√n, x̄ + t_{n−1, α/2}·s/√n).
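The same interval with σ unknown, again with hypothetical numbers (x̄ = 52, s = 8, n = 25); t_{24, 0.025} = 2.064 is read from a t table:

```python
# 95% CI for the mean with sigma unknown (t interval), hypothetical data.
from math import sqrt

xbar, s, n = 52.0, 8.0, 25
t = 2.064                     # t_{alpha/2} with n - 1 = 24 df, from a table

me = t * s / sqrt(n)          # margin of error = 3.3024
ci = (xbar - me, xbar + me)
```

The only change from the known-σ case is s in place of σ and the t table in place of the normal table.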
Hypothesis testing (mean):
1. Formulate the hypothesis: H₀: μ = μ₀ against H₁: μ > μ₀, μ < μ₀, or μ ≠ μ₀.
2. Calculate x̄ (and s), decide on α.
3. Assume H₀ is true and compute the test statistic z = (x̄ − μ₀)/(σ/√n) (or t = (x̄ − μ₀)/(s/√n) with n − 1 df when σ is unknown).
4. Decide on the hypothesis by
(a) Rejection region: reject H₀ if z > z_α (H₁: μ > μ₀), or z < −z_α (H₁: μ < μ₀), or |z| > z_{α/2} (H₁: μ ≠ μ₀).
(b) p-value: for H₁: μ > μ₀, p-value = P(Z ≥ z); for H₁: μ < μ₀, p-value = P(Z ≤ z); for H₁: μ ≠ μ₀, p-value = 2·P(Z ≥ |z|). Reject H₀ if p-value < α.
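A sketch of steps 3-4 for a one-sided mean test with made-up numbers (H₀: μ = 50 vs H₁: μ > 50, σ = 8 known, x̄ = 52.5, n = 64, α = 0.05); the standard normal CDF is written via math.erf:

```python
# One-sided z test for a mean, hypothetical inputs.
from math import sqrt, erf

mu0, sigma, xbar, n, alpha = 50.0, 8.0, 52.5, 64, 0.05

z = (xbar - mu0) / (sigma / sqrt(n))          # test statistic = 2.5

# (a) Rejection region: reject H0 if z > z_alpha = 1.645
reject_by_region = z > 1.645

# (b) p-value = P(Z >= z), using Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
p_value = 1 - 0.5 * (1 + erf(z / sqrt(2)))
reject_by_pvalue = p_value < alpha
```

Both decision rules agree, as they must: z > z_α exactly when the p-value is below α.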
Hypothesis testing (proportion):
1. Formulate the hypothesis: H₀: p = p₀ against H₁: p > p₀, p < p₀, or p ≠ p₀.
2. Calculate p̂, decide on α.
3. Assume H₀ is true and compute the test statistic z = (p̂ − p₀)/√(p₀(1 − p₀)/n).
4. Decide on the hypothesis by
(a) Rejection region: reject H₀ if z > z_α, z < −z_α, or |z| > z_{α/2}, matching the alternative.
(b) p-value: p-value = P(Z ≥ z), P(Z ≤ z), or 2·P(Z ≥ |z|) for the one- and two-sided alternatives; reject H₀ if p-value < α.
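The proportion test, sketched with hypothetical numbers (H₀: p = 0.5 vs H₁: p ≠ 0.5, p̂ = 0.56, n = 400, α = 0.05):

```python
# Two-sided z test for a proportion, hypothetical inputs.
from math import sqrt, erf

p0, phat, n, alpha = 0.5, 0.56, 400, 0.05

z = (phat - p0) / sqrt(p0 * (1 - p0) / n)     # test statistic = 2.4

# Two-sided p-value = 2 * P(Z >= |z|)
p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
reject = p_value < alpha
```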
Linear regression:
Assumptions: 1. Y is linearly related to X. 2. The error term ε has mean 0, and its variance, σ², does not depend on the x-values. 3. The error terms are all uncorrelated.
Estimation of model: the true underlying linear relation is y = β₀ + β₁x + ε. The regression provides estimators b₀ and b₁ for the coefficients β₀ and β₁, in addition to their standard errors.
Least squares method: the relation is represented as a line: ŷ = b₀ + b₁x. Choose b₀, b₁ such that Σᵢ (yᵢ − ŷᵢ)² is minimized.
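A sketch of the least-squares fit using the closed-form solution of that minimization, b₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² and b₀ = ȳ − b₁x̄, on a small hypothetical data set:

```python
# Simple least-squares regression by the closed-form formulas.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n

# Slope: sum of cross-deviations over sum of squared x-deviations.
b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
      / sum((xi - xbar) ** 2 for xi in x))
b0 = ybar - b1 * xbar

y_hat = [b0 + b1 * xi for xi in x]   # fitted values on the line
```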
The percentage of explained variation: R² = 1 − SSE/SST. In simple regression, R is the correlation coefficient between y and x. Adjusted R-squared: for multiple regression we adjust R² by the number of independent variables k: adjusted R² = 1 − (1 − R²)·(n − 1)/(n − k − 1).
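A sketch computing R² = 1 − SSE/SST for a fitted line; the data and the coefficients (b₀ = 0.05, b₁ = 1.99, as a least-squares fit would give here) are hypothetical:

```python
# R^2 for a fitted regression line, hypothetical data and coefficients.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = 0.05, 1.99

ybar = sum(y) / len(y)
y_hat = [b0 + b1 * xi for xi in x]

sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # unexplained variation
sst = sum((yi - ybar) ** 2 for yi in y)                  # total variation
r_squared = 1 - sse / sst
```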
Prediction of y given x: ŷ = b₀ + b₁x₁ + … + b_k·x_k.
Test statistic for a single coefficient β_j: t = b_j / s_{b_j}, with n − k − 1 degrees of freedom (k independent variables).
Standard error of single Y-values given X (simple regression): s_F = s·√(1 + 1/n + (x − x̄)²/Σ(xᵢ − x̄)²).
Testing the linear relationship in multiple regression (F-test): H₀: β₁ = … = β_k = 0 against H₁: β_j ≠ 0 for at least one j. Check the p-value given in the regression output and compare to the desired α; reject H₀ if p-value < α.
Standard error of the mean Y given X (simple regression): s_ŷ = s·√(1/n + (x − x̄)²/Σ(xᵢ − x̄)²). Confidence interval for the mean Y given X: ŷ ± t_{n−k−1, α/2}·s_ŷ (k independent variables).
Check the p-value for each coefficient b_j and compare to the desired α; reject H₀: β_j = 0 if p-value < α.
Non-linear relationships:
1. Logarithmic transformation of the dependent variable: use ln(y) when y grows or depreciates at a certain rate in all periods.
2. Transformation of the independent variables: replace x by x², √x, ln(x), or any other transformation that makes sense. Try this when the relation is not linear and 1 does not make sense.
Categorical variables: if a qualitative variable can have only one of two values, introduce an independent (dummy) variable that is 0 for the one value and 1 for the other. For k categories we need k − 1 dummy variables (the base category is when all dummies are 0).
Trends: if the dependent variable increases or decreases in time, use a time counter or de-trend the data first.
Seasonal behavior: if there are seasonal patterns, use dummies or seasonally adjusted data.
Time lags: if an independent variable influences the dependent variable over a number of time periods, include past values in the model.
Multicollinearity: a high correlation between two or more independent variables. Effects:
- The standard errors of the b_j's are inflated.
- The magnitude (or even the signs) of the b_j's may be different from what we expect.
- Adding or removing variables produces large changes in the coefficient estimates or their signs.
- The p-value of the F-test is small, but none of the t-values is significant.
Decision theory: how to build a decision tree:
1. List all decisions and uncertainties.
2. Arrange them in a tree using decision and chance nodes, ordered in time.
3. Label the tree with probabilities (for chance nodes) and payoffs (at least at the end of each branch).
4. Solve the tree / fold back: take the maximum of payoffs (minimum of costs) at decision nodes and the expected value at chance nodes to get the EMV (expected monetary value).
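The fold-back step can be sketched on a tiny hypothetical tree: one decision (launch a product or not); if launched, demand is high with p = 0.4 (payoff 100) or low with p = 0.6 (payoff −30); not launching pays 0. All numbers are invented for illustration.

```python
# Folding back a minimal decision tree to its EMV.

def chance(branches):
    """EMV of a chance node: probability-weighted average of payoffs."""
    return sum(p * v for p, v in branches)

def decision(options):
    """A decision node takes the maximum EMV over its options."""
    best_name = max(options, key=options.get)
    return options[best_name], best_name

emv_launch = chance([(0.4, 100.0), (0.6, -30.0)])   # 0.4*100 + 0.6*(-30) = 22
emv, best = decision({"launch": emv_launch, "do nothing": 0.0})
# EMV of the tree is 22, achieved by launching.
```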
Regression examples:
* Obtain a 90% CI for the expected change in TASTE if the concentration of LACTIC ACID is increased by 0.01:
* What is the expected increase in peak load for a 3-degree increase in high temperature?
* Prediction: use the output to estimate a 90% lower bound for the predicted BAC of an individual who had 4 beers 30 minutes ago. First calculate the fitted value, then estimate s_Forecast (see formula). s_Forecast = 0.01844, so a lower bound =
* Find a 90% confidence interval for the expected weight of a car full of ore: