Data Analysis Hypothesis testing Type I Errors, Type II Errors and Statistical Power
Type I error (): the probability of rejecting
the null hypothesis when it is actually true.
Type II error (): the probability of failing to
reject the null hypothesis given that the alternative hypothesis is actually true.
Statistical power (1 - ): the probability of
correctly rejecting the null hypothesis. Choosing the Appropriate Statistical Technique Testing Hypotheses on a Single Mean
One sample t-test: statistical technique that
is used to test the hypothesis that the mean of the population from which a sample is drawn is equal to a comparison standard. Testing Hypotheses about Two Related Means
Paired samples t-test: examines differences in
same group before and after a treatment.
The Wilcoxon signed-rank test: a non-
parametric test for examining significant differences between two related samples or repeated measurements on a single sample. Used as an alternative for a paired samples t- test when the population cannot be assumed to be normally distributed.
McNemar's test: non-parametric method used
on nominal data. It assesses the significance of the difference between two dependent samples when the variable of interest is dichotomous. It is used primarily in before-after studies to test for an experimental effect. Testing Hypotheses about Two Unrelated Means
Independent samples t-test: is done to see if
there are any significant differences in the means for two groups in the variable of interest. Testing Hypotheses about Several Means
ANalysis Of VAriance (ANOVA) helps to
examine the signicant mean differences among more than two groups on an interval or ratio-scaled dependent variable. Regression Analysis
Simple regression analysis is used in a situation where
one metric independent variable is hypothesized to affect one metric dependent variable.
Multiple regression analysis is used in a situation where
we use more than one independent variable to explain variance in the dependent variable.
Standardized regression coefficients (beta coefficients) are
the estimates resulting from a multiple regression analysis performed on variables that have been starndardized.
A dummy variable is a variable that has two or more
distinct levels. Dummy variables allow us to use nominal/ordinal variables as independent variables to explain, understand, predict the dependent variable.
Multicollinearity is an encountered statistical phenomenon
In which two or more independent variables in a multiple regression model are highly correlated. Simple Linear Regression Other Multivariate Tests and Analysis 1. Discriminant Analysis helps to identify the independent variables that discriminate a nominally scaled dependent variable of interest.
2. Logistic Regression used when the dependent variable is
nonmetric.
3. Conjoint Analysis is a statistical technique that is used in
many fields including marketing, product management, and operations research.
4. Two-way ANOVA used when we want to examine the effect
of two nonmetric independent variables on a single metric dependent variable.
5. MANOVA tests mean differences among groups across
several dependent variables simultaneously by using sums of squares and ceoss-product matrices.
6. Canonical Correlation used when we want to examine the
relationship between two/more dependent variables and several independent variables. Data Warehousing, Data Mining, and Operations Research
Data Warehouse it serves all data that collected
from disparate sources including those pertaining to the companys finance, manufacturing, sales, etc.
Data Mining is a strategic tool of the company for
reaching new levels of business intelligence. Data mining helps to clarify the underlying patterns in different business activities which in turn facilitates decision making.
Operations Research/Management Science
another tool that used by the manager to identify, analyze, solve intricate problems of great complexity. It provides an additional tool to the manager by using quantification to supplement personal judgement. Some Software for Data Analysis