Elementary Statistics
Basic Terminology
Population: a well-defined collection of observations. The mean and standard deviation of a population are represented by μ and σ respectively. E.g. the ages of all employees of company XYZ.
Sample: a subset of a population. The mean and standard deviation of a sample are represented by x̄ and s respectively. E.g. a random sample of size 100 from the above data of ages of all employees of company XYZ.
Parameter: a characteristic of a population. E.g. μ = mean of ages for the above-mentioned population.
Descriptive statistics: figures, tables, or graphs calculated to summarize or characterize a set of data. Each figure summarizes a set of numbers in a unique way.
Tables such as frequency counts and stem-and-leaf plots.
Graphs such as histograms and box-and-whisker plots.
Measures of variability such as standard deviation and variance.
Measures of central tendency such as mean, median, and mode.
Types of Data
There are two types of quantitative data:
Continuous Data: continuous data can take any value between two given values. Continuous data usually has a unit attached. E.g. time, length, weight.
Discrete Data: discrete data can take only a few defined values. Other terms used interchangeably with discrete data are binary and cardinal. E.g. Yes/No, 0/1, High/Medium/Low.
Measures of Central Tendency
Mean: commonly known as the arithmetic average, or simply the average. E.g. the average of {1, 2, 5, 3, 4} is (1+2+5+3+4)/5, which is 3.
Median: the central point of the data when it is arranged in increasing or decreasing order. E.g. if the data {1, 2, 5, 3, 4} is arranged in increasing order as 1, 2, 3, 4, 5, the median is the middle value, 3.
Mode: the data point with the highest frequency, i.e. the one that appears the most. E.g. in the data {1, 2, 5, 2, 3, 4, 2, 2}, 2 appears the most (4 times), hence 2 is the mode.
Other, less frequently used measures of central tendency are the geometric mean and the harmonic mean.
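The three measures can be checked on the same example data with a short sketch. It assumes Python's standard `statistics` module is an acceptable stand-in for a hand calculation; the variable names are illustrative only.

```python
import statistics

# Example data taken from the definitions of mean and median
data = [1, 2, 5, 3, 4]
mean_value = statistics.mean(data)      # (1+2+5+3+4)/5
median_value = statistics.median(data)  # middle value once the data is sorted

# Example data taken from the definition of mode
mode_data = [1, 2, 5, 2, 3, 4, 2, 2]
mode_value = statistics.mode(mode_data)  # most frequent value

print("mean:", mean_value)
print("median:", median_value)
print("mode:", mode_value)
```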
Measures of Dispersion
Range: the difference between the highest and lowest values in the data. E.g. in the data {1, 3, 2, 3, 2, 5, 4, 2} the lowest value is 1 and the highest is 5, so Range = 5-1 = 4.
Standard deviation and Variance: both measure the variability of the data about the mean. Variance is the square of the standard deviation.
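Population standard deviation with N observations: $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$, where $\mu$ is the population mean.
Sample standard deviation with n observations: $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$, where $\bar{x}$ is the sample mean.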
Note: the sample standard deviation divides by n-1 rather than n to compensate for the degree of freedom lost in estimating the mean from the same sample.
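The difference between dividing by N and by n-1 can be seen on the range example data. This is a minimal sketch using Python's standard `statistics` module, where `pstdev` divides by N and `stdev` divides by n-1; treating the same eight values first as a whole population and then as a sample is an assumption made purely for illustration.

```python
import statistics

# Example data from the Range definition
data = [1, 3, 2, 3, 2, 5, 4, 2]

data_range = max(data) - min(data)           # highest minus lowest
population_sd = statistics.pstdev(data)      # divides by N
sample_sd = statistics.stdev(data)           # divides by n - 1
sample_variance = statistics.variance(data)  # square of the sample standard deviation

print("range:", data_range)
print("population standard deviation:", round(population_sd, 3))
print("sample standard deviation:", round(sample_sd, 3))
print("sample variance:", round(sample_variance, 3))
```

The sample standard deviation comes out slightly larger than the population one because its divisor is smaller.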
Measures of Position
Quartiles: every data set has three quartiles. If the data is sorted in ascending order, the first quartile (Q1) is at the 25th percentile, the second quartile (Q2) is at the 50th percentile, and the third quartile (Q3) is at the 75th percentile.
[Figure: data in ascending order, with Q1 marked at 25%, Q2 (median) at 50%, and Q3 at 75%]
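As a rough illustration, the three quartiles can be computed with `statistics.quantiles` from Python's standard library. The `ages` list is hypothetical, and different tools interpolate quartiles slightly differently, so Q1 and Q3 may not match another package's output exactly.

```python
import statistics

# Hypothetical ages, used only to illustrate the three cut points
ages = [23, 25, 28, 31, 35, 38, 41, 47, 52]

# n=4 splits the sorted data into four equal parts, returning three cut points
q1, q2, q3 = statistics.quantiles(sorted(ages), n=4)

print("Q1 (25th percentile):", q1)
print("Q2 (50th percentile, median):", q2)
print("Q3 (75th percentile):", q3)
```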
Distribution
In statistics we study the outcomes of random experiments (see note 1). A random variable is the value assigned to each outcome of an experiment. The chance associated with each outcome is called the probability of that outcome. The probabilities of all outcomes of an experiment always sum to one. Plotting the probability of each outcome of a given experiment on a graph gives the probability distribution. Distributions can be of two types:
Discrete (outcomes can take a limited number of values)
Continuous
Note 1: Rolling a die is a random experiment. The outcome of this experiment can take any value from {1,2,3,4,5,6}. With the total number of outcomes being 6, the chance of each one of them happening is 1/6.
Experiment to Distribution
Distributions are the result of random events. Examine the probabilities. Probability that an event will occur = (Number of favorable outcomes) / (Total number of outcomes). Plotting the outcomes against their associated probabilities gives a distribution.
Plotting Distribution
An Experiment: study the distribution of outcomes from a throw of two dice.
Outcome: the outcome is defined as the sum of the numbers that appear on the two dice.
Total outcomes from 1st die = 6
Total outcomes from 2nd die = 6
Total possible combinations from the experiment = 6*6 = 36
The outcomes would then vary from 2 to 12 (as 1 is the least and 6 the most one can get on a die).
Let's look further into the possibilities. E.g. how could we get a sum of 3, and what is the associated probability? The ways in which 3 can happen are (1,2) and (2,1), i.e. two ways. Since there are 36 combinations in total from the throw, Probability (x=3) = 2/36 = 0.056.
[Figure: bar chart of Probability against Outcome for the sum of two dice]
Outcome      2      3      4      5      6      7      8      9      10     11     12
Probability  0.028  0.056  0.083  0.111  0.139  0.167  0.139  0.111  0.083  0.056  0.028
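The two-dice distribution can be reproduced by counting combinations. The sketch below assumes two fair six-sided dice, enumerates all 36 combinations, and divides the number of favorable combinations for each sum by the total; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)  # faces of one die

# Count how many of the 36 combinations produce each sum
counts = Counter(a + b for a, b in product(faces, repeat=2))
total = sum(counts.values())

# Probability = favorable combinations / total combinations
distribution = {s: Fraction(c, total) for s, c in sorted(counts.items())}

for outcome, probability in distribution.items():
    print(outcome, f"{float(probability):.3f}")
```

Keeping the probabilities as exact fractions makes it easy to confirm that they sum to one.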
Normal Distribution
The normal distribution is a continuous distribution (within the given range, the random variable can take any value). The two reasons why the normal distribution is so important are:
It has properties that make it applicable to many situations in which inferences about a population must be drawn from a sample.
It closely fits the actual observed frequencies of many phenomena. E.g. the heights or weights of a sample of people from a country.
Characteristics of the Normal Distribution:
The distribution is uni-modal.
The mean lies at the center of the distribution, and the median and mode coincide with the mean.
The two tails of the distribution extend indefinitely and never touch the horizontal axis.
Point of Inflection: the point on the curve that lies 1 standard deviation away from the mean (there is one on each side of the mean).
Example of two normal distributions with equal mean and different standard deviation.
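A small sketch of the same comparison, using `NormalDist` from Python's standard `statistics` module. The means and standard deviations chosen here (0 with 1, and 0 with 2) are illustrative assumptions, not the values behind the original figure.

```python
from statistics import NormalDist

# Two normal distributions with equal mean and different standard deviation
narrow = NormalDist(mu=0, sigma=1)
wide = NormalDist(mu=0, sigma=2)

# Mean, median and mode coincide for a normal distribution
print(narrow.mean, narrow.median, narrow.mode)

# The curve is symmetric about the mean
print(narrow.pdf(-1), narrow.pdf(1))

# A smaller standard deviation gives a taller, narrower peak at the mean
print(narrow.pdf(0), wide.pdf(0))

# The tails approach the axis but the density never reaches zero
print(narrow.pdf(6))
```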
Minitab
Six Sigma uses many statistical tools for data exploration and analysis. Minitab is statistical software that provides statistical capabilities without requiring the user to work through the manual mathematical calculations. Minitab helps Six Sigma professionals with:
Data manipulation tools
Descriptive and graphical analysis of data
Hypothesis testing and design of experiments
Minitab: How does it look?
The Minitab window has the following sections:
Menu
Toolbar
Session window, which displays the output of statistical tests and calculations
Worksheet, where the data to be analyzed is pasted
Status bar
Minitab: Data Types
Minitab recognizes three types of data; the type is displayed alongside the column number:
Numeric (numbers)
Text
Date/Time
Minitab: Menu Items
The Minitab menus can be broadly described as follows:
Data menu: tools for worksheet manipulation and data manipulation. Includes important utilities such as stack/unstack, sort, and rank. Also includes utilities for coding data and changing data types.
Calc menu: provides a calculator for deriving new data from existing columns. Also has utilities to create patterned data and to generate data based on selected distributions.
Stat menu: holds all the statistical analysis tools, from descriptive statistics, control charts, and hypothesis tests to the other statistical utilities for analyzing data.
Graph menu: holds all the graphs.
The other menus (File, Edit, Window, Help, etc.) are similar to those of any other software.
Minitab: Stack / Unstack Data
Stack/Unstack data: the following example shows what stacking of data means; unstacking is simply the opposite of stacking.
Stacking data in a worksheet means putting columns of data one below another in the same column.
[Figure: worksheet columns stacked into a single column, with subscripts 1, 2 and 3 marking the rows]
Unstack Columns input window: select Data > Unstack Columns.
Select the column that needs to be unstacked.
Include the column that holds the subscripts by which the main column will be unstacked.
Unstack Columns output: the output is displayed in the worksheet.
Stacked data, with the subscripts in column C2
Unstacked data in three different columns
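The idea behind stacking and unstacking can be sketched outside Minitab. The following is a plain-Python illustration of the concept, not Minitab's implementation; the helper names `stack` and `unstack`, the column names, and the data values are all hypothetical.

```python
def stack(columns):
    """Put columns one below another and record a subscript per row."""
    values, subscripts = [], []
    for subscript, column in enumerate(columns.values(), start=1):
        values.extend(column)
        subscripts.extend([subscript] * len(column))
    return values, subscripts


def unstack(values, subscripts):
    """Split a stacked column back into one column per subscript."""
    groups = {}
    for value, subscript in zip(values, subscripts):
        groups.setdefault(subscript, []).append(value)
    return groups


# Hypothetical worksheet with three columns of three rows each
worksheet = {"C1": [10, 11, 12], "C2": [20, 21, 22], "C3": [30, 31, 32]}

stacked_values, stacked_subscripts = stack(worksheet)
print(stacked_values)
print(stacked_subscripts)

unstacked = unstack(stacked_values, stacked_subscripts)
print(unstacked)
```

The subscript column is what makes the operation reversible: without it, the stacked column would not record which rows came from which source column.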
Minitab: Coding of Data
Coding of data: sometimes the data needs to be coded. For example, all the 1s and 2s should be changed to A and B respectively. The utility can be found under the Data menu.
Code Input Screen:
Select column which needs to be coded
Code output: The column C1 is coded and the output is displayed on worksheet in column C2.
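Conceptually, coding is a lookup from old values to new values. A minimal sketch in plain Python follows; the contents of `c1` are hypothetical, and leaving unmatched values unchanged is an assumption of this sketch rather than a statement about Minitab's behavior.

```python
# Mapping from the original values to their codes
codes = {1: "A", 2: "B"}

# Hypothetical contents of column C1
c1 = [1, 2, 2, 1, 2]

# Coded output, stored separately as column C2;
# values without a code are left unchanged in this sketch
c2 = [codes.get(value, value) for value in c1]

print(c2)
```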
Minitab: Change Data Type
Change Data Type: sometimes the data type of a column needs to be changed, for example from numeric to text. The utility can be found under the Data menu.
Change Data Type Input: the example used is Numeric to Text
Select column whose data needs to be changed from numeric data type to text data type
Change Data Type Output: Column C1 with numeric data is changed to text data and output is displayed in C2
Minitab: Calculator
Using Calculator: this utility provides arithmetic and logical operations and mathematical functions. Expressions may include columns, numbers, and text. Example: C1 + C2 means that each new row entry is the sum of the corresponding row entries from C1 and C2. The utility can be found under the Calc menu.
Calculator Input Screen:
Keypad with basic arithmetic and logical operators
Select the column that will store the output
Expression (calculation). In this example each new value is generated by subtracting the mean and dividing by the standard deviation.
Calculator output for the example: each corresponding value is calculated from the expression. E.g. the first entry, 0.92, equals (25 - mean(age)) / standard deviation(age)
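The same expression can be sketched in Python. The `age` values below are hypothetical (the original worksheet data is not reproduced here), so the numbers will not match the 0.92 in the screenshot, and the use of the sample standard deviation is an assumption.

```python
import statistics

# Hypothetical age column; not the data behind the original screenshot
age = [25, 18, 22, 30, 16, 21, 19, 24]

mean_age = statistics.mean(age)
sd_age = statistics.stdev(age)  # sample standard deviation (assumed)

# Row-by-row expression: (age - mean(age)) / standard deviation(age)
standardized = [(value - mean_age) / sd_age for value in age]

for original, z in zip(age, standardized):
    print(original, round(z, 2))
```

After this transformation the new column has mean 0 and standard deviation 1, whatever the original units were.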
Minitab: Make Patterned Data
Making Patterned Data: this utility provides an easy way to fill a column with numbers that follow a pattern. E.g. suppose a new sequence is needed in which each of the numbers 1, 2 is repeated twice and the whole sequence is repeated twice. The new sequence would be 1,1,2,2,1,1,2,2. The utility can be found under the Calc menu.
Make Patterned Data Input Screen: Example for text patterned data
Select column where output would be displayed
Make Patterned Data output for the example: Each of the names is repeated twice and the sequence is repeated twice as requested in input screen
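The pattern logic is simple enough to sketch in a few lines. The helper `make_patterned` and the names used in the text example are hypothetical; the function illustrates the two repetition settings and is not Minitab's own routine.

```python
def make_patterned(values, times_each, times_sequence):
    """Repeat each value times_each times, then repeat the whole sequence."""
    once = [value for value in values for _ in range(times_each)]
    return once * times_sequence


# Numeric example from the text: 1,1,2,2,1,1,2,2
numbers = make_patterned([1, 2], times_each=2, times_sequence=2)
print(numbers)

# Text example with hypothetical names
names = make_patterned(["Ann", "Bob"], times_each=2, times_sequence=2)
print(names)
```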
Minitab: Random Data
Random Data: this utility generates random data based on the selected distribution. It can either sample data from an existing column or generate random data based on 24 different distributions.
Minitab: Sample from Columns
Example: Sample from Column
Number of rows of data to be sampled
Sample from Column output: as per the input, three rows of data are sampled without replacement
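Sampling without replacement means no row can be picked twice. A minimal sketch with Python's standard `random` module follows; the column contents and the seed are hypothetical, and the seed is fixed only so that repeated runs give the same sample.

```python
import random

# Hypothetical column of data to sample from
column = [12, 45, 23, 67, 34, 89, 56]

# Seeded generator so the sketch is repeatable
rng = random.Random(42)

# Draw three rows without replacement: each row can appear at most once
sample = rng.sample(column, k=3)

print(sample)
```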
Minitab: Data from Distribution
Data Based on Distribution: any one of the 24 distributions provided can be used, as required, to generate rows of data. The following is the input screen for the Normal distribution, generating 10 rows of data with mean = 0 and st dev = 1.0
Number of required rows of data
Storage column
Mean
Standard deviation
Data Based on Distribution: based on the input, the output is 10 rows of data drawn from a normal distribution with mean = 0 and st dev = 1, also written as N(0,1)
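An equivalent draw can be sketched with Python's standard `random` module. The seed is an arbitrary assumption made for repeatability, and the values produced will differ from those in the Minitab screenshot.

```python
import random

# Seeded generator so the sketch is repeatable
rng = random.Random(0)

# Ten rows drawn from N(0, 1): mean = 0, standard deviation = 1
rows = [rng.gauss(0.0, 1.0) for _ in range(10)]

for value in rows:
    print(round(value, 3))
```

With only 10 rows, the sample mean and standard deviation will usually be near, but not exactly equal to, 0 and 1.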
Minitab: Using Brush
Brush is a utility for identifying one or more data points on a graph. It can be found under the Editor menu. In the following example, brush is used to identify the data points on the right of the graph, which correspond to the 9th and 10th rows of the worksheet.
A new window opens and displays the selected row numbers
Use the mouse to select data points on the graph
Minitab: Using Help
One can become an expert in using Minitab over a period of time. The Help menu can be consulted for almost everything one needs help with. The Minitab 14 Help menu has:
Search mode for keywords
StatGuide for all statistical references
Complete tutorials on all the utilities