
Elementary Statistics

Elementary Statistics and Minitab Primer


Basic Terminology
Population: a well-defined collection of observations. The mean and standard deviation of a population are represented by μ and σ respectively. E.g. the ages of all employees of company XYZ.

Sample: a subset of a population. The mean and standard deviation of a sample are represented by x̄ and s respectively. E.g. a random sample of size 100 from the ages of all employees of company XYZ.

Parameter: a characteristic of a population. E.g. μ = mean of the ages for the above-mentioned population.

Statistic: a characteristic of a sample. E.g. x̄ = mean of the ages for the above-mentioned sample.


Descriptive statistics: figures, tables, or graphs calculated to summarize or characterize a set of data. Each one summarizes a set of numbers in a different way:
Tables, such as frequency counts and stem-and-leaf plots.
Graphs, such as histograms and box-and-whisker plots.
Measures of variability, such as standard deviation and variance.
Measures of central tendency, such as mean, median, and mode.

Types of Data
There are two types of quantitative data:
Continuous data: can take any value between two given values. Continuous data usually has a unit attached. E.g. time, length, weight.

Discrete data: can take only a few defined values. Other terms used interchangeably with discrete data are binary and cardinal. E.g. Yes/No, 0/1, High/Medium/Low.

Measures of Central Tendency
Mean: commonly known as the arithmetic average, or simply the average. E.g. the average of {1, 2, 5, 3, 4} is (1+2+5+3+4)/5 = 3.
Median: the central point of the data when it is arranged in increasing or decreasing order. E.g. arranging the data {1, 2, 5, 3, 4} in increasing order gives 1, 2, 3, 4, 5, so the median is 3.

Mode: the data point with the highest frequency, i.e. the one that appears most often. E.g. in the data {1, 2, 5, 2, 3, 4, 2, 2}, 2 appears most often (4 times), hence 2 is the mode.
Other, less frequently used measures of central tendency are the geometric mean and the harmonic mean.
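The three measures can be checked with a short sketch. It uses Python's standard `statistics` module (not Minitab, which the slides cover later) and the same small data sets as the examples above; the variable names are illustrative only.

```python
import statistics

# Data sets taken from the examples on this slide
data = [1, 2, 5, 3, 4]
mode_data = [1, 2, 5, 2, 3, 4, 2, 2]

mean = statistics.mean(data)        # (1+2+5+3+4)/5
median = statistics.median(data)    # middle value of the sorted data
mode = statistics.mode(mode_data)   # most frequent value

# Less frequently used measures of central tendency
geometric = statistics.geometric_mean(data)
harmonic = statistics.harmonic_mean(data)

print("mean:", mean, "median:", median, "mode:", mode)
print("geometric mean:", geometric, "harmonic mean:", harmonic)
```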
Measures of Dispersion
Range: the difference between the highest and lowest values in the data. E.g. in the data {1, 3, 2, 3, 2, 5, 4, 2} the lowest value is 1 and the highest is 5, so Range = 5 - 1 = 4.
Standard deviation and variance: both measure the variability of the data about the mean. Variance is the square of the standard deviation.
Population standard deviation with N observations:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

Sample standard deviation with n observations:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

Note: the sample standard deviation divides by n-1 rather than n, to compensate for the degree of freedom lost by estimating the mean from the sample itself.
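As a rough illustration of the difference between the two divisors, the sketch below computes the range and both versions of the variance and standard deviation for the range example data. It uses Python's standard `statistics` module; treating the same eight values first as a whole population and then as a sample is an assumption made purely for comparison.

```python
import statistics

data = [1, 3, 2, 3, 2, 5, 4, 2]  # data from the range example

data_range = max(data) - min(data)

# Population formulas divide by N
pop_var = statistics.pvariance(data)
pop_sd = statistics.pstdev(data)

# Sample formulas divide by n - 1
sample_var = statistics.variance(data)
sample_sd = statistics.stdev(data)

print("range:", data_range)
print("population variance / sd:", pop_var, pop_sd)
print("sample variance / sd:", sample_var, sample_sd)
```

Because n-1 is smaller than n, the sample standard deviation is always slightly larger than the population formula applied to the same numbers.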

Measures of Position
Quartiles: Every data set has three quartiles. When the data is sorted in ascending order, the first quartile (Q1) is at the 25th percentile, the second quartile (Q2) is at the 50th percentile, and the third quartile (Q3) is at the 75th percentile.
[Figure: data in ascending order, with Q1 at 25%, Q2 (median) at 50%, and Q3 at 75%]
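A minimal sketch of the quartile positions, using `statistics.quantiles` from Python's standard library on a hypothetical data set. Statistical packages use slightly different interpolation rules for Q1 and Q3, so the exact values here may differ from what other software reports; Q2 always equals the median.

```python
import statistics

# Hypothetical data, not from the slides
data = [7, 1, 9, 3, 5, 2, 8, 4, 6]

# n=4 splits the sorted data into four equal parts,
# returning the three cut points Q1, Q2, Q3
q1, q2, q3 = statistics.quantiles(sorted(data), n=4)

print("Q1:", q1, "Q2 (median):", q2, "Q3:", q3)
```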

Distribution
In statistics we study the outcomes of random experiments. A random variable assigns a value to each outcome of the experiment. The chance associated with each outcome is called the probability of that outcome. The sum of the probabilities of all outcomes of an experiment is always one. Plotting the probability of each outcome of a given experiment on a graph gives the probability distribution.
E.g. rolling a die is a random experiment. The outcome of this experiment can take any value from {1,2,3,4,5,6}. With 6 possible outcomes in total, the chance associated with each one of them is 1/6.
A distribution can be of two types:
Discrete (outcomes can take a limited number of values)
Continuous

Experiment to Distribution
Distributions are the result of random events, so we examine probabilities. The probability that an event will occur is:
Probability = (Number of favorable outcomes) / (Total number of outcomes)
Plotting the outcomes against their associated probabilities gives the distribution.

Plotting Distribution
An experiment: study the distribution of outcomes from a throw of two dice.
Outcome: the sum of the numbers that appear on the two dice.
Total outcomes from 1st die = 6
Total outcomes from 2nd die = 6
Total possible combinations from the experiment = 6*6 = 36
The outcomes then vary from 2 to 12 (as 1 is the least and 6 the most one can get on a die).

Let's look further into the possibilities. E.g. how could we get a sum of 3, and what is the associated probability?
A sum of 3 can occur in two ways: {1,2} and {2,1}.
Since the throw has 36 combinations in total, Probability (x=3) = 2/36 = 0.056

Similarly, one can calculate the probability associated with each of the other outcomes from 2 to 12.

[Figure: bar chart of probability against outcome (sum of the two dice)]

Outcome   Probability
2         0.028
3         0.056
4         0.083
5         0.111
6         0.139
7         0.167
8         0.139
9         0.111
10        0.083
11        0.056
12        0.028

Sum of all the probabilities, i.e. P(2)+P(3)+...+P(12) = 1
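The two-dice distribution can be reproduced by listing all 36 combinations and counting how often each sum occurs. The sketch below is one way to do this in Python with exact fractions; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)  # a die shows 1..6

# Count how many of the 36 (first die, second die) combinations give each sum
counts = Counter(a + b for a, b in product(faces, repeat=2))
total = sum(counts.values())

# Probability = favorable outcomes / total outcomes
distribution = {s: Fraction(c, total) for s, c in sorted(counts.items())}

for outcome, p in distribution.items():
    print(f"{outcome:2d}  {str(p):>5}  {float(p):.3f}")

print("sum of probabilities:", sum(distribution.values()))
```

Using `Fraction` keeps the probabilities exact, so their sum is exactly 1 rather than a rounded value.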


Normal Distribution
The normal distribution is a continuous distribution (within its range, the random variable can take any value). There are two reasons why the normal distribution is so important:
It has properties that make it applicable to many situations in which inferences about a population must be made from a sample.
It closely fits the actual observed frequencies of many phenomena. E.g. the heights or weights of a sample of people from a country.

Characteristics of the normal distribution:
The distribution is uni-modal.
The mean lies at the center of the distribution, and the median and mode coincide with the mean.
The two tails of the distribution extend indefinitely and never touch the horizontal axis.

Point of inflection: the point on the curve that is 1 standard deviation (1σ) away from the mean.

Example of two normal distributions with equal mean and different standard deviation.
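A small numerical sketch of both ideas, using `statistics.NormalDist` from Python's standard library. The means and standard deviations are assumed values chosen for illustration, and `second_difference` is a hypothetical helper that approximates the curvature of the density: negative near the mean, zero at the point of inflection, positive in the tail.

```python
from statistics import NormalDist

# Two normal distributions with equal mean and different standard deviation
# (parameter values are assumptions chosen for illustration)
narrow = NormalDist(mu=0, sigma=1)
wide = NormalDist(mu=0, sigma=2)

# The smaller standard deviation gives a taller, narrower curve
peak_narrow = narrow.pdf(narrow.mean)
peak_wide = wide.pdf(wide.mean)

def second_difference(dist, x, h=1e-3):
    """Approximate the second derivative (curvature) of the density at x."""
    return (dist.pdf(x + h) - 2 * dist.pdf(x) + dist.pdf(x - h)) / h**2

curv_at_mean = second_difference(narrow, 0.0)        # curve bends downward
curv_at_inflection = second_difference(narrow, 1.0)  # 1 sd from the mean
curv_in_tail = second_difference(narrow, 2.0)        # curve bends upward

print("peaks:", peak_narrow, peak_wide)
print("curvature:", curv_at_mean, curv_at_inflection, curv_in_tail)
```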
Minitab
Six Sigma uses many statistical tools for data exploration and analysis. Minitab is statistical software that provides these statistical capabilities without requiring the user to work through the manual mathematical calculations. Minitab helps Six Sigma professionals with:
Data manipulation tools
Descriptive and graphical analysis of data
Hypothesis testing and design of experiments

MINITAB is a registered trademark of Minitab Inc.

Minitab: How does it look?
The Minitab window has the following sections:
Menu
Toolbar
Session window: displays the output of statistical tests and calculations
Worksheet: for pasting the data to be analyzed
Windows bar: displays all the minimized windows in the project
Status bar

Minitab: Data Types
Minitab recognizes three types of data; the type is indicated in the column number:
Numeric (numbers)
Text, like Red or India
Date/Time

Minitab: Menu Items
The Minitab menus can be broadly described as follows:
Data menu: tools for worksheet and data manipulation. Includes important utilities such as stack/un-stack, sort, and rank data, as well as utilities for coding and changing data types.
Calc menu: provides a calculator for deriving new data from existing columns. Also has utilities to create patterned data and to generate data based on selected distributions.
Stat menu: contains all the statistical analysis tools, from descriptive statistics, control charts, and hypothesis tests to the other statistical utilities for analyzing data.
Graph menu: contains all the graphs.
Other menus (File, Edit, Window, Help, etc.) are similar to those in other software.

Minitab: Stack / Unstack Data
Stack/Un-stack data: The following example shows what stacking data means; unstacking is simply the opposite.
Stacking data in a worksheet means putting columns of data one below another in the same column.
[Figure: three columns containing 1s, 2s, and 3s stacked into a single column]

In Minitab the utility can be found under the Data menu.

Unstack Column input window: Select Data > Unstack Columns
Select the column that needs to be unstacked

Select the column containing the subscripts by which the main column will be un-stacked

Unstack Column output: The output is displayed in the worksheet
Stacked data and subscripts in column C2
Un-stacked data in three different columns

The stack utility is used in a similar way
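The idea behind stacking and unstacking can be sketched outside Minitab. The Python functions below are hypothetical helpers (not Minitab commands): `stack` places columns one below another and records a subscript for each value, and `unstack` uses those subscripts to split the values back into separate columns.

```python
def stack(columns):
    """Put columns one below another; return (values, subscripts)."""
    values, subscripts = [], []
    for name, column in columns.items():
        values.extend(column)
        subscripts.extend([name] * len(column))  # remember the source column
    return values, subscripts

def unstack(values, subscripts):
    """Split a stacked column back into one column per subscript."""
    columns = {}
    for value, subscript in zip(values, subscripts):
        columns.setdefault(subscript, []).append(value)
    return columns

# Hypothetical worksheet with three columns
wide = {"C1": [1, 1, 1], "C2": [2, 2, 2], "C3": [3, 3, 3]}

values, subscripts = stack(wide)
restored = unstack(values, subscripts)

print(values)      # all nine values in one column
print(subscripts)  # which column each value came from
print(restored)    # the original three columns again
```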


Minitab: Coding of Data
Coding of data: Sometimes the data needs to be coded. For example, all the 1s and 2s should be changed to A and B respectively. The utility can be found under the Data menu.

Code Input Screen:
Select column which needs to be coded

Column where output is required

Key to code the data. E.g. 1 should be coded as A

Code output: The column C1 is coded and the output is displayed on worksheet in column C2.
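Coding amounts to a lookup from original values to new values. A minimal Python sketch of the same idea follows; the column contents are hypothetical, and leaving values without a code unchanged is an assumption of this sketch rather than a statement about Minitab's behavior.

```python
# Key used to code the data: 1 becomes A, 2 becomes B
code_key = {1: "A", 2: "B"}

# Hypothetical contents of column C1
c1 = [1, 2, 2, 1, 3]

# Coded output for column C2; values without a code are kept as they are
# (an assumption made for this sketch)
c2 = [code_key.get(value, value) for value in c1]

print(c2)
```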

Minitab: Change Data Type
Change Data Type: Sometimes the data type of a column needs to be changed, for example from numeric to text. The utility can be found under the Data menu.

Change Data Type Input: The example used is Numeric to Text

Select column whose data needs to be changed from numeric data type to text data type

Give column name where new values need to be saved

Change Data Type Output: Column C1 with numeric data is changed to text data and output is displayed in C2

Minitab: Calculator
Using Calculator: This utility provides arithmetic and logical operations and mathematical functions. Expressions may include columns, numbers, and text. Example: C1 + C2 means that each new row entry is the sum of the corresponding row entries from C1 and C2.

The utility can be found under the Calc menu.

Calculator Input Screen:
Keypad with basic arithmetic and logical operators

Select column which will store output

Expression (calculation). In this example each new value is generated by subtracting the mean and dividing by the standard deviation.

Dropdown of type of functions like arithmetic, trigonometry, statistics etc.

List of all the functions

Calculator output for the example: Each corresponding value is calculated based on the expression. E.g. the first entry 0.92 equals (25 - mean(age)) / standard deviation(age)
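The same expression, subtracting the mean and dividing by the standard deviation, can be sketched in Python. The ages below are hypothetical (the worksheet data is not reproduced in the slides), so the results will not match the 0.92 shown in the example; the sketch also assumes the sample standard deviation (n-1 divisor).

```python
import statistics

# Hypothetical age column; the actual worksheet values are not given
ages = [25, 31, 22, 40, 28]

mean_age = statistics.mean(ages)
sd_age = statistics.stdev(ages)  # sample standard deviation (assumed)

# (value - mean) / standard deviation, row by row
standardized = [(age - mean_age) / sd_age for age in ages]

for age, z in zip(ages, standardized):
    print(age, round(z, 2))
```

After this transformation the new column has mean 0 and standard deviation 1, whatever the original units were.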

Minitab: Make Patterned Data
Making Patterned Data: This utility provides an easy way to fill a column with numbers that follow a pattern. E.g. a sequence in which each of the numbers 1, 2 is repeated twice and the whole sequence is repeated twice would look like 1,1,2,2,1,1,2,2

The utility can be found under the Calc menu.

Make Patterned Data Input Screen: Example for text patterned data
Select column where output would be displayed

Text values to be used in making patterned data

Multiplier for value repetition

Multiplier for sequence repetition

Make Patterned Data output for the example: Each of the names is repeated twice and the sequence is repeated twice as requested in input screen
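The two multipliers can be illustrated with a short Python sketch. `make_patterned` is a hypothetical helper, not a Minitab function, and the text values are made up for the example.

```python
def make_patterned(values, value_repeats, sequence_repeats):
    """Repeat each value, then repeat the whole resulting sequence."""
    one_pass = [v for v in values for _ in range(value_repeats)]
    return one_pass * sequence_repeats

# Numeric pattern: 1, 2 each repeated twice, whole sequence repeated twice
numbers = make_patterned([1, 2], value_repeats=2, sequence_repeats=2)

# Text pattern with hypothetical names
names = make_patterned(["Red", "Blue"], value_repeats=2, sequence_repeats=2)

print(numbers)
print(names)
```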

Minitab: Random Data
Random Data: This utility generates random data. It can either sample data from an existing column or generate random data based on 24 different distributions.

Minitab: Sample from Columns
Example: Sample from Column
Number of rows of data to be sampled

Name of the column from which the sample is to be drawn

Storage column for sample

Option of sampling with or without replacement

Sample from Column output: As per the input, three rows of data are sampled without replacement
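The difference between sampling with and without replacement can be sketched with Python's standard `random` module. The column values and the seed are assumptions chosen for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical column to sample from (all values distinct)
column = [12, 15, 11, 19, 14, 17, 13]

# Without replacement: a row can be drawn at most once
without_replacement = rng.sample(column, k=3)

# With replacement: the same row may be drawn more than once
with_replacement = rng.choices(column, k=3)

print(without_replacement)
print(with_replacement)
```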

Minitab: Data from Distribution
Data Based on Distribution: Any one of the 24 distributions provided can be used, as required, to generate rows of data. The following is the input screen for the Normal distribution, generating 10 rows of data with mean=0 and st dev=1.0
Number of required rows of data

Storage column

Mean

Standard deviation

Data Based on Distribution: Based on the input, the output contains 10 rows of data drawn from a normal distribution with mean=0 and st dev=1, also written as N(0,1)
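A comparable draw can be sketched in Python with the standard `random` module. The seed is an arbitrary assumption, and the generated values will not match the Minitab output.

```python
import random
import statistics

rng = random.Random(42)  # arbitrary seed, for repeatability only

# 10 rows of data from N(0, 1): mean = 0, standard deviation = 1
rows = [rng.gauss(0.0, 1.0) for _ in range(10)]

for value in rows:
    print(round(value, 4))

# With only 10 rows, the sample mean and standard deviation
# will be near, but not exactly, 0 and 1
print("sample mean:", statistics.mean(rows))
print("sample st dev:", statistics.stdev(rows))
```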

Minitab: Using Brush
Brush is a utility for identifying data points on a graph. It can be found under the Editor menu. In the following example, brush is used to identify the data points on the right of the graph; they correspond to the 9th and 10th rows of the worksheet.
Use the mouse to select a data point on the graph
A new window opens and displays the selected row numbers
The row is marked with a black dot

Minitab: Using Help
One can become expert in Minitab over time. The Help menu can be consulted for almost anything one needs help with. The Minitab 14 Help menu has:
Search mode for keywords
StatGuide for all statistical references
Complete tutorials on all the utilities
