
Introduction to Statistics

Quantitative Methods for Economics
Dr. Katherine Sauer
Metropolitan State College of Denver
6/26/12

An Introduction to Statistics

Statistics is a discipline concerned with the
- collection, classification and interpretation of quantitative data
- application of probability theory to the analysis and estimation of population parameters

A statistic is a sample characteristic.
- a sample is a subset of the population
- a population is the entire set of objects being studied

A parameter is a population characteristic.

A. Random Samples

When selecting a sample from the population, a random sample should be used.
- minimize any bias

simple random sample = sample in which every member of the population has an equal, non-zero and known probability of being selected

random sample = sample in which every member of the population has a non-zero, known but not necessarily equal probability of being selected


Problems with Random Samples:
1. population list may be unavailable
2. may be time consuming and/or costly
3. non-response may be high
4. may be biased if certain subgroups are numerically small but important to the study


Stratified Random Sampling
- ensures key sub-groups in the population are included in the random sample

First, divide the population into sub-groups (strata).
Calculate how many observations are needed from each stratum to reflect its proportion in the population.
Then, choose a random sample of that many observations from each sub-group.


Example: Suppose a firm's employees are classified in the following way:

Classification of Employees
Category         Male   Female   Total
Management         10       20      30
Professional       50       40      90
Administration     40       60     100
Services           60       20      80
Total             160      140     300

This firm wishes to administer an office climate survey to a random sample of 30 employees. How many people from each stratum should get the survey?


Step 1: Calculate the proportion of the population which is required for the sample.

proportion of the population to be sampled = total sample size / total population size
                                           = 30 / 300
                                           = 0.10

So, 1/10 of each group needs to be selected for the sample.


Step 2: Multiply each group's size by the proportion you calculated.
- round to the nearest whole number

Classification of Employees
Category         Male   Female   Total
Management         10       20      30
Professional       50       40      90
Administration     40       60     100
Services           60       20      80
Total             160      140     300

Number Needed from Each Group
Category         Male   Female
Management          1        2
Professional        5        4
Administration      4        6
Services            6        2
Total              16       14

16 + 14 = 30
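Steps 1 and 2 can also be sketched in a few lines of Python. This is only an illustration of the arithmetic on the slides, not part of the original Excel workflow; the variable names are made up for the example.

```python
# Stratum sizes from the "Classification of Employees" table.
strata = {
    ("Management", "Male"): 10, ("Management", "Female"): 20,
    ("Professional", "Male"): 50, ("Professional", "Female"): 40,
    ("Administration", "Male"): 40, ("Administration", "Female"): 60,
    ("Services", "Male"): 60, ("Services", "Female"): 20,
}
sample_size = 30

# Step 1: proportion of the population to be sampled.
population_size = sum(strata.values())
sampling_fraction = sample_size / population_size

# Step 2: scale each group's size by that proportion and round.
# Caveat: Python's round() sends exact halves to the nearest even number,
# and in general rounded counts need not add up to sample_size.
allocation = {
    group: round(size * sample_size / population_size)
    for group, size in strata.items()
}

for (category, sex), needed in allocation.items():
    print(f"{category:15s} {sex:7s} {needed}")
print("Total:", sum(allocation.values()))
```

With this table every group size is a multiple of 10, so no rounding adjustment is needed and the counts add up to the intended sample of 30.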

Step 3: Select the required number from each group randomly.

Ex: Assign a number between 1 and 10 to each man from management.

Name        Assigned ID
Ben           1
Damian        2
Greg          3
Jeremy        4
Matt          5
Mohammad      6
Nick          7
Simon         8
Teddy         9
Will         10

Use the RANDBETWEEN function in Excel to randomly generate a number between 1 and 10. The number that Excel generates will be the ID number of the person chosen for the sample.


In Excel, when you type =r in any cell, a list of functions that start with the letter r will pop up.


Since you know you want to use the RANDBETWEEN function, finish typing it and end the command with (1,10), so the cell reads =RANDBETWEEN(1,10), to indicate you want a random number between 1 and 10.

Hit enter to see your random number. Here the number generated was 6, so looking back at our table, Mohammad would be chosen for the sample. We only need one member from this group.
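For readers working outside Excel, the same selection step might look like this in Python. It is a sketch under assumptions: the seed value and the helper name `draw` are hypothetical, and `random.randint` is used because, like RANDBETWEEN, it includes both endpoints.

```python
import random

managers = ["Ben", "Damian", "Greg", "Jeremy", "Matt",
            "Mohammad", "Nick", "Simon", "Teddy", "Will"]

# Assign IDs 1..10 in the same order as the table.
ids = {i: name for i, name in enumerate(managers, start=1)}

# Hypothetical fixed seed, used only so the illustration is repeatable.
rng = random.Random(2012)

# Analogue of =RANDBETWEEN(1,10): an integer from 1 to 10 inclusive.
chosen_id = rng.randint(1, 10)
chosen = ids[chosen_id]
print(chosen_id, chosen)

def draw(names, k, rng):
    """Pick k distinct names at random (sampling without replacement)."""
    return rng.sample(names, k)
```

For groups that need more than one member, such as the five men from the Professional category, a call like `draw(names, 5, rng)` avoids choosing the same person twice, which repeated RANDBETWEEN draws do not guarantee.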


Cluster Sampling

Clusters are geographical areas or units like schools, households, etc.

Once the clusters have been defined, the required number of clusters is selected randomly. Then, depending on the nature of the research, all or some of the individuals in each cluster are surveyed. This is One-Stage Clustering.

If each cluster is divided into smaller clusters and a random sample of them is chosen, it is called Two-Stage Clustering.

Multi-Stage Clustering is another option.
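A minimal sketch of selecting clusters at random, assuming made-up data: the school names, student IDs, seed and the number of clusters drawn are all hypothetical and only illustrate the idea.

```python
import random

# Hypothetical clusters: school name -> list of student IDs.
clusters = {
    "School A": ["A1", "A2", "A3", "A4"],
    "School B": ["B1", "B2", "B3"],
    "School C": ["C1", "C2", "C3", "C4", "C5"],
    "School D": ["D1", "D2"],
}

rng = random.Random(0)  # fixed seed only to make the illustration repeatable

# Select the required number of clusters (here 2) at random.
selected = rng.sample(sorted(clusters), 2)

# Option 1: survey all individuals in each selected cluster.
survey_all = [person for name in selected for person in clusters[name]]

# Option 2: survey only some individuals (here 2) in each selected cluster.
survey_some = [
    person
    for name in selected
    for person in rng.sample(clusters[name], 2)
]

print(selected)
print(survey_all)
print(survey_some)
```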


Problems with Stratified and Cluster Sampling:

A stratified sample suffers from the same problems as a simple random sample.

Each cluster should be representative of the population, but in reality this may be difficult to achieve.


B. Non-Random Sampling

A population list may not be available. The researcher may have to use their judgment to determine the selection of the sample (judgment samples).

Stratified Quota Sample
- calculate the number needed from each stratum
- the selection of individuals from each stratum is not random

Other
- self-selected sample
- focus group
- opportunity sample


II. Sorting and Classifying Data

Qualitative or Categorical Data = defined by some characteristic or quality

nominal data = a group characteristic like gender or profession

ordinal data = the result of ranking something in order of preference (e.g. products, TV shows)


Quantitative or Numeric Data = described numerically by counts or measurements

discrete = can only take certain distinct values
e.g. a die can only turn up 1, 2, 3, 4, 5 or 6 when thrown once

continuous = can be any value from a continuous set of values
e.g. temperature
- usually rounded to a specified number of decimal places


Suppose we have the following raw data on MP3 player sales for the month of January.
Number of MP3 Players Sold Daily
30 49 15 55 38 11 29 34 54
31 42 45 25 18 13 25 13 55
38 31 43 22 37 20 36 36 25

Let's construct a Frequency Distribution Table in order to make some sense of the numbers. Typically between 5 and 20 intervals are chosen. Let's choose 10-19, 20-29, 30-39, 40-49 and 50-59 as our intervals.

First, we need to sort our data. In Excel, click on the cell at the top of your data column. Then from the data tab, click sort.


Once the data are sorted, tally the frequency of sales in each interval.
January Sales of MP3 Players
Daily Sales   Frequency
10 to 19          5
20 to 29          6
30 to 39          9
40 to 49          4
50 to 59          3
Total            27
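The tally can be cross-checked with a short Python sketch. This is an alternative to sorting and counting in Excel, shown only as an illustration; the variable names are made up.

```python
# Raw daily sales for January, as listed above.
sales = [30, 49, 15, 55, 38, 11, 29, 34, 54,
         31, 42, 45, 25, 18, 13, 25, 13, 55,
         38, 31, 43, 22, 37, 20, 36, 36, 25]

# The five intervals chosen above (both endpoints included).
intervals = [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59)]

# Count how many daily values fall inside each interval.
frequency = {
    f"{low} to {high}": sum(low <= x <= high for x in sales)
    for low, high in intervals
}

for label, count in frequency.items():
    print(f"{label:9s} {count}")
print("Total    ", sum(frequency.values()))
```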

Now that we have sorted and tallied our data, we can more easily make observations about it. For example, the most common outcome was 30 to 39 players sold, which happened on 9 of the 27 days.


We can also present the frequency data in a graph. Highlight the top left cell. Click the Insert tab. Select Column. Select 2-D.


[Figure: Excel column chart of Frequency, before formatting]

You'll need to change the title of your graph and add appropriate axis labels. Also, turn off the legend.

[Figure: column chart titled "Daily MP3 Player Sales in January"; horizontal axis: Number of MP3 Players Sold in a Day; vertical axis: Frequency]

Now that we have constructed our graph, we can visually make sense of the data.

A histogram is a graphical representation of frequency distributions for numeric data.
- the area of each rectangle is proportional to the frequency of the interval
- intervals may be equal or unequal
- typically no gaps in bars

[Figure: histogram titled "Daily MP3 Player Sales in January"; horizontal axis: Number of MP3 Players Sold in a Day; vertical axis: Frequency]

In an Excel chart, right click on a bar and select format data series. Select no gap.

Example of unequal intervals:

Total International Emigration by Age: 1995 and 2002
Outflow (thousands)
Age            1995    2002
Under 15       32.6    25
15 - 24        69.1    91.9
25 - 44       106.5   186.4
45 - 64        21      46.2
65 and over     7.3     9.9

Why did they choose the ages they did for these intervals?

The age group Under 15 contains all ages that round to a value from 0 up to, but not including, 15. So, it has a lower bound of zero and an upper bound of 14.4999...
- interval of about 14.5 (0 to 14.4999...)

The age group 15 - 24 contains all ages that round from 15 to 24.
- interval of 10 (14.5 to 24.4999...)

The age group 25 - 44 contains all ages that round from 25 to 44.
- interval of 20

The age group 45 - 64 contains all ages that round from 45 to 64.
- interval of 20

The age group 65 and over has no upper bound. We may wish to choose a reasonable one. If we choose 84 (reasonable), the interval will be 20.

There is a way to calculate the height of the histogram bars by hand, but most people simply use the command in their data processing software.
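The slides do not spell out the hand calculation. One common convention, assumed here, is to set each bar's height to frequency divided by interval width (often called frequency density), which makes each bar's area proportional to its frequency. A sketch using the class boundaries from the text, with 84 as the assumed upper limit for the open-ended group:

```python
# Class boundaries follow the rounding convention in the text; the last
# boundary assumes 84 as the upper age for "65 and over".
boundaries = [0.0, 14.5, 24.5, 44.5, 64.5, 84.5]
labels = ["Under 15", "15 - 24", "25 - 44", "45 - 64", "65 and over"]
outflow_1995 = [32.6, 69.1, 106.5, 21.0, 7.3]  # thousands
outflow_2002 = [25.0, 91.9, 186.4, 46.2, 9.9]  # thousands

# Width of each interval: upper boundary minus lower boundary.
widths = [high - low for low, high in zip(boundaries, boundaries[1:])]

def bar_heights(frequencies, widths):
    """Height = frequency / width, so that height * width = frequency."""
    return [f / w for f, w in zip(frequencies, widths)]

heights_1995 = bar_heights(outflow_1995, widths)
heights_2002 = bar_heights(outflow_2002, widths)

for label, w, h95, h02 in zip(labels, widths, heights_1995, heights_2002):
    print(f"{label:12s} width={w:5.1f}  1995={h95:.3f}  2002={h02:.3f}")
```

The heights are in thousands of emigrants per year of age, so groups of different widths can be compared directly.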

A Cumulative Frequency Graph (Ogive) depicts the total number of data that have values less than the upper class boundary of each interval as given in the frequency distribution table.

Recall our frequency distribution table for MP3 players.

January Sales of MP3 Players
Daily Sales   Frequency
10 to 19          5
20 to 29          6
30 to 39          9
40 to 49          4
50 to 59          3
Total            27

Re-working it we get,

Daily Sales   Frequency   Less than   Cumulative frequency
0 to 9            0          10               0
10 to 19          5          20               5
20 to 29          6          30              11
30 to 39          9          40              20
40 to 49          4          50              24
50 to 59          3          60              27
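The cumulative column is a running total of the frequency column. A small Python sketch of that calculation (illustrative only; the variable names are made up):

```python
from itertools import accumulate

# Frequencies for 0-9, 10-19, 20-29, 30-39, 40-49, 50-59.
frequencies = [0, 5, 6, 9, 4, 3]

# "Less than" values: the upper limit used for each interval.
less_than = [10, 20, 30, 40, 50, 60]

# Running total of the frequencies.
cumulative = list(accumulate(frequencies))

# Points to plot for the ogive: (less-than value, cumulative frequency).
ogive_points = list(zip(less_than, cumulative))

for bound, total in ogive_points:
    print(f"less than {bound}: {total}")
```

The final cumulative value equals the total number of observations, which is a quick check that no interval was missed.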


The resulting Ogive graph:


[Figure: ogive titled "Cumulative Frequency of Daily MP3 Player Sales"; horizontal axis: Daily Sales of MP3 Players; vertical axis: Cumulative Frequency]


Skills:
- basic terminology of statistics
- given raw data:
  - perform stratified random sampling
  - construct frequency distribution table
  - construct frequency distribution chart
  - construct ogive graph

