Data
• Quantitative (numerical)
• Qualitative (categorical)
Qualitative Data
• Ideas
• Opinions
• Categorical Evaluation
Examples:
Color Preference
Favored Political Candidate
Quality Evaluation - Defective or non-defective
Quantitative Data
Annual Income
Football Attendance
Interest Rates
Industrials Average
Number of Defective Parts in a Shipment
Number of Late Deliveries Last Month
Percentage of Satisfied Customers
Quantitative data may be Discrete or Continuous.
Cross-sectional and Time series data
• Cross-sectional data: Data collected at
the same or approximately the same
point in time.
Examples?
• Time series data: Data collected over
several time periods.
Examples?
Data Sources
• Primary: Data Collection
– Observation
– Survey
– Experimentation
• Secondary: Data Compilation
– Print or Electronic
Methods of Obtaining Data
Primary Data
• Observation
• Personal Interviews
– Structured
– Unstructured
• Telephone Surveys
• Mail Questionnaires
• Bar Codes
• Scanners
Secondary Data
• Reproduced
Data Collection
• Designing experiments
– Does aspirin help reduce the risk of heart attacks?
• Observational studies
– Polls - Patil’s approval rating
Statistical Methods
• Descriptive statistics
– Collecting and describing data
• Inferential statistics
– Drawing conclusions and/or making
decisions concerning a population based
only on sample data
Descriptive Statistics
• Collect data
– e.g. Survey
• Present data
– e.g. Tables and graphs
• Characterize data
– e.g. Sample mean: $\bar{X} = \frac{\sum X_i}{n}$
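The sample mean can be sketched in a few lines of Python. The values in `sample` are hypothetical, chosen only for illustration; they do not come from the slides.

```python
from statistics import mean

# Hypothetical sample values (not from the slides), for illustration only.
sample = [2, 4, 4, 5, 10]

n = len(sample)
sample_mean = sum(sample) / n  # (sum of X_i) / n

# The standard library's mean() computes the same quantity.
assert sample_mean == mean(sample)
print(sample_mean)
```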
Inferential Statistics
• Estimation
– e.g.: Estimate the population
mean weight using the sample
mean weight
• Hypothesis testing
– e.g.: Test the claim that the
population mean weight is 120
pounds
Drawing conclusions and/or making decisions
concerning a population based on sample results.
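A minimal sketch of both ideas, assuming a hypothetical sample of weights (the numbers are illustrative, not from the slides). The slides do not name a specific test; the one-sample t statistic used here is one common choice for a claim about a population mean.

```python
import math
from statistics import mean, stdev

# Hypothetical sample of weights in pounds (not from the slides).
weights = [118, 125, 121, 119, 123, 117, 122, 124, 120, 121]

n = len(weights)
x_bar = mean(weights)   # estimation: point estimate of the population mean
s = stdev(weights)      # sample standard deviation (n - 1 divisor)

# Hypothesis testing: t statistic for the claim "population mean = 120 pounds".
# Assumption: a one-sample t test is appropriate for these data.
mu_0 = 120
t_stat = (x_bar - mu_0) / (s / math.sqrt(n))

print(f"estimate = {x_bar}, t = {t_stat:.3f}, df = {n - 1}")
```

The t statistic would then be compared with a critical value from the t distribution with n - 1 degrees of freedom.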
Population: consists of all bulbs manufactured with the new filament. Average lifetime is unknown.
Sample: a sample of 200 bulbs is manufactured with the new filament.
Populations and Samples
[Figure: several samples drawn from one population]
Again Sample Defined:
A Subset of a population.
A Representative Sample
– Has the characteristics of the population
Organizing the Data
Step 1
Form a Data Array: Sort the data in numerical order.
Raw Data (N = 40 shipments):
0 2 3 4 1 0 0 1
3 0 3 1 1 0 0 0
2 2 0 0 0 1 2 0
4 1 0 1 0 0 0 1
1 0 0 0 0 1 3 1
Data Array
Low 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 1 1 1 1 1
1 1 1 1 1 1 2 2
2 2 3 3 3 3 4 4 High
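Step 1 can be sketched in Python as follows. The list `raw` holds the 40 raw values from the slide; the variable names are illustrative.

```python
# Raw data from the slide: days late for 40 shipments.
raw = [
    0, 2, 3, 4, 1, 0, 0, 1,
    3, 0, 3, 1, 1, 0, 0, 0,
    2, 2, 0, 0, 0, 1, 2, 0,
    4, 1, 0, 1, 0, 0, 0, 1,
    1, 0, 0, 0, 0, 1, 3, 1,
]

# A data array is the same data sorted from low to high.
data_array = sorted(raw)
low, high = data_array[0], data_array[-1]

# Print the array eight values per row, as on the slide.
for i in range(0, len(data_array), 8):
    print(*data_array[i:i + 8])
```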
Organizing the Data
Step 2
Construct a Frequency Distribution
• Ungrouped Frequency Distribution
– When the variable has only a few different values
– Number of data values may be high or low
Frequency Distribution (N = 40 values)
x   Frequency
0   19
1   11
2   4
3   4
4   2
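An ungrouped frequency distribution can be built by counting each distinct value. This sketch uses `collections.Counter` on the slide's raw data; the variable names are illustrative.

```python
from collections import Counter

# Raw data from the slide: days late for 40 shipments.
raw = [
    0, 2, 3, 4, 1, 0, 0, 1,
    3, 0, 3, 1, 1, 0, 0, 0,
    2, 2, 0, 0, 0, 1, 2, 0,
    4, 1, 0, 1, 0, 0, 0, 1,
    1, 0, 0, 0, 0, 1, 3, 1,
]

# Count how often each value of x occurs.
freq = Counter(raw)

print("x  Frequency")
for x in sorted(freq):
    print(f"{x}  {freq[x]}")
```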
Presenting Data
Forming a Histogram
On-Time Delivery Example
[Histogram: Frequency (0 to 25) against x = Days Late (0 to 4)]
Relative Frequency Distribution
On-Time Delivery Example
x Frequency Relative Frequency
0 19 19/40 = .475
1 11 11/40 = .275
2 4 4/40 = .100
3 4 4/40 = .100
4 2 2/40 = .050
Total 40 1.000
[Histogram: Relative Frequency (0 to .50) against x = Days Late (0 to 4)]
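Each relative frequency is the frequency divided by the number of values. A short sketch starting from the frequency table of the On-Time Delivery example:

```python
# Frequency distribution from the On-Time Delivery example.
freq = {0: 19, 1: 11, 2: 4, 3: 4, 4: 2}

n = sum(freq.values())  # total number of values

# Relative frequency = frequency / n
rel_freq = {x: f / n for x, f in freq.items()}

for x, rf in rel_freq.items():
    print(f"{x}  {freq[x]:>2}  {rf:.3f}")
```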
Comparing Two Companies:
On-Time Delivery Distributions
        Company 1    Company 2
x       Frequency    Frequency
0       19           190
1       11           110
2       4            40
3       4            40
4       2            20
Total   40           400
Comparing Two Companies:
On-Time Delivery Distributions
        Company 1      Company 2
X       F     RF       F     RF
0       19    .475     190   .475
1       11    .275     110   .275
2       4     .100     40    .100
3       4     .100     40    .100
4       2     .050     20    .050
Total   40    1.000    400   1.000
[Chart: relative frequency (0 to 0.5) by X (0 to 4) for Comp 1 and Comp 2. Not a histogram!]
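Relative frequencies make the two companies comparable even though their totals differ (40 versus 400). A minimal sketch, dividing each frequency by its company total; the function name `relative_frequencies` is illustrative.

```python
import math

# Frequency distributions for the two companies, from the slide.
company_1 = {0: 19, 1: 11, 2: 4, 3: 4, 4: 2}
company_2 = {0: 190, 1: 110, 2: 40, 3: 40, 4: 20}

def relative_frequencies(freq):
    """Divide each frequency by the total number of values."""
    total = sum(freq.values())
    return {x: f / total for x, f in freq.items()}

rf_1 = relative_frequencies(company_1)
rf_2 = relative_frequencies(company_2)

# The raw frequencies differ by a factor of 10, but the shapes match.
same_shape = all(math.isclose(rf_1[x], rf_2[x]) for x in rf_1)

for x in rf_1:
    print(f"{x}  {rf_1[x]:.3f}  {rf_2[x]:.3f}")
print("same relative distribution:", same_shape)
```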
Cumulative Frequency Distribution
On-Time Delivery Example
X       F     CF
0       19    19
1       11    30
2       4     34
3       4     38
4       2     40
Total   40
[Figure: Cumulative Frequency Histogram]
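The cumulative frequency (CF) at each x is the running total of the frequencies up to and including x. A short sketch using `itertools.accumulate`:

```python
from itertools import accumulate

# Frequency distribution from the On-Time Delivery example.
freq = {0: 19, 1: 11, 2: 4, 3: 4, 4: 2}

# Running total of the frequencies, in order of x.
cum_freq = dict(zip(freq, accumulate(freq.values())))

print("X  F   CF")
for x in freq:
    print(f"{x}  {freq[x]:>2}  {cum_freq[x]:>2}")
```

The last cumulative frequency always equals the total number of values, here 40.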
Design of Survey Research
• Choose an appropriate mode of response
– Reliable primary modes
• Personal interview
• Telephone interview
• Mail survey
– Less reliable self-selection modes (not appropriate
for making inferences about the population)
• Television survey
• Internet survey
• Printed survey in newspapers and magazines
• Product or service questionnaires
Design of Survey Research (continued)
• Identify broad categories
– List complete and non-overlapping categories
that reflect the theme
• Formulate accurate questions
– Make questions clear and unambiguous. Use
universally-accepted definitions
• Test the survey
– Pilot test the survey on a small group of
participants to assess clarity and length
Design of Survey Research (continued)
k=8
Stratified Samples
• Population divided into two or more groups
according to some common characteristic
• Simple random sample selected from each
group
• The two or more samples are combined into
one
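The three bullets above can be sketched in Python. The population, the grouping by region, and the function name `stratified_sample` are all hypothetical, introduced only for illustration.

```python
import random

def stratified_sample(population, key, per_group, seed=None):
    """Draw a simple random sample from each group and combine them."""
    rng = random.Random(seed)

    # Divide the population into groups by a common characteristic.
    groups = {}
    for item in population:
        groups.setdefault(key(item), []).append(item)

    # Select a simple random sample from each group, then combine.
    combined = []
    for members in groups.values():
        combined.extend(rng.sample(members, per_group))
    return combined

# Hypothetical population: 60 customers, 20 in each of three regions.
population = [(i, region)
              for region in ("North", "South", "East")
              for i in range(20)]

sample = stratified_sample(population, key=lambda p: p[1], per_group=5, seed=1)
print(len(sample))
```

Taking the same number from every group is a simplifying assumption; the sample size per group could instead be proportional to group size.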
Cluster Samples
• Population divided into several “clusters,”
each representative of the population
• Simple random sample selected from each
• The samples are combined into one
Population divided into 4 clusters.
Advantages and Disadvantages
• Simple random sample and systematic sample
– Simple to use
– May not be a good representation of the population’s
underlying characteristics
• Stratified sample
– Ensures representation of individuals across the entire
population
• Cluster sample
– More cost effective
– Less efficient (need larger sample to acquire the same
level of precision)
Evaluating Survey Worthiness
• What is the purpose of the survey?
• Is the survey based on a probability sample?
• Coverage error – appropriate frame
• Non response error – follow up
• Measurement error – good questions elicit
good responses
• Sampling error – always exists
Types of Survey Errors
• Coverage error: Excluded from frame.
Bad Question!