CHAPTER 1
INTRODUCTION AND DESCRIPTIVE STATISTICS
1-1.
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
1-2.
Data are based on numeric measurements of some variable, either from a data set
comprising an entire population of interest, or else obtained from only a sample (subset)
of the full population. Instead of doing the measurements ourselves, we may sometimes
obtain data from previous results in published form.
1-3.
The weakest is the Nominal Scale, in which categories of data are grouped by qualitative
differences and assigned numbers simply as labels, not usable in numeric comparisons.
Next in strength is the Ordinal Scale: data are ordered (ranked) according to relative size
or quality, but the numbers themselves don't imply specific numeric relationships.
Stronger than this is the Interval Scale: the ordered data points have meaningful distances
between any two of them, measured in units. Strongest is the Ratio Scale, which is like an
Interval Scale but also has a true zero point, so the ratio of any two data values is
meaningful when comparing values.
1-4.
quantitative/ratio
qualitative/nominal
quantitative/ratio
qualitative/nominal
quantitative/ratio
quantitative/interval
quantitative/ratio
quantitative/ratio
quantitative/ratio
quantitative/ratio
quantitative/ordinal
Name: Qualitative
Wealth: Quantitative
Age: Quantitative
Industry: Qualitative
Country of Citizenship: Qualitative
1-5.
Ordinal.
1-6.
A qualitative variable describes different categories or qualities of the members of a data set,
which have no numeric relationships to each other, even when the categories happen to be coded
as numbers for convenience. A quantitative variable gives numerically meaningful information, in
terms of ranking, differences, or ratios between individual values.
1-7.
The people from one particular neighborhood constitute a non-random sample (drawn
from the larger town population). The group of 100 people would be a random sample.
1-8.
A sample is a subset of the full population of interest, from which statistical inferences
are drawn about the population, which is usually too large to permit the variables to be
measured for all the members.
1-9.
A random sample is a sample drawn from a population in a way that is not a priori biased
with respect to the kinds of variables being measured. It attempts to give a representative
cross-section of the population.
y: 116.4, 118.8, 130.8
Quartiles
1st Quartile  121
Median        128
3rd Quartile  133
IQR           12
Percentile rank of y: 10, 15, 65
1-16.
Quartiles
1st Quartile  -0.65
Median        -0.15
3rd Quartile  0.575
IQR           1.225
y
4.0
0
0
Quartiles
1st Quartile
Median
3rd Quartile
1-17.
2
3
5
IQR
y
43.0
0
0
Quartiles
1st Quartile
Median
3rd Quartile
31.5
51
162.75
IQR
131.25
1-18.
The mean is a central point that summarizes all the information in the data. It is sensitive to
extreme observations. The median is a point "in the middle" of the data set and does not contain
all the information in the set. It is resistant to extreme observations. The mode is a value that
occurs most frequently.
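The contrast between the three measures can be illustrated with a minimal sketch. The data values below are hypothetical and are not taken from any problem in this chapter; the sketch only shows how one extreme observation moves the mean far more than the median or the mode.

```python
import statistics

# Hypothetical data, chosen only for illustration.
data = [12, 13, 13, 14, 15]
with_outlier = data + [90]  # same data plus one extreme observation

summary = {}
for label, values in (("original", data), ("with outlier", with_outlier)):
    summary[label] = (
        statistics.mean(values),    # sensitive to the extreme value
        statistics.median(values),  # resistant to the extreme value
        statistics.mode(values),    # most frequent value
    )
    print(label, summary[label])
```

In this sketch the mean roughly doubles when the extreme value is added, while the median moves by half a unit and the mode does not move at all.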
1-19.
Median = 128
Modes = 128, 134, 136 (all have 4 points)
1-20.
Median
128
Mode
128
1-22.
Median
70
Mode
45
1-23.
Mean
199.875
Median
51
Mode
#N/A
1-24.
1-25. (Using the template: Basic Statistics.xls, enter the data in column K.)
Basic Statistics from Raw Data
Measures of Central tendency
Mean 21.75
Median 13
Mode 12
[Chart: recoverable company labels: Citigroup, Pfizer, Microsoft, Exxon Mobil, GE, AT&T, Intel]
Mean = 17.571
Median = 16.9
Outliers: -6.9, 46.5
18.35
Median
19.1
Mode
#N/A
1-28.
1-29.
The most important measures of variability are the variance and its square root, the standard
deviation. Both reflect all the information in the data set.
1-30.
For a sample, we divide the sum of squared deviations from the mean by n - 1, rather than by n.
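As a minimal sketch of the two divisors, the following computes both variances by hand and checks them against Python's standard `statistics` module. The data values are hypothetical and chosen only to keep the arithmetic simple.

```python
import statistics

# Hypothetical data, chosen only for illustration.
data = [2, 4, 4, 5, 7, 8]
n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the mean.
ss = sum((x - mean) ** 2 for x in data)

sample_var = ss / (n - 1)   # divisor n - 1 for a sample
population_var = ss / n     # divisor n for a population

print(sample_var, statistics.variance(data))       # sample variance
print(population_var, statistics.pvariance(data))  # population variance
```

The sample variance is always the larger of the two, by the factor n / (n - 1).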
1-31.
For the data of Problem 1-13, assumed to be a sample: Range = 136 - 109 = 27
Variance = 57.74
Standard deviation = 7.5986
If the data is of a  Sample      Population
Variance             57.7386364
St. Dev.             7.59859437
1-32.
1-33.
1-34.
Variance
321.378788
St. Dev.
17.9270407
Variance
St. Dev.
1-35.
Population
If the data is of a
Sample
Population
3.98095238
1.99523241
1-36.
data points, so Chebyshev's theorem holds. The data set is not mound-shaped, so the empirical
rule does not apply.
1-37.
1-38.
1-39.
points, so Chebyshev's theorem holds. The data set is not mound-shaped, so the empirical rule
does not apply.
1-40.
n = 16, x̄ = 199.9, s = 332.1, so x̄ ± 2s = (-464.3, 864.1); this captures 15/16 of the data points,
so Chebyshev's theorem holds. The data set is not mound-shaped, so the empirical rule does not
apply.
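The Chebyshev check used in these answers can be sketched as follows. The data set is hypothetical (16 points with one extreme value, not the data of the problem); the theorem guarantees that at least 1 - 1/k^2 of any data set lies within k standard deviations of the mean, so for k = 2 the bound is 0.75.

```python
import statistics

def chebyshev_lower_bound(k):
    """Minimum fraction of data within k standard deviations of the mean."""
    return 1 - 1 / k ** 2

def fraction_within(values, k):
    """Observed fraction of values inside mean +/- k * (sample std dev)."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    inside = [v for v in values if mean - k * s <= v <= mean + k * s]
    return len(inside) / len(values)

# Hypothetical, strongly right-skewed data: 16 points, one extreme value.
data = list(range(1, 16)) + [200]

observed = fraction_within(data, 2)
bound = chebyshev_lower_bound(2)
print(observed, bound, observed >= bound)
```

Here 15 of the 16 points fall inside the interval, which is well above the guaranteed 75%, even though the data are far from mound-shaped.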
1-41.
Electrolux
GE
Matsushita
Whirlpool
B-S
Philips
Maytag
1-42.
[Chart: Stock 1 through Stock 5; axis 0 to 20]
1-43.
[Chart: Endowments ($ billions) by University; recoverable labels: Texas, A&M, Columbia, Stanford, Yale, Princeton]
24.13
Median
23.65
Measures of Dispersion
If the data is of a
Sample
Population
Variance
70.6312222
St. Dev.
8.40423835
1-44.
[Chart: recoverable labels: Texas, Harvard; axis 0 to 30]
1-45.
13.333333
Median
12.5
1-46.
1-47.
Using MINITAB
       Stem  Leaves
  4     5    5688
  8     6    0123
 14     6    677789
 (9)    7    002223334
 11     7    55667889
  3     8    224
[Histogram: frequencies of Sales ($) in classes 0<5, 5<10, 10<15, 15<20, 20<25]
1-48.
[Box plot of C1, 34 cases; axis 5.5 to 8.5]
There are no outliers. Distribution is skewed to the left.
1-49.
1-50.
A stem-and-leaf display is a quickly drawn type of histogram useful in analyzing data. A box plot
is a more advanced display useful in identifying outliers and the shape of the distribution of the
data.
       Stem  Leaves
  1     0    5
  1     1
  1     2
  7     3    234578
(13)    4    2234567788899
 11     5    012235678
  2     6    3
  1     7    8
1-51.
The data are narrowly and symmetrically concentrated near the median (IQR and the whisker
lengths are small), not counting the two extreme outliers.
[Box plot of C1, 31 cases; axis 0 to 60]
1-52.
Wider dispersion in data set #2. Not much difference in the lower whiskers or lower hinges of the
two data sets. The high value, 24, in data set #2 has a significant impact on the median, upper
hinge and upper whisker values for data set #2 with respect to data set #1.
1-53.
Mean = 127
Var = 137
sd = 11.705
mode = 127
outliers: TWA, Lufthansa
[Box plot; axis 100 to 160]
1-54.
Stem-and-leaf of C2    N = 45
Leaf Unit = 1.0
  f    Stem  Leaves
 13     1    0011111223444
 18     1    55689
 (6)    2    022333
 21     2    567789
 15     3    0122234
  8     3    78
  6     4    012
  3     4    7
  2     5    23
1-55.
Outliers are detected by inspecting the data set or by constructing a box plot or stem-and-leaf
display. An outlier should be analyzed for information content and not merely eliminated.
1-56.
The median is the line inside the box. The hinges are the upper and lower quartiles. The inner
fences are the two points at a distance of 1.5 × IQR below the lower quartile and above the
upper quartile. Outer fences are defined in the same way but at a distance of 3 × IQR. The box
itself represents the middle 50% of the data.
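The fence definitions can be sketched in a few lines. The data are hypothetical, and the quartile rule used here (the "inclusive" method of Python's `statistics.quantiles`) is an assumption; the textbook template may compute hinges slightly differently, which would shift the fences.

```python
import statistics

# Hypothetical data with one large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]

# Quartiles by linear interpolation (an assumed convention, see above).
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)  # inner fences
outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)  # outer fences

# Points beyond the outer fences are extreme outliers; points between
# the inner and outer fences are suspected outliers.
extreme = [x for x in data if x < outer[0] or x > outer[1]]
suspected = [x for x in data
             if (outer[0] <= x < inner[0]) or (inner[1] < x <= outer[1])]

print(q1, median, q3, iqr)
print(inner, outer, suspected, extreme)
```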
1-57.
Mine A:
  f    Stem  Leaves
  2     3    24
  4     3    57
  7     4    123
 (5)    4    55689
  7     5    123
  4     5
  4     6    0
  3     7    36
  1     8    5
Mine B:
  f    Stem  Leaves
  2     2    34
  4     2    89
  6     3    24
  9     3    578
 (3)    4    034
  7     4    789
  4     5    012
  1     5    9
Values for Mine A are smaller than for Mine B, right-skewed, and there are three outliers. Values
for Mine B are larger and the distribution is almost symmetric. There is larger variance in B.
1-58.
1-59.
Box Plot
1-60.
Lower
Hinge
0.275
Median
0.6
4.88
Median
4.9
Upper
Hinge
1.15
Upper
Whisker
1.6
Box Plot
0 to 60 times
Lower
Whisker
4.2
1-61.
Lower
Hinge
4.725
Median
4.9
Upper
Hinge
5.1
Upper
Whisker
5.3
Standard Deviation: since the variance is not affected by adding 5 to each data point,
neither is the standard deviation.
Skewness: Since each data point is increased by 5 and the average increases by the same
amount, the differences between each new data point and the new average do not change.
Therefore, the numerator in the formula for skewness is not affected. Since the standard
deviation (the denominator) is also unaffected, the value of skewness does not change.
Kurtosis: By the same argument, the deviations from the new average are unchanged, so the
numerator in the formula for kurtosis is not affected. Since the standard deviation (the
denominator) is also unaffected, the value of kurtosis does not change.
Interquartile Range: since both the first quartile and the third quartile increase by the
same amount, 5, the difference between the two values remains the same.
c. Multiplying each data point by a factor 3 results in the following changes. The mean,
median, mode, first quartile, third quartile and 80th percentile values will be increased by the
same factor 3. In addition, the standard deviation and the range will also increase by the
same factor 3. The variance will increase by the factor squared, and the skewness and
kurtosis values will remain unchanged.
d. Multiplying all data points by a factor 3 and then adding a value 5 to each data point
combines the two previous cases. Each data point is first multiplied by the factor 3, which
yields the results listed in c). The value 5 is then added to each newly multiplied data
point, which yields the results listed in a).
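These effects can be checked numerically with a small sketch. The data are hypothetical, and the skewness and kurtosis functions below use simple moment-based (population) definitions, which is an assumption; the template's formulas may include sample-size adjustments, but the invariance under y = 3x + 5 holds either way.

```python
import math
import statistics

def skewness(values):
    """Mean cubed deviation divided by the cubed population std dev."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * sd ** 3)

def kurtosis(values):
    """Mean fourth-power deviation divided by the population std dev ** 4."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - m) ** 4 for v in values) / (len(values) * sd ** 4)

x = [2.0, 3.0, 3.0, 5.0, 9.0]   # hypothetical data
y = [3 * v + 5 for v in x]      # multiply by 3, then add 5

print(statistics.fmean(x), statistics.fmean(y))          # mean: 3 * mean + 5
print(statistics.stdev(x), statistics.stdev(y))          # std dev: times 3
print(statistics.variance(x), statistics.variance(y))    # variance: times 9
print(skewness(x), skewness(y))                          # unchanged
print(kurtosis(x), kurtosis(y))                          # unchanged
```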
1-62. [Using the template: Basic Statistics.xls]
Measures of Central tendency
Mean
41.01
Median
23.8
Measures of Dispersion
Variance
St. Dev.
If the data is of a
Sample
Population
1136.941
33.7185557
1-63. = 504.688
= 94.547
504.6875
Median
501.5
Mode
#N/A
Range
IQR
346
149.5
Measures of Dispersion
Variance
St. Dev.
If the data is of a
Sample
Population
8939.15234
94.5470906
1-64.
Step 1: Enter the data from problem 1-63 into cells Y4:Y35 of the template: Histogram.xls from Chapter
1. The template will order the data automatically.
Step 2: We need to select a starting point for the first class, an ending point for the last class, and
a class interval width. The starting point of the first class should be a value less than the
smallest value in the data set. The smallest value in the data set is 344, so you would
want to set the first class to start with a value smaller than 344. Let's use 320. We also
selected 710 as the ending value of the last class, and selected 50 as the interval width.
The data input column and the histogram output from the template are presented below.
The end-point for each class is included in that class; i.e., the first class of data goes from
more than 320 up to and including 370, the second class starts with more than 370 up to
and including 420, etc.
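The class rule described in Step 2 (each class excludes its lower end-point and includes its upper end-point) can be sketched as below. This is not the template's own logic, only an illustration of the same convention; the data values and the number of classes are hypothetical, while the start value 320 and width 50 follow the text.

```python
import math

def class_counts(values, start, width, n_classes):
    """Count values in classes (start + i*width, start + (i+1)*width]."""
    counts = [0] * n_classes
    for x in values:
        # ceil(...) - 1 puts a value equal to an upper end-point
        # into the class that ends at that point.
        i = math.ceil((x - start) / width) - 1
        if 0 <= i < n_classes:
            counts[i] += 1
    return counts

# Hypothetical data, chosen to include values on class boundaries.
data = [344, 370, 371, 420, 455, 501, 690]

counts = class_counts(data, start=320, width=50, n_classes=8)
for i, c in enumerate(counts):
    low = 320 + i * 50
    print(f"more than {low} up to and including {low + 50}: {c}")
```

The value 370 is counted in the first class and 420 in the second, matching the end-point convention stated above.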
1-65.
1-66.
[Figures: frequency histogram and ogive (cumulative frequency) for TV Sets]
1-67.
       Stem  Leaves
  2     1    24
  7     1    56789
 (3)    2    023
  6     2    55
  4     3    24
  2     3
  2     4    01
1-68.
[Box plot of C2]
       Stem  Leaves
  3     1    012
  4     1    9
 12     2    1122334
 (9)    2    556677889
  6     3    024
  3     3    57
  1     4
  1     4
  1     5
  1     5
  1     6    2
The data are skewed to the right, with one extreme outlier (62) and three suspected outliers
(10, 11, 12).
[Box plot of C1; axis 0 to 80]
1-71.
8.0666667
Median
Mode
10
Based on just these three measures, cheap wine appears to work well in cooking.
1-72.
Median
20.2
Measures of Dispersion
If the data is of a
Sample
Population
Variance
0.10909091
St. Dev.
0.33028913
Box Plot: Motorola's Stock Prices
Lower Whisker  Lower Hinge  Median  Upper Hinge  Upper Whisker
19.8           20.075       20.2    20.525       20.8
1-73.
Mean = 33.271
sd = 16.945
var = 287.15
QL = 25.41
Med = 26.71
QU = 35
Outliers: Morgan Stanley (91.36%)
[Box plot of C1, 15 cases; axis 20 to 80]
1-74.
Mean = 3.18
sd = 1.348
var = 1.817
QL = 1.975
Med = 2.95
QU = 3.675
Outliers: 8.70
[Box plot of C1, 20 cases; axis 1 to 7]
1-75.
a.
b.
c.
d.
Minitab output:
                      Mean    StDev   Median
Change in Bad Loans   56.28   42.73   43.40
Change in Provisions  8.12    12.21   8.60
While the average of the Change in Provisions is close to the 4.1 average for all banks, the
average of the Change in Bad Loans is considerably higher than the industry average of 11.00.
The box plot for change in Bad Loans does not show any outliers.
1-76.
IQR = 3.5
data is right-skewed
9.5 is more likely to be the mode, since the data is right-skewed
Will not affect the plot.
[Box plot; axis 0 to 100]
The box plot for change in Provisions does show one possible outlier for W Holding at 37.3:
[Boxplot of change in Provisions; axis -10 to 40]
1-77.
Mean
186.7
StDev
355.6
Median
56.2
The average for the bank assets of the 19 lending institutions is larger than the industry average of
149.30.
The box plot of bank assets shows three possible outliers: Bank of America (1459), Wachovia
(707.1), and Wells Fargo (481.9).
[Boxplot of bank assets; axis 0 to 1600]
1-78.
Mean
1720.2
Median
930
56.266667
Median
57
Measures of Dispersion
If the data is of a  Sample      Population
Variance             164.780952  153.795556
St. Dev.             12.8367033  12.4014336
The mean and median for the 15 selected countries are higher than the overall mean approval
rating of 53%.
1-79.
The chart indicates that there is a significantly large difference between the annual sales per
square foot for Apple Stores relative to the other four companies listed.
Measures of Dispersion
If the data is of a
Sample
Population
1987680.96
1409.8514
Variance
St. Dev.
1-80.
Mean = 99.039
sd = .4366
var = .1907
Median = 99.155
1-81.
Mean = 17.587
sd = .466
var = .2172
Measures of Central tendency
Mean
17.5875
Median
17.5
Mode
18.3
Range
IQR
1.4
0.75
Measures of Dispersion
If the data is of a
Sample
Population
Variance 0.21716667 0.20359375
St. Dev. 0.46601144 0.45121364
1-82.
Mean = 259.82
sd = 357.24
259.82
Median
9.5
Measures of Dispersion
If the data is of a
Sample
Population
Variance
127622.462
St. Dev.
357.242861
1-83.
Mean = 37.17
sd = 13.128
Median = 34
Measures of Central tendency
Mean
37.166667
Median
34
Measures of Dispersion
If the data is of a
Sample
Population
Variance
172.333333
St. Dev.
13.1275791
1-84. Stock Prices for period: April, 2001 through June, 2001 [Answers will vary due to dates
used.]
a). Mean and Standard Deviation for Wal-Mart
Basic Statistics from Raw Data
51.041478
Median
51.1266
Mode
50.158
Range
IQR
6.1911
1.9613
Measures of Dispersion
If the data is of a
Sample Population
Variance 2.25711298 2.22128579
St. Dev. 1.50236912 1.49039786
Higher Moments
If the data is of a
Sample Population
Skewness 0.07083784 0.06913994
(Relative) Kurtosis -0.711512 -0.7500338
10.450952
Median
10.66
Mode
11.8
Range
IQR
3.51
1.955
Measures of Dispersion
If the data is of a
Sample Population
Variance 0.9852023 0.96956417
St. Dev. 0.99257358 0.9846645
Higher Moments
If the data is of a
Sample Population
Skewness -0.4070262 -0.3972703
(Relative) Kurtosis -1.132009 -1.1378913
for K-Mart:
CV = 0.9846645 / 10.450952 = 0.0942 (if data considered a population)
CV = 0.99257358 / 10.450952 = 0.09497 (if data considered a sample)
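The coefficient of variation used here is the standard deviation divided by the mean. As a minimal sketch, the following reproduces the K-Mart figures and applies the same ratio to the Wal-Mart mean and standard deviations reported in part a); the function name is illustrative only.

```python
def coefficient_of_variation(std_dev, mean):
    """Relative dispersion: standard deviation as a fraction of the mean."""
    return std_dev / mean

# Values taken from the Basic Statistics output reported in this answer.
kmart_cv_population = coefficient_of_variation(0.9846645, 10.450952)
kmart_cv_sample = coefficient_of_variation(0.99257358, 10.450952)
walmart_cv_population = coefficient_of_variation(1.49039786, 51.041478)
walmart_cv_sample = coefficient_of_variation(1.50236912, 51.041478)

print(round(kmart_cv_population, 4), round(kmart_cv_sample, 5))
print(round(walmart_cv_population, 4), round(walmart_cv_sample, 4))
```

Although Wal-Mart has the larger standard deviation in dollar terms, its coefficient of variation is smaller because its mean price is roughly five times higher.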
d). There is a greater degree of risk in the stock prices for K-Mart than for Wal-Mart over
this three month period.
e). For DJIA
Wal-Mart stocks provided a less risky return for this time period relative to DJIA and K-Mart.
f). 100 Shares of Wal-Mart stocks purchased April 2, 2001:
Price = $50.5674 Cost = $5056.74
Mean of holding 100 shares: $5104.15
Std dev of holding 100 shares: $149.04 (rounded: if data considered a population)
$150.24 (rounded: if data considered a sample)
1-85.
a) & b): CPI and Gas prices for period: June 97 through May 01. (Non-seasonally
adjusted series.)
CPI index converted (by 100) in order to compare both series on same chart. There is no
seasonal pattern present in the CPI index. Steady trend present in CPI; considerable
variability in gas prices. Gas prices increased considerably more than the overall CPI for
the same time period.
1-87.
No.      %
6812     0.90%
1992     0.26%
3865     0.51%
26518    3.52%
99587    13.21%
168723   22.38%
168778   22.39%
124398   16.50%
72128    9.57%
38118    5.06%
20971    2.78%
11636    1.54%
10378    1.38%
[Pie chart: AIDS cases by age. Recoverable labels: Under 5 (0.90%), Ages 5 to 12 (0.26%),
Ages 13 to 19 (0.51%), Ages 20 to 24 (3.52%), Ages 25 to 29 (13.21%), Ages 45 to 49 (9.57%),
Ages 50 to 54 (5.06%), Ages 55 to 59 (2.78%), Ages 60 to 64 (1.54%), Ages 65 or older (1.38%)]
No.      %
324822   43.09%
282720   37.50%
137575   18.25%
5546     0.74%
2234     0.30%
1010     0.13%
Salaries 2004
           Lower Hinge  Median   Upper Hinge  Upper Whisker
Cubs       650000       1550000  5750000      9500000
White Sox  340000       775000   3875000      8000000
1-89
5.1477778
Median
5.35
Measures of Dispersion
If the data is of a
Sample
Population
Variance
0.36249444
St. Dev.
0.60207512
[Chart: monthly values for 2000, Jan through Dec; axis 0 to 5000]
Month  2000
Jan    3940.35
Feb    4696.69
Mar    4572.83
Apr    3860.66
May    3400.91
Jun    3966.11
Jul    3766.99
Aug    4206.35
Sep    3672.82
Oct    3369.63
Nov    2597.93
Dec    2470.52
2) Compare 2006 with 2007. [Please note: at the time of printing, data for 2007 was available only
through close on 5/25/07.]
Plots suggest there may be more volatility in 2006.
Standard deviation for 2006 = 105.3317
Standard deviation for 2007 = 82.3060
Month  2006     2007
Jan    2305.82  2463.93
Feb    2281.39  2416.15
Mar    2339.79  2421.64
Apr    2322.57  2525.09
May    2178.88  2604.52
Jun    2172.09  2588.96
Jul    2091.47
Aug    2183.75
Sep    2258.43
Oct    2366.71
Nov    2431.77
Dec    2415.29
Month  S&P      NASDAQ
Jan    1438.24  2463.93
Feb    1406.82  2416.15
Mar    1420.86  2421.64
Apr    1482.37  2525.09
May    1530.62  2604.52
Jun    1502.56  2588.96
There was more volatility in the NASDAQ Index in 2007 than in the S&P 500 Index in 2007.
Standard deviation for NASDAQ in 2007 = 82.3060
Standard deviation for S&P 500 in 2007 = 49.1033
4) Comparison of the NASDAQ with DJIA for 2007
Month  DJI       NASDAQ
Jan    12621.69  2463.93
Feb    12268.63  2416.15
Mar    12354.35  2421.64
Apr    13062.91  2525.09
May    13627.64  2604.52
Jun    13360.26  2588.96
There was more volatility in the DJI Index in 2007 than in the NASDAQ Index.
Standard deviation for NASDAQ in 2007 = 82.3060
Standard deviation for DJIA in 2007 = 554.948
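The three 2007 standard deviations quoted in parts 3) and 4) can be reproduced from the monthly values listed in this answer. The sketch below treats the six monthly values as a sample (divisor n - 1), which is an inference from the fact that this choice matches the reported figures.

```python
import statistics

# Monthly 2007 values as listed in this answer.
nasdaq_2007 = [2463.93, 2416.15, 2421.64, 2525.09, 2604.52, 2588.96]
sp500_2007 = [1438.24, 1406.82, 1420.86, 1482.37, 1530.62, 1502.56]
djia_2007 = [12621.69, 12268.63, 12354.35, 13062.91, 13627.64, 13360.26]

# Sample standard deviation (divisor n - 1) for each index.
sd_nasdaq = statistics.stdev(nasdaq_2007)
sd_sp500 = statistics.stdev(sp500_2007)
sd_djia = statistics.stdev(djia_2007)

print(round(sd_nasdaq, 4), round(sd_sp500, 4), round(sd_djia, 3))
```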
5). Answers will vary given date of assignment.