
Chapter Two

Organization and Presentation of Data

Describing Data: Frequency Distributions and Graphic Presentation

GOALS
When you have completed this chapter, you will be able to:
1. Organize data in an array and into a frequency distribution.
2. Portray a frequency distribution in a histogram, frequency
polygon, and cumulative frequency polygon.
3. Present data using such graphic techniques as line charts,
bar charts, and pie charts, in order to interpret the
data being graphed.

Raw data, or data that have not been summarized in any way,
are sometimes referred to as ungrouped data.

Collected data need to be organized so as to condense the
information they contain in a way that shows patterns of
variation clearly. Precise methods of analysis can be decided
upon only when the characteristics of the data are understood.

Data that have been organized in a frequency distribution are
called grouped data.

1. ARRAY (ORDERED ARRAY) is a serial arrangement of numerical
data in ascending or descending order.
An ordered array is an appropriate way of presentation when the
data set is small (usually fewer than 20 observations).
Example 1. Fifteen students were selected and given a blood
pressure check. Their systolic pressures are recorded below.
135, 120, 116, 119, 121, 125, 132, 136, 124,
130, 120, 140, 110, 120, 130
Array:
110, 116, 119, 120, 120, 120, 121, 124, 125, 130,
130, 132, 135, 136, 140
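
As a minimal sketch (Python is our choice here; the slides themselves do not use it), the ordered array for Example 1 can be produced with the built-in sorted function:

```python
# Systolic pressures from Example 1 (raw, ungrouped data).
pressures = [135, 120, 116, 119, 121, 125, 132, 136, 124,
             130, 120, 140, 110, 120, 130]

# Ordered array in ascending order; reverse=True gives descending order.
ascending = sorted(pressures)
descending = sorted(pressures, reverse=True)

print(ascending)
print(descending)
```

Either ordering qualifies as an array; the ascending one matches the array shown in Example 1.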

2. Stem-and-Leaf Display
A simple way to see how the data are distributed and where
concentrations of data exist.
METHOD: Separate the sorted data series into leading digits
(the stems) and trailing digits (the leaves).

Organizing Numerical Data: Stem and Leaf Display
A stem-and-leaf display organizes data into groups (called
stems) so that the values within each group (the leaves) branch
out to the right on each row.

Age of Surveyed College Students

Day Students:   16 17 17 18 18 18 19 19 20 20 21 22 22 25 27 32 38 42
Night Students: 18 18 19 19 20 21 23 28 32 33 41 45

Day Students            Night Students
Stem | Leaf             Stem | Leaf
1    | 67788899         1    | 8899
2    | 0012257          2    | 0138
3    | 28               3    | 23
4    | 2                4    | 15

The raw data are the numbers of Congressional bills vetoed
during the administrations of seven U.S. presidents, from
Johnson to Clinton.

          Johnson  Nixon  Ford  Carter  Reagan  Bush  Clinton
Vetoes       30      43    66     31      78     44      38

In stem-and-leaf terms, we could describe these data as
follows:
Stem (10s Digit) / Leaf (1s Digit)
3/018   (represents 30, 31, and 38)
4/34    (represents 43 and 44)
6/6     (represents 66)
7/8     (represents 78)
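
The veto display can be rebuilt programmatically. The sketch below is illustrative only: the function name stem_and_leaf is our own, and it assumes non-negative two-digit integers as in this example.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group integers by tens digit (stem); the leaves are the ones digits."""
    display = defaultdict(list)
    for v in sorted(values):          # sort first so leaves come out in order
        stem, leaf = divmod(v, 10)
        display[stem].append(leaf)
    return dict(display)


vetoes = [30, 43, 66, 31, 78, 44, 38]  # Johnson through Clinton
for stem, leaves in stem_and_leaf(vetoes).items():
    print(f"{stem}/{''.join(str(leaf) for leaf in leaves)}")
```

Stems with no observations (here, the 50s) are simply absent from the result, just as in the display above.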

3. Tabular Presentation
- a process of condensing classified data and arranging
them systematically in rows and columns.
A. Frequency Table for Categorical Variable
The figures found in the cells of the main body are the
frequencies and the percentages.
Example 1. Distribution of Employees in Terms of Civil Status

Civil Status    Frequency    Percentage
Single              725        36.25%
Married             250        12.50%
Widowed             375        18.75%
Separated           650        32.50%
TOTAL              2000       100%

Cross Tabulations: The Contingency Table
A survey was conducted to study the importance of brand
name to consumers as compared to a few years ago. The
results, classified by gender, were as follows:

Importance of Brand Name    Male    Female    Total
More                         450       300      750
Equal or Less               3300      3450     6750
Total                       3750      3750     7500

Example: Marada Inn

Guests staying at Marada Inn were asked to rate the
quality of their accommodations as being excellent,
above average, average, below average, or poor. The
ratings provided by a sample of 20 guests are shown
below.

Below Average   Average         Above Average   Above Average   Average
Below Average   Average         Poor            Above Average   Excellent
Average         Above Average   Above Average   Average         Above Average
Above Average   Above Average   Below Average   Poor            Above Average

Example: Marada Inn

Frequency Table

Rating           Frequency
Poor                 2
Below Average        3
Average              5
Above Average        9
Excellent            1
Total               20
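
A frequency table for a categorical variable is just a tally. As a hedged sketch, the Marada Inn table can be reproduced with collections.Counter; the percentage column is our addition (the slide shows only counts), and the order of the 20 ratings in the list is immaterial to the result.

```python
from collections import Counter

# The 20 guest ratings from the Marada Inn sample.
ratings = [
    "Below Average", "Average", "Above Average", "Above Average", "Average",
    "Below Average", "Average", "Poor", "Above Average", "Excellent",
    "Average", "Above Average", "Above Average", "Average", "Above Average",
    "Above Average", "Above Average", "Below Average", "Poor", "Above Average",
]

# Ordinal categories are listed from worst to best rather than alphabetically.
order = ["Poor", "Below Average", "Average", "Above Average", "Excellent"]
counts = Counter(ratings)
n = len(ratings)

for rating in order:
    freq = counts[rating]
    print(f"{rating:<15}{freq:>3}{freq / n:>8.0%}")
print(f"{'Total':<15}{n:>3}")
```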

B. Contingency Table

A contingency table (or two-way frequency table) is a table in
which frequencies correspond to two variables. (One variable is
used to categorize rows and a second variable is used to
categorize columns.)
Example 2. Distribution of Employees in Terms of Gender and Smoking Status

                        Gender
Smoking Habit     Male    Female    Total
Smoker             100        70      170
Non-smoker          50        80      130
Total              150       150      300

Questions
1. How many are males in the sample?
2. How many are smokers in the sample?
3. How many females are non-smokers?
4. What percentage of the sample smokes?
5. What percentage of the smokers are males?
6. What percentage of the females are smokers?
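
Questions of this kind are answered from cell counts, row totals, column totals, and the grand total. The following sketch (variable names are ours) shows which total serves as the denominator in each case, using the counts from Example 2:

```python
# Counts from Example 2: rows are smoking habit, columns are gender.
table = {
    "Smoker":     {"male": 100, "female": 70},
    "Non-smoker": {"male": 50,  "female": 80},
}

n = sum(sum(row.values()) for row in table.values())      # grand total
males = sum(row["male"] for row in table.values())        # column total
females = sum(row["female"] for row in table.values())    # column total
smokers = sum(table["Smoker"].values())                   # row total

pct_smokers = 100 * smokers / n                            # out of the whole sample
pct_smokers_male = 100 * table["Smoker"]["male"] / smokers     # out of the Smoker row
pct_females_smoke = 100 * table["Smoker"]["female"] / females  # out of the female column

print(males, smokers, table["Non-smoker"]["female"])
print(round(pct_smokers, 2), round(pct_smokers_male, 2), round(pct_females_smoke, 2))
```

Note that questions 5 and 6 use different denominators: the first conditions on the row (smokers), the second on the column (females).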


C. Frequency Distribution Table

- refers to the tabular arrangement (grouping) of all observations into
intervals or classes, together with the count of the number of
observations that fall in each interval or class.
Remarks:
1. There is no clear and definite method of finding the number of intervals.
The number of intervals depends upon the size of the data. It is
generally accepted that the number of intervals to use is from 6 to 15.
If fewer than 6 intervals are used, much information is lost because
many observations are lumped into a single class. If too many classes
are used, the graphical representation may show irregularities because
many classes or intervals may contain small frequencies.
2. There is no definite rule for choosing the starting lower limit and the
class width. The starting lower limit could be the smallest observation,
any number close to the smallest observation, or any multiple of the
class size (c).
3. In constructing the class limits, one may get one more or one less than
the suggested number of intervals.

Definitions:
Class interval - the numbers defining a class
Class limits - the smallest and largest values that can fall in a
given class
Class boundaries - numbers that are halfway between the upper limit
of a class and the lower limit of the next class
Class size - length of the class interval; computed by taking the
difference between two successive upper (or lower) class boundaries
or class limits
Class mark - midpoint of an interval; computed by taking the average
of the lower and upper class limits of a given class interval
Relative frequency - obtained by dividing the class frequency by
the total number of observations
Relative percentage - obtained by multiplying the relative
frequency by 100%

How to Construct a Frequency Distribution Table
(Suggested Steps)

1. Determine the range (R) of the observations.
R = highest value - lowest value
2. Determine the number of class intervals (k).
k = sqrt(n)
Suggested Rule: k must be an integer. If the computed k is not
an integer, then round it up to the next higher integer.
3. Determine the class width (c).
c = R / k
Suggested Rule: c must have the same number of decimal places
as the original data.
4. Determine the lower limit and the upper limit of the class
intervals.
5. Determine the frequency for each interval, class marks, class
boundaries, cumulative frequencies (less than CF and greater than
CF), and the corresponding percentage.
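
Steps 1 to 3 can be sketched in code as follows. This is an illustration under stated assumptions, not a prescribed procedure: the function name suggest_classes and its decimals parameter are ours, and rounding c to the nearest value (halves rounded up) is our reading of the rule that c keeps the precision of the data. Decimal arithmetic is used so that values such as 7.5 round the way one would by hand.

```python
import math
from decimal import Decimal, ROUND_HALF_UP


def suggest_classes(data, decimals=0):
    """Return (R, k, c); decimals is the number of decimal places in the data."""
    values = [Decimal(str(x)) for x in data]
    R = max(values) - min(values)                  # step 1: range
    k = math.ceil(math.sqrt(len(values)))          # step 2: sqrt(n), rounded up
    unit = Decimal(10) ** -decimals                # 1, 0.1, 0.01, ...
    c = (R / k).quantize(unit, rounding=ROUND_HALF_UP)  # step 3: class width
    return R, k, c


# Systolic pressures from the ordered-array example (n = 15).
pressures = [135, 120, 116, 119, 121, 125, 132, 136, 124,
             130, 120, 140, 110, 120, 130]
R, k, c = suggest_classes(pressures)
print(R, k, c)
```

Because these are only suggested rules, the resulting k and c are starting points; a more convenient width or starting limit may still be chosen.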

Additional Columns in the FDT

1. Class boundaries (CB). The CBs are obtained by taking the midpoint
of the gaps between classes.
LCB = LL - (0.5)(one unit of measure)
UCB = UL + (0.5)(one unit of measure)
2. Class marks. The class mark is the midpoint of a class.
Class mark = (LL + UL) / 2 = (LCB + UCB) / 2
3. Relative Frequency (RF). This is the frequency of a class expressed in
proportion to the total number of observations.
RF = Frequency / n

4. Cumulative Frequency. This is the accumulated frequency
of a class.
The <CF (less than CF) is the total number of observations
whose values do not exceed the upper limit of the class.
The >CF (greater than CF) is the total number of observations
whose values are not less than the lower limit of the class.
5. Relative Cumulative Frequency. This is the cumulative
frequency of a class expressed in proportion to the total
number of observations.
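
The additional columns follow mechanically from the class limits and frequencies. The sketch below uses hypothetical classes and frequencies chosen only for illustration; the function name extra_columns and the dictionary keys are ours.

```python
def extra_columns(limits, freqs, unit=1):
    """limits: (LL, UL) pairs; freqs: class frequencies; unit: one unit of measure."""
    n = sum(freqs)
    rows = []
    less_cf = 0
    for (ll, ul), f in zip(limits, freqs):
        greater_cf = n - less_cf      # observations not less than this lower limit
        less_cf += f                  # observations not exceeding this upper limit
        rows.append({
            "LCB": ll - 0.5 * unit,
            "UCB": ul + 0.5 * unit,
            "mark": (ll + ul) / 2,
            "RF": f / n,
            "<CF": less_cf,
            ">CF": greater_cf,
            "<RCF": less_cf / n,      # relative (less than) cumulative frequency
        })
    return rows


# Hypothetical classes 10-19, 20-29, 30-39 with frequencies 3, 5, 2.
rows = extra_columns([(10, 19), (20, 29), (30, 39)], [3, 5, 2])
for row in rows:
    print(row)
```

For data recorded to one decimal place, pass unit=0.1 so that the boundaries fall halfway between adjacent class limits.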


Example 3. The following are the lengths of service (in months)
of a sample of 50 employees in a certain shoe factory.

 70    87    99   112   127
 78    88    99   114   132
 80    89   100   115   132
 80    89   106   117   132
 82    92   106   119   135
 84    94   107   120   136
 85    95   108   123   136
 86    95   109   124   140
 87    97   110   125   140
 87    98   112   125   146

Range = 146 - 70 = 76
k = sqrt(50) = 7.07, so k ≈ 8
c = R/k = 76/8 = 9.5 ≈ 10

Organize the data using a frequency distribution table.

EXCEL function: FREQUENCY(data_array, bins_array)
Press: Ctrl-Shift-Enter

Output: Frequency Distribution Table
Columns: Class Intervals, Freq, Class Marks, Class Boundaries,
<CF, >CF, Percentage

Class Intervals    Freq
70-79                 2
80-89                12
90-99                 8
100-109               6
110-119               7
120-129               6
130-139               6
140-149               3
Total                50
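
For readers without a spreadsheet, the tally can be sketched in Python. This plays the same role as the spreadsheet function but is not a reimplementation of it; the starting lower limit of 70 and the variable names are our assumptions, and the width of 10 and the 8 classes come from the computation in Example 3.

```python
# Length of service (months) for the 50 employees in Example 3.
service = [
    70, 78, 80, 80, 82, 84, 85, 86, 87, 87,
    87, 88, 89, 89, 92, 94, 95, 95, 97, 98,
    99, 99, 100, 106, 106, 107, 108, 109, 110, 112,
    112, 114, 115, 117, 119, 120, 123, 124, 125, 125,
    127, 132, 132, 132, 135, 136, 136, 140, 140, 146,
]

lower, width, k = 70, 10, 8   # starting lower limit, class width c, number of classes

freqs = [0] * k
for x in service:
    freqs[(x - lower) // width] += 1   # index of the class that contains x

for i, f in enumerate(freqs):
    ll = lower + i * width
    interval = f"{ll}-{ll + width - 1}"
    print(f"{interval:<10}{f:>4}")
print(f"{'Total':<10}{sum(freqs):>4}")
```

The integer division works because the data are whole numbers and the classes have equal width; unequal classes would need an explicit comparison against each pair of limits.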

Example 2

A recent report showed the following data for percentages of
executives in 42 top US corporations suffering from drug
abuse problems.
Construct the frequency distribution.

 5.1     9.1    13.4
 5.5     9.1    13.7
 5.9     9.3    14.2
 6.5     9.8    14.3
 6.8     9.9    15
 7      10      15.2
 7.2    10.2    15.3
 7.3    10.3    16
 8.3    11      16.3
 8.4    11.5    16.3
 8.5    11.7    16.7
 8.5    12.3    17
 8.7    12.7    17.3
 8.8    13.2    17.5

1. Find the highest and lowest values.
2. Determine the range. It is the difference between the highest
and lowest values in the data set.
3. Determine the number of classes k. The number of classes is
somewhat arbitrary. In general your table should have between
5 and 20 classes. A simple rule you can follow to approximate
the number of classes is k = sqrt(n).
4. Determine the width of the class intervals. (Round off the
result to the nearest value whose precision is the same as that
of the raw data.)
5. Select a starting point for the lower class limit. (This can
be the smallest data value or any convenient number less than
the smallest data value.)

1. Hv =
   Lv =
2. Range = Hv - Lv
   Range =
3. n =
   sqrt(n) =
4. w = range/k
   w =
5. Starting point can be

Output: Frequency Distribution Table

Class Intervals    Freq    Class Marks
5.1-6.8              5        5.95
6.9-8.6              7        7.75
8.7-10.4            10        9.55
10.5-12.2            3       11.35
12.3-14.0            5       13.15
14.1-15.8            5       14.95
15.9-17.6            7       16.75

Remaining columns: Class Boundaries (starting at 5.05), <CF, >CF,
Relative Percentage

SEATWORK

The following are the average weekly mortgage interest rates for a
40-week period.


1. In the following stem-and-leaf display for a set of
two-digit integers, the stem is the 10s digit, and each leaf
is the 1s digit. What is the original set of data?
2/002278
3/011359
4/1344
5/47

SEATWORK
The accompanying data describe the hourly wage rates (dollars
per hour) for 30 employees of an electronics firm:

22.66  24.39  17.31  21.02  21.61  20.97
18.58  16.61  19.74  21.57  20.56  22.16
20.16  18.97  22.64  19.62  22.05  22.03
17.09  24.60  23.82  17.80  16.28  19.34
22.22  19.49  22.27  18.20  19.29  20.43

Construct a frequency distribution for these data.

2.58 In 2007, unemployment rates in the 50 U.S. states were
reported as follows. Source: Bureau of the Census, Statistical
Abstract of the United States 2009, p. 373.

a. Construct a stem-and-leaf display for these data.
b. Construct a frequency distribution for these data.
c. Determine the interval width and the class mark for
each of the classes in your frequency distribution.
d. Based on the frequency distribution obtained in part
(b), draw a histogram and a relative frequency polygon
to describe the data.

In a study of reaction times to a specific stimulus, an animal
trainer obtained the following data, given in seconds.

Class limits    Frequency
2.3-2.9            10
3.0-3.6            12
3.7-4.3             6
4.4-5.0             8
5.1-5.7             4
5.8-6.4             2

With reference to the table, determine:
a. The upper limit of the fourth class.
b. The class midpoint of the third class.
c. The class boundaries of the second class.
d. The size of the fifth class interval.
e. The number of animals which respond to the stimulus in 3.0 to
3.6 seconds.
f. The number of animals which respond to the stimulus after 4.35
seconds.
g. The percentage of animals which respond to the stimulus in 5.1
to 5.7 seconds.
h. The percentage of animals which respond to the stimulus in less
than 3.65 seconds.

GRAPHICAL PRESENTATION
- a method of presenting numerical values or relationships in
pictorial form.

1. LINE GRAPHS
2. BAR GRAPHS
3. PIE CHARTS

Membership Growth of FICCO

Year    Total Number of Members
1980            987
1990           9186
2000          24026
2010         140128
2012         163722

Line Graph
[Figure: Membership Growth of FICCO. Vertical axis: total number
of members (0 to 180,000); horizontal axis: year. Plotted values:
987 (1980), 9186 (1990), 24026 (2000), 140128 (2010),
163722 (2012).]

Line graphs are typically used to show the change or trend in a
variable over time.

The line graph is capable of simultaneously showing values of
two quantitative variables (y, or vertical axis, and x, or
horizontal axis); it consists of linear segments connecting
points observed or measured for each variable.

A Bar Chart consists of a series of rectangular bars where the
length of the bar represents the magnitude to be demonstrated.
It can be used to depict any of the levels of measurement
(nominal, ordinal, interval, or ratio).
Bar Chart
[Figure: bar chart. Vertical axis: Frequency (0 to 9); horizontal
axis: Class Intervals (43-48, 49-54, 55-60, 61-66, 67-72, 73-78,
79-84, 85-90).]

Example 2. Construct a bar chart for the number of unemployed
per 100,000 population for selected cities during 2001.

City                  Number of unemployed per 100,000 population
Atlanta, GA               7300
Boston, MA                5400
Chicago, IL               6700
Los Angeles, CA           8900
New York, NY              8200
Washington, D.C.          8900

[Figure: horizontal bar chart, Number of unemployed per 100,000
population. Bars, top to bottom: Washington, D.C.; New York, NY;
Los Angeles, CA; Chicago, IL; Boston, MA; Atlanta, GA. Horizontal
axis: 0 to 10,000.]

A Pie Chart is useful for displaying a relative frequency
distribution. A circle is divided proportionally to the
relative frequency, and portions of the circle are allocated
to the different groups.
1. MONTHLY BUDGET OF A COLLEGE STUDENT

EXPENSES            AMOUNT
Lodging             P2,500
Food                 5,000
School Supplies        500
Pocket Money         1,000
Miscellaneous        1,000
Recreation             500

PIE CHART
Monthly Budget of a College Student
[Figure: pie chart. Slice labels shown: Food 62%, Pocket Money 13%,
Miscellaneous 13%, School Supplies 6%, Recreation 6%.]

EXAMPLE
A sample of 200 runners was asked to indicate their favorite
type of running shoe. Draw a pie chart based on the following
information.

Type of shoe    # of runners    % of total
Nike                 92            46.0
Adidas               49            24.5
Reebok               37            18.5
Asics                13             6.5
Other                 9             4.5
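
To draw the pie chart by hand, each relative frequency is converted to a central angle (its share of 360 degrees). A minimal sketch, with the count for "Other" taken as 200 minus the four named brands (an inference consistent with the 4.5% shown):

```python
# Favorite running shoe among the 200 sampled runners.
# "Other" = 9 is inferred: 200 - (92 + 49 + 37 + 13).
runners = {"Nike": 92, "Adidas": 49, "Reebok": 37, "Asics": 13, "Other": 9}

total = sum(runners.values())
slices = {
    brand: (100 * count / total, 360 * count / total)   # (percent, central angle)
    for brand, count in runners.items()
}

for brand, (pct, angle) in slices.items():
    print(f"{brand:<8}{pct:>6.1f}%{angle:>8.1f} degrees")
```

The angles sum to 360 degrees, which is a quick check that no category has been left out.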


Figure 5. Pie Chart for Running Shoes
[Figure: pie chart. Slices: Nike 46%, Adidas 25%, Reebok 19%,
Asics 6%, Other 4%.]

GRAPHICAL REPRESENTATION of the FREQUENCY DISTRIBUTION
The three commonly used graphic forms are Histograms,
Frequency Polygons, and Ogives.

A Histogram is a graph in which the class midpoints or limits
are marked on the horizontal axis and the class frequencies on
the vertical axis.
The class frequencies are represented by the heights of the
bars, and the bars are drawn adjacent to each other.

[Figure: histogram. Vertical axis: Frequency (0 to 12); horizontal
axis: Hours spent studying (class marks 12.65, 17.45, 22.25, 27.05,
31.85).]

Example 2 (continued): percentages of executives in 42 top US
corporations suffering from drug abuse problems.
Output: Frequency Distribution Table

Class        Freq   Class    Class          <CF   >CF   Relative
Intervals           Marks    Boundaries                 Percentage
5.1-6.8        5     5.95    5.05-6.85        5    42     11.9
6.9-8.6        7     7.75    6.85-8.65       12    37     16.7
8.7-10.4      10     9.55    8.65-10.45      22    30     23.8
10.5-12.2      3    11.35    10.45-12.25     25    20      7.1
12.3-14.0      5    13.15    12.25-14.05     30    17     11.9
14.1-15.8      5    14.95    14.05-15.85     35    12     11.9
15.9-17.6      7    16.75    15.85-17.65     42     7     16.7

HISTOGRAM
Percentage of Executives Suffering from Drug Abuse Problems
[Figure: histogram. Vertical axis: NUMBER OF CORPORATIONS (0 to 12);
horizontal axis: CLASS MARKS (5.95, 7.75, 9.55, 11.35, 13.15, 14.95,
16.75).]
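
As a rough stand-in for the chart, the same information can be printed as a text histogram. In this sketch the bars run horizontally rather than vertically; the function name text_histogram is ours, and the frequencies are those implied by the relative percentages of the 42 corporations.

```python
# Class marks and frequencies from the Example 2 frequency distribution.
class_marks = [5.95, 7.75, 9.55, 11.35, 13.15, 14.95, 16.75]
freqs = [5, 7, 10, 3, 5, 5, 7]


def text_histogram(marks, counts, symbol="#"):
    """Return one text row per class; bar length equals the class frequency."""
    return [f"{m:>6.2f} | {symbol * c} {c}" for m, c in zip(marks, counts)]


for row in text_histogram(class_marks, freqs):
    print(row)
```

Rows are printed with no gaps between them, mirroring the rule that histogram bars are drawn adjacent to each other.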


A Frequency Polygon consists of line segments connecting the
points formed by the class midpoint and the class frequency.
Percentage of Executives Suffering from Drug Abuse Problems
[Figure: frequency polygon. Vertical axis: NO. OF CORPORATIONS
(0 to 12); horizontal axis: CLASS MARKS (4.15, 5.95, 7.75, 9.55,
11.35, 13.15, 14.95, 16.75, 18.55).]
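
The polygon's horizontal axis runs from 4.15 to 18.55, one class width beyond the first and last class marks, so that the polygon closes on the axis. A minimal sketch of how the plotting points could be generated (variable names are ours; the width of 1.8 is the distance between consecutive class marks):

```python
class_marks = [5.95, 7.75, 9.55, 11.35, 13.15, 14.95, 16.75]
freqs = [5, 7, 10, 3, 5, 5, 7]
width = 1.8  # distance between consecutive class marks

# Add an empty class (frequency 0) on each side to close the polygon.
xs = [round(class_marks[0] - width, 2)] + class_marks + [round(class_marks[-1] + width, 2)]
ys = [0] + freqs + [0]
points = list(zip(xs, ys))

for x, y in points:
    print(f"({x:.2f}, {y})")
```

Connecting these points in order with straight line segments gives the frequency polygon.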


Ogive (Cumulative frequency polygon)
A graph showing the cumulative frequency plotted against the
class boundary. The graph can be a less than or greater than
ogive.

Less Than Ogive
[Figure: less than ogive. Vertical axis: less than cf (0 to 45);
horizontal axis: Class Boundaries (5.05, 6.85, 8.65, 10.45, 12.25,
14.05, 15.85, 17.65). The curve rises from 0 at 5.05 to 42 at
17.65.]
For a less than ogive, plot a point above each UCB.

Greater Than Ogive
[Figure: greater than ogive. Vertical axis: greater than cf
(0 to 45); horizontal axis: Class Boundaries (5.05, 6.85, 8.65,
10.45, 12.25, 14.05, 15.85, 17.65). The curve falls from 42 at
5.05 to 0 at 17.65.]
For a greater than ogive, plot a point above each LCB.
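
The points for both ogives come straight from the cumulative frequencies. The sketch below (variable names are ours) pairs each class boundary with the matching cumulative frequency for the executives data:

```python
from itertools import accumulate

# Class boundaries and frequencies for the 42 corporations.
boundaries = [5.05, 6.85, 8.65, 10.45, 12.25, 14.05, 15.85, 17.65]
freqs = [5, 7, 10, 3, 5, 5, 7]
n = sum(freqs)

less_cf = list(accumulate(freqs))                 # 5, 12, 22, 25, 30, 35, 42

# Less than ogive: 0 at the first LCB, then one point above each UCB.
less_than = list(zip(boundaries, [0] + less_cf))

# Greater than ogive: one point above each LCB, then 0 at the last UCB.
greater_cf = [n - c for c in [0] + less_cf]       # 42, 37, 30, 20, 17, 12, 7, 0
greater_than = list(zip(boundaries, greater_cf))

print(less_than)
print(greater_than)
```

At every boundary the two cumulative frequencies add up to n, so the two curves are mirror images about the horizontal line at n/2.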


The Abuse of Visual Displays

Remember that visuals can be designed to be either emotionally charged or
purposely misleading to the unwary viewer. This capacity to mislead is
shared by a great many statistical tests and descriptions, as well as visual
displays. We will consider just a few of the many possible examples where
graphical methods could be viewed as misleading.
