Week I
1. Introduce yourself to students.
2. Explain the working methods and the evaluation system.
Mode:
Required materials for next week's classes will be posted on the course web page
(www.efsa.unsa.ba/nastava) during the current week. Students are required to bring a
calculator and the materials from the site to class. Methodological examples will be presented
in class, and students are expected to participate actively. The materials also contain tasks for
independent work. At the beginning of each class, the results of these tasks will be briefly
discussed.
Evaluation:
10% - online quizzes
10% - work on seminar topics
40% - midterm test
40% - final test
3. Recapitulation of basic concepts:
a. The definition of a mass phenomenon.
A mass phenomenon is a phenomenon that manifests itself through a large number of objects.
We investigate a mass phenomenon on the basis of information about the characteristics
(features, properties) of the elements that make it up.
b. The definition of statistical units and the statistical set (population).
Statistical units are the elements, each with a series of features, that serve as the basis for
investigating the variation of the mass phenomenon (e.g., for mass production, the statistical
unit can be a factory, a plant or a machine).
The set of statistical units is called the statistical set, population or basic set (e.g., for
production as a mass phenomenon, the statistical set is the set of all factories in a region, the
set of plants in a factory, or the set of machines in a plant or factory).
The size of the statistical set (denoted by N) is the number of statistical units that form the set.
The size of the statistical set can be finite or infinite.
Example 1:
Mass phenomenon: unemployment
Statistical set: working-age population
Statistical unit: resident capable of working
The ordinal scale of measurement - Modalities are expressed descriptively and can be
ordered according to a logical sequence. The order matters: modalities differ in rank.
Counting and the logical comparison operations (<, =, >) are allowed, but arithmetic
operations are not. Descriptive modalities can be replaced by ordered numerical codes
(ranks).
The interval scale of measurement - Modalities are expressed numerically, but the
position of zero is arbitrary: zero does not mean the absence of the phenomenon.
Differences between elements in the measured characteristic are reflected as differences
between the numbers on the scale. Counting, comparison and subtraction are allowed, but
ratios are not meaningful.
The metric (ratio) scale of measurement - Modalities are numeric and there is a natural
zero, which stands for the absence of the phenomenon. All logical and mathematical
operations are allowed, so this scale enables the most precise and comprehensive analysis.
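To see why subtraction is meaningful on an interval scale while ratios are not, the short Python sketch below compares two temperatures. It assumes temperature in degrees Celsius as an interval-scale variable (arbitrary zero) and temperature in kelvins as a ratio-scale variable (natural zero); this example and the helper `celsius_to_kelvin` are illustrative and not part of the original notes.

```python
# Illustrative assumption: Celsius is an interval scale, Kelvin is a ratio scale.
def celsius_to_kelvin(c):
    """Shift the arbitrary Celsius zero to the absolute (natural) zero of the Kelvin scale."""
    return c + 273.15

a_c, b_c = 10.0, 20.0                      # two temperatures in degrees Celsius
a_k, b_k = celsius_to_kelvin(a_c), celsius_to_kelvin(b_c)

# Differences survive a shift of the zero point, so subtraction is meaningful.
diff_celsius = b_c - a_c
diff_kelvin = b_k - a_k

# Ratios do not survive the shift, so "twice as warm" is not meaningful in Celsius.
ratio_celsius = b_c / a_c
ratio_kelvin = b_k / a_k

print(f"difference: {diff_celsius:.2f} (Celsius) vs {diff_kelvin:.2f} (Kelvin)")
print(f"ratio:      {ratio_celsius:.3f} (Celsius) vs {ratio_kelvin:.3f} (Kelvin)")
```

Moving the zero point leaves the difference at 10 on both scales, while the ratio drops from 2 to about 1,035, so a statement such as "twice as much" only makes sense on a scale with a natural zero.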
e. Types of statistical variables.
Qualitative
o Nominal (attributive) (e.g., eye color)
o Ordinal (e.g., the academic rank of a university employee)
Quantitative
o Discrete - can take only certain separate values within an interval (e.g., a grade)
o Continuous - can take any value within an interval (e.g., height or weight)
f. Types of frequencies.
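o Absolute frequency $f_i$ - the number of occurrences (repetitions) of the i-th modality; the
absolute frequencies of all n modalities add up to the size of the set: $\sum_{i=1}^{n} f_i = N$
o Relative frequency: $p_i = \frac{f_i}{N}$
o Cumulative absolute frequency: $CAF_i = \sum_{j=1}^{i} f_j$
o Cumulative relative frequency: $CRF_i = \sum_{j=1}^{i} p_j$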
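The four formulas can be sketched in Python as follows. The function name `frequency_distribution` and the five-element sample are hypothetical, chosen only for illustration. The cumulative relative frequency is computed as $CAF_i / N$, which equals $\sum_{j=1}^{i} p_j$ but avoids accumulating floating-point rounding.

```python
from collections import Counter
from itertools import accumulate

def frequency_distribution(data):
    """Return rows (x_i, f_i, p_i, CAF_i, CRF_i) for the modalities of a discrete variable."""
    N = len(data)                           # size of the statistical set
    counts = Counter(data)                  # absolute frequency of each modality
    modalities = sorted(counts)             # x_1 < x_2 < ... < x_n
    f = [counts[x] for x in modalities]     # f_i
    p = [fi / N for fi in f]                # p_i = f_i / N
    caf = list(accumulate(f))               # CAF_i = f_1 + ... + f_i
    crf = [c / N for c in caf]              # CRF_i = p_1 + ... + p_i, computed as CAF_i / N
    return list(zip(modalities, f, p, caf, crf))

# Hypothetical sample of a discrete variable (not from the notes).
for row in frequency_distribution([2, 1, 2, 3, 2]):
    print(row)
```

For this sample the function returns the rows (1, 1, 0.2, 1, 0.2), (2, 3, 0.6, 4, 0.8) and (3, 1, 0.2, 5, 1.0); the last cumulative values always equal N and 1.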
Example 2:
Determine the data type (quantitative or qualitative). For quantitative data, determine whether
they are continuous or discrete; for qualitative data, determine whether they are nominal or ordinal.
Example 3:
The following data give the number of grocery stores in each of 20 suburbs:
1 2 1 3 3 2 4 2 4 5 5 6 2 2 1 6 2 2 3 5
a) Compare and comment on the statistical series of raw (gross) data, the arranged statistical
series and the statistical frequency distribution.
b) Calculate and explain the relative frequencies. Analyze the population structure.
c) Calculate and explain the cumulative absolute frequencies.
d) Calculate and explain the cumulative relative frequencies.
Solution:
Population - suburbs; statistical unit - suburb; population size N = 20. Variable (characteristic)
- the number of grocery stores in a suburb; type of variable - quantitative, discrete; modalities - 1, 2, 3, 4, 5, 6.
a) The series of raw (gross) statistical data is presented above. It is a database with a value
for each suburb ("scattered data"). To obtain the arranged statistical series, we sort the data
by size:
1 1 1 2 2 2 2 2 2 2 3 3 3 4 4 5 5 5 6 6
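The step from raw data to the arranged series is just a sort. A minimal Python sketch, assuming the raw values are taken in the order listed in the example (the order does not affect the result):

```python
# Raw (gross) data of Example 3: number of grocery stores in each of the 20 suburbs.
raw = [1, 2, 1, 3, 3, 2, 4, 2, 4, 5, 5, 6, 2, 2, 1, 6, 2, 2, 3, 5]

# The arranged statistical series is the same data sorted in ascending order.
arranged = sorted(raw)

print(arranged)
print("N =", len(arranged))
```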
The final form of data grouping is the statistical frequency distribution, in which each of the n
modalities is paired with its absolute frequency (the number of times the modality is repeated
in the data):
$x_i$ - the i-th modality of the observed characteristic
$f_i$ - absolute frequency (the number of occurrences) of the i-th modality (class, interval)
$n$ - number of modalities (classes, intervals)
$x_i$ (number of grocery stores) | $f_i$ (number of suburbs) | $p_i$ (relative frequency) | $CAF_i$ (cumulative absolute frequency) | $CRF_i$ (cumulative relative frequency)
1   | 3  | 0,15 | 3  | 0,15
2   | 7  | 0,35 | 10 | 0,50
3   | 3  | 0,15 | 13 | 0,65
4   | 2  | 0,10 | 15 | 0,75
5   | 3  | 0,15 | 18 | 0,90
6   | 2  | 0,10 | 20 | 1,00
Sum | 20 | 1,00 |    |
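b) The relative frequency $p_i = \frac{f_i}{N}$ shows the share of each modality in the structure
of the population; multiplied by 100 it gives the percentage $P_i = p_i \cdot 100\%$.
[Figure: pie chart of the structure of suburbs by number of grocery stores: 1 - 15%, 2 - 35%, 3 - 15%, 4 - 10%, 5 - 15%, 6 - 10%]
The most common are suburbs with 2 grocery stores (35%), while the least common are
suburbs with 4 or 6 grocery stores (10% each).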
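A short Python sketch of part b), starting from the absolute frequencies $f_i$ in the Example 3 table; the variable names are illustrative. It computes $p_i$ and $P_i$ and picks out the most and the least frequent modalities used in the structure analysis.

```python
# Absolute frequencies f_i of Example 3 (modality x_i -> number of suburbs).
f = {1: 3, 2: 7, 3: 3, 4: 2, 5: 3, 6: 2}
N = sum(f.values())                          # N = 20

p = {x: fi / N for x, fi in f.items()}       # relative frequencies p_i = f_i / N
P = {x: 100 * pi for x, pi in p.items()}     # percentages P_i = p_i * 100%

for x in f:
    print(f"x = {x}: f = {f[x]}, p = {p[x]:.2f}, P = {P[x]:.0f}%")

# Structure analysis: modalities with the largest and the smallest frequency.
most = max(f.values())
least = min(f.values())
print("most frequent:", [x for x in f if f[x] == most])
print("least frequent:", [x for x in f if f[x] == least])
```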
c) Increasing absolute cumulant (cumulative absolute frequency) - the number of data values
less than or equal to the observed modality. For example, the table gives $CAF_3 = 13$ for
$x_3 = 3$, which means that 13 suburbs have 3 or fewer grocery stores.
d) Increasing relative cumulant (cumulative relative frequency) - the share of data values less
than or equal to the observed modality (multiplied by 100%, the percentage of such values).
For example, the table gives $CRF_2 = 0{,}50$ for $x_2 = 2$, which means that 50% of
suburbs have 2 or fewer grocery stores.
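Both cumulants can also be read directly from the raw data by counting, which is a useful check of the table. In this Python sketch the helper functions `caf` and `crf` are hypothetical names, not part of any library:

```python
# Raw data of Example 3 (number of grocery stores per suburb).
data = [1, 2, 1, 3, 3, 2, 4, 2, 4, 5, 5, 6, 2, 2, 1, 6, 2, 2, 3, 5]
N = len(data)

def caf(x):
    """Cumulative absolute frequency: how many values are less than or equal to x."""
    return sum(1 for v in data if v <= x)

def crf(x):
    """Cumulative relative frequency: the share of values less than or equal to x."""
    return caf(x) / N

print(caf(3))   # suburbs with at most 3 grocery stores
print(crf(2))   # share of suburbs with at most 2 grocery stores
```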
Example 4:
Fifteen students were asked how many movies they had watched at the cinema during the past
three months. Their answers are listed below:
2 10 11 5 7 5 1 13 8 8 4 5 3 9 7
a) Compare and comment on the statistical series of raw (gross) data, the arranged statistical
series and the statistical frequency distribution.
b) Calculate and explain the relative frequencies. Analyze the population structure.
c) Calculate and explain the cumulative absolute frequencies.
d) Calculate and explain the cumulative relative frequencies.
e) Present the statistical frequency distribution using a polygon of absolute frequencies.
Solution:
a) To obtain the arranged statistical series, we sort the data by size:
1 2 3 4 5 5 5 7 7 8 8 9 10 11 13
This series contains many different modalities (11 distinct values among 15 observations), so it
is not practical to group the data into a statistical frequency distribution as in the previous
example. In such cases, and also for continuous variables, it is recommended to group the data
into an interval frequency distribution. The data are grouped into 3 intervals of width 5; each
interval includes its lower border but not its upper border (e.g., the value 5 belongs to the
interval 5-10), as presented in the table below:
$R_i$ interval (real borders) | $c_i$ (midpoint of interval) | $f_i$ | $p_i$ | $CAF_i$ | $CRF_i$
0-5   | 2,5  | 4  | 0,27 | 4  | 0,27
5-10  | 7,5  | 8  | 0,53 | 12 | 0,80
10-15 | 12,5 | 3  | 0,20 | 15 | 1,00
Sum   |      | 15 | 1,00 |    |
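A Python sketch of the interval grouping. The convention that each interval contains its lower border but not its upper border is an assumption inferred from the frequencies in the table (5 is counted in 5-10 and 10 in 10-15), which matches the remark that 5-10 means "in fact from 5 to 9" movies.

```python
# Raw data of Example 4: movies watched by each of the 15 students.
data = [2, 10, 11, 5, 7, 5, 1, 13, 8, 8, 4, 5, 3, 9, 7]
N = len(data)

# Interval borders as in the table: [0, 5), [5, 10), [10, 15).
# A value equal to an upper border is counted in the next interval.
borders = [0, 5, 10, 15]

rows = []
cumulative = 0
for lower, upper in zip(borders, borders[1:]):
    fi = sum(1 for v in data if lower <= v < upper)   # absolute frequency f_i
    cumulative += fi                                  # CAF_i
    midpoint = (lower + upper) / 2                    # c_i
    rows.append((f"{lower}-{upper}", midpoint, fi, fi / N, cumulative, cumulative / N))

for interval, c, fi, pi, caf, crf in rows:
    print(f"{interval:>6}  c = {c:4}  f = {fi:2}  p = {pi:.2f}  CAF = {caf:2}  CRF = {crf:.2f}")
```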
b) For example, $p_2 = 0{,}53$, which means that 53% of the students watched from 5 to 10 (in
fact, from 5 to 9) movies at the cinema during the last three months.
Analysis of the structure:
Most of the students (53%) watched 5 to 10 movies, while the rarest are students (20% of them)
who watched 10 to 15 movies at the cinema during the last three months.
[Figure: pie chart of the structure of students by number of movies watched: 0-5 - 27%, 5-10 - 53%, 10-15 - 20%]
c) Increasing absolute cumulant: $CAF_2 = 12$, which means that 12 students watched fewer
than 10 movies (10 is the upper border of the second interval) at the cinema during the last three months.
d) Increasing relative cumulant: $CRF_2 = 0{,}80$, which means that 80% of the students
watched fewer than 10 movies at the cinema during the last three months.
e) [Figure: polygon of absolute frequencies; horizontal axis: midpoints of intervals (2,5; 7,5; 12,5), vertical axis: absolute frequencies]
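The frequency polygon is drawn by joining the points $(c_i, f_i)$ with straight lines. The Python sketch below only computes these points; `polygon_vertices` is a hypothetical helper, and the optional closing at the midpoints of the empty neighbouring intervals is a common textbook convention that the notes do not state.

```python
# Interval midpoints c_i and absolute frequencies f_i from the Example 4 table.
midpoints = [2.5, 7.5, 12.5]
frequencies = [4, 8, 3]
width = 5  # common width of the intervals

def polygon_vertices(midpoints, frequencies, width, close=False):
    """Return the (c_i, f_i) points that a frequency polygon joins with straight lines.

    With close=True the polygon is brought down to zero at the midpoints of the
    empty neighbouring intervals (an assumed convention, not taken from the notes).
    """
    points = list(zip(midpoints, frequencies))
    if close:
        points = [(midpoints[0] - width, 0)] + points + [(midpoints[-1] + width, 0)]
    return points

print(polygon_vertices(midpoints, frequencies, width))
print(polygon_vertices(midpoints, frequencies, width, close=True))
```

The points can be passed to any plotting tool, for example `matplotlib.pyplot.plot(xs, ys, marker="o")`. Note that for a count variable such as the number of movies, the left closing point (-2,5; 0) lies below zero.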
Example 5:
The table below gives the number of people in the population aged 15 and over by level of
education and gender. The data were collected in the 1991 census and taken from the 2003
Statistical Yearbook of the Federation of Bosnia and Herzegovina:
School completed | Men | Women
No schooling | 98.420 | 372.762
1-3 grade of elementary school | 31.311 | 49.919
4 grade of elementary school | 212.378 | 277.185
5-7 grade of elementary school | 41.732 | 45.831
Elementary school | 421.045 | 397.316
High school | 671.058 | 421.314
College | 56.759 | 35.742
Faculty | 77.240 | 45.727
Unknown | 47.397 | 46.506
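The structure by gender and in total is computed as:
$P_{Mi} = \frac{f_{Mi}}{\sum_i f_{Mi}} \cdot 100\%$, $P_{Wi} = \frac{f_{Wi}}{\sum_i f_{Wi}} \cdot 100\%$, $f_{Ti} = f_{Mi} + f_{Wi}$, $P_{Ti} = \frac{f_{Ti}}{\sum_i f_{Ti}} \cdot 100\%$

School completed | Men $f_{Mi}$ | $P_{Mi}$ (%) | Women $f_{Wi}$ | $P_{Wi}$ (%) | Total $f_{Ti}$ | $P_{Ti}$ (%)
No schooling | 98.420 | 5,94 | 372.762 | 22,03 | 471.182 | 14,07
1-3 grade of elementary school | 31.311 | 1,89 | 49.919 | 2,95 | 81.230 | 2,43
4 grade of elementary school | 212.378 | 12,81 | 277.185 | 16,38 | 489.563 | 14,62
5-7 grade of elementary school | 41.732 | 2,52 | 45.831 | 2,71 | 87.563 | 2,61
Elementary school | 421.045 | 25,40 | 397.316 | 23,48 | 818.361 | 24,43
High school | 671.058 | 40,49 | 421.314 | 24,90 | 1.092.372 | 32,61
College | 56.759 | 3,42 | 35.742 | 2,11 | 92.501 | 2,76
Faculty | 77.240 | 4,66 | 45.727 | 2,70 | 122.967 | 3,67
Unknown | 47.397 | 2,86 | 46.506 | 2,75 | 93.903 | 2,80
TOTAL | 1.657.340 | 100,00 | 1.692.302 | 100,00 | 3.349.642 | 100,00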
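The structure percentages can be recomputed from the counts as a check. This Python sketch uses the counts from the Example 5 table; the helper `structure` is an illustrative name. It applies $P_i = f_i / \sum f_i \cdot 100\%$ separately to men, women and the total $f_{Ti} = f_{Mi} + f_{Wi}$.

```python
# Counts from Example 5 (population aged 15 and over, 1991 census), in table order.
levels = [
    "No schooling", "1-3 grade of elementary school", "4 grade of elementary school",
    "5-7 grade of elementary school", "Elementary school", "High school",
    "College", "Faculty", "Unknown",
]
men = [98420, 31311, 212378, 41732, 421045, 671058, 56759, 77240, 47397]
women = [372762, 49919, 277185, 45831, 397316, 421314, 35742, 45727, 46506]

def structure(counts):
    """Percentage share of each category: P_i = f_i / sum(f) * 100."""
    total = sum(counts)
    return [100 * f / total for f in counts]

total = [m + w for m, w in zip(men, women)]        # f_Ti = f_Mi + f_Wi
p_men, p_women, p_total = structure(men), structure(women), structure(total)

for name, pm, pw, pt in zip(levels, p_men, p_women, p_total):
    print(f"{name:32} {pm:6.2f} {pw:6.2f} {pt:6.2f}")
print("totals:", sum(men), sum(women), sum(total))
```

Rounded to two decimals, the printed shares match the percentage columns of the table, e.g. 40,49% of men and 24,90% of women completed high school.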
[Figure: column chart of the number of men by level of education]
[Figure: pie chart of the structure of men by level of education: No schooling 6%, 1-3 grade of elementary school 2%, 4 grade of elementary school 13%, 5-7 grade of elementary school 3%, Elementary school 25%, High school 40%, College 3%, Faculty 5%, Unknown 3%]
[Figure: column chart of the number of women by level of education]
[Figure: pie chart of the structure of women by level of education; the largest shares are High school 25%, Elementary school 23%, No schooling 22% and 4 grade of elementary school 16%]
[Figure: column chart with level of education on the horizontal axis]
The structure of the population aged 15 and over by completed level of education, shown as a
structural circle (pie chart):
[Figure: pie chart "Population structure 15 years and older according to level of education": No schooling 14%, 1-3 grade of elementary school 2%, 4 grade of elementary school 15%, 5-7 grade of elementary school 3%, Elementary school 24%, High school 32%, College 3%, Faculty 4%, Unknown 3%]
Parallel display of the population structure according to both characteristics (gender and
level of education), as a chart with split columns (clustered column chart):
[Figure: clustered column chart of men and women by level of education; series: Men, Women; horizontal axis: level of education]
Parallel display of the population structure according to both characteristics (gender and
level of education), as a chart with divided columns (stacked column chart):
[Figure: stacked column chart of men and women by level of education; series: Men, Women; horizontal axis: level of education]