
JOMO KENYATTA UNIVERSITY

OF
AGRICULTURE & TECHNOLOGY
SCHOOL OF OPEN, DISTANCE &
eLEARNING
IN COLLABORATION WITH

DEPARTMENT OF INFORMATION
TECHNOLOGY

HBAF 3105 STATISTICS AND QUANTITATIVE MODELLING FOR


FINANCE AND ACCOUNTING

J.Okelo
(masenooj@gmail.com)
P.O. Box 62000, 00200
Nairobi, Kenya

HBAF 3105: STATISTICS AND QUANTITATIVE MODELLING FOR FINANCE AND ACCOUNTING
Course description
History of statistics; Use and abuse of statistics; Measures of central location: mean, median, mode; Measures of dispersion: range, standard deviation, variance, quartiles, skewness, kurtosis; Variables: qualitative, quantitative, discrete and continuous variables; Normal distribution; standard normal distribution; Z-distribution; t-distribution; F-distribution; chi-square distribution; hypothesis testing; Inferential statistics; Correlation analysis; Regression analysis: simple and multiple linear regression, dummy variables; Binary Logit and Probit models; Index numbers: simple index numbers, aggregative indexes, weighted aggregative indexes, Laspeyres Index, Paasche Index; Consumer Price Index; Time Series Analysis: components of time series analysis, estimation of trends; Computer application in statistical data processing and analysis.
Course aims
This course is intended to expose educators to the discipline of statistics. It will
mainly deal with applied statistics to enable the students to appreciate the use of
statistics in social and applied sciences in general and data analysis in business
studies in particular.
Learning outcomes
Upon completion of this course you should be able to;
1. Apply statistical methods to data presentation, processing and analysis.
2. Use statistical methods in research and business analysis.
3. Describe the role of statistics in research and business analysis.
4. Apply Statistical techniques in Research.
Instruction methodology
Lectures and tutorials, Online lectures with self study materials, Case studies,
Group discussions/online blogs and forums

Instructional Materials/Equipment
Writing board and writers, Computers, Statistical software
Assessment information
The module will be assessed as follows;
40% Continuous Assessment (Tests 10%, Assignment 10%, Practical 20%)
60% End of Semester Examination.
Course Text books
1. Mason, R. D., Lind, D. A. and Marchal, W. G. (1999). Statistical Techniques in Business and Economics. Irwin McGraw-Hill, Boston. ISBN-10:
0256263078, ISBN-13: 978-0256263077, Edition: 10th
2. Douglas Lind, William Marchal, Samuel Wathen (2009). Statistical Techniques in Business and Economics with Student CD [Hardcover], ISBN-10:
0077309421, ISBN-13: 978-0077309428, Edition: 14
3. Thomas H. Wonnacott, Ronald J. Wonnacott (1990) Introductory Statistics
for Business and Economics, 4th Edition, John Wiley and Sons Inc. [Hardcover] ISBN-10: 047161517X , ISBN-13: 978-0471615170
Reference Text books
1. Robert D. Mason, Douglas A. Lind, William G. Marchal (1998). Statistics:
An Introduction, Duxbury Pr; 5 Sub edition (1998) ISBN-10: 0534353797
ISBN-13: 978-0534353797
2. Freund, J.E. and Williams, F.J. (1979). Modern Business Statistics. Pitman
Publishing Limited, London. ISBN-10: 0135895804, ISBN-13: 9780135895801
3. Spiegel, M.R. (1992). Theory and Problems of Statistics, 2nd Edition, Schaum's
Outline Series, McGraw-Hill Book Company, London, ISBN 0071128204.


Course Journals
1. Journal of Quantitative Methods for Economics and Business Administration
ISSN: 1886-516 X D.L.: SE-2927-06.
2. Journal of Applied Statistics J Appl Stat. Published/Hosted by Taylor and
Francis Group, ISSN (printed): 0266-4763. ISSN (electronic): 1360-0532.
3. Scandinavian Journal of Statistics, Online ISSN: 1467-9469
4. Advances in Data Analysis and Classification, ISSN Print: 1862-5347 ISSN
Online: 1862-5355
5. Annals of the Institute of Statistical Mathematics, Executive Editor: T. Higuchi,
ISSN Print: 0020-3157 ISSN Online: 1572-9052
6. International Journal of Statistics and Probability ISSN 1927-7032(Print)


Contents
1 Introduction
1.1 Definition Statistics
1.1.1 Who Uses Statistics?
1.1.2 Limitations of Statistics
1.2 Data types
1.3 Types of Statistics
1.4 Finite populations
1.4.1 Simple random sample
1.4.2 Sampling from a population
2 Measures of central tendency
2.0.3 Arithmetic mean
2.0.4 The Geometric mean
2.0.5 Harmonic mean
2.0.6 P-tiles
3 Probability and Distributions
3.1 Learning outcomes
3.2 Probability distributions
3.3 Discrete Probability distributions
3.3.1 Expectation of a random variable
3.3.2 Bernoulli Distribution
3.3.3 Binomial Distribution
3.3.4 Poisson Distribution
3.3.5 Geometric Distribution
3.3.6 Negative Binomial Distribution
3.4 Continuous Distributions
3.4.1 Uniform Distribution
4 Normal Distribution
4.1 Introduction
4.2 Description
4.2.1 Functional form
4.3 The Standard normal Distribution
5 Tests of Hypothesis 1
5.1 Introduction
5.2 Parametric Tests
5.2.1 Z Test for Two Means
5.2.2 The t-Test
6 Hypothesis Testing 2
6.1 Introduction
6.1.1 Analysis of Variance (ANOVA)
6.1.2 Techniques of One-way ANOVA
7 Correlation and Regression Analysis
7.1 Introduction
7.2 Test of relationships involving quantitative data
7.2.1 Pearson's product-moment correlation coefficient
7.2.2 Spearman's rank correlation coefficient
7.3 Linear Regression
7.3.1 Multiple regression with dummy variables
7.3.2 Dealing with Interaction terms
8 Non-Linear Regression Analysis
8.1 A linear model for proportions?
8.1.1 Logistic curve: A curve that lies between 0 and 1 for all values of X
8.1.2 The parameters of the logistic curve
8.1.3 Multiple logistic regression
8.2 Probit Model
8.3 Cobb-Douglas functional form of production functions
8.3.1 Formulation
8.3.2 Application
9 Index numbers
9.1 Index numbers
9.1.1 Price and quantity indices
9.1.2 CPI and stock market indices
9.2 Simple price index
9.3 Aggregate price
9.3.1 Unweighted aggregate price index
9.4 Laspeyres and Paasche indices
9.4.1 Laspeyres index
9.4.2 Paasche index
9.4.3 Fisher's Ideal Index
9.5 Deflating a time series
9.5.1 Correcting for inflation
10 Basics of Time Series Analysis
10.1 Time series data
10.2 Types of time series data
10.3 Components of a time series
10.3.1 Trend
10.3.2 Cyclic Movements
10.3.3 Seasonal Movements
10.3.4 Random or irregular fluctuations
10.4 Smoothing of a time series
10.4.1 Moving average with odd and even run lengths
10.4.2 Robust smoothing
10.4.3 Running medians, followed by moving averages
10.4.4 Limitations of moving averages
10.5 Long-term trend and Forecasting
10.5.1 Least squares for a polynomial fit
10.5.2 Exponential Trend
11 Constrained maxima and minima and the method of Lagrange multipliers
11.1 The Method Of Lagrange Multipliers
11.2 Models involving differential equations
11.2.1 Unrestricted growth Models
11.2.2 Restricted growth models
11.2.3 Restricted Growth Models
Solutions to Exercises

HBAF 3105

LESSON 1
Introduction
Learning outcomes
Upon completion of this lesson you should be able to;
1. Define statistics and describe various uses
2. Give a brief history of statistics and identify some limitations of statistics
3. Distinguish between various variable types
4. Distinguish between descriptive and inferential statistics
5. Describe the concept behind sample, population and sampling error
1.1. Definition Statistics
As a plural noun, the word statistics describes a collection of numerical data such
as employment statistics, accident statistics, population statistics, birth and death,
income and expenditure, of exports and imports etc. It is in this sense that the word
statistics is used by a layman or a newspaper. As a singular noun, the purpose
of statistics is to develop and apply methodology for extracting useful knowledge
from both experiments and survey data. Major activities in statistics involve:
Design of experiments and surveys
Exploration and visualization of sample data
Summary description of sample data
Stochastic modeling of uncertainty
Forecasting based on suitable models
Hypothesis testing and statistical inference
Development of new statistical theory and methods
Generally, statistics can be defined as a branch of science that deals with the collection,
presentation, analysis, and interpretation of data. This definition clearly
points out four stages in a statistical investigation, namely:
1. Collection of data
2. Presentation of data
3. Analysis of data
4. Interpretation of analyzed data
The development of statistics was strongly motivated by the need to make sense
of the large amount of data collected by population surveys in the emerging nation
states of Europe. At the same time, the mathematical foundations for statistics
advanced significantly due to breakthroughs in probability theory inspired by games
of chance (gambling). For more information about the history of statistics refer to
the books by Johnson and Kotz (1998) and Kotz and Johnson (1993). The various
methods used in statistical investigations are termed as statistical methods and the
person using them is known as a statistician. A statistician is concerned with the
analysis and interpretation of the data and drawing valid worthwhile conclusions
from them for decision making. For example:
A shoe factory will be interested in the most common shoe sizes in order to
make a decision on the production process.
The Ministry of Education will be interested in the trend in the number of
pupils starting each level of education in order to make decisions related to
building of schools, training of teachers, etc.
The latest sales data have just come in, and your boss wants you to prepare
a report for management on places where the company could improve its
business. What should you look for? What should you not look for?
1.1.1. Who Uses Statistics?
Statistical techniques are used extensively by marketing, accounting, quality control, consumers, professional sports people, hospital administrators, educators, politicians, physicians, etc...
Uses of Statistics
1. To present the data in a concise and definite form: Statistics helps in classifying and tabulating raw data for processing and further tabulation for end
users.
2. To make it easy to understand complex and large data: This is done by presenting the data in the form of tables, graphs, diagrams etc., or by condensing
the data with the help of means, dispersion etc.
3. For comparison: Tables, measures of means and dispersion can help in comparing different sets of data.
4. In forming policies: It helps in forming policies like a production schedule,
based on the relevant sales figures. It is used in forecasting future demands.
5. In measuring the magnitude of a phenomenon:- Statistics has made it possible
to count the population of a country, the industrial growth, the agricultural
growth, the educational level (of course in numbers).
1.1.2. Limitations of Statistics
Statistics has its limitations. To mention a few:
In most statistical investigations we use samples to represent a population, with the number of data points collected in the sample depending on the resources available. A sample may not represent the population adequately, and this may lead to results with little or no relevance to the population that it came from.
Results based on data with strong departures from assumptions such as normality will be less reliable than results from data that meet the assumptions of a statistical test.
It is possible to lie (or to make mistakes) by ignoring some key statistical principles, e.g. that correlation does not imply causation:
In the recent past the number of deaths in Nairobi has increased in proportion to the number of crimes in Nairobi.
Young children who sleep with the light on are much more likely to develop myopia in later life. This is a recent scientific example that resulted from a study at the University of Pennsylvania Medical Center. Published in the May 13, 1999 issue of Nature, the study received much coverage at the time in the popular press. However, a later study at Ohio State University did not find a link between infants sleeping with the light on and development of myopia. It did find a strong link between parental myopia and the development of child myopia, also noting that myopic parents were more likely to leave a light on in their children's bedroom.
Sleeping with one's shoes on is strongly correlated with waking up with a headache. Therefore, sleeping with one's shoes on causes headache.
In hypothesis testing, the p-value or "probability value" is the probability of obtaining results at least as extreme as those observed if the null hypothesis were true; it is not the probability that the null hypothesis is true. For example, when comparing the means of several groups, the p-value is the probability that differences as large as those observed would occur by chance alone when no difference exists in the population. We then use the reverse logic that if such differences would occur by chance so seldom (typically when p < 5% or 0.05), real differences probably exist in the population. This has serious implications for what you say about the hypothesis you accept:
Accepting (more precisely, failing to reject) a null hypothesis does not mean that the samples are the same or that there is no relationship. It is just that the evidence in the sample is not strong enough to support the opposite.
Accepting an alternative hypothesis at the 5% level of significance does not prove that a difference exists. It means that if there were really no difference in the population, only about 5 out of 100 similar surveys would be expected to show a difference as large as the one observed.
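The meaning of the 5% level can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions and is not part of the course material: it draws two samples from the same normal population (so the null hypothesis is true by construction), applies a two-sided Z test with a known standard deviation of 1, and counts how often a "significant" difference is declared. The function name, sample size, number of trials and seed are hypothetical choices.

```python
import math
import random

def false_alarm_rate(trials=4000, n=30, seed=1):
    """Share of trials in which two samples drawn from the SAME population
    are declared different by a two-sided Z test at the 5% level."""
    rng = random.Random(seed)
    critical = 1.96              # two-sided 5% critical value of Z
    se = math.sqrt(2.0 / n)      # standard error of the difference (sigma = 1)
    rejections = 0
    for _ in range(trials):
        mean_a = sum(rng.gauss(0, 1) for _ in range(n)) / n
        mean_b = sum(rng.gauss(0, 1) for _ in range(n)) / n
        if abs(mean_a - mean_b) / se > critical:
            rejections += 1
    return rejections / trials

rate = false_alarm_rate()
print(f"Share of 'significant' results when no difference exists: {rate:.3f}")
```

The printed share should be close to 0.05: a test at the 5% level flags a difference in roughly 5 out of 100 surveys even when none exists in the population.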
1.2. Data types
Variable
A variable is a characteristic of an item or individual. It is simply something that varies, or does not always have the same value, as you move from one subject to another, such as date of birth, age, marks or district.
Data
Data are the different values associated with a variable.
Operational definitions
Data values are meaningless unless their variables have operational definitions: universally accepted meanings that are clear to all associated with an analysis. The processing of the data depends on the nature of the variable on which data is collected.
Variables can be classified as follows:
1. Qualitative: refers to variables whose values fall into groups or categories. They are also called categorical variables because the data they carry describe categories (e.g. district, marital status, gender, religious affiliation, type of car owned). They can further be classified as:
Nominal variables: variables whose categories are just names with no natural ordering, e.g. gender, colour, district, marital status.
Ordinal variables: variables whose categories have a natural ordering, e.g. education level, degree classification. In a variable such as performance, the category Excellent is better than the category Very good, which is better than Good.
2. Quantitative: numerical variables (e.g. number of students, age, weight, distance). They can further be classified as:
Discrete variables: can only assume certain values, and there are usually gaps between the values, e.g. the number of bedrooms in a house, the number of children in a family. In most cases they arise from counting, and fractional values do not make sense.
Continuous variables: can assume any value within a specific range, e.g. the time taken to cook ugali, the height of a tree, your age, the distance from here to Nairobi. In most cases, such data arise from measurements.
1.3. Types of Statistics
Descriptive Statistics: is a field that focuses on describing different characteristics
of the data rather than trying to infer something from it. It is a body of methods of
organizing, summarizing, and presenting sample data in an informative way.
A Steadman poll found that 41% of Kenyans would vote for Candidate A in
the last general election. The statistic 41 describes the number out of every
100 persons who were interviewed.
According to Consumer Reports, Whirlpool washing machine owners reported 9 problems per 100 machines during 1995. The statistic 9 describes
the number of problems out of every 100 machines.
Inferential Statistics: a body of methods that tries to infer or reach conclusions
about the population based on scientifically sampled data. The calculated summaries from the sample are used for estimation, prediction, or generalization about
the population from which the sample was taken.
TV networks constantly monitor the popularity of their programs by hiring
pollsters to sample the preferences of TV viewers.
The JKUAT accounting department normally selects a sample of the payment
vouchers to check the accuracy of all the payment vouchers.
Most data sets contain one or more measurements from each of a collection of
individuals (or other units). The measurements of interest usually vary in ways that
cannot be explained in terms of other measurements from the individuals. This
unexplained variability can be modelled by considering the data to be a random
sample from some underlying population.
1.4. Finite populations
A sample provides information about a population when it is too difficult or expensive to make measurements from the whole population. We often want to find
information about a particular group of individuals (people, fields, trees, and bottles
of beer or some other collection of items). This target group is called the population. Collecting measurements from every item in the population is called a census.
A census is rarely feasible, because of the cost and time involved.
1.4.1. Simple random sample
We can usually obtain sufficiently accurate information by only collecting information from a selection of units from the population - a sample. Although a sample
gives less accurate information than a census, the savings in cost and time often outweigh this. The simplest way to select a representative sample is a simple random
sample. In it, each unit has the same chance of being selected and some random
mechanism is used to determine whether any particular unit is included in the sample.

1.4.2. Sampling from a population
It is convenient to define the population and sample to be sets of values or measurements (rather than people or other items). This abstraction - a population of
values and a corresponding sample of values - can be applied to a wide range of
applications.
Effect of sample size
Bigger samples mean more stable and reliable information about the underlying
population. When a sample is used to estimate a population characteristic, an error is usually
involved. The difference between an estimate and the population value being
estimated is called its sampling error; it arises because the sample is selected at random from the
population. As the sample size is increased, the sampling error becomes smaller.
Even so, the cost savings from using a sample instead
of a full census can be huge.
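The effect of sample size on sampling error can be seen in a short simulation. The sketch below is an illustration only: the population of 10,000 normally distributed values, the sample sizes 10 and 1000, the number of repeated samples and the function name are all assumptions made for the example, not data from this module.

```python
import random
import statistics

def mean_abs_sampling_error(population, n, trials=500, seed=0):
    """Average absolute difference between the sample mean and the
    population mean over repeated simple random samples of size n."""
    rng = random.Random(seed)
    mu = statistics.fmean(population)
    errors = []
    for _ in range(trials):
        sample = rng.sample(population, n)   # simple random sample, no replacement
        errors.append(abs(statistics.fmean(sample) - mu))
    return statistics.fmean(errors)

# Hypothetical finite population of 10,000 values
pop_rng = random.Random(42)
population = [pop_rng.gauss(50, 10) for _ in range(10_000)]

small = mean_abs_sampling_error(population, 10)
large = mean_abs_sampling_error(population, 1000)
print(f"typical sampling error, n = 10:   {small:.3f}")
print(f"typical sampling error, n = 1000: {large:.3f}")
```

The larger sample gives a much smaller typical error, while still measuring only a tenth of the population.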

Revision Questions
Example . Define the term statistics
Solution: ...

EXERCISE 1. Discuss a case in real life where you think statistics was misused.
EXERCISE 2. Discuss how statistics led to the development of computer systems
and how computer systems led to the development of statistics.
EXERCISE 3. Discuss the relative "weakness" of categorical variables (including measures on nominal and ordinal scales) and continuous variables (including
measures on interval and ratio scales) with respect to the type of information that
can be obtained from the statistics.
Suggested materials for further reading
1. Mason, R. D., Lind, D. A. and Marchal, W. G. (1999). Statistical Techniques
in Business and Economics. Irwin McGraw-Hill, Boston.
2. K. Pelosi and Theresa M. Sandifer (1976). Elementary Statistics. John Wiley
& Sons, Inc
3. Wonnacott, T.H. and Wonnacott, R.J. (1990). Introductory Statistics for Business and Economics, 2nd Edition, John Wiley and Sons Inc.
4. Gujarati, D.N. (2006). Basic Econometrics 3rd Edition, McGraw-Hill, Inc.,
New York.
5. Keller, G., Warrack, B. and Bartel, H. (1994). Statistics for Management and
Economics. 3rd Edition. Wadsworth Publishing Company, Belmont California, USA.


LESSON 2
Measures of central tendency
In most sets of data, the observed values tend to cluster
about some value. This phenomenon is called central tendency. It
can be measured using single numerical values which may be used to judge the
entire distribution. The measures of central tendency include:
The Mean (Arithmetic, Geometric or Harmonic),
The Median and
The Mode.
2.0.3. Arithmetic mean
The arithmetic mean (average) is the sum of all the observations divided by
the total number of observations in a given sample. For observations $x_1, x_2, \ldots, x_n$
the mean, denoted by $\bar{x}$, is defined as

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} \quad \text{or} \quad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i .$$

In case $x_1, x_2, \ldots, x_n$ have frequencies $f_1, f_2, \ldots, f_n$, the mean is defined by

$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_n x_n}{f_1 + f_2 + \cdots + f_n} \quad \text{or} \quad \bar{x} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i ,$$

where $\sum$ is a Greek letter meaning "sum" and $N = \sum_{i=1}^{n} f_i$.
Alternatively, we can obtain the mean using an assumed mean, given by the expression

$$\bar{x} = A + \frac{\sum f d}{N},$$

where $A$ is the assumed mean and $d$ is the deviation from the assumed mean, given by $d = x - A$.
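Example
Find the mean of the data sets given below.
i. 4, 6, 7, 8, 10, 5, 7, 12, 14, 6, 7
ii.
Marks        1-5   6-10   11-15   16-20
Frequency     6     7      5       2
Solution
i. $\bar{x} = \frac{4+6+7+8+10+5+7+12+14+6+7}{11} = \frac{86}{11} = 7.818182$
ii. Taking the class midpoints 3, 8, 13 and 18 as the values $x_i$, with $N = 20$,
$\bar{x} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i = \frac{1}{20}(175) = 8.75$
Alternatively, if we let the assumed mean $A = 8$, then the deviations are $d = -5, 0, 5, 10$, so $\sum fd = 15$ and the mean is given by
$\bar{x} = A + \frac{\sum fd}{N} = 8 + \frac{15}{20} = 8 + 0.75 = 8.75$
Remark 1. The assumed mean can be any value, but it is advisable to choose it
from the observed values (or class midpoints) to make the computation of the mean much easier.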
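The three ways of computing the mean used in this example can be checked with a few lines of Python. This is a minimal sketch, not a prescribed method: the function names are illustrative, and the grouped data are represented by the class midpoints 3, 8, 13 and 18, which assumes that the last class is 16-20 (the total of 175 in the worked solution implies this).

```python
def mean(values):
    """Arithmetic mean of raw observations."""
    return sum(values) / len(values)

def grouped_mean(midpoints, freqs):
    """Mean of a frequency distribution: sum(f * x) / N."""
    n_total = sum(freqs)
    return sum(f * x for f, x in zip(freqs, midpoints)) / n_total

def assumed_mean_method(midpoints, freqs, assumed):
    """Mean via an assumed mean A: A + sum(f * d) / N, with d = x - A."""
    n_total = sum(freqs)
    sum_fd = sum(f * (x - assumed) for f, x in zip(freqs, midpoints))
    return assumed + sum_fd / n_total

raw = [4, 6, 7, 8, 10, 5, 7, 12, 14, 6, 7]
midpoints = [3, 8, 13, 18]     # assumed midpoints of 1-5, 6-10, 11-15, 16-20
freqs = [6, 7, 5, 2]

raw_mean = mean(raw)
direct = grouped_mean(midpoints, freqs)
shortcut = assumed_mean_method(midpoints, freqs, assumed=8)
print(round(raw_mean, 6), direct, shortcut)   # 7.818182 8.75 8.75
```

Changing the value passed as `assumed` leaves the result unchanged, which is the point of Remark 1: the assumed mean only affects how easy the arithmetic is.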
2.0.4. The Geometric mean
The geometric mean can be defined as the nth root of the product of the observations $x_1, x_2, \ldots, x_n$. It is usually denoted by $G$ and is expressed as

$$G = \left(x_1 \cdot x_2 \cdots x_n\right)^{1/n} .$$

For a case where $x_1, x_2, \ldots, x_n$ have the frequencies $f_1, f_2, \ldots, f_n$, the geometric mean is expressed as

$$G = \left(x_1^{f_1} \cdot x_2^{f_2} \cdots x_n^{f_n}\right)^{1/N}, \qquad N = \sum_{i=1}^{n} f_i .$$

Taking logarithms on both the L.H.S and the R.H.S of the first expression yields

$$\log G = \frac{1}{n}\sum_{i=1}^{n} \log x_i .$$

Again, if $x_1, x_2, \ldots, x_n$ have frequencies, the expression takes the form

$$\log G = \frac{1}{N}\sum_{i=1}^{n} f_i \log x_i .$$
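The logarithmic form is also the convenient one for computation, because it avoids multiplying many numbers together. The following is a minimal sketch under stated assumptions: the function name and the small data sets are hypothetical and are chosen only so that the answers are easy to check by hand.

```python
import math

def geometric_mean(values, freqs=None):
    """Geometric mean via logarithms: exp( sum(f * ln x) / N ).
    With no frequencies given, every value is counted once."""
    if freqs is None:
        freqs = [1] * len(values)
    n_total = sum(freqs)
    log_sum = sum(f * math.log(x) for f, x in zip(freqs, values))
    return math.exp(log_sum / n_total)

# Hypothetical data: the square root of 2 * 8 = 16 is 4
g_simple = geometric_mean([2, 8])
# Hypothetical data with frequencies: the cube root of 2 * 4 * 4 = 32
g_weighted = geometric_mean([2, 4], freqs=[1, 2])
print(round(g_simple, 4), round(g_weighted, 4))
```

Natural logarithms are used here; any base gives the same result as long as the antilogarithm is taken in the same base.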


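Example. The growth rates of a textile unit in the western region of Kenya for the
last five years are given below. Use them to calculate the geometric mean of the growth
rate.
Year           1   2   3    4    5
Growth rate    7   8   10   12   18
Solution: The geometric mean is

$$G = (7 \times 8 \times 10 \times 12 \times 18)^{1/5} = (120960)^{1/5} \approx 10.39 .$$

So the geometric mean of the growth rate is 10.39 percent.

Example. Find the geometric mean for the distribution given below:
Yield (Dividend/Market)   0-10   10-20   20-30   30-40
Number of Companies        5      15      25      35
Solution: Taking the class midpoints 5, 15, 25 and 35 as the values $x_i$, with $N = 80$, the geometric mean is computed as follows:

$$\log G = \frac{1}{N}\sum f_i \log x_i = \frac{5\log 5 + 15\log 15 + 25\log 25 + 35\log 35}{80} = \frac{110.1271}{80} = 1.3766$$

The geometric mean is therefore $G = \text{antilog}(1.3766) \approx 23.80$.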
2.0.5. Harmonic mean


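The harmonic mean of a series of values is defined as the reciprocal of the mean of
their reciprocals. Thus if $H$ is the harmonic mean,

$$\frac{1}{H} = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{x_i}, \qquad \text{so that} \qquad H = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} .$$

If $x_1, x_2, \ldots, x_n$ have the frequencies $f_1, f_2, \ldots, f_n$, then the harmonic mean is given by

$$H = \frac{N}{\sum_{i=1}^{n} f_i \frac{1}{x_i}}, \qquad \text{where } N = \sum_{i=1}^{n} f_i .$$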
Example 1
Calculate the harmonic mean of the observations 4, 8 and 16.
Solution:
Harmonic mean

$$H = \frac{3}{\frac{1}{4} + \frac{1}{8} + \frac{1}{16}} = \frac{3}{7/16} = \frac{48}{7} \approx 6.86$$

Example 2
Find the harmonic mean for the distribution given below:
Yield (Dividend/Market)   2-4   4-6   6-8   8-10
Number of Companies
Solution:
The harmonic mean is given by

$$H = \frac{N}{\sum f_i \frac{1}{x_i}} = \frac{100}{20.061} = 4.98$$

So the average dividend yield calculated by the harmonic mean formula is 4.98
percent.
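The harmonic mean formulas above can be sketched in Python as follows. The function name is illustrative, and only the first example (4, 8 and 16) is reproduced.

```python
def harmonic_mean(values, freqs=None):
    """Harmonic mean: N / sum(f / x). With no frequencies given,
    every value is counted once, so N is the number of values."""
    if freqs is None:
        freqs = [1] * len(values)
    n_total = sum(freqs)
    reciprocal_sum = sum(f / x for f, x in zip(freqs, values))
    return n_total / reciprocal_sum

h_simple = harmonic_mean([4, 8, 16])
print(round(h_simple, 2))   # 6.86, i.e. 48/7
```

For a frequency distribution the class midpoints would be passed as `values` and the class frequencies as `freqs`.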
Median
The median is the value above which and below which half of the observations fall
(if ranked in order of size). In other words it is the midpoint of the values after they
have been ordered from the smallest to the largest, or the largest to the smallest.
The procedure for finding the median for discrete data is:
Arrange the data in ascending order.
If the number of observations $n$ is odd, then the median is the value in position $\frac{n+1}{2}$. If $n$
is even, then the median is the average of the two middle values.
For grouped data, the median can be obtained by first locating the median class and then
interpolating within that class using the expression

$$\text{Median } (M) = L + \left(\frac{\frac{N}{2} - C}{f}\right) h ,$$

where $L$ is the lower limit of the median class, $h$ is the
class interval, $f$ is the frequency of the median class, $N$ is the total frequency of all
the observations and $C$ is the cumulative frequency of the classes preceding the median class.
Example 1.
Find the median of the following values: 19, 13, 14, 18, 12, 25, 11, 10, 17, 23, where
n = 10 (even). Arranging them in ascending order gives 10, 11, 12, 13, 14, 17, 18,
19, 23, 25, so the median is the average of the 5th and the 6th values: $\frac{14+17}{2} = 15.5$
Example 2
The age of a sample of five college students is: 21, 25, 19, 20, and 22. Confirm that
the median is 21.
Example 3
The height of four basketball players, in inches, is 76, 73, 80, and 75. Confirm that
the median is 75.5
Example 4
Find the median for the following distribution
Yield (Dividend/Market)   0-10   10-20   20-30   30-40   40-50
Number of Companies        22     38      46      35      20
Solution
Here the total frequency is N = 161 (an odd number), hence the median is the
size of the $\frac{N+1}{2}$th item, i.e. the $\frac{161+1}{2}$th item, which is the 81st item. The cumulative frequencies are 22, 60 and 106, so the median lies in the
20-30 group. Thus 20-30 is the median class with lower limit 20, so L = 20, f = 46,
C = 60 and h = 10. Using the position 81 in place of $\frac{N}{2}$,

$$\text{Median } (M) = L + \left(\frac{81 - C}{f}\right) h = 20 + \left(\frac{81 - 60}{46}\right) \times 10 = 20 + \frac{210}{46} = 24.5652$$
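The interpolation used for the grouped median can be written as a short function. This is a sketch under stated assumptions: the function name and the optional `position` argument are illustrative. By default the function uses N/2 as in the general expression; passing `position=81` reproduces the working of Example 4, which uses the (N+1)/2th item.

```python
def grouped_median(lower_limits, freqs, h, position=None):
    """Median of grouped data by linear interpolation:
    L + (position - C) / f * h, where C is the cumulative frequency
    of the classes before the median class."""
    n_total = sum(freqs)
    if position is None:
        position = n_total / 2
    cumulative = 0
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= position:          # median class found
            return lower + (position - cumulative) / f * h
        cumulative += f
    raise ValueError("position exceeds the total frequency")

lower_limits = [0, 10, 20, 30, 40]
freqs = [22, 38, 46, 35, 20]

as_in_example = grouped_median(lower_limits, freqs, h=10, position=81)
with_n_over_2 = grouped_median(lower_limits, freqs, h=10)
print(round(as_in_example, 4))   # 24.5652
print(round(with_n_over_2, 4))
```

The two conventions differ only slightly here because the total frequency is large relative to the half-item difference between 80.5 and 81.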
Example 5
The data below show marks obtained by some students in a continuous assessment
test. Use them to calculate the median of the data.
Marks                 1-5   6-10   11-15   16-20   21-25
Number of Students     12    32     46      35      20
Solution
N = 120, which is an even number; hence the median lies between the 60th item and the
61st item, both of which are in the class 11-15. We therefore use the expression $M = L + \left(\frac{\text{position} - C}{f}\right) h$
to approximate the size of the 60th item and the size of the 61st item, with L = 10.5, f = 30, C = 44
and h = 5.
The size of the 60th item: $M_{60} = 10.5 + \left(\frac{60 - 44}{30}\right) \times 5 = 10.5 + \frac{16}{30} \times 5 = 13.1667$
The size of the 61st item: $M_{61} = 10.5 + \left(\frac{61 - 44}{30}\right) \times 5 = 10.5 + \frac{17}{30} \times 5 = 13.3333$
So the median is the average of the 60th and the 61st values: $\frac{13.1667 + 13.3333}{2} = 13.25$
Mode
Mode is the value or item occurring most frequently in a set of observations or
statistical data. The mode may not exist, and if it does exist, it may not be unique.
If each observation occurs the same number of times, then there is no mode. If
two or more observations occur the same number of times then, two or more modes
exist and the distribution is called multimodal.


For grouped data, the mode can be obtained using the expression

$$\text{Mode} = L + \left(\frac{f_2}{f_1 + f_2}\right) h$$

Alternatively, the mode can also be found by the formula

$$\text{Mode} = L + \left(\frac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)}\right) h$$

where $L$ is the lower class limit of the modal class, the modal class being the class
with the highest frequency,
$f_m$ is the frequency of the modal class,
$f_2$ is the frequency of the class succeeding the modal class,
$f_1$ is the frequency of the class preceding the modal class and
$h$ is the class interval of the modal class.
Remark 2. For grouped data the mode may be estimated by first identifying the
modal class and then taking the midpoint of the class interval. This can also be
obtained from a histogram by taking the midpoint of the class interval with the
highest peak.

Example 1. For the values 9, 3, 4, 2, 1, 5, 8, 4, 7, 3, each of the values 3 and 4
occurs twice. The modes are therefore 3 and 4.
Example 2.
Calculate the mode of the following data set
Gross profit as percentage   0-7   7-14   14-21   21-28   28-35   35-42   42-49
Number of companies           19    25     36      72      51      43      28
Solution: The highest frequency is 72, so the modal class is 21-28. The lower
class limit is L = 21, f1 = 36, f2 = 51 and h = 7, thus

$$\text{Mode} = L + \left(\frac{f_2}{f_1 + f_2}\right) h = 21 + \left(\frac{51}{36 + 51}\right) \times 7 = 21 + \frac{357}{87} \approx 25.10$$
2.0.6. P-tiles
These are values of the variate which divide the total frequency into equal parts. The
most commonly used P-tiles are the Quartiles (p = 4), the Deciles (p = 10)
and the Percentiles (p = 100). Quartiles divide the data into four equal parts, Deciles divide the data into ten equal parts and Percentiles divide the data into one hundred
equal parts. Note that for p parts we have p - 1 boundaries.
When p = 4 we get the Quartiles ($Q_1$, $Q_2$ and $Q_3$). The kth quartile is given by
$Q_k = \frac{k}{4}(n + 1)$th value
When p = 10 we get the Deciles ($D_1, D_2, \dots, D_9$). The kth decile is given by
$D_k = \frac{k}{10}(n + 1)$th value
When p = 100 we get the Percentiles, given by
$P_k = \frac{k}{100}(n + 1)$th value
Note: $Q_2 = D_5 = P_{50}$ = median. The actual value can be obtained using linear interpolation.
Example 1.
For the data 13, 14, 17, 10, 11, 12, 23, 25, 18, 19 we can arrange the values in
ascending order and assign each value a rank (position) to get
Arranged values: 10, 11, 12, 13, 14, 17, 18, 19, 23, 25
Position:         1   2   3   4   5   6   7   8   9  10
Then
$Q_2 = \frac{2}{4}(10 + 1)$th $= 5.5$th value
5.5th value $= \frac{1}{2}$(5th value + 6th value) $= \frac{1}{2}(14 + 17) = 15.5$
Similarly, $D_6 = \frac{6}{10}(10 + 1)$th $= 6.6$th value
6.6th value = 6th value + 0.6 (7th value - 6th value) $= 17 + 0.6(18 - 17) = 17.6$
Remark 3. The method described here is the default in Minitab and SPSS. Other
software such as SAS and S use different estimates of the P-tiles based on the same concept.
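The position rule $\frac{k}{p}(n + 1)$ with linear interpolation can be sketched in a few lines. This is an illustrative sketch only: the function name `p_tile` is an assumption, and clamping positions that fall outside 1..n to the smallest or largest value is a choice made here, not something stated in the notes.

```python
def p_tile(values, k, p):
    """k-th of the p-tiles using position k/p * (n + 1) and linear interpolation."""
    data = sorted(values)
    n = len(data)
    pos = k / p * (n + 1)          # 1-based (possibly fractional) position
    lower = int(pos)               # rank just below the position
    frac = pos - lower             # fractional part used for interpolation
    if lower < 1:                  # assumption: clamp to the smallest value
        return data[0]
    if lower >= n:                 # assumption: clamp to the largest value
        return data[-1]
    return data[lower - 1] + frac * (data[lower] - data[lower - 1])


data = [13, 14, 17, 10, 11, 12, 23, 25, 18, 19]
q2 = p_tile(data, 2, 4)    # median: 5.5th value
d6 = p_tile(data, 6, 10)   # 6th decile: 6.6th value
print(q2, round(d6, 2))
```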
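For grouped data we can obtain Quartiles, Deciles and Percentiles using the following
expressions, where $L$ is the lower limit of the class containing the required value, $N$ is the total frequency, $C$ is the cumulative frequency of the classes before that class, $f$ is the frequency of that class and $h$ is its class interval.
Quartiles:
$$Q_1 = L + \left(\frac{\frac{N}{4} - C}{f}\right) h \quad \text{and} \quad Q_3 = L + \left(\frac{\frac{3N}{4} - C}{f}\right) h$$
where $Q_1$ and $Q_3$ are the lower and upper Quartiles.
Deciles:
As already seen, deciles are values of the variate that divide the data into 10 equal
parts. The value of $D_k$ = kth decile is
$$D_k = L + \left(\frac{\frac{kN}{10} - C}{f}\right) h$$
so
$$D_1 = L + \left(\frac{\frac{N}{10} - C}{f}\right) h \quad \text{and} \quad D_2 = L + \left(\frac{\frac{2N}{10} - C}{f}\right) h$$
and so on.
Percentiles:
The generalized formula for percentiles is given as follows:
$$P_k = k\text{th Percentile} = L + \left(\frac{\frac{kN}{100} - C}{f}\right) h$$
thus
$$P_1 = L + \left(\frac{\frac{N}{100} - C}{f}\right) h \quad \text{and} \quad P_2 = L + \left(\frac{\frac{2N}{100} - C}{f}\right) h$$
and so on.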
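The grouped-data expressions all share the form $L + \frac{kN/p - C}{f} h$, so one small routine can cover quartiles, deciles and percentiles. The sketch below is illustrative: the function name and argument layout are assumptions, and it is applied to the gross-profit frequency table from the mode example (classes of width 7 starting at 0).

```python
def grouped_p_tile(lower_limits, freqs, h, k, p):
    """k-th of the p-tiles for grouped data: L + (k*N/p - C) / f * h."""
    N = sum(freqs)
    target = k * N / p              # cumulative frequency to be reached
    C = 0                           # cumulative frequency before the current class
    for L, f in zip(lower_limits, freqs):
        if C + f >= target:         # the required value lies in this class
            return L + (target - C) / f * h
        C += f
    raise ValueError("k/p must lie between 0 and 1")


lower_limits = [0, 7, 14, 21, 28, 35, 42]
freqs = [19, 25, 36, 72, 51, 43, 28]      # number of companies per class

q1 = grouped_p_tile(lower_limits, freqs, 7, 1, 4)   # lower quartile
q2 = grouped_p_tile(lower_limits, freqs, 7, 2, 4)   # median
p50 = grouped_p_tile(lower_limits, freqs, 7, 50, 100)
print(round(q1, 2), round(q2, 2))
```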
Exercise 4.
For the data 23, 10, 25, 15, 22, 17, 24, 32, calculate the following: Median, 3rd
Quartile, 5th Decile and 80th Percentile.
Exercise 5.
For the distribution given below obtain Lower Quartile, Upper Quartile, 6th decile
and 70th percentile.
Dividend
5-10 10-15 15-20 20-25 25-30 30-35 35-40 40-45
Number of Companies

15

10


LESSON 3
Probability and Distributions
3.1. Learning outcomes
Upon completion of this lesson you should be able to;
1. Define a probability distribution
2. Compute Mathematical expectation of a random variable
3. Describe properties of various probability distributions
3.2. Probability distributions
A random variable is a function or a mapping from a sample space into the real
numbers (most of the time). In other words, a random variable assigns real values
to outcomes of experiments. This mapping is called random because its output values
depend on the outcome of the experiment, which is itself random.
A random variable is therefore just a rule that assigns a unique numerical value to each outcome of
an experiment. These numbers are called the values of the random variable, and the
variable is denoted by capital letters such as X, Y and Z.
3.3. Discrete Probability distributions
A discrete random variable is one which may take on only a countable number
of distinct values such as 0, 1, 2, 3, 4, ... Examples of discrete random variables
include the number of children in a family, the members' day attendance at a cinema,
the number of DBA students, etc.
Let X be the number of heads observed when a fair coin is tossed three times. Let
H represent the outcome of a head and T the outcome of a tail. The sample space
for such an experiment will be:
S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}
If we let x = 0, 1, 2 and 3 (representing the number of heads) then we have

X           0     1     2     3
P(X = x)    1/8   3/8   3/8   1/8
The resulting table above is known as a probability distribution table. A probability
distribution for a discrete random variable is a formula, table or graph that provides the probability associated with each value of the random variable. If P(X = x) is the
probability distribution of a random variable X, then the following properties hold:
Probabilities lie between 0 and 1, i.e. $0 \le P(X = x) \le 1$
The probabilities add up to 1, i.e. $\sum_x P(X = x) = 1$
NOTE: Here, uppercase is used for the random variable and lowercase is used to
denote a realization of X.
Probabilities can be easily obtained from the probability distribution table as follows:
The probability of getting two or more heads is given by
$P(X > 1) = P(X = 2) + P(X = 3) = \frac{3}{8} + \frac{1}{8} = \frac{1}{2}$
In some books this is referred to as a probability mass function (pmf). It should
be noted that the pmf is for discrete random variables while the probability density
function (pdf) is used for continuous random variables. If the variable is discrete, the pmf
describes how likely the random variable is to be at a certain point. The pmf or
pdf is represented by the lowercase f(x), or by P(X = x) for a discrete random variable X.
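The probability distribution table for the three-coin experiment can be built by listing the sample space and counting heads. This is a small illustrative sketch (the variable names are assumptions); exact fractions are used so the two properties of a probability distribution can be checked without rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for three tosses of a fair coin: HHH, HHT, ..., TTT
outcomes = list(product("HT", repeat=3))

# X = number of heads in each outcome; count how often each value occurs
counts = Counter(outcome.count("H") for outcome in outcomes)
pmf = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

for x, prob in pmf.items():
    print(x, prob)

# Property checks: each probability lies in [0, 1] and they sum to 1
total = sum(pmf.values())

# P(X > 1) = P(X = 2) + P(X = 3)
p_two_or_more = sum(prob for x, prob in pmf.items() if x > 1)
print(total, p_two_or_more)
```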
3.3.1. Expectation of a random variable
Mathematical expectation refers to the mean or expected value of a random variable
X whose distribution is known. The expected value, denoted by E(X) or $\mu$, is given by
the expression
$$E(X) = \mu = \sum_x x\, P(X = x)$$
The expected value of $(X - \mu)^2$ is also called the variance and it is denoted by
$$V(X) = \sigma^2 = E\left[(X - \mu)^2\right] = E(X^2) - \mu^2$$
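As a quick numerical illustration of the expectation and variance definitions, the sketch below applies them to the number of heads in three tosses of a fair coin (probabilities 1/8, 3/8, 3/8, 1/8). The function name is an illustrative assumption.

```python
from math import sqrt


def mean_and_variance(pmf):
    """Return (E(X), V(X)) for a discrete distribution given as {x: P(X = x)}."""
    mu = sum(x * p for x, p in pmf.items())
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())
    return mu, var


# Number of heads in three tosses of a fair coin
heads_pmf = {0: 1 / 8, 1: 3 / 8, 2: 3 / 8, 3: 1 / 8}
mu, var = mean_and_variance(heads_pmf)

# Shortcut form of the variance: E(X^2) - mu^2
var_shortcut = sum(x**2 * p for x, p in heads_pmf.items()) - mu**2

print(mu, var, round(sqrt(var), 4))
```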

Example. The table below gives a probability distribution of a discrete random
variable X. Given that P(X < 150) = 0.6, find the values of a and b, hence calculate
E(X) and the standard deviation of X.
X
40 80 120 150 200
P(X=x)
Solution:

0.2

0.23

0.15


Exercise 5. Marketing estimates that a new instrument for the analysis of soil
samples will be very successful, moderately successful, or unsuccessful, with probabilities 0.3, 0.6, and 0.1, respectively. The yearly revenue associated with a very
successful, moderately successful, or unsuccessful product is $10 million, $5 million, and $1 million, respectively. Let the random variable X denote the yearly
revenue of the product. Determine the probability mass function of X, and hence or
otherwise the expected value and variance of X.
Exercise 6. The following table gives the probability distribution of marks obtained by some students in a CAT.
Marks             12       14       18       23       24       25
Probability (p)   0.0645   0.0968   0.1935   0.2581   0.2258   0.1613
From the above table find the probability that a randomly picked student from this
class scored
More than 22 marks?
Less than 20.5 marks?
Between 16 and 23 marks exclusive?
What is the expected mark?
3.3.2. Bernoulli Distribution
The coin toss: There is no more basic random event than the flipping of a coin.
Heads or tails. It's as simple as you can get! The "Bernoulli Trial" refers to a single
event which can have one of two possible outcomes with a fixed probability of each
occurring. You can describe these events as "Yes or No" questions. For example:
Will the coin land heads?
Will the newborn child be a girl?
Will a potential customer decide to buy my product?
Will this person be carjacked in his/her lifetime?
The main controlling parameter in the Bernoulli distribution is the probability of success p. A "fair coin" or an experiment where success and failure are equally likely
will have a probability of p = 0.5 (50%). If a random variable X is distributed with
a Bernoulli distribution with a parameter p we write its probability function as:
$$P(X = x) = p^x (1 - p)^{1 - x}, \quad x = 0, 1$$
where the event X = 1 represents the "Success."

Mean and Variance
$E(X) = p$ and $Var(X) = p(1 - p) = pq$, where $q = 1 - p$ represents the probability of failure.
3.3.3. Binomial Distribution
Where the Bernoulli distribution asks the question "Will this single event succeed?", the Binomial is associated with the question "Out of a given number of
trials, how many will succeed?" Some example questions that are modeled with a
Binomial distribution are:
Out of twenty tosses, how many times will this coin land heads?
From the children born in a given hospital on a given day, how many of them
will be girls?
How many mosquitoes, out of a swarm, will die when sprayed with insecticide?
NOTE: The Binomial distribution is composed of multiple Bernoulli trials. We conduct n repeated experiments where the probability of success is given by the parameter p and add up the number of successes. This number of successes is represented
by the random variable X. The value of X is then between 0 and n.
When a random variable X has a Binomial Distribution with parameters p and n we
write it as $X \sim Bin(n, p)$ or $X \sim B(n, p)$ and the probability distribution function is
given by the equation:
$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}, \quad k = 0, 1, \dots, n$$
Mean and Variance
$E(X) = np$ and $Var(X) = np(1 - p) = npq$, where $q = 1 - p$ represents the probability of failure.
Example. The probability of hitting the bull's eye in a dart game is 0.12. Find
the probability that in eight trials the bull's eye will be hit (a) exactly 4 times, (b)
at least once, and find (c) the expected number of hits.
Solution:
There are 8 trials in the binomial experiment, i.e. n = 8 and p = 0.12. Then
(a) P(exactly 4 hits out of 8):
$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k} \;\Rightarrow\; P(X = 4) = \binom{8}{4} 0.12^4 (1 - 0.12)^{8 - 4} = 0.0087$$
(b) P(at least once), i.e. $P(X \ge 1)$:
$$P(X \ge 1) = 1 - P(X = 0) = 1 - \binom{8}{0} 0.12^0 (1 - 0.12)^{8 - 0} = 1 - 0.3596 = 0.6404$$
(c) Expected value:
$E(X) = np = 8(0.12) = 0.96$
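The dart example can be reproduced with the binomial formula directly. The sketch below is illustrative (the function name is an assumption) and uses only the standard library function `math.comb` for the binomial coefficient.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 8, 0.12                              # eight throws, P(bull's eye) = 0.12

p_exactly_4 = binomial_pmf(4, n, p)         # (a) exactly 4 hits
p_at_least_1 = 1 - binomial_pmf(0, n, p)    # (b) at least one hit
expected_hits = n * p                       # (c) E(X) = np

print(round(p_exactly_4, 4), round(p_at_least_1, 4), round(expected_hits, 2))
```

Note that part (b) uses the complement rule: the probability of no hits at all is about 0.3596, so the probability of at least one hit is about 0.6404.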

3.3.4. Poisson Distribution
The Poisson distribution is very similar to the Binomial Distribution. In both cases
we are examining the number of times an event happens, but whereas the Binomial
Distribution looks at how many times we register a success over a fixed total number
of trials, the Poisson Distribution measures how many times a discrete event occurs
over a period of continuous space or time.
Instead of a parameter p that represents a component probability, as in the Bernoulli
and Binomial distributions, the Poisson uses the parameter $\lambda$, which represents the "average or expected" number of events to happen within our experiment. The probability mass function of the Poisson is given by
$$P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, 2, \dots$$
Mean and Variance
$E(X) = \lambda$ and $Var(X) = \lambda$
The Poisson distribution can be used as an approximation to the Binomial distribution using $X \sim Po(np)$, where n and p are the number of trials and the probability
of success respectively in the Binomial distribution which is being approximated. The
approximation can be used when n is large (say n > 50) and p is small (say p < 0.1). This
ensures that $np \approx npq$, that is, the mean and variance are approximately equal.
Example
A restaurant is such that one of its dishes gets ordered on average 4 times per day.
What is the probability of having this dish ordered exactly 3 times tomorrow?
Solution:
The probability of having the dish ordered exactly 3 times is given if we set x = 3
in the above equation. Remember that we've already determined that we sell on
average 4 dishes per day, so $\lambda = 4$:
$$P(X = 3) = \frac{e^{-4}\, 4^3}{3!} \approx 0.1954$$
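The restaurant example and the Poisson approximation to the Binomial can both be checked with a short script. This is an illustrative sketch: the function names are assumptions, and the values n = 100 and p = 0.02 used for the approximation are hypothetical, chosen only because they satisfy the rule of thumb (n > 50, p < 0.1).

```python
from math import comb, exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson variable with mean lam."""
    return exp(-lam) * lam**x / factorial(x)


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Restaurant example: on average 4 orders per day, exactly 3 tomorrow
p_three_orders = poisson_pmf(3, 4)
print(round(p_three_orders, 4))

# Approximation check with hypothetical values n = 100, p = 0.02 (so lam = np = 2)
n, p = 100, 0.02
exact = binomial_pmf(3, n, p)
approx = poisson_pmf(3, n * p)
print(round(exact, 4), round(approx, 4))
```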

3.3.5. Geometric Distribution

Geometric Distribution refers to the probability of the number of times needed to
do something until getting a desired result. For example:
How many times will I throw a coin until it lands on heads?
How many children will I have until I get a girl?
Just like the Bernoulli Distribution, the Geometric distribution has one controlling
parameter: the probability of success p in any independent trial. If a random
variable X is distributed with a Geometric Distribution with a parameter p we write
its probability mass function as:
$$P(X = x) = (1 - p)^{x - 1} p = q^{x - 1} p, \quad x = 1, 2, 3, \dots$$
Mean and Variance
$$E(X) = \frac{1}{p} \quad \text{and} \quad Var(X) = \frac{q}{p^2}$$
where $q = 1 - p$. With a Geometric Distribution it is also pretty easy to calculate
the probability of a "more than n times" case. The probability of failing to achieve
the wanted result in each of the first n trials is $(1 - p)^n$, so $P(X > n) = (1 - p)^n$.
Example.
Suppose a drunkard is trying to find the key to his front door, out of a keychain with
10 different keys. What is the probability that he succeeds in finding the right key
on the 4th attempt?
Solution:
This is a geometric distribution with parameter (probability of success) $p = \frac{1}{10}$, then
$$P(X = 4) = \left(1 - \frac{1}{10}\right)^{3} \cdot \frac{1}{10} = 0.0729$$
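The key-chain example follows directly from the geometric probability mass function $(1 - p)^{x - 1} p$. The sketch below is illustrative only; the function names are assumptions, and it also checks the "more than n times" rule $P(X > n) = (1 - p)^n$ against a direct sum.

```python
def geometric_pmf(x, p):
    """P(first success occurs on trial x), x = 1, 2, 3, ..."""
    return (1 - p) ** (x - 1) * p


def geometric_tail(n, p):
    """P(X > n): the first n trials all fail."""
    return (1 - p) ** n


p = 1 / 10                               # one right key out of ten
p_fourth_attempt = geometric_pmf(4, p)   # success exactly on the 4th attempt
p_more_than_4 = geometric_tail(4, p)     # still no success after 4 attempts

# The tail must equal 1 minus the probabilities of success on trials 1..4
tail_from_sum = 1 - sum(geometric_pmf(x, p) for x in range(1, 5))

print(round(p_fourth_attempt, 4), round(p_more_than_4, 4))
```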

3.3.6. Negative Binomial Distribution

Just as the Bernoulli and the Binomial distribution are related in counting the number of successes in 1 or more trials, the Geometric and the Negative Binomial distribution are related in the number of trials needed to get 1 or more successes.
The Negative Binomial distribution refers to the probability of the number of times
needed to do something until achieving a fixed number of desired results.
For example:
How many times will I throw a coin until it lands on heads for the 5th time?
How many children will I have when I get my second daughter?
How many fish will I have by the time I get the fifth Tilapia?
Just like the Binomial Distribution, the Negative Binomial distribution has two controlling parameters: the probability of success p in any independent trial and the
desired number of successes m. If a random variable X has a Negative Binomial distribution with parameters p and m, its probability mass function is:
$$P(X = x) = \binom{x - 1}{m - 1} p^m (1 - p)^{x - m}, \quad x = m, m + 1, \dots$$
Mean and Variance
$$E(X) = \frac{m}{p} \quad \text{and} \quad Var(X) = \frac{m(1 - p)}{p^2}$$
Example. A hawker goes home once he has sold 3 coats that day. Some days he
sells them quickly. Other days he's out till late in the evening. If on average
he sells a coat at one out of ten houses he approaches, what is the probability of
returning home after having visited only 10 houses?
Solution:
The number of trials is Negative Binomial distributed with parameters p = 0.1 and
m = 3, hence:
$$P(X = 10) = \binom{9}{2} (0.1)^3 (0.9)^7 \approx 0.0172$$
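The hawker example can be evaluated with the standard Negative Binomial mass function for the number of trials needed to reach the m-th success, $\binom{x-1}{m-1} p^m (1-p)^{x-m}$. The sketch below is illustrative (the function name is an assumption); it also shows that setting m = 1 recovers the Geometric case.

```python
from math import comb


def negative_binomial_pmf(x, m, p):
    """P(the m-th success occurs on trial x), for x = m, m + 1, ..."""
    if x < m:
        return 0.0                  # cannot have m successes in fewer than m trials
    return comb(x - 1, m - 1) * p**m * (1 - p) ** (x - m)


# Hawker example: third coat sold at exactly the 10th house, p = 0.1
p_home_after_10 = negative_binomial_pmf(10, 3, 0.1)
print(round(p_home_after_10, 4))

# With m = 1 the distribution reduces to the Geometric distribution
p_first_success_on_4th = negative_binomial_pmf(4, 1, 0.1)
print(round(p_first_success_on_4th, 4))
```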

3.4. Continuous Distributions

A continuous random variable is one that can take on any value within a continuous
range or an interval. Examples: duration of a call in a telephone exchange, the time
taken to complete a certain task, weight of a student, age of a person, etc. Unlike
a discrete random variable, a continuous random variable has a probability density
function (pdf) instead of a probability mass function. The difference is that the
former must integrate to 1, while the latter must sum to 1. If $f(x)$ is the pdf
of a random variable X, the expected value or the mean of X is defined as
$$E(X) = \mu = \int_{-\infty}^{\infty} x f(x)\, dx$$
The variance of a continuous or discrete distribution is defined as
$$Var(X) = E\left[(X - \mu)^2\right] = E(X^2) - \mu^2$$

3.4.1. Uniform Distribution

The uniform distribution, as its name suggests, is a distribution with probability
densities that are the same at each point in an interval. In casual terms, the uniform
distribution is shaped like a rectangle. The probability density function of the uniform
distribution on an interval from a to b is defined as
$$f(x) = \begin{cases} \dfrac{1}{b - a}, & a \le x \le b \\ 0, & \text{otherwise} \end{cases}$$
It can be shown that the expected value and variance of X are given by the following
expressions:
$$E(X) = \frac{a + b}{2} \quad \text{and} \quad Var(X) = \frac{(b - a)^2}{12}$$


LESSON 4
Normal Distribution
Learning outcomes
Upon completion of this lesson you should be able to;
1. Identify data that is normally distributed.
2. Read standard normal statistical tables.
3. Apply normality concept in estimating probabilities of certain outcomes
4.1. Introduction
The Normal Probability Distribution is one of the most useful and important
continuous distributions in statistics. The Normal distribution is used frequently in
statistics for many reasons:
The Normal distribution has many convenient mathematical properties.
Many natural phenomena have distributions which when studied have been
shown to be close to that of the Normal Distribution.
The Central Limit Theorem shows that the Normal Distribution is a suitable
model for large samples regardless of the actual distribution.
4.2. Description
The Normal distribution describes a continuous variable that takes on values on the
real number line. The formula for the Normal distribution has two parameters, the mean $\mu$ and
the variance $\sigma^2$. The parameter $\mu$ is a location parameter and $\sigma^2$ is a scale
parameter. The distribution is symmetric about the mean, as shown in the following figure.

Consider the following histogram with a normal curve on it for a certain study on
men's heights.

It is clear that the very tall are as few as the very short. The majority of the Americans
are about 174 cm tall. The heights range from about 150 cm, which is about 174 - 3(6.7) cm, to
about 195 cm, which is about 174 + 3(6.7) cm. This is in line with Tchebysheff's
theorem.
Note: Tchebysheff's theorem states that for any set of observations $x_1, x_2, \dots, x_n$, at
least $1 - 1/k^2$ of the values will lie within k standard deviations of the mean, where
$k \ge 1$.
4.2.1. Functional form
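A continuous random variable, X, is normally distributed with a probability density
function given by:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right), \quad -\infty < x < \infty$$
where $\mu$ and $\sigma$ are the mean and the standard deviation respectively. The expected
value of a distribution is defined as the probability weighted sum of outcomes.
For $X \sim N(\mu, \sigma^2)$,
$$E(X) = \int_{-\infty}^{\infty} x f(x)\, dx = \mu$$
and the variance of a distribution is the probability weighted sum of the squared
differences between outcomes and their expected values:
$$Var(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\, dx = \sigma^2$$
It is now clear that the parameters $\mu$ and $\sigma^2$ are simply equal to the expected value and
variance (respectively).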
4.3. The Standard normal Distribution
A normal distribution with a mean of 0 and a standard deviation of 1 is called
the standard normal distribution. Every normally distributed variable can be transformed into a standard normal variable by computing the Z score. The Z
value is the distance between a selected value, designated x, and the population
mean $\mu$, divided by the population standard deviation $\sigma$:
$$Z = \frac{x - \mu}{\sigma}$$

Figure 4.1: Standard Normal distribution curve


The transformed values will always give the curve above. Notice that the central
value of Z is zero (0) and the curve is still symmetric. We determine probabilities
based upon distance from the mean (i.e., the number of standard deviations). It is
worth noting that:
The probability is the proportion of area under the standard normal curve.
The probabilities have been computed and published under the name Normal
probability tables. The values quoted below are cumulative areas to the left of z,
that is, P(Z < z).
Because of symmetry, P(Z > 0) = P(Z < 0) = 0.5
Tables show probabilities rounded to 4 decimal places, e.g.

If Z < 1.96 then the probability is 0.9750; we write P(Z < 1.96) = 0.9750
If Z > -1.96 then the probability is 0.9750; we write P(Z > -1.96) = 0.9750
From the standard normal tables:
1. P(Z < 0.10) = 0.5398
2. P(Z < 2.97) = 0.9985, so P(Z > 2.97) = 0.0015
3. P(Z < 0) = 0.5000
4. P(Z < -0.10) = P(Z > 0.10) = 1 - 0.5398 = 0.4602
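Example. The daily water usage per person in Thika is normally distributed with
a mean of 20 gallons and a standard deviation of 5 gallons. What is the probability
that a person from Thika selected at random will use:
1. Less than 20 gallons per day?
2. Less than 25 gallons per day?
3. More than 30 gallons per day?
Solution:
We cannot read the probabilities directly. We must standardize our values as follows:
1. $P(X < 20) = P\left(Z < \frac{20 - 20}{5}\right) = P(Z < 0) = 0.5000$
2. $P(X < 25) = P\left(Z < \frac{25 - 20}{5}\right) = P(Z < 1) = 0.8413$
3. $P(X > 30) = P\left(Z > \frac{30 - 20}{5}\right) = P(Z > 2) = 1 - 0.9772 = 0.0228$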
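Where tables are not to hand, the same probabilities can be obtained from Python's standard library. The sketch below is illustrative; it uses `statistics.NormalDist` (available from Python 3.8) and the variable names are assumptions.

```python
from statistics import NormalDist

z = NormalDist()                     # standard normal: mean 0, standard deviation 1

# Table-style lookups: cumulative area to the left of z
print(round(z.cdf(1.96), 4))         # P(Z < 1.96)
print(round(1 - z.cdf(2.97), 4))     # P(Z > 2.97)

# Thika water usage: mean 20 gallons, standard deviation 5 gallons
usage = NormalDist(mu=20, sigma=5)
p_less_20 = usage.cdf(20)            # same as P(Z < 0)
p_less_25 = usage.cdf(25)            # same as P(Z < 1)
p_more_30 = 1 - usage.cdf(30)        # same as P(Z > 2)

# Standardising by hand gives the same answer as using mu and sigma directly
z_score_25 = (25 - 20) / 5
p_less_25_standardised = z.cdf(z_score_25)

print(round(p_less_20, 4), round(p_less_25, 4), round(p_more_30, 4))
```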

Exercise 7. In a sample of 1000 cases, the mean of a certain test is 14 and the
standard deviation is 2.5. Assuming the data is normally distributed, find out how
many students scored
i. Between 12 and 15
ii. Above 18
iii. Below 8
Exercise 8. The data given below shows the number of employees with their
corresponding ages in a company. Use the data to find the probabilities that a person
picked at random has
i. Age more than 35 years
ii. Age falling between 24 and 38

Exercise 9. A recent study showed that 20% of JKUAT employees are landlords. A sample of 250 employees is taken. What is the probability that fewer than
40 are landlords?
Suggested materials for further reading
1. Wonnacott, T.H. and Wonnacott, R.J. (1990). Introductory Statistics for Business and Economics, 2nd Edition, John Wiley and Sons Inc.
2. Gujarati, D.N. (2006). Basic Econometrics. 3rd Edition, McGraw-Hill, Inc.,
New York.
3. Keller, G., Warrack, B. and Bartel, H. (1994). Statistics for Management and
Economics. 3rd Edition. Wadsworth Publishing Company, Belmont California, USA.

LESSON 5
Tests of Hypothesis 1
Learning outcomes
Upon completion of this lesson you should be able to;
1. Define hypothesis testing
2. State two types of errors in hypothesis testing
3. Carry out necessary computations for t-tests
4. Use SPSS to carry out tests involving comparison of means of two groups
5.1. Introduction
A statistical hypothesis is an assertion or conjecture about a parameter (or parameters) of a population. It can be viewed as a precise testable statement about the value
of a population parameter developed for the purpose of testing. Hypothesis testing
is a procedure, based on sample evidence and probability theory, used to determine
whether the hypothesis is a reasonable statement and should not be rejected, or is
unreasonable and should be rejected.
Null Hypothesis, $H_0$: A statement about the value of a population parameter; this is the
statement the test seeks to reject. Typically a null hypothesis is the opposite of the real hypothesis of interest. It might state, for example, that a
parameter equals 0 in the population, or that the values of two subgroup parameters are equal in the population.
Alternative Hypothesis, $H_1$: A statement that is accepted if the sample data
provide evidence that the null hypothesis is false.
Level of Significance, $\alpha$: The probability of rejecting the null hypothesis
when it is actually true.
Type I Error: Rejecting the null hypothesis when it is actually true.
Type II Error: Accepting the null hypothesis when it is actually false. Its
probability is denoted by $\beta$.
Power of a test, $(1 - \beta)$: The probability of rejecting a false null hypothesis.

See the following table on associated probabilities.

Test statistic: A value, determined from sample information, used to determine whether or not to reject the null hypothesis.
Critical value: The dividing point between the region where the null hypothesis is rejected and the region where it is not rejected.
p-value: A measure of how much evidence you have against the null
hypothesis. The smaller the p-value, the more evidence you have. One usually combines the p-value with the significance level to make a decision on a given
test of hypothesis. In such a case, if the p-value is less than some threshold
(usually 0.05, sometimes a bit larger like 0.1 or a bit smaller like 0.01) then you
reject the null hypothesis.
5.2. Parametric Tests
Hypothesis tests can be two-tailed when looking for a change, such as testing
$H_0: \mu = 5$ vs $H_1: \mu \ne 5$
or one-tailed when looking for an increase (or decrease), such as testing
$H_0: \mu = 5$ vs $H_1: \mu > 5$
The procedure to use when carrying out a hypothesis test is:
1. Determine $H_0$, $H_1$ and the significance level.
2. Decide whether a one- or two-tailed test is appropriate.
3. Calculate the test statistic assuming $H_0$ is true.
4. Compare the test statistic with the critical value(s) for the critical region.
5. Accept or reject $H_0$ as appropriate.
6. State the conclusion in terms of the original problem.
When testing for the population mean from a large sample and the population standard deviation is known, the test statistic is given by
$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
where $\sigma$ is a known population standard deviation.
Example. Brandways Company indicates on the label that their loaves weigh
400g. A sample of 40 loaves is selected hourly from their processing line and the
contents weighed. Last hour a sample of 40 loaves had a mean weight of 403g with
a standard deviation of 8g. Test at the 0.05 significance level whether their process is out
of control.
Solution:
The hypotheses to be tested are
$H_0: \mu = 400$ vs $H_1: \mu \ne 400$
The test statistic value is given by
$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{403 - 400}{8 / \sqrt{40}} \approx 2.37$$
The critical value from the Z-table at 5% is given as 1.96.
Since $Z_{cal}$ is greater than $Z_{tab}$, i.e. 2.37 > 1.96, we reject $H_0$.

Example. A random sample of 9 subjects is taken from a population with a
mean IQ of 100 and standard deviation of 15. The 9 people underwent an intensive
training and then the IQ test was administered. The sample mean IQ was 113 and
the sample standard deviation was found to be 10. Test whether the training had any
significant effect (increase) on the IQ score.
Solution:
Note that the level of significance is not specified. The standard value is 0.05 but
we may use 0.01 or 0.1 depending on the accuracy required. In this example we use
$\alpha = 0.01$.
The hypotheses to be tested are
$H_0: \mu = 100$ against $H_1: \mu > 100$
The test statistic value is given by
$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{113 - 100}{15 / \sqrt{9}} = 2.6$$
The critical value from the Z-table at 1% is given as 2.33.
Since $Z_{cal}$ is greater than $Z_{tab}$, i.e. 2.6 > 2.33, we reject $H_0$. We conclude that the data
provides enough evidence to indicate that such training increases the IQ.
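The two worked Z-tests can be reproduced in a few lines. This is a hedged sketch rather than a prescribed method: the function name is an assumption, and the critical values are taken from `statistics.NormalDist().inv_cdf` instead of printed tables, so they appear as 1.960 and 2.326 rather than the rounded 1.96 and 2.33.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(x_bar, mu0, sigma, n):
    """Z = (x_bar - mu0) / (sigma / sqrt(n))."""
    return (x_bar - mu0) / (sigma / sqrt(n))


std_normal = NormalDist()

# Loaves example: two-tailed test at the 5% level
z_loaves = one_sample_z(403, 400, 8, 40)
crit_two_tailed = std_normal.inv_cdf(1 - 0.05 / 2)
reject_loaves = abs(z_loaves) > crit_two_tailed

# IQ example: one-tailed (upper) test at the 1% level
z_iq = one_sample_z(113, 100, 15, 9)
crit_one_tailed = std_normal.inv_cdf(1 - 0.01)
reject_iq = z_iq > crit_one_tailed

print(round(z_loaves, 3), round(crit_two_tailed, 3), reject_loaves)
print(round(z_iq, 3), round(crit_one_tailed, 3), reject_iq)
```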

Commonly used test statistics involving normal distribution are summarized as follows:

For small sample sizes, i.e. n < 30, the test statistic becomes
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
where s is the standard deviation of the sample, usually given by the expression
$$s = \sqrt{\frac{\sum f (x - \bar{x})^2}{n - 1}}$$

Exercise 10. Jane is in charge of Quality Control at a bottling facility. Currently, she is checking the operation of a machine that is supposed to deliver 355
mL of liquid into an aluminum can. If the machine delivers too little, then the local
Regulatory Agency may fine the company. If the machine delivers too much, then
the company may lose money. For these reasons, Jane is looking for any evidence
that the amount delivered by the machine is different from 355 mL. During her investigation, she obtains a random sample of 10 cans, and measures the following
volumes:
355.8 355.0 355.5 353.7 355.5
355.3 353.8 355.6 355.0 355.4
The machine's specifications claim that the amount of liquid delivered varies according to a normal distribution, with mean = 355 mL and variance = 0.64. Do
the data suggest that the machine is operating correctly?
5.2.1. Z Test for Two Means
The Null Hypothesis should be an assumption about the difference in the population
means for two populations. The data should consist of two samples of quantitative
data (one from each population). The samples must be obtained independently
from each other. The samples must be drawn from populations which have known
Standard Deviations (or Variances). Also, the measured variable in each population
(generically denoted $x_1$ and $x_2$) should have a Normal Distribution.
Procedure: The null hypothesis is
$H_0: \mu_1 - \mu_2 = d$
in which d is the supposed difference in the expected values under the null hypothesis. The alternative hypothesis could be
$H_1: \mu_1 - \mu_2 \ne d$
The Test Statistic:
$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - d}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
Usually, the null hypothesis is that the population means are equal, i.e. $\mu_1 = \mu_2$ (d = 0); in this case,
the formula reduces to
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

If the Variances (and thus the Standard Deviations) of the two populations are assumed equal, the pooled variance could be used and in this case, we get..

Example
Universities and colleges in the United States of America are categorized by the
highest degree offered. Type IIA institutions offer a Master's Degree, and type IIB
institutions offer a Baccalaureate degree. A professor, looking for a new position,
wonders if the salary difference between type IIA and IIB institutions is really significant. He finds that a random sample of 200 IIA institutions has a mean salary
(for full professors) of $54,218, with standard deviation $8,450. A random sample
of 200 IIB institutions has a mean salary (for full professors) of $46,550, with standard deviation $9,500 (assume that the sample standard deviations are in fact the
population standard deviations). Do these data indicate a significantly higher salary
at IIA institutions?
Solution
The null hypothesis is that there is no difference; thus
$H_0: \mu_A = \mu_B$ vs $H_1: \mu_A > \mu_B$
Since the hypotheses concern means from independent samples, a two sample test
is indicated. The samples are large, and the standard deviations are known (we
assumed), so a two sample z-test is appropriate.
$$Z = \frac{54218 - 46550}{\sqrt{\dfrac{8450^2}{200} + \dfrac{9500^2}{200}}} = \frac{7668}{899.03} \approx 8.53$$
This value is far larger than 4, roughly the most extreme value tabulated for the standard normal, so
we reject the null hypothesis and conclude that IIA schools have a significantly
higher salary than IIB schools.
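The salary comparison can be checked numerically with the two-sample Z statistic for known standard deviations. The sketch below is illustrative; the function name and argument order are assumptions, and the p-value is computed with `statistics.NormalDist` rather than read from tables.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, mean2, sd1, sd2, n1, n2, d=0.0):
    """Z = ((mean1 - mean2) - d) / sqrt(sd1^2 / n1 + sd2^2 / n2)."""
    standard_error = sqrt(sd1**2 / n1 + sd2**2 / n2)
    return ((mean1 - mean2) - d) / standard_error


# IIA: mean 54,218, sd 8,450, n = 200; IIB: mean 46,550, sd 9,500, n = 200
z = two_sample_z(54218, 46550, 8450, 9500, 200, 200)

# One-tailed (upper) p-value for H1: mu_A > mu_B
p_value = 1 - NormalDist().cdf(z)

print(round(z, 2), p_value < 0.05)
```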
5.2.2. The t-Test
The t-test is the most powerful parametric test for calculating the significance of
means when the sample is small and when the population variance is unknown. The
test is based on a t-distribution which has the following properties:
It is continuous, bell-shaped, and symmetrical about zero like the z-distribution.
There is a family of t-distributions sharing a mean of zero but having different
standard deviations.
The t-distribution is more spread out and flatter at the center than the z-distribution, but approaches the z-distribution as the sample size gets larger.
A t-test is necessary for small samples because, when the population standard deviation is unknown and estimated by s, the standardised sample mean follows a t-distribution rather than the normal distribution.
If the sample is large (n >= 30) then statistical theory says that the sample mean is
approximately normally distributed and a z-test for a single mean can be used. This is a result of a
famous statistical theorem, the Central Limit Theorem.
A t-test, however, can still be applied to larger samples and as the sample size n
grows larger and larger, the results of a t-test and z-test become closer and closer.
In the limit, with infinite degrees of freedom, the results of t and z tests become
identical. In order to perform a t-test, one first has to calculate the degrees of freedom. This quantity takes into account the sample size and the number of parameters
that are being estimated. Here, the population parameter $\mu$ is being estimated by
the sample statistic $\bar{x}$, the mean of the sample data. For a t-test the degrees of freedom
of the single mean is n - 1. This is because only one population parameter (the population mean) is being estimated by a sample statistic (the sample mean): degrees of
freedom (df) = n - 1. The test statistic for the one sample case is given by
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}} \quad \text{where} \quad s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
For a two-tail test using the t-distribution, you will reject the null hypothesis when
the value of the test statistic is greater than $t_{n-1,\,\alpha/2}$ or if it is less than $-t_{n-1,\,\alpha/2}$.
For a one-tail test, the critical value $t_{n-1,\,\alpha}$ is used in the direction of the tail.
Example.
The current rate for producing 5 amp fuses at an ABC company is 250 per hour.
A new machine has been purchased and installed that, according to the supplier,
will increase the production rate. A sample of 10 randomly selected hours from last
month revealed the mean hourly production on the new machine was 256, with a
sample standard deviation of 6 per hour. At the 0.05 significance level can ABC
conclude that the new machine is faster?
Solution
The hypotheses are $H_0: \mu = 250$ vs $H_1: \mu > 250$.
Since the sample is small and the population standard deviation is unknown, a t-test is appropriate.
We reject the null hypothesis at the 0.05 significance level if $t_{cal} > t_{9,\,0.05} = 1.833$ (from t-tables).
$$t_{cal} = \frac{256 - 250}{6 / \sqrt{10}} = 3.16$$
Since $t_{cal} = 3.16 > 1.833$, we reject the null hypothesis and conclude that the sample provides enough evidence that the new machine is faster.
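The fuse example can be verified with a short calculation. The sketch below is illustrative only: the function name is an assumption, and the critical value 1.833 is simply the table value quoted in the solution, since the Python standard library does not provide t-distribution quantiles.

```python
from math import sqrt


def one_sample_t(x_bar, mu0, s, n):
    """Return (t, degrees of freedom) for t = (x_bar - mu0) / (s / sqrt(n))."""
    return (x_bar - mu0) / (s / sqrt(n)), n - 1


t_cal, df = one_sample_t(256, 250, 6, 10)

T_CRITICAL = 1.833                 # t(9, 0.05) taken from t-tables, as in the solution
reject_h0 = t_cal > T_CRITICAL     # one-tailed (upper) test

print(round(t_cal, 2), df, reject_h0)
```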
Exercise 8
A college professor wants to compare her students' scores with the national average.
She chooses a simple random sample of 20 students, who score an average of 54.2
on a standardized test. Their scores have a standard deviation of 4.5. The national
average on the test is 60. She wants to know if her students scored significantly
lower than the national average.
Comparing Two Independent Population Means
A small two sample t-test is used to test the difference between two population
means $\mu_1$ and $\mu_2$ when the sample size for at least one population is less than 30. To
conduct this test, three assumptions are required:
The populations must be normally or approximately normally distributed.
The populations must be independent.
The population variances must be equal.
The standardized test statistic is:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \quad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
with $n_1 + n_2 - 2$ degrees of freedom.
Dependent samples
Dependent samples are samples that are paired or related in some fashion. The
idea is to use the same subject and take repeated measurements. For example, if
you wished to buy a car you would look at the same car at two (or more) different
dealerships and compare the prices. Use the following test when the samples are
dependent:
$$t = \frac{\bar{d}}{s_d / \sqrt{n}}$$
where $d_i = x_i - y_i$ is the difference between pairs, $\bar{d}$ is the average of the differences and
$s_d$ is the estimated standard deviation of the differences.
Example
An independent testing agency
is interested in the cost of renting a single-bedroomed house in Nairobi estates. A random sample of 6 estates is obtained and the following rental information obtained.
At the 0.05 significance level can the testing agency conclude that there is a difference in the rental charged between 2006 and 2007?
Estate        Rent06 (Ksh00)   Rent07 (Ksh00)
Githurai      55               59
Kahawa        64               65
Ngomongo      23               48
Roysub        38               48
Kawangware    57               59
Solution
Estate        Rent06 (Ksh00)   Rent07 (Ksh00)   Difference (d)
Githurai      55               59
Kahawa        64               65
Ngomongo      23               48               13
Roysub        38               48               20
Kawangware    57               59               -7
Total         282              312              30
Average       47               52
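The hypotheses are
$H_0: \mu_d = 0$ against $H_1: \mu_d > 0$
The test statistic is
$$t_c = \frac{\bar{d}}{s_d/\sqrt{n}}$$
with $n - 1 = 5$ degrees of freedom. The tabulated value is $t_{5,\,0.05} = 2.015$. Since $t_c < t_{5,\,0.05}$, we fail to reject the null hypothesis
and conclude that the data do not provide enough evidence that rent has increased
significantly.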
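The paired-sample calculation can be sketched in Python as follows. This is only an illustration under stated assumptions: it uses the five estate rows that are legible in the table (so n = 5 and there are 4 degrees of freedom, not the 6 estates and 5 degrees of freedom of the full example), it takes the difference as Rent07 minus Rent06, and the one-tailed 5% tabulated value 2.132 for 4 degrees of freedom is read from a standard t table.

```python
import math
import statistics

# Rents (Ksh00) for the five estates legible in the table above
rent06 = [55, 64, 23, 38, 57]
rent07 = [59, 65, 48, 48, 59]

# d_i = Rent07 - Rent06, so a positive mean difference indicates an increase
diffs = [after - before for before, after in zip(rent06, rent07)]
n = len(diffs)
d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)          # sample standard deviation (n - 1 divisor)
t_c = d_bar / (s_d / math.sqrt(n))

t_table = 2.132                        # assumed: one-tailed 5% point of t with 4 df
reject_h0 = t_c > t_table

print(f"mean difference = {d_bar}, s_d = {s_d:.3f}, t = {t_c:.3f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

With these five pairs the calculated t is below the tabulated value, which agrees in direction with the conclusion reached in the example.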
Exercise 9
A sample of 8 students was given a diagnostic test before studying a particular
module and then again after completing the module. The following data gives their
scores before and after the training.
Test at the 0.1 and 0.05 levels of significance whether the teaching leads to an improvement in
the students' scores.
Suggested materials for further reading
1. Wonnacott, T.H. and Wonnacott, R.J. (1990). Introductory Statistics for Business and Economics, 2nd Edition, John Wiley and Sons Inc.
2. Gujarati, D.N. (2006). Basic Econometrics. 3rd Edition, McGraw-Hill, Inc.,
New York.
3. Keller, G., Warrack, B. and Bartel, H. (1994). Statistics for Management and
Economics. 3rd Edition. Wadsworth Publishing Company, Belmont California, USA.


LESSON 6
Hypothesis Testing 2
Learning outcomes
Upon completion of this lesson you should be able to;
1. Explain the data considerations for one-way ANOVA
2. Perform basic computations involving ANOVA
3. Carry out a goodness of fit test using chi-square
4. Carry out contingency table analysis using chi-square
5. Describe some limitations of hypothesis testing
6.1. Introduction
This lesson combines two topics which may appear totally unrelated. The
first topic (ANOVA) relates a ratio or interval scale variable to a categorical variable, while the second (chi-square) assumes that all the variables are categorical. You will realize that both
topics are still on hypothesis testing.
6.1.1. Analysis of Variance (ANOVA)
We have studied tests of the significance of the difference in means between two independent populations. For these we used the standard error of the mean, or the standard
error of the difference of two means, in a z-test or t-test. This concept can be
extended to differences in the means of more than two independent populations, but
in a slightly different manner. Suppose we want to study the effects of four types of
fertilizer, say A, B, C and D, on the yield of sugar cane. We take five plots for each
fertilizer, so that the 4 fertilizers are applied to 20 plots in total. We can find the
arithmetic mean of the yields of the 5 plots for each fertilizer separately, but a single
t-test cannot test the significance of the differences among these four means.
One option is to form the 6 pairs of fertilizers AB, AC, AD, BC, BD
and CD, test each difference with a t-test and draw a conclusion for each pair separately.
Two difficulties arise:
1. First, the work of computation increases, and
2. Second, only pairs of fertilizers are tested. We cannot tell
whether the differences are significant when the four fertilizers are taken together.
A test of significance that avoids these two difficulties is therefore
needed, one that compares the means of more
than two samples in a single test. Here a test of significance means a test of the hypothesis
that the means of several samples do not differ significantly. To test
the difference among several sample means we use a statistical technique known as
Analysis of Variance. The main objective of the analysis of variance is to test
whether the means of several groups differ significantly.
Components of total Variability
When observations are classified into groups or samples on the basis of a single criterion, the arrangement is called a one-way classification. Examples are the yields of sugar
cane from 20 plots classified by the four types of fertilizer, or the marks
obtained by students classified by college. In general, for a one-way classification, total variability is partitioned into two parts, that is
Total variation = Variation between groups + Variation within groups.
Assumptions of Analysis of Variance
The analysis of variance is based on certain assumptions as given below:
1. Normality of the distribution: The population for each sample must be normally distributed with mean $\mu_i$ and unknown variance $\sigma^2$.
2. Independence of samples: All the sample observations must be selected randomly.
3. Additivity: The total variation of the various sources of variation should be
additive.
4. Equal (but unknown) variances: The populations from which the k samples
are drawn have means $\mu_1, \mu_2, ..., \mu_k$ and a common unknown variance, $\sigma_1^2 = \sigma_2^2 = ... = \sigma_k^2$.
5. The error components are independent and have mean 0 and variance $\sigma^2$.
The tests of significance performed in the analysis of variance are meaningful only under
these assumptions.
6.1.2. Techniques of One-way ANOVA
1. In one-way analysis of variance there are k groups, one from each of k normal
populations with common variance $\sigma^2$ and means $\mu_1, \mu_2, ..., \mu_k$. The numbers
of observations $n_i$ in the groups may be equal or unequal, with $n_1 + n_2 + ... + n_k = n$.
2. Linear model:
$x_{ij} = \mu + \alpha_i + e_{ij}$
where $x_{ij}$ = the jth observation in the ith group ($i = 1, 2, ..., k$; $j = 1, 2, ..., n_i$), $\mu$ = general mean,
$\alpha_i$ = effect of the ith level of the factor and $e_{ij}$ = error or random term.
3. Null hypothesis (H0) and alternative hypothesis (H1):
H0: The means of the populations are equal, i.e. $\mu_1 = \mu_2 = ... = \mu_k$
H1: At least two of the means are not equal.
4. Computations:
(a) Calculate the sum of the observations in each sample and of all observations.
Sums of the sample observations: $\sum x_1, \sum x_2, ..., \sum x_k$
Sums of the squares of the group observations: $\sum x_1^2, \sum x_2^2, ..., \sum x_k^2$
(b) Calculate the correction factor $CF = T^2/n$, where T = sum of all
the observations and n = total number of observations.
(c) Calculate the group means $\bar{x}_1, \bar{x}_2, ..., \bar{x}_k$ and their common mean $\bar{x}$, where
$\bar{x}_i = \sum x_i / n_i$ and $\bar{x} = T/n$.
(d) Calculate the total sum of squares by the formula
$$TSS = \sum_i \sum_j x_{ij}^2 - CF$$
(e) Calculate the sum of squares between samples by the formula
$$SSB = \sum_i \frac{(\sum x_i)^2}{n_i} - CF$$
(f) Calculate the sum of squares within samples by the formula
$$SSW = \sum_i \sum_j (x_{ij} - \bar{x}_i)^2$$
The sum of squares within samples may also be computed as
$SSW = TSS - SSB$
5. The sum of squares within samples is also called the error sum of squares.
6. Calculate the mean sums of squares:
MSSB = Mean sum of squares between samples = $SSB/(k - 1)$
MSSW = Mean sum of squares within samples = $SSW/(n - k)$
Total number of degrees of freedom = $n - 1$, where $n - 1 = (k - 1) + (n - k)$
7. Obtain the variance ratio F: $F = MSSB/MSSW$
8. Interpretation of the F-ratio: Compare the calculated value of F with the tabulated
value.
$F_c$ = calculated value of F
$F_t = F(v_1, v_2)$ = tabulated value of F
Here
$v_1 = k - 1$ = degrees of freedom for the numerator
$v_2 = n - k$ = degrees of freedom for the denominator
If $F_c > F_t$, reject H0 and conclude that at least two of the means differ significantly.
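The computational steps of one-way ANOVA can be traced with a short Python sketch. The yields below are hypothetical (four fertilizers on five plots each, chosen only to keep the arithmetic simple), the correction factor is taken as the squared grand total divided by n, and the tabulated 5% value F(3, 16) = 3.24 is an assumption read from a standard F table.

```python
# Hypothetical sugar cane yields: 4 fertilizers x 5 plots each
yields = {
    "A": [10, 12, 11, 9, 13],
    "B": [14, 15, 13, 16, 12],
    "C": [11, 10, 12, 13, 9],
    "D": [16, 18, 17, 15, 19],
}

k = len(yields)                                   # number of groups
n = sum(len(obs) for obs in yields.values())      # total number of observations
grand_total = sum(sum(obs) for obs in yields.values())

cf = grand_total ** 2 / n                         # correction factor
tss = sum(x ** 2 for obs in yields.values() for x in obs) - cf
ssb = sum(sum(obs) ** 2 / len(obs) for obs in yields.values()) - cf
ssw = tss - ssb                                   # error sum of squares

mssb = ssb / (k - 1)                              # between-samples mean square
mssw = ssw / (n - k)                              # within-samples mean square
f_calc = mssb / mssw

f_table = 3.24                                    # assumed tabulated F(3, 16) at 5%
print(f"TSS = {tss}, SSB = {ssb}, SSW = {ssw}")
print(f"F = {f_calc} on ({k - 1}, {n - k}) degrees of freedom")
print("Reject H0" if f_calc > f_table else "Fail to reject H0")
```

For these illustrative data the calculated F exceeds the tabulated value, so the hypothesis of equal means would be rejected.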
Example . ....
Solution: ,.....

E XERCISE 11.  ...


LESSON 7
Correlation and Regression Analysis
Learning outcomes
Upon completion of this lesson you should be able to;
1. Draw scatter plots for bi-variate data.
2. Calculate and interpret parametric and non-parametric correlation coefficients.
3. Fit data to simple regression equation.
4. Interpret computer output on multiple regression model.
7.1. Introduction
For most data sets, we are interested in understanding the relationships between the
variables. If the relationship between variables X and Y is causal, it is possible to
predict the effect of changing the value of X. Causality can only be deduced from
how the data were collected; the data values themselves do not contain any information about causality.
Observational and experimental data: In an observational study,
values are passively recorded from individuals. Experiments are characterised by
the experimenter's control over the values of one or more variables.
Causality and experiments: Causal relationships can only be deduced from well-designed experiments.
7.2. Test of relationships involving quantitative data
Bivariate data is data in which two variables are recorded for each member of a
population, e.g. length and weight, shoe size and arm span, etc. A scatter diagram
can be used to represent bivariate data graphically.
Correlation analysis is a group of statistical techniques used to measure the strength
of the relationship (correlation) between two variables. Linear correlation gives
a measure of how well a straight line can be used to model a set of points on a
scatter diagram. The coefficient of correlation is a measure of the strength of
the relationship between two variables. The correlation is perfect if the points lie
exactly on a straight line. There are two commonly used correlation coefficients,
both giving values between -1 (perfect negative correlation) and +1 (perfect positive
correlation) inclusive:
1. Pearson's product-moment correlation coefficient
2. Spearman's rank correlation coefficient
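7.2.1. Pearson's product-moment correlation coefficient
Given bivariate data $x_1, x_2, ..., x_n$ and $y_1, y_2, ..., y_n$, we define the following summaries:
$$S_{xx} = \sum x_i^2 - \frac{(\sum x_i)^2}{n}, \qquad S_{yy} = \sum y_i^2 - \frac{(\sum y_i)^2}{n}, \qquad S_{xy} = \sum x_i y_i - \frac{\sum x_i \sum y_i}{n}$$
Then Pearson's product-moment correlation coefficient is given by
$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$$
or directly from the sums as
$$r = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{\left[n\sum x_i^2 - (\sum x_i)^2\right]\left[n\sum y_i^2 - (\sum y_i)^2\right]}}$$
This correlation coefficient should only be used if the two variables are normally
distributed.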
7.2.2. Spearman's rank correlation coefficient
The data are ranked (equal values being given the average of the ranks which they would
otherwise have taken). Let d be the difference in the ranks of a pair and n be the number
of pairs of data, then
$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
This rank correlation coefficient is useful when the data are not Normal.
[Figure: examples of scatter diagrams and estimates of correlation]
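Both correlation coefficients can be computed with a few lines of Python. The sketch below is illustrative only: the hours and marks for 8 students are hypothetical values (the table in these notes is not reproduced here), the helper names are ours, and the Spearman formula based on the sum of squared rank differences is exact only when there are no tied ranks.

```python
import math


def pearson_r(x, y):
    """Pearson's r computed directly from the sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


def average_ranks(values):
    """Rank from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared_rank = (pos + end) / 2 + 1
        for idx in order[pos:end + 1]:
            ranks[idx] = shared_rank
        pos = end + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's rank correlation from the squared rank differences."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(average_ranks(x), average_ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical data: hours of study (x) and examination marks (y) for 8 students
hours = [1, 2, 3, 4, 5, 6, 7, 8]
marks = [30, 38, 41, 45, 55, 54, 63, 70]

r = pearson_r(hours, marks)
rs = spearman_rs(hours, marks)
print(f"Pearson r = {r:.4f}, Spearman r_s = {rs:.4f}")
```

Both values are close to +1 for these data, which corresponds to a strong positive relationship on the scatter diagram.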
Example
The number of hours (x) spent studying for an examination by 8 students, together
with the marks (y) achieved in the examination, are given in the table below.
i. Make a scatter graph for this data.
ii. Calculate the product-moment correlation coefficient r for the data.
iii. Calculate Spearman's rank correlation coefficient.
iv. State what the value of r indicates about the relation between x and y.
Solution:
From the scatter plot it is apparent that marks and time spent in revision have a
strong positive correlation.
Spearman's rank
Exercise 12. The following data gives the age in months of a child and the corresponding weight in kg.
(a) Plot a scatter diagram for the data and make comments.
(b) Calculate the product moment correlation coefficient.
7.3. Linear Regression
Dependent variable: The variable that is being predicted or estimated. In a business
set-up the dependent variable could be profit or sales. It usually represents the
output.
Independent variable: The variable that provides the basis for estimation. It is the
predictor or explanatory variable. In a business set-up the independent variables could be advertisement costs, number of salesmen, etc. The independent
variables (inputs) are usually represented by $X_1, X_2, ..., X_k$ for k independent variables. Suppose we can find a relationship between the output Y and the inputs $X_1, X_2, ..., X_k$
of the form
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + ... + \beta_k X_k$
This equation is referred to as a multiple regression model with k predictors or
independent variables. With a single predictor we have the simple linear regression model given
by $Y = \beta_0 + \beta_1 X_1$
Using the ordinary least squares (OLS) method or the maximum likelihood estimator we
can estimate $\beta_0$ and $\beta_1$ as follows. Given data on Y and X in n pairs $(y_i, x_i)$, we
compute
$$S_{xy} = \sum x_i y_i - \frac{\sum x_i \sum y_i}{n}, \qquad S_{xx} = \sum x_i^2 - \frac{(\sum x_i)^2}{n}$$
Then, for the regression line $Y = \beta_0 + \beta_1 X_1$,
$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$
The coefficient of determination is given by $r^2$ and is the proportion of the total
variation in the dependent variable Y that is explained or accounted for by the variation in the independent variable X. It is the square of the coefficient of correlation,
and ranges from 0 to 1.
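A minimal Python sketch of the least squares calculation for simple linear regression follows. The hours and marks are hypothetical values for 8 students (they are not the data behind the worked example in these notes), and the variable names are ours.

```python
# Hypothetical data: hours of study (x) and examination marks (y) for 8 students
hours = [1, 2, 3, 4, 5, 6, 7, 8]
marks = [30, 38, 41, 45, 55, 54, 63, 70]

n = len(hours)
sum_x, sum_y = sum(hours), sum(marks)
mean_x, mean_y = sum_x / n, sum_y / n

# Corrected sums of squares and products
s_xy = sum(x * y for x, y in zip(hours, marks)) - sum_x * sum_y / n
s_xx = sum(x * x for x in hours) - sum_x ** 2 / n
s_yy = sum(y * y for y in marks) - sum_y ** 2 / n

slope = s_xy / s_xx                       # estimate of beta_1
intercept = mean_y - slope * mean_x       # estimate of beta_0
r_squared = s_xy ** 2 / (s_xx * s_yy)     # coefficient of determination

print(f"Fitted line: Y = {intercept:.2f} + {slope:.2f} X")
print(f"r-squared = {r_squared:.4f}")
```

Here about 97.5% of the variation in the hypothetical marks is accounted for by hours of study.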
Example
The number of hours (x) spent studying for an examination by 8 students, together
with the marks (y) achieved in the examination, are given in the table below.
i. Make a scatter graph for this data.
ii. Calculate the regression equation of Y on X and plot it on the same axes.
Solution:
Scatter plot
From the data we compute the required sums. So the regression equation is $Y = 24.50 + 5.5X_1$


Exercise 17
Consider Exercise 16 concerning the age in months of a child versus its weight
at different points in time. Show that the regression equation is given by
Y = 0.785X + 5
What percentage of weight is explainable by age?
Multiple regressions
For multiple regressions of the form $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + ... + \beta_k X_k$, we can use
computer software for data analysis such as SPSS, Stata, gretl, EViews and R,
among others, to fit a regression model. We are usually interested in knowing which
variable(s) contribute significantly to the model and their direction of influence. To achieve this, each coefficient should be examined (tested for significance).
If results are provided by a computer, the associated p-values are used to make the
decision. If the p-value is less than the preferred significance level, the null hypothesis that the coefficient is zero is rejected.
Example
A multiple regression model was fitted to some data with dependent variable y and
independent variables x1, x2, x3, x4 and x5. The computer output (gretl software)
is as given below. Use it to answer the questions that follow.
i. Is the model valid?
ii. How good are the predictors in explaining y?
iii. Interpret the influence and relevance of each variable, including the constant.
Model 11: OLS, using observations 1-402
Dependent variable: Y
Solution:
i. Yes. The model is valid because the F-ratio = 83.92284 has a p-value less than the
standard significance level of 0.05.
ii. From R-squared = 0.638, we can tell that the predictors explain 63.8%
of the variation in y. This is a good level of fit and so x1, x3, x4 and x5 are good
predictors of y.
iii. Since the p-values are less than 0.05 for the constant, x1, x3, x4 and x5, these
terms are relevant to the model, but x2 is not and may be dropped. The best predictor
is x1 (its t-value is the largest in absolute value), with a positive influence.

7.3.1. Multiple regression with dummy variables
Now, let us look at the case where some input variables are categorical. Suppose
the researcher wants to include variables such as gender, marital status or employment category in the model as regressors. The solution is the use of dummy variables.
Consider the following data as captured in an SPSS file.
Exercise 13. Suppose we wish to fit a regression model in which Maths is the
dependent variable (Y), and the independent variables are taken to be Kiswahili
(X1), English (X2), Home (X3) and Gender (X4). The regression function has the
same general form
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + ... + \beta_k X_k$
but in this case, since Home and Gender are categorical variables, we need to give
them appropriate codes. The codes have no numerical meaning but should indicate presence or absence. For example, gender should be coded as 1 for male and
0 for female or vice versa, but NOT 1 for male and 2 for female. The 1 will indicate the presence of maleness and the 0 its absence, implying
femaleness. The same rule applies to the variable Home. The coded data should
look as follows;

We can now use the standard procedure to get the regression output below. SPSS gives
the output in three separate tables. It is wise to start with the second table, which gives the
validity information.

Although the first table tells us that 85.2% of the variation in Maths marks can be
explained by the four explanatory variables, an invalid model would render
all this useless. It is therefore important to look at the following table before we
become excited about the good performance of the model.

Since the F-ratio = 15.852 has a p-value less than the standard significance level of
0.05, this model is valid. We can report the results knowing that the model
does not fit the data merely by chance.

The influence and relevance of each predictor variable shows that both Kiswahili
marks and the student's gender significantly determine Maths marks. Home background and performance in English are irrelevant in this model, but how do we interpret the significant
coefficients?
Answer
Since the unstandardized coefficient of Kiswahili is -0.433, we can tell that for
every increase in Kiswahili marks by 1 mark, Maths marks go down by 0.433
marks. That is, Kiswahili marks can be used to predict Maths marks, and the higher
the score in Kiswahili the lower the score in Maths. For gender this form of interpretation does not
make sense, but let us try. The unstandardized coefficient of Gender is 9.613. Do we
say that for every increase in Gender by 1 (what?), Maths marks go up by 9.613?
The correct way of saying this is that the presence of maleness increases the Maths score by
9.613 marks. In other words, being male is associated with a significant advantage
in Mathematics performance.
What happens when the variable has more than two categories? Just make it into k - 1
dichotomous variables, where k is the number of categories in the variable. E.g.
Boarding category (k = 3) would become 2 variables, and these would adequately represent the data. Suppose the variables are Boarding1 and Boarding2;
then the following gives a simple display of the expected entries in the data file:
            Boarding1   Boarding2
Day         1           0
Boarding    0           1
Mixed       0           0
In reality Boarding1 represents Day schools and Boarding2 represents Boarding
schools, while the absence of both implies a Mixed school.
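The k - 1 dummy coding rule can be sketched in Python as below. The function name `dummy_code` and the short list of schools are hypothetical; the reference category (here Mixed) is the one represented by zeros in every dummy column.

```python
def dummy_code(values, levels, reference):
    """Code a categorical variable with k levels as k - 1 columns of 0/1."""
    kept = [level for level in levels if level != reference]
    return [[1 if value == level else 0 for level in kept] for value in values]


# Hypothetical school types for four students
schools = ["Day", "Boarding", "Mixed", "Day"]

# Columns are [Boarding1 (Day), Boarding2 (Boarding)]; Mixed is the reference
coded = dummy_code(schools, levels=["Day", "Boarding", "Mixed"], reference="Mixed")
for school, row in zip(schools, coded):
    print(f"{school:<10} Boarding1={row[0]}  Boarding2={row[1]}")
```

Each coefficient on a dummy column is then read as the difference between that category and the reference category.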
7.3.2. Dealing with Interaction terms
The effects of two explanatory variables are not always additive. For example,
increasing the amount of nitrogen fertilizer (X) may improve the yield of wheat
(Y), but only at high temperatures (Z), as shown in the following figure.

In the illustrative diagram, we have only shown data at two values of Z to show
clearly that the increase in yield per unit increase in fertilizer (i.e. the slope for X)
is greater at high temperatures (high Z) than at low temperatures (low Z). To model
interaction between the effects of X and Z, a simple model that is often adequate
adds a term involving the product of X and Z to the linear model,
$Y = \beta_0 + \beta_1 X + \beta_2 Z + \beta_3 XZ + \varepsilon$
This model can be written as $Y = (\beta_0 + \beta_1 X) + (\beta_2 + \beta_3 X)Z + \varepsilon$
which is a general linear model because the unknown parameters appear linearly.
When considering how the expected response is affected by changes to X, it helps to group the terms in X instead:
$Y = (\beta_0 + \beta_2 Z) + (\beta_1 + \beta_3 Z)X + \varepsilon$
The slope for X is the term $\beta_1 + \beta_3 Z$, which involves Z: the effect of
increasing X by 1 unit depends on Z.
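A small Python sketch makes the dependence of the slope on Z concrete. The coefficient values are hypothetical and chosen only for illustration; the point is that the gain in expected Y from one extra unit of X equals the coefficient of X plus the interaction coefficient times Z.

```python
# Hypothetical coefficients of Y = b0 + b1*X + b2*Z + b3*X*Z (error term omitted)
B0, B1, B2, B3 = 2.0, 0.5, 0.1, 0.2


def expected_yield(x, z):
    """Expected response under the interaction model."""
    return B0 + B1 * x + B2 * z + B3 * x * z


def slope_for_x(z):
    """Change in expected Y per unit increase in X at a given Z."""
    return B1 + B3 * z


low_z, high_z = 0.0, 10.0
gain_low = expected_yield(4, low_z) - expected_yield(3, low_z)
gain_high = expected_yield(4, high_z) - expected_yield(3, high_z)

print(f"Gain from one more unit of X at low Z:  {gain_low:.2f}")
print(f"Gain from one more unit of X at high Z: {gain_high:.2f}")
```

With a positive interaction coefficient the same increase in fertilizer gives a larger gain in yield at the higher temperature.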
Suggested materials for further reading
1. Freund, J.E. and Williams, F.J. (1979). Modern Business Statistics. Pitman
Publishing Limited, London.
2. Gupta, S.C. and Kapoor, V.K. (1995). Fundamentals of Mathematical Statistics. Sultan Chand and Sons, New Delhi.
3. Keller, G., Warrack, B. and Bartel, H. (1994). Statistics for Management and
Economics. 3rd Edition. Wadsworth Publishing Company, Belmont California, USA.
4. W. Douglas Stirling. Computer-Assisted Statistics Textbooks. Palmerston
North, New Zealand. http://cast.massey.ac.nz/african (Freely available online)


LESSON 8
Non-Linear Regression Analysis
Learning outcomes
Upon completion of this lesson you should be able to;
1. Identify some useful models for non-linear regression analysis
2. Distinguish between a logit and probit model
3. Fit data with a binary dependent variable to an appropriate regression
model
4. Interpret results of a Cobb-Douglas production model
8.1. A linear model for proportions?
When we modelled how a numerical explanatory variable affected a numerical response variable, a linear equation was used. That is;
$y = b_0 + b_1 x$
When the response variable is categorical, it is tempting to try a similar linear equation to explain how the proportion in one response category is affected by the explanatory variable. That is, predicted proportion
$p = b_0 + b_1 x$
However, to model how a proportion depends on a numerical explanatory variable, X, an
equation should give values between 0 and 1 for all possible values of X. A straight line with
non-zero slope eventually leaves this range, which means that the equation must be nonlinear in X.
8.1.1. Logistic curve: A curve that lies between 0 and 1 for all values of X
A linear equation cannot provide adequate predictions of the proportion in a response category at extreme values of X. There are various nonlinear equations that
satisfy the requirement that their value is between 0 and 1 for all values of X, but
the simplest of these is a logistic curve. Predicted proportion
$$p = \frac{e^{b_0 + b_1 x}}{1 + e^{b_0 + b_1 x}}$$
Logistic curves satisfy the requirement because:
1. The numerator and denominator are always positive, so their ratio must be
positive too.
2. The denominator is 1.0 greater than the numerator, so the ratio must be less
than 1.
The goal is to model the probability of a particular outcome as a function of the predictor
variable(s).
8.1.2. The parameters of the logistic curve
The constants $b_0$ and $b_1$ have a similar effect on the shape of the logistic curve to
the corresponding parameters of a linear equation.
1. The parameter $b_0$ determines the horizontal position of the curve. When $b_1$ is positive, increasing
$b_0$ shifts the curve to the left.
2. The parameter $b_1$ determines the slope of the curve. Increasing its size makes the
curve steeper, and its sign determines whether the curve slopes upwards or
downwards.
We again call $b_0$ the intercept of the curve and $b_1$ the slope.
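The behaviour of the logistic curve can be checked numerically with a short Python sketch. The parameter values (intercept -4 and slope 0.8) are hypothetical; with them the curve passes through 0.5 at X = 5, which is where the linear part equals zero.

```python
import math


def logistic(x, b0, b1):
    """Predicted proportion from a logistic curve with intercept b0 and slope b1."""
    z = b0 + b1 * x
    return math.exp(z) / (1 + math.exp(z))


b0, b1 = -4.0, 0.8                    # hypothetical parameter values
xs = range(-10, 21)
proportions = [logistic(x, b0, b1) for x in xs]

p_mid = logistic(-b0 / b1, b0, b1)    # the curve crosses 0.5 where b0 + b1*x = 0
p_shifted = logistic(5, b0 + 1, b1)   # raising b0 moves the curve to the left

print(f"Smallest and largest predicted proportions: {min(proportions):.6f}, {max(proportions):.6f}")
print(f"Proportion at x = {-b0 / b1}: {p_mid}")
print(f"Proportion at x = 5 after raising b0 by 1: {p_shifted:.4f}")
```

Every predicted proportion stays strictly between 0 and 1, however extreme the value of X within this range.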
8.1.3. Multiple logistic regression
Recall that the value produced by logistic regression is a probability between 0.0
and 1.0. If the probability of membership in the modelled category is above
some cut point (the default is 0.50), the subject is predicted to be a member of the
modelled group. If the probability is below the cut point, the subject is predicted to
be a member of the other group.
For any given case, logistic regression computes the probability that a case with a
particular set of values for the independent variables is a member of the modelled
category. For a dichotomous variable Y (1 for Success and 0 for Failure) the
model for multiple input variables is;
$$p = P(Y = 1) = \frac{e^{b_0 + b_1 x_1 + b_2 x_2 + ... + b_k x_k}}{1 + e^{b_0 + b_1 x_1 + b_2 x_2 + ... + b_k x_k}}$$
or you can write a re-arranged equivalent model as
$$\ln\left(\frac{p}{1 - p}\right) = b_0 + b_1 x_1 + b_2 x_2 + ... + b_k x_k$$
1. Logistic regression analysis requires that the independent variables be metric
or dichotomous.
2. If an independent variable is nominal level and not dichotomous, the logistic
regression procedure in SPSS has an option to dummy code the
variable for you automatically. (For software without this option, create c - 1
dummy variables, where c is the number of categories.)
We consider an example given by UCLA Academic Technology Services. The
data were collected on 200 high school students and are scores on various tests,
including science, math, reading and social studies (socst), under the file name
hsb2.sav. The variable female is a dichotomous variable coded 1 if the student
was female and 0 if male. Because the raw data do not have a suitable dichotomous variable to use as our dependent variable, we create one (and call it honcomp,
for honors composition) based on the continuous variable write: recode all values
below 60 to zero (0) and all values of 60 and above to one (1). The data would look like
this.

The total data set consists of 200 cases, of which the first 17 are shown above.
The dependent or response variable is "honcomp" while the predictors are read,
science and ses. Run the logistic regression analysis in SPSS as follows:
1. Open hsb2.sav
2. Recode the variable write as explained above
3. Open the syntax file in SPSS and write the following command:
logistic regression honcomp with read science ses /categorical ses.
4. Run the command to get the logistic regression output, OR use the standard menu
to run the procedure.
5. Output analysis: A number of tables are produced. Here we look at those whose
details might not be straightforward.
Categorical Variable Codings

This table shows the automatic transformation of the categorical variable ses (which has
3 categories) into two dichotomous variables: low and middle. (Notice that when
ses is not low or middle, then it is high.)
1. Observed - This indicates the number of 0s and 1s that are observed in the
dependent variable.

2. Step 1 - This is the first step (or model) with predictors in it.
3. Chi-square and Sig. - This is the chi-square statistic and its significance
level. In this example, the statistics for the Step, Model and Block are the
same because we have not used stepwise logistic regression or blocking. The
value given in the Sig. column is the probability of obtaining a chi-square
statistic at least this large given that the null hypothesis is true. In other words, this is the probability of obtaining this chi-square statistic (65.588) or a larger one if there is in fact no effect
of the independent variables, taken together, on the dependent variable. This
is, of course, the p-value, which is compared to a chosen significance level, perhaps 0.05
or 0.01, to determine if the overall model is statistically significant. In this
case, the model is statistically significant because the p-value is reported as .000, i.e. less than
0.001.
4. df - This is the number of degrees of freedom for the model. There is one
degree of freedom for each predictor in the model. In this example, we have
four predictors: read, science and two dummies for ses.
5. -2 Log likelihood - This is the -2 log likelihood for the final model. By itself,
this number is not very informative. However, it can be used to compare
nested (reduced) models.
6. Cox & Snell R Square and Nagelkerke R Square - These are pseudo R-squares. Logistic regression does not have an equivalent to the R-squared that
is found in OLS regression; however, many people have tried to come up with
one. There is a wide variety of pseudo R-square statistics (these are only
two of them). Because these statistics do not mean what R-squared means in
OLS regression (the proportion of variance explained by the predictors), we
suggest interpreting them with great caution.

7. Observed - This indicates the number of 0s and 1s that are observed in the
dependent variable.
8. Predicted - These are the predicted values of the dependent variable based
on the full logistic regression model. This table shows how many cases are
correctly predicted (132 cases are observed to be 0 and are correctly predicted
to be 0; 27 cases are observed to be 1 and are correctly predicted to be 1), and
how many cases are not correctly predicted (15 cases are observed to be 0 but
are predicted to be 1; 26 cases are observed to be 1 but are predicted to be 0).
9. Overall Percentage - This gives the percentage of cases for which the dependent
variable was correctly predicted by the model. In this part of the output,
this is the full model. Note that 79.5% = 159/200, where 159 = 132 + 27.

10. B - This is the coefficient for the constant (also called the "intercept") in the
null model. S.E. - This is the standard error around the coefficient for the
constant.
11. Wald and Sig. - This is the Wald chi-square test of the null hypothesis
that the constant equals 0. This hypothesis is rejected because the p-value
(listed in the column called "Sig.") is smaller than the significance level of .05
(or .01). Hence, we conclude that the constant is not 0. Usually, this finding
is not of interest to researchers.
12. df - This is the degrees of freedom for the Wald chi-square test. There is
only one degree of freedom because there is only one predictor in the model,
namely the constant.
13. Exp(B) - This is the exponentiation of the B coefficient, which is an odds
ratio. This value is given by default because odds ratios can be easier to
interpret than the coefficients. In this case the odds ratio for ses(2) is 0.363,
implying that the odds of honors composition for those with a middle level of ses are 0.363 times the odds for
those in ses(3) or high level, i.e. lower by a factor of 1/0.363 = 2.75.
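Model: The prediction equation is
$$\ln\left(\frac{p}{1 - p}\right) = b_0 + b_1 x_1 + b_2 x_2 + b_{31} x_{31} + b_{32} x_{32}$$
Notice that for variable x3 (ses), we have two dichotomous variables x31 (ses(1)
or low) and x32 (ses(2) or middle). Here p is the probability of being in honors
composition. Expressed in terms of the variables used in this example, the logistic
regression equation is
$$\ln\left(\frac{p}{1 - p}\right) = b_0 + b_1\,\text{read} + b_2\,\text{science} + b_{31}\,\text{ses(1)} + b_{32}\,\text{ses(2)}$$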
Interpretation of parameters
read - For the variable read, the p-value is .000 (< 0.001), so the null hypothesis
that the coefficient equals 0 is rejected. For every one-unit increase in
reading score we expect a 0.098 increase in the log-odds of honcomp, holding all
other independent variables constant.
science - For the variable science, the p-value is .015, so the null hypothesis that the
coefficient equals 0 is rejected. For every one-unit increase in science
score, we expect a 0.066 increase in the log-odds of honcomp, holding all other
independent variables constant.
ses - For the variable ses, the p-value is .035, so the null hypothesis that the coefficient equals 0 is rejected. This tells you that the overall variable ses is statistically significant. There
is no coefficient listed, because ses itself is not a variable in the model. Rather, the dummy
variables which code for ses are in the equation, and those have coefficients. Because the test of the overall variable is statistically significant, you can look at the one degree of freedom tests for the dummies
ses(1) and ses(2). The dummy ses(1) is not statistically significantly different from
ses(3) (which is the omitted, or reference, category), but the dummy
ses(2) is statistically significantly different from ses(3), with a p-value
of .022. Since
the reference group is level 3 (see the Categorical Variable Codings table above),
the coefficient of ses(2) represents the difference between level 2 of ses and level 3.
In this case the odds ratio for ses(2) is 0.363, implying that the odds of honors composition for those with a middle level
of ses are 0.363 times the odds for those in ses(3) or high level, i.e. lower by a factor of 1/0.363 = 2.75.
df - This column lists the degrees of freedom for each of the tests
of the coefficients.
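The link between the B coefficients (log-odds) and the Exp(B) odds ratios can be illustrated with a few lines of Python. The coefficients 0.098 and 0.066 and the odds ratio 0.363 are the values quoted in the text; the baseline probability of 0.5 for a high-ses student is a hypothetical value used only to show how an odds ratio changes a probability.

```python
import math


def apply_odds_ratio(probability, odds_ratio):
    """Return the probability obtained after multiplying the odds by odds_ratio."""
    odds = probability / (1 - probability)
    new_odds = odds * odds_ratio
    return new_odds / (1 + new_odds)


# Coefficients (log-odds) quoted in the text, converted to odds ratios
or_read = math.exp(0.098)
or_science = math.exp(0.066)

# Odds ratio quoted for ses(2), converted back to a coefficient
or_ses2 = 0.363
b_ses2 = math.log(or_ses2)

# Hypothetical baseline: a high-ses student with probability 0.5 of honcomp = 1
p_high = 0.5
p_middle = apply_odds_ratio(p_high, or_ses2)

print(f"Exp(B) for read = {or_read:.3f}, for science = {or_science:.3f}")
print(f"B for ses(2) = {b_ses2:.3f}")
print(f"Probability for an otherwise identical middle-ses student = {p_middle:.3f}")
```

An odds ratio describes a multiplicative change in the odds, not in the probability itself, which is why the probability in this illustration does not simply fall by a factor of 2.75.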
Exercise 14. The following tables are part of the SPSS output for a logistic regression fitted to part of the hsb2.sav data.

Using the example above, interpret the model comprehensively and write down the
fitted equation.
8.2. Probit Model
Unlike the logistic model, which uses the cumulative logistic distribution (logit), the
probit model uses the cumulative standard normal distribution (probit) given by
$$p = \Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt$$
where $z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_k x_k$. Both logit and probit models assume that
the dependent variable Y is dichotomous.
Note
1. Choosing between logit and probit: In the dichotomous case, there is no basis in
statistical theory for preferring one over the other. In most applications it
makes no difference which one is used.
2. If we have a small sample the two models can differ noticeably in their
results, but they are quite similar in large samples.
3. Various $R^2$ measures have been devised for logit and probit but they are ad
hoc and cannot be compared to $R^2$ in linear regression analysis.
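The similarity of the two link functions can be seen numerically. The sketch below compares the cumulative standard normal curve with a logistic curve over a grid of values; the rescaling factor of 1.7 applied to the logistic argument is a commonly quoted approximation and is an assumption of this illustration, not something taken from these notes.

```python
import math
from statistics import NormalDist


def probit_probability(z):
    """Cumulative standard normal probability used by the probit model."""
    return NormalDist().cdf(z)


def logit_probability(z):
    """Cumulative logistic probability used by the logit model."""
    return 1 / (1 + math.exp(-z))


SCALE = 1.7   # assumed rescaling that brings the logistic curve close to the normal

grid = [i / 10 for i in range(-40, 41)]          # z from -4.0 to 4.0
gaps = [abs(probit_probability(z) - logit_probability(SCALE * z)) for z in grid]
largest_gap = max(gaps)

print(f"Both curves at z = 0: {probit_probability(0)}, {logit_probability(0)}")
print(f"Largest gap between the two curves on the grid: {largest_gap:.4f}")
```

After rescaling, the two curves never differ by more than about one percentage point on this grid, which is why the fitted probabilities from the two models are usually very close even though their coefficients are on different scales.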
For a detailed SPSS example on probit analysis, search for "Annotated SPSS Output
Probit Regression". It uses a data file named probit.sav on undergraduates applying to
graduate school, which includes undergraduate GPAs, the reputation of the
undergraduate school (a topnotch indicator), the student's GRE score, and whether or
not the student was admitted to graduate school, and gives a detailed explanation
of the results.
8.3. Cobb-Douglas functional form of production functions
The Cobb-Douglas production function is widely used to represent the relationship
of output and two inputs. The Cobb-Douglas form was developed and tested against
statistical evidence by Charles Cobb and Paul Douglas during 1900-1947.
8.3.1. Formulation
In its most standard form for production of a single good with two factors, the
function is
$Y = A L^{\beta} K^{\alpha}$
where:
Y = total production (the monetary value of all goods produced in a year)
L = labor input (the total number of person-hours worked in a year)
K = capital input (the monetary worth of all machinery, equipment, and buildings)
A = total factor productivity
$\alpha$ and $\beta$ are the output elasticities of capital and
labor, respectively. These values are constants determined by available technology.
Output elasticity measures the responsiveness of output to a change in the level of
either labor or capital used in production, all other things being held constant. For example, if $\beta = 0.15$, a 1% increase in labor would lead to approximately a
0.15% increase in output. Empirically it was found that 75% of the increase in output can
be attributed to the increase in labour input and the remaining 25% to capital
input. It was also found that the sum of the exponents of the Cobb-Douglas production
function is equal to one, that is $\alpha + \beta = 1$. This implies that it is a linearly homogeneous
production function. The following are important features of the Cobb-Douglas production
function;
1. Average Product of factors of production used up in this function depends
upon the ratio in which the factors are combined for the production of commodity under consideration

2. Marginal Product of factors of production used up in this function also depends upon the ratio in which the factors are combined for the production of
commodity under consideration
3. Cobb-Douglas production function is used in obtaining marginal rate of technical substitution (the rate at which one input can be substituted for the other
to produce same level of output) between two inputs.
4. The sum of the exponents of the Cobb-Douglas production function,
\( \alpha + \beta \), is a measure of returns to scale.
(a) If \( \alpha + \beta = 1 \), returns to scale are constant,
(b) if \( \alpha + \beta < 1 \), returns to scale are decreasing, and
(c) if \( \alpha + \beta > 1 \), returns to scale are increasing.
Cobb and Douglas were influenced by statistical evidence that appeared to show
that labor and capital shares of total output were constant over time in developed
countries; they explained this by fitting their production function to the data
by least-squares regression. There is now doubt over whether this constancy over
time exists.
[These notes can be found in Wikipedia]
8.3.2. Application
Using available data, we can take the natural log of each data series to create
variables that are in log levels rather than levels. This gives
\[ \ln Y = \ln A + \beta \ln L + \alpha \ln K \]
We then carry out standard regression analysis on the transformed data.
Example. Suppose that for some data the fitted regression equation, with all
parameters significant, is
\[ \ln Y_t = 7.08 + 0.94 \ln L_t + 0.51 \ln K_t \]
with \( R^2 = 0.9975 \). Interpret this model.
Solution: If all parameters are significant at the 5% level, their corresponding
p-values are less than 5%. The 0.94 estimate for \( \beta \) indicates that a 10
percent increase in L leads to approximately a 9.4 percent increase in the output
level Y, holding K constant. Since 0.94 < 1, there are diminishing returns to
labor. Similarly, the 0.51 estimate for \( \alpha \) indicates that a 10 percent
increase in K leads to approximately a 5.1 percent increase in the output level,
which implies there are diminishing returns to capital.

However, \( \alpha + \beta = 0.51 + 0.94 = 1.45 > 1 \), which implies production
exhibits increasing returns to scale. Increasing returns to scale means a
proportionate increase in all inputs leads to a more than proportional increase in
the output. For example, doubling all inputs would lead to more than a doubling of
output. In this case, a 10 percent increase in all inputs leads to approximately a
14.5 percent increase in the output level. For a change as large as a doubling of
all inputs the percentage approximation is no longer accurate: output is multiplied
by \( 2^{1.45} \approx 2.73 \), an increase of about 173 percent rather than 145
percent.
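The effect of scaling both inputs can be checked numerically. The following is a minimal Python sketch that uses only the coefficients quoted in the example; the function names and the starting input levels (100 units of labor, 50 units of capital) are hypothetical choices made for illustration.

```python
import math

# Coefficients taken from the fitted equation ln Y = 7.08 + 0.94 ln L + 0.51 ln K.
LN_A, BETA_L, ALPHA_K = 7.08, 0.94, 0.51

def predicted_output(labor, capital):
    """Predicted output Y implied by the fitted log-linear equation."""
    return math.exp(LN_A + BETA_L * math.log(labor) + ALPHA_K * math.log(capital))

def scale_factor(k, labor=100.0, capital=50.0):
    """Ratio of predicted outputs when both inputs are multiplied by k.

    The starting levels of labor and capital are arbitrary illustrative values;
    for a Cobb-Douglas function the ratio does not depend on them.
    """
    return predicted_output(k * labor, k * capital) / predicted_output(labor, capital)

returns_to_scale = BETA_L + ALPHA_K  # sum of the two elasticities

print("Sum of elasticities:", round(returns_to_scale, 2))
print("Output multiplier for a 10% rise in all inputs:", round(scale_factor(1.10), 4))
print("Output multiplier when all inputs are doubled:", round(scale_factor(2.0), 4))
```

Because the ratio equals k raised to the sum of the elasticities, any starting levels give the same multipliers.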
Suggestions for further reading /reference
1. Johnston, J. (1972). Econometric Methods, 2nd Edition, McGraw-Hill Kogakusha, Ltd, Tokyo.
2. http://www.ats.ucla.edu/stat/spss/output/SPSS_probit.htm
3. http://www.ats.ucla.edu/stat/spss/output/logistic.htm


LESSON 9
Index numbers
Learning outcomes
Upon completion of this lesson you should be able to;
1. Define an index number and describe its properties.
2. Compute simple and composite price indices
3. Compute Laspeyres, Paasche and Fisher's indices and interpret them
4. Apply the process of deflating to time series data.
9.1. Index numbers
An index number measures the value of an item (or group of items) at a point in
time as a percentage of the value of the item (or group of items) at another fixed
time point.
9.1.1. Price and quantity indices
There are many types of indexes (or indices). For example, price indices are used
to measure changes in prices of items over time, while quantity indices are used to
measure changes in quantities such as imports or exports over time. In this section,
only price indices are considered, but many of the principles and formulae carry
over to other types of index numbers. Price indices are widely used to describe business
activity. Index numbers describing consumer prices and stock market prices are
widely used and reported in the media. An index number can describe a specific
category of item or may be more general.
9.1.2. CPI and stock market indices
The Consumers Price Index (CPI) in a country summarises the overall price level
of goods and services purchased by households at different times. Other price indices describe prices of energy, accommodation and various classes of food. Stock
market indices such as the NSE 20 share index, Dow Jones (USA), FTSE 100 (UK)
and NZX50 (New Zealand) are used to summarize changes in the value of company
shares in specific countries.
9.2. Simple price index
A simple price index measures the price of a single item or commodity as a percentage of the price of the same item at a fixed time, normally in the past. The fixed
time is called the base period and may be chosen for convenience (e.g. January 1st
for daily data, or 2010 for annual data). Assuming the time period is years, if
\( P_0 \) denotes the price in the base year and \( P_i \) denotes the price in
year i, then the index number for year i is given by
\[ I_i = \frac{P_i}{P_0} \times 100 \]
The simple price index is just the current price expressed as a percentage of the
base year price. However, some index numbers use a factor of 1000 rather than 100,
especially if it is desired to express the index as a whole number.
Example.
The table below shows the spot price of European Brent Oil (in US dollars per
barrel) from 2000 to 2005. Find the index numbers using 2000 as the base year.

Solution:
Using 2000 as the base year, the price index for year 2001 equals
\[ I_{2001} = \frac{P_{2001}}{P_{2000}} \times 100 = 85.35 \]
Similarly the price index for year 2002 equals
\[ I_{2002} = \frac{P_{2002}}{P_{2000}} \times 100 = 87.19 \]
The full series is shown below.

Year     2000    2001    2002    2003    2004    2005
Index    100     85.35   87.19   100.66  133.50  190.40

The index allows us to measure changes as a percentage of the base year. In 2001
the price was about 15% lower than in 2000 (85.35 - 100 = -14.65) but by 2005 it
was about 90% higher (190.40 - 100 = 90.40). Note that:
An index value below 100 means that the price that year was lower than the
base year
An index value above 100 means that the price that year was higher than the
base year
The base year always has an index of 100 (or 1000).
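The calculation of a simple price index can be sketched in a few lines of Python. The prices below are hypothetical values chosen for illustration (they are not the Brent oil prices), and the function name is an assumption.

```python
def simple_price_index(prices, base_period, factor=100):
    """Express each price as a percentage (or per-mille) of the base-period price.

    prices: dict mapping period -> price
    factor: 100 for a percentage index, 1000 for a per-thousand index
    """
    base_price = prices[base_period]
    return {period: price / base_price * factor for period, price in prices.items()}

# Hypothetical prices, for illustration only.
prices = {2000: 40.0, 2001: 34.0, 2002: 50.0}
index = simple_price_index(prices, base_period=2000)

for year, value in index.items():
    print(year, round(value, 2))
```

With these made-up prices, the 2001 value falls below 100 (the price fell relative to 2000) and the 2002 value lies above 100.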
Changing the base year
In practice the base year is revised from time to time so that comparisons can be
made with a recent (i.e. not ancient) price value. For example the quarterly New
Zealand Consumers Price Index has a current base of June 2006. Converting an
existing index to a new base is quite straightforward.
\[ I_{\text{new}} = \frac{I_{\text{existing}}}{I_{\text{newbase}}} \times 100 \]
Here "new" refers to the index using the new base, "existing" refers to the index
using the existing base, and "newbase" refers to the index for the new base year
using the existing base year.
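As an illustration of changing the base, the sketch below rebases a few values of the oil price index series from base 2000 to base 2005. The function name is hypothetical; the index values are those listed in the series above.

```python
def rebase_index(index, new_base_period, factor=100):
    """Convert an index series to a new base period.

    Each value is divided by the existing index value of the new base period
    and multiplied by the factor (100 or 1000).
    """
    new_base_value = index[new_base_period]
    return {period: value / new_base_value * factor for period, value in index.items()}

# Index values on base 2000, taken from the series above.
index_base_2000 = {2000: 100.0, 2003: 100.66, 2004: 133.50, 2005: 190.40}
index_base_2005 = rebase_index(index_base_2000, new_base_period=2005)

for year, value in index_base_2005.items():
    print(year, round(value, 2))
```

The new base year comes out as 100 and all other years are expressed relative to it; ratios between any two years are unchanged by rebasing.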
9.3. Aggregate price index
An aggregate price index combines the prices of several related items into a
single index number. The group of items is sometimes referred to as a market
basket, or basket for short. There are many examples of aggregate indices. The
NZX50 index aggregates the prices of the top 50 companies (as measured by market
capitalisation) listed on the New Zealand Stock Exchange; the Nairobi 20 share
index is another example. The quarterly Consumers Price Index (or CPI) aggregates
the prices of a range of food and related household shopping items and is commonly
used as a measure of price inflation in an economy.
9.3.1. Unweighted aggregate price index
There are two types of aggregate price indices. The simpler type is known as an
unweighted aggregate price index and is so called because it gives equal weight
to each item in the basket. If there are n items in the basket then the unweighted
aggregate price index at time i is given by
\[ I_i = \frac{\sum_{j=1}^{n} P_i^{(j)}}{\sum_{j=1}^{n} P_0^{(j)}} \times 100 \]
where \( P_i^{(j)} \) and \( P_0^{(j)} \) denote the prices of the jth item in the
basket at time i and at the base time respectively.


9.4. Laspeyres and Paasche indices


An unweighted price index treats each item in the basket equally. For a price index,
this is usually equivalent to assuming that a consumer purchases the same amount
of each item. In practice this is usually not the case and so a weighted aggregate
price index weights the price of each item by the quantity. There are two ways of
doing this;
9.4.1. Laspeyres index
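This uses the quantities of items in the base period as weights. The formula for
computing the index is given by
\[ I_i^{L} = \frac{\sum_{j=1}^{n} P_i^{(j)} Q_0^{(j)}}{\sum_{j=1}^{n} P_0^{(j)} Q_0^{(j)}} \times 100 \]
where \( Q_0^{(j)} \) denotes the quantity of the jth item in the basket at the
base time.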
Exercise 22
Apply the formula for the Laspeyres index to the prices for 2006 for the data in
the table above, assuming the quantity for 2006 was the same as that for 2005.
9.4.2. Paasche index
It uses the quantities of items in the current period as weights, taking into
account variations in consumption patterns over time. For example, there may be a
trend for consumers to use margarine instead of butter between 2000 and 2010. The
Laspeyres index for dairy products in 2010 is based on out-of-date consumption
patterns from 2000, whereas the Paasche index for 2010 is based on the current
consumption of the items. The formula is
\[ I_i^{P} = \frac{\sum_{j=1}^{n} P_i^{(j)} Q_i^{(j)}}{\sum_{j=1}^{n} P_0^{(j)} Q_i^{(j)}} \times 100 \]
where \( Q_i^{(j)} \) denotes the quantity of the jth item in the basket at time i.

EXERCISE 15. Obama owns stock in three companies. Shown below are the
price per share at the end of 2000 and 2007 for the three stocks and the quantities
he owned in 2000 and 2007. Using 2000 as the base year, compute the Laspeyres
Weighted Price Index (LI), the Paasche Weighted Price Index (PI) and the Value Index
(VI). Interpret the value index.
Company Price
Quantity
10

22

35

30

1.5

60

70

10

9.4.3. Fisher's Ideal Index

The Laspeyres index tends to overweight goods whose prices have increased, and the
Paasche index, on the other hand, tends to overweight goods whose prices have gone
down. Fisher's ideal (FI) index was developed in an attempt to offset these
shortcomings. It is the geometric mean of the Laspeyres and Paasche indexes. That is,
\[ FI = \sqrt{LI \times PI} = \sqrt{\text{Laspeyres} \times \text{Paasche}} \]
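The three weighted indices can be compared on a small basket. The following Python sketch uses a hypothetical two-item basket (butter and margarine, with made-up prices and quantities); the function names are also assumptions made for illustration.

```python
import math

def laspeyres(p0, pt, q0):
    """Laspeyres price index: base-period quantities as weights."""
    return sum(pt[j] * q0[j] for j in p0) / sum(p0[j] * q0[j] for j in p0) * 100

def paasche(p0, pt, qt):
    """Paasche price index: current-period quantities as weights."""
    return sum(pt[j] * qt[j] for j in p0) / sum(p0[j] * qt[j] for j in p0) * 100

def fisher(p0, pt, q0, qt):
    """Fisher's ideal index: geometric mean of Laspeyres and Paasche."""
    return math.sqrt(laspeyres(p0, pt, q0) * paasche(p0, pt, qt))

# Hypothetical basket: butter becomes dearer and consumers switch to margarine.
p0 = {"butter": 4.0, "margarine": 2.0}   # base-period prices
pt = {"butter": 6.0, "margarine": 2.0}   # current-period prices
q0 = {"butter": 10.0, "margarine": 5.0}  # base-period quantities
qt = {"butter": 5.0, "margarine": 10.0}  # current-period quantities

print("Laspeyres:", round(laspeyres(p0, pt, q0), 2))
print("Paasche:  ", round(paasche(p0, pt, qt), 2))
print("Fisher:   ", round(fisher(p0, pt, q0, qt), 2))
```

In this made-up basket the Laspeyres value exceeds the Paasche value, because the item whose price rose is weighted more heavily by the base-period quantities; the Fisher index lies between the two.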
EXERCISE 16. Using the previous example, obtain Fisher's ideal index.
Tests for an ideal index number
A perfect index number is expected to satisfy the time reversal test: if we reverse
the time subscripts of a price (or quantity) index, the result should be the
reciprocal of the original index, that is
\[ P_{0n} \times P_{n0} = 1 \]
where \( P_{0n} \) is the price index for the current year n with base period 0 and
\( P_{n0} \) is the price index for year 0 with base period n, both expressed as
ratios rather than percentages. The Laspeyres index does not satisfy the time
reversal test if
\[ P^{L}_{0n} \times P^{L}_{n0} \neq 1 \]
Similarly, the Paasche index does not satisfy the time reversal test if
\[ P^{P}_{0n} \times P^{P}_{n0} \neq 1 \]
Other tests that may be used for an ideal index number are the factor reversal
test, the circular test and the proportionality test.
Exercise
Use the previous example to test whether the Paasche and Laspeyres price indices
are ideal indices using the time reversal test.
9.5. Deflating a time series
Many time series display the effects of more than one variable changing over time.
For example, changes in the NZ price of an item sourced in the USA will reflect
changes in the NZ$/US$ exchange rate as well as changes in the US$ price. If an
index is available which measures the effect of such a variable then its effect can
be removed by a process of deflating. The idea is similar to that of detrending or
deseasonalising a time series (see the relevant section). If \( X_i \) denotes the
time series value at any time i, and \( I_i \) and \( I_0 \) denote the index
values at time i and the base time respectively, then the deflated value
\( D_i \) is given by
\[ D_i = X_i \times \frac{I_0}{I_i} \]
9.5.1. Correcting for inflation
This kind of adjustment is often used to take account of inflation. Although it is
interesting to know that Tarakihi cost $25.43 per kg in 2008 but only $19.20 per kg
in 2005, an increase in price is hardly surprising when wages and all other prices
rose in that period. The Consumer Price Index (CPI) is often used to adjust for
inflation. Since the CPI was 953 in 2005 and 1044 in 2008 (based on a CPI of 1000
in June 2006), the price of Tarakihi in 2008 can be expressed in "2005 dollars" as:
\[ 25.43 \times \frac{953}{1044} = 23.21 \]
In 2005 dollars, the price of Tarakihi rose from $19.20 in 2005 to $23.21 in 2008.
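The Tarakihi calculation can be reproduced with a short Python sketch. The function and argument names are hypothetical; the price and CPI figures are those quoted above.

```python
def deflate(value, index_at_time, index_at_base):
    """Express a value observed at one time in the prices of the base time."""
    return value * index_at_base / index_at_time

# 2008 price of Tarakihi expressed in 2005 dollars (CPI: 953 in 2005, 1044 in 2008).
price_2008_in_2005_dollars = deflate(25.43, index_at_time=1044, index_at_base=953)
print(round(price_2008_in_2005_dollars, 2))  # 23.21

# Real (inflation-adjusted) change relative to the 2005 price of 19.20.
real_change_percent = (price_2008_in_2005_dollars / 19.20 - 1) * 100
print(round(real_change_percent, 1))
```

The second figure is the price rise after removing general inflation, which is smaller than the rise in the raw dollar prices.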
EXERCISE 17. The table below shows New Zealand's Gross Domestic Product
from 2002 to 2008 together with the CPI (Source: Statistics NZ website). Note that
the CPI uses a factor of 1000 rather than 100 and has a base quarter of June 2006.
New Zealand GDP ($millions) and CPI

Suggested materials for further reading


1. Gupta, S.C. and Kapoor, V.K. (1995). Fundamentals of Mathematical Statistics. Sultan Chand and Sons, New Delhi.
2. Keller, G., Warrack, B. and Bartel, H. (1994). Statistics for Management and
Economics. 3rd Edition. Wadsworth Publishing Company, Belmont California, USA.
3. W. Douglas Stirling. Computer-Assisted Statistics Textbooks. Palmerston
North, New Zealand. http://cast.massey.ac.nz/african (Freely available online)
4. Mason, R. D., Lind, D. A. and Marchal, W. G. (1999). Statistical Techniques in Business and Economics. Irwin McGraw-Hill, Boston. ISBN-10:
0256263078, ISBN-13: 978-0256263077, Edition: 10th
5. Douglas Lind, William Marchal, Samuel Wathen (2009). Statistical Techniques in Business and Economics with Student CD [Hardcover], ISBN-10:
0077309421, ISBN-13: 978-0077309428, Edition: 14
6. Thomas H. Wonnacott, Ronald J. Wonnacott (1990) Introductory Statistics
for Business and Economics, 4th Edition, John Wiley and Sons Inc. [Hardcover] ISBN-10: 047161517X , ISBN-13: 978-0471615170


LESSON 10
Basics of Time Series Analysis
Learning outcomes
Upon completion of this lesson you should be able to;
1. Identify and describe the main components of a time series
2. Smooth a given time series
3. Fit a given time series to a linear trend curve
4. Fit seasonal data to Exponential Model
5. Forecast future values of a time series
10.1. Time series data
Many data sets contain measurements that are made sequentially at regular intervals. These data are called time series.
Exporters look at recent currency exchange rates to help predict future movements that will affect the price of their products in foreign markets.
Manufacturers collect data regularly on the quality of their products. For example, the fat content of milk is likely to be recorded daily by a bottling plant.
Climatologists analyze historical records of weather to assess the evidence for
global warming.
Retail chains monitor changes in the population in different regions to help
determine where new stores should be located.
Health scientists examine time series of the number of influenza cases to help
predict demand for vaccines.
Definition 7
A discrete time series is a sequence of observed values
\( \{x_1, x_2, \ldots, x_n\} \) measured at discrete times
\( \{t_1, t_2, \ldots, t_n\} \). In other words, a time series is any statistical
data that is arranged according to the time it was recorded or observed
(chronologically). The time interval is usually regular and may be any of the
naturally existing time units (milliseconds, seconds, minutes, hours, days, weeks,
months, years, decades, ..., millennia). Time series data are widely analysed in
the business world: accurate forecasting of exchange rates, share prices, demand
for products and other business variables can have a major effect on
profitability. There are numerous reasons to record and analyze the data of a time
series. The main ones are:
To explore and extract signals (patterns) contained in time series in order to
gain a better understanding of the data generating mechanism,
To explain (i.e. variation in one time series may be used to explain variation
in another time series)
To make forecasts (i.e. predict future values)
To use the acquired knowledge to optimally control systems and processes.
10.2. Types of time series data
Time series data arise in various different contexts.
Measurements relating to events that occur at discrete times e.g. the dividend
paid out each year on British Airways shares
Regular snapshots of a continuous process e.g. the Consumer Price Index at
the end of each month
Quantities that are aggregated over a period e.g. the numbers of admissions
to a hospital each day, or monthly energy consumption
Further, the measurements themselves may be of various different types.
Continuous e.g. fat content in homogenized milk produced by a bottling
factory, or the signal of an electric voltage passing a particular point
Discrete e.g. number of complaints received by a department store each day
In this course we do not need to further distinguish between the different types of
time series but our main focus will be on discrete type.

10.3. Components of a time series
There are four components (forces that determine the observed values) of a time
series. A few patterns in time series are particularly important.
10.3.1. Trend
These are the long term movements which give the general way in which the data
move over a long period of time. A graph of observed value versus time may show
small ups and downs, but the long term movement eliminates these minor variations
and looks at the big picture. If you draw a graph of this trend, it is called a trend
curve. Identifying trend is important since we might use it to help forecast future
values.
10.3.2. Cyclic Movements
These are movements that happen in regular long-term cycles. In business, for
example, cycles consist of alternating periods of recession and inflation, recovery
and prosperity. Cyclic movements are large scale and should be very clear
in the data. Examples include the collapse of the Kenya Bus Service and of Uchumi
supermarkets.
10.3.3. Seasonal Movements
These are also movements that happen in regular cycles, but repeat yearly. The patterns are caused by either natural conditions such as weather fluctuations or manmade conditions such as business, administrative, political procedures, start and end
of semester, Easter Holidays, festive seasons etc.
10.3.4. Random or irregular fluctuations
These are ups and downs in a time series that do not correspond to trend, seasonal
variation or autocorrelation. They are unpredictable and are the result of chance
events such as strikes, floods, earthquakes, plane crashes, post-election violence etc.
The timeplot
The key components can easily be seen in Figure 10.1. (The data used here are
available online and also as part of the Gretl open source software. Download them
and confirm the features.)


Figure 10.1: Time plot of US Airline passengers data in thousands


It is often difficult to get useful information from time series if they are presented
in tabular form. As seen in the Figure 10.1, information in a time series is most
easily understood from a graphical display. A time series plot is a type of dot plot
in which the values are displayed as crosses against a vertical axis. The horizontal
axis spreads out the crosses in time order. (It can also be thought of as a scatter plot
in which the explanatory variable is time.) The figure clearly brings out the feature
discussed earlier. The trend is steady and upwards, seasonal fluctuations dominate
the data and some irregular patterns are also evident.
EXERCISE 18. The table below shows the number of driving licenses approved
for citizens in Mombasa, Kenya each year from 1978 to 2001. Plot the series and
describe its features (you may use Excel, R or Gretl).

Multiple time series


Several related time series can be superimposed with different colours on the same
display, making comparisons easier. The crosses at the individual data points are
often omitted to reduce the clutter of the display.
10.4. Smoothing of a time series
In a time series, random fluctuations can usually be treated as noise that can obscure
trend and other signal in the series. Various smoothing methods have been proposed
to reduce these random fluctuations and show the systematic movement in the series
more clearly. These methods replace each value in the series with a function of
it and the adjacent values:
Smoothed value = centre(original value and adjacent values)

Figure 10.2: A smoothed value may therefore be for "year 2005.5" which is far
from ideal.

For example, each value might be replaced by the mean of it and the two
adjacent values, replacing the value at time i by
\[ \tilde{x}_i = \operatorname{mean}(x_{i-1}, x_i, x_{i+1}) \]
This smoothed fit is called a 3-point moving average. Moving averages are also
called running means. Greater smoothing is obtained with means of more adjacent
values. For example, a 5-point moving average replaces each value with the mean
of it and the 2 adjacent values on each side.
Loss of ends of values
Moving averages are effective at highlighting the trend in the centre of a time series,
but cannot be used at the ends since the moving average requires values both before
and after each value being smoothed. As the span (order of moving average) of
smoothing increases, the number of un-smoothed values at the ends of the series
also increases. For example, if 7-point moving averages are used, 3 values at each
end of the series cannot be smoothed.
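A moving average with an odd span, including the loss of values at the ends, can be sketched as follows. The series is hypothetical and the function name is an assumption; unsmoothed end positions are marked with None.

```python
def moving_average(values, span):
    """Centred moving average for an odd span.

    Positions closer than span // 2 to either end cannot be smoothed
    and are returned as None.
    """
    if span < 1 or span % 2 == 0:
        raise ValueError("span must be a positive odd number")
    half = span // 2
    smoothed = [None] * len(values)
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        smoothed[i] = sum(window) / span
    return smoothed

# Hypothetical series, for illustration only.
series = [2, 4, 6, 5, 9, 7, 11]
print(moving_average(series, 3))
print(moving_average(series, 5))
```

With a 3-point average one value is lost at each end; with a 5-point average two values are lost at each end, in line with the rule described above.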
10.4.1. Moving average with odd and even run lengths
A moving average provides a smoothed value at the middle of the times of the
values being averaged. For example, if the run length is 5, the smoothed value
is identified with the middle time. This works fine for moving averages of odd
numbers of values, as used above. However, if moving averages are found using an
even number of values, the resulting smoothed values are for half-way between the
times of the middle observations.
A second stage of averaging
To avoid this problem, it is conventional to post-process the moving averages with
an even run length by taking a further 2-point moving average to get values
centered on the original times.
This is equivalent to giving half weight to the two outermost values. When based
on a 4-point moving average, this method therefore uses a weighted average of the
3 values centered on each value and two further values with half-weight, that is,
\[ \tilde{x}_i = \frac{\tfrac{1}{2} x_{i-2} + x_{i-1} + x_i + x_{i+1} + \tfrac{1}{2} x_{i+2}}{4} \]
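The centred moving average for an even run length can be written directly in its half-weight form. In the Python sketch below the function name and the quarterly data are hypothetical: the data are constructed as a linear trend (10 + t) plus a seasonal pattern that repeats every four quarters and sums to zero, so the smoothed values should recover the trend.

```python
def centred_moving_average(values, span):
    """Centred moving average for an even span (e.g. 4 for quarterly data).

    Equivalent to a span-point moving average followed by a 2-point moving
    average: the two outermost values in each window get half weight.
    """
    if span < 2 or span % 2 != 0:
        raise ValueError("span must be a positive even number")
    half = span // 2
    smoothed = [None] * len(values)
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]  # span + 1 values
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        smoothed[i] = total / span
    return smoothed

# Hypothetical quarterly data: trend 10 + t plus seasonal effects (3, -1, -2, 0).
seasonal = [3, -1, -2, 0]
quarterly = [10 + t + seasonal[t % 4] for t in range(8)]

print(quarterly)
print(centred_moving_average(quarterly, 4))
```

In this constructed case the seasonal effects cancel exactly within each window, which is why the run length is chosen to match the period of the data.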

Example . Given the following seasonal data, obtain the smoothed trend values
assuming the additive model

Solution:
To completely destroy seasonal patterns the order of the moving average should be
a multiple of 4 (the period of the data). In this case we use 4 in order to get Moving
averages whose main component is the trend.


10.4.2. Robust smoothing
Moving averages and running medians each have their advantages and disadvantages.
Moving averages are more affected by outliers in the series.
Running medians often have a stepped appearance: the smoothed series is
level for periods, followed by relatively sharp jumps.

10.4.3. Running medians, followed by moving averages
To take advantage of the best features of both moving averages and running medians, these two techniques are often applied sequentially.
Firstly, low-order running medians are used to remove the influence of outliers.
The resulting series is then further smoothed with low-order moving averages.
10.4.4. Limitations of moving averages
We used moving averages to smooth out the seasonal variation in a time series, but
this method has serious limitations.
Moving averages cannot be used for the ends of a time series. For monthly
data, this means that we cannot remove the seasonal variation from the last 6
months, usually the most important part of the series.
The smoothing is only local. The moving average only uses values in the
current cycle so we are not using information from other cycles to determine
the seasonal pattern more accurately.
The method does not provide forecasts of future values in the series.
10.5. Long-term trend and Forecasting
Moving averages provide a good description of the trend in a time series. However
a common goal in time series analysis is to forecast values of the time series in
the future. For example, accurate forecasting of the demand for a product allows
production capacity to be adjusted in time to meet changes to the demand. Moving
averages cannot smooth the end values of the series and do not provide a method to
extend the trend into the future. The best that can be done is to extend the trend by
eye, which is hardly an objective forecasting method!
10.5.1. Least squares for a polynomial fit
Linear trend
An alternative is to describe the trend with a mathematical equation
which models the trend as a function of time,
\[ \text{trend}_t = f(t) \]
where the function usually involves some constants (parameters) that can be
adjusted to improve the fit of the model. The simplest such model is a linear
model of the form
\[ \text{trend}_t = b_0 + b_1 t \]
This model has the same form as the linear regression models. The residuals are
the differences between the actual time series values, \( y_t \), and the model's
predictions,
\[ e_t = y_t - \text{trend}_t \]
and the two model parameters are estimated by least squares to minimize the sum
of squares of residuals,
\[ S = \sum_t e_t^2 = \sum_t (y_t - b_0 - b_1 t)^2 \]
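The least squares estimates for a linear trend have a closed form, which the Python sketch below applies. The series is hypothetical, time is coded as t = 1, 2, ..., n (an assumption; any equally spaced coding works), and the function names are illustrative.

```python
def fit_linear_trend(y):
    """Least squares estimates (b0, b1) for trend = b0 + b1 * t, with t = 1..n."""
    n = len(y)
    t = list(range(1, n + 1))
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    s_ty = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    s_tt = sum((ti - t_mean) ** 2 for ti in t)
    b1 = s_ty / s_tt
    b0 = y_mean - b1 * t_mean
    return b0, b1

def forecast(b0, b1, t):
    """Trend value at time t, which may lie beyond the observed data."""
    return b0 + b1 * t

# Hypothetical annual values, for illustration only.
sales = [12.0, 15.0, 14.0, 18.0, 21.0, 20.0]
b0, b1 = fit_linear_trend(sales)
residuals = [yi - forecast(b0, b1, ti) for ti, yi in enumerate(sales, start=1)]

print("b0 =", round(b0, 3), "b1 =", round(b1, 3))
print("forecast for t = 7:", round(forecast(b0, b1, 7), 3))
```

The residuals from a least squares fit with an intercept sum to zero, which gives a quick check on the arithmetic.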
Exercise 26
Find the trend equation for the time series below

Quadratic trend
A linear trend is not appropriate for all time series. Many trends have curvature
which must be described with a more complex model. We now briefly describe
fitting a quadratic trend of the form
\[ \text{trend}_t = b_0 + b_1 t + b_2 t^2 \]
A quadratic curve of this form has three parameters that can be adjusted to improve
the fit of the model. We again define residuals to be the differences between the
actual time series values, \( y_t \), and the model's predictions,
\[ e_t = y_t - \text{trend}_t \]
The least squares estimates of the three parameters are again the values that
minimize the residual sum of squares,
\[ S = \sum_t e_t^2 \]
To decide which model is more appropriate, compare the adjusted \( R^2 \) and the
standard error with those of the linear model to see if the quadratic model is an
improvement.
Dangers in forecasting
It is important to realise that the forecasts from linear or quadratic models are highly
dependent on the type of line or curve that is chosen for modelling. The dangers
are the same as those for extrapolation in bivariate relationships. Note: beware of
forecasting many time periods into the future, since the shape of the actual trend
line might be different from your model.
Cubic and higher-degree polynomial models
If a quadratic model does not adequately describe the shape of the trend in a time
series, it is tempting to try to further increase the order of the polynomial,
\[ \text{trend}_t = b_0 + b_1 t + b_2 t^2 + b_3 t^3 \]
This kind of polynomial model can also be fitted by least squares. A polynomial
of degree 3 or 4 often provides a fairly smooth description of trend, but
polynomial models usually behave badly (with sudden increases or decreases) beyond
the data points, so polynomial models of degree greater than 2 should be avoided
for forecasting.
Detrending a time series
The residuals form the detrended series, and the process of removing the trend is
called detrending. Detrending will often reveal interesting features that were obscured by the trend, and which may be important in explaining the past or forecasting the future. It is therefore useful to look for patterns in a time series plot of the
residuals. If the model under consideration fits well, there should be no pattern in
the residuals: each should have the same chance of being positive or negative. If
there are systematic patterns in the residuals, it may be possible to use a different
model for the trend (e.g. a quadratic rather than a linear model), but time series
often exhibit patterns that cannot be explained with simple models for the trend.
10.5.2. Exponential Trend
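The trend curve is of the form
\[ Y_t = \beta_0 \beta_1^{t} \varepsilon_t \]
Taking logs of both sides leads to
\[ \log Y_t = \log \beta_0 + t \log \beta_1 + \log \varepsilon_t \]
which we can write as
\[ Y'_t = A + B t + e_t \]
where \( A = \log \beta_0 \), \( B = \log \beta_1 \), \( Y'_t = \log Y_t \) and
\( e_t = \log \varepsilon_t \). The method of least squares requires that
we minimize the function
\[ S = \sum_t e_t^2 = \sum_t (Y'_t - A - B t)^2 \]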
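Fitting an exponential trend therefore amounts to a linear least squares fit on the logged series, followed by taking antilogs. The Python sketch below assumes base-10 logarithms and time coded as t = 1, 2, ..., n; the data and the function name are hypothetical.

```python
import math

def fit_exponential_trend(y):
    """Fit Y = b0 * b1**t by least squares on log10(Y), with t = 1..n.

    Returns (b0, b1), the antilogs of the fitted intercept A and slope B.
    """
    n = len(y)
    t = list(range(1, n + 1))
    logs = [math.log10(v) for v in y]  # all values must be positive
    t_mean = sum(t) / n
    log_mean = sum(logs) / n
    slope = (sum((ti - t_mean) * (li - log_mean) for ti, li in zip(t, logs))
             / sum((ti - t_mean) ** 2 for ti in t))
    intercept = log_mean - slope * t_mean
    return 10 ** intercept, 10 ** slope

# Hypothetical series growing by roughly 20% per period.
revenue = [120.0, 146.0, 171.0, 209.0, 248.0, 300.0]
b0, b1 = fit_exponential_trend(revenue)

print("b0 =", round(b0, 2), "b1 =", round(b1, 4))
print("estimated growth rate per period (%):", round((b1 - 1) * 100, 1))
```

Note that this minimizes the squared errors of the logged values, not of the original values, so the fitted curve is not the least squares curve on the original scale.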


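Exponential Model with Quarterly Data
\[ Y_t = b_0 \, b_1^{t} \, b_2^{Q_1} \, b_3^{Q_2} \, b_4^{Q_3} \, \varepsilon_t \]
where \( Q_1 \), \( Q_2 \) and \( Q_3 \) are dummy variables equal to 1 in the
first, second and third quarter respectively and 0 otherwise.
\( (b_1 - 1) \times 100\% \) is the quarterly compound growth rate, and
\( b_i \) (i = 2, 3, 4) provides the multiplier for the (i - 1)th quarter relative
to the 4th quarter. Taking logarithms of both sides gives
\[ \log Y_t = \log b_0 + t \log b_1 + Q_1 \log b_2 + Q_2 \log b_3 + Q_3 \log b_4 + \log \varepsilon_t \]
which is in the form
\[ \log Y_t = a_0 + a_1 t + a_2 Q_1 + a_3 Q_2 + a_4 Q_3 + e_t \]
where \( a_i \) is the estimate of \( \log b_i \), or \( b_i = 10^{a_i} \),
i = 0, 1, 2, 3, 4.
b2 is the estimated multiplier for the first quarter relative to the fourth quarter
b3 is the estimated multiplier for the second quarter relative to the fourth quarter
b4 is the estimated multiplier for the third quarter relative to the fourth quarter
Note the resemblance of this model to the Cobb-Douglas model discussed earlier.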

LESSON 11
Constrained maxima and minima and the method of Lagrange
multipliers
11.1. The Method of Lagrange Multipliers
To find the relative extrema of the function f(x, y) subject to the constraint
g(x, y) = 0:
1. Form an auxiliary function \( F(x, y, \lambda) = f(x, y) + \lambda g(x, y) \),
called the Lagrangian function. The variable \( \lambda \) is called the Lagrange
multiplier.
2. Solve the system that comprises the equations \( F_x = 0 \), \( F_y = 0 \) and
\( F_\lambda = 0 \) for all values of x, y and \( \lambda \).
3. Evaluate f at each of the points (x, y) found in step 2. The largest (smallest)
of these values is the maximum (minimum) value of f.
Example. Using the method of Lagrange multipliers, find the relative minimum
of the function \( f(x, y) = 2x^2 + y^2 \) subject to the constraint x + y = 1.
Solution: Write the constraint equation x + y = 1 in the form
\( g(x, y) = x + y - 1 = 0 \). Then form the Lagrangian function
\[ F(x, y, \lambda) = f(x, y) + \lambda g(x, y) = 2x^2 + y^2 + \lambda (x + y - 1) \]
To find the critical points of the function F, solve the system that comprises the
equations
\[ F_x = 4x + \lambda = 0 \]
\[ F_y = 2y + \lambda = 0 \]
\[ F_\lambda = x + y - 1 = 0 \]
Solving the first and second equations in this system for x and y in terms of
\( \lambda \), we obtain
\[ x = -\tfrac{1}{4}\lambda, \qquad y = -\tfrac{1}{2}\lambda \]
Substituting in the third equation yields
\[ -\tfrac{1}{4}\lambda - \tfrac{1}{2}\lambda - 1 = 0, \quad \text{or} \quad \lambda = -\tfrac{4}{3} \]
Therefore x = 1/3 and y = 2/3, and the point (1/3, 2/3) gives a constrained
minimum of the function f.
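Because f is quadratic and the constraint is linear, the first-order conditions in this example form a linear system in x, y and the multiplier. The Python sketch below solves that system exactly with fractions; the small Gauss-Jordan solver is a hypothetical helper written for illustration, not part of any library.

```python
from fractions import Fraction

def solve_linear(a, b):
    """Solve the square system a * v = b by Gauss-Jordan elimination (exact)."""
    n = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Find a row with a non-zero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [rv - factor * cv for rv, cv in zip(m[r], m[col])]
    return [row[-1] for row in m]

# Unknowns (x, y, lambda):
#   4x      + lambda = 0
#        2y + lambda = 0
#    x +  y          = 1
coefficients = [[4, 0, 1], [0, 2, 1], [1, 1, 0]]
right_hand_side = [0, 0, 1]
x, y, lam = solve_linear(coefficients, right_hand_side)

print("x =", x, " y =", y, " lambda =", lam)
print("f(x, y) =", 2 * x ** 2 + y ** 2)
```

For objective functions that are not quadratic the first-order conditions are generally non-linear, and a solver of this kind no longer applies.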

Example. Use the method of Lagrange multipliers to find the minimum of the
function f(x, y, z) = 2xy + 6yz + 8xz subject to the constraint xyz = 12000.
Solution: Write xyz = 12000 in the form g(x, y, z) = xyz - 12000 = 0.
The Lagrangian function is
\[ F(x, y, z, \lambda) = f(x, y, z) + \lambda g(x, y, z) = 2xy + 6yz + 8xz + \lambda (xyz - 12000) \]
Differentiating partially with respect to x, y, z and \( \lambda \) gives the system
\[ F_x = 2y + 8z + \lambda yz = 0 \]
\[ F_y = 2x + 6z + \lambda xz = 0 \]
\[ F_z = 6y + 8x + \lambda xy = 0 \]
\[ F_\lambda = xyz - 12000 = 0 \]
Solving the first three equations of the system for \( \lambda \) in terms of x, y
and z, we have
\[ \lambda = -\frac{2y + 8z}{yz} = -\frac{2x + 6z}{xz} = -\frac{6y + 8x}{xy} \]
Equating the first two expressions for \( \lambda \) leads to
\[ \frac{2y + 8z}{yz} = \frac{2x + 6z}{xz} \]
\[ 2xy + 8xz = 2xy + 6yz \]
\[ x = \tfrac{3}{4} y \]
Equating the second and third expressions for \( \lambda \) yields
\[ z = \tfrac{1}{4} y \]
Finally, substituting these values in the equation xyz - 12000 = 0 gives
\[ \left(\tfrac{3}{4} y\right)(y)\left(\tfrac{1}{4} y\right) - 12000 = 0 \]
\[ y^3 = \frac{(12000)(4)(4)}{3} = 64000 \]
or y = 40. Hence x = (3/4)(40) = 30 and z = (1/4)(40) = 10. Therefore the point
(30, 40, 10) gives the constrained minimum of f:
\[ f(30, 40, 10) = 2(30)(40) + 6(40)(10) + 8(30)(10) = 7200 \]
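The solution can be verified by substituting the point back into the first-order conditions. The Python sketch below does this with exact fractions; it checks the stated answer rather than solving the system, and the variable names are hypothetical.

```python
from fractions import Fraction

# Candidate point from the example.
x, y, z = Fraction(30), Fraction(40), Fraction(10)

# Multiplier recovered from the first equation, F_x = 0.
lam = -(2 * y + 8 * z) / (y * z)

first_order_conditions = {
    "F_x": 2 * y + 8 * z + lam * y * z,
    "F_y": 2 * x + 6 * z + lam * x * z,
    "F_z": 6 * y + 8 * x + lam * x * y,
    "F_lambda": x * y * z - 12000,
}
objective = 2 * x * y + 6 * y * z + 8 * x * z

print("lambda =", lam)
for name, value in first_order_conditions.items():
    print(name, "=", value)
print("f(30, 40, 10) =", objective)
```

All four conditions evaluating to zero confirms that (30, 40, 10) is a critical point of the Lagrangian function; it does not by itself show that the point is a minimum.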

Application problems
Example . The total weekly profit (in dollars) that Acrosonic company realized
in producing and selling its bookshelf loudspeaker systems is given by the profit
function
P(x, y) = 1/4x2 3/8y2 1/4xy + 120x + 100y 5000
Where x denotes the number of fully assembled units and y the number of kits
produced and sold per week. The management decides that production of the loudspeaker systems should be restricted to a total of exactly 230 units per week. Under
this condition, how many fully assembled units and how many kits should be produced per week to maximize Acrosonics weekly profit?
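Solution: We maximize the function
$$P(x, y) = -\frac{1}{4}x^2 - \frac{3}{8}y^2 - \frac{1}{4}xy + 120x + 100y - 5000$$
subject to the constraint
$$g(x, y) = x + y - 230 = 0$$
The Lagrangian function is
$$F(x, y, \lambda) = P(x, y) + \lambda g(x, y) = -\frac{1}{4}x^2 - \frac{3}{8}y^2 - \frac{1}{4}xy + 120x + 100y - 5000 + \lambda(x + y - 230)$$
To find the critical points, solve the following system of equations:
$$F_x = -\frac{1}{2}x - \frac{1}{4}y + 120 + \lambda = 0$$
$$F_y = -\frac{3}{4}y - \frac{1}{4}x + 100 + \lambda = 0$$
$$F_\lambda = x + y - 230 = 0$$
Solving the first equation for $\lambda$ gives $\lambda = \frac{1}{2}x + \frac{1}{4}y - 120$.
Substituting into the second equation gives
$$-\frac{3}{4}y - \frac{1}{4}x + 100 + \frac{1}{2}x + \frac{1}{4}y - 120 = 0$$
$$-\frac{1}{2}y + \frac{1}{4}x - 20 = 0 \quad \text{or} \quad y = \frac{1}{2}x - 40$$
Substituting into the third equation gives $x + \frac{1}{2}x - 40 - 230 = 0$, that is, $x = 180$, and hence
$y = \frac{1}{2}(180) - 40 = 50$.
The maximum weekly profit is given by
$$P(180, 50) = -\frac{1}{4}(180)^2 - \frac{3}{8}(50)^2 - \frac{1}{4}(180)(50) + 120(180) + 100(50) - 5000 = 10{,}312.5$$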
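As a cross-check on the Lagrange solution, the constraint can be used to eliminate y and the profit scanned over whole-number outputs. This is a minimal sketch under that assumption (integer production levels from 0 to 230); the function name `profit` is illustrative and the approach is a brute-force substitute for the method of Lagrange multipliers, not a replacement for it.

```python
def profit(x, y):
    """Weekly profit P(x, y) from the Acrosonic example."""
    return (-0.25 * x**2 - 0.375 * y**2 - 0.25 * x * y
            + 120 * x + 100 * y - 5000)


TOTAL_UNITS = 230  # constraint x + y = 230

# Substitute y = 230 - x and search every whole-number value of x.
best_x = max(range(TOTAL_UNITS + 1),
             key=lambda x: profit(x, TOTAL_UNITS - x))
best_y = TOTAL_UNITS - best_x
best_profit = profit(best_x, best_y)

print(best_x, best_y, best_profit)
```

The search lands on 180 assembled units and 50 kits with a profit of 10312.5, in agreement with the critical point found above.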


Exercise 19. Suppose that $x$ units of labor and $y$ units of capital are required
to produce
$$f(x, y) = 100x^{3/4}y^{1/4}$$
units of a certain product. If each unit of labor costs $200, each unit of capital
costs $300 and a total of $60,000 is available for production, determine how many
units of labor and how many units of capital should be used in order to maximize
production.
Exercise 20. The total monthly profit of Robertson Controls Company in manufacturing
and selling $x$ hundreds of its standard mechanical setback thermostats
and $y$ hundreds of its deluxe electronic setback thermostats per month is given by
the total profit function
$$P(x, y) = -\frac{1}{8}x^2 - \frac{1}{2}y^2 - \frac{1}{4}xy + 13x + 40y - 280$$
where $P$ is in hundreds of dollars. If the production of the setback thermostats
is to be restricted to a total of exactly 4000 per month, how many of each model
should Robertson manufacture in order to maximize its monthly profit? What is the
maximum monthly profit?
Exercise 21. An open rectangular box is to be constructed from material that
costs $3 per square foot for the bottom and $1 per square foot for the sides. Find the
dimensions of the box of greatest volume that can be constructed for $36.
Exercise 22. The total weekly profit (in dollars) realized by the Country Workshop
in manufacturing and selling its rolltop desks is given by the profit function
$$P(x, y) = -0.2x^2 - 0.25y^2 - 0.2xy + 100x + 90y - 4000$$
where $x$ stands for the number of finished units and $y$ denotes the number of
unfinished units manufactured and sold per week. The management decides to restrict
the manufacture of these desks to a total of exactly 200 units per week. How many
finished and unfinished units should be manufactured per week to maximize the
company's weekly profit?
Maximize the function f (x, y, z) = xyz = x2 +y2 +z2 subject to the constraint 3x+2y+z=6
11.2. Models involving differential equations
11.2.1. Unrestricted growth Models
The size of a population at any time $t$, $Q(t)$, increases at a rate proportional to $Q(t)$
itself. Thus
$$\frac{dQ}{dt} = kQ$$
where $k$ is a constant of proportionality. This is a differential
equation involving the unknown function $Q$ and its derivative $Q'$.
11.2.2. Restricted growth models
In many applications the quantity $Q(t)$ does not exhibit unrestricted growth but
approaches some definite upper bound. Suppose $Q(t)$ does not exceed some number
$C$, called the carrying capacity of the environment. Furthermore, suppose the rate of
growth of this quantity is proportional to the difference between its upper
bound and its current size. The resulting differential equation is
$$\frac{dQ}{dt} = k(C - Q)$$
where $k$ is a constant of proportionality. Observe that if the initial population is
small relative to $C$, then the growth rate of $Q$ is relatively large. But as $Q(t)$
approaches $C$, the difference $C - Q(t)$ approaches zero, and so does the growth rate of
$Q$.
Applications:
Unrestricted Growth Models
$$\frac{dQ}{dt} = kQ$$
Separating the variables in this equation, we have
$$\frac{dQ}{Q} = k\,dt$$
which upon integration yields
$$\int \frac{dQ}{Q} = \int k\,dt$$
$$\ln|Q| = kt + C_1$$
$$Q = e^{kt + C_1} = Ce^{kt}$$
where $C = e^{C_1}$ is an arbitrary positive constant. Thus we may write the solution as
$$Q(t) = Ce^{kt}$$
If the quantity present initially is denoted by $Q_0$, then $Q(0) = Q_0$.
Applying this condition yields the equation $Ce^0 = Q_0$, or $C = Q_0$. Therefore the model
for unrestricted exponential growth with initial population $Q_0$ is given by
$$Q(t) = Q_0 e^{kt}$$
Example. Under ideal laboratory conditions, the rate of growth of
bacteria in a culture is proportional to the size of the culture at any time $t$. Suppose
that 10,000 bacteria are present initially in a culture and 60,000 are present two
hours later. How many bacteria will there be in the culture at the end of 4 hours?
Solution: Let $Q(t)$ denote the number of bacteria present in the culture at time $t$.
Then
$$\frac{dQ}{dt} = kQ$$
Solving this separable first-order differential equation gives $Q(t) = Q_0 e^{kt}$,
i.e.
$$Q(t) = 10{,}000e^{kt}$$
Next, the condition that 60,000 bacteria are present 2 hours later translates into
$Q(2) = 60{,}000$, or
$$60{,}000 = 10{,}000e^{2k}$$
$$e^{2k} = 6$$
$$e^{k} = 6^{1/2}$$
Thus the number of bacteria present at any time $t$ is given by
$$Q(t) = 10{,}000e^{kt} = 10{,}000(e^{k})^{t} = (10{,}000)6^{t/2}$$
In particular, the number of bacteria present in the culture at the end of 4 hours is
given by
$$Q(4) = 10{,}000(6^{4/2}) = 360{,}000$$
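The same calculation can be sketched in a few lines of Python. The helper names `growth_constant` and `population` are assumptions made for this illustration; the model itself is the Q(t) = Q0 e^(kt) formula from the text, with k recovered from the two observations.

```python
import math


def growth_constant(q0, q1, t1):
    """Return k in Q(t) = q0 * exp(k * t), given that Q(t1) = q1."""
    return math.log(q1 / q0) / t1


def population(q0, k, t):
    """Unrestricted (exponential) growth model Q(t) = q0 * exp(k * t)."""
    return q0 * math.exp(k * t)


# Data from the bacteria example: 10,000 initially, 60,000 after 2 hours.
k = growth_constant(10_000, 60_000, 2)
q4 = population(10_000, k, 4)

print(f"k = {k:.4f} per hour")
print(f"Q(4) = {q4:.0f} bacteria")
```

Here k works out to (ln 6)/2, roughly 0.896 per hour, and Q(4) agrees with the 360,000 obtained by hand.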

11.2.3. Restricted Growth Models
To solve this separable first-order differential equation, we first separate the
variables, i.e.
$$\frac{dQ}{dt} = k(C - Q)$$
$$\frac{dQ}{C - Q} = k\,dt$$
Integrating, we get
$$\int \frac{dQ}{C - Q} = \int k\,dt$$
$$-\ln|C - Q| = kt + d$$
$$\ln|C - Q| = -kt - d$$
$$C - Q = e^{-kt - d} = e^{-kt}e^{-d}$$
or
$$Q(t) = C - Ae^{-kt}$$
where we have denoted the constant $e^{-d}$ by $A$.
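One way to gain confidence in the solution Q(t) = C - A e^(-kt) is to check numerically that it satisfies dQ/dt = k(C - Q). The sketch below does this with a central difference. The values C = 100, Q(0) = 10 and k = 0.5 are hypothetical, chosen only for illustration, and the function name `restricted_growth` is likewise an assumption.

```python
import math


def restricted_growth(c, q0, k, t):
    """Q(t) = C - A * exp(-k t), with A = C - Q(0) fixed by the initial value."""
    a = c - q0
    return c - a * math.exp(-k * t)


# Hypothetical parameters, not taken from the text.
C, Q0, K = 100.0, 10.0, 0.5

t = 2.0
h = 1e-6  # step size for the central-difference derivative
derivative = (restricted_growth(C, Q0, K, t + h)
              - restricted_growth(C, Q0, K, t - h)) / (2 * h)
rhs = K * (C - restricted_growth(C, Q0, K, t))

print("numerical dQ/dt:", derivative)
print("k (C - Q):      ", rhs)
print("Q(50):          ", restricted_growth(C, Q0, K, 50.0))
```

The two printed rates agree to several decimal places, and Q(50) is practically equal to C, illustrating how the quantity levels off at the carrying capacity.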
Example. During a flu epidemic, 5% of the 5000 army personnel stationed at Port MacAthur
had contracted influenza at time $t = 0$. Furthermore, the rate
at which they were contracting influenza was jointly proportional to the number of
personnel who had already contracted the disease and the non-infected population. If
20% of the personnel had contracted the flu by the 10th day, find the number of
personnel who had contracted the flu by the 13th day.

Solution:
Let $Q(t)$ denote the number of army personnel who had contracted the flu after $t$
days. Then
$$\frac{dQ}{dt} = kQ(5000 - Q)$$
This is a separable (logistic) differential equation, and its solution is
$$Q(t) = \frac{5000}{1 + Ae^{-5000kt}}$$
The condition that 5% of the population had contracted influenza at time $t = 0$
implies that
$$Q(0) = \frac{5000}{1 + A} = 250$$
from which we see that $A = 19$. Therefore
$$Q(t) = \frac{5000}{1 + 19e^{-5000kt}}$$
The condition that 20% of the population had contracted influenza by the 10th day implies
$$Q(10) = \frac{5000}{1 + 19e^{-50000k}} = 1000$$
$$1 + 19e^{-50000k} = 5$$
$$e^{-50000k} = \frac{4}{19}$$
$$-50{,}000k = \ln 4 - \ln 19$$
and
$$k = -\frac{1}{50{,}000}(\ln 4 - \ln 19) \approx 0.0000312$$
Therefore, with $5000k \approx 0.156$,
$$Q(t) = \frac{5000}{1 + 19e^{-0.156t}}$$
In particular,
$$Q(13) = \frac{5000}{1 + 19e^{-0.156(13)}} \approx 1428$$
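The arithmetic in this example can be reproduced with a short script. It is a sketch only: the function name `logistic` and the variable names are assumptions, and the model is the logistic form Q(t) = N / (1 + A e^(-rt)) with r = 5000k. The script evaluates Q(13) twice, once with the rate rounded to 0.156 as in the worked solution and once with the unrounded rate, to show how much the rounding matters.

```python
import math


def logistic(n, a, rate, t):
    """Logistic model Q(t) = n / (1 + a * exp(-rate * t))."""
    return n / (1 + a * math.exp(-rate * t))


N = 5000       # total personnel
q0 = 250       # 5% infected at t = 0
q10 = 1000     # 20% infected at t = 10

a = N / q0 - 1                            # from Q(0) = N / (1 + A)
rate = -math.log((N / q10 - 1) / a) / 10  # from Q(10) = 1000; rate = 5000k
k = rate / N

q13_exact = logistic(N, a, rate, 13)            # unrounded rate
q13_rounded = logistic(N, a, round(rate, 3), 13)  # rate rounded to 0.156

print(f"A = {a:.0f}, rate = {rate:.5f}, k = {k:.7f}")
print(f"Q(13) with rounded rate:   {q13_rounded:.1f}")
print(f"Q(13) with unrounded rate: {q13_exact:.1f}")
```

With the rate rounded to 0.156 the script reproduces the figure of about 1428; carrying the unrounded rate through gives about 1426, so the answer is sensitive to rounding at the level of a couple of people.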
Exercise 23. Suppose that a tank initially contains 10 gallons of pure water.
Brine containing 3 pounds of salt per gallon flows into the tank at a rate of 2 gallons
per minute, and the well-stirred mixture flows out of the tank at the same rate. How
much salt is present at the end of 10 minutes? How much salt is present in the long
run?
The population of a certain community is increasing at a rate directly proportional
to the population at any time $t$. In the last 3 years the population has doubled. How
long will it take for the population to triple?
An amount of money deposited in a savings account grows at a rate proportional to
the amount present. Suppose that $10,000 is deposited in a fixed account earning
interest at the rate of 10% per year compounded continuously. What is the accumulated
amount after 5 years? How long does it take for the original deposit to double?

Solutions to Exercises
Exercise 10.
The hypotheses to be tested: $H_0: \mu = 100$ against $H_1: \mu > 100$

The test statistic value is given by


Exercise 15.
The Laspeyres index;

Paasche Weighted Price index

Between 200 and 2007, the value of the investment increased by 19.2%
Exercise 16.
$$FI = \sqrt{LI \times PI} = \sqrt{120 \times 119.2} \approx 119.6$$
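Fisher's ideal index is the geometric mean of the Laspeyres and Paasche indexes, so the value can be checked directly. The function name `fisher_index` is an assumption made for this sketch, and the inputs are the rounded index values quoted in this solution.

```python
import math


def fisher_index(laspeyres, paasche):
    """Fisher's ideal index: geometric mean of the two indexes."""
    return math.sqrt(laspeyres * paasche)


fi = fisher_index(120, 119.2)
print(f"Fisher index = {fi:.2f}")
```

Because the geometric mean always lies between its two arguments, the result falls between the Paasche value of 119.2 and the Laspeyres value of 120.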

