
Welcome to Quantitative Methods Class
Instructor: Ms. Ayesha N. Rao

Instructor's Profile
Has completed several time-series projects with the Pakistan Institute of Development Economics (PIDE), the Central Board of Revenue (CBR) and the Federal Bureau of Statistics (FBS).
PhD in progress
Holds an M.Phil in Statistics
Holds an M.Sc in Statistics

Publications
1. Zahid Asghar and Ayesha Nazuk. Iran-Pakistan-India Gas Pipeline: An Economic Analysis in a Game Theoretic Framework. The Pakistan Development Review, Vol. 46, No. 4, Part II (2007), pp. 537-550.
2. Ayesha Nazuk and Javid Shabbir. A New Mixed Randomized Response Model, Proceedings of the European Conference on Quality in Official Statistics (Q2010), held in Finlandia Hall, Helsinki, Finland (4th-6th May 2010).
3. Estimating the proportion of liars in NUST, NUST Journal of Business and Finance.
For details, search for AYESHA NAZUK on Google.

More Publications
4. Ayesha Nazuk, Fiza Amer, Quratulain Tanvir, Saba Nawaz, Sahar Zahid Siddiqui, Shahwaiz Alvi (2013), Entrepreneurial Education in Public Sector Institutes of Rawalpindi/Islamabad, International Journal of Management Sciences and Business Research, Vol. 2, Issue 2, pp. 56-72.
5. Ayesha Nazuk, Sadia Nadir and Javid Shabbir (2013), Adjustment of the auxiliary variable for estimation of a finite population mean, article accepted for publication in Lahore Journal of Operations Research and Statistics.
6. Ayesha Nazuk, Yusra Siddiquii, Maha Gul, Rana Iradat Shareef, Meraj Murtaza and Raza Abbas Rajput, Analysis of Cheating disorder among university students through Randomized Response Technique, International Journal of Business and Behavioral Sciences, Vol. 3, No. 3; 2013, pp. 15-22.
7. Book review of "Bio-statistical Analysis" by Jerrold H. Zar, NUST Journal of Business and Economics, Vol. 2, No. 2, pp. 98-99.

Workshops Conducted
Registered with PDC NUST and has organized several trainings on statistics and software.
Trained the faculty of NUST Business School in the tools of econometrics.
Trained the faculty of AIOU in SPSS.
Has delivered guest lectures at various universities and organizations.

Student Consultation
Students are expected to go through the class lectures and notes on a continuous basis. In case of a problem, you are welcome to contact me.
Appointment hours are posted on LMS. You may also come at any other time, subject to prior appointment made on the office phone, i.e. 9085-3560.

Marking Scheme
Mid-Term = 30
Terminal Exam = 40
Assignments = 15
Quizzes = 15
In case of a term paper, the marks for assignments will be scaled down.

Contact Details
Ms. Ayesha N. Rao
Office: Room 310, NBS Faculty Block.
Phone: +92-51-9085-3560
E-mail: ayesha.nazuk@nbs.edu.pk or ayesha.nazuk@s3h.nust.edu.pk

COURSE OBJECTIVES
This course provides an introduction to theoretical and applied statistics and mathematics for business and economics. The main objective is to stress the importance of applying statistical analysis to the solution of common business problems.

Text Book
- will be uploaded on LMS soon.

Assignments
Students are advised to form study groups (each consisting of 3 to 4 students) and are strongly encouraged to study together to solve homework problems.
Submit assignments as a group. NO INDIVIDUAL ASSIGNMENTS.

Examination and Quizzes
2-3 quizzes, a mid-term exam and a comprehensive final exam will be given in class during the semester.
Quizzes, of course, are to be solved independently.

Review Worksheets
If required, review worksheets will be posted on LMS.
You are encouraged to solve and discuss these with each other and with the instructor.

Make-Up Quiz
As a rule, there will be no make-ups for missed quizzes.
Late assignments will not be accepted. DO NOT REQUEST THIS.
A make-up quiz may be given only under extreme circumstances. For such a request, please submit a written application.

Absentees are expected to cover the previous lecture and not come to class unprepared.
It has been noticed in the past few semesters that absentees tend to impede the pace of the lecture.
Such behavior is not at all welcome.

CLASSROOM POLICY
I expect you to conduct yourself with professional courtesy in the classroom.
You should not talk to other students during lectures unless directed to do so by the instructor.
Brief discussions, in a low voice, to ensure understanding are acceptable.
Please turn off cell phones, beepers, iPods and pagers.

Course Outline

Measures of Central Tendency & Dispersion
  The Arithmetic Mean, The Mode, The Median
  Range, Skewness, Kurtosis, Variance & Standard Deviation

Probability
  Basic Definitions: Events, Sample Space & Probability
  Rules of Probability
  The Rule of Complements
  Addition Law & Mutually Exclusive Events
  Conditional Probability
  Independence of Events
  Product Rules for Independent Events

Probability Distributions
  Normal distribution
  Student-t distribution

Hypothesis Testing
  The Concept of Hypothesis Testing
  Type I & Type II Errors, Computing the p-Value
  One-tailed & Two-tailed Tests
  Tests of the Mean of a Normal Distribution: Population variance known
  Tests of the Mean of a Normal Distribution: Population variance unknown

Hypothesis Testing II
  Tests of the Difference Between Two Population Means
  Tests of the Difference Between Two Population Proportions
  Tests of the Equality of the Variances Between Two Normally Distributed Populations

Regression & Correlation Analysis
  Regression versus Correlation
  Regression versus Causation
  Classical Linear Regression Model & Assumptions
  Method of Ordinary Least Squares
  Method of Logistic Regression

Differentiation
  Concepts of Derivatives
  Rules of Derivatives
  Examples & Practice
  Applications in Business

Optimization
  Concavity & Inflection Points
  Identification of Maxima & Minima
  Business Applications

Depreciation &/or Annuities
  Straight-line Method, Sum-of-year-digit Method, Declining Balance Method, Units of Production Method & The MARC Method
  Annuities, Sinking Funds

Markup
  Markup on Cost
  Markup on Selling Price, Relationships between markups
  Markdown and Shrinkage

Discounts
  Trade discount, Trade discount series
  Cash discounts, Discounts and Freight terms

Scales of Measurement
Nominal data is just for naming, e.g. our names, names of cities, CNIC numbers, or roll numbers of students, provided that they are not assigned on merit. All arithmetic operations are invalid.
Ordinal data is for naming with a sense of ranking, e.g. levels of management from low to high, or the numbers on the backs of cricketers, provided that they are based on ICC ranking.

Scales of Measurement
Interval data is purely numeric but has no true zero point. The Fahrenheit and Celsius temperature scales are both examples of data at the interval level of measurement. You can talk about 30 degrees being 60 degrees less than 90 degrees, so differences do make sense. However, 0 degrees (on either scale), cold as it may be, does not represent the total absence of temperature.
PDC NUST workshop on SPSS-Trainer Ms.
Ayesha Nazuk Rao (Assistant Professor
NUST)-July 29, 2013 to July 31 2013.

Scales of Measurement
Ratio scale data is purely numeric and has a true zero point. Distances, in any system of measurement, give us data at the ratio level. A measurement such as 0 feet does make sense, as it represents no length. Furthermore, 2 feet is twice as long as 1 foot, so ratios can be formed between the data.

Types of Averages
Mathematical averages (based on an algebraic formula): Arithmetic Mean, Geometric Mean, Harmonic Mean
Positional averages (based on their relative location): Median, Mode, Quartiles, Octiles, Deciles, Percentiles
Commercial averages

Commercial Averages
Moving Average (to know the trend in a time series)
Progressive Average (used to report average profits/losses etc. in the starting year of a firm)
Composite Average

Calculation of Progressive Average
Calculate the progressive average of the data.
Solution: the progressive total is the running sum of the sales, and the progressive average is the progressive total divided by the number of years covered so far.

Year   Sale (in lakhs of $)   Progressive total   Progressive average
1972    8                       8                  8.00
1973    9                      17                  8.50
1974    8                      25                  8.33
1975    7                      32                  8.00
1976    8                      40                  8.00
1977    9                      49                  8.17
1978   10                      59                  8.43
1979   11                      70                  8.75
1980   11                      81                  9.00
1981   12                      93                  9.30
1982   10                     103                  9.36
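The progressive totals and averages in the table can be reproduced with a short Python sketch. The variable names are illustrative only; the sales figures are the ones from the table.

```python
from itertools import accumulate

years = list(range(1972, 1983))
sales = [8, 9, 8, 7, 8, 9, 10, 11, 11, 12, 10]  # sales in lakhs of $, from the table

# Progressive total: running sum of the sales up to and including each year.
progressive_totals = list(accumulate(sales))

# Progressive average: running sum divided by the number of years so far.
progressive_averages = [
    total / count for count, total in enumerate(progressive_totals, start=1)
]

for year, sale, total, avg in zip(years, sales, progressive_totals, progressive_averages):
    print(f"{year}  {sale:>3}  {total:>4}  {avg:.2f}")
```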

Composite Average: Use
The Dow Jones Composite Average is a stock index from Dow Jones Indexes that tracks 65 prominent companies. The average's components are every stock from the Dow Jones Industrial Average, the Dow Jones Transportation Average, and the Dow Jones Utility Average.
C.A = Sum of all averages / number of averages

Arithmetic Mean
Unbiased
Uses the complete data
Scale data only
Hypothesis testing is possible
Capable of further treatment
Not robust: outliers need to be deleted

The Arithmetic Mean is not independent of origin and scale.
Let Y = aX + b. Then Mean of Y = a (Mean of X) + b.
Let Y = aX - b. Then Mean of Y = a (Mean of X) - b.
Addition/subtraction changes the origin and multiplication/division changes the scale.
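A quick numerical check of this property is sketched below. The data and the constants a and b are hypothetical values chosen only for illustration.

```python
from statistics import mean

x = [2, 4, 6, 8]   # hypothetical data
a, b = 3, 5        # hypothetical change of scale (a) and origin (b)

y_plus = [a * xi + b for xi in x]    # Y = aX + b
y_minus = [a * xi - b for xi in x]   # Y = aX - b

# The mean follows the same linear transformation as the data.
print(mean(y_plus), a * mean(x) + b)
print(mean(y_minus), a * mean(x) - b)
```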

Mode
Not unbiased
Relies on only one value
Can be found on any scale
Hypothesis testing is not possible
No further treatment
Robust

Median
Not unbiased
Relies on one or at most two values
Ordinal scale or higher
Hypothesis testing is possible through non-parametric tests
No further treatment
Robust


Geometric Mean
Not unbiased
Relies on the complete data
Scale data only
Hypothesis testing is possible
No further treatment
Robust


Mode is valid for use in both symmetric and skewed data sets. A data set can have no mode or more than one mode.

Median is also valid for use in both symmetric and skewed data sets. It can be used with ordinal data as well.


The mean, or arithmetic mean, is valid for use in symmetric data only. It is valid for use in scale data only.
The geometric mean is a measure that uses the complete data and is yet robust.

Trimmed means are used when we want a picture of the data that is free from extreme values.

One can use a 2.5%, 5% or 10% trimmed mean.


For a 10% trimmed mean, we ignore the smallest 10% and the largest 10% of the observations and calculate the mean of the truncated data set.

The Winsorized mean is also used in the presence of outliers. It replaces the outliers with the most extreme values in the remaining data set.


See the example on the next slide.

For a sample of 10 numbers (from x1, the smallest, to x10, the largest), the 20% Winsorized mean (10% replaced in each tail) is

$\bar{x}_W = \frac{x_2 + x_2 + x_3 + x_4 + x_5 + x_6 + x_7 + x_8 + x_9 + x_9}{10}$

The key is in the repetition of x2 and x9: the extras substitute for the original values x1 and x10, which have been discarded and replaced.
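The difference between trimming and Winsorizing can be sketched in Python as follows. The function names and the sample of 10 numbers are hypothetical; the proportion is taken per tail, so 0.10 drops (or replaces) one value at each end of a sample of 10.

```python
def trimmed_mean(data, proportion):
    """Mean after dropping `proportion` of the observations from each tail."""
    ordered = sorted(data)
    k = int(len(ordered) * proportion)  # number of values cut from each end
    if 2 * k >= len(ordered):
        raise ValueError("proportion too large for this sample")
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def winsorized_mean(data, proportion):
    """Mean after replacing `proportion` of each tail with the nearest kept value."""
    ordered = sorted(data)
    k = int(len(ordered) * proportion)
    if 2 * k >= len(ordered):
        raise ValueError("proportion too large for this sample")
    if k > 0:
        ordered[:k] = [ordered[k]] * k         # x1 is replaced by x2 when k = 1
        ordered[-k:] = [ordered[-k - 1]] * k   # x10 is replaced by x9 when k = 1
    return sum(ordered) / len(ordered)


sample = [3, 5, 6, 7, 8, 9, 10, 12, 14, 95]  # hypothetical data with one outlier

plain = sum(sample) / len(sample)
trimmed = trimmed_mean(sample, 0.10)
winsorized = winsorized_mean(sample, 0.10)
print(plain, trimmed, winsorized)
```

Both robust versions stay close to the bulk of the data, while the plain mean is pulled towards the outlier 95.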

The Quadratic Mean (Q.M) is the SQUARE ROOT of the sum of squares divided by the number of values:

$Q.M = \sqrt{\frac{\sum X_i^2}{n}}$

Q.M >= A.M


Q.M performs better in the presence of negative values.

Example: Q.M
X       X**2
-100    10000
0       0
100     10000

A.M = 0

Q.M = sqrt(20000/3) = 81.64966
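The example can be checked with a few lines of Python. This is only a sketch; the function name `quadratic_mean` is an illustrative choice.

```python
import math


def quadratic_mean(values):
    """Square root of (sum of squares / number of values)."""
    return math.sqrt(sum(v * v for v in values) / len(values))


x = [-100, 0, 100]           # data from the example
am = sum(x) / len(x)         # the positive and negative values cancel out
qm = quadratic_mean(x)       # squaring removes the signs before averaging
print(am, round(qm, 5))
```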

Data Type and Average Used
Ratios of change, proportions, percentages etc.: G.M
Rates of change per unit of time, such as speed, number of items produced per day etc.: H.M
Well-behaved (that is, outlier-free) data which is purely quantitative: A.M
Nominal: Mode
Ordinal: Median
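As a small illustration of the first two rows, the sketch below uses Python's `statistics` module (Python 3.8 or later). The growth factors and speeds are hypothetical values, not data from the course.

```python
from statistics import geometric_mean, harmonic_mean

# Hypothetical yearly growth factors: +10%, +20%, -10%.
growth_factors = [1.10, 1.20, 0.90]
avg_growth = geometric_mean(growth_factors)

# Hypothetical speeds (km/h) over two stretches of equal distance.
speeds = [40, 60]
avg_speed = harmonic_mean(speeds)

print(avg_growth)   # average growth factor per year
print(avg_speed)    # average speed over the whole trip
```

Compounding the geometric mean over three years gives the same overall growth as the three original factors, which is why G.M suits ratios of change.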

Median
The middle observation of an arrayed data set (that is, one arranged in either ascending or descending order) is called the median.
For ungrouped data it is item number (n+1)/2.
E.g. if the data set is 4, 6, 1, 5, 4, 8, we array it as 1, 4, 4, 5, 6, 8; median = item at position (6+1)/2 = 3.5th item = (3rd item + 4th item)/2 = (4+5)/2 = 4.5. So the median is 4.5.
If the data is 1, 3, 7, the median is the (3+1)/2 = 2nd item = 3.
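The position rule can be written out directly in Python. This is a minimal sketch; the function name is illustrative.

```python
def median_by_position(data):
    """Median as the item at 1-based position (n + 1) / 2 of the arrayed data."""
    ordered = sorted(data)
    n = len(ordered)
    position = (n + 1) / 2
    if position.is_integer():
        return ordered[int(position) - 1]
    # Position ends in .5: average the two neighbouring items.
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2


print(median_by_position([4, 6, 1, 5, 4, 8]))
print(median_by_position([1, 3, 7]))
```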

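Median for grouped data

$\text{Median} = l + \frac{h}{f}\left(\frac{n}{2} - C\right)$

l = LCB (lower class boundary) of the median class
h = width of the median class
f = frequency of the median class
C = cumulative frequency of the class before the median class
n = total frequency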

Median can be found for ordinal data
For example, in a shop there are clothes in different shades of a color. One can arrange the clothes from lighter shades to darker ones as follows:
S1, S2, S2, S3, S4, S5, S6
The median shade is the (7+1)/2 = 4th item = S3.

Grouped data Median Calculations

Interval   f    C.F   C.B
0-1        5    5     -0.5 to 1.5
2-3        6    11    1.5 to 3.5   (median class, as 6.5 lies here)
4-5        2    13    3.5 to 5.5
Total      13

Locate the class whose C.F covers n/2 = sum(f)/2 = 13/2 = 6.5.

Contd.
Median = 1.5 + (2/6)(6.5 - 5) = 2.0
So, the value 2.0 cuts the previous data into two equal halves.
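The grouped-data calculation can be sketched in Python. The function below is an illustrative helper, not part of any library; it uses the class boundaries and frequencies from the table, for which n/2 = 13/2 = 6.5.

```python
def grouped_median(boundaries, frequencies):
    """Median = l + (h / f) * (n / 2 - C) for a frequency table.

    boundaries: list of (lower, upper) class boundaries
    frequencies: list of class frequencies in the same order
    """
    n = sum(frequencies)
    if n == 0:
        raise ValueError("empty frequency table")
    half = n / 2
    cumulative = 0  # C: cumulative frequency before the current class
    for (lower, upper), f in zip(boundaries, frequencies):
        if cumulative + f >= half:  # first class whose C.F covers n/2
            h = upper - lower
            return lower + (h / f) * (half - cumulative)
        cumulative += f
    raise ValueError("median class not found")


class_boundaries = [(-0.5, 1.5), (1.5, 3.5), (3.5, 5.5)]
class_frequencies = [5, 6, 2]
median = grouped_median(class_boundaries, class_frequencies)
print(round(median, 2))
```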

Mode for grouped data

$\text{Mode} = l + \frac{f_1 - f_0}{2 f_1 - f_0 - f_2} \times h$

l = the lower limit of the modal class = 15
f1 = frequency of the modal class = 7
f0 = frequency of the class preceding the modal class = 5
f2 = frequency of the class succeeding the modal class = 2
h = size of class intervals = 5

Example Continued
Mode = 15 + [(7 - 5) / (2 x 7 - 5 - 2)] x 5
     = 15 + [2 / (14 - 7)] x 5
     = 15 + (2 / 7) x 5
     = 15 + (10 / 7)
     = 15 + 1.43
     = 16.43
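The same arithmetic in Python, as a sketch with an illustrative function name. The inputs are the values listed for the modal class, and 10/7 = 1.4286, so the result rounds to 16.43.

```python
def grouped_mode(l, f1, f0, f2, h):
    """Mode = l + (f1 - f0) / (2*f1 - f0 - f2) * h for grouped data."""
    return l + (f1 - f0) / (2 * f1 - f0 - f2) * h


mode = grouped_mode(l=15, f1=7, f0=5, f2=2, h=5)
print(round(mode, 2))
```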

Summary of Central Tendency Measures

Measure   Equation              Description
Mean      $\sum X_i / n$        Balance point
Median    Position $(n+1)/2$    Middle value when ordered
Mode      none                  Most frequent

Quantiles: Quartiles, Deciles and Percentiles
Quantiles are break points that divide arrayed data into equal parts.
Quartiles divide data into four equal parts. There are three quartiles: Q1, Q2 and Q3.
Q1 = [(n+1)/4]th item; 25% of the data lies before it.
Q2 = [2(n+1)/4]th item = [(n+1)/2]th item, which is the median; 50% of the data lies before Q2.
Q3 = [3(n+1)/4]th item; 75% of the data lies before it.

Quartiles
1. Measure of Non-central Tendency
2. Split Ordered Data into 4 Quarters: 25% | Q1 | 25% | Q2 | 25% | Q3 | 25%
3. Position of i-th Quartile:

$\text{Positioning point of } Q_i = \frac{i(n+1)}{4}$

Deciles
Deciles are 9 break points that divide arrayed data into 10 equal parts.
D1 = [(n+1)/10]th item; 10% of the data lies before it.
D2 = [2(n+1)/10]th item; 20% of the data lies before it.
D9 = [9(n+1)/10]th item; 90% of the data lies before it.
Note that D5 = Q2 = Median.

Percentiles
Percentiles are 99 break points that divide arrayed data into 100 equal parts.
P25 = Q1
P75 = Q3
P10 = D1
Other relationships may easily be seen.
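The position formulas for quartiles, deciles and percentiles share one pattern, i(n+1)/parts, which the Python sketch below makes explicit. The function name is illustrative. Two details are assumptions of this sketch rather than rules stated above: a fractional position is resolved by linear interpolation between the two neighbouring items, and positions that fall outside the data are clamped to the first or last item.

```python
def quantile(data, i, parts):
    """i-th break point when arrayed data is divided into `parts` equal parts.

    Uses the 1-based position i * (n + 1) / parts.
    """
    ordered = sorted(data)
    n = len(ordered)
    position = i * (n + 1) / parts
    position = min(max(position, 1), n)   # assumption: clamp to the data range
    lower = int(position)                 # whole part of the position
    fraction = position - lower
    if fraction == 0:
        return ordered[lower - 1]
    # Assumption: interpolate linearly between the neighbouring items.
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


data = [4, 6, 1, 5, 4, 8]
q1, q2, q3 = (quantile(data, i, 4) for i in (1, 2, 3))
d5 = quantile(data, 5, 10)
p25 = quantile(data, 25, 100)
print(q1, q2, q3, d5, p25)
```

With this data Q2 and D5 both equal the median 4.5, and P25 coincides with Q1.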

Measures of Variation

Data Summary; A Glance
To recall the data compaction process:
1. To summarize the data we use graphs and charts.
2. For more technical analysis, a frequency distribution is made.
3. To report a summary value that may represent the data, we find a measure of central tendency.
4. BUT there may be data sets that have the same value of central tendency but differ in terms of variation/scatter around the central value.

Illustrative example
On average, about 31 patients get satisfactory treatment from both D1 and D2. However, the data, that is the number of patients that come to D1 or D2, is very different.

Doctor   Mean    Values (No. of patients)
D1       30.75   12, 35, 36, 40
D2       30.75   1, 4, 3, 115
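A short Python sketch makes the contrast concrete: the two doctors share the same mean, but simple measures of spread (here the range and the sample standard deviation from Python's `statistics` module) differ sharply. The variable names are illustrative.

```python
from statistics import mean, stdev

patients = {
    "D1": [12, 35, 36, 40],
    "D2": [1, 4, 3, 115],
}

summary = {}
for doctor, counts in patients.items():
    summary[doctor] = {
        "mean": mean(counts),
        "range": max(counts) - min(counts),
        "sd": stdev(counts),  # sample standard deviation (n - 1 in the denominator)
    }
    print(doctor, summary[doctor])
```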

Measures of Central Tendency are Insufficient
Measures of central tendency cannot convey the full picture of the data.
Specifically, they cannot tell us the amount of scatter in the data.
If a measure of scatter (variation) accompanies a measure of central tendency, then the data can be described more efficiently.

Measure of variation
Definition of Measure of Variation
A measure of variation is a measure that describes how spread out or scattered a set of data is. It is also known as a measure of dispersion or a measure of spread.
Examples of Measure of Variation
Some important measures of variation: the range, the variance, and the standard deviation.

Range
The range is the distance between the lowest data point and the highest data point.
Range can be misleading since it does not take every value into consideration. Consider each of the following data sets: 1, 10, 10, 10, 10 and 1, 2, 5, 8, 10. Both have a range of 9, yet the values within that range are distributed very differently: four of the five values in the first set are identical, while those in the second are spread across the whole range.

Variance & Standard Deviation
1. Measures of Dispersion
2. Most Common Measures
3. Consider How Data Are Distributed
4. Show Variation About Mean ($\bar{X}$ or $\mu$)

[Figure: number line marked 4, 6, 8, 10, 12 with the mean $\bar{X} = 8.3$ indicated]

Sample Variance Formula

$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$

Note n - 1 in the denominator! (Use N if Population Variance)

Standard Deviation
The standard deviation of a set of scores is a measure of variation of scores about the mean. It is calculated by the following procedure.

Procedure for finding the standard deviation (ungrouped data)
1) Find the mean of the scores
2) Subtract the mean from each individual score
3) Square each of the values in step 2
4) Add up all the squares obtained in step 3
5) Divide the total in step 4 by n-1
6) Find the square root of step 5.

Ungrouped Sample Data; S.D
Find the standard deviation of the data 1, 2, 12, 3, 6 and 11.

X      X - Mean   (X - Mean)^2
1      -4.83      23.36
2      -3.83      14.69
12      6.17      38.03
3      -2.83       8.03
6       0.17       0.03
11      5.17      26.69
Sum    -----      110.83

The mean of X is 35/6 = 5.83.
Variance is (110.833/5) = 22.167.
And S.D is 4.708.
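The six-step procedure for ungrouped sample data can be followed line by line in Python. This is a sketch using the data of this example; the variable names are illustrative.

```python
import math

scores = [1, 2, 12, 3, 6, 11]

mean = sum(scores) / len(scores)              # step 1: mean of the scores
deviations = [x - mean for x in scores]       # step 2: subtract the mean
squares = [d ** 2 for d in deviations]        # step 3: square each deviation
total = sum(squares)                          # step 4: add up the squares
variance = total / (len(scores) - 1)          # step 5: divide by n - 1
sd = math.sqrt(variance)                      # step 6: square root

print(round(mean, 2), round(total, 3), round(variance, 3), round(sd, 3))
```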

Standard Deviation Grouped Sample data

Interval   f    X (midpoint)   X-A.M    (X-A.M)^2   f(X-A.M)^2
0-1        5    0.5            -1.538   2.3654      11.83
2-3        6    2.5             0.462   0.2134       1.28
4-5        2    4.5             2.462   6.0614      12.12
Total      13   ------                              25.23

A.M = sum of f*x / sum of f = 26.50/13 = 2.038
S.D = square root of (25.23/(13-1)) = 1.45
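For grouped sample data, each class midpoint is weighted by its frequency. The Python sketch below repeats the calculation for the table above and assumes the sample form of the formula, with n - 1 = 13 - 1 in the denominator, since the heading refers to sample data.

```python
import math

midpoints = [0.5, 2.5, 4.5]   # class midpoints X
frequencies = [5, 6, 2]       # class frequencies f

n = sum(frequencies)
am = sum(f * x for f, x in zip(frequencies, midpoints)) / n

# Sum of f * (X - A.M)^2 over all classes.
weighted_squares = sum(f * (x - am) ** 2 for f, x in zip(frequencies, midpoints))

# Assumption: sample standard deviation, so divide by n - 1.
sd = math.sqrt(weighted_squares / (n - 1))

print(round(am, 3), round(weighted_squares, 2), round(sd, 2))
```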

Variance
Variance is the square of S.D.
Because the differences are squared, the units of variance are not the same as the units of the data. Therefore, the standard deviation is reported as the square root of the variance, and the units then correspond to those of the data set.

Interpretation of Standard Deviation:
Some ideas to remember about standard deviation and variance:
A small standard deviation means the data is close together; a large standard deviation means the data is widely spread.
At least 75% of all scores fall within 2 standard deviations of the mean, and at least 89% fall within 3 standard deviations of the mean (Chebyshev's inequality).
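The 75% and 89% figures correspond to Chebyshev's inequality, which states that at least 1 - 1/k^2 of any data set lies within k standard deviations of the mean (for k > 1). A minimal Python sketch, with an illustrative function name:

```python
def chebyshev_lower_bound(k):
    """Minimum share of the data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2


for k in (2, 3):
    print(k, round(chebyshev_lower_bound(k) * 100, 1))
```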

Welcome to Mathematics & Statistics Class
Instructor: Ms. Ayesha N. Rao


Inter-quartile Range
When there are extreme values in a distribution or when the distribution is skewed, variance and standard deviation are not reliable measures of spread. In these situations the inter-quartile range or the semi-inter-quartile range is the preferred measure of spread.
The inter-quartile range is the difference between Q3 and Q1, that is, Q3 - Q1. The semi-inter-quartile range is half of that difference, (Q3 - Q1)/2.

Summary of Variation Measures

Measure                           Equation                                     Description
Range                             $X_{largest} - X_{smallest}$                 Total spread
Interquartile Range               $Q_3 - Q_1$                                  Spread of middle 50%
Standard Deviation (Sample)       $\sqrt{\sum (X_i - \bar{X})^2 / (n - 1)}$    Dispersion about sample mean
Standard Deviation (Population)   $\sqrt{\sum (X_i - \mu)^2 / N}$              Dispersion about population mean
Variance (Sample)                 $\sum (X_i - \bar{X})^2 / (n - 1)$           Squared dispersion about sample mean

Relative Measure of Dispersion

Comparison of data sets
Up till now we have been analyzing a single data set.
Direct comparison of variances/standard deviations is not valid, because they depend on the unit of measurement.
For example, suppose we have data on weights of potatoes and other data on weights of milk cartons. Then a variation of 0.1 kg may be considered large for potatoes but small for milk cartons.
Relative measures of variation are those that help in comparing two or more data sets, as to which data set is more variable.

Coefficient of Variation
It is defined as the ratio of the standard deviation to the mean:
C.V = S.D/Mean
This is only defined for a non-zero mean, and is most useful for variables that are always positive.
It does not have any meaning for data on an interval scale.

S.D Vs C.V
For example, the value of the standard deviation of a set of weights will be different depending on whether they are measured in kilograms or pounds. The coefficient of variation, however, will be the same in both cases, as it does not depend on the unit of measurement.
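The unit dependence of S.D and the unit independence of C.V can be checked numerically. The weights below are hypothetical, and the helper name `coefficient_of_variation` is an illustrative choice.

```python
from statistics import mean, stdev

KG_TO_LB = 2.20462  # pounds per kilogram

weights_kg = [50, 60, 70, 80]                      # hypothetical weights
weights_lb = [w * KG_TO_LB for w in weights_kg]    # same weights in pounds


def coefficient_of_variation(data):
    """C.V = S.D / Mean (only defined for a non-zero mean)."""
    return stdev(data) / mean(data)


print(stdev(weights_kg), stdev(weights_lb))   # differ by the conversion factor
print(coefficient_of_variation(weights_kg), coefficient_of_variation(weights_lb))
```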

C.V interpretation
The smaller the C.V, the smaller the relative variability in the data.

C.V Pros and Cons
Advantages
The coefficient of variation is a dimensionless number. So when comparing data sets with different units or wildly different means, one should use the coefficient of variation for comparison instead of the standard deviation.
Disadvantages
When the mean value is near zero, the coefficient of variation is sensitive to small changes in the mean, limiting its usefulness.

Z-Scores
Z-scores are a means of answering the question "how many standard deviations away from the mean is this observation?" If our observation X is from a population with mean $\mu$ and standard deviation $\sigma$, then

$Z = \frac{X - \mu}{\sigma}$

Z Score for Sample
On the other hand, if the observation X is from a sample with mean $\bar{X}$ and standard deviation s, then

$Z = \frac{X - \bar{X}}{s}$

Z Score Interpretation
A positive (negative) Z-score indicates that the observation is greater than (less than) the mean.

Example
In a certain city the mean price of a quart of milk is 63 cents and the standard deviation is 8 cents. The average price of a package of bacon is $1.80 and the standard deviation is 15 cents. If we pay $0.89 for a quart of milk and $2.19 for a package of bacon at a 24-hour convenience store, which is relatively more expensive? To answer this, we compute Z-scores for each:

Solution
Z (Milk) = (0.89 - 0.63)/0.08 = 3.25
Z (Bacon) = (2.19 - 1.80)/0.15 = 2.60
Our Z-scores show us that we are overpaying quite a bit more for the milk than we are for the bacon.
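The two Z-scores can be reproduced with a small Python helper. The function name `z_score` is illustrative; the prices, means and standard deviations are those of the example, all expressed in dollars.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies away from the mean."""
    return (x - mean) / sd


z_milk = z_score(0.89, mean=0.63, sd=0.08)
z_bacon = z_score(2.19, mean=1.80, sd=0.15)

print(round(z_milk, 2), round(z_bacon, 2))
# The item with the larger Z-score is relatively more expensive.
print("milk" if z_milk > z_bacon else "bacon")
```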