Advanced Statistics

20-11-2018
SS ZG536
ADVANCED STATISTICAL TECHNIQUES
FOR ANALYTICS
BITS Pilani | Dr Y V K Ravi Kumar, BITS Pilani

Pilani|Dubai|Goa|Hyderabad
BITS Pilani
Hyderabad Campus
L-1: Overview of the Course & Descriptive Statistics
• "When you can measure what you are speaking about and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind."
• Lord Kelvin
• "Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write."
• H. G. Wells
• Lies
• Damned lies
• Statistics
Analytics
The term "Analytics"
Disciplines
• Statistics
• Machine Learning
• Biology
• Kernel Methods
Importance of Data
• Importance: for any analytical exercise, data are the key ingredient.
• Replace intuition with data-driven decisions.
• For example, consider the following cases:
  • Medical treatment
  • Industry
  • Power generation
  • Crime detection
  • Cognitive assessment
Model requirements
• Business relevance
• Statistical performance
• Interpretability
• Justifiability
• Operational efficiency
• Economic cost
• Regulation and legislation
Statistics?
• Procedures for organising, summarizing, and interpreting information
• Standardized techniques used by scientists
• Vocabulary and symbols for communicating about data
• Two main branches:
  • Descriptive statistics
  • Inferential statistics
Basic tools / concepts in analysis
• Mean
• Median
• Mode
• Range
• Variance / Standard deviation
• Coefficient of variation
• Mean deviation
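The basic measures listed above can be computed with Python's standard `statistics` module. This is a minimal sketch on a hypothetical sample; it assumes the sample (n - 1) variance and the mean deviation about the mean, conventions the slide leaves open.

```python
import statistics


def describe(data):
    """Compute the basic descriptive measures for a list of numbers."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    return {
        "mean": mean,
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "range": max(data) - min(data),
        "variance": statistics.variance(data),  # sample variance
        "std_dev": sd,
        "cv_percent": 100 * sd / mean,  # coefficient of variation, in %
        # mean absolute deviation about the mean
        "mean_deviation": sum(abs(x - mean) for x in data) / len(data),
    }


marks = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
summary = describe(marks)
for name, value in summary.items():
    print(f"{name:>15}: {value:.3f}")
```

Using `statistics.pvariance` / `statistics.pstdev` instead would give the population (n divisor) versions.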
Statistical graphs of data
• A picture is worth a thousand words!
  • Bar chart / graph
  • Histograms
  • Pie chart
  • Pareto chart / diagram
  • Frequency polygons
  • Scatter plots
  • Time series plot
Bar Graphs
• Useful for showing two samples side by side
Histograms
• Univariate histograms
[Figure: univariate histogram of Exam 1 scores, N = 13, Mean = .80, Std. Dev = .12]
Histograms
• f on the y axis (could also plot p or %)
• X values (or midpoints of class intervals) on the x axis
• Plot each f with a bar of equal width, bars touching
• No gaps between bars
Bivariate histogram
Graphing the data – Pie charts
Frequency Polygons
• Depicts information from a frequency table or a grouped frequency table as a line graph
Frequency Polygon
• A smoothed-out histogram
• Make a point representing f of each value
• Connect the dots
• Anchor the line on the x axis
• Useful for comparing distributions in two samples (in this case, plot p rather than f)
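The construction steps above can be sketched as a function that returns the polygon's vertices. This is a minimal illustration assuming integer-valued data spaced one unit apart (the data are hypothetical); the extra zero-height points at each end anchor the line on the x axis, and `relative=True` plots p instead of f.

```python
from collections import Counter


def frequency_polygon(values, relative=False):
    """Return the (x, height) vertices of a frequency polygon.

    Assumes integer-valued data spaced one unit apart. The polygon is
    anchored on the x axis by adding a zero-height point one unit below
    the smallest value and one unit above the largest value.
    """
    counts = Counter(values)
    n = len(values)
    lo, hi = min(values), max(values)
    points = []
    for x in range(lo - 1, hi + 2):
        f = counts.get(x, 0)
        # plot p = f / n instead of f when comparing samples of different sizes
        points.append((x, f / n if relative else f))
    return points


scores = [3, 4, 4, 5, 5, 5, 6, 6, 7]  # hypothetical data
print(frequency_polygon(scores))
# [(2, 0), (3, 1), (4, 2), (5, 3), (6, 2), (7, 1), (8, 0)]
```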
!!!!
• A famous statistician would never travel by airplane, because she had
studied air travel and estimated the probability of there being a bomb on any
given flight was 1 in a million, and she was not prepared to accept these
odds.
• One day a colleague met her at a conference far from home.
• "How did you get here, by train?"
• "No, I flew"
• "What about the possibility of a bomb?"
• "Well, I began thinking that if the odds of one bomb are 1:million, then the
odds of TWO bombs are (1/1,000,000) x (1/1,000,000) = 10^(-12). This is a
very, very small probability, which I can accept. So, now I bring my own
bomb along!"
Random Experiment
• The term "random experiment" is used to describe any action whose outcome is not known in advance. Here are some examples of experiments dealing with statistical data:
  • Tossing a coin
  • Counting how many times a certain word or combination of words appears in the text of "King Lear" or in a text of Confucius
  • Counting occurrences of a certain combination of amino acids in a protein database
  • Pulling a card from a deck
Sample spaces, sample sets and events
• The sample space of a random experiment is a set S that includes all possible outcomes of the experiment.
• For example, if the experiment is to throw a die and record the outcome, the sample space is S = {1, 2, 3, 4, 5, 6}.
• Discrete sample spaces
• Continuous sample spaces
• The set of all possible outcomes S is itself an event, one that always occurs.
• Each outcome is represented by a sample point in the sample space.
• There is more than one way to view an experiment, so an experiment may have more than one associated sample space.
• In tossing a die, one sample space is {1, 2, 3, 4, 5, 6}, while two others are {odd, even} and {less than 3.5, more than 3.5}.
Events
• An event is a set of outcomes of the experiment. This includes the null (empty) set of outcomes and the set of all outcomes. Each time the experiment is run, a given event A either occurs, if the outcome of the experiment is an element of A, or does not occur, if the outcome of the experiment is not an element of A.
Basic Set Operations
Mutually Exclusive Events
• Two events are mutually exclusive if they cannot occur at the same time. Which of the following pairs are mutually exclusive?
  • Draw an ace and draw a heart from a standard deck of 52 cards
  • It is raining and I show up for class
  • Dr. Li is an easy teacher and I fail the class
  • Dr. Beaubouef is a hard teacher and I ace the class
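The card question can be checked by enumerating the deck. A minimal sketch, where the (rank, suit) encoding is an illustrative choice: "draw an ace" and "draw a heart" share the ace of hearts, so they are not mutually exclusive, and the addition rule must subtract that overlap.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = set(product(ranks, suits))  # 52 equally likely cards

ace = {card for card in deck if card[0] == "A"}
heart = {card for card in deck if card[1] == "hearts"}


def prob(event):
    """Classical probability: favourable outcomes / total outcomes."""
    return Fraction(len(event), len(deck))


both = ace & heart  # the intersection: only the ace of hearts
print("mutually exclusive:", len(both) == 0)  # False
# Addition rule: P(A or H) = P(A) + P(H) - P(A and H)
p_union = prob(ace) + prob(heart) - prob(both)
print("P(ace or heart) =", p_union)  # 4/13
```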
Independent & Dependent
• Events are either
  • independent (the occurrence of one event has no effect on the probability of occurrence of the other), or
  • dependent (the occurrence of one event gives information about the occurrence of the other).
Random experiment
• Consider the random experiment of dropping a Styrofoam cup onto
the floor from a height of four feet. The cup hits the ground and
eventually comes to rest. It could land upside down, right side up, or it
could land on its side. We represent these possible outcomes of the
random experiment by the following.
Probability
Axioms of Probability
Probability of a Union
Mutually Exclusive Events
Three Events
• The sales manager of an e-commerce company says that 80% of those who visit their website for the first time do not buy a mobile phone. If a new customer visits the website, what is the probability that the customer will buy a mobile phone?
2
Employees by role and trouser colour:
                        Blue   Black   Brown   Total
Software programmers      35      25      20      80
Project managers           7       8       5      20
Total                     42      33      25     100
If an employee is selected at random, what is the probability that he is a software programmer?
If an employee is selected at random, what is the probability that he is wearing blue trousers?
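Both questions are marginal probabilities read off the table: a row or column total divided by the grand total. A minimal sketch using exact fractions (the dictionary layout is an illustrative choice):

```python
from fractions import Fraction

# Counts from the table: role -> {trouser colour: number of employees}
counts = {
    "software_programmer": {"blue": 35, "black": 25, "brown": 20},
    "project_manager": {"blue": 7, "black": 8, "brown": 5},
}
total = sum(sum(row.values()) for row in counts.values())  # 100 employees

# Row total / grand total, and column total / grand total
p_software = Fraction(sum(counts["software_programmer"].values()), total)
p_blue = Fraction(sum(row["blue"] for row in counts.values()), total)
print(p_software, p_blue)  # 4/5 21/50
```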
3
• A survey conducted by a bank revealed that 40% of the accounts are savings accounts, 35% are current accounts, and the balance are loan accounts.
• What is the probability that an account taken at random is a loan account?
• What is the probability that an account taken at random is NOT a savings account?
• What is the probability that an account taken at random is NOT a current account?
• What is the probability that an account taken at random is a current account or a loan account?
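The bank questions need only the complement rule and the addition rule for mutually exclusive events (the three account types are assumed disjoint and exhaustive, as the wording implies). The same complement rule answers the e-commerce question: 1 - 0.80 = 0.20. A minimal sketch:

```python
from fractions import Fraction

# Shares of each account type from the survey, as exact fractions
p_savings = Fraction(40, 100)
p_current = Fraction(35, 100)
# The three types are mutually exclusive and exhaustive, so they sum to 1
p_loan = 1 - p_savings - p_current

p_not_savings = 1 - p_savings            # complement rule
p_not_current = 1 - p_current
p_current_or_loan = p_current + p_loan   # addition rule for disjoint events

for label, p in [("loan", p_loan), ("not savings", p_not_savings),
                 ("not current", p_not_current), ("current or loan", p_current_or_loan)]:
    print(f"P({label}) = {float(p):.2f}")
```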
4
• From hospital data, it is found that 45% of the patients have high B.P. It was also found that 35% of these patients with high B.P. also have diabetes.
• What is the probability that a patient having high B.P. is also diabetic?
Conditional Probability
• The probability of event B given that event A has occurred, P(B|A), or the probability of event A given that event B has occurred, P(A|B).
5
                           Actually purchased
Planned to purchase       YES     NO    TOTAL
YES                       200     50      250
NO                        100    650      750
TOTAL                     300    700     1000
Definition
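Applying the conditional probability definition, $P(B \mid A) = P(A \cap B)/P(A)$, to the purchase table gives, for example, the probability of buying among those who planned to and the probability of having planned among those who bought. A minimal sketch; the helper `p` and its keyword arguments are illustrative names.

```python
from fractions import Fraction

# Joint counts from the table: (planned to purchase, actually purchased) -> customers
table = {
    ("yes", "yes"): 200, ("yes", "no"): 50,
    ("no", "yes"): 100, ("no", "no"): 650,
}
n = sum(table.values())  # 1000 customers


def p(planned=None, purchased=None):
    """Joint or marginal probability; None means 'any value' for that variable."""
    hits = sum(count for (pl, pu), count in table.items()
               if planned in (None, pl) and purchased in (None, pu))
    return Fraction(hits, n)


# P(purchased | planned) = P(planned and purchased) / P(planned)
p_bought_given_planned = p("yes", "yes") / p(planned="yes")
# P(planned | purchased) = P(planned and purchased) / P(purchased)
p_planned_given_bought = p("yes", "yes") / p(purchased="yes")
print(p_bought_given_planned, p_planned_given_bought)  # 4/5 2/3
```

Note that the two conditional probabilities differ: conditioning is not symmetric.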
Multiplication and Total Probability Rules
Multiplication Rule
Total Probability Rule (two events)
Independence
Definition (two events)
6
• Toss a six-sided die twice. The sample space consists of all ordered pairs (i, j) of the numbers 1, 2, …, 6, that is, S = {(1, 1), (1, 2), …, (6, 6)}. Let A = {outcomes match} and B = {sum of outcomes at least 8}.
• Then find P(A), P(B), P(A|B) and P(B|A).
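Because the 36 ordered pairs are equally likely, all four probabilities follow from counting. A minimal sketch; it also compares P(A ∩ B) with P(A)P(B), which shows that A and B are dependent (1/12 versus 5/72).

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))  # 36 equally likely ordered pairs
A = {s for s in S if s[0] == s[1]}        # outcomes match
B = {s for s in S if sum(s) >= 8}         # sum of outcomes at least 8


def P(event):
    return Fraction(len(event), len(S))


P_A, P_B, P_AB = P(A), P(B), P(A & B)
P_A_given_B = P_AB / P_B   # P(A|B) = P(A and B) / P(B)
P_B_given_A = P_AB / P_A   # P(B|A) = P(A and B) / P(A)
print(P_A, P_B, P_A_given_B, P_B_given_A)  # 1/6 5/12 1/5 1/2
print("independent:", P_AB == P_A * P_B)   # False
```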
7
• Three persons A, B and C are competing for the post of CEO of a company. Their chances of becoming CEO are 0.2, 0.3 and 0.4 respectively.
• Their chances of taking decisions beneficial to employees are 0.50, 0.45 and 0.6 respectively.
• What is the probability that decisions beneficial to employees are taken once the new CEO is in office?
Bayes’ Theorem
Definition
• P B   PE1  B   PE2  B     PEn  B 
• For each P Ei  B   PB | Ei PEi 
P B   P E1  B   P E2  B     P En  B 

 P B | E1 P E1   P B | E2 PE2     PB | En P En 
n
  PB | Ei P Ei 
i 1
Bayes’ Theorem
Bayes’ Theorem
Applications
• Diagnostic tests in medicine
• Telecommunication
• Customer service
• Troubleshooting in engineering processes and systems
Example 1
• A component is tested for its stipulated quality, but the test is not infallible. If the component is good, the test gives a positive indication 70% of the time, i.e. 70% of the time the test classifies a good item as good. If the component is defective, the test gives a negative indication 80% of the time, implying that the component is bad. If 20% of the components from the manufacturing process are defective, find the probability that
  • the component is good and the test gives a positive indication
  • the component is not good and the test gives a negative indication
  • the component is good given that the test is positive
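The three parts use, in turn, the multiplication rule, the multiplication rule again, and Bayes' theorem with the total probability rule in the denominator. A minimal sketch with exact fractions:

```python
from fractions import Fraction

p_defective = Fraction(20, 100)
p_good = 1 - p_defective
p_pos_given_good = Fraction(70, 100)  # test says "good" for a good item
p_neg_given_def = Fraction(80, 100)   # test says "bad" for a defective item
p_pos_given_def = 1 - p_neg_given_def

# Multiplication rule: P(G and +) = P(+ | G) P(G), P(D and -) = P(- | D) P(D)
p_good_and_pos = p_pos_given_good * p_good
p_def_and_neg = p_neg_given_def * p_defective
# Total probability: P(+) = P(+ | G) P(G) + P(+ | D) P(D)
p_pos = p_good_and_pos + p_pos_given_def * p_defective
# Bayes' theorem: P(G | +) = P(G and +) / P(+)
p_good_given_pos = p_good_and_pos / p_pos

print(float(p_good_and_pos), float(p_def_and_neg), round(float(p_good_given_pos), 4))
# 0.56 0.16 0.9333
```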
Example 2
Technicians regularly make repairs when breakdowns occur on an automated production line. Janak, who services 20% of the breakdowns, makes an incomplete repair 1 time in 20. Tarun, who services 60% of the breakdowns, makes an incomplete repair 1 time in 10. Gautham, who services 15% of the breakdowns, makes an incomplete repair 1 time in 10, and Prasad, who services 5% of the breakdowns, makes an incomplete repair 1 time in 20. For the next problem with the production line diagnosed as being due to an initial repair that was incomplete, what is the probability that this initial repair was made by Janak?
Solution
Let A be the event that the initial repair was incomplete, and let
B1 be the event that the repair was made by Janak,
B2 that it was made by Tarun,
B3 that it was made by Gautham,
B4 that it was made by Prasad.
$$P(B_1 \mid A) = \frac{P(B_1)P(A \mid B_1)}{P(B_1)P(A \mid B_1) + P(B_2)P(A \mid B_2) + P(B_3)P(A \mid B_3) + P(B_4)P(A \mid B_4)}$$
$$= \frac{(0.20)(0.05)}{(0.20)(0.05) + (0.60)(0.10) + (0.15)(0.10) + (0.05)(0.05)} = \frac{0.01}{0.0875} \approx 0.114$$
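The same Bayes computation can be run for all four technicians at once. A minimal sketch with exact fractions: Janak's posterior reproduces the 0.114 above, and Tarun, who services most breakdowns, has the largest posterior (about 0.686).

```python
from fractions import Fraction

# technician -> (P(B_i): share of breakdowns serviced, P(A | B_i): incomplete-repair rate)
technicians = {
    "Janak": (Fraction(20, 100), Fraction(1, 20)),
    "Tarun": (Fraction(60, 100), Fraction(1, 10)),
    "Gautham": (Fraction(15, 100), Fraction(1, 10)),
    "Prasad": (Fraction(5, 100), Fraction(1, 20)),
}

# Denominator of Bayes' theorem: total probability of an incomplete repair, P(A)
p_incomplete = sum(prior * lik for prior, lik in technicians.values())

# Posterior P(B_i | A) = P(B_i) P(A | B_i) / P(A)
posterior = {name: prior * lik / p_incomplete
             for name, (prior, lik) in technicians.items()}

for name, prob in posterior.items():
    print(f"P({name} | incomplete) = {float(prob):.3f}")
```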
Random Variables
We now introduce a new term. Instead of saying that the possible outcomes are 1, 2, 3, 4, 5 or 6, we say that the random variable X can take the values {1, 2, 3, 4, 5, 6}.
• A random variable is a rule (formally, a function on the sample space) whose value is determined by the outcome of a particular experiment.
• Random variables can be either discrete or continuous.
• It is a convention to use upper-case letters (X, Y) for the names of random variables and lower-case letters (x, y) for their particular values.
Random Variables
Definition
Random Variables
Definition
Random Variables
Examples of Random Variables
The Probability Function for discrete random variables
We assigned a probability of 1/6 to each face of the die. In the same manner, we would assign a probability of 1/2 to each side of a coin.
What we did could be described as distributing the values of probability among different elementary events:
$P(X = x_k) = p(x_k), \quad k = 1, 2, \ldots$
It is convenient to introduce the probability function p(x):
$P(X = x) = p(x)$
Continuous distribution and the probability density function
A random variable X is said to have a continuous distribution with density function f(x) if for all a ≤ b we have
$$P(a \le X \le b) = \int_a^b f(x)\,dx \qquad (1.15)$$
$$\int_{-\infty}^{\infty} f(x)\,dx = 1 \qquad (1.16)$$
$$P(E) = \int_E f(x)\,dx \qquad (1.17)$$
Examples:
1. The uniform distribution on (a, b): we are picking a value at random from (a, b).
$$f(x) = \begin{cases} \dfrac{1}{b - a}, & a < x < b \\ 0, & \text{otherwise} \end{cases} \qquad (1.18)$$
2. The exponential distribution
$$f(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0 \\ 0, & \text{otherwise} \end{cases} \qquad (1.19)$$
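Properties (1.15) and (1.16) can be checked numerically for the two example densities. A minimal sketch using a midpoint-rule integral (the rule, step count and parameters a = 0, b = 4, λ = 2 are illustrative assumptions, not from the slides):

```python
import math


def uniform_pdf(x, a, b):
    """Density (1.18): 1/(b - a) on (a, b), 0 otherwise."""
    return 1.0 / (b - a) if a < x < b else 0.0


def exponential_pdf(x, lam):
    """Density (1.19): lam * exp(-lam * x) for x >= 0, 0 otherwise."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def integrate(f, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))


# (1.15): P(1 <= X <= 3) for X uniform on (0, 4); exact value 0.5
p_unif = integrate(lambda x: uniform_pdf(x, 0, 4), 1, 3)
# (1.15): P(0 <= X <= 1) for exponential with lam = 2; exact value 1 - e^(-2)
p_exp = integrate(lambda x: exponential_pdf(x, 2.0), 0, 1)
# (1.16): total probability; the tail beyond 50 is negligible here
total = integrate(lambda x: exponential_pdf(x, 2.0), 0, 50)
print(round(p_unif, 6), round(p_exp, 6), round(total, 6))
```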
Expected Value
$$E(X) = \sum_{i=1}^{n} X_i\,P(X_i)$$
Variance
$$\sigma^2 = \sum_{i=1}^{n} \left(X_i - E(X)\right)^2 P(X_i)$$
1
• Toss a coin 3 times. The sample space is
• S = {HHH, HTH, THH, TTH, HHT, HTT, THT, TTT}
• Mean
• Variance
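The slide does not name the random variable; assuming the usual choice X = number of heads in three tosses of a fair coin, the mean and variance follow directly from the two formulas above. A minimal sketch:

```python
from fractions import Fraction
from itertools import product

# Assumption: X = number of heads in three tosses of a fair coin
outcomes = ["".join(t) for t in product("HT", repeat=3)]  # the 8 points of S
p = Fraction(1, len(outcomes))                            # each equally likely

# Probability function p(x): add up the probability of each outcome with x heads
pmf = {}
for outcome in outcomes:
    x = outcome.count("H")
    pmf[x] = pmf.get(x, 0) + p

mean = sum(x * px for x, px in pmf.items())                    # E(X) = sum x P(x)
variance = sum((x - mean) ** 2 * px for x, px in pmf.items())  # sum (x - E(X))^2 P(x)
print(mean, variance)  # 3/2 3/4
```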
Binomial Distribution
n = number of trials, x = number of successes, p = probability of success, q = 1 - p = probability of failure
Probability of x successes in n trials:
$$P(X = x) = \frac{n!}{x!\,(n - x)!}\, p^{x} q^{\,n - x}$$
$$\mu = np, \qquad \sigma = \sqrt{np(1 - p)}$$
• Roll 12 dice simultaneously, and let X denote the number of 6's that appear.
• Then find P(7 ≤ X ≤ 9).
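Here X ~ Binomial(n = 12, p = 1/6), so the answer is the sum of the binomial probabilities for x = 7, 8, 9. A minimal sketch with exact fractions using `math.comb` (Python 3.8+):

```python
from fractions import Fraction
from math import comb


def binom_pmf(x, n, p):
    """P(X = x) = C(n, x) p^x q^(n - x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


p_six = Fraction(1, 6)
prob = sum(binom_pmf(x, 12, p_six) for x in range(7, 10))  # x = 7, 8, 9
print(prob, float(prob))  # exact fraction, then about 0.00129
```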
3
• A recent national study showed that approximately 44.7% of college
students have used Wikipedia as a source in at least one of their term
papers.
• Let X equal the number of students in a random sample of size n = 31
who have used Wikipedia as a source.
• How is X distributed?
• Find the probability that X is equal to 17.
• Find the probability that X is at most 13.
• Find the probability that X is between 16 and 19, inclusive.
• Find the mean and variance of X.
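X counts successes in 31 independent trials with success probability 0.447, so X ~ Binomial(31, 0.447). A minimal sketch of the remaining parts; "between 16 and 19, inclusive" is computed as P(X ≤ 19) - P(X ≤ 15).

```python
from math import comb, sqrt


def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binom_cdf(x, n, p):
    """P(X <= x), summing the pmf from 0 to x."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))


n, p = 31, 0.447  # X ~ Binomial(31, 0.447)

p_eq_17 = binom_pmf(17, n, p)
p_le_13 = binom_cdf(13, n, p)
p_16_to_19 = binom_cdf(19, n, p) - binom_cdf(15, n, p)  # P(16 <= X <= 19)
mean = n * p                 # np
variance = n * p * (1 - p)   # np(1 - p)

print(f"P(X = 17)        = {p_eq_17:.4f}")
print(f"P(X <= 13)       = {p_le_13:.4f}")
print(f"P(16 <= X <= 19) = {p_16_to_19:.4f}")
print(f"mean = {mean:.3f}, variance = {variance:.3f}, sd = {sqrt(variance):.3f}")
```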
The Poisson Distribution
$$P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}, \qquad \text{Expected value} = \lambda, \qquad \text{Variance} = \lambda$$
Problem
• On average, five cars arrive at a particular car wash every hour. Let X count the number of cars that arrive from 10 AM to 11 AM (mean λ = 5).
• What is the probability that no car arrives during this period?
Problem
• Suppose the car wash is in operation from 8 AM to 6 PM, and let Y be the number of customers that appear in this period. Since this period covers a total of 10 hours, λ = 5 × 10 = 50.
• What is the probability that there are between 48 and 50 customers, inclusive?
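Both car-wash questions use the Poisson probability function, with λ = 5 for one hour and λ = 50 for ten hours. A minimal sketch; the probability is computed on the log scale with `math.lgamma` so that large k does not overflow, an implementation choice rather than anything in the slides.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) = lam^k e^(-lam) / k!, computed on the log scale for stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


p_none = poisson_pmf(0, 5)                                   # X ~ Poisson(5)
p_48_to_50 = sum(poisson_pmf(k, 50) for k in range(48, 51))  # Y ~ Poisson(50)
print(f"P(X = 0) = {p_none:.5f}")  # e^-5, about 0.00674
print(f"P(48 <= Y <= 50) = {p_48_to_50:.4f}")
```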
Normal Distribution
Probability density function f(X)
[Figure: normal density curve plotted for X from 5 to 5.4]
$$f(X) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{X - \mu}{\sigma}\right)^{2}}$$
Three Common Areas Under the Curve
[Figure: three normal distributions with different areas]
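Assuming the "three common areas" are the usual bands within one, two and three standard deviations of the mean, they can be computed with `statistics.NormalDist` (Python 3.8+). The sketch also codes the density f(X) directly and compares it with `NormalDist.pdf`; the parameters 5.2 and 0.05 are illustrative.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Normal density f(X) with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)


Z = NormalDist(mu=0, sigma=1)  # standard normal distribution
# Area within k standard deviations of the mean; it is the same for every mu and sigma
areas = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(f"P(mu - {k} sigma < X < mu + {k} sigma) = {area:.4f}")
```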
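Standard Normal Distribution
$$Z = \frac{X - \mu}{\sigma}$$
[Figure: a normal distribution with μ = 100 and σ = 15; X = 55, 70, 85, 100, 115, 130, 145 correspond to Z = -3, -2, -1, 0, 1, 2, 3]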
Thanks
L-2: Descriptive Statistics
Today
• Recall the past for a while: simple tools
• Visualization of data
• Basics of probability
• Discussion & problems on probability
• Conditional probability
Visualization
• A summary gives an idea about the data
• summary(income)
• Min. 1st Qu. Median Mean 3rd Qu. Max.
• - 7.8 12.5 32.0 52.03 67.2 585
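A Python analogue of R's `summary()` can be sketched with `statistics.quantiles(..., method="inclusive")`, which, to my understanding, interpolates the same way as R's default quantile (type 7). The incomes below are hypothetical. As in the income summary above, a mean well above the median points to a right-skewed distribution.

```python
import statistics


def summarise(data):
    """Min, quartiles, mean and max, laid out like R's summary() output."""
    # 'inclusive' treats the min and max as the 0th and 100th percentiles
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "Min.": min(data),
        "1st Qu.": q1,
        "Median": median,
        "Mean": statistics.mean(data),
        "3rd Qu.": q3,
        "Max.": max(data),
    }


income = [5, 7, 8, 10, 12, 15, 40]  # hypothetical incomes
stats = summarise(income)
print("  ".join(f"{k} {v:.2f}" for k, v in stats.items()))
```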
Visualization – why
Data Visualisation
• Line chart
• Bar chart
• Histogram
• Pie chart
• Scatter plot
• Box plot
Line Chart
Bar Chart
Histograms
Histograms
Histograms
Pie charts
Scatter Plot
Box plot
To conclude: Visualization
• Visualization gives a sense of the data distribution and of the relationships among variables.
• Visualization is an iterative process that helps answer questions about the data. Time spent on it is not wasted: it informs the modelling process and helps find a suitable model for the data.
• Sample space
  • Discrete sample spaces
  • Continuous sample spaces
• Event
• Independent events
• Dependent events
Probability
Axioms of Probability
Thanks
Today
• Recall the past for a while: simple tools
• Visualization of data
• Basics of probability
• Discussion & problems on probability
• Conditional probability
• Box plot
Thanks
Today
• Recall the past for a while: conditional probability and Bayes' theorem, with some examples
• Random variables
• Probability distribution
• Examples
Thanks
L-5: Descriptive and Inferential Statistics
Agenda
• Quick review of the topics covered in the previous class
• Normal distribution
• Sampling
• Testing of hypothesis
Note
Since the normal density cannot be integrated in closed form between an arbitrary pair of limits a and b, probabilities relating to normal distributions are usually obtained from special tables of the standard normal distribution function (see tables).
Find the probability that a random variable having the standard normal distribution will take on a value
1) to the left of z = -1.78
2) to the right of z = -1.45
3) corresponding to -0.80 < z < 1.53
4) to the left of z = -2.52 and to the right of z = 1.83
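In place of printed tables, the standard normal distribution function F(z) can be evaluated numerically, for example with `statistics.NormalDist().cdf` (Python 3.8+). A minimal sketch of the four parts; part 4 adds the two tail areas.

```python
from statistics import NormalDist

phi = NormalDist().cdf  # standard normal distribution function F(z)

answers = {
    "1) z < -1.78": phi(-1.78),
    "2) z > -1.45": 1 - phi(-1.45),
    "3) -0.80 < z < 1.53": phi(1.53) - phi(-0.80),
    "4) z < -2.52 or z > 1.83": phi(-2.52) + (1 - phi(1.83)),
}
for question, p in answers.items():
    print(f"{question}: {p:.4f}")
```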
Calculation of probabilities using a normal distribution
Problem
The mean and standard deviation of a normal variate are 8 and 4 respectively.
Find 1) P[5 ≤ X ≤ 10]
2)P [ X  5]
Solution
1) μ = 8, σ = 4
We know that $Z = \frac{X - \mu}{\sigma} = \frac{X - 8}{4}$
When X = 5, $Z = \frac{5 - 8}{4} = -0.75$
When X = 10, $Z = \frac{10 - 8}{4} = 0.5$
P[5 ≤ X ≤ 10] = P[-0.75 ≤ Z ≤ 0.5]
= F(0.5) - F(-0.75)
= 0.6915 - 0.2266 = 0.4649
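The standardisation can be checked numerically: computing P[5 ≤ X ≤ 10] directly from N(8, 4) and via Z gives the same result. A minimal sketch with `statistics.NormalDist`; it gives about 0.4648, which agrees with the table answer 0.4649 to table rounding.

```python
from statistics import NormalDist

X = NormalDist(mu=8, sigma=4)


def z_score(x, mu=8, sigma=4):
    """Standardise: Z = (X - mu) / sigma."""
    return (x - mu) / sigma


print(z_score(5), z_score(10))  # -0.75 0.5
p = X.cdf(10) - X.cdf(5)        # P[5 <= X <= 10] directly
p_std = NormalDist().cdf(0.5) - NormalDist().cdf(-0.75)  # the same, via F(z)
print(round(p, 4), round(p_std, 4))
```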
Inferential Statistics
• Sampling
  • Sample
  • Random sampling
  • Central limit theorem
Statistical Inferences
• The theory of statistical inference is divided into two major areas:
  • Estimation
  • Tests of hypothesis
Hypothesis Testing
• Goal: make statement(s) regarding unknown population parameter values based on sample data
Hypothesis Testing
• Is also called significance testing
• Tests a claim about a parameter using evidence (data in a sample)
Example
• A drug company has a new drug and wishes to compare it with the current standard treatment.
• Federal regulators tell the company that it must demonstrate that the new drug is better than the current treatment to receive approval.
• The firm runs a clinical trial in which some patients receive the new drug and others receive the standard treatment.
• A numeric response of therapeutic effect is obtained (higher scores are better).
• Parameter of interest: μNew - μStd
Hypothesis Testing Steps
• Null and alternative hypotheses
• Test statistic
• P-value and interpretation
• Significance level (optional)
Example
• Null hypothesis H0: μ = 170
• The alternative hypothesis can be either H1: μ > 170 (one-sided test) or H1: μ ≠ 170 (two-sided test)
Test Statistic
Use this statistic to test the problem:
$$z_{stat} = \frac{\bar{x} - \mu_0}{SE_{\bar{x}}}$$
where μ0 = population mean assuming H0 is true, and $SE_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
Example
A. Hypotheses:
H0: µ = 100 versus
Ha: µ > 100 (one-sided)
Ha: µ ≠ 100 (two-sided)
B. Test statistic:
$$SE_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{9}} = 5$$
$$z_{stat} = \frac{\bar{x} - \mu_0}{SE_{\bar{x}}} = \frac{112.8 - 100}{5} = 2.56$$
C. P-value: P = Pr(Z ≥ 2.56) = 0.0052
P = 0.0052 ⇒ it is unlikely the sample came from this null distribution ⇒ strong evidence against H0
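Steps B and C can be sketched as a small one-sample z-test function that returns the statistic together with the one-sided and two-sided P-values. The function name `z_test` is illustrative; the normal distribution function comes from `statistics.NormalDist` (Python 3.8+).

```python
from math import sqrt
from statistics import NormalDist


def z_test(xbar, mu0, sigma, n):
    """One-sample z test: return z and its one- and two-sided P-values."""
    se = sigma / sqrt(n)                            # standard error of the mean
    z = (xbar - mu0) / se
    upper = 1 - NormalDist().cdf(z)                 # Ha: mu > mu0
    two_sided = 2 * (1 - NormalDist().cdf(abs(z)))  # Ha: mu != mu0
    return z, upper, two_sided


z, p_upper, p_two = z_test(xbar=112.8, mu0=100, sigma=15, n=9)
print(f"z = {z:.2f}, one-sided P = {p_upper:.4f}, two-sided P = {p_two:.4f}")
```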
Hypothesis Testing
                        Test result: do not reject H0   Test result: reject H0
True state: H0 true     Correct decision                Type I error
True state: H0 false    Type II error                   Correct decision
α = P(Type I error), β = P(Type II error)
• Goal: keep α and β reasonably small
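The meaning of α can be illustrated by simulation: if samples are repeatedly drawn from a population where H0 is true, a test at level α = 0.05 should wrongly reject about 5% of the time. A minimal sketch, reusing μ0 = 100, σ = 15 and n = 9 from the earlier example as illustrative settings; the seed and trial count are arbitrary.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(1)  # fixed seed so the run is reproducible

mu0, sigma, n, alpha = 100, 15, 9, 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for a two-sided test

trials, rejections = 20_000, 0
for _ in range(trials):
    # Draw a sample from a population where H0 is TRUE (the mean really is mu0)
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    z = (fmean(sample) - mu0) / (sigma / sqrt(n))
    if abs(z) > z_crit:
        rejections += 1  # rejecting a true H0 is a Type I error

print(f"estimated alpha = {rejections / trials:.3f}")
```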
Problem
• It is claimed that a random sample of 49 tyres has a mean life of 15,200 km. This sample was drawn from a population whose mean is 15,150 km and whose standard deviation is 1,200 km. Test the significance at the 0.05 level.
Solution:
1. Null hypothesis H0: μ = 15,150
2. Alternative hypothesis H1: μ ≠ 15,150
3. Level of significance α = 0.05
4. Critical region: this is a two-tailed test (large sample), so reject H0 if Zcal < $-Z_{\alpha/2}$ or Zcal > $Z_{\alpha/2}$.
Here α = 0.05, so α/2 = 0.05/2 = 0.025. From the table, $Z_{\alpha/2}$ = 1.96, i.e. if Zcal < -1.96 or Zcal > 1.96, we reject the null hypothesis.
5. Computation:
Test statistic
$$Z_{cal} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{15200 - 15150}{1200/\sqrt{49}} = 0.2917$$
6. Decision:
Since Zcal = 0.2917 < 1.96, we do not reject (we accept) the null hypothesis.
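The tyre test can be reproduced in a few lines; the critical value comes from `NormalDist().inv_cdf` instead of a printed table. A minimal sketch:

```python
from math import sqrt
from statistics import NormalDist

xbar, mu0, sigma, n, alpha = 15200, 15150, 1200, 49, 0.05

z_cal = (xbar - mu0) / (sigma / sqrt(n))
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value, about 1.96
reject = abs(z_cal) > z_crit

print(f"Zcal = {z_cal:.4f}, critical = ±{z_crit:.2f}, reject H0: {reject}")
```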
Problem
• A trucking firm is suspicious of the claim that the average lifetime of certain tyres is at least 28,000 miles. To check the claim, the firm puts 40 of these tyres on its trucks and gets a mean life of 27,463 miles with a standard deviation of 1,348 miles. What can it conclude if the probability of a Type I error is to be at most 0.01?
Solution
1. Null hypothesis: H0: μ ≥ 28,000 miles
2. Alternative hypothesis: H1: μ < 28,000 miles
3. Level of significance: α = 0.01
4. Critical region: this is a left-tailed test (large sample). Reject the null hypothesis if Zcal < $-Z_{\alpha}$ = $-Z_{0.01}$ = -2.33.
5. Computation
Test statistic
$$Z = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{27463 - 28000}{1348/\sqrt{40}} = -2.52$$
6. Conclusion
Since Zcal = -2.52 < -2.33, we reject the null hypothesis at the 0.01 level of significance. In other words, the trucking firm's suspicion that μ < 28,000 miles is supported by the data.
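The left-tailed version differs only in the critical value, $-Z_{\alpha}$, and the direction of the comparison. A minimal sketch; as in the large-sample solution, the sample standard deviation stands in for σ.

```python
from math import sqrt
from statistics import NormalDist

xbar, mu0, s, n, alpha = 27_463, 28_000, 1_348, 40, 0.01

# Large sample: the sample standard deviation s is used in place of sigma
z_cal = (xbar - mu0) / (s / sqrt(n))
z_crit = NormalDist().inv_cdf(alpha)  # left-tail critical value, about -2.33
reject = z_cal < z_crit

print(f"Zcal = {z_cal:.2f}, critical = {z_crit:.2f}, reject H0: {reject}")
```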
Hypothesis concerning one mean (small sample)
Procedure
1. Null hypothesis H0: μ = μ0
2. Alternative hypothesis H1: μ ≠ μ0 (two-tailed test)
   or H1: μ > μ0 (right-tailed test)
   or H1: μ < μ0 (left-tailed test)
3. Level of significance: α
4. Critical region
For a two-tailed test (H1: μ ≠ μ0): reject H0 if t < $-t_{\alpha/2}$ or t > $t_{\alpha/2}$, with (n - 1) degrees of freedom
For a right-tailed test (H1: μ > μ0): reject H0 if t > $t_{\alpha}$, with (n - 1) degrees of freedom
For a left-tailed test (H1: μ < μ0): reject H0 if t < $-t_{\alpha}$, with (n - 1) degrees of freedom
5. Test statistic
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \quad \text{with } (n - 1) \text{ degrees of freedom}$$
6. Calculation
7. Decision
A random sample of 6 steel beams has a mean compressive strength of 58,392 p.s.i. (pounds per square inch) with a standard deviation of 648 p.s.i. Use this information at the level of significance α = 0.05 to test whether the true average compressive strength of the steel from which the sample came is 58,000 p.s.i.
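Following the small-sample procedure above, a minimal sketch of the steel-beam test. It assumes the two-tailed alternative H1: μ ≠ 58,000 (the wording "whether ... is 58,000" suggests this). Python's standard library has no t quantile function, so the table value $t_{0.025} = 2.571$ for 5 degrees of freedom is hard-coded; `scipy.stats.t.ppf(0.975, 5)` would compute it if SciPy is available.

```python
from math import sqrt

xbar, mu0, s, n = 58_392, 58_000, 648, 6

t = (xbar - mu0) / (s / sqrt(n))  # t statistic with n - 1 = 5 degrees of freedom
t_crit = 2.571  # t_{0.025} for 5 d.f. from a t table (two-tailed, alpha = 0.05)
reject = abs(t) > t_crit

print(f"t = {t:.3f}, critical = ±{t_crit}, reject H0: {reject}")
```

With t ≈ 1.48 inside (-2.571, 2.571), the sketch does not reject H0.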
Thanks
L-6: Inferential Statistics
Agenda
• Quick review of the topics covered in the previous class
• Testing of hypothesis
Thanks
L-7: Inferential Statistics & Predictive Analytics
Agenda
• Central limit theorem
• Type I and Type II errors
• Testing of hypothesis (continued from the previous session)
• Covariance
• Correlation
• Introduction to regression
Central Limit Theorem
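The central limit theorem says that the means of sufficiently large samples are approximately normally distributed, with mean μ and standard deviation σ/√n, whatever the shape of the population. A minimal simulation sketch, assuming a Uniform(0, 1) population and samples of size 30 (both illustrative choices): the spread of the simulated means matches σ/√n, and about 68% of them fall within one such standard deviation of μ, as a normal distribution predicts.

```python
import random
from statistics import fmean, pstdev

random.seed(42)  # fixed seed so the run is reproducible


def sample_means(n, repeats):
    """Means of `repeats` samples of size n from Uniform(0, 1), a non-normal population."""
    return [fmean([random.random() for _ in range(n)]) for _ in range(repeats)]


n = 30
means = sample_means(n, repeats=5_000)
# For Uniform(0, 1): mu = 0.5 and sigma = sqrt(1/12), so sd of the mean = sigma / sqrt(n)
sd_theory = (1 / 12) ** 0.5 / n**0.5
within = sum(abs(m - 0.5) < sd_theory for m in means) / len(means)

print(f"mean of sample means = {fmean(means):.4f} (theory 0.5)")
print(f"sd of sample means   = {pstdev(means):.4f} (theory {sd_theory:.4f})")
print(f"share within 1 sd    = {within:.3f} (normal: about 0.683)")
```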
Thanks
L- 8: Predictive Analytics
Agenda
Covariance
Correlation
Method of least squares
Simple linear regression
Regression
Thanks
L- 9: Predictive Analytics & Revision
Agenda
Review of last session

Method of least squares
Simple linear regression
Regression
Thanks
L- 11: Predictive Analytics (Continued) & Forecasting Models
Agenda
Model validation
Ridge and lasso models
Assumptions of Linear regression
Logistic regression
Classical Linear Regression (OLS)
• Explanatory and response variables are numeric.
• The relationship between the mean of the response variable and the level of the explanatory variable is assumed to be approximately linear (a straight line).
• Model:
$$Y = \beta_0 + \beta_1 x + \varepsilon, \qquad \varepsilon \sim N(0, \sigma)$$
• $\beta_1 > 0$: positive association
• $\beta_1 < 0$: negative association
• $\beta_1 = 0$: no association
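For a single predictor, the least-squares estimates have the closed form $\hat\beta_1 = S_{xy}/S_{xx}$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$. A minimal sketch on hypothetical data (the numbers are illustrative only; the function name is our own):

```python
def least_squares_line(x, y):
    """Return (b0, b1) minimising sum((y - b0 - b1*x)**2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx             # estimate of beta1 (slope)
    b0 = y_bar - b1 * x_bar    # estimate of beta0 (intercept)
    return b0, b1


# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares_line(x, y)
print(f"y_hat = {b0:.2f} + {b1:.2f} x")   # b1 > 0: positive association
```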
Multiple regression
Numeric response variable (y)
p numeric predictor variables
Model:
$$Y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \varepsilon$$
• Population model for the mean response:
$$E(Y \mid x_1, \ldots, x_p) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$$
• Least-squares fitted (predicted) equation, minimizing SSE:
$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_p x_p, \qquad SSE = \sum \left(Y - \hat{Y}\right)^2$$
Accuracy of a model
• The strength of a linear model can be assessed using:
1) Coefficient of determination ($R^2$)
2) Residual standard error
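Both measures follow directly from the residuals: $R^2 = 1 - SSE/SST$ and $RSE = \sqrt{SSE/(n-p-1)}$ for p predictors. The sketch below assumes the hypothetical fitted line $\hat y = 2.2 + 0.6x$ on illustrative data; it is not taken from the course material.

```python
from math import sqrt


def r_squared_and_rse(y, y_hat, p):
    """R^2 and residual standard error for a model with p predictors."""
    n = len(y)
    y_bar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total sum of squares
    r2 = 1 - sse / sst
    rse = sqrt(sse / (n - p - 1))
    return r2, rse


# Hypothetical data with fitted line y_hat = 2.2 + 0.6 x
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_hat = [2.2 + 0.6 * xi for xi in x]
r2, rse = r_squared_and_rse(y, y_hat, p=1)
print(f"R^2 = {r2:.2f}, RSE = {rse:.3f}")
```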
R-squared vs adjusted R-squared
In multiple regression, adjusted R-squared is a better metric than R-squared for assessing the goodness of fit of the model.
R-squared never decreases when additional variables are added to the model, even if they are unrelated to the dependent variable.
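Adjusted R-squared is commonly defined as $1 - (1-R^2)\frac{n-1}{n-p-1}$ for n observations and p predictors. This sketch holds R-squared fixed at a hypothetical 0.60 and varies p, which shows the penalty for extra predictors:

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Same R^2 = 0.60 from n = 30 observations, with more and more predictors
for p in (1, 5, 10):
    print(p, round(adjusted_r_squared(0.60, n=30, p=p), 3))
```

A new variable therefore raises adjusted R-squared only if it increases R-squared by enough to offset the lost degree of freedom.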
Regularization
Overfitting can be reduced with regularization.
Regularization works by putting constraints (penalties) on the size of the coefficients.
LASSO: Least Absolute Shrinkage and Selection Operator
• Some coefficients can be shrunk exactly to zero (i.e. the variables are dropped).
RIDGE: The coefficients approach zero but are never dropped.
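Lasso & Ridge
$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_p x_p$$
• OLS estimation:
$$\min \; SSE = \sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2$$
• LASSO estimation:
$$\min \; \sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2 + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert$$
• Ridge regression estimation:
$$\min \; \sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$$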
Assumptions in Regression Analysis
Assumptions
• The distribution of residuals is normal (at each value of the independent variables).
• The variance of the residuals is equal for every set of values of the independent variables.
  - A violation is called heteroscedasticity.
• The error term is additive.
  - No interactions.
• At every value of the independent variables, the expected (mean) value of the residuals is zero.
  - No non-linear relationships.
• The expected correlation between residuals, for any two cases, is 0.
  - The independence assumption (lack of autocorrelation).
• All independent variables are uncorrelated with the error term.
• No independent variable is a perfect linear function of other independent variables (no perfect multicollinearity).
• The mean of the error term is zero.
Assumption 1: The Distribution of Residuals is Normal at Every Value of the Independent Variables
[Figure: frequency histograms of the residuals for the FEMALE and MALE groups, x-axis 60 to 140]
Non-Normality
• Skew and kurtosis
  - Skew: much easier to deal with
  - Kurtosis: less serious anyway
• Transform the data to remove skew
  - Positive skew: log transform
  - Negative skew: square
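The effect of a log transform on positive skew can be checked numerically. A sketch using the moment-based sample skewness $g_1 = m_3/m_2^{3/2}$ on hypothetical right-skewed data:

```python
from math import log


def skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical positively skewed data (all values > 0, so log is defined)
raw = [1, 2, 3, 4, 10, 20, 50, 100]
logged = [log(v) for v in raw]
print(f"skew before: {skewness(raw):.2f}, after log: {skewness(logged):.2f}")
```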
Assumption 2: The Variance of the Residuals for Every Set of Values of the Independent Variables is Equal
Heteroscedasticity
• This assumption is about heteroscedasticity of the residuals
• Hetero = different
• Scedastic = scattered
• We don't want heteroscedasticity
• We want our data to be homoscedastic
• Draw a scatterplot to investigate
[Figure: scatterplot of MALE (y-axis) against FEMALE (x-axis), both axes 40 to 160]
Good: no heteroscedasticity
[Figure: residuals against predicted values]
Bad: heteroscedasticity
[Figure: residuals against predicted values]
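Besides the scatterplot, one crude numerical check (in the spirit of the Goldfeld-Quandt test, but without its formal F test) is to compare the residual variance in the upper and lower halves of the predicted values. The residuals below are hypothetical and fan out deliberately; the function name is our own.

```python
from statistics import variance


def residual_variance_ratio(predicted, residuals):
    """Variance of residuals for the upper half of predicted values divided by
    that for the lower half; values far from 1 hint at heteroscedasticity."""
    pairs = sorted(zip(predicted, residuals))
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[-half:]]
    return variance(high) / variance(low)


# Hypothetical residuals that spread out as the predicted value grows
predicted = [1, 2, 3, 4, 5, 6, 7, 8]
residuals = [0.1, -0.1, 0.2, -0.2, 1.0, -1.0, 2.0, -2.0]
ratio = residual_variance_ratio(predicted, residuals)
print(round(ratio, 1))
```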
Assumption 3: The Error Term is Additive
Assumption 4: At Every Value of the Independent Variables, the Expected (Mean) Value of the Residuals is Zero
Assumption 5: The Expected Correlation between Residuals, for Any Two Cases, is 0
• Result, with line of best fit
[Figure: scatterplot of Grade (10 to 90) against Time (10 to 70) with the line of best fit]
• Now somewhat different
[Figure: Grade against Time, with points labelled by Question (1, 2, 3)]
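When the cases have a natural order (for example, observations over time), a common numerical check of this assumption is the Durbin-Watson statistic $d = \sum_t (e_t - e_{t-1})^2 / \sum_t e_t^2$. A minimal sketch with two hypothetical residual sequences:

```python
def durbin_watson(residuals):
    """d is about 2 when successive residuals are uncorrelated, near 0 for
    positive and near 4 for negative first-order autocorrelation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


dw_runs = durbin_watson([1, 1, 1, -1, -1, -1])        # runs of equal signs
dw_alternating = durbin_watson([1, -1, 1, -1, 1, -1])  # alternating signs
print(round(dw_runs, 3), round(dw_alternating, 3))
```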
Assumption 6: All Independent Variables are Uncorrelated with the Error Term
Assumption 7: No Independent Variable is a Perfect Linear Function of Other Independent Variables
Assumption 8: The Mean of the Error Term is Zero
Multicollinearity
• Correlation matrix
• VIF (Variance Inflation Factor)
• A better way to assess multicollinearity is to compute the VIF.
• If VIF = 1, the variables are not correlated.
• If 1 < VIF < 5, the variables are moderately correlated.
• If VIF > 5, the variables are highly correlated and may need to be eliminated from the model.
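In general $VIF_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from regressing predictor j on all the other predictors. With exactly two predictors, $R_j^2$ is just their squared correlation, which gives the short sketch below (hypothetical data):

```python
from math import sqrt


def correlation(a, b):
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab / sqrt(saa * sbb)


def vif_two_predictors(x1, x2):
    """With two predictors, R_j^2 = r^2, so VIF = 1 / (1 - r^2) for both."""
    r = correlation(x1, x2)
    return 1 / (1 - r ** 2)


x1 = [1, 2, 3, 4, 5]   # hypothetical predictor values
x2 = [2, 4, 5, 4, 5]
vif = vif_two_predictors(x1, x2)
print(round(vif, 2))   # between 1 and 5: moderately correlated
```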
Logistic Regression
Why use logistic regression?
There are many important research topics for which the dependent variable is "limited".
For example, voting, morbidity or mortality, and participation data are not continuous or normally distributed.
Logistic regression is a type of regression analysis where the dependent variable is a dummy variable: coded 0 (did not vote) or 1 (did vote).
Logistic Regression
• Logistic regression is a supervised classification model.
• It allows us to make predictions from labelled data when the target variable is categorical.
• Binary classification
• Examples
1. Whether a customer will default on a loan or not
2. Whether a particular machine will break down in the next month or not
3. Whether an incoming email is spam or not
Categorical Response Variables
Examples:
Whether or not a person smokes: Y = Non-smoker or Smoker (binary response)
Success of a medical treatment: Y = Survives or Dies (binary response)
Opinion poll responses: Y = Agree, Neutral or Disagree (ordinal response)
Binary Logistic Regression Model
Y = Binary response
X = Quantitative predictor
p = proportion of 1's (yes, success) at any X
Equivalent forms of the logistic regression model:
Logit form:
$$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X$$
Probability form:
$$p = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}} = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}$$
Binary Logistic Regression via R
> logitmodel=glm(Gender~Hgt,family=binomial, data=Pulse)
> summary(logitmodel)
Call:
glm(formula = Gender ~ Hgt, family = binomial, data = Pulse)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.77443 -0.34870 -0.05375 0.32973 2.37928
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 64.1416 8.3694 7.664 1.81e-14 ***
Hgt -0.9424 0.1227 -7.680 1.60e-14 ***
---
Call:
glm(formula = Gender ~ Hgt, family = binomial, data = Pulse)
Coefficients:
(Intercept) 64.1416 8.3694 7.664 1.81e-14 ***
Hgt -0.9424 0.1227 -7.680 1.60e-14 ***
---
$$\hat{p} = \frac{e^{64.14 - 0.9424\,Hgt}}{1 + e^{64.14 - 0.9424\,Hgt}}$$
= proportion of females at that Hgt
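Plugging the fitted coefficients into the equivalent forms shows how the predicted proportion changes with height and that the logit form recovers the linear predictor. A sketch using the estimates from the R output above; the heights chosen are arbitrary and the function names are our own.

```python
from math import exp, log

B0, B1 = 64.1416, -0.9424   # estimates from the glm output above


def p_female(hgt):
    """Probability form: p = e^(b0 + b1*Hgt) / (1 + e^(b0 + b1*Hgt))."""
    eta = B0 + B1 * hgt
    return exp(eta) / (1 + exp(eta))


def p_female_alt(hgt):
    """Equivalent form: p = 1 / (1 + e^-(b0 + b1*Hgt))."""
    return 1 / (1 + exp(-(B0 + B1 * hgt)))


for hgt in (64, 68, 72):
    p = p_female(hgt)
    logit = log(p / (1 - p))            # logit form gives back b0 + b1*Hgt
    print(hgt, round(p, 3), round(logit, 3), round(p_female_alt(hgt), 3))
```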
Example: TMS for Migraines
Transcranial Magnetic Stimulation vs. Placebo
Pain free?   TMS   Placebo
YES           39        22
NO            61        78
Total        100       100
$P_{TMS} = 39/100 = 0.39$, $\quad \text{odds}_{TMS} = \frac{39}{61} = \frac{0.39}{1 - 0.39} = 0.639$
$P_{Placebo} = 22/100 = 0.22$, $\quad \text{odds}_{Placebo} = \frac{22}{78} = 0.282$
$\text{Odds ratio} = \frac{0.639}{0.282} = 2.27$: the odds of getting relief are 2.27 times higher using TMS than using placebo.
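The same odds and odds ratio can be reproduced in a few lines. On the log scale they match the coefficients that `glm` reports for these data: the intercept -1.2657 is the log odds for placebo and GroupTMS 0.8184 is the log odds ratio.

```python
from math import log

# Counts from the TMS table: pain free / not pain free
odds_tms = 39 / 61
odds_placebo = 22 / 78
odds_ratio = odds_tms / odds_placebo

print(round(odds_tms, 3), round(odds_placebo, 3), round(odds_ratio, 2))
# Log scale: log odds for placebo and log odds ratio
print(round(log(odds_placebo), 4), round(log(odds_ratio), 4))
```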
Logistic Regression for TMS data
> lmod=glm(cbind(Yes,No)~Group,family=binomial,data=TMS)
> summary(lmod)
Coefficients:
(Intercept) -1.2657 0.2414 -5.243 1.58e-07 ***
GroupTMS 0.8184 0.3167 2.584 0.00977 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 6.8854 on 1 degrees of freedom
Residual deviance: 0.0000 on 0 degrees of freedom
AIC: 13.701
Note: $e^{0.8184} = 2.27$ = odds ratio
Binary Logistic Regression Model
Y = Binary response
X = Single predictor, or X1, X2, …, Xk = Multiple predictors
p = proportion of 1's (yes, success) at any x, or at any x1, x2, …, xk
Equivalent forms of the logistic regression model:
Logit form:
$$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k$$
Probability form:
$$p = \frac{e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k}}{1 + e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k}} = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k)}}$$
Interactions in logistic regression
Consider survival in an ICU as a function of systolic blood pressure (SysBP, BP for short) and Sex:
> intermodel=glm(Survive~BP*Sex, family=binomial, data=ICU)
> summary(intermodel)
Coefficients:
(Intercept) -1.439304 1.021042 -1.410 0.15865
BP 0.022994 0.008325 2.762 0.00575 **
Sex 1.455166 1.525558 0.954 0.34016
BP:Sex -0.013020 0.011965 -1.088 0.27653
Null deviance: 200.16 on 199 degrees of freedom
Residual deviance: 189.99 on 196 degrees of freedom
[Figure: probability of voting Yes (0 to 1) against auto industry contributions (lifetime, log scale from 0 to 1,000,000); Rep = red, Dem = blue. The lines are very close to parallel: not a significant interaction.]
Forecasting models
Principles of forecasting
Time series analysis
Smoothing and decomposition methods
ARIMA
GARCH
Holt-Winters model
Causal methods
Moving averages
Exponential smoothing
Forecasting
• Predict the next number in the pattern:
a) 3.7, 3.7, 3.7, 3.7, 3.7, ?
b) 2.5, 4.5, 6.5, 8.5, 10.5, ?
c) 5.0, 7.5, 6.0, 4.5, 7.0, 9.5, 8.0, 6.5, ?
Forecasting
• Predict the next number in the pattern:
a) 3.7, 3.7, 3.7, 3.7, 3.7, 3.7
b) 2.5, 4.5, 6.5, 8.5, 10.5, 12.5
c) 5.0, 7.5, 6.0, 4.5, 7.0, 9.5, 8.0, 6.5, 9.0
What Is Forecasting?
• Process of predicting a future event
• Underlying basis of all business decisions:
Production
Inventory
Personnel
Facilities
Why do we need to forecast?
Importance of Forecasting
• Departments throughout the organization depend on forecasts to formulate and execute their plans.
• Finance needs forecasts to project cash flows and capital requirements.
• Human resources needs forecasts to anticipate hiring needs.
• Production needs forecasts to plan production levels, workforce, material requirements, inventories, etc.
• Demand is not the only variable of interest to forecasters.
• Manufacturers also forecast worker absenteeism, machine availability, material costs, transportation and production lead times, etc.
• Besides demand, service providers are also interested in forecasts of population, of other demographic variables, of weather, etc.
Types of forecasts
• Demand forecasts
• Environmental forecasts
• Technological forecasts
Timing of Forecasts
• Short-range forecast
• Medium-range forecast
• Long-range forecast
Quantitative Forecasting Methods
Quantitative forecasting
• Time series models: moving average, exponential smoothing, trend models
• Causal models: regression
What is a Time Series?
• Set of evenly spaced numerical data
  - Obtained by observing the response variable at regular time periods
• Forecast based only on past values
  - Assumes that factors influencing the past, present, and future will continue
• Example
Year: 1995 1996 1997 1998 1999
Sales: 78.7 63.5 89.7 93.2 92.1
Time Series Models
• Forecaster looks for data patterns as
  Data = historic pattern + random variation
• Historic pattern to be forecasted:
  - Level (long-term average): data fluctuate around a constant mean
  - Trend: data exhibit an increasing or decreasing pattern
  - Seasonality: any pattern that regularly repeats itself and is of a constant length
  - Cycle: patterns created by economic fluctuations
• Random variation cannot be predicted
Time Series Patterns
Time Series Components
A time series can be described by models based on the following components:
$T_t$: Trend component
$S_t$: Seasonal component
$C_t$: Cyclical component
$I_t$: Irregular component
Using these components, we can define a time series as the sum of its components, an additive model:
$$X_t = T_t + S_t + C_t + I_t$$
Alternatively, in other circumstances we might define a time series as the product of its components, a multiplicative model (often represented as a logarithmic model, because taking logs turns the product into a sum):
$$X_t = T_t \times S_t \times C_t \times I_t$$
Trend Component
• Persistent, overall upward or downward pattern
• Due to population, technology etc.
• Several years' duration
[Figure: response against time (month, quarter, year) showing a trend]
Trend Component
• Overall upward or downward movement
• Data taken over a period of years
[Figure: sales against time]
Cyclical Component
• Repeating up and down movements
• Due to interactions of factors influencing the economy
• Usually 2-10 years' duration
[Figure: response against time (month, quarter, year) showing a cycle]
Cyclical Component
• Upward or downward swings
• May vary in length
• Usually lasts 2-10 years
[Figure: sales against time]
Seasonal Component
• Regular pattern of up and down fluctuations
• Due to weather, customs etc.
• Occurs within one year
[Figure: response against time (month, quarter), with the summer season marked]
Seasonal Component
• Upward or downward swings
• Regular patterns
• Observed within one year
[Figure: sales against time (monthly or quarterly)]
Irregular Component
• Erratic, unsystematic, 'residual' fluctuations
• Due to random variation or unforeseen events
  - Union strike
  - War
• Short duration and nonrepeating
Moving Average Models
• Simple moving average forecast:
$$F_t = E(Y_t) = \frac{1}{k}\sum_{i=t-k}^{t-1} Y_i$$
• Weighted moving average forecast:
$$F_t = E(Y_t) = \frac{\sum_{i=t-k}^{t-1} w_i Y_i}{\sum_{i=t-k}^{t-1} w_i}$$
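A sketch of both forecasts, applied to the monthly sales data of the weighted-moving-average example in this section (weights 3, 2, 1 for last month, two months ago and three months ago). The function names are our own.

```python
def simple_ma_forecast(series, k):
    """F_t = mean of the k most recent observations Y_{t-k}, ..., Y_{t-1}."""
    window = series[-k:]
    return sum(window) / k


def weighted_ma_forecast(series, weights):
    """F_t = sum(w_i * Y_i) / sum(w_i); weights listed oldest to newest."""
    window = series[-len(weights):]
    return sum(w * y for w, y in zip(weights, window)) / sum(weights)


sales = [10, 12, 13, 16, 19, 23, 26, 30, 28, 18, 16, 14]   # months 1-12

# Forecasts for month 4 use months 1-3 only
sma_m4 = simple_ma_forecast(sales[:3], k=3)
wma_m4 = weighted_ma_forecast(sales[:3], weights=[1, 2, 3])   # 3 = last month
# Forecasts for month 13 use months 10-12
sma_m13 = simple_ma_forecast(sales, k=3)
wma_m13 = weighted_ma_forecast(sales, weights=[1, 2, 3])
print(round(sma_m4, 2), round(wma_m4, 2), round(sma_m13, 2), round(wma_m13, 2))
```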
Selecting the Right Forecasting Model
1. The amount and type of available data
   • Some methods require more data than others
2. Degree of accuracy required
   • Increasing accuracy usually requires more data
3. Length of forecast horizon
   • Different models for 3 months vs. 10 years
4. Presence of data patterns
   • Lagging will occur when a forecasting model meant for a level pattern is applied to data with a trend
Moving Average
[Solution]
Year   Sales    MA(3) in 1,000
1995   20,000   NA
1996   24,000   (20+24+22)/3 = 22
1997   22,000   (24+22+26)/3 = 24
1998   26,000   (22+26+25)/3 = 24.33
1999   25,000   NA
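A centred three-year moving average reproduces the table (sales in thousands). The helper name is our own, and `None` marks years where the window is incomplete (NA in the table).

```python
def centred_moving_average(values, k=3):
    """Centred k-period moving average (k odd); None where the window is incomplete."""
    half = k // 2
    out = []
    for i in range(len(values)):
        if i < half or i + half >= len(values):
            out.append(None)
        else:
            out.append(sum(values[i - half:i + half + 1]) / k)
    return out


years = [1995, 1996, 1997, 1998, 1999]
sales = [20, 24, 22, 26, 25]   # in thousands
ma = centred_moving_average(sales)
for year, value in zip(years, ma):
    print(year, "NA" if value is None else round(value, 2))
```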
Moving Average
Year   Response (Sales)   Moving Ave
1994   2                  NA
1995   5                  3
1996   2                  3
1997   2                  3.67
1998   7                  5
1999   6                  NA
[Figure: sales and moving average plotted by year, 94 to 99]
Thanks
L- 12: Predictive Analytics: Time Series Analysis
Forecasting models
Principles of forecasting
Time series analysis
Smoothing and decomposition methods
Causal methods
Moving averages
Exponential smoothing
AR, MA, ARMA & ARIMA models
Applications
Retail sales
Spare parts planning
Stock trading
Time series components
Trend
Seasonality
Cyclic
Random
Box-Jenkins Methodology
1. Condition the data and select a model
   • Identify and account for any trends or seasonality in the time series
   • Examine the remaining time series and determine a suitable model
2. Estimate the model parameters
3. Assess the model and return to step 1, if necessary
Smoothing Methods
Example (Moving averages)
• Use the following data to compute the three-year moving averages for all available years. Find the trend and the forecast error.
Year   Sales (lakhs)
2008   21
2009   22
2010   23
2011   25
2012   24
2013   22
2014   25
2015   26
2016   27
2017   26
Time Series Models
• Weighted moving average:
• All weights must add to 100% or 1.00,
  e.g. $C_t = 0.5$, $C_{t-1} = 0.3$, $C_{t-2} = 0.2$ (weights add to 1.0)
$$F_{t+1} = C_t A_t + C_{t-1} A_{t-1} + C_{t-2} A_{t-2}$$
• Allows emphasizing one period over others; the weights above put more weight on recent data ($C_t = 0.5$)
• Differs from the simple moving average, which weighs all periods equally; more responsive to trends
Example (Weighted moving averages)
Weight   Month
3        Last month
2        Two months ago
1        Three months ago
Month   1   2   3   4   5   6   7   8   9   10   11   12
Sales   10  12  13  16  19  23  26  30  28  18   16   14
Time Series Models
• Exponential smoothing:
  - Most frequently used time series method because of its ease of use and the minimal amount of data needed
• Needs just three pieces of data to start:
  - Last period's forecast ($F_t$)
  - Last period's actual value ($A_t$)
  - A smoothing coefficient $\alpha$, selected between 0 and 1.0
$$F_{t+1} = \alpha A_t + (1 - \alpha) F_t$$
• If no last-period forecast is available, average the last few periods or use the naive method
• Higher values of $\alpha$ may place too much weight on the last period's random variation
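A minimal sketch of the recursion, starting from a naive first forecast (the first actual value), which is one of the options mentioned above. The demand values are hypothetical.

```python
def exponential_smoothing(actuals, alpha, first_forecast):
    """Return forecasts F_1..F_{n+1} using F_{t+1} = alpha*A_t + (1 - alpha)*F_t."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    forecasts = [first_forecast]
    for a in actuals:
        forecasts.append(alpha * a + (1 - alpha) * forecasts[-1])
    return forecasts


demand = [10, 12, 13]   # hypothetical actual values A_1..A_3
forecasts = exponential_smoothing(demand, alpha=0.5, first_forecast=demand[0])
print(forecasts)        # the last entry is the forecast for the next period
```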
Forecasting Trend
• Basic forecasting models for trends compensate for the lagging that would otherwise occur
• One model, trend-adjusted exponential smoothing, uses a three-step process
• Step 1: Smooth the level of the series
$$S_t = \alpha A_t + (1 - \alpha)(S_{t-1} + T_{t-1})$$
• Step 2: Smooth the trend
$$T_t = \beta(S_t - S_{t-1}) + (1 - \beta) T_{t-1}$$
• Step 3: Forecast including the trend
$$FIT_{t+1} = S_t + T_t$$
Measuring Forecasting Accuracy
• Mean Absolute Deviation (MAD)
  - Measures the total error in a forecast without regard to sign
$$MAD = \frac{\sum \lvert \text{actual} - \text{forecast} \rvert}{n}$$
• Cumulative Forecast Error (CFE)
  - Measures any bias in the forecast
$$CFE = \sum (\text{actual} - \text{forecast})$$
• Mean Square Error (MSE)
  - Penalizes larger errors
$$MSE = \frac{\sum (\text{actual} - \text{forecast})^2}{n}$$
• Tracking Signal (TS)
  - Measures whether your model is working
$$TS = \frac{CFE}{MAD}$$
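The four measures computed together on hypothetical actual and forecast values (the function name is our own):

```python
def forecast_accuracy(actual, forecast):
    errors = [a - f for a, f in zip(actual, forecast)]   # actual - forecast
    n = len(errors)
    mad = sum(abs(e) for e in errors) / n
    cfe = sum(errors)
    mse = sum(e ** 2 for e in errors) / n
    ts = cfe / mad if mad else float("nan")
    return {"MAD": mad, "CFE": cfe, "MSE": mse, "TS": ts}


metrics = forecast_accuracy([10, 12, 14], [11, 11, 11])
print({name: round(value, 3) for name, value in metrics.items()})
```

Here the positive CFE and tracking signal show that the forecasts tend to be too low.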
Models
AR Model
MA Model
ARMA Model
ARIMA Model
AR Model (Autoregressive model)
Moving Average (MA) Model
ARMA Model: ARMA(p, q)
Thanks
L- 13: Time Series Analysis (cont.)
Case
• Testing the impact of nutrition and exercise on 60 candidates between the ages of 18 and 50. They are grouped by strategy, and we need to find the most effective strategy.
• Group 1 eats only junk food
• Group 2 eats only healthy food
• Group 3 eats junk food and does cardio exercise every other day
• Group 4 eats healthy food and does cardio ……………
• Group 5 eats junk food and does both cardio and strength training every other day
• Group 6 eats healthy food…….
ANOVA: analysis of variance
• Extends the test of the significance of the difference between two sample means to three or more samples
ANOVA
• Effectiveness of different promotional activities
• Quality of a product produced by different manufacturers in terms of an attribute
• Yield of a crop due to varieties of seeds, fertilisers and quality of soil
Assumptions
• Each population is normally distributed, and all populations have equal variances
• Each sample is drawn randomly and independently of the other samples
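The one-way ANOVA statistic compares between-group and within-group mean squares, $F = \frac{SSB/(k-1)}{SSW/(n-k)}$, which is then compared with the tabulated $F_\alpha(k-1, n-k)$. A minimal sketch on hypothetical groups (the function name is our own):

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    k, n = len(groups), len(all_values)
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))  # between groups
    ssw = sum(sum((v - m) ** 2 for v in g) for g, m in zip(groups, means))    # within groups
    df_b, df_w = k - 1, n - k
    f = (ssb / df_b) / (ssw / df_w)
    return f, df_b, df_w


groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # hypothetical samples
f, df_b, df_w = one_way_anova_f(groups)
print(f, df_b, df_w)
```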
ANOVA summary
Short cut method
Example
• To test the significance of variation in the retail prices of a commodity in three metro cities, Mumbai, Kolkata and Delhi, four shops are chosen at random and the prices are given below
Short cut method
ANOVA summary
Example
• A study was conducted to investigate the perception of corporate ethical values among individuals specialising in marketing. Using the 0.05 level of significance and the data given below, test for significant differences in perception among three groups (higher scores indicate higher ethical values).
Example
Two way ANOVA
Thanks
L- 14: Applied Multivariate Analytics
Agenda
Multivariate normal distribution
Preliminaries: eigenvalues and eigenvectors
Principal component analysis
Preliminaries
Standard deviation is a measure of the spread of the data.
Variance is a measure of the deviation from the mean for points in one dimension, e.g. heights.
Covariance is a measure of how much two dimensions vary from their means with respect to each other.
Covariance is measured between two dimensions to see if there is a relationship between them, e.g. number of hours studied and marks obtained.
The covariance between one dimension and itself is the variance.
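A minimal sketch of sample covariance (divisor n - 1; some texts divide by n) for hypothetical hours-studied and marks data:

```python
def covariance(a, b):
    """Sample covariance with divisor n - 1."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (n - 1)


hours = [2, 4, 6, 8]        # hypothetical hours studied
marks = [50, 60, 70, 80]    # hypothetical marks obtained
print(covariance(hours, marks))   # positive: both increase together
print(covariance(hours, hours))   # covariance with itself is the variance
```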
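Covariance Matrix
Representing the covariance between dimensions as a matrix, e.g. for three dimensions:
$$C = \begin{pmatrix} \operatorname{cov}(x,x) & \operatorname{cov}(x,y) & \operatorname{cov}(x,z) \\ \operatorname{cov}(y,x) & \operatorname{cov}(y,y) & \operatorname{cov}(y,z) \\ \operatorname{cov}(z,x) & \operatorname{cov}(z,y) & \operatorname{cov}(z,z) \end{pmatrix}$$
• The diagonal holds the variances of x, y and z.
• cov(x,y) = cov(y,x), hence the matrix is symmetric about the diagonal.
• n-dimensional data result in an n × n covariance matrix.
• A positive covariance indicates that both dimensions increase or decrease together.
• A negative covariance indicates that while one increases the other decreases, or vice versa.
• If the covariance is zero, the two dimensions are uncorrelated (no linear relationship); zero covariance does not in general imply that they are independent.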
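Transformation matrices
• Consider:
$$\begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 12 \\ 8 \end{pmatrix} = 4 \begin{pmatrix} 3 \\ 2 \end{pmatrix}$$
• The square transformation matrix transforms (3, 2) from its original location to (12, 8), which is simply 4 × (3, 2). Now take a multiple of (3, 2):
$$2 \begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 6 \\ 4 \end{pmatrix}, \qquad \begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} 6 \\ 4 \end{pmatrix} = \begin{pmatrix} 24 \\ 16 \end{pmatrix} = 4 \begin{pmatrix} 6 \\ 4 \end{pmatrix}$$
Eigenvalue problem
• The eigenvalue problem is any problem having the following form:
  A · x = λ · x
  A: n × n matrix
  x: n × 1 non-zero vector
  λ: scalar
• Any value of λ for which this equation has a non-zero solution x is called an eigenvalue of A, and the vector x corresponding to this value is called an eigenvector of A.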
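Eigenvalue problem
$$\begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 12 \\ 8 \end{pmatrix} = 4 \begin{pmatrix} 3 \\ 2 \end{pmatrix}$$
$$A \cdot v = \lambda \cdot v$$
Therefore, (3, 2) is an eigenvector of the square matrix A and 4 is an eigenvalue of A.
Given a matrix A, how can we calculate the eigenvectors and eigenvalues of A?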
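For a 2 × 2 matrix one standard approach is to solve the characteristic equation $\det(A - \lambda I) = 0$, i.e. $\lambda^2 - \operatorname{tr}(A)\lambda + \det(A) = 0$, and then solve $(A - \lambda I)v = 0$ for each root. A sketch that assumes real eigenvalues and a non-zero upper-right entry; larger matrices would normally be handled by a linear-algebra library.

```python
from math import sqrt


def eig_2x2(a, b, c, d):
    """Eigenpairs of [[a, b], [c, d]] via lambda^2 - (a + d)*lambda + (a*d - b*c) = 0.
    Assumes real roots and b != 0; eigenvectors are not normalised."""
    tr, det = a + d, a * d - b * c
    disc = sqrt(tr ** 2 - 4 * det)
    result = []
    for lam in ((tr + disc) / 2, (tr - disc) / 2):
        # First row of (A - lambda*I) v = 0: (a - lambda)*v1 + b*v2 = 0
        result.append((lam, (b, lam - a)))
    return result


pairs = eig_2x2(2, 3, 2, 1)
for lam, v in pairs:
    print(lam, v)   # lambda = 4 gives v = (3, 2), as in the example
```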
Data Presentation
• Blood and urine measurements (wet chemistry) from 65 people (33 alcoholics, 32 non-alcoholics).
• Matrix format (first rows shown):
      H-WBC   H-RBC   H-Hgb    H-Hct    H-MCV     H-MCH    H-MCHC
A1    8.0000  4.8200  14.1000  41.0000  85.0000   29.0000  34.0000
A2    7.3000  5.0200  14.7000  43.0000  86.0000   29.0000  34.0000
A3    4.3000  4.4800  14.1000  41.0000  91.0000   32.0000  35.0000
A4    7.5000  4.4700  14.9000  45.0000  101.0000  33.0000  33.0000
A5    7.3000  5.5200  15.4000  46.0000  84.0000   28.0000  33.0000
A6    6.9000  4.8600  16.0000  47.0000  97.0000   33.0000  34.0000
A7    7.8000  4.6800  14.7000  43.0000  92.0000   31.0000  34.0000
A8    8.6000  4.8200  15.8000  42.0000  88.0000   33.0000  37.0000
A9    5.1000  4.7100  14.0000  43.0000  92.0000   30.0000  32.0000
[Figures: univariate plots of Value against Measurement and of H-Bands against Person; bivariate plot of C-LDH against C-Triglycerides; trivariate plot of M-EPI against C-LDH and C-Triglycerides]
Applications
• Face recognition
• Image compression
• Gene expression analysis
• Data reduction
• Data classification
• Trend analysis
• Factor analysis
• Noise reduction
Principal Component Analysis
• In real-world data analysis tasks we analyze complex, i.e. multi-dimensional, data. We plot the data to find various patterns in it, or use it to train machine learning models. One way to think about dimensions: suppose you have a data point x. If we consider this data point as a physical object, then dimensions are merely a basis of view, like where the data point is located when it is observed from the horizontal axis or the vertical axis.
• As the number of dimensions of the data increases, it becomes harder to visualize the data and to perform computations on it. So how do we reduce the dimensions of the data?
* Remove the redundant dimensions
* Keep only the most important dimensions
• Now let's think about the requirements of data analysis.
Since we try to find patterns in the data, we want the data to be spread out across each dimension. We also want the dimensions to be independent: if the data have high covariance when represented in some n dimensions, we replace those dimensions with a linear combination of them. The data then depend only on that linear combination of the related n dimensions (related = having high covariance).
• PCA is a linear transformation that chooses a new coordinate system for the data set such that the greatest variance by any projection of the data set comes to lie on the first axis (then called the first principal component), the second greatest variance on the second axis, and so on.
• PCA can be used for reducing dimensionality by eliminating the later principal components.
• What does Principal Component Analysis (PCA) do?
• PCA finds a new set of dimensions (a new basis of views) such that all the dimensions are orthogonal (and hence linearly independent) and ranked according to the variance of the data along them. This means the more important principal axes come first (more important = more variance, more spread-out data).
• How does PCA work?
1. Calculate the covariance matrix C of the (mean-centred) data points.
2. Calculate the eigenvectors and corresponding eigenvalues of C.
3. Sort the eigenvectors by their eigenvalues in decreasing order.
4. Choose the first k eigenvectors; these define the new k dimensions.
5. Transform (project) the original n-dimensional data points onto these k dimensions.
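The five steps can be sketched for two variables in plain Python, using the data of the worked example in this lecture (the covariance uses divisor n, as that example does). The closed-form 2 × 2 eigen-solution and the function name are our own shortcut; for more dimensions a linear-algebra library would normally be used.

```python
from math import sqrt


def pca_2d(x1, x2):
    """Return [(eigenvalue, unit eigenvector)] sorted by decreasing eigenvalue,
    plus the centred data projected onto the first principal component."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    c1 = [v - m1 for v in x1]                 # step 1: centre the data
    c2 = [v - m2 for v in x2]
    s11 = sum(v * v for v in c1) / n          # covariance matrix entries (divisor n)
    s22 = sum(v * v for v in c2) / n
    s12 = sum(u * v for u, v in zip(c1, c2)) / n
    tr, det = s11 + s22, s11 * s22 - s12 ** 2
    disc = sqrt(tr ** 2 - 4 * det)
    comps = []
    for lam in ((tr + disc) / 2, (tr - disc) / 2):   # steps 2-3: already decreasing
        v = (s12, lam - s11)                          # solves (C - lam*I) v = 0; needs s12 != 0
        norm = sqrt(v[0] ** 2 + v[1] ** 2)
        comps.append((lam, (v[0] / norm, v[1] / norm)))
    e = comps[0][1]                                   # step 4: keep k = 1 component
    scores = [e[0] * u + e[1] * w for u, w in zip(c1, c2)]   # step 5: project
    return comps, scores


x1 = [19, 39, 30, 30, 15, 15, 15, 30]
x2 = [63, 74, 87, 23, 35, 43, 32, 73]
comps, scores = pca_2d(x1, x2)
for lam, vec in comps:
    print(round(lam, 1), tuple(round(c, 2) for c in vec))
print([round(s, 2) for s in scores])
```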
Principal Components
• All principal components (PCs) start at the origin of the ordinate axes.
• The first PC is the direction of maximum variance from the origin.
• Subsequent PCs are orthogonal to the first PC and describe the maximum residual variance.
[Figure: data plotted as Wavelength 2 against Wavelength 1 (0 to 30), showing the directions of PC 1 and PC 2]
An Example
Mean1 = 24.1, Mean2 = 53.8 (X1' and X2' are the mean-centred values)
X1   X2   X1'    X2'
19   63   -5.1    9.25
39   74   14.9   20.25
30   87    5.9   33.25
30   23    5.9  -30.75
15   35   -9.1  -18.75
15   43   -9.1  -10.75
15   32   -9.1  -21.75
30   73    5.9   19.25
[Figures: scatterplot of the original data (X1, X2); scatterplot of the mean-centred data (X1', X2')]
Covariance Matrix
$$C = \begin{pmatrix} 75 & 106 \\ 106 & 482 \end{pmatrix}$$
• Computing the eigenvectors and eigenvalues of C (e.g. in MATLAB) gives:
• $e_1 = (-0.97, 0.24)$, $\lambda_1 \approx 49$
• $e_2 = (0.24, 0.97)$, $\lambda_2 \approx 508$
• Thus the second eigenvector is more important!
If we only keep one dimension: $e_2$
• We keep the dimension of $e_2 = (0.24, 0.97)$.
• We can obtain the final data as
$$y_i = \begin{pmatrix} 0.24 & 0.97 \end{pmatrix} \begin{pmatrix} x_{i1} \\ x_{i2} \end{pmatrix} = 0.24\, x_{i1} + 0.97\, x_{i2}$$
where $x_{i1}, x_{i2}$ are the mean-centred values X1', X2'.
[Figure: the projected values $y_i$ plotted along a single axis]
Thanks
L- 15: Applied Multivariate Analytics & Revision
Thanks