
20-11-2018

SS ZG536
ADVANCED STATISTICAL TECHNIQUES
FOR ANALYTICS

BITS Pilani Dr Y V K Ravi Kumar, BITS- Pilani


Pilani|Dubai|Goa|Hyderabad

BITS Pilani
Hyderabad Campus

L- 1: Overview of the course


& Descriptive Statistics


• “When you can measure what you are speaking about and
express it in numbers, you know something about it; but when
you cannot measure it, when you cannot express it in numbers, your
knowledge is of a meagre and unsatisfactory kind.”

• Lord Kelvin

• “Statistical thinking will one day be
as necessary for efficient citizenship
as the ability to read and write.”

• H G Wells


• Lies
• damn lies

• Statistics

Analytics
The term “ Analytics”

Disciplines
• - Statistics
• - Machine Learning
• - Biology
• - Kernel Methods


Importance of Data

• Importance: data are the key ingredient of any analytical exercise.

Replace intuition with data-driven decisions

• For example consider the following cases:

 Medical treatment
 Industry
 Power generation
 Crime detection
 Cognitive assessment

Model requirements

Business relevance

Statistical performance

Interpretable

Justifiability
Operational efficiency

Economic cost

Regulation and legislation


Statistics?
• Procedures for organising, summarizing, and
interpreting information
Standardized techniques used by scientists
Vocabulary & symbols for communicating about data

Two main branches:


• Descriptive statistics
• Inferential statistics

Basic tools / concepts in analysis

Mean
Median

Mode

Range

Variance / Standard deviation


Coefficient of variation

Mean Deviation
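
The measures listed above can be sketched with Python's standard `statistics` module. The sample `scores` and the helper name `describe` are made up for illustration; the mean deviation is taken here as the average absolute deviation from the mean, and the variance is the sample variance (divisor n - 1).

```python
import statistics

def describe(data):
    """Return the basic descriptive measures listed on the slide."""
    mean = statistics.mean(data)
    stdev = statistics.stdev(data)  # sample standard deviation (divisor n - 1)
    return {
        "mean": mean,
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "range": max(data) - min(data),
        "variance": statistics.variance(data),  # sample variance
        "stdev": stdev,
        "cv": stdev / mean,  # coefficient of variation
        "mean_deviation": sum(abs(x - mean) for x in data) / len(data),
    }

scores = [2, 4, 4, 5, 7, 8]  # hypothetical sample
summary = describe(scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```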


Statistical graphs of data

• A picture is worth a thousand words!


 Bar chart / graph
 Histograms
 Pie chart
 Pareto chart / diagram
 Frequency polygons
 Scatter plots
 Time series plot

Bar Graphs

• Useful for showing two samples side-by-side


Histograms
 Univariate histograms
[Figure: histogram of Exam 1 scores; Mean = .80, Std. Dev = .12, N = 13]

Histograms

• f on y axis (could also plot p or % )


• X values (or midpoints of class intervals) on x axis
• Plot each f with a bar, equal size, touching
• No gaps between bars


Bivariate histogram

Graphing the data – Pie charts


Frequency Polygons
 Frequency Polygons
 Depicts information from a frequency table or a
grouped frequency table as a line graph

Frequency Polygon

A smoothed out histogram


Make a point representing f of each value
Connect dots
Anchor line on x axis
Useful for comparing distributions in two samples (in this
case, plot p rather than f )


!!!!
• A famous statistician would never travel by airplane, because she had
studied air travel and estimated the probability of there being a bomb on any
given flight was 1 in a million, and she was not prepared to accept these
odds.
• One day a colleague met her at a conference far from home.
• "How did you get here, by train?"
• "No, I flew"
• "What about the possibility of a bomb?"
• "Well, I began thinking that if the odds of one bomb are 1:million, then the
odds of TWO bombs are (1/1,000,000) x (1/1,000,000) = 10^-12. This is a
very, very small probability, which I can accept. So, now I bring my own
bomb along!"

Random Experiment
• The term "random experiment" describes any action whose
outcome is not known in advance. Here are some examples of
experiments dealing with statistical data:

• Tossing a coin
• Counting how many times a certain word or combination of words
appears in the text of “King Lear” or in a text of Confucius
• Counting occurrences of a certain combination of amino acids in a
protein database
• Pulling a card from the deck


Sample spaces, sample sets and events

• The sample space of a random experiment is a set S
that includes all possible outcomes of the experiment.

• For example, if the experiment is to throw a die and
record the outcome, the sample space is S = {1, 2, 3, 4, 5, 6}.

Discrete sample spaces

Continuous sample spaces


• The set of possible outcomes S describes an event that always occurs.
Each outcome is represented by a sample point in the sample space.
• There is more than one way to view an experiment, so an experiment
may have more than one associated sample space.

• In tossing a die, one sample space is {1, 2, 3, 4, 5, 6}, while two others
are {odd, even} and {less than 3.5, more than 3.5}.

Events

• An event is a set of outcomes of the experiment. This includes the null


(empty) set of outcomes and the set of all outcomes. Each time the
experiment is run, a given event A either occurs, if the outcome of the
experiment is an element of A, or does not occur, if the outcome of the
experiment is not an element of A.


Basic Set Operations

Mutually Exclusive Events

• Two events are mutually exclusive if they can not occur at the
same time. Which are mutually exclusive?
• Draw an Ace and draw a heart from a standard deck of
52 cards
• It is raining and I show up for class
• Dr. Li is an easy teacher and I fail the class
• Dr. Beaubouef is a hard teacher and I ace the class.


Independent & Dependent

• Events are either


independent (the occurrence of one event has no effect on
the probability of occurrence of the other) or
dependent (the occurrence of one event gives information
about the occurrence of the other)

Random experiment
• Consider the random experiment of dropping a Styrofoam cup onto
the floor from a height of four feet. The cup hits the ground and
eventually comes to rest. It could land upside down, right side up, or it
could land on its side. We represent these possible outcomes of the
random experiment by the following.


Probability

Axioms of Probability


Probability of a Union

Mutually Exclusive Events


Three Events

• The sales manager of an e-commerce company says
that 80% of those who visit their website for the first
time do not buy a mobile. If a new customer visits
the website, what is the probability that the customer
buys a mobile?


2
Trouser colour of employees, by role:

                 Blue   Black   Brown   Total
Software prog      35      25      20      80
Project Mgrs        7       8       5      20
Total              42      33      25     100

If an employee is selected at random, what is the
probability that he is a software programmer?

If an employee is selected at random, what
is the probability that he is wearing blue trousers?
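
As a sketch, both answers are marginal probabilities: a row total or a column total divided by the grand total. The dictionary layout and key names below are illustrative choices, not part of the slide.

```python
from fractions import Fraction

# Counts from the table: rows are job roles, columns are trouser colours.
counts = {
    "software_programmer": {"blue": 35, "black": 25, "brown": 20},
    "project_manager": {"blue": 7, "black": 8, "brown": 5},
}

total = sum(sum(row.values()) for row in counts.values())

# P(software programmer) = row total / grand total
p_programmer = Fraction(sum(counts["software_programmer"].values()), total)

# P(blue trousers) = column total / grand total
p_blue = Fraction(sum(row["blue"] for row in counts.values()), total)

print(p_programmer, p_blue)  # 4/5 21/50
```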

3
• A survey conducted by a bank revealed that 40% of the accounts are
savings accounts, 35% are current accounts, and the
balance are loan accounts.

What is the probability that an account taken at random is a loan account?

What is the probability that an account taken at random is NOT a savings
account?
What is the probability that an account taken at random is NOT a current
account?
What is the probability that an account taken at random is a current account
or a loan account?


4
• From hospital data it is found that 45% of the
patients have high B.P. It was also found that
35% of the patients with high B.P. also have
diabetes.

• What is the probability that a patient having high B.P.
is also diabetic?

Conditional Probability

The probability of event B given that event A has


occurred P(B|A) or, the probability of event A
given that event B has occurred P(A|B)


5
                          Actually purchased
Planned to purchase     YES      NO    TOTAL
YES                     200      50      250
NO                      100     650      750
TOTAL                   300     700     1000
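
The slide does not state the question for this table, so the sketch below assumes the usual one: the probability of an actual purchase given what was planned. It restricts attention to one row of the table and divides the joint count by the row total. The function name `conditional` is illustrative.

```python
from fractions import Fraction

# Joint counts keyed by (planned, purchased).
table = {
    ("yes", "yes"): 200, ("yes", "no"): 50,
    ("no", "yes"): 100, ("no", "no"): 650,
}

def conditional(table, purchased, given_planned):
    """P(purchased | planned) = count(planned and purchased) / count(planned)."""
    joint = table[(given_planned, purchased)]
    marginal = sum(n for (planned, _), n in table.items() if planned == given_planned)
    return Fraction(joint, marginal)

p_buy_given_planned = conditional(table, "yes", "yes")       # 200 / 250
p_buy_given_not_planned = conditional(table, "yes", "no")    # 100 / 750
print(p_buy_given_planned, p_buy_given_not_planned)  # 4/5 2/15
```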

Conditional Probability

Definition


Multiplication and Total Probability Rules

Multiplication Rule

Multiplication and Total Probability Rules

Total Probability Rule (two events)


Independence

Definition (two events)

6
• Toss a six-sided die twice. The sample space consists of all
ordered pairs (i, j) of the numbers 1, 2, ..., 6, that is,
S = {(1, 1), (1, 2), ..., (6, 6)}. Let A = {outcomes match}
and B = {sum of outcomes at least 8}.
• Then find P(A), P(B), P(A|B) and P(B|A).
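
One way to check the four probabilities is to enumerate the 36 equally likely outcomes. This is only a sketch; the names `S`, `A`, `B` follow the problem statement and `prob` is a helper introduced here.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))   # all ordered pairs (i, j)
A = {o for o in S if o[0] == o[1]}         # outcomes match
B = {o for o in S if sum(o) >= 8}          # sum of outcomes at least 8

def prob(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event), len(S))

p_a = prob(A)
p_b = prob(B)
p_a_given_b = prob(A & B) / p_b   # P(A|B) = P(A and B) / P(B)
p_b_given_a = prob(A & B) / p_a   # P(B|A) = P(A and B) / P(A)

print(p_a, p_b, p_a_given_b, p_b_given_a)  # 1/6 5/12 1/5 1/2
```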


7
• Three persons A, B and C are competing for the post of CEO of a
company. Their chances of becoming CEO are 0.2, 0.3 and 0.4
respectively.

• Their chances of taking decisions beneficial to employees are
0.50, 0.45 and 0.6 respectively.

• What is the chance of decisions beneficial to employees under
the new CEO?
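
A minimal sketch of the total probability rule applied to the figures exactly as stated. Note that the stated chances 0.2, 0.3 and 0.4 add up to 0.9 rather than 1; the code uses them as given and makes no assumption about the remaining 0.1.

```python
# Chance of each candidate becoming CEO, as stated in the problem.
p_ceo = {"A": 0.2, "B": 0.3, "C": 0.4}

# Chance of employee-beneficial decisions given each candidate is CEO.
p_benefit_given = {"A": 0.50, "B": 0.45, "C": 0.60}

# Total probability rule: sum over candidates of P(benefit | CEO) * P(CEO).
p_benefit = sum(p_ceo[c] * p_benefit_given[c] for c in p_ceo)
print(round(p_benefit, 3))  # 0.475
```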

Bayes’ Theorem

Definition


• $P(B) = P(E_1 \cap B) + P(E_2 \cap B) + \cdots + P(E_n \cap B)$

• For each $i$, $P(E_i \cap B) = P(B \mid E_i)\,P(E_i)$

\[
\begin{aligned}
P(B) &= P(E_1 \cap B) + P(E_2 \cap B) + \cdots + P(E_n \cap B) \\
     &= P(B \mid E_1)P(E_1) + P(B \mid E_2)P(E_2) + \cdots + P(B \mid E_n)P(E_n) \\
     &= \sum_{i=1}^{n} P(B \mid E_i)\,P(E_i)
\end{aligned}
\]

Bayes’ Theorem

Bayes’ Theorem


Applications
Diagnostic tests in medicine

Telecommunication

Customer service

Trouble shooting in engineering processes &


systems

Example 1
• A component is tested for its stipulated quality, but the
test is not infallible. If the component is good, the test gives a
positive indication 70% of the time, i.e. 70% of the time the
test classifies a good item as good. If the component is
defective, the test gives a negative indication 80% of the time,
implying that the component is bad. If the percentage of defective
components in the manufacturing process is 20, then find the
probability that
• the component is good and the test gives a positive indication
• the component is not good and the test gives a negative indication
• the component is good given that the test is positive
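
The three parts can be worked through with the multiplication rule and Bayes' theorem. This is a sketch of the arithmetic only; the variable names are illustrative.

```python
p_good = 0.80              # 20% of components are defective
p_bad = 1 - p_good
p_pos_given_good = 0.70    # test says "good" for a good component
p_neg_given_bad = 0.80     # test says "bad" for a defective component
p_pos_given_bad = 1 - p_neg_given_bad

# Multiplication rule for the two joint probabilities.
p_good_and_pos = p_good * p_pos_given_good
p_bad_and_neg = p_bad * p_neg_given_bad

# Total probability of a positive test, then Bayes' theorem.
p_pos = p_good_and_pos + p_bad * p_pos_given_bad
p_good_given_pos = p_good_and_pos / p_pos

print(round(p_good_and_pos, 4), round(p_bad_and_neg, 4), round(p_good_given_pos, 4))
```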


Example 2
Technicians regularly make repairs when breakdowns
occur on an automated production line. Janak, who
services 20% of the breakdowns, makes an incomplete
repair 1 time in 20. Tarun, who services 60% of the
breakdowns, makes an incomplete repair 1 time in 10.
Gautham, who services 15% of the breakdowns, makes an
incomplete repair 1 time in 10, and Prasad, who services
5% of the breakdowns, makes an incomplete repair 1 time
in 20. For the next problem with the production line
diagnosed as being due to an initial repair that was
incomplete, what is the probability that this initial repair
was made by Janak?

Solution

Let A be the event that the initial repair was incomplete,
B1 that the repair was made by Janak,
B2 that it was made by Tarun,
B3 that it was made by Gautham,
B4 that it was made by Prasad.

\[
\begin{aligned}
P(B_1 \mid A) &= \frac{P(B_1)P(A \mid B_1)}{P(B_1)P(A \mid B_1) + P(B_2)P(A \mid B_2) + P(B_3)P(A \mid B_3) + P(B_4)P(A \mid B_4)} \\
&= \frac{(0.20)(0.05)}{(0.20)(0.05) + (0.60)(0.10) + (0.15)(0.10) + (0.05)(0.05)} \\
&= 0.114
\end{aligned}
\]
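
The same computation can be sketched as a small reusable function. The name `posterior` is introduced here for illustration; it returns the posterior for every technician, not only Janak.

```python
def posterior(priors, likelihoods):
    """Bayes' theorem: P(B_i | A) for every i, given P(B_i) and P(A | B_i)."""
    joint = {k: priors[k] * likelihoods[k] for k in priors}
    evidence = sum(joint.values())   # P(A), by the total probability rule
    return {k: v / evidence for k, v in joint.items()}

# Share of breakdowns serviced, and chance of an incomplete repair.
priors = {"Janak": 0.20, "Tarun": 0.60, "Gautham": 0.15, "Prasad": 0.05}
likelihoods = {"Janak": 1 / 20, "Tarun": 1 / 10, "Gautham": 1 / 10, "Prasad": 1 / 20}

post = posterior(priors, likelihoods)
print(round(post["Janak"], 3))  # 0.114
```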

Random Variables

We now introduce a new term


Instead of saying that the possible outcomes are 1,2,3,4,5
or 6, we say that random variable X can take values
{1,2,3,4,5,6}.
A random variable is an expression whose value is
the outcome of a particular experiment.
The random variables can be either discrete or continuous.
It’s a convention to use the upper case letters (X,Y) for
the names of the random variables and the lower case
letters (x,y) for their possible particular values.


Random Variables
Definition

Random Variables

Definition


Random Variables

Examples of Random Variables

The Probability Function for discrete random variables

We assigned a probability 1/6 to each face of the dice. In the


same manner, we should assign a probability 1/2 to the sides
of a coin.

What we did could be described as distributing the values of


probability between different elementary events:
P(X=xk)=p(xk),k=1,2,…

It is convenient to introduce the probability function p(x) :


P(X=x)=p(x)


Continuous distribution and the probability density function

A random variable X is said to have a continuous distribution
with density function f(x) if for all a ≤ b we have

\[ P(a \le X \le b) = \int_a^b f(x)\,dx \tag{1.15} \]

\[ \int_{-\infty}^{\infty} f(x)\,dx = 1 \tag{1.16} \]

\[ P(E) = \int_E f(x)\,dx \tag{1.17} \]

Examples:

1. The uniform distribution on (a,b):
We are picking a value at random from (a,b).

\[ f(x) = \begin{cases} \dfrac{1}{b-a}, & a < x < b \\ 0, & \text{otherwise} \end{cases} \tag{1.18} \]


2. The exponential distribution

\[ f(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0 \\ 0, & \text{otherwise} \end{cases} \tag{1.19} \]

Expected Value

\[ E(X) = \sum_{i=1}^{n} X_i\,P(X_i) \]


Variance

\[ \sigma^2 = \sum_{i=1}^{n} \left[X_i - E(X)\right]^2 P(X_i) \]

1
• Toss a coin 3 times. The sample space is
• S = {HHH; HTH; THH; TTH; HHT; HTT; THT; TTT}

• Mean

• Variance
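
The slide does not name the random variable, so this sketch assumes X is the number of heads in the three tosses of a fair coin. It builds the probability function from the sample space and applies the expected value and variance formulas.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=3))   # the 8 equally likely outcomes

# Probability function of X = number of heads (assumed random variable).
pmf = {}
for outcome in outcomes:
    x = outcome.count("H")
    pmf[x] = pmf.get(x, 0) + Fraction(1, len(outcomes))

mean = sum(x * p for x, p in pmf.items())                     # E(X)
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())   # sum of (x - E(X))^2 p(x)

print(mean, variance)  # 3/2 3/4
```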


Binomial Distribution
n = number of trials, x = number of successes, p = probability of success,
q = 1 − p = probability of failure

Probability of x successes in n trials:

\[ P(x) = \frac{n!}{x!\,(n-x)!}\, p^{x} q^{\,n-x} \]

\[ \mu = np, \qquad \sigma = \sqrt{np(1-p)} \]

• Roll 12 dice simultaneously, and let X denote the number of
6’s that appear.

• Then find P(7 ≤ X ≤ 9).
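
A short sketch of this calculation, assuming fair dice so that X is binomial with n = 12 and p = 1/6. The helper `binom_pmf` is written out here from the binomial formula rather than taken from a library.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 12, 1 / 6
# P(7 <= X <= 9) = P(X = 7) + P(X = 8) + P(X = 9)
prob_7_to_9 = sum(binom_pmf(x, n, p) for x in range(7, 10))
print(f"{prob_7_to_9:.6f}")
```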


3
• A recent national study showed that approximately 44.7% of college
students have used Wikipedia as a source in at least one of their term
papers.
• Let X equal the number of students in a random sample of size n = 31
who have used Wikipedia as a source.

• How is X distributed?
• Find the probability that X is equal to 17.
• Find the probability that X is at most 13.
• Find the probability that X is between 16 and 19, inclusive.
• Find mean and variance
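
Assuming the sampled students respond independently, X is binomial with n = 31 and p = 0.447. The sketch below evaluates each part; "between 16 and 19, inclusive" is read as P(16 ≤ X ≤ 19).

```python
from math import comb

n, p = 31, 0.447

def pmf(x):
    """Binomial probability P(X = x) with n = 31 and p = 0.447."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

p_eq_17 = pmf(17)
p_at_most_13 = sum(pmf(x) for x in range(0, 14))
p_16_to_19 = sum(pmf(x) for x in range(16, 20))

mean = n * p                # np
variance = n * p * (1 - p)  # np(1 - p)

print(f"P(X = 17) = {p_eq_17:.4f}")
print(f"P(X <= 13) = {p_at_most_13:.4f}")
print(f"P(16 <= X <= 19) = {p_16_to_19:.4f}")
print(f"mean = {mean:.3f}, variance = {variance:.4f}")
```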

The Poisson Distribution

\[ P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!} \]

Expected value = λ
Variance = λ


Problem
• On the average, five cars arrive at a particular car wash
every hour. Let X count the number of cars that arrive
from 10AM to 11AM. (mean = 5)

• What is the probability that no car arrives during this


period?

Problem

• Suppose the car wash is in operation from 8AM to 6PM, and we let Y
be the number of customers that appear in this period. Since this
period covers a total of 10 hours, λ = 5 × 10 = 50.

• What is the probability that there are between 48 and 50 customers,
inclusive?
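
Both car wash problems can be sketched directly from the Poisson formula. The helper name `poisson_pmf` is introduced here for illustration.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with mean lam."""
    return lam**k * exp(-lam) / factorial(k)

# One hour, mean 5: probability that no car arrives.
p_no_car = poisson_pmf(0, 5)

# Ten hours, mean 50: probability of between 48 and 50 customers, inclusive.
p_48_to_50 = sum(poisson_pmf(k, 50) for k in range(48, 51))

print(f"P(X = 0) = {p_no_car:.6f}")
print(f"P(48 <= Y <= 50) = {p_48_to_50:.4f}")
```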


Normal Distribution
Probability density function - f(X)

[Figure: normal density curve; x-axis from 5 to 5.4]

\[ f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X-\mu}{\sigma}\right)^2} \]

Three Common Areas Under the Curve

Three Normal
distributions with
different areas


Standard Normal Distribution

μ = 100
σ = 15

\[ Z = \frac{x - \mu}{\sigma} \]

[Figure: normal curve with two x-axis scales]
x:  55   70   85   100   115   130   145
Z:  -3   -2   -1     0     1     2     3

Thanks


L- 2: Descriptive Statistics


Today…..

 Recall the past for a while_ Simple tools

 Visualization of data

 Basics of probability

 Discussion & Problems on probability

 Conditional probability

Visualization
 Summary gives an idea about the data
• summary(income)
• Min 1st QU. Median Mean 3rd Qu Max
• - 7.8 12.5 32.0 52.03 67.2 585
Visualization – why


Data Visualisation

 Line chart

 Bar chart

 Histogram

 Pie chart

 Scatter plot

 Box plot

Line Chart


Bar Chart

Histograms


Histograms

Histograms


Pie charts

Scatter Plot


Box plot

To conclude _ Visualization
Visualization gives a sense of data distribution and relationship
among variables

Visualization is an iterative process and helps answer questions about


the data. Time spent is not wasted during the modelling process and
helps to find the optimal model to fit the data


•Sample Space
Discrete sample spaces.
Continuous sample spaces

Event

Independent events
Dependent events


Probability

Axioms of Probability


Thanks


L- 3: Descriptive Statistics


Today…..

 Recall the past for a while_ Simple tools

 Visualization of data

 Basics of probability

 Discussion & Problems on probability

 Conditional probability

 Box plot





Thanks


L- 4: Descriptive Statistics

Today…..

Recall the past for a while_ Conditional probability and Bayes’ theorem & some examples

Random variables

Probability distribution
Examples


Conditional Probability
and Bayes’ theorem


Thanks


SSTCS ZG536
ADVANCED STATISTICAL TECHNIQUES
FOR ANALYTICS

BITS Pilani Dr Y V K Ravi Kumar, BITS- Pilani


Pilani|Dubai|Goa|Hyderabad

BITS Pilani
Hyderabad Campus

L- 5: Descriptive and inferential statistics


Agenda

Quick Review of the topics covered in


previous class
Normal Distribution
Sampling
Testing of Hypothesis




Note

Since the normal density cannot be integrated in closed form
between every pair of limits a and b, probabilities
relating to normal distributions are usually obtained
from special tables (see tables)

Find the probability that a random variable having the standard
normal distribution will take on a value

1) to the left of z = -1.78

2) to the right of z = -1.45

3) corresponding to -0.80 ≤ z ≤ 1.53

4) to the left of z = -2.52 and to the
right of z = 1.83
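
Instead of a printed table, the four areas can be sketched with `statistics.NormalDist` from the Python standard library. Part 4 is read here as the combined area of the two tails; that reading is an assumption.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

left_of = Z.cdf(-1.78)                           # 1) area to the left of -1.78
right_of = 1 - Z.cdf(-1.45)                      # 2) area to the right of -1.45
between = Z.cdf(1.53) - Z.cdf(-0.80)             # 3) area between -0.80 and 1.53
two_tails = Z.cdf(-2.52) + (1 - Z.cdf(1.83))     # 4) both tails combined

for label, value in [("1)", left_of), ("2)", right_of), ("3)", between), ("4)", two_tails)]:
    print(label, round(value, 4))
```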


Calculation of probabilities using a normal distribution

Problem

The mean and standard deviation of a
normal variate are 8 and 4 respectively.

Find 1) P[5 ≤ X ≤ 10]
2) P[X ≥ 5]

Solution

1) μ = 8, σ = 4

We know that

\[ Z = \frac{X - \mu}{\sigma} = \frac{X - 8}{4} \]

When X = 5, Z = (5 − 8)/4 = −0.75

When X = 10, Z = (10 − 8)/4 = 0.5

P[5 ≤ X ≤ 10] = P[−0.75 ≤ Z ≤ 0.5]
= F(0.5) − F(−0.75)
= 0.6915 − 0.2266 = 0.4649
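
The table lookup above can be cross-checked with `statistics.NormalDist`. This sketch works with X directly and also shows the standardized limits.

```python
from statistics import NormalDist

X = NormalDist(mu=8, sigma=4)

# Standardized limits, Z = (X - mu) / sigma
z_low = (5 - X.mean) / X.stdev     # -0.75
z_high = (10 - X.mean) / X.stdev   # 0.5

# P(5 <= X <= 10) = F(10) - F(5)
p_between = X.cdf(10) - X.cdf(5)
print(z_low, z_high, round(p_between, 4))
```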

Inferential Statistics
 Sampling

Sample

Random sampling

Central Limit theorem


Statistical Inferences

• Theory of statistical inference is divided into


two major areas

 Estimation

 Tests of hypothesis

Hypothesis Testing

•Goal:
•Make statement(s) regarding unknown
population parameter values based on
sample data


Hypothesis Testing

 Is also called significance testing


 Tests a claim about a parameter using
evidence (data in a sample)

Example
• Drug company has new drug, wishes to compare it with
current standard treatment
• Federal regulators tell company that they must
demonstrate that new drug is better than current
treatment to receive approval
• Firm runs clinical trial where some patients receive new
drug, and others receive standard treatment
• Numeric response of therapeutic effect is obtained
(higher scores are better).
• Parameter of interest: $\mu_{New} - \mu_{Std}$


Hypothesis Testing Steps

Null and alternative hypotheses
Test statistic
P-value and interpretation
Significance level (optional)

Example

•Null hypothesis H0: μ = 170


•The alternative hypothesis can be
either H1: μ > 170 (one-sided test)
• or
H1: μ ≠ 170 (two-sided test)


Test Statistic

Use this statistic to test the problem:

\[ z_{\text{stat}} = \frac{\bar{x} - \mu_0}{SE_{\bar{x}}} \]

where $\mu_0$ = population mean assuming $H_0$ is true

and $SE_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$

Example
A. Hypotheses:
H0: µ = 100 versus
Ha: µ > 100 (one-sided)
Ha: µ ≠ 100 (two-sided)
B. Test statistic:

\[ SE_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{9}} = 5 \]

\[ z_{\text{stat}} = \frac{\bar{x} - \mu_0}{SE_{\bar{x}}} = \frac{112.8 - 100}{5} = 2.56 \]

C. P-value: P = Pr(Z ≥ 2.56) = 0.0052

P = .0052 ⇒ it is unlikely the sample came from this null
distribution ⇒ strong evidence against H0
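
Steps B and C can be sketched as a small function. The name `one_sample_z` is illustrative; it returns the test statistic with the one-sided (Ha: µ > µ0) and two-sided P-values.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(xbar, mu0, sigma, n):
    """z statistic and P-values for a one-sample z test with known sigma."""
    se = sigma / sqrt(n)                   # standard error of the mean
    z = (xbar - mu0) / se
    p_one_sided = 1 - NormalDist().cdf(z)              # Ha: mu > mu0
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))   # Ha: mu != mu0
    return z, p_one_sided, p_two_sided

z, p_one, p_two = one_sample_z(xbar=112.8, mu0=100, sigma=15, n=9)
print(round(z, 2), round(p_one, 4), round(p_two, 4))
```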

Hypothesis Testing

                        Test Result
True State      H0 True             H0 False
H0 True         Correct Decision    Type I Error
H0 False        Type II Error       Correct Decision

α = P(Type I Error)    β = P(Type II Error)

• Goal: Keep α, β reasonably small


Problem

• It is claimed that a random sample of 49 tyres has
a mean life of 15200 km. This sample was
drawn from a population whose mean is 15150
km with a standard deviation of 1200 km. Test
the significance at the 0.05 level.

Solution:

1. Null hypothesis H0: μ = 15150

2. Alternate hypothesis H1: μ ≠ 15150

3. Level of significance α = 0.05

4. Critical region: This is a two-tailed test (large sample), so reject H0 if
Zcal < −Z_{α/2} or Zcal > Z_{α/2}.

Here α = 0.05, so α/2 = 0.05/2 = 0.025.

From the table we get Z_{α/2} = 1.96,

i.e. if Zcal < −1.96 or Zcal > 1.96 we reject the null hypothesis.

5. Computation:

Test statistic

\[ Z_{cal} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{15200 - 15150}{1200/\sqrt{49}} = 0.2917 \]

6. Decision:

Since |Zcal| = 0.2917 < 1.96, we do not reject (accept) the null hypothesis.
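
A sketch of the same two-tailed test, taking the sample mean as 15200 and the population mean under H0 as 15150 (the values used in the computation step). The critical value comes from `NormalDist.inv_cdf` instead of a table.

```python
from math import sqrt
from statistics import NormalDist

xbar, mu0, sigma, n, alpha = 15200, 15150, 1200, 49, 0.05

z_cal = (xbar - mu0) / (sigma / sqrt(n))       # test statistic
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value, about 1.96
reject_h0 = abs(z_cal) > z_crit

print(round(z_cal, 4), round(z_crit, 2), reject_h0)
```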

Problem

• A trucking firm is suspicious of the claim that the average
lifetime of certain tyres is at least 28,000 miles. To check the claim, the
firm puts 40 of these tyres on its trucks and gets a mean life of
27,463 miles with a standard deviation of 1,348 miles. What can it
conclude if the probability of Type I error is to be at most 0.01?



Solution

1. Null hypothesis: H0: μ ≥ 28,000 miles

2. Alternate hypothesis: H1: μ < 28,000 miles

3. Level of significance: α = 0.01

4. Critical region

This is a left-tailed test (large sample).

If Zcal < −Z_α = −Z_{0.01} = −2.33 we reject the null hypothesis.

5. Computation

Test statistic

\[ Z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{27{,}463 - 28{,}000}{1{,}348/\sqrt{40}} = -2.52 \]

6. Conclusion

Since Zcal = −2.52 < −2.33, we reject the null hypothesis at level of
significance 0.01. In other words, the trucking firm’s suspicion that
μ < 28,000 miles is confirmed.

Hypothesis concerning one mean (small sample)

Procedure

1. Null hypothesis H0: μ = μ0

2. Alternate hypothesis H1: μ ≠ μ0 (two-tailed test)
or H1: μ > μ0 (right-tailed test)
or H1: μ < μ0 (left-tailed test)

3. Level of significance: α

4. Critical region

For the two-tailed test H1: μ ≠ μ0,
reject H0 if t < −t_{α/2} or t > t_{α/2} with (n−1) degrees of freedom.

For the right-tailed test H1: μ > μ0,
reject H0 if t > t_α with (n−1) degrees of freedom.

For the left-tailed test H1: μ < μ0,
reject H0 if t < −t_α with (n−1) degrees of freedom.

5. Test statistic

\[ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \quad \text{with } (n-1) \text{ degrees of freedom} \]

6. Calculation

7. Decision


A random sample of 6 steel beams has a mean
compressive strength of 58,392 p.s.i. (pounds per square
inch) with a standard deviation of 648 p.s.i. Use this
information at the level of significance α = 0.05 to test
whether the true average compressive strength of the steel
from which the sample came is 58,000 p.s.i.
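
The slide leaves this problem unsolved; the sketch below follows the small-sample procedure, assuming a two-tailed alternative (H1: μ ≠ 58,000). Python's standard library has no t distribution, so the critical value 2.571 (t at 0.025 with 5 degrees of freedom) is entered from a standard t table.

```python
from math import sqrt

xbar, mu0, s, n = 58392, 58000, 648, 6

# Critical value t_{0.025} with n - 1 = 5 degrees of freedom, from a t table.
T_CRIT = 2.571

t_cal = (xbar - mu0) / (s / sqrt(n))   # test statistic
reject_h0 = abs(t_cal) > T_CRIT

print(round(t_cal, 2), reject_h0)  # 1.48 False
```

Under these assumptions |t| is below the critical value, so the data do not contradict a true mean of 58,000 p.s.i. at the 0.05 level.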

Thanks


L- 6: Inferential statistics


Agenda

Quick review of the topics covered in the previous class
Testing of hypothesis

Statistical Inferences

• The theory of statistical inference is divided into two major areas:
  • Estimation
  • Tests of hypothesis

Hypothesis Testing

•Goal:
•Make statement(s) regarding unknown
population parameter values based on
sample data

• Is also called significance testing
• Tests a claim about a parameter using evidence (data in a sample)

Example
• Drug company has new drug, wishes to compare it with
current standard treatment
• Federal regulators tell company that they must
demonstrate that new drug is better than current
treatment to receive approval
• Firm runs clinical trial where some patients receive new
drug, and others receive standard treatment
• Numeric response of therapeutic effect is obtained
(higher scores are better).
• Parameter of interest: μ_New - μ_Std

Hypothesis Testing Steps

Null and alternative hypotheses
Test statistic
P-value and interpretation
Significance level (optional)

Example

• Null hypothesis H0: μ = 170
• The alternative hypothesis can be either H1: μ > 170 (one-sided test) or H1: μ ≠ 170 (two-sided test)

Test Statistic

Use this statistic to test the problem:

$$z_{\text{stat}} = \frac{\bar{x} - \mu_0}{SE_{\bar{x}}}$$

where $\mu_0$ is the population mean assuming H0 is true and $SE_{\bar{x}} = \sigma/\sqrt{n}$.

Example
A. Hypotheses:
H0: µ = 100 versus
Ha: µ > 100 (one-sided)
Ha: µ ≠ 100 (two-sided)
B. Test statistic:

$$SE_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{9}} = 5$$

$$z_{\text{stat}} = \frac{\bar{x} - \mu_0}{SE_{\bar{x}}} = \frac{112.8 - 100}{5} = 2.56$$

C. P-value: P = Pr(Z ≥ 2.56) = 0.0052

P = 0.0052 → it is unlikely the sample came from this null distribution → strong evidence against H0
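
The arithmetic in steps B and C can be checked with a few lines of Python. This is a minimal sketch using the standard library's `statistics.NormalDist`; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Values from the example: sigma = 15, n = 9, sample mean 112.8, H0: mu = 100
sigma, n = 15.0, 9
x_bar, mu_0 = 112.8, 100.0

se = sigma / math.sqrt(n)          # standard error of the mean
z_stat = (x_bar - mu_0) / se       # test statistic

# One-sided P-value Pr(Z >= z_stat); doubling it gives the two-sided P-value
p_one_sided = 1.0 - NormalDist().cdf(z_stat)
p_two_sided = 2.0 * p_one_sided

print(f"SE = {se:.1f}, z = {z_stat:.2f}, one-sided P = {p_one_sided:.4f}")
```

For the two-sided alternative Ha: µ ≠ 100 the P-value is twice the one-sided value.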

Hypothesis Testing

                         Test result: H0 true    Test result: H0 false
True state: H0 true      Correct decision        Type I error
True state: H0 false     Type II error           Correct decision

α = P(Type I error), β = P(Type II error)

• Goal: keep α and β reasonably small

Problem

• It is claimed that a random sample of 49 tyres has a mean life of 15,200 km. This sample was drawn from a population whose mean is 15,150 km with a standard deviation of 1,200 km. Test the significance at the 0.05 level.

Solution:

1. Null hypothesis H0: μ = 15,150

2. Alternative hypothesis H1: μ ≠ 15,150

3. Level of significance: α = 0.05

4. Critical region: this is a two-tailed test (large sample), so reject H0 if $Z < -z_{\alpha/2}$ or $Z > z_{\alpha/2}$.

Here α = 0.05, so α/2 = 0.025, and from the table $z_{\alpha/2} = 1.96$; that is, we reject the null hypothesis if Z < -1.96 or Z > 1.96.

5. Computation:

Test statistic

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{15{,}200 - 15{,}150}{1{,}200/\sqrt{49}} = 0.2917$$

6. Decision:

Since |Z| = 0.2917 < 1.96, we do not reject the null hypothesis.
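
The large-sample procedure can be sketched as one small function that handles all three kinds of alternative. The function name `z_test` and its argument names are hypothetical; critical values come from `statistics.NormalDist.inv_cdf` instead of a printed table, so they differ slightly in the last digits (for example 2.326 rather than 2.33).

```python
import math
from statistics import NormalDist

def z_test(x_bar, mu_0, sd, n, alpha, tail):
    """Large-sample test of one mean.

    tail is "two", "left" or "right"; returns (z, reject_h0).
    """
    z = (x_bar - mu_0) / (sd / math.sqrt(n))
    std_normal = NormalDist()
    if tail == "two":
        reject = abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    elif tail == "left":
        reject = z < std_normal.inv_cdf(alpha)
    elif tail == "right":
        reject = z > std_normal.inv_cdf(1 - alpha)
    else:
        raise ValueError("tail must be 'two', 'left' or 'right'")
    return z, reject

# Tyre-life problem: sample mean 15,200, population mean 15,150, sd 1,200, n = 49
z_tyres, reject_tyres = z_test(15_200, 15_150, 1_200, 49, 0.05, "two")

# Trucking-firm problem: H0: mu >= 28,000 against H1: mu < 28,000 at alpha = 0.01
z_truck, reject_truck = z_test(27_463, 28_000, 1_348, 40, 0.01, "left")

print(f"tyres: z = {z_tyres:.4f}, reject H0: {reject_tyres}")
print(f"trucking: z = {z_truck:.2f}, reject H0: {reject_truck}")
```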


L-7: Inferential statistics & Predictive Analytics

Agenda
Central limit theorem
Type I, Type II Errors
Testing of Hypothesis – continuation from
previous session
Covariance
Correlation
Introduction to regression

125
20-11-2018

Central Limit Theorem


L- 8: Predictive Analytics

Agenda

Covariance
Correlation
Introduction to regression
Method of least squares
Simple linear regression


Regression


L- 9: Predictive Analytics & Revision

Agenda

Review of last session


Introduction to regression
Method of least squares
Simple linear regression


Regression


L-11: Predictive Analytics (Continued) & Forecasting Models

Agenda

Model validation
Ridge and lasso models
Assumptions of Linear regression
Logistic regression

184
20-11-2018

Classical Linear Regression (OLS)

• Explanatory and response variables are numeric
• The relationship between the mean of the response variable and the level of the explanatory variable is assumed to be approximately linear (a straight line)
• Model:

$$Y = \beta_0 + \beta_1 x + \varepsilon, \qquad \varepsilon \sim N(0, \sigma)$$

• β1 > 0 → positive association
• β1 < 0 → negative association
• β1 = 0 → no association
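
As an illustration of how the coefficients of this model are estimated by least squares, the sketch below uses the closed-form formulas slope = S_xy / S_xx and intercept = ȳ - slope · x̄. The function name `fit_line` and the five data points are hypothetical and not taken from the course material.

```python
def fit_line(x, y):
    """Return (b0, b1), the least squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_mean - b1 * x_mean
    return b0, b1

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

b0, b1 = fit_line(x, y)
print(f"fitted line: y = {b0:.2f} + {b1:.2f} x")

# A positive slope corresponds to the "positive association" case above
prediction_at_6 = b0 + b1 * 6
```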

Multiple regression

Numeric response variable (y)

p numeric predictor variables

Model:

$$Y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \varepsilon$$

• Population model for the mean response:

$$E(Y \mid x_1, \ldots, x_p) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$$

• Least squares fitted (predicted) equation, minimizing SSE:

$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_p x_p, \qquad SSE = \sum \left(Y - \hat{Y}\right)^2$$

Accuracy of a model
• The strength of the linear model can be assessed using the following:

1) Coefficient of determination

2) Residual standard error

R – Squared vs Adjusted R - Squared

In multiple regression, adjusted R-squared is a better metric than R-squared for assessing the goodness of fit of the model.

R-squared never decreases when additional variables are added to the model, even if they are not related to the dependent variable.
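
The usual formulas behind these measures can be sketched as follows: R² = 1 - SSE/SST, adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), and residual standard error = sqrt(SSE/(n - p - 1)), where n is the number of observations and p the number of predictors. The function names and the numbers used in the calls are illustrative assumptions.

```python
import math

def r_squared(y, y_hat):
    """Coefficient of determination, 1 - SSE / SST."""
    y_mean = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    sst = sum((yi - y_mean) ** 2 for yi in y)
    return 1 - sse / sst

def adjusted_r_squared(r2, n, p):
    """Adjusted R-squared for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def residual_standard_error(y, y_hat, p):
    """sqrt(SSE / (n - p - 1))."""
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return math.sqrt(sse / (len(y) - p - 1))

# Same R-squared (hypothetical 0.80, n = 30), different numbers of predictors:
adj_with_2 = adjusted_r_squared(0.80, 30, 2)
adj_with_10 = adjusted_r_squared(0.80, 30, 10)
print(f"adjusted R2 with 2 predictors: {adj_with_2:.3f}, with 10: {adj_with_10:.3f}")
```

The adjusted value drops as predictors are added without a matching gain in R², which is the penalty the slide refers to.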

Regularization
Overfitting can be reduced with regularization

Regularization can be done by putting constraints on the coefficients and variables.

LASSO: Least Absolute Shrinkage and Selection Operator

• Some coefficients can be dropped (i.e. become zero)

RIDGE: The coefficients will approach zero, but are never dropped

Lasso & Ridge

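$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_p x_p$$

• OLS estimation:

$$\min \; SSE = \sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2$$

• LASSO estimation:

$$\min \; \sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2 + \lambda \sum_{j=1}^{p} \left|\hat{\beta}_j\right|$$

• Ridge regression estimation:

$$\min \; \sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2 + \lambda \sum_{j=1}^{p} \hat{\beta}_j^2$$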
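
To see why lasso can drop a coefficient while ridge only shrinks it, consider the simplest special case: one centred predictor and no intercept. Minimising the penalised criteria above then has closed forms, ridge slope = S_xy / (S_xx + λ) and lasso slope = sign(S_xy) · max(|S_xy| - λ/2, 0) / S_xx. This single-predictor setting, the function names and the data are assumptions made for illustration; real problems with several predictors are solved numerically.

```python
import math

def _sums(x, y):
    """Return (S_xy, S_xx) for data assumed to be centred already."""
    return sum(a * b for a, b in zip(x, y)), sum(a * a for a in x)

def ols_slope(x, y):
    s_xy, s_xx = _sums(x, y)
    return s_xy / s_xx

def ridge_slope(x, y, lam):
    s_xy, s_xx = _sums(x, y)
    return s_xy / (s_xx + lam)

def lasso_slope(x, y, lam):
    s_xy, s_xx = _sums(x, y)
    shrunk = max(abs(s_xy) - lam / 2, 0.0)   # soft-thresholding
    return math.copysign(shrunk, s_xy) / s_xx

# Hypothetical centred data
x = [-2, -1, 0, 1, 2]
y = [-1.9, -1.2, 0.1, 0.8, 2.2]

for lam in (0, 10, 25):
    print(lam, round(ridge_slope(x, y, lam), 4), round(lasso_slope(x, y, lam), 4))
```

With λ = 25 the lasso slope is exactly zero while the ridge slope is small but still positive.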

Assumptions in Regression Analysis

Assumptions

• The distribution of residuals is normal (at each value of the dependent variable).
• The variance of the residuals for every set of values for the independent variable is equal.
  • Violation is called heteroscedasticity.
• The error term is additive.
  • No interactions.
• At every value of the dependent variable the expected (mean) value of the residuals is zero.
  • No non-linear relationships.
• The expected correlation between residuals, for any two cases, is 0.
  • The independence assumption (lack of autocorrelation).
• All independent variables are uncorrelated with the error term.
• No independent variable is a perfect linear function of other independent variables (no perfect multicollinearity).
• The mean of the error term is zero.

Assumption 1: The Distribution of Residuals is Normal at Every Value of the Dependent Variable

[Figure: frequency histograms for FEMALE and MALE, horizontal axis from 60 to 140]

Non-Normality
• Skew and kurtosis
  • Skew: much easier to deal with
  • Kurtosis: less serious anyway
• Transform data
  • Removes skew
  • Positive skew: log transform
  • Negative skew: square

Assumption 2: The variance of the residuals for every set of values for the independent variable is equal.

Heteroscedasticity

• This assumption is about heteroscedasticity of the residuals
• Hetero = different
• Scedastic = scattered
• We don't want heteroscedasticity
  • We want our data to be homoscedastic
• Draw a scatterplot to investigate

[Figure: scatterplot of MALE against FEMALE, both axes from 40 to 160]

Good: no heteroscedasticity

[Figure: residuals plotted against predicted values]

Bad: heteroscedasticity

[Figure: residuals plotted against predicted values]

Assumption 3:
The Error Term is Additive

Assumption 4: At every value of the dependent variable the expected (mean) value of the residuals is zero

Assumption 5: The expected correlation between residuals, for any two cases, is 0.

• Result, with line of best fit

[Figure: scatterplot of Grade against Time with the line of best fit]

• Now somewhat different

[Figure: scatterplot of Grade against Time, points marked by Question (1, 2, 3)]

Assumption 6: All independent variables are uncorrelated with the error term.

Assumption 7: No independent variables are a perfect linear function of other independent variables

Assumption 8: The mean of the error term is zero.

Multicollinearity
• Correlation matrix

• VIF (Variance Inflation Factor)

• A better way to assess multicollinearity is to compute the VIF

• If VIF = 1, the variables are not correlated

• If 1 < VIF < 5, the variables are moderately correlated

• If VIF > 5, the variables are highly correlated and need to be eliminated from the model
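
The VIF of predictor j is 1 / (1 - R_j²), where R_j² comes from regressing that predictor on all the other predictors. With exactly two predictors R_j² is just the squared correlation between them, which is the case sketched below. The function names and data are hypothetical.

```python
import math

def pearson_r(a, b):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    s_aa = sum((x - mean_a) ** 2 for x in a)
    s_bb = sum((y - mean_b) ** 2 for y in b)
    return s_ab / math.sqrt(s_aa * s_bb)

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when the model has only these two."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)

# Hypothetical predictor values
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]

r = pearson_r(x1, x2)
vif = vif_two_predictors(x1, x2)
print(f"r = {r:.2f}, VIF = {vif:.2f}")
```

Here the VIF lies between 1 and 5, the "moderately correlated" band in the rule of thumb above.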

199
20-11-2018

Logistic Regression

Why use logistic regression?

There are many important research topics for which the dependent variable is "limited."

For example: voting, morbidity or mortality, and participation data are not continuous or normally distributed.

Logistic regression is a type of regression analysis where the dependent variable is a dummy variable: coded 0 (did not vote) or 1 (did vote)

Logistic Regression

• Logistic regression is a supervised classification model.
• It allows us to make predictions from labelled data if the target variable is categorical.
• Binary classification
• Examples
  1. A customer will default on a loan or not
  2. A particular machine will break down in the next month or not
  3. Predicting whether an incoming email is spam or not

Categorical Response Variables

Examples:

Binary response
• Whether or not a person smokes: Y = Non-smoker or Smoker
• Success of a medical treatment: Y = Survives or Dies

Ordinal response
• Opinion poll responses: Y = Agree, Neutral or Disagree

Binary Logistic Regression Model

Y = binary response, X = quantitative predictor
p = proportion of 1's (yes, success) at any X

Equivalent forms of the logistic regression model:

Logit form:

$$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X$$

Probability form:

$$p = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}} = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}$$

Binary Logistic Regression via R

> logitmodel=glm(Gender~Hgt,family=binomial, data=Pulse)


> summary(logitmodel)

Call:
glm(formula = Gender ~ Hgt, family = binomial)

Deviance Residuals:
Min 1Q Median 3Q Max
-2.77443 -0.34870 -0.05375 0.32973 2.37928

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 64.1416 8.3694 7.664 1.81e-14 ***
Hgt -0.9424 0.1227 -7.680 1.60e-14***
---

Fitted model:

$$p = \frac{e^{64.14 - 0.9424\,\mathrm{Hgt}}}{1 + e^{64.14 - 0.9424\,\mathrm{Hgt}}}$$

where p is the proportion of females at that Hgt.
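
The fitted coefficients can be turned into probabilities with the probability form of the model. The sketch below plugs in the estimates printed in the R output (64.1416 and -0.9424); the function names are ours, and Hgt is taken in whatever units the Pulse data use.

```python
import math

# Estimates copied from the R summary above
B0 = 64.1416    # intercept
B1 = -0.9424    # coefficient of Hgt

def prob_female(hgt):
    """Fitted proportion of females at a given height (probability form)."""
    return 1 / (1 + math.exp(-(B0 + B1 * hgt)))

def logit(p):
    """Log-odds, the inverse of the probability form."""
    return math.log(p / (1 - p))

# Height at which the fitted probability is exactly 0.5 (the logit is zero)
hgt_50 = -B0 / B1

for hgt in (64, 68, 72):
    print(hgt, round(prob_female(hgt), 3))
print("p = 0.5 at Hgt =", round(hgt_50, 2))
```

Because the slope is negative, the fitted proportion of females falls as height increases.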

Example: TMS for Migraines


Transcranial Magnetic Stimulation vs. Placebo

Pain Free?    TMS    Placebo
YES           39     22
NO            61     78
Total         100    100

$$P_{TMS} = 0.39, \qquad \text{odds}_{TMS} = \frac{39/100}{61/100} = \frac{39}{61} = 0.639, \qquad P = \frac{0.639}{1 + 0.639} = 0.39$$

$$P_{Placebo} = 0.22, \qquad \text{odds}_{Placebo} = \frac{22}{78} = 0.282$$

$$\text{Odds ratio} = \frac{0.639}{0.282} = 2.27$$

The odds of getting relief are 2.27 times higher using TMS than placebo.

Logistic Regression for TMS data

> lmod=glm(cbind(Yes,No)~Group,family=binomial,data=TMS)
> summary(lmod)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.2657 0.2414 -5.243 1.58e-07 ***
GroupTMS 0.8184 0.3167 2.584 0.00977 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 6.8854 on 1 degrees of freedom


Residual deviance: 0.0000 on 0 degrees of freedom
AIC: 13.701

Note: $e^{0.8184} = 2.27$ = odds ratio
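
The link between the 2 × 2 table and the fitted coefficients can be verified directly: the intercept is the log-odds of relief in the placebo group, and the GroupTMS coefficient is the log of the odds ratio. A short check, with illustrative variable names:

```python
import math

# Counts from the TMS table
yes_tms, no_tms = 39, 61
yes_placebo, no_placebo = 22, 78

odds_tms = yes_tms / no_tms
odds_placebo = yes_placebo / no_placebo
odds_ratio = odds_tms / odds_placebo

# Coefficients implied by the table
intercept = math.log(odds_placebo)    # log-odds for the baseline (placebo) group
group_tms = math.log(odds_ratio)      # log odds ratio, TMS against placebo

print(f"odds ratio = {odds_ratio:.2f}")
print(f"intercept = {intercept:.4f}, GroupTMS = {group_tms:.4f}")
```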

Binary Logistic Regression Model

Y = binary response, X1, X2, …, Xk = multiple predictors
π = proportion of 1's (yes, success) at any x1, x2, …, xk

Equivalent forms of the logistic regression model:

Logit form:

$$\log\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k$$

Probability form:

$$\pi = \frac{e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k}}{1 + e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k}}$$

Interactions in logistic regression

Consider survival in an ICU as a function of SysBP (BP for short) and Sex
> intermodel=glm(Survive~BP*Sex, family=binomial, data=ICU)
> summary(intermodel)

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.439304 1.021042 -1.410 0.15865
BP 0.022994 0.008325 2.762 0.00575 **
Sex 1.455166 1.525558 0.954 0.34016
BP:Sex -0.013020 0.011965 -1.088 0.27653

Null deviance: 200.16 on 199 degrees of freedom


Residual deviance: 189.99 on 196 degrees of freedom

[Figure: probability of voting Yes against lifetime auto industry contributions (0 to 1,000,000, log scale); Rep = red, Dem = blue]

Lines are very close to parallel; not a significant interaction.

Forecasting models
Principles of forecasting
Time series analysis
Smoothing and decomposition methods
ARIMA
GARCH
Holt-Winters model
Causal methods
Moving averages
Exponential smoothing

207
20-11-2018

Forecasting

• Predict the next number in the pattern:

a) 3.7, 3.7, 3.7, 3.7, 3.7, ?

b) 2.5, 4.5, 6.5, 8.5, 10.5, ?

c) 5.0, 7.5, 6.0, 4.5, 7.0, 9.5, 8.0, 6.5, ?

Forecasting

• Predict the next number in the pattern:

a) 3.7, 3.7, 3.7, 3.7, 3.7, 3.7

b) 2.5, 4.5, 6.5, 8.5, 10.5, 12.5

c) 5.0, 7.5, 6.0, 4.5, 7.0, 9.5, 8.0, 6.5, 9.0

What Is Forecasting?

• Process of predicting a future event
• Underlying basis of all business decisions
  • Production
  • Inventory
  • Personnel
  • Facilities

Why do we need to forecast?

209
20-11-2018

Importance of Forecasting

• Departments throughout the organization depend on forecasts to formulate and execute their plans.
• Finance needs forecasts to project cash flows and capital requirements.
• Human resources need forecasts to anticipate hiring needs.
• Production needs forecasts to plan production levels, workforce, material requirements, inventories, etc.
• Demand is not the only variable of interest to forecasters.
• Manufacturers also forecast worker absenteeism, machine availability, material costs, transportation and production lead times, etc.
• Besides demand, service providers are also interested in forecasts of population, of other demographic variables, of weather, etc.

Types of forecasts

• Demand forecasts
• Environmental forecasts
• Technological forecasts

Timing of Forecasts

• Short-range forecast
• Medium-range forecast
• Long-range forecast

Quantitative Forecasting Methods

Quantitative forecasting
• Time series models
  • Moving average
  • Exponential smoothing
  • Trend models
• Causal models
  • Regression

What is a Time Series?

• Set of evenly spaced numerical data
  • Obtained by observing a response variable at regular time periods
• Forecast based only on past values
  • Assumes that factors influencing the past, present and future will continue
• Example

Year:   1995   1996   1997   1998   1999
Sales:  78.7   63.5   89.7   93.2   92.1

Time Series Models


Forecaster looks for data patterns as
Data = historic pattern + random variation

Historic pattern to be forecasted:
• Level (long-term average): data fluctuates around a constant mean
• Trend: data exhibits an increasing or decreasing pattern
• Seasonality: any pattern that regularly repeats itself and is of a constant length
• Cycle: patterns created by economic fluctuations

Random variation cannot be predicted

Time Series Patterns

Time Series Components


A time series can be described by models based on the following components:

$T_t$: Trend component
$S_t$: Seasonal component
$C_t$: Cyclical component
$I_t$: Irregular component

Using these components we can define a time series as the sum of its components, an additive model:

$$X_t = T_t + S_t + C_t + I_t$$

Alternatively, in other circumstances we might define a time series as the product of its components, a multiplicative model, often represented as a logarithmic model:

$$X_t = T_t \, S_t \, C_t \, I_t$$

Trend Component
• Persistent, overall upward or downward pattern
• Due to population, technology, etc.
• Several years' duration (data taken over a period of years)

[Figure: response (sales) against time in months, quarters or years]

Cyclical Component
• Repeating up and down movements (upward or downward swings)
• Due to interactions of factors influencing the economy
• May vary in length
• Usually 2-10 years' duration

[Figure: response (sales) against time in months, quarters or years, with one cycle marked]

Seasonal Component
• Regular pattern of up and down fluctuations
• Due to weather, customs, etc.
• Occurs within one year

[Figure: response (sales) against time (monthly or quarterly), with summer marked]

Irregular Component
• Erratic, unsystematic, 'residual' fluctuations
• Due to random variation or unforeseen events
  • Union strike
  • War
• Short duration and nonrepeating

Moving Average Models

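• Simple moving average forecast (average of the last k observations):

$$F_t = E(Y_t) = \frac{\sum_{i=t-k}^{t-1} Y_i}{k}$$

• Weighted moving average forecast (divide by the sum of the weights, which is 1 when the weights are normalized):

$$F_t = E(Y_t) = \frac{\sum_{i=t-k}^{t-1} w_i Y_i}{\sum_{i=t-k}^{t-1} w_i}$$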

Selecting the Right Forecasting Model

1. The amount and type of available data
   • Some methods require more data than others
2. Degree of accuracy required
   • Increasing accuracy means more data
3. Length of forecast horizon
   • Different models for 3 months vs. 10 years
4. Presence of data patterns
   • Lagging will occur when a forecasting model meant for a level pattern is applied to data with a trend

Moving Average
[Solution]

Year    Sales     MA(3) in 1,000
1995    20,000    NA
1996    24,000    (20+24+22)/3 = 22
1997    22,000    (24+22+26)/3 = 24
1998    26,000    (22+26+25)/3 = 24.33
1999    25,000    NA

Moving Average
Year    Response    Moving Ave
1994    2           NA
1995    5           3
1996    2           3
1997    2           3.67
1998    7           5
1999    6           NA

[Figure: sales (0 to 8) plotted for 1994 to 1999]


L-12: Predictive Analytics: Time Series Analysis

Forecasting models
Principles of forecasting
Time series analysis
Smoothing and decomposition methods
Causal methods
Moving averages
Exponential smoothing
AR,MA,ARMA & ARIMA Models


Applications
Retail sales

Spare parts planning

Stock trading

Time series components

• Trend
• Seasonality
• Cyclic
• Random


Box-Jenkins Methodology

1. Condition data and select a model
   • Identify and account for any trends or seasonality in the time series
   • Examine the remaining time series and determine a suitable model

2. Estimate the model parameters

3. Assess the model and return to step 1, if necessary

Time Series Components

[Figure: four panels showing the trend, cyclical, seasonal and irregular components]

Smoothing Methods

Example(Moving averages)

• Use the following data to compute the three-year moving average for all available years. Find the trend and the forecast error.

YEAR    Sales (lakhs)        YEAR    Sales (lakhs)
2008    21                   2013    22
2009    22                   2014    25
2010    23                   2015    26
2011    25                   2016    27
2012    24                   2017    26
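
One way to work this exercise is sketched below. It makes two assumptions that the slide does not spell out: the forecast for a year is the average of the three preceding years, with forecast error = actual - forecast, and the "trend" is read off a centred three-year moving average. Variable names are illustrative.

```python
years = list(range(2008, 2018))
sales = [21, 22, 23, 25, 24, 22, 25, 26, 27, 26]   # sales in lakhs, from the table
k = 3

# Forecast for each year from the k preceding years, and its error
forecasts = {}
errors = {}
for i in range(k, len(sales)):
    forecast = sum(sales[i - k:i]) / k
    forecasts[years[i]] = forecast
    errors[years[i]] = sales[i] - forecast

# Centred 3-year moving average as a smoothed trend estimate (assumed reading)
centred = {years[i]: sum(sales[i - 1:i + 2]) / 3 for i in range(1, len(sales) - 1)}

for year in forecasts:
    print(year, round(forecasts[year], 2), round(errors[year], 2))
```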

228
20-11-2018

Time Series Models


• Weighted moving average:

$$F_{t+1} = \sum C_t A_t$$

• All weights must add to 100% or 1.00, e.g. $C_t = 0.5$, $C_{t-1} = 0.3$, $C_{t-2} = 0.2$ (weights add to 1.0)

• Allows emphasizing one period over others; the weights above put more weight on recent data ($C_t = 0.5$)

• Differs from the simple moving average, which weighs all periods equally; more responsive to trends

Example(Weighted moving Averages)


Weights Month
3 Last month
2 Two months ago
1 Three months ago

Months 1 2 3 4 5 6 7 8 9 10 11 12

Sales 10 12 13 16 19 23 26 30 28 18 16 14
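
A sketch of the weighted moving average for this table follows. The weights 3, 2, 1 do not add to 1, so the code divides by their sum (6), which is the same as using normalized weights 3/6, 2/6 and 1/6. The function name is hypothetical.

```python
sales = [10, 12, 13, 16, 19, 23, 26, 30, 28, 18, 16, 14]   # months 1 to 12
weights = [1, 2, 3]   # oldest to newest: three months ago, two months ago, last month

def wma_forecast(history, weights):
    """Forecast the next period from the most recent len(weights) observations."""
    recent = history[-len(weights):]
    return sum(w * v for w, v in zip(weights, recent)) / sum(weights)

# Forecast for month 4 uses months 1 to 3; forecast for month 13 uses months 10 to 12
forecast_month_4 = wma_forecast(sales[:3], weights)
forecast_month_13 = wma_forecast(sales, weights)

print(round(forecast_month_4, 2), round(forecast_month_13, 2))
```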


Time Series Models


• Exponential smoothing:
Most frequently used time series method because of its ease of use and the minimal amount of data needed
• Need just three pieces of data to start:
  • Last period's forecast ($F_t$)
  • Last period's actual value ($A_t$)
  • A value of the smoothing coefficient, α, between 0 and 1.0

$$F_{t+1} = \alpha A_t + (1 - \alpha) F_t$$

• If no last period forecast is available, average the last few periods or use the naive method
• Higher α values may place too much weight on the last period's random variation
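
The recursion is short enough to sketch directly. In the example below the first four months of the sales series from the weighted moving average example are reused, α = 0.5 is an arbitrary choice, and the starting forecast is set by the naive method (equal to the first actual value); all three are assumptions for illustration.

```python
def exp_smooth_forecasts(actuals, alpha, initial_forecast):
    """Return [F_1, F_2, ..., F_{n+1}] using F_{t+1} = alpha * A_t + (1 - alpha) * F_t."""
    forecasts = [initial_forecast]
    for actual in actuals:
        forecasts.append(alpha * actual + (1 - alpha) * forecasts[-1])
    return forecasts

actuals = [10, 12, 13, 16]
alpha = 0.5
forecasts = exp_smooth_forecasts(actuals, alpha, initial_forecast=actuals[0])

# The last entry is the forecast for the period after the final actual value
print(forecasts)
```

The forecasts trail the rising actual values, which is the lagging behaviour the trend-adjusted method is meant to correct.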


Forecasting Trend
• Basic forecasting models for trends compensate for the lagging that would otherwise occur
• One model, trend-adjusted exponential smoothing, uses a three-step process
• Step 1: smoothing the level of the series

$$S_t = \alpha A_t + (1 - \alpha)(S_{t-1} + T_{t-1})$$

• Step 2: smoothing the trend

$$T_t = \beta (S_t - S_{t-1}) + (1 - \beta) T_{t-1}$$

• Step 3: forecast including the trend

$$FIT_{t+1} = S_t + T_t$$
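
The three steps can be sketched as a loop that carries the smoothed level and trend forward. The function name, the starting values `s0` and `t0`, and the data are hypothetical; in practice the starting level and trend have to be chosen, for example from the first few observations.

```python
def trend_adjusted_forecast(actuals, alpha, beta, s0, t0):
    """Return FIT_{t+1} after processing all actual values.

    s0 and t0 are the starting level and trend (assumed to be supplied).
    """
    s_prev, t_prev = s0, t0
    for actual in actuals:
        s = alpha * actual + (1 - alpha) * (s_prev + t_prev)   # step 1: level
        t = beta * (s - s_prev) + (1 - beta) * t_prev          # step 2: trend
        s_prev, t_prev = s, t
    return s_prev + t_prev                                     # step 3: forecast

# Hypothetical series growing by 2 each period, starting level 10 and trend 2
fit_next = trend_adjusted_forecast([12, 14, 16], alpha=0.5, beta=0.5, s0=10, t0=2)
print(fit_next)
```

On a perfectly linear series the forecast continues the line without lagging, unlike plain exponential smoothing.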

233
20-11-2018

Measuring Forecasting Accuracy

• Mean Absolute Deviation (MAD)
  • Measures the total error in a forecast without regard to sign

$$MAD = \frac{\sum \left|\text{actual} - \text{forecast}\right|}{n}$$

• Cumulative Forecast Error (CFE)
  • Measures any bias in the forecast

$$CFE = \sum \left(\text{actual} - \text{forecast}\right)$$

• Mean Square Error (MSE)
  • Penalizes larger errors

$$MSE = \frac{\sum \left(\text{actual} - \text{forecast}\right)^2}{n}$$

• Tracking Signal
  • Measures if your model is working

$$TS = \frac{CFE}{MAD}$$
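
The four accuracy measures (MAD, CFE, MSE and the tracking signal TS = CFE / MAD) can be computed together from paired actual and forecast values. The function name and the four data points below are hypothetical.

```python
def forecast_accuracy(actual, forecast):
    """Return MAD, CFE, MSE and tracking signal for paired actual/forecast values."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    mad = sum(abs(e) for e in errors) / n
    cfe = sum(errors)
    mse = sum(e * e for e in errors) / n
    ts = cfe / mad if mad else float("nan")   # undefined when every forecast is exact
    return {"MAD": mad, "CFE": cfe, "MSE": mse, "TS": ts}

# Hypothetical actual values and forecasts
actual = [25, 24, 22, 25]
forecast = [22, 23, 24, 24]

metrics = forecast_accuracy(actual, forecast)
print(metrics)
```

A positive CFE and tracking signal indicate that the forecasts are, on balance, too low.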

Models
AR Model

MA Model

ARMA Model

ARIMA Model

AR Model (Autoregressive model)

236
20-11-2018

Moving Average(MA) Model

ARMA model –ARMA(p,q)


L- 13:Time Series Analysis(cont..)


Case

• Testing the impact of nutrition and exercise on 60 candidates between the ages of 18 and 50. They are grouped with different strategies. Now we need to find the most effective strategy.
• Group 1 eats only junk food
• Group 2 eats only healthy food
• Group 3 eats junk food & does cardio exercise every other day
• Group 4 eats healthy food & does cardio ……………
• Group 5 eats junk food & does both cardio & strength training every other day
• Group 6 eats healthy food…….

245
20-11-2018

ANOVA-analysis of variance
• * Significance of difference between two sample means

246
20-11-2018

ANOVA

• Effectiveness of different promotional activities

• Quality of a product produced by different manufacturers in terms of an attribute

• Yield of a crop due to varieties of seeds, fertilisers and quality of soil

Assumptions

• Each population is normally distributed with its own mean, and all populations have equal variances

• Each sample is drawn randomly and is independent of the other samples
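
Under these assumptions, one-way ANOVA compares the variation between group means with the variation within groups through F = MSB / MSW, where MSB = SSB / (k - 1) and MSW = SSW / (N - k) for k groups and N observations in total. The sketch below uses small hypothetical groups; the function name is ours, and the resulting F would still have to be compared with a tabulated critical value.

```python
def one_way_anova(groups):
    """Return (SSB, SSW, F) for a list of samples, one list per group."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group and within-group sums of squares
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

    msb = ssb / (k - 1)
    msw = ssw / (n_total - k)
    return ssb, ssw, msb / msw

# Hypothetical data: three groups of three observations
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
ssb, ssw, f_stat = one_way_anova(groups)
print(f"SSB = {ssb}, SSW = {ssw}, F = {f_stat}")
```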

247
20-11-2018

ANOVA summary

Short cut method

248
20-11-2018

Example
• To test the significance of variation in the retail prices of a
commodity in three metro cities,Mumbai,Kolkata and
Delhi, four shops are chosen at random and the prices are
given below


Example

• A study was conducted to investigate the perception of corporate ethical values among individuals specialising in marketing. Using the 0.05 level of significance and the data given below, test for significant differences in perception among the three groups (higher scores indicate higher ethical values).

Example

Two way ANOVA


L-14: Applied Multivariate Analytics

Agenda

Multivariate normal distribution

Preliminaries … eigenvalues and eigenvectors

Principal component analysis

Preliminaries
• Standard deviation is a measure of the spread of the data
• Variance: a measure of the deviation from the mean for points in one dimension, e.g. heights
• Covariance: a measure of how much each of the dimensions varies from the mean with respect to each other
• Covariance is measured between 2 dimensions to see if there is a relationship between them, e.g. number of hours studied and marks obtained
• The covariance between one dimension and itself is the variance

Covariance Matrix

Representing Covariance between dimensions as a matrix


e.g.
• cov(x,x) cov(x,y) cov(x,z)
• C= cov(y,x) cov(y,y) cov(y,z)
• cov(z,x) cov(z,y) cov(z,z)

• The diagonal holds the variances of x, y and z.
• cov(x,y) = cov(y,x), hence the matrix is symmetric about the diagonal.
• N-dimensional data will result in an N x N covariance matrix.
• A positive covariance indicates that both dimensions increase or
decrease together.
• A negative covariance indicates that while one increases the other
decreases, or vice versa.
• If the covariance is zero, the two dimensions are uncorrelated: there is
no linear relationship between them. Independent dimensions have zero
covariance, but zero covariance alone does not guarantee independence.
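
The properties of the covariance matrix can be checked on a small example. The sketch below uses hypothetical data and illustrative helper names, with the sample divisor n - 1 assumed. It also shows that zero covariance rules out only a linear relationship: y is fully determined by x, yet cov(x, y) = 0.

```python
def covariance(xs, ys):
    """Sample covariance of two equal-length lists (divides by n - 1)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def covariance_matrix(columns):
    """Entry (i, j) is the covariance between dimension i and dimension j."""
    return [[covariance(a, b) for b in columns] for a in columns]

# HYPOTHETICAL three-dimensional data
x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]      # depends on x, but not linearly
z = [2 * v + 1 for v in x]  # increases linearly with x

C = covariance_matrix([x, y, z])
for row in C:
    print(row)
```

The diagonal entries are the three variances, the matrix equals its own transpose, cov(x, z) is positive, and cov(x, y) is zero despite the exact dependence y = x^2.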


Transformation matrices
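• Consider:

\[
\begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix}
\begin{pmatrix} 3 \\ 2 \end{pmatrix}
= \begin{pmatrix} 12 \\ 8 \end{pmatrix}
= 4 \begin{pmatrix} 3 \\ 2 \end{pmatrix}
\]

• The square transformation matrix transforms (3,2) from
its original location. Now if we were to take a multiple
of (3,2):

\[
2 \begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 6 \\ 4 \end{pmatrix},
\qquad
\begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix}
\begin{pmatrix} 6 \\ 4 \end{pmatrix}
= \begin{pmatrix} 24 \\ 16 \end{pmatrix}
= 4 \begin{pmatrix} 6 \\ 4 \end{pmatrix}
\]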

Eigenvalue problem

• The eigenvalue problem is any problem having the
following form:
  A · X = λ · X
  • A: n x n matrix
  • X: n x 1 non-zero vector
  • λ: scalar
• Any value of λ for which this equation has a solution is
called an eigenvalue of A, and the vector X which corresponds
to this value is called an eigenvector of A.


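Eigenvalue problem

\[
\underbrace{\begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix}}_{A}
\underbrace{\begin{pmatrix} 3 \\ 2 \end{pmatrix}}_{v}
= \begin{pmatrix} 12 \\ 8 \end{pmatrix}
= \underbrace{4}_{\lambda} \begin{pmatrix} 3 \\ 2 \end{pmatrix},
\qquad A v = \lambda v
\]

Therefore, (3,2) is an eigenvector of the square matrix
A and 4 is an eigenvalue of A.

Given a matrix A, how can we calculate the
eigenvectors and eigenvalues of A?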
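
One standard route is the characteristic equation det(A - λI) = 0, which for a 2 x 2 matrix is the quadratic λ² - trace(A)·λ + det(A) = 0. The sketch below applies it to the matrix above in plain Python. The function names are illustrative, and the code assumes real eigenvalues and a non-zero top-right entry.

```python
from math import sqrt

def eigen_2x2(a, b, c, d):
    """Eigenpairs of [[a, b], [c, d]]; assumes real eigenvalues and b != 0."""
    trace, det = a + d, a * d - b * c
    root = sqrt(trace * trace - 4 * det)
    # For each root lam, (b, lam - a) solves (a - lam) * v1 + b * v2 = 0
    return [(lam, (b, lam - a)) for lam in ((trace + root) / 2, (trace - root) / 2)]

def transform(matrix, v):
    """Multiply a 2 x 2 matrix by a 2-vector."""
    (a, b), (c, d) = matrix
    return (a * v[0] + b * v[1], c * v[0] + d * v[1])

A = ((2, 3), (2, 1))
pairs = eigen_2x2(2, 3, 2, 1)
for lam, v in pairs:
    # transform(A, v) should equal lam times v
    print(lam, v, transform(A, v))
```

For this A the roots are 4 and -1, and the eigenvector found for λ = 4 is (3, 2). Any non-zero multiple of an eigenvector, such as (6, 4), is again an eigenvector with the same eigenvalue.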


Data Presentation

• Blood and urine measurements (wet chemistry) from 65 people (33 alcoholics, 32
non-alcoholics).
• Matrix Format

      H-WBC    H-RBC    H-Hgb     H-Hct     H-MCV      H-MCH     H-MCHC
A1    8.0000   4.8200   14.1000   41.0000    85.0000   29.0000   34.0000
A2    7.3000   5.0200   14.7000   43.0000    86.0000   29.0000   34.0000
A3    4.3000   4.4800   14.1000   41.0000    91.0000   32.0000   35.0000
A4    7.5000   4.4700   14.9000   45.0000   101.0000   33.0000   33.0000
A5    7.3000   5.5200   15.4000   46.0000    84.0000   28.0000   33.0000
A6    6.9000   4.8600   16.0000   47.0000    97.0000   33.0000   34.0000
A7    7.8000   4.6800   14.7000   43.0000    92.0000   31.0000   34.0000
A8    8.6000   4.8200   15.8000   42.0000    88.0000   33.0000   37.0000
A9    5.1000   4.7100   14.0000   43.0000    92.0000   30.0000   32.0000

[Figure: Value plotted against Measurement]

Univariate
[Figure: H-Bands plotted against Person]

Bivariate
[Figure: C-LDH plotted against C-Triglycerides]

Trivariate
[Figure: M-EPI plotted against C-LDH and C-Triglycerides]


Applications
• Face Recognition
• Image Compression
• Gene Expression Analysis
• Data Reduction
• Data Classification
• Trend Analysis
• Factor Analysis
• Noise Reduction

Principal Component Analysis

• In real-world data analysis tasks we analyze complex, i.e. multi-dimensional,
data. We plot the data and look for patterns in it, or use it
to train machine learning models. One way to think about
dimensions: suppose you have a data point x. If we treat this
data point as a physical object, then the dimensions are merely a basis of view,
i.e. where the data is located when it is observed from the horizontal axis or
the vertical axis.


• As the number of dimensions of the data increases, so does the difficulty of
visualizing it and performing computations on it. So, how do we reduce the
dimensions of a data set?
* Remove the redundant dimensions
* Only keep the most important dimensions

• Now let's think about the requirements of data analysis.
Since we try to find patterns in the data, we want the data
to be spread out along each dimension. We also want the dimensions to
be uncorrelated. So if the data has high covariance when represented in
some n dimensions, we replace those dimensions with
linear combinations of them. The data then depends only on the
linear combinations of those related n dimensions (related =
having high covariance).


• PCA is a linear transformation that chooses a new
coordinate system for the data set such that the
greatest variance by any projection of the data set comes to
lie on the first axis (then called the first principal component),
the second greatest variance on the second axis, and so on.

• PCA can be used for reducing dimensionality by
eliminating the later principal components.

• What does Principal Component Analysis (PCA) do?

• PCA finds a new set of dimensions (or a set of bases of view) such that all
the dimensions are orthogonal (and hence linearly independent) and
ranked according to the variance of the data along them. This means the more
important principal axis occurs first (more important = more variance =
more spread-out data).


• How does PCA work?
  • Calculate the covariance matrix C of the (mean-centered) data points.
  • Calculate the eigenvectors and corresponding eigenvalues of C.
  • Sort the eigenvectors according to their eigenvalues in decreasing
  order.
  • Choose the first k eigenvectors; these will be the new k dimensions.
  • Transform the original n-dimensional data points into k dimensions by
  projecting them onto these k eigenvectors.
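
The steps above can be sketched for two-dimensional data in plain Python, where the eigenvalues follow from the 2 x 2 characteristic equation. This is an illustration under stated assumptions, not a general implementation: the function name `pca_2d` and the data points are hypothetical, the covariance divides by n, and for higher dimensions a numerical eigen solver would normally replace the closed form.

```python
from math import hypot, sqrt

def pca_2d(points, k=1):
    """PCA for 2-D points via the eigen decomposition of the covariance matrix."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Step 1: covariance matrix [[a, b], [b, d]] (divides by n)
    a = sum(x * x for x, _ in centered) / n
    b = sum(x * y for x, y in centered) / n
    d = sum(y * y for _, y in centered) / n

    # Step 2: eigenvalues from the characteristic equation, largest first
    root = sqrt((a - d) ** 2 + 4 * b * b)
    eigenvalues = [(a + d + root) / 2, (a + d - root) / 2]

    # Eigenvector for lam solves (a - lam) * vx + b * vy = 0
    if b != 0:
        raw = [(b, lam - a) for lam in eigenvalues]
    else:  # already uncorrelated: the principal axes are the coordinate axes
        raw = [(1.0, 0.0), (0.0, 1.0)] if a >= d else [(0.0, 1.0), (1.0, 0.0)]
    components = [(vx / hypot(vx, vy), vy / hypot(vx, vy)) for vx, vy in raw]

    # Steps 3-4: the eigenpairs are already sorted; keep the first k
    kept = components[:k]

    # Step 5: project every centered point onto the kept components
    projected = [tuple(cx * vx + cy * vy for vx, vy in kept) for cx, cy in centered]
    return eigenvalues, kept, projected

# HYPOTHETICAL points lying exactly on the line y = 2x
points = [(1, 2), (2, 4), (3, 6), (4, 8)]
eigenvalues, kept, projected = pca_2d(points, k=1)
print(eigenvalues, kept, projected)
```

Because these points lie on a single line, the second eigenvalue is zero and one principal component carries all of the variance.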


Principal Components

• All principal components (PCs) start at the origin of the ordinate axes.
• First PC is direction of maximum variance from origin.
• Subsequent PCs are orthogonal to 1st PC and describe maximum residual
variance.

[Figure: two scatter plots of Wavelength 2 against Wavelength 1, the first marking PC 1 and the second marking PC 2]


An Example

Mean1 = 24.1
Mean2 = 53.8

X1    X2    X1'     X2'
19    63    -5.1      9.25
39    74    14.9     20.25
30    87     5.9     33.25
30    23     5.9    -30.75
15    35    -9.1    -18.75
15    43    -9.1    -10.75
15    32    -9.1    -21.75
30    73     5.9     19.25

[Figure: scatter plot of the original data, X2 against X1]
[Figure: scatter plot of the mean-centered data, X2' against X1']


Covariance Matrix

\[
C = \begin{pmatrix} 75 & 106 \\ 106 & 482 \end{pmatrix}
\]

• Using MATLAB, we find out:
• Eigenvectors and eigenvalues:
• e1 = (-0.97, 0.24), λ1 ≈ 49.0
• e2 = (0.24, 0.97), λ2 ≈ 508.0
• Thus the second eigenvector is more important!

If we only keep one dimension: e2

• We keep the dimension of e2 = (0.24, 0.97)
• We can obtain the final data from the mean-centered values X1', X2' as

\[
y_i = \begin{pmatrix} 0.24 & 0.97 \end{pmatrix}
\begin{pmatrix} x'_{i1} \\ x'_{i2} \end{pmatrix}
= 0.24\, x'_{i1} + 0.97\, x'_{i2}
\]

yi: 7.75, 23.22, 33.67, -28.41, -20.37, -12.61, -23.28, 20.09

[Figure: the projected values yi plotted along a single axis]
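
As a cross-check on the worked example, the sketch below recomputes the covariance matrix, its eigenpairs and the one-dimensional projection directly from the tabulated X1 and X2 values, in plain Python. It assumes the covariance divides by n, which reproduces the quoted matrix C after rounding; the variable names are illustrative. Since the covariance is positive, the components of the dominant eigenvector should share a sign.

```python
from math import hypot, sqrt

# Data from the example table
x1 = [19, 39, 30, 30, 15, 15, 15, 30]
x2 = [63, 74, 87, 23, 35, 43, 32, 73]
n = len(x1)

# Mean-center each dimension
m1, m2 = sum(x1) / n, sum(x2) / n
c1 = [v - m1 for v in x1]
c2 = [v - m2 for v in x2]

# Covariance matrix [[a, b], [b, d]] (assumption: divide by n)
a = sum(u * u for u in c1) / n
b = sum(u * v for u, v in zip(c1, c2)) / n
d = sum(v * v for v in c2) / n

# Eigenvalues from the 2 x 2 characteristic equation
root = sqrt((a - d) ** 2 + 4 * b * b)
lam_big, lam_small = (a + d + root) / 2, (a + d - root) / 2

# Unit eigenvector for the larger eigenvalue
norm = hypot(b, lam_big - a)
e_big = (b / norm, (lam_big - a) / norm)

# One-dimensional data: projection of each centered point onto e_big
y = [e_big[0] * u + e_big[1] * v for u, v in zip(c1, c2)]

# Share of the total variance kept by this single component
retained = lam_big / (lam_big + lam_small)
print(round(a), round(b), round(d), lam_big, lam_small, e_big, retained)
```

The variance of the projected values equals the larger eigenvalue, so `retained` is the fraction of the total variance that survives when only one dimension is kept.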


Thanks


L- 15: Applied Multivariate Analytics & Revision


Thanks
