Sahadeb Sarkar
IIM Calcutta
1
Terminology
Discrete data: relates to discrete outcomes; discrete
distributions
• Categorical data: discrete data with finitely many
possible values on a nominal scale (e.g., the state a
person lives in, the political party one might vote for,
the blood type of a patient; Multinomial, Bernoulli
distributions). Central tendency is given by the mode.
• Count data (non-negative integer valued): records
the frequency of an event and may not have an upper
bound (e.g., Poisson, Binomial, Negative Binomial
distributions). It arises from counting, not ranking.
2
Discrete Data Types
• Dichotomous data: can take only two values
such as “Yes” and “No”
• Nonordered polytomous data: five different
detergents
• Ordered polytomous data: grades A, B, C, D;
“old”, “middle-aged”, “young” employees
4
Derivation Tools in CDA, Text p.18
Slutsky’s Theorem:
Suppose Xn →d X and Yn →d c (a constant). Then
1. Xn + Yn →d X + c
2. Yn Xn →d cX
3. If c ≠ 0, Xn/Yn →d X/c
(“→d” denotes convergence in distribution)
5
Inference for One-way Frequency
Table
6
Binomial Distribution
(leading to One-Way Frequency Table)
Suppose Y is a random variable with 2 possible outcome
categories c1, c2 with probabilities π1, π2 = 1 − π1.
Suppose there are n observations on Y; we can summarize
the responses through the vector of observed frequencies
(random variables), (X1, X2 = n − X1).
7
Example 1.1, p. 6, Text
8
Metabolic syndrome
(https://en.wikipedia.org/wiki/Metabolic_syndrome)
9
Example 1.1 (Binary Case), p. 37, Text
• Test if the prevalence of Metabolic Syndrome is 40% in this
study population
Z = (π̂ − π0)/√(π0(1 − π0)/n) = (48/93 − 0.4)/√(0.4 × 0.6/93) = 2.286;
P-value = 2·Φ(−2.286) = 0.0223
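The calculation on this slide can be reproduced with a short script (standard library only; the counts 48 out of n = 93 and π0 = 0.40 are from the slide):

```python
import math

# One-sample z-test for a proportion (Metabolic Syndrome example:
# 48 cases out of n = 93; H0: pi = 0.40).
def prop_ztest(x, n, pi0):
    """Return (z, two-sided p-value) for H0: pi = pi0."""
    pihat = x / n
    se = math.sqrt(pi0 * (1 - pi0) / n)
    z = (pihat - pi0) / se
    # two-sided p-value via the standard normal survival function
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

z, p = prop_ztest(48, 93, 0.40)
print(round(z, 3), round(p, 4))  # matches the slide: z ≈ 2.286, p ≈ 0.022
```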
• P(X = k) = [Γ(r + k)/(Γ(r) k!)] p^r (1 − p)^k. Put α = 1/r and μ = rp/(1 − p) for reparameterization
• Note: C(k + r − 1, k) = (−1)^k C(−r, k)
11
Negative Binomial Distribution (p. 41)
• P(X = k) = [Γ(r + k)/(Γ(r) k!)] p^r (1 − p)^k ……… (1a)
• E(X) = rp/(1 − p), V(X) = rp/(1 − p)² > E(X) …….. (1b)
• Extension through reparameterization:
α = 1/r (> 0), μ = rp/(1 − p) in (1)
• Then, P(X = k) = [Γ(1/α + k)/(Γ(1/α) k!)] [μ/(1/α + μ)]^k [(1/α)/(1/α + μ)]^(1/α) ……… (2a)
• E(X) = μ; V(X) = μ + αμ² ……………………(2b)
12
Hypergeometric Distribution
• Randomly sample n elements from a finite (dichotomous)
population of size N, without replacement, having K
“success”-type and (N-K) “failure”-type elements. (e.g.
Pass/Fail or Employed/ Unemployed).
• The probability of a success changes on each draw, as each
draw reduces the remaining population.
• X = number of successes in the sample. Then X has the
hypergeometric distribution:
P(X = x) = [C(K, x) × C(N − K, n − x)] / C(N, n)
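A minimal sketch of the hypergeometric pmf and its mean (the population values N, K, n below are illustrative, not from the slides):

```python
from math import comb

# Hypergeometric pmf: x successes in a sample of n drawn without
# replacement from N items of which K are "success"-type.
def hypergeom_pmf(x, N, K, n):
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

# Illustrative numbers (not from the slides): N = 20, K = 8, n = 5.
N, K, n = 20, 8, 5
probs = [hypergeom_pmf(x, N, K, n) for x in range(max(0, n - (N - K)), min(K, n) + 1)]
print(sum(probs))  # pmf sums to 1 over its support

mean = sum(x * hypergeom_pmf(x, N, K, n) for x in range(0, min(K, n) + 1))
print(mean, n * K / N)  # E(X) = nK/N
```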
13
Multivariate Hypergeometric Distribution
• Randomly sample n elements from a finite (polytomous)
population of size N, without replacement, having K1, K2, ..., Kc
elements of types 1, 2, …, c.
• Xi = number of i-th type elements in the sample, i = 1, …, c. Then
X = (X1, …, Xc) has the multivariate hypergeometric distribution:
P(Xi = xi, i = 1, …, c) = [C(K1, x1) × C(K2, x2) × … × C(Kc, xc)] / C(N, n)
• E(Xi)=n(Ki/N),
• V(Xi) = {n(Ki/N)(1 – (Ki/N) )}×[(N-n)/(N-1)]
• Cov(Xi, Xj) = {n(Ki/N)(Kj/N) }×[(N-n)/(N-1)]
14
Inference for Multinomial Case
15
Multinomial Distribution
(may lead to One-Way, Two-Way, … Frequency Table)
Suppose Y is a random variable with k possible
outcome categories c1, c2, …, ck with probabilities π1,
π2, …, πk = 1 − π1 − … − πk−1.
Suppose there are n observations on Y; we can
summarize the responses through the vector of
observed frequencies (random variables), X = (X1,
X2, …, Xk), where Xk = n − X1 − … − Xk−1.
16
Multinomial Distribution
(may lead to One-Way, Two-Way, … Frequency Table)
X = (X1, X2, …, Xk) has a multinomial distribution
with parameters n and (π1, π2, …, πk):
P(X1 = x1, …, Xk = xk) = [n! / (x1! x2! … xk!)] π1^x1 π2^x2 … πk^xk
E(Xi) = nπi, V(Xi) = nπi(1 − πi), Cov(Xi, Xj) = −nπiπj
17
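The multinomial pmf above can be sketched directly (the fair-die counts in the example are illustrative):

```python
from math import factorial, prod

# Multinomial pmf for counts (x1, ..., xk) with probabilities (pi1, ..., pik).
def multinomial_pmf(xs, ps):
    n = sum(xs)
    coef = factorial(n) / prod(factorial(x) for x in xs)
    return coef * prod(pr ** x for x, pr in zip(xs, ps))

# Illustrative numbers (not from the slides): n = 6 rolls of a fair die,
# one observation in each of the k = 6 categories.
p = multinomial_pmf([1, 1, 1, 1, 1, 1], [1/6] * 6)
print(p)  # = 6!/6^6
```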
Example 1.1, p. 6, Text
One-Way Frequency Table for Metabolic Syndrome Study
MS         Present   Absent   Total
             48        45       93
Example: Pearson’s χ2 Test
To test whether a die is fair, we test a simple hypothesis.
Suppose we roll the die 120 times and summarize the data
as follows:
In this case, k = 6 and n = 120. H0: πi = 1/6 (= π0i), i = 1, 2, …, 6
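The Pearson chi-square statistic for this test can be sketched as follows. The slide's observed counts were in a table not reproduced here, so the counts below are hypothetical:

```python
# Pearson chi-square goodness-of-fit statistic for the fair-die test.
# H0: pi_i = 1/6 with n = 120 gives an expected count of 20 per face.
observed = [18, 23, 16, 21, 24, 18]   # hypothetical counts, sum = 120
expected = [120 / 6] * 6

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi2, 2))  # compare with the chi-square(5) critical value 11.07
```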
20
Pearson’s Chi-Square (contd.)
The hypothesis presented in Equation (1) is an
example of a simple hypothesis. (Simple in the sense
that the hypothesis completely specifies the true
distribution).
21
Multinomial Example, p.38,Text
Multinomial Case:
Depression Diagnosis in the DOS Study
Major Dep Minor Dep No Dep Total
128 136 481 745
DOS = Depression Of Seniors
24
Poisson Distribution Case
Suppose Y is a random variable taking integer values y = 0,
1, 2, …, with probability P(Y = y) = e^(−μ) μ^y / y!
Suppose there are n observations on Y; we can summarize
the observations through the vector of observed
frequencies for value-categories 0, 1, 2, …
27
Intentionally Kept Blank
28
Sampling Schemes
Leading to (2×2) Contingency Tables
29
Layout of the 2×2 table
                             Column factor (‘Response’)
                             Level 1    Level 2    Row Total
Row Factor       Level 1      n11        n12       R1 = n1+
(‘Explanatory’)  Level 2      n21        n22       R2 = n2+
Column Marginal Totals        n+1        n+2       Grand Total n
Sampling schemes
leading to 2×2 contingency tables
31
Poisson Sampling
• Poisson Sampling (French mathematician Simeon
Denis Poisson): Here a fixed amount of time (or space,
volume, money etc.) is employed to collect a random
sample from a single population and each member of
the population falls into one of the four cells in the
2×2 table.
• In the CVD Death example 1 (next slide), researchers
spent a certain amount of time sampling the health
records of 3112 women, who were cross-classified as
obese or non-obese against whether or not they died of
CVD. In this case, none of the marginal totals, nor the
sample size, was known in advance.
32
Example-1: Cardio-Vascular Deaths and Obesity among
women in American Samoa
34
Prospective Product Binomial Sampling
• Prospective Product Binomial Sampling
(“cohort” study): First identify explanatory variable(s)
that explain “causation” . Population is categorized according
to levels of explanatory variable and random samples are then
selected from each explanatory group.
If separate lists of obese and non-obese American Samoan
women had been available in Example 1, a random sample of
2500 could have been selected from each. The term Binomial
refers to the dichotomy of the explanatory variable. The term
Product refers to the fact that sampling is done from more
than one population independently.
35
Example-2: Vitamin-C versus Common Cold
Outcome
36
Retrospective Product Binomial Sampling
37
Example 3: Smoking versus Lung Cancer
Outcome
CANCER CONTROL TOTAL
SMOKER 83 72 155
NON-SMOKER 3 14 17
TOTAL 86 86 172
38
Retrospective Product Binomial
Sampling
• We cannot test for the equality of proportions along the
explanatory variable if the sampling scheme is
retrospective.
• From a case-control study we can estimate only the odds
ratio, which is a weaker measure of the strength of association
than the relative risk.
• Why do retrospective sampling at all, then?
Compared to prospective cohort studies they tend to be less
costly and shorter in duration. Case-control studies are often
used in the study of rare diseases, or as a preliminary study
where little is known about the association between possible
risk factor and disease of interest.
39
Retrospective Product Binomial
Sampling (Continued)
• If the probabilities of the “Yes” response are very
small, it may need a huge sample size to get any
“Yes” response at all through prospective sampling.
• Retrospective sampling guarantees that we have at
least a reasonable number of “Yes” responses for
each level of explanatory variable.
• In the smoking versus lung cancer study (Example 3),
retrospective sampling may be accomplished without
having to follow the subjects throughout their
lifetime.
40
Prospective
Subjects selected
according to the levels
of the explanatory
variable
Explanatory Response
Variable Variable
Retrospective
Subjects selected
according to
levels of the
Response variable
41
Layout of the 2×2 table
                             Column factor (Response)
                             Level 1    Level 2    Row Total
Row Factor       Level 1      n11        n12       R1 = n1+
(Explanatory)    Level 2      n21        n22       R2 = n2+
Column Marginal Totals        n+1        n+2       Grand Total n
Estimated Proportions
• Proportion of “Yes” (Level 1) responses in the
first level of the explanatory variable is
π̂1 = n11 / R1; similarly, π̂2 = n21 / R2
43
Assumption
• We will assume that the frequencies of all the entries
in the 2×2 table are greater than 5.
• This ensures that the “asymptotic tests” performed
on the 2×2 table are reasonably accurate.
(“asymptotic” means ‘appropriate in large samples’)
• If not all the entries in the 2×2 table are greater than
5, one may try Fisher’s Exact test.
44
Example-1: Cardio-Vascular Deaths and Obesity among
women in American Samoa
48
Calculations
51
Example 3: Smoking versus Lung Cancer
Outcome
CANCER CONTROL TOTAL
SMOKER 83 72 155
NON-SMOKER 3 14 17
TOTAL 86 86 172
53
Intentionally Kept Blank
54
Exact Test: Independence of Two Attributes
• Example: Data collected on a random sample of
people attending preview of a movie
• Question: did the movie appeal equally to the young and
the old, or was it liked more by the young?
• Test H0: two attributes are independent against Ha:
they are positively associated.
55
Exact Test: Independence of Two Attributes
• To test if two qualitative characters (attributes) A and B
are independent. Let P(A = Ai, B = Bj) = pij, i = 1, …, k, j = 1, …, l.
• Let P(A = Ai) = Σ_{j=1}^{l} pij = pi0; let P(B = Bj) = Σ_{i=1}^{k} pij = p0j
• To test H0: pij = pi0 p0j, for all i, j.
• nij = observed frequency for cell AiBj. The marginal frequencies
of Ai and Bj are ni0 = Σ_{j=1}^{l} nij and n0j = Σ_{i=1}^{k} nij
56
Exact (Conditional) Test: Independence of
Two Attributes
• To test if two qualitative characters (attributes) A and B are
independent. Let P(A = Ai, B = Bj) = pij, i = 1, …, k, j = 1, …, l.
• To test H0: pij = pi0 p0j, for all i, j.
• nij = observed frequency for cell AiBj. The marginal frequencies of Ai and Bj
are ni0 = Σ_{j=1}^{l} nij and n0j = Σ_{i=1}^{k} nij
• Under H0, the conditional distribution of {nij, all i, j} given the current
sample marginals {ni0, n0j, all i, j} has the (multivariate
hypergeometric) pmf
57
Exact (Conditional) Test: Independence of
Two Attributes
• Add up probabilities, under H0, of the given table and of
those indicating more extreme positive association (and
having the same marginals). These tables and
corresponding probabilities are:
59
Homogeneity versus Independence
Hypotheses
• Hypothesis of homogeneity
H0: π1 = π2
Not done in Retrospective Product Binomial Sampling
• Hypothesis of Independence
(At this stage qualitatively expressed)
Done only in Poisson or Multinomial Sampling
60
Homogeneity versus Independence
Hypotheses (contd.)
• The hypothesis of independence is used to
investigate an association between row and column
factors without specifying one of them as a
response. Although the hypotheses may be
expressed in terms of parameters, it is more
convenient to use the qualitative wording:
• H0: The row categorization is independent of the
column categorization
61
Sampling scheme versus Hypotheses
Sampling scheme   Marginal total fixed in advance   Usual Hypothesis:   Usual Hypothesis:
                                                    Independence        Homogeneity
Poisson           None                               YES                 YES
Multinomial       Grand total (sample size)          YES                 YES
Prospective       Row (explanatory) totals                               YES
Retrospective     Column (response) totals                               YES
62
Inference for 2×2 Table
(Sec 2.2, Text)
Measures of Association:
• (i) Relative Risk (or Incidence Rate Ratio or
‘Probability Ratio’)
• (ii) Difference Between Proportions,
• (iii) Odds Ratio
63
Is “Tutoring” Helpful in a Business Stat Course?
Estimated Odds Ratio = (a/b) / (c/d) = ad / (bc)
64
Relative Risk vs Odds Ratio
65
Layout of the 2×2 table
                             Column factor (Response)
                             Level 1    Level 2    Row Total
Row Factor       Level 1      n11        n12       R1 = n1+
(Explanatory)    Level 2      n21        n22       R2 = n2+
Column Marginal Totals        n+1        n+2       Grand Total n
(i) Relative Risk (RR) or Incidence Rate Ratio (IRR)
(Text, p.53)
67
Confidence Intervals for Relative Risk (RR)
(Text, p.54)
• Estimate of RR (= π1/π2): RR̂ = (n11/n1+) / (n21/n2+)
• Estimate of the “asymptotic” variance of loge(RR̂):
Var(loge RR̂) = (1 − π̂1)/n11 + (1 − π̂2)/n21
69
Confidence Interval for π1 − π2
• Estimate of π1 − π2: π̂1 − π̂2 = n11/n1+ − n21/n2+
• Var(π̂1 − π̂2) = π̂1(1 − π̂1)/n1+ + π̂2(1 − π̂2)/n2+
• s.e.(π̂1 − π̂2) = √[π̂1(1 − π̂1)/n1+ + π̂2(1 − π̂2)/n2+]
• 100(1 − α)% CI for π1 − π2:
(π̂1 − π̂2) − Zα/2 √[π̂1(1 − π̂1)/n1+ + π̂2(1 − π̂2)/n2+] to
(π̂1 − π̂2) + Zα/2 √[π̂1(1 − π̂1)/n1+ + π̂2(1 − π̂2)/n2+]
70
Testing H0: π1 − π2 = 0
• Estimate of π1 − π2: π̂1 − π̂2 = n11/n1+ − n21/n2+
• Pooled estimate: π̄ = (n11 + n21)/(n1+ + n2+)
• Var(π̂1 − π̂2) under H0 = π̄(1 − π̄)(1/n1+ + 1/n2+)
• Test statistic Z = (π̂1 − π̂2) / √[π̄(1 − π̄)(1/n1+ + 1/n2+)] is asymptotically
N(0,1) under H0, if n1+, n2+ are ‘large’
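The pooled two-proportion z-test can be sketched as follows; the 2×2 counts used at the bottom are hypothetical, for illustration only:

```python
import math

# Two-proportion z-test with the pooled variance estimate.
def two_prop_ztest(n11, n1p, n21, n2p):
    """Z statistic and two-sided p-value for H0: pi1 = pi2."""
    p1, p2 = n11 / n1p, n21 / n2p
    pooled = (n11 + n21) / (n1p + n2p)          # common proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1p + 1 / n2p))
    z = (p1 - p2) / se
    p = math.erfc(abs(z) / math.sqrt(2))        # two-sided normal p-value
    return z, p

# Hypothetical counts: 40/100 "Yes" in group 1 vs 25/100 "Yes" in group 2.
z, p = two_prop_ztest(40, 100, 25, 100)
print(round(z, 3), round(p, 4))
```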
71
Exact Test of Two Proportions
• Example. Compare two methods of treating an allergy.
Method 1 (A) uses 15 patients and Method 2 (B) uses 14. Is
Method 2 better than Method 1?
• Here n1+ = 15, n2+ = 14, n11 = 6, n21 = 11, and Ha: p1 < p2. Here the
sample sizes are not large, hence asymptotic tests are not
applicable. We need to use exact tests.
72
Exact (Conditional) Test of Two Proportions
(GGD, Fundamentals, Vol 1)
74
Exact Test of Two Proportions
75
Example: Exact Test of Two Proportions
• Example. Compare two methods of treating an allergy.
Method 1 (A) uses 15 patients and Method 2 (B) uses 14. Is
Method 2 better than Method 1?
76
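The exact (conditional) test for the allergy example can be sketched as follows, using only the standard library:

```python
from math import comb

# Exact (conditional) test of two proportions for the allergy example:
# n1+ = 15, n2+ = 14, n11 = 6, n21 = 11, Ha: p1 < p2.
# Conditional on the total number of successes (6 + 11 = 17), n11 follows a
# hypergeometric distribution under H0; the one-sided p-value sums the
# probabilities of tables at least as extreme (n11 <= 6).
def fisher_one_sided(n11, n1p, n21, n2p):
    N = n1p + n2p
    K = n11 + n21                 # total successes, fixed by conditioning
    lo = max(0, K - n2p)          # smallest feasible n11
    return sum(comb(K, x) * comb(N - K, n1p - x)
               for x in range(lo, n11 + 1)) / comb(N, n1p)

p = fisher_one_sided(6, 15, 11, 14)
print(round(p, 4))  # ≈ 0.0407, significant at the 5% level
```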
(iii) Odds, and Odds Ratio
Odds of an outcome: Let π be the population
proportion of “YES” outcomes. Then the
corresponding odds are given by
ω = π/(1 − π)
The sample odds are given by
ω̂ = π̂/(1 − π̂)
77
(iii) Odds, and Odds Ratio (contd)
πi = population proportion of “YES” responses for
Group X = i. Then the odds of “YES” are given
by: ωi = πi/(1 − πi), 0 ≤ ωi < ∞.
The sample odds of “YES” in Group i give the
estimate: ω̂i = π̂i/(1 − π̂i).
Odds Ratio of “YES” response in Group 1 to that in
Group 2:
φ = ω1/ω2 = [π1/(1 − π1)] × [(1 − π2)/π2]
78
Odds versus Probabilities
Given the probability π of a “YES” outcome, the
corresponding odds are given by
ω = π/(1 − π)
Similarly, given the odds ω of a “YES” response, the
corresponding probability is given by
π = ω/(1 + ω)
79
Odds versus Probabilities (contd.)
Interpretation: An event with probability of
occurrence 0.95 has odds of 19
to 1 in favour of its occurrence, while an event with
probability 0.05 has the same odds of 19 to 1 against it.
80
Relation between Probability, Odds & Logit
Probability   Odds    Log(Odds) = Logit
0             0       NC
0.1           0.11    −2.20
0.2           0.25    −1.39
0.3           0.43    −0.85
0.4           0.67    −0.41
0.5           1.00    0.00
0.6           1.50    0.41
0.7           2.33    0.85
0.8           4.00    1.39
0.9           9.00    2.20
1             NC      NC
Odds map probability from [0,1] to [0,∞) asymmetrically,
while the logit maps it to (−∞, ∞) symmetrically.
81
Example: NFL Football
TEAM ODDS against (Prob of Win)
San Francisco 49ers Even (1/2)
Denver Broncos 5 to 2 (2/7)
New York Giants 3 to 1 (1/4)
Cleveland Browns 9 to 2 (2/11)
Los Angeles Rams 5 to 1 (1/6)
Minnesota Vikings 6 to 1 (1/7)
Buffalo Bills 8 to 1 (1/9)
Pittsburgh Steelers 10 to 1 (1/11)
83
The Following are Equivalent
• The proportions π1, π2 are equal.
84
Confidence Intervals for Odds Ratio (OR)
(Text, p.52)
• Estimate of OR: OR̂ = (n11 n22)/(n21 n12)
• Estimate of the “asymptotic” variance of loge(OR̂):
Var(loge OR̂) = 1/n11 + 1/n22 + 1/n21 + 1/n12
• 100(1 − α)% CI for OR:
exp[loge(OR̂) − Zα/2 √Var(loge OR̂)] to exp[loge(OR̂) + Zα/2 √Var(loge OR̂)]
• Alternatively, test
H0: ω1 = ω2, or H0: φ = 1, or H0: log(φ) = 0
86
Odds Ratio (Contd.)
Interpretation:
If the odds ratio φ = ω1/ω2 equals 4, then ω1 = 4ω2.
This means that the odds of a “yes” outcome in the
first group are four times the odds of a “yes” outcome in
the second group.
87
Advantages of Odds Ratio over
Risk Ratio or Difference of Proportions
1. The estimate of the Odds Ratio (OR) remains invariant over
the sampling design (i.e., it works even in the case of
retrospective sampling), and it is given by
OR̂ = (n11 n22)/(n12 n21), since
OR = [P(Y=1|X=1)/P(Y=0|X=1)] / [P(Y=1|X=0)/P(Y=0|X=0)]
   = [P(Y=1,X=1) P(Y=0,X=0)] / [P(Y=0,X=1) P(Y=1,X=0)]
   = [P(X=1|Y=1)/P(X=0|Y=1)] / [P(X=1|Y=0)/P(X=0|Y=0)]
(the marginal factors P(Y=1) and P(Y=0) cancel)
2. Comparison of odds extends nicely to regression
analysis when response (Y) is a categorical variable. 88
Computation of odds ratio in a 2x2 table
Cold No Cold
Placebo 335 76
Calculate the odds ratio by dividing the product of the diagonal elements of the
table by the product of the off-diagonal elements of the table.
The result (1.53) indicates that the odds of getting a cold on the placebo
treatment are estimated to be 1.53 times the odds of getting a cold on the
vitamin C treatment.
89
Example: Computation of odds ratio
Cancer Control
Smoker 83 72
Non-Smoker 3 14
Calculate the odds ratio by dividing the product of the diagonal elements of the
table by the product of the off-diagonal elements: (83 × 14)/(72 × 3) = 5.38.
This indicates that the odds of cancer for a smoker are estimated to be
5.38 times the odds of cancer for a non-smoker.
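The odds ratio and its 95% CI for this table can be computed with the short-cut variance formula (counts 83, 72, 3, 14 from the slide):

```python
import math

# Odds ratio and Wald 95% CI for the smoking vs lung cancer table:
# n11 = 83, n12 = 72, n21 = 3, n22 = 14.
def odds_ratio_ci(n11, n12, n21, n22, z=1.96):
    or_hat = (n11 * n22) / (n12 * n21)
    se_log = math.sqrt(1/n11 + 1/n12 + 1/n21 + 1/n22)  # short-cut formula
    lo = math.exp(math.log(or_hat) - z * se_log)
    hi = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lo, hi

or_hat, lo, hi = odds_ratio_ci(83, 72, 3, 14)
print(round(or_hat, 2), round(lo, 2), round(hi, 2))  # OR ≈ 5.38, wide CI
```

The CI is wide because one cell count (3) is small, which dominates the variance of loge(OR̂).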
90
Sampling Distribution of the
Loge of Estimated Odds Ratio
91
Two Formulae of Standard Errors for the
Loge of Odds Ratio
• The estimated variance is obtained by substituting
sample quantities for unknowns in the variance
formula of the estimator. The sample quantities used
to replace the unknowns depend on the usage.
– For a confidence interval, π1 and π2 are replaced by their
individual sample estimates.
– For the test of hypothesis, they are replaced by their
pooled sample estimate from the combined sample.
92
• Testing: if the odds are equal, then the odds ratio = 1 and
ln(odds ratio) = 0.
– If the sample sizes are large, the resulting P-value for testing
ln(ω1/ω2) = 0 is nearly identical to that obtained with
the Z-test for equal proportions (π1 = π2).
93
Testing Equality of proportions π1 and π2,
i.e., log(OR)=0 :
• To test the equality of the odds of “YES”, ω1 and ω2, in two
groups (H0: ω1/ω2 = 1), one estimates the common
proportion from the combined sample and computes the
standard error based on it.
• Estimated standard error for constructing the test statistic:
s.e.(ln(ω̂1/ω̂2)) = √[ 1/(n1+ π̂c(1 − π̂c)) + 1/(n2+ π̂c(1 − π̂c)) ],
where π̂c = (n11 + n21)/(n1+ + n2+)
• Test statistic = ln(ω̂1/ω̂2) / s.e.(ln(ω̂1/ω̂2)) ~ N(0,1) under H0
Reject H0 if |Test statistic value| > Zα/2
94
Example-2 (contd.): Vitamin-C versus Common Cold
Var(loge OR̂) = 1/n11 + 1/n12 + 1/n21 + 1/n22 (short-cut formula, p.52, text)
4. 95% interval for the odds ratio: exp(0.093) to exp(0.761), i.e., 1.10 to 2.14
Conclusion: The odds of a cold for the placebo group are estimated to be 1.53
times the odds of a cold for the vitamin C group (approximate 95% CI: 1.10 to 2.14)
99
Intentionally Kept Blank
100
Test for Marginal Homogeneity
(McNemar’s Test, Text, p.55-56)
103
Cochran-Mantel-Haenszel Test for no row by
column association in any of the 2×2 Tables
(pp. 94-101)
104
Cochran-Mantel-Haenszel Test (pp. 94-101)
Q_CMH = [Σ_{h=1}^{q} (n11^(h) − m11^(h))]² / Σ_{h=1}^{q} v11^(h),
where m11^(h) = n1+^(h) n+1^(h) / n^(h) and
v11^(h) = [n1+^(h) n2+^(h) n+1^(h) n+2^(h)] / [(n^(h))² (n^(h) − 1)]
Text, p. 100 (here h = 1, 2): Q_CMH = (18 − 16.4 + 32 − 28.8)² / (2.3855 + 3.7236)
= 3.7714; P-value = 0.052 with the χ²(1) distribution
105
Intentionally Kept Blank
106
Cochran-Armitage Trend Test
(See Text, p.60-61)
Binary categorical (row) variable X, ordered (column)
response variable Y.
108
Odds & Odds Ratio:
Extension to Regression Analysis
109
Logistic Regression/ Binary Logit Model
110
Problems with Usual Regression
113
Comparison to Classical Linear Regression
115
Categorical Variable Y as a Function of
Continuous Latent Variable Z
116
Proportional Odds Model for Ordinal Data
• Suppose the observed variable Y = j iff αj−1 ≤ Z ≤ αj, j = 1, …, J,
where the latent variable Z depends on the covariates through a
linear or non-linear model. If linear, then Z = βᵀx + ε, where ε
has CDF F. Then,
• γj = P(Y ≤ j) = P(Z ≤ αj) = F(αj − βᵀx), j = 1, …, J−1
• F⁻¹(γj) = αj − βᵀx
• If F(z) = 1/(1 + e^(−z)), the standard logistic distribution,
then F⁻¹(γj) = ln[γj/(1 − γj)] = αj − βᵀx
• When J = 2, we have the Binary Logistic Regression Model:
F⁻¹(γ1) = ln[γ1/(1 − γ1)] = α1 − βᵀx
117
Proportional Odds Model for Ordinal Data
118
Intentionally Kept Blank
119
Properties of the Logit Model
120
Relation between Probability, Odds & Logit
Probability   Odds    Log(Odds) = Logit
0             0       NC
0.1           0.11    −2.20
0.2           0.25    −1.39
0.3           0.43    −0.85
0.4           0.67    −0.41
0.5           1.00    0.00
0.6           1.50    0.41
0.7           2.33    0.85
0.8           4.00    1.39
0.9           9.00    2.20
1             NC      NC
121
Estimation and Model Fit
• Estimation Method: Maximum Likelihood
• Fit: Cox & Snell R-square and Nagelkerke R-square
(similar to R² in multiple regression).
• Cox & Snell R-square cannot equal 1.0 even if the fit is
perfect; this limitation is overcome by the Nagelkerke
R-square.
• The Hosmer-Lemeshow test is used for assessing the goodness
of fit of a model (a large P-value indicates the overall
model fit is good).
• Compare predicted and actual values of Y to
determine the percentage of correct predictions.
122
1. MLE calculation:
The distribution of scores in a public examination is normal
with unknown mean µ and known standard deviation 5. It is,
however, known that µ equals 52, 60 or 67. If the scores of
three randomly chosen candidates turn out to be 88, 93, 59,
then find the maximum likelihood estimate of µ.
(Ans. MLE of µ is 67; sample mean = 80)
L(μ) = ∏_{i=1}^{n} [1/(5√(2π))] exp(−(Xi − μ)²/(2×5²))
loge L(μ) = constant − Σ_{i=1}^{n}(Xi − μ)² / (2×5²)
          = constant − Σ_{i=1}^{n}[(Xi − X̄) + (X̄ − μ)]² / 50
          = constant − [Σ_{i=1}^{n}(Xi − X̄)² + n(X̄ − μ)² + 2(X̄ − μ)Σ_{i=1}^{n}(Xi − X̄)] / 50
Since Σ(Xi − X̄) = 0, L(μ) is maximized by the candidate value closest to X̄ = 80.
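The candidate search above can be carried out numerically (data and candidate values from the problem statement):

```python
# Candidate-MLE calculation for the exam-scores example: scores 88, 93, 59,
# sigma = 5, and mu restricted to {52, 60, 67}.
data = [88, 93, 59]
candidates = [52, 60, 67]

# Log-likelihood up to an additive constant: -sum((x - mu)^2) / (2 * 5^2)
def loglik(mu):
    return -sum((x - mu) ** 2 for x in data) / 50

mle = max(candidates, key=loglik)
print(mle, sum(data) / len(data))  # MLE = 67, the candidate closest to 80
```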
123
2. MLE calculation:
(a) The distribution of a certain quality characteristic is
continuous uniform over the range [−2θ, θ], where θ is
unknown. If the values of the quality characteristic in a
random sample of size five are 0.63, 0.27, −1.62, 0.71 and
−1.38, then what is the maximum likelihood estimate of θ?
(Ans. 0.81 = −X(1)/2, where X(1) = min of X1, X2, …, Xn)
(b) The life (in hours) of an electrical component is
exponentially distributed with mean θ, where θ is unknown.
Let p be the probability that a system, consisting of two such
components arranged in series, survives for five hours or
more. If five randomly chosen components are seen to
survive for 7, 6, 4, 8 and 5 hours, then what is the maximum
likelihood estimate of p? (Ans. MLE of θ = 6; MLE of p =
exp(−10/θ̂) = 0.1888 ≈ 0.19)
124
3. MLE calculation:
An electrical unit consists of three identical components, and
the unit functions if and only if at least one of the
components functions. The components are known to
function independently, each with an exponential life
distribution. Out of 500 randomly chosen units of this type,
468 were found to be functioning for 10 hours or more. On
the basis of this information, find the maximum likelihood
estimate of the probability that an individual component
functions for 10 hours or more. (Ans. 0.6)
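The answer follows from the invariance property of the MLE, as this sketch shows:

```python
# MLE for problem 3: the unit works iff at least one of 3 independent
# components works, so P(unit works) = 1 - (1 - q)^3, where q = P(a
# component lasts 10+ hours). By invariance of the MLE, plug in the
# observed proportion 468/500 for P(unit works) and solve for q.
p_unit = 468 / 500                 # = 0.936
q = 1 - (1 - p_unit) ** (1 / 3)    # 1 - 0.064^(1/3) = 1 - 0.4 = 0.6
print(round(q, 6))
```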
125
Estimating Model Parameters
• The Linear Regression model uses the OLS method to
minimize the sum of squared errors of prediction
• The Logistic Regression model maximizes the “likelihood” of
observing y1, …, yn, defined by
L = ∏_{i=1}^{n} p_i^{y_i} (1 − p_i)^{1−y_i},
where p_i = exp(β0 + β1x1i + … + βk xki) / [1 + exp(β0 + β1x1i + … + βk xki)].
Thus, L is a function of β0, β1, …, βk
126
Estimation (contd.)
L = ∏_{i=1}^{n} p_i^{y_i} (1 − p_i)^{1−y_i},
where ln[p_i/(1 − p_i)] = β0 + β1X1i + … + βk Xki
127
Model Fit Measures
https://stats.idre.ucla.edu/other/mult-pkg/faq/general/faq-what-are-pseudo-r-
squareds
Cox & Snell R-square: R² = 1 − (L0/L)^(2/n),
where L0 = maximized likelihood of the intercept-only model,
L = maximized likelihood under the specified model
128
Other Measures
129
Interpretation of Coefficients
ln[p/(1 − p)] = a0 + a1x1 + … + ak xk, i.e.,
p = exp(a0 + a1x1 + … + ak xk) / [1 + exp(a0 + a1x1 + … + ak xk)]
130
Interpretation of Coefficients
If Xi is increased by one unit, the log odds will change by
ai units, when the values of the other independent variables are
held constant.
The sign of ai determines whether the probability
increases (if the sign is positive) or decreases (if the
sign is negative) by some amount.
Probability   Odds    Log(Odds) = Logit
0             0       NC
0.1           0.11    −2.20
0.2           0.25    −1.39
0.3           0.43    −0.85
0.4           0.67    −0.41
0.5           1.00    0.00
0.6           1.50    0.41
0.7           2.33    0.85
0.8           4.00    1.39
0.9           9.00    2.20
1             NC      NC
131
Intentionally Kept Blank
132
Prescribe “Buy/Sell/Hold” for a Security
Daily Data on
Price, Volume, MACD, RSI, Put-Call Ratio,
Open Interest, etc & “Buy/Sell/Hold prediction”
(attached in hindsight on the basis of next
trading day’s closing price)
133
Example: Insurance Requirement
136
Data (2-level Response)
137
SPSS Windows: Logit Analysis
1. Select ANALYZE from the SPSS menu bar.
6. Click OK.
138
Recovering Probability (π) from Logit (η)
Let η = logit(π) = ln[π/(1 − π)] = β0 + β1X1 + ⋯ + βpXp
Then π = exp(η)/(1 + exp(η)) = 1/(1 + exp(−η))
139
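The logit and its inverse can be sketched directly (the probability 0.8 below is illustrative):

```python
import math

# Converting between probability and logit.
def logit(p):
    return math.log(p / (1 - p))

def inv_logit(eta):
    return 1 / (1 + math.exp(-eta))

p = 0.8
eta = logit(p)
print(round(eta, 2), inv_logit(eta))  # logit(0.8) ≈ 1.39, and back to 0.8
```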
Output (2-level response)
141
Output (2-level response)
https://stats.idre.ucla.edu/other/mult-pkg/faq/general/faq-
what-are-pseudo-r-squareds
142
Output (2-level response; w/o Income)
144
Output (2-level response; w/o Income)
145
Classification Table
• When there are two categories Y=1 and Y=2, this will
produce a table of counts for classification or
misclassification
• when there are three categories Y=1, Y=2 and Y=3 one
will have a table of counts for classification or
misclassification
146
Classification Table
Proportion of correct classifications for the
subjects in the sample is given by
(n11 + n22 + n33) / (n11 + n12 + n13 + n21 + n22 + n23 + n31 + n32 + n33)
147
Receiver Operating Characteristic (ROC)
Curve
• Sensitivity is the true positive rate (TPR) since it is the
rate of correctly classifying an observed “yes” as “yes”
by the model for a fixed cut-off or threshold value.
• Similarly, (1-specificity) is known as the false positive
rate (FPR) as it represents the rate of incorrectly
classifying an observed “no” as “yes” by the model for
the same cut-off or threshold value.
• Thus, the ROC plot of sensitivity versus (1 − specificity)
can be thought of as a plot of the Power versus the Type I
Error of the decision (classification) rule based on the
chosen cut-off or threshold value using the
model-predicted probabilities.
148
Receiver Operating Characteristic (ROC)
Curve
• Let x = FPR and y = TPR. Then ROC points (x,y) represent relative trade-off
between true positive rate (beneficial) and false positive rate (costly). The
best possible scenario for prediction would be to have the point (0,1) in
the upper left corner of the ROC space, as it represents 100% sensitivity
(no false negatives) and 100% specificity (no false positives). The (0,1)
point is also known as a perfect classification.
• A prediction method based on random guessing flips a coin
with Pr(Heads) = p ∈ [0,1], and classifies a subject into the “yes”
category if “Heads” falls and the “no” category if “Tails” appears.
Such a method would generate points along the diagonal line from
the bottom-left point (0,0) to the top-right corner (1,1). Points above
the diagonal line in the ROC space represent good classification
results (better than random), and points below the line represent
poor results (worse than random).
• Thus, one desires the ROC curve to lie as far above the diagonal as
possible, and a binary logistic regression model producing the largest
area under the ROC curve may be declared the best model.
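The area under the ROC curve can be sketched from model-predicted probabilities using only the standard library; the labels and scores below are hypothetical:

```python
# Sketch of an AUC computation from model-predicted probabilities.
labels = [1, 1, 0, 1, 0, 0, 1, 0]                    # observed "yes"/"no"
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.35, 0.2]  # predicted P(yes)

def auc(labels, scores):
    """AUC via the Mann-Whitney statistic: the proportion of
    (positive, negative) pairs the model ranks correctly."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

print(auc(labels, scores))  # 0.5 = random guessing; 1.0 = perfect ranking
```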
149
Pearson Chi-Square & the Deviance Tests
The Pearson Chi-Square test is mainly used for categorical
explanatory variables. For a sample of n subjects with a binary
response and a number of covariates, using a fitted binary logit
model one can construct an I×J table of observed counts, with
rows consisting of the I distinct patterns of the covariates in the data
and columns representing the two levels of the binary response.
For each of these I patterns, construct the model-based expected
counts (Eij) and compare them with the observed counts (nij), i = 1, …,
I, j = 1, 2, …, J.
150
Pearson Chi-Square Tests
• Then the Pearson Chi-Square test statistic is defined by
PCS = Σ_{i=1}^{I} Σ_{j=1}^{J} (nij − Eij)² / Eij
• If each of the I×J cells has 5 or more observations nij,
then under the null hypothesis of a correct model, PCS
has an asymptotic chi-square distribution with degrees
of freedom (df) = [I(J−1) − m], where m = number of
(independent) model parameters being estimated. If
J = 2, then the PCS has df = (I − m).
151
Deviance test statistic
The Deviance test statistic also compares the observed to the
expected counts, and is defined by
Deviance = 2 Σ_{i=1}^{I} Σ_{j=1}^{J} nij ln(nij/Eij).
It also has the same df = [I(J−1) − m], where m = number of
(independent) model parameters being estimated. The deviance
test is really the likelihood ratio test.
Example: One may test the null hypothesis that the conditional (i.e.,
given xi) distributions of the responses {Yi, i = 1, …, I} are
independent Binomial(1, π(xi)) distributions, where
π(xi) = exp(β0 + β1x1i + ⋯ + βk xki) / [1 + exp(β0 + β1x1i + ⋯ + βk xki)],
against the saturated model, which does not assume the {π(xi), i = 1,
…, n} to be related in any manner. In this case, J = 2 and m = k + 1.
152
Hosmer Lemeshow Test: Binary case
The Pearson Chi-square test is not appropriate when there are
continuous (i.e., non-categorical) explanatory variables, if they are
not grouped. In this case, for each “covariate” pattern X = x, there
will usually be one observation, and hence the asymptotic or
approximate chi-square distribution for the Pearson or the
Deviance test is not valid.
The same problem arises if the number of categorical binary predictors k
is large. For example, if k is 6, one may have as many as I = 2^k = 64
distinct combinations or patterns of covariates.
To solve this problem of large I, Hosmer and Lemeshow (1980)
order the n subjects according to the fitted probabilities π̂(x) =
P̂r(Y = “Yes”|x) and then divide the n subjects into g groups
of comparable sizes, where g is a small number, often chosen to
be 10.
153
Hosmer Lemeshow Test: Binary case
nij = number of observations in the j-th cell of the i-th group; Ei1 = sum of
the fitted probabilities π̂(x) = P̂r(Y = “Yes”|x) of the subjects in the i-th
group, and Ei2 = ni+ − Ei1. The PCS-type test statistic is
χ²_HL = Σ_{i=1}^{g} Σ_{j=1}^{2} (nij − Eij)² / Eij
In simulation studies, it has been seen that χ²_HL has approximately
a chi-square distribution with df = (g − 2).
154
Hosmer Lemeshow Test:
Multinomial Response Case
• Fagerland and Hosmer (2012) generalized the
Hosmer-Lemeshow test to multinomial logistic regression
models. The groups are formed in a similar manner as
in the binary case, and the table of observed and expected
counts for the J categories (namely, 0, 1, …, J−1) takes the form:
155
Exercise
• Fagerland, Hosmer and Bofin (2008) showed that
χ²_FH has an asymptotic chi-square distribution with df =
(g − 2)(J − 1). Various statistical software packages readily
provide the χ²_HL test in the binary case, but do not
usually produce the χ²_FH test in the multinomial
case. One may compute the latter by writing R
program code.
156
AIC, BIC
The Akaike Information Criterion:
AIC = 2m − 2loge(estimated maximum likelihood value under a model),
where m = number of parameters being estimated in the model. The value 2m
is used to penalize a model with a higher number of parameters, to reward a
parsimonious model. Given a number of candidate models for the data, the
preferred model is the one with the minimum AIC value. This enables us to
compare two models, nested or not.
AIC is not consistent, in that its probability of selecting the true model
among the candidates does not tend to one as the sample size increases to
infinity. One popular alternative that corrects this problem is the Bayesian
Information Criterion (BIC), defined by Schwarz (1978):
BIC = [loge(n)]m − 2loge(estimated maximum likelihood value under a model)
Usually, both AIC and BIC agree on the choice of the best model.
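The two criteria can be sketched as follows; the log-likelihood values and parameter counts below are hypothetical, for illustration only:

```python
import math

# AIC and BIC from a model's maximized log-likelihood.
def aic(loglik, m):
    return 2 * m - 2 * loglik

def bic(loglik, m, n):
    return math.log(n) * m - 2 * loglik

# Hypothetical comparison: model A (3 parameters) vs model B (5 parameters),
# both fitted to the same n = 100 observations.
n = 100
print(aic(-120.0, 3), bic(-120.0, 3, n))   # model A
print(aic(-118.5, 5), bic(-118.5, 5, n))   # model B: better fit, more parameters
```

Here both criteria prefer model A: its small loss in fit does not justify two extra parameters, and BIC's log(n) penalty makes that preference stronger.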
157
Example: Insurance Requirement
158
Multinomial Logit Model
log(πj/πm) = β0j + β1j X1 + … + βkj Xk,  j = 1, …, (m − 1)
i.e., πj = exp(β0j + β1j X1 + … + βkj Xk) / [1 + Σ_{j=1}^{m−1} exp(β0j + β1j X1 + … + βkj Xk)],
j = 1, …, (m − 1)
159
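Recovering the category probabilities from the baseline-category logits can be sketched as follows; the linear-predictor values below are hypothetical:

```python
import math

# Multinomial (baseline-category) logit probabilities: given the linear
# predictors eta_j = beta_0j + beta_j^T x for j = 1, ..., m-1 (the m-th
# category is the baseline with eta_m = 0), recover the probabilities.
def baseline_logit_probs(etas):
    denom = 1 + sum(math.exp(e) for e in etas)
    probs = [math.exp(e) / denom for e in etas]
    probs.append(1 / denom)        # baseline category m
    return probs

probs = baseline_logit_probs([0.5, -1.0])   # hypothetical etas, m = 3
print([round(p, 3) for p in probs], sum(probs))  # probabilities sum to 1
```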
Data (3-level Response)
160
SPSS Windows: Logit Analysis
1. Select ANALYZE from the SPSS menu bar.
2. Click REGRESSION and then MULTINOMIAL LOGISTIC.
3. Move “Resp3level” in to the DEPENDENT VARIABLE box.
4. Move “Age,” “Dependent,” and “Income,” in to the
COVARIATES box.)
5. Under STATISTICS: under MODEL select everything except
Monotonicity measures, under PARAMETERS select
Estimates, Likelihood ratio tests & then ENTER
6. Click OK.
161
Output (3-level response)
163
Output (3-level response)
164
Output (3-level response; w/o Income)
167
Output (3-level response; w/o Income)
168
Bankruptcy Example
169
Bankruptcy Example
(Applied Multivariate Statistical Analysis by Johnson & Wichern)
170
Seq#   CF/TD   NI/TD   CA/CL   CA/NS   Bankrupt
17     0.51    0.10    2.49    0.54    1
18     0.08    0.02    2.01    0.53    1
19     0.38    0.11    3.27    0.35    1
20     0.19    0.05    2.25    0.33    1
21     0.32    0.07    4.24    0.63    1
22     0.31    0.05    4.45    0.69    1
23     0.12    0.05    2.52    0.69    1
24     -0.02   0.02    2.05    0.35    1
25     0.22    0.08    2.35    0.40    1
26     0.17    0.07    1.80    0.52    1
27     0.15    0.05    2.17    0.55    1
28     -0.10   -0.01   2.50    0.58    1
29     0.14    -0.03   0.46    0.26    1
30     0.14    0.07    2.61    0.52    1
31     0.15    0.06    2.23    0.56    1
32     0.16    0.05    2.31    0.20    1
33     0.29    0.06    1.84    0.38    1
34     0.54    0.11    2.33    0.48    1
35     -0.33   -0.09   3.01    0.47    1
36     0.48    0.09    1.24    0.18    1
172
SPSS Windows: Logit Analysis
1. Select ANALYZE from the SPSS menu bar.
6. Click OK.
173
Regressors: CFTD, CACL, NITD, CANS
174
Regressors: CFTD, CACL, NITD, CANS
175
Regressors: CFTD, CACL
176
Regressors: CFTD, CACL
177
Intentionally Kept Blank
178
Example: Survival of the Donner Party
179
Survival of the Donner Party (contd.)
…. and again in the crossing of the desert west of the
Great Salt Lake. The group became stranded in the
eastern Sierra Nevada mountains when the region was
hit by heavy snow in October. By the time the last
survivor was rescued on April 21, 1847, 40 of the 87
members had died of starvation and exposure to
extreme cold.
180
Donner Party Data (age above 15yrs)
[sex (i.e., gender) = 1 is male indicator variable]
181
Route of the Donner Party
182
The Logistic Regression Model
• The response variable in this case study is
binary; it takes values 0 and 1. The Donner
Party study involves two explanatory
variables, one categorical and one numerical.
Logistic regression is the appropriate tool in
such a situation.
183
Intentionally Kept Blank
184
Generalized Linear Model
• A generalized linear model is one where some
function of the mean of the response variable is
related to the explanatory variables through a
regression equation.
g(µ) = β0 + β1X1 + … + βpXp
185
Exponential Family of Distributions with
Vector of Parameters
186
Logit Link for Binary Responses
We define
logit(π) = log(π/(1 − π))
Then the logistic regression formula is given by
logit(π) = β0 + β1X1 + … + βpXp
Logistic regression is a kind of nonlinear regression,
since the equation E(Y | X1, X2,…, Xp) = π is nonlinear
in the β’s.
187
Recovering Probabilities from Logits
logit = ln[π/(1 − π)] = β0 + β1X1 + ⋯ + βpXp
π = exp(logit)/(1 + exp(logit)) = 1/(1 + exp(−logit))
188
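The inverse-logit arithmetic can be sketched directly, using the fitted Donner Party model reported on the following slides (logit(π) = 3.23 − 0.078 age − 1.60 sex); the ages chosen below are illustrative:

```python
import math

def inv_logit(eta):
    """Recover pi from the logit: pi = exp(eta)/(1+exp(eta)) = 1/(1+exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

# Fitted Donner Party model (slide values): logit(pi) = 3.23 - 0.078*age - 1.60*sex
def survival_prob(age, sex):
    return inv_logit(3.23 - 0.078 * age - 1.60 * sex)

print(round(survival_prob(25, 0), 3))  # estimated survival prob, 25-year-old female
print(round(survival_prob(25, 1), 3))  # estimated survival prob, 25-year-old male
```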
Example: Donner Party
GLM code in R
> glm_survive = glm(survive ~ age+sex, family = binomial(logit),
data = donner)
> summary(glm_survive)
• Coefficients:
(Intercept)       age       sex
    3.23041   -0.0782   -1.5973
• Degrees of Freedom: 44 ‘Total’(i.e. Null); 42 Residual (Model)
Residual Deviance = 51.26, DF= 42 (=45 - 3), AIC=57.26, P-value (HL test) =
0.19, AUC=0.73, Correct Classification % = 77.8, Nagelkerke R2 = 0.28
189
Example: Donner Party
• The fit of the logistic regression model to the Donner
Party data (with 45 obs with age ≥15), where π
represents the survival probability gives
• logit(πi) = ln(πi/(1 − πi)) = 3.23 – 0.078 agei – 1.60 sexi,
where “sex” is an indicator variable for males.
• Interpretation:
– 1.60 = change in ln(odds), i.e., ln(odds ratio), for a male
compared to a female of the same age
– 0.078 = change in ln(odds), i.e., ln(odds ratio), for a person
one year older, of the same gender
190
Odds from Logistic Regression
ω = Odds that Y = 1, defined as P(Y=1)/P(Y=0)
In Logistic Regression:
ω = exp(β0 + β1X1 + … + βiXi + … + βkXk)
If Xi changes from b to a, with the other regressors held fixed:
ωa/ωb = exp(βi(a − b))
191
Interpretation of Coefficients (Donner Party)
192
Testing Presence of Interaction (Donner data)
Wald’s test for the hypothesis that the coefficient of the
interaction term is zero in the logistic regression of survival (1
or 0) on age, sex (=1 for males), and interac = age × sex:
Donner Party data (n=45)
Variable     Coefficient   Standard Error   Z-Statistic
Constant        7.246          3.205           2.261
age            -0.194          0.087          -2.220
sex            -6.928          3.399          -2.038
age × sex       0.162          0.094           1.714
Residual Deviance = 47.34, DF = 41 (= 45 − 4), AIC = 55.35, P-value (HL test) =
0.65, AUC = 0.762, Correct Classification % = 73.3, Nagelkerke R2 = 0.368
From the normal distribution, two-sided p-value = 2 × Pr(Z > 1.714) = 0.087
Conclusion: There is suggestive but inconclusive evidence of an interaction.
193
Example ( Confidence Interval for a Coefficient)
Confidence intervals for the odds of survival for females divided by
the odds of survival for males, accounting for age, from the model
without interaction
95% Confidence Interval for the coefficient of Sex: -1.597 ± 1.96×0.755 = -3.077 to -0.117
Take anti-logarithms of the endpoints to get an interval for the odds ratio: 0.046 to 0.889
Conclusion: exp(-1.597) = 0.203 is the estimated odds ratio. The odds of survival for a
male are about one fifth of those of a female of the same age.
194
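The interval arithmetic above can be checked directly (Python; the coefficient −1.597 and standard error 0.755 are the slide's values from the no-interaction model):

```python
import math

# Coefficient of sex and its standard error, from the no-interaction Donner model
beta_hat, se = -1.597, 0.755
z = 1.96  # 97.5th percentile of N(0,1)

# 95% CI for the coefficient, then exponentiate the endpoints for the odds ratio
lo, hi = beta_hat - z * se, beta_hat + z * se
print(round(lo, 3), round(hi, 3))                      # -3.077 to -0.117
print(round(math.exp(lo), 3), round(math.exp(hi), 3))  # odds-ratio CI: 0.046 to 0.889
```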
Intentionally Kept Blank
195
Retrospective studies
In the logistic regression model, probabilities of binary
responses are modeled prospectively as functions of
explanatory variables. In studies in which the
probability of a “yes” response (e.g., cancer) is very
small, independent samples are drawn retrospectively.
Prospective probabilities cannot be estimated from
such retrospective sampling. However, since the odds
ratio is invariant to the study design, it can be estimated
from a retrospective study using a logistic regression
model.
In a logistic regression model for a retrospective study, the
estimated intercept is not an estimate of the prospective intercept.
196
Logistic Regression for Retrospective Studies
(Agresti, p. 168)
Let Z = 1 if a subject is sampled, Z = 0 otherwise
Let ρ1 = P(Z=1|Y=1) = Prob of sampling a “case”
Let ρ0 = P(Z=1|Y=0) = Prob of sampling a “control”
The prospective conditional dist. of Y given X = x is not what is
sampled; we observe Y only for sampled subjects (Z = 1). By Bayes’ theorem,

P(Y=1|Z=1,x) = P(Z=1|Y=1,x) P(Y=1|x) / Σj=0,1 P(Z=1|Y=j,x) P(Y=j|x)
             = ρ1 P(Y=1|x) / [ρ1 P(Y=1|x) + ρ0 P(Y=0|x)]

Dividing numerator and denominator by P(Y=0|x) and using
P(Y=1|x)/P(Y=0|x) = e^(α+βx):

P(Y=1|Z=1,x) = ρ1 e^(α+βx) / [ρ1 e^(α+βx) + ρ0]
             = e^(ln(ρ1/ρ0)+α+βx) / [e^(ln(ρ1/ρ0)+α+βx) + 1]

logit(P(Y=1|Z=1,x)) = [ln(ρ1/ρ0) + α] + βx = α* + βx,
so only the intercept changes; the slope β is unchanged.
197
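A quick numerical check of this identity (Python; the prospective model parameters and sampling probabilities below are hypothetical):

```python
import math

def expit(e):
    return 1.0 / (1.0 + math.exp(-e))

# Hypothetical prospective model and case-control sampling probabilities
alpha, beta = -5.0, 0.8   # prospective logit: alpha + beta*x
rho1, rho0 = 0.9, 0.01    # P(sample | case), P(sample | control)

for x in (0.0, 2.5, 5.0):
    p1 = expit(alpha + beta * x)                       # prospective P(Y=1|x)
    retro = rho1 * p1 / (rho1 * p1 + rho0 * (1 - p1))  # P(Y=1|Z=1,x) by Bayes
    shifted = expit(math.log(rho1 / rho0) + alpha + beta * x)  # alpha* + beta*x
    assert abs(retro - shifted) < 1e-12
print("retrospective logit = [ln(rho1/rho0) + alpha] + beta*x; slope unchanged")
```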
Estimation of Logistic Regression coefficients
Maximum likelihood Estimation: Joint probability is given by,
199
Tests and Confidence Intervals for Single
Coefficients
These properties imply that each estimated coefficient β̂j in
logistic regression has, approximately, a normal sampling
distribution, and therefore,
Z-ratio = (β̂j − βj0)/SE(β̂j)
has an approximate standard normal distribution.
The standard error is the estimated standard deviation of the
sampling distribution of the estimator.
A confidence interval is the estimate plus and minus the half-
width, which is the standard error of the estimate times the
appropriate percentile of the N(0,1) distribution.
200
Tests and Confidence Intervals for Single
Coefficients
A test statistic is the ratio of the estimate minus the
hypothesized value to the standard error, and
the p-value is obtained by comparing this to a N(0,1)
distribution. Such a test is referred to as Wald’s test.
201
Example (Wald’s Test)
Wald’s test can be used to determine whether the log odds of
survival are associated with age differently for men than for
women. (n=45?)
Wald’s test for the hypothesis that the coefficient of interaction term is
zero in the logistic regression of survival (1 or 0) on age, fem (=1 for
females), and age x fem: Donner Party data
Variable Coefficient Standard Z-Statistic
Error
Constant 0.318 1.131 0.28
age -0.032 0.035 -0.92
fem 6.927 3.354 2.06
Age x fem -0.162 0.093 -1.73
Deviance = 47.34 Degrees of freedom = 41
From the normal distribution Two-sided p-value = 2 x Pr(Z > 1.73) = 0.085
95% Confidence Interval for the coefficient of Sex : -1.597 1.960.755 = -3.077 to -0.117
Take anti-logarithms of endpoints to get interval for the odds ratio: 0.046 to 0.889
Conclusion: The odds of survival for males are estimated to have been exp(-1.597) =
0.203, i.e., one-fifth of the odds of survival for females of similar age.
204
Example ( Confidence Interval for a Coefficient)
Confidence intervals for the odds of survival for females divided by
the odds of survival for males, accounting for age, from the model
without interaction
Conclusion: The odds of survival for females are estimated to have been 4.9 times the odds
of survival for males of similar age (95% CI: 1.1 times to 21.6 times).
205
Intentionally Kept Blank
206
Continuous Covariate and Invariance to Linear
Transformation, p. 122, Text
Let x′ = a + x/k. [e.g., x = weight in kg, x′ = rescaled weight in lb]
Thus, x′ = 1 ⟺ x = k − ka, and x′ = 0 ⟺ x = −ka

e^β′ = [P(Y=1|x′=1)/P(Y=0|x′=1)] / [P(Y=1|x′=0)/P(Y=0|x′=0)]
     = [P(Y=1|x=k−ka)/P(Y=0|x=k−ka)] / [P(Y=1|x=−ka)/P(Y=0|x=−ka)]
     = e^(α+β(k−ka)) / e^(α+β(−ka)) = e^(βk)
207
Exact Tests for Categorical and Count
Data
208
Exact Test of Two Poisson Means
• Compare parameters λ1 and λ2 of two Poisson dist
• Let X1i, i=1, …, n1 and X2i, i=1, …, n2 be two independent
random samples from the respective distributions.
• Want to test H0: λ1 = λ2 (= λ, unknown)
• Make use of the statistics Y1 = Σi=1..n1 X1i ~ Poi(n1λ1) and
Y2 = Σi=1..n2 X2i ~ Poi(n2λ2). We concentrate on samples for which
Y = Y1 + Y2 is fixed, the same as the observed sum of Y1 and Y2. Then,
under H0, the conditional dist of Y1 given Y = y is Bin(y, n1/(n1+n2))
209
Exact Test of Two Poisson Means
• If Ha: λ1 > λ2, compute the P-value as P(Y1 ≥ y1 | Y = y),
the upper tail probability of Bin(y, n1/(n1+n2))
210
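A sketch of the resulting exact test (Python, standard library only; the event counts and sample sizes below are hypothetical):

```python
import math

def binom_pmf(y, k, p):
    """Binomial pmf: P(K = k) for K ~ Bin(y, p)."""
    return math.comb(y, k) * p**k * (1 - p) ** (y - k)

def exact_poisson_pvalue(y1, y2, n1, n2):
    """One-sided P-value for H0: lam1 = lam2 vs Ha: lam1 > lam2.
    Conditional on Y = y1 + y2 = y, Y1 ~ Bin(y, n1/(n1+n2)) under H0,
    so the P-value is the upper tail P(Y1 >= y1)."""
    y, p = y1 + y2, n1 / (n1 + n2)
    return sum(binom_pmf(y, k, p) for k in range(y1, y + 1))

# Hypothetical counts: 10 events in sample 1 (n1 = 20), 4 events in sample 2 (n2 = 25)
print(round(exact_poisson_pvalue(10, 4, 20, 25), 4))
```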
Intentionally Kept Blank
211
Exact Test of Multiple Poisson Means
• Compare parameters λ1, λ2, …, λk of k Poisson dist
• Let {Xij, j=1, …, ni}, i=1, …, k, be k independent random
samples from the respective distributions.
• Want to test H0: λ1 = λ2 = … = λk (= λ, unknown)
• Make use of the statistics Yi = Σj=1..ni Xij ~ Poi(niλi), i=1,…,k.
We concentrate on samples for which Y = Y1 + … + Yk is the
same as the observed sum y = y1 + … + yk. Then, under H0, the
conditional dist of Y1 given Y = y is Bin(y, n1/(n1+…+nk))
212
Intentionally Kept Blank
213
Exact Test of Two Proportions (Homogeneity)
(GGD, Fundamentals, Vol 1, Ch 15)
216
Example: Exact Test of Two Proportions
• Example. Compare two methods of treatment of an allergy.
Method 1 (A) uses 15 patients and Method 2 (B) uses 14. Is
Method 2 better than Method 1?
217
Intentionally Kept Blank
218
Exact Test of Multiple Proportions
(GGD, Fundamentals, Vol 1, Ch 15)
220
Intentionally Kept Blank
221
Exact Test for Homogeneity of Two
Multinomial Distributions
• Xi = (Xi1,…,Xik) ~ Mult(ni; pi1,…,pik), i = 1, 2
• H0: (p11,…,p1k) = (p21,…,p2k) = (p1,…,pk)
• Let X = X1 + X2, n = n1 + n2
• Under H0, Xi ~ Mult(ni; p1,…,pk), X ~ Mult(n; p1,…,pk)
• Then the conditional dist of X1 given X = x is
P(X1 = x1 | X = x) = [n1! n2! (x11+x21)! ⋯ (x1k+x2k)!]
                     / [(n1+n2)! (x11! ⋯ x1k!)(x21! ⋯ x2k!)]
222
Intentionally Kept Blank
223
Exact Test: Independence of Two Attributes
• To test if two qualitative characters (attributes) A and B
are independent. Let P(A=Ai, B=Bj) = pij, i=1,…,k, j=1,…,l.
• Let P(A=Ai) = Σj=1..l pij = pi0 ; let P(B=Bj) = Σi=1..k pij = p0j
• To test H0: pij = pi0 p0j , for all i, j.
• nij = observed freq for cell AiBj. The marginal frequencies
of Ai and Bj are ni0 = Σj=1..l nij and n0j = Σi=1..k nij
224
Exact Test: Independence of Two Attributes
• To test if two qualitative characters (attributes) A and B
are independent. Let P(A=Ai, B=Bj) = pij, i=1,…,k, j=1,…,l.
• To test H0: 𝑝𝑖𝑗 = 𝑝𝑖0 𝑝0𝑗 , for all i,j.
• nij = observed freq for cell AiBj. The marginal frequencies
of Ai and Bj are ni0 = Σj=1..l nij and n0j = Σi=1..k nij
• Under H0, the conditional distribution of {nij, all i, j} for fixed
marginals {ni0, n0j, all i, j} has the pmf
225
Exact Test: Independence of Two Attributes
• Example: Data collected on a random sample of
people attending preview of a movie
• Did the movie have equal appeal to the young and the old,
or was it more liked by the young?
• Test H0: two attributes are independent against Ha:
they are positively associated.
227
Exact Test: Independence of Two Attributes
• Add up probabilities, under H0, of the given table and of
those indicating more extreme positive association (and
having the same marginals). These tables and
corresponding probabilities are:
229
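For a 2×2 table this procedure is Fisher's exact test: sum the hypergeometric probabilities of the observed table and of the more extreme tables with the same margins. A sketch (Python; the slide's actual counts appear in an image, so the young/old × liked/disliked counts below are hypothetical):

```python
import math

def table_prob(a, b, c, d):
    """Hypergeometric probability of the 2x2 table [[a, b], [c, d]]
    under H0, conditional on fixed row and column margins."""
    return (math.comb(a + b, a) * math.comb(c + d, c)
            / math.comb(a + b + c + d, a + c))

def fisher_one_sided(a, b, c, d):
    """One-sided P-value for positive association: sum the probabilities of
    the observed table and of tables with larger a (margins held fixed)."""
    p = 0.0
    while b >= 0 and c >= 0:
        p += table_prob(a, b, c, d)
        a, b, c, d = a + 1, b - 1, c - 1, d + 1  # next more extreme table
    return p

# Hypothetical counts: rows = young/old, cols = liked/disliked
print(round(fisher_one_sided(8, 2, 3, 7), 4))
```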
Degrees of Freedom
Degrees of Freedom for LRT = −2loge(Λ) is the
difference in dimensionality of Ω and Ω0, when
H0: θ ∈ Ω0 is true. (H1: θ ∈ Ω − Ω0)
Let I = no. of distinct patterns of covariates; then df
1. Independence Test: {IJ-1} – {(I-1)+(J-1)}
2. Homogeneity Test: {I(J-1)} – (J-1)
3. Multinomial Logistic (J=3, say): {I(J-1)} – (J-1),
(No. of model parameters under H0= J-1)
4. Binary Logistic (J=2): I(2-1)-2
230
Intentionally Kept Blank
Beyond Syllabus beyond this
231
Multinomial Goodness-of-Fit
Suppose Y is a random variable with k possible
outcomes, c1,c2,…,ck with probabilities π1, π2,…, πk,
where the outcomes are mutually exclusive and
exhaustive categories.
Let n realizations of Y be observed; we can
summarize the responses through the random vector
X = (X1, X2,…, Xk) where Xi is the frequency of cell i.
Then X has a multinomial distribution with
parameters n and (π1, π2,…, πk ).
Σi=1..k πi = 1  and  Σi=1..k Xi = n.
232
Pearson’s χ2
H0: πi = π0i , i = 1,…, k  ……… (1)
χ2 = Σi=1..k (Xi − nπ0i)2 / (nπ0i)
233
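A direct computation of Pearson's statistic Σ(Xi − nπ0i)²/(nπ0i) (Python; the observed counts below are hypothetical):

```python
def pearson_chi2(obs, pi0):
    """Pearson's chi-square: sum over cells of (X_i - n*pi_0i)^2 / (n*pi_0i)."""
    n = sum(obs)
    return sum((x - n * p) ** 2 / (n * p) for x, p in zip(obs, pi0))

# Hypothetical counts against the equiprobable null with k = 5 cells (n = 50)
obs = [12, 8, 10, 6, 14]
chi2 = pearson_chi2(obs, [0.2] * 5)
print(chi2)  # compare with the chi-square critical value on k - 1 = 4 df
```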
The Power Divergence Family (Optional)
The power divergence statistic is defined as
2Iλ(x | n, π) = [2/(λ(λ+1))] Σi=1..k xi [ (xi/(nπi))^λ − 1 ]

For λ = −1/2 we get HD2 = 4 Σi=1..k ( √Xi − √(nπi) )2
For λ = −1 we get KLD = 2 Σi=1..k nπi log( nπi / Xi )
For λ = −2 we get NCS = Σi=1..k ( Xi − nπi )2 / Xi
235
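A sketch of the family (Python; valid for λ ≠ 0, −1, where the definition is instead taken by continuity), checking that λ = 1 reproduces Pearson's χ²; the counts are hypothetical:

```python
def power_divergence(obs, pi, lam):
    """Power divergence statistic 2/(lam*(lam+1)) * sum x_i[(x_i/(n*pi_i))^lam - 1].
    lam = 1 gives Pearson's chi-square; lam = -2 gives Neyman's statistic.
    (lam = 0 and lam = -1 are defined by continuity and excluded here.)"""
    n = sum(obs)
    return (2.0 / (lam * (lam + 1))) * sum(
        x * ((x / (n * p)) ** lam - 1) for x, p in zip(obs, pi))

obs, pi = [12, 8, 10, 6, 14], [0.2] * 5
n = sum(obs)
pearson = sum((x - n * p) ** 2 / (n * p) for x, p in zip(obs, pi))
print(power_divergence(obs, pi, 1.0), pearson)   # the two agree at lam = 1
print(power_divergence(obs, pi, -0.5))           # Freeman-Tukey / Hellinger form
```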
Asymptotic distribution of power
divergence statistics
Consider the testing problem of the following simple null
hypothesis,
H0: πi = π0i for all i = 1, 2, …, k
for known values π01, π02, …, π0k. For all real λ the
asymptotic distribution of the power divergence statistic
2Iλ(X | n, π0)
is approximately a chi-square with (k – 1) degrees of
freedom.
236
Power Comparison
Consider the equiprobable null
H0: πi = 1/k,  i = 1, 2, …, k
against the alternative
H1: πi = (1 − η/(k−1))/k,  i = 1, 2, …, (k − 1);   πk = (1 + η)/k
237
Some specific alternatives for k = 5
The equiprobable null hypothesis specifies the
probability vector (0.2, 0.2, 0.2, 0.2, 0.2)
Consider the following alternatives:
• At η = 1.5, we have
(0.125, 0.125, 0.125, 0.125, 0.5) [bump alternative]
• At η = 0.5, we have
(0.175, 0.175, 0.175, 0.175, 0.3) [bump alternative]
• At η = – 0.9, we have
(0.245, 0.245, 0.245, 0.245, 0.02) [dip alternative]
238
Exact Power Functions Of the Randomized Test for the Equiprobable Null Model Against the Alternative
α = 0.05, n = 20, k = 5
η
λ 1.5 0.5 -0.9
-5.0 0.2253 0.0742 0.5880
-2.0 0.2253 0.0742 0.5880
-1.0 0.2253 0.0742 0.5880
-0.5 0.3361 0.0800 0.5875
-0.3 0.4468 0.0864 0.5693
0.0 0.6100 0.1073 0.4466
0.3 0.6605 0.1146 0.3760
0.5 0.6806 0.1188 0.3216
0.7 0.6907 0.1211 0.2851
1.0 0.6997 0.1228 0.2720
1.5 0.7168 0.1257 0.2297
2.0 0.7306 0.1278 0.1896
2.5 0.7392 0.1288 0.1760
5.0 0.7498 0.1296 0.1464
239
References
• Agresti, A. (2012). Categorical Data Analysis, Wiley Series in
Probability and Statistics.
• Bishop, Y., Fienberg, S. E. and Holland, P. W. (1975). Discrete
Multivariate Analysis, MIT Press, Cambridge.
• Christensen, R. (1990). Loglinear Models. Springer-Verlag,
New York.
• Ramsey, F. L. and Schafer, D. W. (1997). The Statistical Sleuth.
Duxbury Press, Belmont, California.
• Read, T. R. C. and Cressie, N. (1988). Goodness of fit Statistics
for Discrete Multivariate Data. Springer-Verlag, New York.
• Goon, Gupta, Dasgupta, Fundamentals of Statistics, Volume
One.
240