
effect_size

July 2, 2017

1 Effect Size
Examples and exercises for a tutorial on statistical inference.
Copyright 2016 Allen Downey
License: Creative Commons Attribution 4.0 International
Main question: how can we express the difference between groups?

In [1]: from __future__ import print_function, division

import numpy
import scipy.stats

import matplotlib.pyplot as pyplot

from ipywidgets import interact, interactive, fixed


import ipywidgets as widgets

# seed the random number generator so we all get the same results
numpy.random.seed(17)

# some nice colors from http://colorbrewer2.org/


COLOR1 = '#7fc97f'
COLOR2 = '#beaed4'
COLOR3 = '#fdc086'
COLOR4 = '#ffff99'
COLOR5 = '#386cb0'

%matplotlib inline

1.1 Part One


To explore statistics that quantify effect size, we'll look at the difference in height between men
and women. I used data from the Behavioral Risk Factor Surveillance System (BRFSS) to estimate
the mean and standard deviation of height in cm for adult women and men in the U.S.
I'll use scipy.stats.norm to represent the distributions. The result is an rv object (which
stands for random variable).

In [12]: mu1, sig1 = 178, 7.7  # mean and standard deviation of male height (cm)
         male_height = scipy.stats.norm(mu1, sig1)
         male_height

Out[12]: <scipy.stats._distn_infrastructure.rv_frozen at 0x10c4a8f60>

In [3]: mu2, sig2 = 163, 7.3  # mean and standard deviation of female height (cm)
        female_height = scipy.stats.norm(mu2, sig2)

The following function evaluates the normal (Gaussian) probability density function (PDF)
within 4 standard deviations of the mean. It takes an rv object and returns a pair of NumPy
arrays.

In [4]: def eval_pdf(rv, num=4):
            mean, std = rv.mean(), rv.std()
            xs = numpy.linspace(mean - num*std, mean + num*std, 100)  # x values
            ys = rv.pdf(xs)  # PDF evaluated at each x
            return xs, ys

Here's what the two distributions look like.

In [6]: xs, ys = eval_pdf(male_height)  # male_height is the population distribution
        pyplot.plot(xs, ys, label='male', linewidth=4, color=COLOR2)

        xs, ys = eval_pdf(female_height)
        pyplot.plot(xs, ys, label='female', linewidth=4, color=COLOR3)
        pyplot.xlabel('height (cm)')
        pyplot.legend()
        None

Let's assume for now that those are the true distributions for the population.
I'll use rvs to generate random samples from the population distributions. Note that these are
totally random, totally representative samples, with no measurement error!
In [25]: male_sample = male_height.rvs(1000)  # random sample from the population distribution
In [8]: female_sample = female_height.rvs(1000)
Both samples are NumPy arrays. Now we can compute sample statistics like the mean and
standard deviation.
In [9]: mean1, std1 = male_sample.mean(), male_sample.std()
mean1, std1
Out[9]: (178.16511665818112, 7.8419961712899502)
The sample mean is close to the population mean, but not exact, as expected.
In [10]: mean2, std2 = female_sample.mean(), female_sample.std()
mean2, std2
Out[10]: (163.48610226651135, 7.382384919896662)
And the results are similar for the female sample.
Now, there are many ways to describe the magnitude of the difference between these distributions. An obvious one is the difference in the means:
In [11]: difference_in_means = male_sample.mean() - female_sample.mean()
difference_in_means # in cm
Out[11]: 14.679014391669767
On average, men are 14 to 15 centimeters taller. For some applications, that would be a good
way to describe the difference, but there are a few problems:

* Without knowing more about the distributions (like the standard deviations), it's hard to interpret whether a difference like 15 cm is a lot or not.
* The magnitude of the difference depends on the units of measure, making it hard to compare across different studies.
There are a number of ways to quantify the difference between distributions. A simple option
is to express the difference as a percentage of the mean.
Exercise 1: what is the relative difference in means, expressed as a percentage?
A percentage makes it easier to compare results between different studies, even studies of entirely different populations (say, beetles instead of people).
In [14]: # Solution goes here
         difference_in_means / male_sample.mean()  # relative to the male mean
Out[14]: 0.082389946286916566
In [15]: difference_in_means / female_sample.mean()  # relative to the female mean
Out[15]: 0.089787536605040408
Note that the two denominators give different answers (8.2% vs 9.0%), so you have to say which mean the difference is relative to.
STOP HERE: We'll regroup and discuss before you move on.

1.2 Part Two
An alternative way to express the difference between distributions is to see how much they overlap. To define overlap, we choose a threshold between the two means. The simplest threshold is the midpoint between the means:

In [16]: simple_thresh = (mean1 + mean2) / 2


simple_thresh

Out[16]: 170.82560946234622

A better, but slightly more complicated threshold is the place where the PDFs cross.
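Setting the two standardized distances (z-scores) equal and solving for the threshold $t$ gives the formula used below; for equal standard deviations this is exactly where the PDFs cross, and approximately so otherwise:

$$\frac{t - \mu_2}{\sigma_2} = \frac{\mu_1 - t}{\sigma_1} \quad\Longrightarrow\quad t = \frac{\sigma_1 \mu_2 + \sigma_2 \mu_1}{\sigma_1 + \sigma_2}$$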

In [17]: thresh = (std1 * mean2 + std2 * mean1) / (std1 + std2)


thresh

Out[17]: 170.6040359174722

In this example, there's not much difference between the two thresholds.
Now we can count how many men are below the threshold:

In [18]: male_below_thresh = sum(male_sample < thresh)  # the comparison returns a boolean array; sum counts the Trues


male_below_thresh

Out[18]: 164

And how many women are above it:

In [20]: female_above_thresh = sum(female_sample > thresh)


female_above_thresh

Out[20]: 174

The overlap is the total area under the curves that ends up on the wrong side of the threshold.

In [21]: overlap = (male_below_thresh / len(male_sample) +
                    female_above_thresh / len(female_sample))


overlap

Out[21]: 0.33799999999999997

Or in more practical terms, you might report the fraction of people who would be misclassified
if you tried to use height to guess sex; with equal numbers in each group, that is the average of the two error rates, i.e. overlap / 2:

In [22]: misclassification_rate = overlap / 2


misclassification_rate

Out[22]: 0.16899999999999998

Another way to quantify the difference between distributions is what's called probability of
superiority, which is a problematic term, but in this context it's the probability that a randomly
chosen man is taller than a randomly chosen woman.
Exercise 2: Suppose I choose a man and a woman at random. What is the probability that the
man is taller?

In [37]: # scratch cell: in Python 3, zip returns a lazy iterator, not a list
         x = range(20)
         y = range(20, 40)
         zip(x, y)

Out[37]: <zip at 0x10f135188>

In [45]: # Solution goes here
         test_size = 1000
         storage = []
         for i in range(test_size):
             random_male_height = male_height.rvs(1)
             random_female_height = female_height.rvs(1)
             if random_male_height > random_female_height:
                 storage.append(1)
             else:
                 storage.append(0)
         sum(storage) / test_size

         """or"""

         # zip pairs the samples up; since each sample is already random,
         # each pair is a random (male, female) pair
         prob_sup = sum(x > y for x, y in zip(male_sample, female_sample))
         # each comparison x > y is True/False, which sum counts as 1/0

         prob_sup / len(male_sample)

Out[45]: 0.92000000000000004
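As a cross-check, here is a sketch (not part of the original solution) that computes the same probability in closed form: under the normal model, the difference between a random man's height and a random woman's height is itself normal, with mean mu1 - mu2 and variance sig1**2 + sig2**2.

In [ ]: # P(M > F) = P(M - F > 0), where M - F ~ N(mu1 - mu2, sig1**2 + sig2**2)
        scipy.stats.norm.cdf((mu1 - mu2) / numpy.sqrt(sig1**2 + sig2**2))

This evaluates to about 0.92, consistent with the simulated estimate.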

Overlap (or misclassification rate) and probability of superiority have two good properties:

* As probabilities, they don't depend on units of measure, so they are comparable between studies.
* They are expressed in operational terms, so a reader has a sense of what practical effect the difference makes.

1.2.1 Cohen's d
There is one other common way to express the difference between distributions. Cohen's d is the
difference in means, standardized by dividing by the standard deviation. Here's a function that
computes it:

In [46]: def CohenEffectSize(group1, group2):
             """Compute Cohen's d.

             group1: Series or NumPy array
             group2: Series or NumPy array

             returns: float
             """
             diff = group1.mean() - group2.mean()

             n1, n2 = len(group1), len(group2)
             var1 = group1.var()
             var2 = group2.var()

             pooled_var = (n1 * var1 + n2 * var2) / (n1 + n2)
             d = diff / numpy.sqrt(pooled_var)
             return d

Computing the denominator is a little complicated; in fact, people have proposed several ways
to do it. This implementation uses the pooled standard deviation, the square root of the
weighted average of the two groups' variances.
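In symbols, with sample sizes $n_1, n_2$, sample variances $s_1^2, s_2^2$, and sample means $\bar{x}_1, \bar{x}_2$:

$$s_{\mathrm{pooled}} = \sqrt{\frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2}}, \qquad d = \frac{\bar{x}_1 - \bar{x}_2}{s_{\mathrm{pooled}}}$$

(Some definitions divide by $n_1 + n_2 - 2$ instead; the version above matches the code.)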
And here's the result for the difference in height between men and women.

In [47]: CohenEffectSize(male_sample, female_sample)

Out[47]: 1.9698908086614964

Most people don't have a good sense of how big d = 1.9 is, so let's make a visualization to get
calibrated.
Here's a function that encapsulates the code we already saw for computing overlap and probability of superiority.

In [48]: def overlap_superiority(control, treatment, n=1000):
             """Estimates overlap and superiority based on a sample.

             control: scipy.stats rv object
             treatment: scipy.stats rv object
             n: sample size

             returns: (overlap, superiority) as floats
             """
             control_sample = control.rvs(n)
             treatment_sample = treatment.rvs(n)
             thresh = (control.mean() + treatment.mean()) / 2

             control_above = sum(control_sample > thresh)
             treatment_below = sum(treatment_sample < thresh)
             overlap = (control_above + treatment_below) / n

             superiority = sum(x > y for x, y in
                               zip(treatment_sample, control_sample)) / n

             return overlap, superiority

Here's the function that takes Cohen's d, plots normal distributions with the given effect size,
and prints their overlap and superiority.

In [49]: def plot_pdfs(cohen_d=2):
             """Plot PDFs for distributions that differ by some number of stds.

             cohen_d: number of standard deviations between the means
             """
             control = scipy.stats.norm(0, 1)
             treatment = scipy.stats.norm(cohen_d, 1)
             xs, ys = eval_pdf(control)
             pyplot.fill_between(xs, ys, label='control', color=COLOR3, alpha=0.7)

             xs, ys = eval_pdf(treatment)
             pyplot.fill_between(xs, ys, label='treatment', color=COLOR2, alpha=0.7)

             o, s = overlap_superiority(control, treatment)
             print('overlap', o)
             print('superiority', s)

Here's an example that demonstrates the function:

In [50]: plot_pdfs(2)

overlap 0.311
superiority 0.93

And an interactive widget you can use to visualize what different values of d mean:

In [52]: slider = widgets.FloatSlider(min=0, max=4, value=2)


interact(plot_pdfs, cohen_d=slider)
None

overlap 0.041
superiority 1.0

Cohen's d has a few nice properties:

* Because mean and standard deviation have the same units, their ratio is dimensionless, so we can compare d across different studies.
* In fields that commonly use d, people are calibrated to know what values should be considered big, surprising, or important.
* Given d (and the assumption that the distributions are normal), you can compute overlap, superiority, and related statistics, as the sketch below shows.
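To make the last point concrete, here is a minimal sketch (not from the original notebook) that computes overlap and probability of superiority in closed form for unit-variance normals whose means differ by d:

In [ ]: def overlap_superiority_analytic(d):
            """Closed-form counterparts of overlap_superiority for
            N(0, 1) versus N(d, 1), using the midpoint threshold d/2."""
            # each curve puts Phi(-d/2) of its mass on the wrong side of d/2
            overlap = 2 * scipy.stats.norm.cdf(-d / 2)
            # treatment - control ~ N(d, 2), so P(treatment > control) = Phi(d / sqrt(2))
            superiority = scipy.stats.norm.cdf(d / numpy.sqrt(2))
            return overlap, superiority

        overlap_superiority_analytic(2)  # roughly (0.317, 0.921), close to the simulated values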

In summary, the best way to report effect size often depends on the audience and your goals.
There is often a tradeoff between summary statistics that have good technical properties and
statistics that are meaningful to a general audience.

1.2.2 To summarize
We need dimensionless statistics, such as percentages, to compare results between different groups.

* Method 1: overlap. Compute a threshold using both means, then calculate the misclassification rate.
* Method 2: probability of superiority. Run a random test to estimate the probability that a draw from one group exceeds a draw from the other (prob of A > B).
* Method 3: Cohen's d. Compare the difference of means, standardized by the pooled variance.
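As a final recap, here is a small sketch (not part of the original notebook) that collects the three summaries for the height samples, reusing variables defined earlier:

In [ ]: # Method 1: overlap / misclassification rate
        overlap = (sum(male_sample < thresh) / len(male_sample) +
                   sum(female_sample > thresh) / len(female_sample))
        # Method 2: probability of superiority
        prob_sup = sum(x > y for x, y in
                       zip(male_sample, female_sample)) / len(male_sample)
        # Method 3: Cohen's d
        d = CohenEffectSize(male_sample, female_sample)
        print(overlap / 2, prob_sup, d)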

In [ ]:
