
Choosing the Sample:
Power Calculations and Sampling for Impact Evaluation
Stanislao Maldonado
University of California, Berkeley
Applied Impact Evaluation
Spring 2014
Unless data collection is cheap, a sample is required to identify the causal effect of interest
Two key questions:
How large does the sample need to be to credibly detect a given effect size?
How will the sample be chosen?
1. Motivation
At the end of an experiment, we will compare the outcome of interest in the treatment and the comparison groups.
But we (generally) do not observe the entire population, just a sample
Randomization (when possible) removes bias, but it does not remove noise: it works because of the law of large numbers. How large must "large" be?
Sampling matters for external validity
Consider the true impact of a randomized program in a population of 3 million
The true impact is the following:
. regress lny_random edu_random
Source | SS df MS Number of obs = 3000000
-------------+------------------------------ F( 1,2999998) = 7645.62
Model | 7639.90493 1 7639.90493 Prob > F = 0.0000
Residual | 2997756.07 2999998 .999252691 R-squared = 0.0025
-------------+------------------------------ Adj R-squared = 0.0025
Total | 3005395.98 2999999 1.00179899 Root MSE = .99963
------------------------------------------------------------------------------
lny_random | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
edu_random | .1009284 .0011543 87.44 0.000 .0986661 .1031907
_cons | 4.999916 .000816 6127.05 0.000 4.998317 5.001516
------------------------------------------------------------------------------
Example: Impact of MBAs on earnings
Due to budget constraints, we can only collect a sample of 3,000 observations. The estimated impact is the following:
. regress lny_random edu_random
Source | SS df MS Number of obs = 3000
-------------+------------------------------ F( 1, 2998) = 8.90
Model | 8.56254841 1 8.56254841 Prob > F = 0.0029
Residual | 2883.82628 2998 .961916703 R-squared = 0.0030
-------------+------------------------------ Adj R-squared = 0.0026
Total | 2892.38882 2999 .964451092 Root MSE = .98077
------------------------------------------------------------------------------
lny_random | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
edu_random | .1068677 .035819 2.98 0.003 .0366354 .1771001
_cons | 5.002025 .0255632 195.67 0.000 4.951902 5.052149
------------------------------------------------------------------------------
Imagine now that we can only collect a sample of 300 observations. The estimated impact is the following:
. regress lny_random edu_random
Source | SS df MS Number of obs = 300
-------------+------------------------------ F( 1, 298) = 0.25
Model | .253603988 1 .253603988 Prob > F = 0.6152
Residual | 298.464191 298 1.00155769 R-squared = 0.0008
-------------+------------------------------ Adj R-squared = -0.0025
Total | 298.717795 299 .99905617 Root MSE = 1.0008
------------------------------------------------------------------------------
lny_random | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
edu_random | .0582793 .1158176 0.50 0.615 -.1696447 .2862034
_cons | 4.997034 .0845812 59.08 0.000 4.830582 5.163486
------------------------------------------------------------------------------
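The three regressions differ only in sample size, and their standard errors behave as the usual formula SE = sqrt(σ² / (P(1-P)N)) predicts. The Python sketch below is a rough check, not part of the original slides. It assumes an outcome standard deviation of about 1 (the Root MSE reported above) and that half of the sample is treated (P = 0.5 is an assumption; the output does not state it). The function name is illustrative.

```python
import math

def se_treatment_effect(n, sigma=1.0, p=0.5):
    """Approximate standard error of a difference in means with n total units.

    sigma and p are assumptions: sigma ~ Root MSE, p = share treated.
    """
    return math.sqrt(sigma ** 2 / (p * (1 - p) * n))

for n in (3_000_000, 3_000, 300):
    print(f"N = {n:>9,}: SE = {se_treatment_effect(n):.4f}")
```

Under these assumptions the values land close to the reported standard errors (.0011543, .035819 and .1158176): cutting the sample by a factor of 100 multiplies the standard error by 10.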
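[Figure: 1. Population (eligible and ineligible units); 2. Evaluation sample drawn from the population (external validity); 3. Randomize treatment within the evaluation sample into treatment and comparison groups (internal validity).]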
Determining how large the sample must be to be able to detect an effect (in case this effect exists) is known as Power Calculation
Power calculation indicates the minimum sample size needed to conduct an impact evaluation and to answer the policy question of interest
Tradeoff between sample size and cost
2. Review of Hypothesis Testing
Most impact evaluations test a simple hypothesis: Does the program have an impact?
Two steps:
1. Estimate the average outcomes for the treatment and control groups
2. Assess whether a difference exists between the average outcomes of the two groups
An impact evaluation tests the null hypothesis of no impact (H0) against the alternative of impact (HA):
$$H_0: \bar{Y}_T - \bar{Y}_C = 0$$
$$H_A: \bar{Y}_T - \bar{Y}_C \neq 0$$
Significance tests are used to make a probability argument (the probability of observing a given outcome by chance is very small)
A formula for a two-sample t test for comparing means:
$$t = \frac{\bar{Y}_T - \bar{Y}_C}{\sqrt{\frac{s_T^2}{n_T} + \frac{s_C^2}{n_C}}} = \frac{\text{Mean Difference}}{\text{Variability of Mean Difference}}$$
We use this test statistic to compute the probability of chance, or sampling variability
[Figure: Statistical Regularity for H0 (Normal 3,9) and HA (Normal 0,9). Y axis: mean (-12 to 12); x axis: sample size (log scale).]
Applying the usual standards, we use the middle 95% (90% and 99% are also common) of the t statistic values to represent values consistent with no study effect (null hypothesis):
The outer 5% of these t statistics are considered consistent with a study effect
Using theoretical values, this can be represented in this way:
$$H_0: \mu_T = \mu_C$$
$$H_A: \mu_T \neq \mu_C$$
[Figure: t distribution for n = 50 in each group. The middle 95% lies between t = -1.98 and t = 1.98, with 2.5% in each tail.]
3. Power Analysis
When testing whether a program has an impact, two types of error can be made:
Type I error (α): rejecting the null hypothesis even though it is true (false positive)
Type II error (β): failing to reject the null hypothesis (concluding there is no difference) when in fact the null hypothesis is false.
Researchers can limit the size of the type I error by choosing a more conservative significance level (α = 1%) or, equivalently, confidence level (1 - α = 99%)
Significance level (α): the probability that we will reject the null hypothesis even though it is true
Type II errors are also relevant for policy makers, with sample size being the most critical component
If no impact is found for an intervention, whether this result was estimated with a large or a small sample matters for interpreting it
Power (1 - β): if there is a measurable effect of our intervention (the null hypothesis is false), the probability that we will detect an effect (reject the null hypothesis)
Statistical power is the ability to reject the hypothesis that the program doesn't work when it really does
Ho = program does not increase test scores
There are two types of error that we want to avoid

Estimate (rows) vs. true state of the world (columns):
Estimate                              | Program does not change test scores | Program increases test scores
Program does not change test scores   | No ability to reject Ho             | Type II error (false negative)
Program increases test scores         | Type I error (false positive)       | Correct rejection of Ho

The "Program increases test scores" column represents the program working.
The "Program increases test scores" row represents the evaluation finding that the program is working.
We believe the program works, and we hope the evaluation will confirm that it works. If the program works, we want to maximize the chance that the evaluation says it works.
Type I error (false positive): the evaluation says the program works when it doesn't.
Type II error (false negative): the evaluation says the program doesn't work when it really does.
Statistical power is the chance that we reject the null hypothesis when it is false: the probability that you reject no impact when there really is impact.
Power: Graphical treatment
[Figure: distributions of the t statistic under H0 (centered at t = 0) and HA. Panel 1: no effect vs. treatment. Panel 2: no effect vs. treatment, adding the significance level (t = -1.98 and t = 1.98). Panel 3: no effect vs. treatment, adding power.]
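Power: Formal treatment
Let's consider a simple program with one possible individual-level treatment. The ATE can be computed by a simple OLS regression:
$$Y_i = \alpha + \beta T_i + \varepsilon_i$$
Assume that each individual was randomly sampled from an identical population (observations are i.i.d.). The variance of the treatment effect is given by:
$$Var(\hat{\beta}) = \frac{1}{P(1-P)} \frac{\sigma^2}{N}$$
where P is the share of the sample assigned to treatment and N is the total sample size.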
To achieve power κ:
$$\beta > (t_{1-\kappa} + t_\alpha) \, SE(\hat{\beta})$$
Example:
$$t_{1-\kappa} = 0.84 \text{ for } \kappa = 80\%$$
Using the graph, we can identify different approaches to power analysis:
Minimum Detectable Effect
Power Calculation
Sample Size Determination
A. Minimum Detectable Effect (MDE)
We define the Minimum Detectable Effect (MDE):
$$MDE(\kappa, \alpha, N, P) = (t_{1-\kappa} + t_\alpha) \sqrt{\frac{1}{P(1-P)} \frac{\sigma^2}{N}}$$
This formula already suggests many interesting features of power analysis:
If we increase the sample size, what happens to the MDE?
If we increase power, what happens to the MDE?
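The two questions about the MDE can be explored numerically. The sketch below evaluates MDE = (t_{1-κ} + t_α) * sqrt(σ² / (P(1-P)N)) in Python. It is an illustration under stated assumptions rather than the course's own code: it uses normal quantiles as large-sample stand-ins for the t values, a two-sided test, and default values (P = 0.5, σ = 1) chosen only for the example.

```python
from statistics import NormalDist

def mde(n, p=0.5, sigma=1.0, alpha=0.05, power=0.80):
    """Minimum detectable effect for total sample size n (large-sample approximation)."""
    z = NormalDist()
    t_alpha = z.inv_cdf(1 - alpha / 2)   # about 1.96 for a two-sided 5% test
    t_power = z.inv_cdf(power)           # about 0.84 for 80% power
    return (t_alpha + t_power) * (sigma ** 2 / (p * (1 - p) * n)) ** 0.5

# MDE shrinks with the square root of N and grows with the power requested.
for n in (300, 3_000, 30_000):
    print(n, round(mde(n), 3), round(mde(n, power=0.90), 3))
```

Quadrupling the sample size halves the MDE, while asking for more power raises it.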
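B. Sample Size Estimation
We define N (sample size) in the following way, where β_E is the effect size to be detected:
$$N(\kappa, \alpha, P, \beta_E) = \left[ \frac{t_{1-\kappa} + t_\alpha}{\beta_E} \right]^2 \frac{\sigma^2}{P(1-P)}$$
Again, you may analyze what happens to the sample size if we change the level of power and the effect size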
C. Power
We define the power level in the following way:
$$t_{1-\kappa}(\alpha, \beta_E, N, P) = \frac{\beta_E}{\sqrt{\frac{1}{P(1-P)} \frac{\sigma^2}{N}}} - t_\alpha$$
Again, you may analyze what happens to power if we change the sample size and the effect size
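Sample size and power are the same relationship solved for different unknowns, which a short Python sketch can make concrete. It assumes normal quantiles in place of t values, a two-sided test whose far tail is ignored when computing power, and illustrative defaults (P = 0.5, σ = 1); the function names are not from the slides.

```python
import math
from statistics import NormalDist

Z = NormalDist()

def sample_size(effect, p=0.5, sigma=1.0, alpha=0.05, power=0.80):
    """Total N needed to detect `effect` with the requested power."""
    t_sum = Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)
    return math.ceil((t_sum / effect) ** 2 * sigma ** 2 / (p * (1 - p)))

def achieved_power(effect, n, p=0.5, sigma=1.0, alpha=0.05):
    """Approximate power for a true effect `effect` with total sample size n."""
    se = math.sqrt(sigma ** 2 / (p * (1 - p) * n))
    return Z.cdf(effect / se - Z.inv_cdf(1 - alpha / 2))

n = sample_size(0.2)                      # effect of 0.2 standard deviations
print(n, round(achieved_power(0.2, n), 3))
print(round(achieved_power(0.2, 300), 3))  # same effect, smaller sample
```

Plugging the computed N back into the power function returns roughly the 80% that was requested, and a smaller sample gives lower power.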
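Standardized Effect Size
Recall the definition of the Minimum Detectable Effect (MDE):
$$MDE(\kappa, \alpha, N, P) = (t_{1-\kappa} + t_\alpha) \sqrt{\frac{1}{P(1-P)} \frac{\sigma^2}{N}}$$
A standardized version can be formulated if you normalize it in terms of standard deviations:
$$\beta = \bar{Y}_T - \bar{Y}_C, \qquad \delta = \frac{\beta}{\sigma} = \frac{\bar{Y}_T - \bar{Y}_C}{\sigma}$$
Then:
$$SMDE(\kappa, \alpha, N, P) = \frac{MDE(\kappa, \alpha, N, P)}{\sigma}$$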
Sample size calculations step by step
Carrying out sample size calculations requires answering the following questions (Gertler et al 2011):
1. What is (are) the outcome indicator(s)?
2. What is the minimum level of impact that would justify the intervention?
3. What are the baseline mean and variance of the outcome indicators?
4. What is a reasonable level of power for the evaluation being conducted?
5. Does the program create clusters?
Step 1: Counterfactual mean and variance
We need to have an estimate of the outcome of interest before the intervention
How to get a credible estimate?
1. Similar indicators from other datasets
Previous household surveys, administrative data and census data
2. Same indicators from similar regions/countries
3. Some other good data sources:
Demographic Health Surveys (DHS)
World Bank LSMS
National statistical institutes
How much will estimates of the outcome indicator vary?
Estimate the standard deviation (SD) as best you can using other sources
Why does variation matter?
More variation, more error in estimates
Harder to measure a statistically significant effect if one exists (i.e. lower power)
Larger sample size needed
[Figures: Impact of Variance. Distributions under ES = 0 and ES = 2.5 with N large, shown for SD = 1, SD = 2 and SD = 0.7.]
Step 2: Minimum Effect Size
What is the smallest effect that would justify adopting the program?
Fundamentally a policy question: policy relevant vs. statistically significant
The smaller the treatment effect we want to detect:
The more power we need
The bigger the sample size we need
Sometimes it is expressed in terms of standard deviations:
$$\beta = \bar{Y}_T - \bar{Y}_C, \qquad \delta = \frac{\beta}{\sigma} = \frac{\bar{Y}_T - \bar{Y}_C}{\sigma}$$
Considerations in Determining Minimum Effect Size
What impact would matter for policy decisions?
Current mean of the outcome indicator
Cost of the program vs. the benefits it brings
Cost of the program vs. alternative uses of the money
What is realistic?
Effect sizes of similar programs done in other places
Effect sizes of other programs with similar goals
[Figures: Impact of Effect Size. Distributions with N = 100 for ES = 0 vs. ES = 2.5, ES = 0 vs. ES = 4, and ES = 0 vs. ES = 1.]
Step 3: Significance Level and Power
Determine the significance level
The level we require for statistical significance (usually 5%, with a t value of 1.96 for large N)
Determine the power we want
The likelihood of finding a significant impact if the program really does have one (usually 80%, with a t value of 0.84 for large N)
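The two constants quoted in this step (1.96 and 0.84) are large-sample quantiles of the standard normal distribution. A small Python check, assuming a two-sided test for the significance level, shows where they come from and what other common choices imply:

```python
from statistics import NormalDist

z = NormalDist()

# Two-sided critical values for common significance levels.
for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: critical value = {z.inv_cdf(1 - alpha / 2):.2f}")

# Values added to the critical value for common power levels.
for power in (0.80, 0.90):
    print(f"power = {power:.2f}: t value = {z.inv_cdf(power):.2f}")
```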
Step 4: Accounting for clusters
Sometimes the level of intervention is different from the level where you measure outcomes
Examples:
Intervention in a school, outcomes on students
Intervention in a village, outcomes on villagers
Intervention in a hospital, outcomes on patients
Outcomes for all the individuals within a treatment unit may be correlated
All villagers exposed to the same weather
All patients have the same doctor
All students have the same teacher
All teachers managed by the same principal
Why does it matter?
More correlation between outcomes within clusters
Less information from units within a cluster
Lower power for a given sample size
Sample size estimates need to be adjusted for clustering
Reasons for Cluster Randomization
If it reduces power, why randomize by cluster?
Need to minimize or remove contamination
Example: Information campaigns within a village, because villagers talk to each other
Political feasibility
Example: Negative political ramifications from offering social benefits to some poor families in a village but not others
Only natural choice
Example: Education intervention that affects an entire classroom (e.g. teacher training)
Implications of Clustering
It is extremely important to randomize an adequate number of clusters
Often, the number of clusters matters far more than the number of individuals within a cluster
Think of the law of large numbers as applying to clusters rather than to units within clusters
Estimating Intracluster Correlation (ICC)
ICC is the degree of correlation between outcomes within a cluster
Need to estimate:
1. Using baseline data
2. Using similar measures from the same setting
3. Using the same measures in similar countries/regions
Formally, we define the two sources of variability:
Individual-level variance:
$$Var(r_{ij}) = \sigma^2$$
Cluster-level variance:
$$Var(u_{0j}) = \tau^2$$
The percentage of observed variation in the outcome due to cluster-level characteristics is defined as:
$$\rho = \frac{\tau^2}{\tau^2 + \sigma^2}$$
[Figure: Low intracluster correlation (rho)]
[Figure: High intracluster correlation (rho)]
Cluster Randomized Designs: Formal Treatment
Programs randomized at the group level are common (OPORTUNIDADES, Familias en Accion, etc)
The error term may not be independent across individuals in a given group (common shocks to all individuals in a treated area)
Formally:
$$Y_{ij} = \alpha + \beta T + \upsilon_j + \varepsilon_{ij}$$
Standard error, with J clusters of n units each:
$$SE(\hat{\beta}) = \sqrt{\frac{1}{P(1-P)}} \sqrt{\frac{n\tau^2 + \sigma^2}{nJ}}$$
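If randomization had been conducted at the level of the individual:
$$SE(\hat{\beta}) = \sqrt{\frac{1}{P(1-P)}} \sqrt{\frac{\tau^2 + \sigma^2}{nJ}}$$
This implies that the ratio between the two SEs (known as the Design Effect) is:
$$D = \frac{SE_{cluster}(\hat{\beta})}{SE_{individual}(\hat{\beta})} = \frac{\sqrt{\frac{1}{P(1-P)}} \sqrt{\frac{n\tau^2 + \sigma^2}{nJ}}}{\sqrt{\frac{1}{P(1-P)}} \sqrt{\frac{\tau^2 + \sigma^2}{nJ}}} = \sqrt{1 + (n-1)\rho}$$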
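A few lines of Python show how quickly the design effect D = sqrt(1 + (n - 1)ρ) grows with cluster size n and intracluster correlation ρ. The ρ and n values in the loop are illustrative choices, not figures from the slides.

```python
import math

def design_effect(n, rho):
    """Ratio of clustered to individual-level standard errors: sqrt(1 + (n - 1) * rho)."""
    return math.sqrt(1 + (n - 1) * rho)

# Rows: intracluster correlation; columns: cluster sizes 1, 10 and 50.
for rho in (0.0, 0.05, 0.17, 0.50):
    print(rho, [round(design_effect(n, rho), 2) for n in (1, 10, 50)])
```

With ρ = 0 or n = 1 the ratio is exactly 1, so clustering costs nothing; otherwise the standard error is inflated by D and the variance by D².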

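Notice that D is increasing in the ICC and in n, leading to an increase in variance with respect to individual randomization
Key result: the MDE for a cluster randomized design can be obtained by multiplying the MDE computed under an individually randomized design by the design effect (equivalently, the required sample size is multiplied by D^2 = 1 + (n-1)ρ):
$$MDE_{cluster} = MDE_{individual} \cdot D = (t_{1-\kappa} + t_\alpha) \sqrt{\frac{1}{P(1-P)} \frac{\tau^2 + \sigma^2}{nJ}} \sqrt{1 + (n-1)\rho}$$
Where:
$$\rho = \frac{\tau^2}{\tau^2 + \sigma^2}$$
Bloom (2005) shows that the MDE with J groups of size n each is given by (with σ here denoting the total standard deviation of the outcome):
$$MDE = \frac{M_{J-2}}{\sqrt{P(1-P)J}} \sqrt{\rho + \frac{1-\rho}{n}} \, \sigma$$
$$M_{J-2} = t_{\alpha/2} + t_{1-\kappa}$$
Three implications:
The MDE varies in proportion to 1/sqrt(J)
n affects precision much less, especially for large ICC
J and n depend critically on the ICC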
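The implications can be checked by evaluating Bloom's expression directly. The Python sketch below is an approximation under stated assumptions: it replaces the multiplier M_{J-2} (t values with J - 2 degrees of freedom) with normal quantiles, which is reasonable only when J is large, and the J, n and ρ values are illustrative.

```python
from statistics import NormalDist

def clustered_mde(j, n, rho, p=0.5, sigma=1.0, alpha=0.05, power=0.80):
    """MDE with j clusters of n units each (normal stand-in for M_{J-2})."""
    z = NormalDist()
    m = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return m / (p * (1 - p) * j) ** 0.5 * (rho + (1 - rho) / n) ** 0.5 * sigma

print(round(clustered_mde(j=100, n=10, rho=0.17), 3))  # baseline
print(round(clustered_mde(j=200, n=10, rho=0.17), 3))  # twice as many clusters
print(round(clustered_mde(j=100, n=20, rho=0.17), 3))  # twice as many units per cluster
```

Both alternatives double the total sample, but adding clusters lowers the MDE by more than adding units within clusters.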
ICC in Educational Programs
Duflo et al (2008)
Source: Hedges et al (2007)
Example: School feeding program in Peru
The Peruvian government has launched a new school feeding program (called Qali Warma) in order to improve the learning outcomes of preschool and primary-level students
1. Outcome: test scores
2. Mean and variance of baseline indicators: Mean = 541.6 and SD = 106.2
3. Effect size: 0.2 standard deviations in test scores (mean for treatment = 562.8)
4. Level of power: 80%
5. Cluster level: districts, ICC = 0.17
[Figure: Math Test Score Distribution in 2011. Kernel density (kernel = epanechnikov, bandwidth = 7.2857); x axis: math test score (0 to 1000); y axis: density.]
Case I: Individual Randomization
. power twomeans 541.6 562.8, sd(106.2) alpha(0.05) power(0.8)
Performing iteration ...
Estimated sample sizes for a two-sample means test
t test assuming sd1 = sd2 = sd
Ho: m2 = m1 versus Ha: m2 != m1
Study parameters:
alpha = 0.0500
power = 0.8000
delta = 21.2000
m1 = 541.6000
m2 = 562.8000
sd = 106.2000
Estimated sample sizes:
N = 790
N per group = 395
. power twomeans 0 0.2, sd(1) alpha(0.05) power(0.8)
Performing iteration ...
Estimated sample sizes for a two-sample means test
t test assuming sd1 = sd2 = sd
Ho: m2 = m1 versus Ha: m2 != m1
Study parameters:
alpha = 0.0500
power = 0.8000
delta = 0.2000
m1 = 0.0000
m2 = 0.2000
sd = 1.0000
Estimated sample sizes:
N = 788
N per group = 394
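The Stata results can be approximated by hand with the textbook formula n per group = 2 * ((t_{α/2} + t_{1-κ}) * σ / (m2 - m1))^2. The Python sketch below uses normal quantiles, whereas `power twomeans` works with the t distribution, so it lands one observation per group below the Stata output. The function name is illustrative.

```python
import math
from statistics import NormalDist

def n_per_group(m1, m2, sd, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided two-sample means test."""
    z = NormalDist()
    t_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return math.ceil(2 * (t_sum * sd / (m2 - m1)) ** 2)

print(n_per_group(0, 0.2, 1))             # standardized version
print(n_per_group(541.6, 562.8, 106.2))   # raw test-score version
```

The approximation gives 393 and 394 per group, against 394 and 395 in the t-based Stata output above.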
Power curve
. power twomeans 0 0.2, sd(1) alpha(0.05) power(0.1(0.05)0.95) graph(y(power))
[Figure: Power for a two-sample means test. Y axis: power (1-β), 0 to 1; x axis: total sample size (N), 0 to 1500. Parameters: α = .05, δ = .2, μ1 = 0, μ2 = .2, σ = 1. t test assuming σ1 = σ2 = σ. H0: μ2 = μ1 versus Ha: μ2 ≠ μ1.]
Simulation: changes in effect sizes
. power twomeans 0 (0.1(0.1)0.3), sd(1) alpha(0.05) n(10(100)2000) graph
[Figure: Estimated power for a two-sample means test. Y axis: power (1-β), 0 to 1; x axis: total sample size (N), 0 to 2000; one curve per experimental-group mean (μ2 = .1, .2, .3). Parameters: α = .05, μ1 = 0, σ = 1. t test assuming σ1 = σ2 = σ. H0: μ2 = μ1 versus Ha: μ2 ≠ μ1.]
Case II: Clustered Randomization
. sampsi 0 0.2, sd(1) power(0.8) alpha(0.05) ratio(1)
Estimated sample size for two-sample comparison of means
Test Ho: m1 = m2, where m1 is the mean in population 1
and m2 is the mean in population 2
Assumptions:
alpha = 0.0500 (two-sided)
power = 0.8000
m1 = 0
m2 = .2
sd1 = 1
sd2 = 1
n2/n1 = 1.00
Estimated required sample sizes:
n1 = 393
n2 = 393
. sampclus, obsclus(10) rho(0.17)
Sample Size Adjusted for Cluster Design
n1 (uncorrected) = 393
n2 (uncorrected) = 393
Intraclass correlation = .17
Average obs. per cluster = 10
Minimum number of clusters = 199
Estimated sample size per group:
n1 (corrected) = 995
n2 (corrected) = 995
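The cluster adjustment reported by `sampclus` matches inflating the uncorrected per-group sample by 1 + (m - 1)ρ, where m is the number of observations per cluster. The Python sketch below reproduces that arithmetic; the function name is illustrative and the rounding rule (round up) is an assumption that happens to agree with the output above.

```python
import math

def cluster_adjusted_n(n_uncorrected, obs_per_cluster, rho):
    """Inflate a per-group sample size by the variance inflation 1 + (m - 1) * rho."""
    return math.ceil(n_uncorrected * (1 + (obs_per_cluster - 1) * rho))

n_corrected = cluster_adjusted_n(393, obs_per_cluster=10, rho=0.17)
clusters = math.ceil(2 * n_corrected / 10)   # both groups, 10 observations per cluster
print(n_corrected, clusters)
```

With ρ = 0.17 and 10 observations per cluster, the 393 per group grows to 995 per group spread over 199 clusters, about 2.5 times the individually randomized sample.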
ICC
[Figure: Power (0 to 1) against sample size (0 to 5000) for ICC = 0.10, ICC = 0.17 and ICC = 0.50.]
Power surveys show that most research lacks adequate sample sizes to detect causal effects
Low power is common in many sciences:
Psychology (Cohen 1962, Sedlmeier et al 1989)
Management (Cashen et al 2004)
Behavioral ecology (Jennions et al 2003)
Psychiatry (Brown and Hale 1992)
Biology (Thomas and Juanes 1996)
Education (West 1985)
What about economics?
[Figure: Low power in business training programs (McKenzie et al 2012)]
[Figure: Low power in management (Cashen et al 2004)]
Undersized studies are a waste of resources because they cannot produce useful results
Sample size is also a pivotal issue for ethical reasons
4. Sampling Strategy (Optional)
Size is not the only relevant factor; the way the sample is drawn from the population of interest also matters
Sampling requires three steps (Gertler et al 2011):
Determine the population of interest
Identify a sampling frame
Draw as many units from the sampling frame as required by the power calculation
Sampling Frame
How will you find the population of interest?
List of units in the population of interest
Common sources:
Population census
School census for students/schools
Business registries for microenterprises
Data from the program itself
Sampling Frame Concerns
The sampling frame is always likely to be slightly different from the population of interest
Coverage bias
May exclude specific segments of your population
Reduces external validity for the whole population
Examples
Phone book as sampling frame for all HHs in a city
Business registry (may exclude the informal sector)
Village meetings promoting health insurance (only for those villagers who go to meetings)
Sampling Procedure
How to choose a sample from the sampling frame?
Probability sampling
Random sampling
Stratified random sampling
Cluster random sampling
Non-probability sampling
Convenience sampling, quota sampling, purposive sampling, etc.
Not usually recommended for quantitative research
Simple Random Sampling
Everyone in the sampling frame has an equal chance of being in the sample
Let's illustrate some basic principles of random sampling via simulation, using the Excel files posted on Bspace (lesson1.xls)
Problem: estimate the poverty rate in an African country
N = 900 (population size)
P = 450 (number of poor, i.e. a poverty rate of 50%)
Sample size = 100
[Figure: spreadsheet simulation showing the sampling frame and the results from one experiment: estimated poverty rate, sampling error and frequency histogram.]
Results from two experiments
Average error = [(3) + (9)] / 2 = 6
RMSE = square root of [(3^2 + 9^2) / 2] = 6.71
Results from 500 experiments
Average error tends to zero (Law of Large Numbers): unbiased estimator of the true proportion
Normal distribution (Central Limit Theorem)
Effect of population size on precision (N = 2000)
RMSE increased by very little!
Lesson 1: Population size does not matter for precision
Effect of sample size on precision (n = 400)
The frequency histogram has become slimmer
RMSE has dropped!
Lesson 2: Errors are inversely proportional to the square root of the sample size
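Lessons 1 and 2 can be reproduced without the spreadsheet. The Python sketch below is a stand-in for the Excel simulation, not a copy of it: it draws repeated simple random samples (without replacement) from a population with a 50% poverty rate and reports the RMSE of the estimated rate. The number of experiments, the seed and the composition of the N = 2000 population (1000 poor) are assumptions made for the example.

```python
import math
import random

def rmse_of_poverty_estimate(pop_size, n_poor, sample_size, experiments=2000, seed=0):
    """RMSE of the sample poverty rate over repeated simple random samples."""
    rng = random.Random(seed)
    population = [1] * n_poor + [0] * (pop_size - n_poor)
    true_rate = n_poor / pop_size
    sq_errors = []
    for _ in range(experiments):
        sample = rng.sample(population, sample_size)  # without replacement
        sq_errors.append((sum(sample) / sample_size - true_rate) ** 2)
    return math.sqrt(sum(sq_errors) / experiments)

base = rmse_of_poverty_estimate(900, 450, 100)
bigger_population = rmse_of_poverty_estimate(2000, 1000, 100)
bigger_sample = rmse_of_poverty_estimate(900, 450, 400)
print(round(base, 3), round(bigger_population, 3), round(bigger_sample, 3))
```

More than doubling the population barely moves the RMSE, while quadrupling the sample cuts it by at least half; the drop exceeds the square-root rule here because a sample of 400 is a large share of a population of 900.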
Stratified Random Sampling
Sample divided into important groups
Male-female, urban-rural and income quintiles
Randomly select individuals within each group
All individuals in a group have an equal chance of being drawn
Representative of the population and of each subgroup
Advantages
Can make inferences about impacts on a subgroup
Can compare impacts across subgroups
Disadvantages
May require larger sample sizes
Can be more costly
Strata
Open lesson3.xls
Results from 1500 experiments
Lesson 3: The global average is a weighted mean of the individual estimates for each stratum
Stratification pursues two goals:
Improve the precision of the overall estimate
Obtain separate estimates of adequate precision for each stratum
Unfortunately, both seem to be difficult to reach in practice at the same time
The way the sample is allocated matters
Three alternatives:
Equal allocation
Proportional allocation
Neyman allocation (stratum size and outcome variability)
Proportional Allocation
Lesson 4: Proportional allocation gives more precise estimates for the country as a whole
Neyman Allocation
Lesson 5: Neyman allocation does not give us a better approximation
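The three allocation rules differ only in the weights used to split the total sample across strata. The Python sketch below uses the standard textbook definitions (proportional: weight = stratum size; Neyman: weight = stratum size times outcome standard deviation). The stratum sizes and standard deviations are hypothetical numbers chosen for illustration, not the values in the lesson spreadsheets.

```python
def allocate(n, sizes, sds=None):
    """Split a total sample n across strata.

    sds=None -> proportional allocation (weights = stratum sizes).
    sds given -> Neyman allocation (weights = stratum size * stratum sd).
    """
    weights = sizes if sds is None else [s * d for s, d in zip(sizes, sds)]
    total = sum(weights)
    return [round(n * w / total) for w in weights]

sizes = [600, 300, 100]   # hypothetical stratum population sizes
sds = [0.5, 0.4, 0.3]     # hypothetical outcome standard deviations

print("equal       ", [300 // len(sizes)] * len(sizes))
print("proportional", allocate(300, sizes))
print("Neyman      ", allocate(300, sizes, sds))
```

Neyman allocation shifts sample toward large, high-variance strata, so it departs from proportional allocation only to the extent that variability differs across strata.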
Cluster and Multistage Sampling
Random sampling is not commonly applied in practice because sampling frames are often incomplete
Examples:
Census data are usually collected every 10 years
Voter registration/telephone listings are incomplete
Two-stage (multistage) sampling as a solution
Cluster (called Primary Sampling Unit in some cases)
Households/individuals
Group units into natural clusters
Geographic
Functional
Randomly sample clusters
Within each cluster:
Survey all units, or
Choose a random sample of units to survey
Advantages
Cost savings: Don't have to go everywhere
Can permit a larger sample size
[Figure: number of clusters vs. cluster size]
[Figure: cluster sampling, two-stage sampling and simple random sampling]
[Figure: Cluster Effect]
Summing Up
Careful thought should go into choosing your sample
Power Calculations
Avoid false conclusions that the program doesn't work
Avoid wasting resources collecting too much or too little data
Many factors go into the sample size needed
Pay special attention to how you assign the treatment
Sampling Strategy
Population of interest
Sampling frame
Sampling methodology
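Appendix: Using control variables
Consider the following OLS regression:
$$Y_i = \alpha + \beta T_i + \gamma X_i + \varepsilon_i$$
Then the treatment effect can be written in the following way:
$$\hat{\beta} = \bar{Y}_T - \bar{Y}_C - \hat{\gamma} (\bar{X}_T - \bar{X}_C)$$
Then,
$$Var(\hat{\beta}) = \frac{1}{P(1-P)} \frac{(1 - R_X^2) \sigma^2}{N}$$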
Recall the following result:
$$R_X^2 = 1 - \frac{\sigma^2_{Y|X}}{\sigma^2}$$
Adding covariates reduces the residual variance and thereby tends to reduce the variance of parameter estimates (Duflo et al 2008)
The MDE with covariates can be written as follows:
$$MDE(\kappa, \alpha, N, P) = (t_{1-\kappa} + t_\alpha) \sqrt{\frac{1}{P(1-P)} \frac{(1 - R_X^2) \sigma^2}{N}}$$
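The gain from covariates enters the MDE only through the factor sqrt(1 - R²_X). The Python sketch below evaluates the formula for a few values of R²_X; it uses normal quantiles as large-sample stand-ins for the t values, and N = 1000, P = 0.5 and σ = 1 are illustrative assumptions.

```python
from statistics import NormalDist

def mde_with_covariates(n, r2_x, p=0.5, sigma=1.0, alpha=0.05, power=0.80):
    """MDE when controls explain a share r2_x of the outcome variance."""
    z = NormalDist()
    t_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return t_sum * ((1 - r2_x) * sigma ** 2 / (p * (1 - p) * n)) ** 0.5

for r2 in (0.0, 0.25, 0.50, 0.75):
    print(r2, round(mde_with_covariates(1000, r2), 3))
```

Controls that explain 75% of the outcome variance halve the MDE, which is equivalent to quadrupling the sample size.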

Accounting for Imperfect Compliance
Imperfect compliance is an issue in impact evaluation, with implications for power calculation
Let's define the following groups:
c: the fraction of those initially assigned to the treatment who were actually treated
s: the share of subjects assigned to the control group who receive the treatment
Then, the MDE accounting for imperfect compliance:
$$MDE(\kappa, \alpha, N, P) = (t_{1-\kappa} + t_\alpha) \sqrt{\frac{1}{P(1-P)} \frac{\sigma^2}{N}} \cdot \frac{1}{c - s}$$
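Imperfect compliance scales the MDE by 1 / (c - s). The Python sketch below illustrates this with normal quantiles in place of t values; the compliance rates, N = 1000, P = 0.5 and σ = 1 are hypothetical values for the example, and the guard against c <= s is an added safety check rather than part of the formula.

```python
from statistics import NormalDist

def mde_with_noncompliance(n, c, s, p=0.5, sigma=1.0, alpha=0.05, power=0.80):
    """MDE when a share c of the treatment group and s of the control group are treated."""
    if c <= s:
        raise ValueError("take-up in the treatment group (c) must exceed take-up in the control group (s)")
    z = NormalDist()
    t_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return t_sum * (sigma ** 2 / (p * (1 - p) * n)) ** 0.5 / (c - s)

full = mde_with_noncompliance(1000, c=1.0, s=0.0)      # perfect compliance
partial = mde_with_noncompliance(1000, c=0.8, s=0.1)   # hypothetical take-up rates
print(round(full, 3), round(partial, 3))
```

Because the MDE scales with 1 / (c - s), keeping the same MDE requires a sample 1 / (c - s)^2 times larger; with c - s = 0.5 that is four times the sample.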
