Tutorial:

Empirical Distribution Function (EDF)


This is the second entry in our ongoing series about the empirical (or sample) distribution. In this tutorial, we will start with the general definition, motivation and applications of the EDF, and then use NumXL to carry out our EDF analysis.

In an earlier entry, we discussed the histogram as a non-parametric method for inferring the probability distribution of a random variable. In this tutorial, we go over the empirical distribution function and estimate its values at the different points in the sample.

For sample data, we generated a data set of 29 random values drawn from the Gaussian distribution.

Background
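The empirical distribution function (EDF), or empirical CDF, is a step function that jumps by 1/N at the occurrence of each observation:

$$\mathrm{EDF}(x) = \frac{1}{N} \sum_{i=1}^{N} I\{x_i \le x\}$$

Where $I\{A\}$ is the indicator function of an event $A$:

$$I\{x_i \le x\} = \begin{cases} 1 & x_i \le x \\ 0 & x_i > x \end{cases}$$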

By definition, the EDF gives the cumulative distribution of the sample: its value at x is the fraction of observations less than or equal to x.
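The definition above can be sketched in a few lines of Python. This is only an illustration of the formula, not NumXL's implementation; the function name `edf` and the sample values are made up for the example.

```python
from bisect import bisect_right

def edf(sample, x):
    """Fraction of observations in `sample` that are <= x (hypothetical helper)."""
    ordered = sorted(sample)
    # In a sorted list, bisect_right returns the number of values <= x.
    return bisect_right(ordered, x) / len(ordered)

# Illustrative values only; this is not the tutorial's 29-point data set.
sample = [2.0, -1.0, 0.5, 2.0, 3.5]
values = [edf(sample, x) for x in (-2.0, -1.0, 0.5, 2.0, 4.0)]
print(values)
```

Note that the repeated observation 2.0 makes the function jump by 2/N at that point, one step of 1/N per occurrence.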

Why do we care?
The EDF estimates the true underlying cumulative distribution function (CDF) from the points in the sample. As the sample size grows, it converges to the true distribution function with probability one, uniformly over x (the Glivenko-Cantelli theorem).
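The convergence claim can be checked numerically with a rough sketch. It assumes standard-normal data, an arbitrary random seed, and a hypothetical helper `max_gap` that measures the largest distance between the EDF and the true CDF.

```python
import random
from statistics import NormalDist

def max_gap(n, rng):
    """Largest distance between the EDF of n standard-normal draws and the true CDF."""
    cdf = NormalDist().cdf
    xs = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
    gap = 0.0
    for i, x in enumerate(xs):
        # The EDF jumps from i/n to (i+1)/n at x, so compare both levels with the CDF.
        gap = max(gap, abs((i + 1) / n - cdf(x)), abs(i / n - cdf(x)))
    return gap

rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
gaps = {n: max_gap(n, rng) for n in (29, 290, 29000)}
for n, g in gaps.items():
    print(n, round(g, 4))
```

With 29 points, as in this tutorial's sample, the gap is still noticeable; it shrinks as N grows.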

Process
First, let's organize our input data. We can start by placing the values of the sample data in a separate column. The sample may contain one or more missing values.

Empirical Distribution Function (EDF) Tutorial

Spider Financial Corp, 2013

Now we are ready to construct our EDF plot. First, select the empty cell in your worksheet where you wish the output table to be generated, then locate and click on the Descriptive Statistics icon in the NumXL tab (or toolbar). Then, select the Empirical Distribution Function item from the drop-down menu.

The EDF Wizard pops up.

Select the cells range for the values of the input variable.

Notes:

1. The cells range includes (optional) the heading (Label) cell, which would be used in the output tables where it references those variables.
2. By default, the output table cells range is set to the current selected cell in your worksheet.
3. By default, the output graph cells range is set to the 7 cells right of the current selected cell in your worksheet.

Finally, once we select the input data (X) cells range, the Options and Missing Values tabs become available (enabled).

Next, select the Options tab.

Initially, the tab is set to the following values:

Overlay Normal distribution is checked. This option in effect instructs the wizard to generate a second curve for the Gaussian distribution for comparison purposes. Leave this option checked.

Now, click on the Missing Values tab.

In this tab, you can select an approach to handle missing values in the data set (X's). By default, any observation with a missing value is excluded from the analysis.

This treatment is a good approach for our analysis, so let's leave it unchanged.

Now, click OK to generate the output tables.
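The default treatment of missing values described above (dropping those observations before the analysis) can be sketched as follows. This is an assumption about the effect of the option, not NumXL's code; `drop_missing` and the sample values are hypothetical.

```python
import math
from bisect import bisect_right

def drop_missing(values):
    """Keep only real observations; None and NaN are treated as missing."""
    return [v for v in values if v is not None and not math.isnan(v)]

# Illustrative column with two gaps; not the tutorial's data.
raw = [1.2, None, -0.4, float("nan"), 0.7]
clean = sorted(drop_missing(raw))
n = len(clean)  # N counts only the retained observations
edf_at_1 = bisect_right(clean, 1.0) / n
print(clean, n, edf_at_1)
```

The point to notice is that N in the 1/N step size is the number of observations that remain after exclusion, not the length of the original column.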

Notes:

1. The values of all observations are sorted in ascending order and placed in column E.
2. The XBar and YBar columns carry no special statistical meaning; they are merely computed to assist us in generating a stepwise type of graph in Excel.
3. Finally, the equivalent cumulative distribution function (CDF) of the normal distribution is computed in column I.

The generated plot of the EDF is shown below:

Conclusion
In this tutorial, we demonstrated the process to generate an empirical distribution function in Excel using NumXL's add-in functions.

Where do we go from here?
To obtain the probability density function (PDF), one needs to take the derivative of the CDF, but the EDF is a step function and differentiation is a noise-amplifying operation. As a result, the resulting PDF is very jagged and needs considerable smoothing for many areas of application.

In our next entry, we will look at the kernel density estimation method to obtain the probability density function of the underlying random process.
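To see why differentiating the EDF gives a jagged density, consider this rough sketch. It uses a naive finite-difference slope between neighbouring observations, which is only one possible way to "differentiate" a step function and is chosen here for illustration. The seed and the 29 simulated values are arbitrary and are not the tutorial's data.

```python
import random

rng = random.Random(1)  # arbitrary seed for reproducibility
xs = sorted(rng.gauss(0.0, 1.0) for _ in range(29))
n = len(xs)

# Slope of the EDF across each gap between neighbouring observations:
# the EDF rises by 1/N over a width of xs[i+1] - xs[i].
slopes = [(1 / n) / (b - a) for a, b in zip(xs, xs[1:]) if b > a]

print(min(slopes), max(slopes))
```

Because the gaps between neighbouring observations vary widely, the slopes swing between very small and very large values from one interval to the next, which is the jaggedness that smoothing has to remove.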
