
Understanding Basic Statistical Concepts
(as Applied to Fingerprints)
Presented by Michele Triplett, CLPE
at the International Association for Identification's
97th International Educational Conference
24 July 2012
Background
Why is this a hot topic?
Recent advances in probabilistic models
The models can be confusing
There are different views, which makes it more confusing

This presentation will give you BASIC information.
Specifically: simple terminology and concepts.
Explaining Probabilities
Fun with Statistics
Terms
Statistics: Collection and interpretation of data.
Frequency: How many times an event happens.
Probabilities: The measurement of how often an event will happen when we aren't certain (rate of frequency of events).
Probability or Rate = Frequency of event / Total number of possible outcomes
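A minimal Python sketch of the rate formula above. The six-sided die is only an illustrative example, and `probability` is a hypothetical helper name, not part of any library.

```python
from fractions import Fraction

def probability(frequency_of_event: int, total_outcomes: int) -> Fraction:
    """Probability (rate) = frequency of event / total number of possible outcomes."""
    if total_outcomes <= 0:
        raise ValueError("total_outcomes must be positive")
    return Fraction(frequency_of_event, total_outcomes)

# Rolling an even number on a six-sided die: faces 2, 4 and 6 out of 6 faces.
p_even = probability(3, 6)
print(p_even, float(p_even))  # 1/2 0.5
```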
Conditional Probabilities: Probabilities of an event happening when we know additional information.
(Probability of someone speaking Spanish vs. probability of someone speaking Spanish who is from Spain.)
Likelihood Ratios: Ratio of two probabilities under different circumstances, comparing the two probabilities to each other.
Bayesian Methods: Conditional Probability / Bayesian Inference.
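The Spanish-speaker example can be made concrete with counts. The population below is entirely hypothetical (1,000 people, 100 of them from Spain); it only shows how a conditional probability narrows the group being counted and how a likelihood ratio compares two probabilities.

```python
from fractions import Fraction

# Hypothetical counts for illustration only (not real survey data).
total_people = 1000
from_spain = 100
spanish_speakers_from_spain = 95
not_from_spain = total_people - from_spain  # 900
spanish_speakers_not_from_spain = 150

# P(speaks Spanish): everyone in the population counts.
p_spanish = Fraction(spanish_speakers_from_spain + spanish_speakers_not_from_spain, total_people)

# P(speaks Spanish | from Spain): only people from Spain count.
p_spanish_given_spain = Fraction(spanish_speakers_from_spain, from_spain)

# P(speaks Spanish | not from Spain)
p_spanish_given_not_spain = Fraction(spanish_speakers_not_from_spain, not_from_spain)

# Likelihood ratio: how much more probable speaking Spanish is under one
# circumstance (from Spain) than under the other (not from Spain).
likelihood_ratio = p_spanish_given_spain / p_spanish_given_not_spain

print(p_spanish, p_spanish_given_spain, likelihood_ratio)  # 49/200 19/20 57/10
```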
Reverend Thomas Bayes

1701-1761, English Mathematician
Bayes' Theorem
Two applications
Both are used in situations where probabilities can't be directly calculated.
1) Conditional probabilities (with the ability to update the model as new beliefs are established).
2) Relate two events (known as Bayesian Inference).
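Bayes' theorem relates the two conditional probabilities: P(A | B) = P(B | A) x P(A) / P(B). A short sketch that reverses the condition, using hypothetical figures (10% of a population is from Spain, 95% of those speak Spanish, 24.5% of everyone speaks Spanish); `bayes` is a hypothetical helper name.

```python
from fractions import Fraction

def bayes(p_b_given_a: Fraction, p_a: Fraction, p_b: Fraction) -> Fraction:
    """Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Hypothetical inputs, for illustration only.
p_spain = Fraction(1, 10)                 # P(from Spain)
p_spanish_given_spain = Fraction(19, 20)  # P(speaks Spanish | from Spain)
p_spanish = Fraction(49, 200)             # P(speaks Spanish)

# Reverse the condition: P(from Spain | speaks Spanish)
p_spain_given_spanish = bayes(p_spanish_given_spain, p_spain, p_spanish)
print(p_spain_given_spanish, round(float(p_spain_given_spanish), 3))  # 19/49 0.388
```

Knowing someone is from Spain makes Spanish very probable (95%), but knowing someone speaks Spanish makes "from Spain" far less certain (about 39%) under these assumed numbers; the two conditional probabilities are not interchangeable.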
Bayes Factors
Establishing the Rate of Frequency of an Occurrence (Event)
Classical Probabilities: percentage of possible outcomes (finite outcomes)
Empirical Probabilities: based on a sample of observations (larger and relevant samples are better)
Subjective Probabilities: user defines the probability based on existing knowledge. Can be dangerous if the user is using a gut feeling or experience instead of using current relevant knowledge.
All methods are valuable for different situations. Choose the method that works best for the situation at hand.
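The difference between classical and empirical probabilities is where the counts come from: classical counts the possible outcomes (assumed equally likely), empirical counts observed cases. A sketch with a die; the 12 observed rolls are made up, `rate` and `rolled_six` are hypothetical names, and the subjective value is simply a number a person supplies.

```python
from fractions import Fraction

def rate(cases, event):
    """Fraction of the listed cases for which `event` is true."""
    return Fraction(sum(1 for c in cases if event(c)), len(cases))

def rolled_six(face):
    return face == 6

# Classical: the cases are the possible outcomes, assumed equally likely.
possible_faces = [1, 2, 3, 4, 5, 6]
classical = rate(possible_faces, rolled_six)

# Empirical: the cases are observations of a (hypothetical) possibly loaded die.
observed_rolls = [6, 2, 6, 3, 6, 1, 5, 6, 4, 6, 2, 6]
empirical = rate(observed_rolls, rolled_six)

# Subjective: a value the user defines from existing knowledge (assumed here).
subjective = Fraction(1, 3)

print(classical, empirical, subjective)  # 1/6 1/2 1/3
```

With only 12 observations the empirical figure could easily mislead, which is why larger and relevant samples are better.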
Different Uses of Subjective Probabilities
Yankees winning a game based on a gut feeling.
Yankees winning a game based on the other team having a lot of injured players.
Accuracy of Methods
The results are only as accurate as the data, the method used and the thoroughness used. This means the results of research are USELESS without knowing how they were arrived at.
Reminder
Classical Probabilities: finite number of outcomes.
Empirical Probabilities: based on a sample group.
Subjective Probabilities: user defines the probability.
Comparing Different Methods
Probability of it raining today
Classically: 1/2 outcomes = 50%
Assumes equally likely events
Empirically: Sample group = 1/100 people = 1%
Assumes the sample looked at accurately represents the situation
Subjectively: Look out the window and see it's cloudy = 85%
Can consider more information than supplied in the model
Probability of me winning an election
Classically: 1/2 outcomes = 50%
Assumes equally likely events
Empirically: Sample group = 9/10 people = 90%
Assumes the sample looked at accurately represents the situation
Subjectively: With current knowledge, I'm not running = 0%
Can consider more information than supplied in the model
Comparing Different Methods
Probability of tossing a coin and getting heads
Classically: 1/2 outcomes = 50%
Assumes equally likely events
Empirically: Sample group of 10 tosses = could be 20%
Assumes the sample looked at accurately represents the situation
Assumes the sample group is an adequate size
Subjectively: = 85%
Can just be the gut feeling of the person
Probability of a long commute
Classically: 1/2 outcomes = 50%
Assumes equally likely events
Empirically: Sample group of number of cars (low #) = 15%
Assumes the sample looked at accurately represents the situation
Subjectively: With current knowledge of fog (low #) = 90%
Can consider more information than supplied in the model
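How surprising is the coin result "sample group of 10 tosses = could be 20%"? For a fair coin, the exact chance of 2 or fewer heads in 10 tosses follows from the binomial formula, sketched here with Python's `math.comb`; `prob_at_most_k_heads` is a hypothetical helper name.

```python
from fractions import Fraction
from math import comb

def prob_at_most_k_heads(n_tosses: int, k: int) -> Fraction:
    """Exact P(at most k heads in n fair tosses) = sum of C(n, i) / 2**n for i <= k."""
    favorable = sum(comb(n_tosses, i) for i in range(k + 1))
    return Fraction(favorable, 2 ** n_tosses)

p = prob_at_most_k_heads(10, 2)  # 20% heads or fewer in 10 tosses
print(p, round(float(p), 4))     # 7/128 0.0547
```

So a perfectly fair coin gives a result this lopsided roughly 5% of the time in a 10-toss sample, which is why the adequacy of the sample size matters.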
Comparing Different Methods
Probability of it being busy at work on a Tuesday
Classically: 1/2 outcomes = 50%
Assumes equally likely events
Empirically: Sample group of Tuesdays = could be 30% (Tuesdays are usually slow)
Assumes the sample looked at accurately represents the situation
Assumes the sample group is an adequate size
Subjectively: = 85%
Knowledge of a big arrest
Benefits and Limitations
Classical:
Benefit: Objective
Limitation: May not consider all relevant information
Empirical:
Benefit: Can show that events don't have the same frequency of occurrence.
Limitation: Sample size may not be adequate; sample group may not be representative
Subjective:
Benefit: Allows one to consider current relevant data
Limitation: Allows for not using relevant data
Example 1
Jamie gets A's in 80% (80/100 or 0.8) of her classes. If she takes an automotive class, what is the probability of her getting an A?
Empirically: 80% (average from sample group)
Classically: 20% (1 out of 5 possible grades) (assumes equal possibilities of each event)
Subjectively:
Could be 60% if she dislikes the topic.
Could be 99% if she likes the topic.
Which Method is Best?
Equal chance of each grade:
P(A) = 1 A out of 5 possible grades = 1/5 = 20%
Empirical probability may show there's not an equal possibility of each event:
P(A) = 80%
P(B) = 10%
P(C) = 5%
P(D) = 4%
P(F) = 1%

Classical probabilities don't consider all relevant information.
Empirical probabilities may show the frequency of each event is different but may not be showing other current factors (desire).
Subjective probabilities allow for other relevant data.
What if we know the lower grades were or weren't recent? Then the current probabilities may be different than the overall probabilities.

Need the sample group to be relevant (current) in addition to being an adequate size.

Need to use the best method for the situation (classical, empirical or subjective).
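The grade example side by side. The counts assume Jamie has taken exactly 100 classes, so that they match the percentages listed for P(A) through P(F); that class total is an assumption for illustration.

```python
from fractions import Fraction

grades = ["A", "B", "C", "D", "F"]

# Classical: every grade treated as equally likely.
classical = {g: Fraction(1, len(grades)) for g in grades}

# Empirical: assumed grade counts out of 100 classes (matching the percentages above).
counts = {"A": 80, "B": 10, "C": 5, "D": 4, "F": 1}
total = sum(counts.values())
empirical = {g: Fraction(counts[g], total) for g in grades}

for g in grades:
    print(g, f"classical={float(classical[g]):.0%}", f"empirical={float(empirical[g]):.0%}")
```

Neither table captures current factors such as her interest in an automotive class; that is the room left for a subjective estimate.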
Example 2
P(flipping 3 tails in a row):
Classical, fair coin = (1/2) x (1/2) x (1/2) = 1/8
Classical, unfair coin (always lands tails) = 1 x 1 x 1 = 1
Likelihood ratio of it being a fair coin = (1/8)/1 = 1/8: the likelihood of a fair coin is 1/8 that of the unfair coin, i.e., three tails in a row are 8 times more probable if it's not a fair coin.
Empirically = ?? (Try it)
Empirical probabilities are better with large sample groups (in the long run: Law of Large Numbers).
3 Tails: 1 of 8 possibilities
HHH
HTT
HTH
HHT
TTT
THT
TTH
THH
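The likelihood ratio for the coin, computed exactly. The sketch assumes independent flips and models the unfair coin as one that always lands tails; `p_all_tails` is a hypothetical helper name.

```python
from fractions import Fraction

def p_all_tails(p_tail: Fraction, n_flips: int) -> Fraction:
    """Probability of n tails in a row, assuming independent flips."""
    return p_tail ** n_flips

n = 3
p_fair = p_all_tails(Fraction(1, 2), n)  # fair coin: (1/2)^3
p_unfair = p_all_tails(Fraction(1), n)   # coin that always lands tails: 1^3

likelihood_ratio = p_fair / p_unfair     # support for "fair" relative to "always tails"
print(likelihood_ratio, 1 / likelihood_ratio)  # 1/8 8
```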
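The "Try it" can also be done in code: listing all outcomes gives the classical 1/8, and a simulated fair coin (Python's seeded pseudo-random generator standing in for real flips, an assumption) shows the empirical estimate settling near 0.125 as the number of trials grows. `simulate_three_tails` is a hypothetical name.

```python
import random
from fractions import Fraction
from itertools import product

# Classical: list every equally likely outcome of three flips.
outcomes = ["".join(flips) for flips in product("HT", repeat=3)]
classical = Fraction(outcomes.count("TTT"), len(outcomes))

def simulate_three_tails(trials: int, seed: int = 0) -> float:
    """Empirical estimate: flip a simulated fair coin three times per trial."""
    rng = random.Random(seed)
    hits = sum(
        all(rng.random() < 0.5 for _ in range(3))  # < 0.5 counts as tails; True only if all three are tails
        for _ in range(trials)
    )
    return hits / trials

print(len(outcomes), classical)  # 8 1/8
for trials in (10, 100, 100_000):
    print(trials, simulate_three_tails(trials))
```

Small runs can land far from 0.125; large runs get close, which is the Law of Large Numbers at work.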
Example 3
P(being born on January 1) = ??
Classically = 1/365

Are birthdays equally distributed throughout the year?

What is it dependent on (factors)?

Which method would work best for determining the frequency of birthdates?
Frequency of an Event

Is it the same everywhere (equally distributed)?
Frequency of an Event

The frequency of a characteristic is different in different areas (enclosure under a core vs. enclosure over a core). We don't know the frequency of characteristics. We haven't found a mathematical way of establishing the frequency of ridge events.
How many points are enough?
Since we haven't been able to establish the frequency of features, classical probabilities can't determine a high probability that a certain number of features is sufficient (one reason classical probabilities don't work).
10 common features may have less weight than 5 uncommon features. Need to consider the rarity (weight of features).
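Why rarity matters can be sketched by multiplying feature frequencies. Both the frequencies and the independence assumption below are hypothetical; as noted above, the real frequencies of ridge events have not been established. `chance_of_configuration` is a hypothetical name.

```python
from math import prod

def chance_of_configuration(feature_frequencies):
    """Multiply per-feature frequencies, ASSUMING the features occur independently."""
    return prod(feature_frequencies)

# Hypothetical frequencies, for illustration only (not measured ridge data).
common_freq = 0.5     # a feature assumed to appear in half of prints
uncommon_freq = 0.05  # a feature assumed to appear in 1 of 20 prints

ten_common = chance_of_configuration([common_freq] * 10)
five_uncommon = chance_of_configuration([uncommon_freq] * 5)

print(f"10 common:  {ten_common:.1e}")    # 9.8e-04
print(f"5 uncommon: {five_uncommon:.1e}")  # 3.1e-07
```

Under these made-up numbers the 5 uncommon features form a configuration about 3,000 times rarer than the 10 common ones, so counting points alone would misstate their weight.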
Close Non-Match
Henry Templeman
Only Considering L2
Only considering L2 means you're ignoring other relevant information.
Questions for the Use of Statistics
How are probabilities and likelihood ratios going to be used?
Use to support current conclusions (more accurate)?
Use to arrive at additional conclusions (for current inconclusive decisions, lower the sufficiency needed)?
In 2010, the IAI rescinded its resolution prohibiting possible, probable or likely conclusions. This was not done to incorporate statistical models; it was to acknowledge that conclusions are not absolute conclusions, they are inferences (subjective probabilities).
The answer on how stats will be used hasn't been decided yet. Decided in Feb 2012.
Questions for the Use of Probabilities
Are we looking at the probability of duplication?
Are we looking at the probability that the conclusion is in error?
The majority of errors are not due to the possibility of duplication; they are due to misinterpretation of data.
Misuse of Statistics
Interpreting the data one-sidedly, ignoring relevant information (intentionally and unintentionally).
Mathematical models may give a false sense of security in a conclusion. Need to look at:
Is the sample size adequate?
Is the sample group relevant?
Is the frequency of events a good estimate?
Are major factors considered (perhaps: clarity, quantity, rarity (frequency), dissimilarities, intervening ridges, level 1, level 2, level 3)?
Do the models account for misinterpretation?
Fingerprint Individuality Models
Galton (1892)
P(fingerprint configuration) = (1/16) x (1/256) x (1/2)^24
(1/2)^24: 24 regions, each with probability 1/2
P(correct number of ridges entering and exiting) = 1/256
P(pattern type) = 1/16
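Galton's product, evaluated exactly from the three factors listed above. The code only reproduces the arithmetic; it says nothing about whether the model's assumptions hold.

```python
from fractions import Fraction

p_pattern_type = Fraction(1, 16)  # P(pattern type)
p_ridge_count = Fraction(1, 256)  # P(correct number of ridges entering and exiting)
p_regions = Fraction(1, 2) ** 24  # 24 regions, each treated as a 1/2 event

p_configuration = p_pattern_type * p_ridge_count * p_regions
print(p_configuration == Fraction(1, 2 ** 36), f"{float(p_configuration):.3e}")  # True 1.455e-11
```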
Fingerprint Individuality Models
Henry (1900), Balthazard (1911), Bose (1917)
4 possible events (the events for each model were different)
Assumed equal probability of each event
P(event) = 1/4
P(a configuration) = (1/4) x (1/4) x ... x (1/4) = (1/4)^n
For 10 minutiae, (1/4)^10 = 1/1,048,576, roughly one in a million
Expectation of how many would have this configuration = P(a configuration) x number of people
Fingerprint Individuality Models
Wentworth (1918)
P(event) = 1/50, but had no list of the 50 features
P(a configuration) = (1/50)^n
Assumed equal probability of each event
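The equal-probability models (Henry, Balthazard and Bose with 4 events; Wentworth with 50) reduce to (1/k)^n. The expected number of people sharing a configuration is that probability multiplied by the number of people; the population of one million below is hypothetical, and both function names are illustrative.

```python
from fractions import Fraction

def configuration_probability(possible_events: int, n_minutiae: int) -> Fraction:
    """Equal-probability model: each minutia is one of k equally likely events, so P = (1/k)^n."""
    return Fraction(1, possible_events) ** n_minutiae

def expected_matches(p_configuration: Fraction, population: int) -> Fraction:
    """Expected number of people sharing the configuration = P(configuration) x population."""
    return p_configuration * population

p_four = configuration_probability(4, 10)    # 4 possible events (Henry, Balthazard, Bose)
p_fifty = configuration_probability(50, 10)  # 50 possible events (Wentworth)

print(p_four)                                      # 1/1048576
print(float(expected_matches(p_four, 1_000_000)))  # hypothetical population of one million
```

Both models inherit the equal-probability assumption, which later work (next slide) showed does not hold.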
Advanced Models
Gupta (1968)
More possibilities of events
Established that the frequency of events isn't equal
1/10 for endings and bifurcations
1/100 for others
Champod (1995)
9 possible events
Improved the known frequency of events
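A simplified sketch using only the per-feature frequencies quoted for Gupta (1968): 1/10 for endings and bifurcations, 1/100 for others. Treating features as independent and the 10-minutia example configuration are assumptions for illustration; other details of the actual model are not represented.

```python
from fractions import Fraction

# Per-feature frequencies quoted for Gupta (1968); the rest of his model is omitted here.
FREQ = {"ending": Fraction(1, 10), "bifurcation": Fraction(1, 10), "other": Fraction(1, 100)}

def configuration_probability(features):
    """Multiply the per-feature frequencies (features ASSUMED independent)."""
    p = Fraction(1)
    for feature in features:
        p *= FREQ[feature]
    return p

# Hypothetical configuration: 8 endings/bifurcations and 2 other minutiae.
example = ["ending"] * 5 + ["bifurcation"] * 3 + ["other"] * 2
print(configuration_probability(example))  # 1/1000000000000
```

Unlike the equal-probability models, swapping one common minutia for a rarer one changes the result by a factor of 10 here, which is the point of unequal frequencies.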
Generative Models
Generative models incorporate ridge types (direction)
Pankanti, Prabhakar and Jain (minutiae only) 2001
Dass, Zhu and Jain (minutiae only) 2005
Fang, Srihari and Srinivasan (minutiae and ridges) 2006
Differences in Models
Use different types of features
Use different frequencies for features
Use different mathematical equations
Tested their models on a different number of fingerprints (easier with computer technology)
The research has been valuable and led to the use in biometrics and lights-out technology!
As more information is considered, the research may be able to be used for latent prints as well.
Current Models for Partial Prints
Christophe Champod's Model: No info
Dactsys: Forensic Science Service (FSS) in UK
Original prototype developed by Cedric Neumann in 2006
TT Model: Henry Templeman
Original version released in Aug 2008
WOVI: Netherlands Forensic Institute (NFI): No info
Cedric Neumann's Model
Used for 11 points and under. Wouldn't account for the past problem cases.
Very little is published on these models.
None of these models have been independently validated.
Current Models for Partial Prints
Are these models an objective weight of conclusions?
Do they account for the major factors?
Do they use empirical data?
Based on information in databases.
Are there subjective elements?
Examiners are still plotting points.
Views on Use for Partial Prints
Just starting to question the validity of different statistical models. They haven't started evaluating models.

Estimates it'll take 2 million dollars and years to evaluate the validity of 4 models.

Individuals:
Some think we'll be moving to probabilities or likelihood ratios quickly; others think it is far off.
Most people feel it's worth researching. They acknowledge we need to be open to the use of statistics but question whether models have a sound basis.
Answering Questions in Court
Lots of models have been attempted since 1892.
Current statistical models are useful when you have a large number of clear minutiae.
No model has been found to replicate reality accurately for low numbers of minutiae (too much information isn't considered).
Current models have not been externally validated.
Our conclusions (subjective probabilities) incorporate more information than statistical models can consider at this time.
Answering in Court
Currently, our conclusions are based on subjective probabilities. We consider L1, L2, L3, quality, quantity, rarity, intervening ridges and dissimilarities of features.
We make our current conclusions less subjective:
By arriving at conclusions that others would arrive at (intersubjectively testable).
By being able to explain the basis behind our conclusions.
By using objective data (data others would use or data others can see).
The Future
Statistics and probabilities are not new in fingerprints, but many of us have been able to avoid them.
We're no longer able to avoid statistics, since models are being recommended for use and because models are getting better (closer to being used).

Everyone will need to be able to discuss basic concepts about statistics and probabilities.
Most Important Concept
Be able to explain the benefits and limitations of different types of probabilities.
Classical
Empirical
Subjective
If you can't explain the value of the method you're using, then others will exploit the limitations in an effort to discount the value of your conclusion.
If you decide to discuss models with attorneys, you'd better be able to discuss specific factors of the models.
Terms and Concepts Not Included
Error Rates: since errors seem so rare, a huge sample size is needed to mathematically estimate error rates.
Law of Large Numbers.
Mutually Exclusive: can't happen at the same time (boy or girl, rich or poor).
Independent: one occurrence doesn't affect other occurrences (independent: rolling dice; dependent: pulling a card from a deck of cards may be dependent on what cards were already picked).
There are different mathematical rules for different situations.
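The "different mathematical rules" can be seen with the dice and card examples: mutually exclusive events add, independent events multiply, and dependent events multiply with an adjusted second probability. The specific events (rolling a 1 or 2, two sixes, two aces) are illustrative choices.

```python
from fractions import Fraction

# Mutually exclusive events add: a single die roll can't be both 1 and 2.
p_one_or_two = Fraction(1, 6) + Fraction(1, 6)

# Independent events multiply: the second die doesn't depend on the first.
p_two_sixes = Fraction(1, 6) * Fraction(1, 6)

# Dependent events: the second draw depends on the first (no replacement).
p_two_aces_without_replacement = Fraction(4, 52) * Fraction(3, 51)

# Replacing and reshuffling the first card makes the draws independent again.
p_two_aces_with_replacement = Fraction(4, 52) * Fraction(4, 52)

print(p_one_or_two, p_two_sixes, p_two_aces_without_replacement, p_two_aces_with_replacement)
# 1/3 1/36 1/221 1/169
```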
Terms and Concepts Not Included
Sensitivity: can the experiment find positive results (equation)
(or used with a different formula, incorporating false inconclusives)
Specificity: can the experiment find negative results (equation)
(or specific vs. not specific, with non-specificity as in L2 and L3)
(or rarity, which is the frequency)
Selectivity: ability to distinguish small differences
(or used in place of the specificity equation)
(or used with a slightly different equation, incorporating false inconclusives)
(or rarity, which is the frequency)
These terms are used differently in chemistry and in statistics. Since fingerprints have both elements, people are using the same words for different meanings. Analysts need to consider how an author is using the word.
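For reference, a sketch of the common statistical (diagnostic-testing) definitions: sensitivity = true positives / (true positives + false negatives), specificity = true negatives / (true negatives + false positives). These are the textbook formulas, not the variants that incorporate false inconclusives, and the validation counts are hypothetical.

```python
from fractions import Fraction

def sensitivity(true_positives: int, false_negatives: int) -> Fraction:
    """Share of truly positive cases that the test reports as positive."""
    return Fraction(true_positives, true_positives + false_negatives)

def specificity(true_negatives: int, false_positives: int) -> Fraction:
    """Share of truly negative cases that the test reports as negative."""
    return Fraction(true_negatives, true_negatives + false_positives)

# Hypothetical validation counts, for illustration only.
print(sensitivity(90, 10), specificity(950, 50))  # 9/10 19/20
```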
Quote of the Day

"Statisticians are people who like math but don't have the personality to be accountants."
Questions?
