What Is Evaluation

Evaluation is the process of examining a program or process to determine what's working, what's not, and
why.
Evaluation determines the value of programs and acts as blueprints for judgment and improvement.
(Rossett & Sheldon, 2001)
Types of Evaluations in Instructional Design
Evaluations are normally divided into two broad categories: formative and summative.
Formative
A formative evaluation (sometimes referred to as internal) is a method for judging the worth of a program
while the program activities are forming (in progress). This part of the evaluation focuses on the process.
Thus, formative evaluations are basically done on the fly. They permit the designers, learners, and
instructors to monitor how well the instructional goals and objectives are being met. Their main purpose is
to catch deficiencies so that the proper learning interventions can take place, allowing the learners to
master the required skills and knowledge.
"Formative evaluation is also useful in analyzing learning materials, student learning and achievements, and
teacher effectiveness.... Formative evaluation is primarily a building process which accumulates a series of
components of new materials, skills, and problems into an ultimate meaningful whole." -- Wally Guyot
(1978)
Summative
A summative evaluation (sometimes referred to as external) is a method of judging the worth of a program
at the end of the program activities (summation). The focus is on the outcome.
"All assessments can be summative (i.e., have the potential to serve a summative function), but only some
have the additional capability of serving formative functions." -- Scriven (1967)
The various instruments used to collect the data are questionnaires, surveys, interviews, observations, and
testing. The model or methodology used to gather the data should be a specified step-by-step procedure. It
should be carefully designed and executed to ensure the data is accurate and valid.
Questionnaires are the least expensive procedure for external evaluations and can be used to collect large
samples of graduate information. The questionnaires should be trialed (tested) before use to ensure the
recipients understand their operation the way the designer intended. When designing questionnaires, keep
in mind that the most important feature is the guidance given for their completion. All instructions should
be clearly stated... let nothing be taken for granted.
History of the Two Evaluations
Scriven (1967) first suggested a distinction between formative evaluation and summative evaluation when
describing two major functions of evaluation. Formative evaluation was intended to foster development and
improvement within an ongoing activity (or person, product, program, etc.). Summative evaluation, in
contrast, is used to assess whether the results of the object being evaluated (program, intervention, person,
etc.) met the stated goals.
Scriven saw the need to distinguish the formative and summative roles of curriculum evaluation. While
Scriven preferred summative evaluation (performing a final evaluation of the project or person), he did
come to acknowledge the merits of Cronbach's formative evaluation, the part of the process of curriculum
development used to improve the course while it is still fluid (Cronbach believed it contributes more to the
improvement of education than evaluation used to appraise a product).
Later, Misanchuk (1978) delivered a paper on the need to tighten up the definitions in order to get more
accurate measurements. The one that seems to cause the greatest disagreement is the keeping of fluid
movements or changes strictly in the pre-release versions (before it hits the target population).
In Paul Saettler's (1990) history of instructional technology, he describes the two evaluations (pp. 430-431)
in the context of how they were used in developing Sesame Street and The Electric Company by the
Children's Television Workshop. CTW used formative evaluations for identifying and defining program
designs that could provide reliable predictors of learning for particular learners. They later used summative
evaluations to prove their efforts (to quite good effect, I might add). While Saettler praises CTW for a
significant landmark in the technology of instructional design, he warns that it is still tentative and should
be seen more as a point of departure rather than a fixed formula.
Saettler defines the two types of evaluations as: 1) formative is used to refine goals and evolve strategies
for achieving goals, while 2) summative is undertaken to test the validity of a theory or determine the
impact of an educational practice so that future efforts may be improved or modified.
Thus, using Misanchuk's defining terms will normally achieve more accurate measurements; however, the
cost is quite high as it is highly resource intensive, particularly with time, because of all the pre-work that
has to be performed in the design phase: create, trial, redo, trial, redo, trial, redo, etc.; and all preferably
without using the target population.
However, most organizations are demanding shorter design times. Thus the formative part is moved over to
other methods, such as rapid prototyping and using testing and evaluation methods to improve as one
moves on. This of course is not as accurate, but it is more appropriate to most organizations as they are not
really that interested in accurate measurements of the content but rather the end product: skilled and
knowledgeable workers.
Misanchuk's defining terms basically put all the water in a container for accurate measurements, while the
typical organization estimates the volume of water running in a stream.
Thus if you are a vendor, researcher, or need highly accurate measurements, you will probably define the
two evaluations in the same manner as Misanchuk. If you need to push the training/learning out faster and
are not all that worried about highly accurate measurements, then you define it closer to how most
organizations do and Saettler does with the CTW example.
Kirkpatrick's Four Level Evaluation Model

Perhaps the best known evaluation methodology for judging learning processes is Donald Kirkpatrick's
Four Level Evaluation Model, which was first published in a series of articles in 1959 in the Journal of
American Society of Training Directors (now known as T+D Magazine). The series was later compiled and
published as an article, Techniques for Evaluating Training Programs, in a book Kirkpatrick edited,
Evaluating Training Programs (1975). However, it was not until his 1994 book, Evaluating Training
Programs, was published that the four levels became popular. Nowadays, his four levels remain a
cornerstone in the learning industry.
While most people refer to the four criteria for evaluating learning processes as levels, Kirkpatrick never
used that term; he normally called them steps (Craig, 1996). In addition, he did not call it a model, but
used words such as techniques for conducting the evaluation (Craig, 1996, p. 294).
The four steps of evaluation consist of:
Step 1: Reaction - How well did the learners like the learning process?
Step 2: Learning - What did they learn? (the extent to which the learners gain knowledge and skills)
Step 3: Behavior - What changes in job performance resulted from the learning process? (capability to
perform the newly learned skills while on the job)
Step 4: Results - What are the tangible results of the learning process in terms of reduced cost, improved
quality, increased production, efficiency, etc.?
Kirkpatrick's concept is quite important as it makes an excellent planning, evaluating, and
troubleshooting tool, especially if we make some slight improvements as shown below.
Not Just For Training

While some mistakenly assume the four levels are only for training processes, the model can be used for
other learning processes. For example, the Human Resource Development (HRD) profession is concerned
with not only helping to develop formal learning, such as training, but other forms, such as informal
learning, development, and education (Nadler, 1984). Their handbook, edited by one of the founders of
HRD, Leonard Nadler (1984), uses Kirkpatrick's four levels as one of their main evaluation models.
Kirkpatrick himself wrote, "These objectives [referring to his article] will be related to in-house classroom
programs, one of the most common forms of training. Many of the principles and procedures applies to all
kinds of training activities, such as performance review, participation in outside programs, programmed
instruction, and the reading of selected books" (Craig, 1996, p. 294).
Improving the Four Levels

Because of its age and with all the new technology advances, Kirkpatrick's model is often criticized for
being too old and simple. Yet, almost five decades after its introduction, there has not been a viable option
to replace it. And I believe the reason is that Kirkpatrick basically nailed it, but he did get a
few things wrong:
Motivation, Not Reaction

When a learner goes through a learning process, such as an e-learning course, informal learning episode, or
using a job performance aid, the learner has to make a decision as to whether he or she will pay attention to
it. If the goal or task is judged as important and doable, then the learner is normally motivated to engage in
it (Markus & Ruvolo, 1990). However, if the task is presented as low relevance or there is a low probability
of success, then a negative effect is generated and motivation for task engagement is low. In addition,
research on Reaction evaluations generally shows that it is not a valid measurement for success (see the last
section, Criticisms).
This differs from Kirkpatrick (1996), who wrote that reaction was how well the learners liked a particular
learning process. However, the less relevant the learning package is to a learner, the more effort
has to be put into the design and presentation of the learning package. That is, if it is not relevant to the
learner, then the learning package has to hook the learner through slick design, humor, games, etc. This is
not to say that design, humor, or games are unimportant; however, their use in a learning package should be
to promote or aid the learning process rather than just make it fun. And if a learning package is built of
sound purpose and design, then it should support the learners in bridging a performance gap. Hence, they
should be motivated to learn; if not, something dreadfully went wrong during the planning and design
processes! If you find yourself having to hook the learners through slick design, then you probably need to
reevaluate the purpose of your learning processes.
Performance, Not Behavior

As Gilbert noted (1998), performance is a better objective than behavior because performance has two
aspects: behavior being the means and its consequence being the end... and it is the end we are mostly
concerned with.
Flipping it into a Better Model

The model is upside down as it places the two most important items last: results and behavior, which
basically imprints the importance of order in most people's heads. Thus by flipping it upside down and
adding the above changes we get:
Result - What impact (outcome or result) will improve our business?
Performance - What do the employees have to perform in order to create the desired impact?
Learning - What knowledge, skills, and resources do they need in order to perform? (courses or
classrooms are the LAST answer, see Selecting the Instructional Setting)
Motivation - What do they need to perceive in order to learn and perform? (Do they see a need for the
desired performance?)
This makes it both a planning and evaluation tool which can be used as a troubleshooting heuristic
(Chyung, 2008).
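One way to picture the troubleshooting use of the flipped model is as an ordered checklist. The sketch below is only an illustration: the function name `first_gap`, the yes/no check values, and the idea of walking the levels in reverse planning order (from Motivation back up to Result) are assumptions made for this example, not a procedure taken from Kirkpatrick or Chyung.

```python
# Hypothetical sketch: the flipped four levels as an ordered checklist.
# Planning runs top-down (Result first); this example assumes that
# troubleshooting walks the same list bottom-up (Motivation first).
PLANNING_ORDER = ["Result", "Performance", "Learning", "Motivation"]


def first_gap(checks):
    """Return the first level, in reverse planning order, whose check failed.

    `checks` maps a level name to True (looks fine) or False (problem found).
    A level missing from `checks` is treated as a problem. Returns None when
    every level passes.
    """
    for level in reversed(PLANNING_ORDER):
        if not checks.get(level, False):
            return level
    return None


# Illustrative data only: learners are motivated and have learned the
# material, but job performance and business results have not improved.
example_checks = {
    "Motivation": True,
    "Learning": True,
    "Performance": False,
    "Result": False,
}
print(first_gap(example_checks))
```

In this made-up case the walk stops at Performance, which suggests looking at what keeps learners from applying the skills on the job before questioning the business goal itself.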
Compare criterion and norm-referenced tests for MR students

What are the advantages and disadvantages of norm- and criterion-related assessments and formative and
summative evaluations?
"When the cook tastes the soup, that's formative; when the guests taste the soup, that's summative."
Norm- and criterion-related assessments MUST be used in both types of evaluation. Criterion-referenced
refers to how our student measures up to some standard set by an outside source.
For example, a criterion would be to be able to jump a certain height, or to read a certain set of words,
which is probably not a realistic criterion for a mentally retarded (MR) student. Maybe one criterion would
be that the student be able to name his/her colors by a certain time.
Norm-referenced tests are tests which compare the student being tested with all other students. Norm-
referenced tests are used to classify students, to place them. MR students by definition do not test as well as
other students his/her age.
The advantage of a norm-referenced test is that it shows us how our student is doing related to other
students across the country. A disadvantage is that they are standardized and do not show small increments
of gain. They are good for using for placement at the beginning and then again four or six months later, or
at the end of the year. This will show growth over the period of time.
Norm-referenced (also called standardized) tests, along with informal observational
evaluation, are useful for showing student growth over time. They aren't to be used for grading, though they
can be one element in a total grade. One must remember we can't expect great growth, if any, over short
periods of time, particularly as shown on a norm-referenced test.
The definition of retarded is "slowed." That means that the growth of our students is slowed, but in most
cases, for many things and for most students, not stopped. As a matter of fact, it is unlikely even for a
"normal" population of students to show much growth after a semester's time. These tests are not
intended for measuring small increments of gain.
Criterion-related tests are nice because we can see just what our student accomplished. So now, after three
months, s/he can recognize 35 more words, or maybe 65 more words than s/he could before. The student
can name all the colors. The student is now putting away toys where they belong where before s/he either
would not or could not.
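The contrast between the two kinds of score can be made concrete with a little arithmetic. The sketch below is illustrative only: the norming-group scores, the student's word counts, and the 50-word target are invented numbers, and `percentile_rank` and `criterion_report` are hypothetical helper names. The percentile rank uses the common convention of counting ties as half.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, norm_scores):
    """Norm-referenced view: percent of the norming group scoring below
    `score`, with scores equal to `score` counted as half."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)
    equal = bisect_right(ordered, score) - below
    return 100.0 * (below + 0.5 * equal) / len(ordered)


def criterion_report(before, after, target):
    """Criterion-referenced view: raw gain and whether a fixed target is met."""
    return {"gain": after - before, "met_target": after >= target}


# Invented data: sight words recognized by a same-age norming group.
norm_group = [120, 150, 180, 200, 220, 250, 260, 300, 310, 350]

# Invented student: 20 words at the start, 55 words three months later.
rank_before = percentile_rank(20, norm_group)
rank_after = percentile_rank(55, norm_group)
report = criterion_report(before=20, after=55, target=50)

print(rank_before, rank_after)  # both ranks are the same
print(report)                   # the gain of 35 words is visible here
```

With these numbers the percentile rank does not move at all, while the criterion report shows a gain of 35 words and a met target, which is the "small increments of gain" point made above.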
What is Authentic Assessment?

Definitions
A form of assessment in which students are asked to perform real-world tasks that demonstrate meaningful
application of essential knowledge and skills -- Jon Mueller
"...Engaging and worthy problems or questions of importance, in which students must use knowledge to
fashion performances effectively and creatively. The tasks are either replicas of or analogous to the kinds of
problems faced by adult citizens and consumers or professionals in the field." -- Grant Wiggins
(Wiggins, 1993, p. 229).
"Performance assessments call upon the examinee to demonstrate specific skills and competencies, that is,
to apply the skills and knowledge they have mastered." -- Richard J. Stiggins (Stiggins, 1987, p. 34).
What does Authentic Assessment look like?

An authentic assessment usually includes a task for students to perform and a rubric by which their
performance on the task will be evaluated.
How is Authentic Assessment similar to/different from Traditional Assessment?

The following comparison is somewhat simplistic, but I hope it illuminates the different assumptions of the
two approaches to assessment.
Traditional Assessment
By "traditional assessment" (TA) I am referring to the forced-choice measures of multiple-choice tests, fill-
in-the-blanks, true-false, matching and the like that have been and remain so common in education.
Students typically select an answer or recall information to complete the assessment. These tests may be
standardized or teacher-created. They may be administered locally or statewide, or internationally.
Behind traditional and authentic assessments is a belief that the primary mission of schools is to help
develop productive citizens. That is the essence of most mission statements I have read. From this
common beginning, the two perspectives on assessment diverge. Essentially, TA is grounded in
educational philosophy that adopts the following reasoning and practice:
1. A school's mission is to develop productive citizens.
2. To be a productive citizen an individual must possess a certain body of knowledge and skills.
3. Therefore, schools must teach this body of knowledge and skills.
4. To determine if it is successful, the school must then test students to see if they acquired the knowledge
and skills.
In the TA model, the curriculum drives assessment. "The" body of knowledge is determined first. That
knowledge becomes the curriculum that is delivered. Subsequently, the assessments are developed and
administered to determine if acquisition of the curriculum occurred.
Authentic Assessment
In contrast, authentic assessment (AA) springs from the following reasoning and practice:
1. A school's mission is to develop productive citizens.
2. To be a productive citizen, an individual must be capable of performing meaningful tasks in the real
world.
3. Therefore, schools must help students become proficient at performing the tasks they will encounter
when they graduate.
4. To determine if it is successful, the school must then ask students to perform meaningful tasks that
replicate real-world challenges to see if students are capable of doing so.
Thus, in AA, assessment drives the curriculum. That is, teachers first determine the tasks that students will
perform to demonstrate their mastery, and then a curriculum is developed that will enable students to
perform those tasks well, which would include the acquisition of essential knowledge and skills. This has
been referred to as planning backwards (e.g., McDonald, 1992).
If I were a golf instructor and I taught the skills required to perform well, I would not assess my students'
performance by giving them a multiple-choice test. I would put them out on the golf course and ask them
to perform. Although this is obvious with athletic skills, it is also true for academic subjects. We can teach
students how to do math, do history and do science, not just know them. Then, to assess what our students
have learned, we can ask students to perform tasks that "replicate the challenges" faced by those using
mathematics, doing history or conducting scientific investigation.
Authentic Assessment Complements Traditional Assessment

But a teacher does not have to choose between AA and TA. It is likely that some mix of the two will best
meet your needs. To use a silly example, if I had to choose a chauffeur from between someone who passed
the driving portion of the driver's license test but failed the written portion or someone who failed the
driving portion and passed the written portion, I would choose the driver who most directly demonstrated
the ability to drive, that is, the one who passed the driving portion of the test. However, I would prefer a
driver who passed both portions. I would feel more comfortable knowing that my chauffeur had a good
knowledge base about driving (which might best be assessed in a traditional manner) and was able to apply
that knowledge in a real context (which could be demonstrated through an authentic assessment).
Defining Attributes of Traditional and Authentic Assessment

Another way that AA is commonly distinguished from TA is in terms of its defining attributes. Of course,
TAs as well as AAs vary considerably in the forms they take. But, typically, along the continuums of
attributes listed below, TAs fall more towards the left end of each continuum and AAs fall more towards
the right end.
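Traditional ---------------------- Authentic
Selecting a Response ------------- Performing a Task
Contrived ------------------------ Real-life
Recall/Recognition --------------- Construction/Application
Teacher-structured --------------- Student-structured
Indirect Evidence ---------------- Direct Evidence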
Let me clarify the attributes by elaborating on each in the context of traditional and authentic assessments:
Selecting a Response to Performing a Task: On traditional assessments, students are typically given
several choices (e.g., a, b, c or d; true or false; which of these match with those) and asked to select the right
answer. In contrast, authentic assessments ask students to demonstrate understanding by performing a more
complex task usually representative of more meaningful application.
Contrived to Real-life: It is not very often in life outside of school that we are asked to select from four
alternatives to indicate our proficiency at something. Tests offer these contrived means of assessment to
increase the number of times you can be asked to demonstrate proficiency in a short period of time. More
commonly in life, as in authentic assessments, we are asked to demonstrate proficiency by doing
something.
Recall/Recognition of Knowledge to Construction/Application of Knowledge: Well-designed
traditional assessments (i.e., tests and quizzes) can effectively determine whether or not students have
acquired a body of knowledge. Thus, as mentioned above, tests can serve as a nice complement to authentic
assessments in a teacher's assessment portfolio. Furthermore, we are often asked to recall or recognize facts
and ideas and propositions in life, so tests are somewhat authentic in that sense. However, the
demonstration of recall and recognition on tests is typically much less revealing about what we really know
and can do than when we are asked to construct a product or performance out of facts, ideas and
propositions. Authentic assessments often ask students to analyze, synthesize and apply what they have
learned in a substantial manner, and students create new meaning in the process as well.
Teacher-structured to Student-structured: When completing a traditional assessment, what a student
can and will demonstrate has been carefully structured by the person(s) who developed the test. A student's
attention will understandably be focused on and limited to what is on the test. In contrast, authentic
assessments allow more student choice and construction in determining what is presented as evidence of
proficiency. Even when students cannot choose their own topics or formats, there are usually multiple
acceptable routes towards constructing a product or performance. Obviously, assessments more carefully
controlled by the teachers offer advantages and disadvantages. Similarly, more student-structured tasks
have strengths and weaknesses that must be considered when choosing and designing an assessment.
Indirect Evidence to Direct Evidence: Even if a multiple-choice question asks a student to analyze or
apply facts to a new situation rather than just recall the facts, and the student selects the correct answer,
what do you now know about that student? Did that student get lucky and pick the right answer? What
thinking led the student to pick that answer? We really do not know. At best, we can make some inferences
about what that student might know and might be able to do with that knowledge. The evidence is very
indirect, particularly for claims of meaningful application in complex, real-world situations. Authentic
assessments, on the other hand, offer more direct evidence of application and construction of knowledge.
As in the golf example above, putting a golf student on the golf course to play provides much more direct
evidence of proficiency than giving the student a written test. Can a student effectively critique the
arguments someone else has presented (an important skill often required in the real world)? Asking a
student to write a critique should provide more direct evidence of that skill than asking the student a series
of multiple-choice, analytical questions about a passage, although both assessments may be useful.
Teaching to the Test
These two different approaches to assessment also offer different advice about teaching to the test. Under
the TA model, teachers have been discouraged from teaching to the test. That is because a test usually
assesses a sample of students' knowledge and understanding and assumes that students' performance on the
sample is representative of their knowledge of all the relevant material. If teachers focus primarily on the
sample to be tested during instruction, then good performance on that sample does not necessarily reflect
knowledge of all the material. So, teachers hide the test so that the sample is not known beforehand, and
teachers are admonished not to teach to the test.
With AA, teachers are encouraged to teach to the test. Students need to learn how to perform well on
meaningful tasks. To aid students in that process, it is helpful to show them models of good (and not so
good) performance. Furthermore, the student benefits from seeing the task rubric ahead of time as well. Is
this "cheating"? Will students then just be able to mimic the work of others without truly understanding
what they are doing? Authentic assessments typically do not lend themselves to mimicry. There is not one
correct answer to copy. So, by knowing what good performance looks like, and by knowing what specific
characteristics make up good performance, students can better develop the skills and understanding
necessary to perform well on these tasks. (For further discussion of teaching to the test, see Bushweller.)
Alternative Names for Authentic Assessment

You can also learn something about what AA is by looking at the other common names for this form of
assessment. For example, AA is sometimes referred to as
Performance Assessment (or Performance-based) -- so-called because students are asked to perform
meaningful tasks. This is the other most common term for this type of assessment. Some educators
distinguish performance assessment from AA by defining performance assessment as performance-based
as Stiggins has above but with no reference to the authentic nature of the task (e.g., Meyer, 1992). For
these educators, authentic assessments are performance assessments using real-world or authentic tasks
or contexts. Since we should not typically ask students to perform work that is not authentic in nature, I
choose to treat these two terms synonymously.
Alternative Assessment -- so-called because AA is an alternative to traditional assessments.
Direct Assessment -- so-called because AA provides more direct evidence of meaningful application of
knowledge and skills. If a student does well on a multiple-choice test we might infer indirectly that the
student could apply that knowledge in real-world contexts, but we would be more comfortable making
that inference from a direct demonstration of that application such as in the golfing example above.
Types of Tests

Norm-Referenced
Standardized tests compare students' performance to that of a norming or sample group who are in the
same grade or are of the same age. Students' performance is communicated in percentile ranks,
grade-equivalent scores, normal curve equivalents, scaled scores, or stanine scores.
Examples: Iowa Tests; SAT; DRP; ACT

Criterion-Referenced
A student's performance is measured against a standard. One form of criterion-referenced assessment is
the benchmark, a description of a key task that students are expected to perform.
Examples: DIBELS; Chapter tests; Driver's License Test; FCAT (Florida Comprehensive Assessment Test)

Survey
Survey tests typically provide an overview of general comprehension and word knowledge.
Examples: Interest surveys; KWL; Learning Styles Inventory

Diagnostic Tools
Diagnostic tests assess a number of areas in greater depth.
Examples: Woodcock-Johnson; BRI; "The Fox in the Box"

Formal Tests
Formal tests may be standardized. They are designed to be given according to a standard set of
circumstances, they have time limits, and they have sets of directions which are to be followed exactly.
Examples: SAT; FCAT; ACT

Informal Tests
Informal tests generally do not have a set of standard directions. They have a great deal of flexibility in
how they are administered. They are constructed by teachers and have unknown validity and reliability.
Examples: Review games; Quizzes

Static (Summative) Tests
Measure what the student has learned.
Examples: End-of-chapter tests; Final examinations; Standardized state tests

Dynamic (Formative) Tests
Measure the students' grasp of material that is currently being taught. Can also measure readiness.
Formative tests help guide and inform instruction and learning.
Examples: Quizzes; Homework; Portfolios

Lawrence Kohlberg's stages of moral development constitute an adaptation of a psychological theory
originally conceived by the Swiss psychologist Jean Piaget. Kohlberg began work on this topic while a
psychology graduate student at the University of Chicago[1] in 1958, and expanded and developed this
theory throughout his life.
The theory holds that moral reasoning, the basis for ethical behavior, has six identifiable developmental
stages, each more adequate at responding to moral dilemmas than its predecessor.[2] Kohlberg followed the
development of moral judgment far beyond the ages studied earlier by Piaget,[3] who also claimed that logic
and morality develop through constructive stages.[2] Expanding on Piaget's work, Kohlberg determined that
the process of moral development was principally concerned with justice, and that it continued throughout
the individual's lifetime,[4] a notion that spawned dialogue on the philosophical implications of such
research.[5][6]
The six stages of moral development are grouped into three levels: pre-conventional morality, conventional
morality, and post-conventional morality.
Stages
Kohlberg's six stages can be more generally grouped into three levels of two stages each: pre-conventional,
conventional and post-conventional.[7][8][9] Following Piaget's constructivist requirements for a stage model,
as described in his theory of cognitive development, it is extremely rare to regress in stages, that is, to lose
the use of higher stage abilities.[14][15] Stages cannot be skipped; each provides a new and necessary
perspective, more comprehensive and differentiated than its predecessors but integrated with them.[14][15]
Level 1 (Pre-Conventional)
1. Obedience and punishment orientation
(How can I avoid punishment?)
2. Self-interest orientation
(What's in it for me?)
(Paying for a benefit)
Level 2 (Conventional)
3. Interpersonal accord and conformity
(Social norms)
(The good boy/girl attitude)
4. Authority and social-order maintaining orientation
(Law and order morality)
Level 3 (Post-Conventional)
5. Social contract orientation
6. Universal ethical principles
(Principled conscience)
The understanding gained in each stage is retained in later stages, but may be regarded by those in later
stages as simplistic, lacking sufficient attention to detail.
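The grouping of six stages into three levels of two stages each can be written down as a small lookup. This is only a sketch of the structure listed above; the names `STAGES`, `LEVELS`, and `level_of` are made up for the example.

```python
# Stage names and level names as listed in the outline above.
STAGES = {
    1: "Obedience and punishment orientation",
    2: "Self-interest orientation",
    3: "Interpersonal accord and conformity",
    4: "Authority and social-order maintaining orientation",
    5: "Social contract orientation",
    6: "Universal ethical principles",
}
LEVELS = {1: "Pre-Conventional", 2: "Conventional", 3: "Post-Conventional"}


def level_of(stage):
    """Return the level name for a stage number from 1 to 6.

    Each level holds two consecutive stages, so stages 1-2 map to level 1,
    stages 3-4 to level 2, and stages 5-6 to level 3.
    """
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    return LEVELS[(stage + 1) // 2]


for number, name in STAGES.items():
    print(f"Stage {number} ({name}): {level_of(number)}")
```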
Pre-conventional
The pre-conventional level of moral reasoning is especially common in children, although adults can also
exhibit this level of reasoning. Reasoners at this level judge the morality of an action by its direct
consequences. The pre-conventional level consists of the first and second stages of moral development, and
is solely concerned with the self in an egocentric manner. A child with pre-conventional morality has not
yet adopted or internalized society's conventions regarding what is right or wrong, but instead focuses
largely on external consequences that certain actions may bring.[7][8][9]
In Stage one (obedience and punishment driven), individuals focus on the direct consequences of their
actions on themselves. For example, an action is perceived as morally wrong because the perpetrator is
punished. "The last time I did that I got spanked so I will not do it again." The worse the punishment for the
act is, the more "bad" the act is perceived to be.[16] This can give rise to an inference that even innocent
victims are guilty in proportion to their suffering. It is "egocentric," lacking recognition that others' points
of view are different from one's own.[17] There is "deference to superior power or prestige."[17]
An example of obedience and punishment driven morality would be a child refusing to do something
because it is wrong and the consequences could result in punishment. For example, a child's classmate
tries to dare the child into playing hooky from school. The child would apply obedience and punishment
driven morality by refusing to play hooky because he would get punished. Another example of obedience
and punishment driven morality is when a child refuses to cheat on a test because the child would get
punished.
Stage two (self-interest driven) expresses the "what's in it for me" position, in which right behavior is
defined by whatever the individual believes to be in their best interest but understood in a narrow way
which does not consider one's reputation or relationships to groups of people. Stage two reasoning shows a
limited interest in the needs of others, but only to a point where it might further the individual's own
interests. As a result, concern for others is not based on loyalty or intrinsic respect, but rather a "You
scratch my back, and I'll scratch yours" mentality.[2] The lack of a societal perspective in the pre-
conventional level is quite different from the social contract (stage five), as all actions have the purpose of
serving the individual's own needs or interests. For the stage two theorist, the world's perspective is often
seen as moral relativism.
An example of self-interest driven morality is when a child is asked by his parents to do a chore. The child
asks "what's in it for me?" The parents would offer the child an incentive by giving the child an allowance
to pay them for their chores. The child is motivated to do chores for self-interest. Another example of
self-interest driven morality is when a child does their homework in exchange for better grades and rewards
from their parents.
Conventional
The conventional level of moral reasoning is typical of adolescents and adults. To reason in a conventional
way is to judge the morality of actions by comparing them to society's views and expectations. The
conventional level consists of the third and fourth stages of moral development. Conventional morality is
characterized by an acceptance of society's conventions concerning right and wrong. At this level an
individual obeys rules and follows society's norms even when there are no consequences for obedience or
disobedience. Adherence to rules and conventions is somewhat rigid, however, and a rule's appropriateness
or fairness is seldom questioned.[7][8][9]
In Stage three (good intentions as determined by social consensus), the self enters society by conforming
to social standards. Individuals are receptive to approval or disapproval from others as it reflects society's
views. They try to be a "good boy" or "good girl" to live up to these expectations,[2] having learned that
being regarded as good benefits the self. Stage three reasoning may judge the morality of an action by
evaluating its consequences in terms of a person's relationships, which now begin to include things like
respect, gratitude and the "golden rule". "I want to be liked and thought well of; apparently, not being
naughty makes people like me." Conforming to the rules for one's social role is not yet fully understood.
The intentions of actors play a more significant role in reasoning at this stage; one may feel more forgiving
if one thinks, "they mean well..."[2]
In Stage four (authority and social order obedience driven), it is important to obey laws, dictums and social
conventions because of their importance in maintaining a functioning society. Moral reasoning in stage
four is thus beyond the need for individual approval exhibited in stage three. A central ideal or ideals often
prescribe what is right and wrong. If one person violates a law, perhaps everyone would; thus there is an
obligation and a duty to uphold laws and rules. When someone does violate a law, it is morally wrong;
culpability is thus a significant factor in this stage as it separates the bad domains from the good ones. Most
active members of society remain at stage four, where morality is still predominantly dictated by an outside
force.[2]
Post-Conventional
The post-conventional level, also known as the principled level, is marked by a growing realization that
individuals are separate entities from society, and that the individual's own perspective may take
precedence over society's view; individuals may disobey rules inconsistent with their own principles. Post-
conventional moralists live by their own ethical principles, principles that typically include such basic
human rights as life, liberty, and justice. People who exhibit post-conventional morality view rules as
useful but changeable mechanisms; ideally, rules can maintain the general social order and protect human
rights. Rules are not absolute dictates that must be obeyed without question. Because post-conventional
individuals elevate their own moral evaluation of a situation over social conventions, their behavior,
especially at stage six, can be confused with that of those at the pre-conventional level.
Some theorists have speculated that many people may never reach this level of abstract moral
reasoning.[7][8][9]
In Stage five (social contract driven), the world is viewed as holding different opinions, rights and
values. Such perspectives should be mutually respected as unique to each person or community. Laws are
regarded as social contracts rather than rigid edicts. Those that do not promote the general welfare
should be changed when necessary to meet "the greatest good for the greatest number of people".[8] This
is achieved through majority decision and inevitable compromise. Democratic government is ostensibly
based on stage five reasoning.
In Stage six (universal ethical principles driven), moral reasoning is based on abstract reasoning using
universal ethical principles. Laws are valid only insofar as they are grounded in justice, and a
commitment to justice carries with it an obligation to disobey unjust laws. Legal rights are unnecessary,
as social contracts are not essential for deontic moral action. Decisions are not reached hypothetically
in a conditional way but rather categorically in an absolute way, as in the philosophy of Immanuel
Kant.[18] This involves an individual imagining what they would do in another's shoes, if they believed
what that other person imagines to be true.[19] The resulting consensus is the action taken. In this way
action is never a means but always an end in itself; the individual acts because it is right, and not
because it avoids punishment, is in their best interest, expected, legal, or previously agreed upon.
Although Kohlberg insisted that stage six exists, he found it difficult to identify individuals who
consistently operated at that level.[15]
Montessori education is an educational approach developed by Italian physician and educator Maria
Montessori and characterized by an emphasis on independence, freedom within limits, and respect for a
child's natural psychological, physical, and social development. Although a range of practices exists
under the name "Montessori", the Association Montessori Internationale (AMI) and the American Montessori
Society (AMS) cite these elements as essential:[2][3]
Mixed-age classrooms, with classrooms for children ages 2 or 3 to 6 years old by far the most common
Student choice of activity from within a prescribed range of options
Uninterrupted blocks of work time, ideally three hours
A constructivist or "discovery" model, where students learn concepts from working with materials,
rather than by direct instruction
Specialized educational materials developed by Montessori and her collaborators
Freedom of movement within the classroom
A trained Montessori teacher
Montessori education is fundamentally a model of human

development, and an educational approach based on that model. The
model has two basic principles. First, children and developing adults
engage in psychological self-construction by means of interaction with
their environments. Second, children, especially under the age of six,
have an innate path of psychological development. Based on her
observations, Montessori believed that children at liberty to choose
and act freely within an environment prepared according to her model
would act spontaneously for optimal development.
UnderstandingbyDesign,orUbD,isatoolutilizedforeducationalplanningfocusedon"teachingfor
understanding"advocatedbyJayMcTigheandGrantWigginsintheirUnderstandingbyDesign(1998),
publishedbytheAssociationforSupervisionandCurriculumDevelopment.[1][2]TheemphasisofUbDison
"backwarddesign",thepracticeoflookingattheoutcomesinordertodesigncurriculumunits,
performanceassessments,andclassroominstruction.[3]
"UnderstandingbyDesign"and"UbD"areregisteredtrademarksoftheAssociationforSupervisionand
CurriculumDevelopment("ASCD").AccordingtoWiggins,"ThepotentialofUbDforcurricular
improvementhasstruckachordinAmericaneducation.Over250,000educatorsownthebook.Over
30,000Handbooksareinuse.Morethan150Universityeducationclassesusethebookasatext."[1]As
definedbyWigginsandMcTighe,UnderstandingbyDesignisa"frameworkfordesigningcurriculum
units,performanceassessments,andinstructionthatleadyourstudentstodeepunderstandingofthecontent
youteach,"[4]UbDexpandson"sixfacetsofunderstanding",whichincludestudentsbeingabletoexplain,
interpret,apply,haveperspective,empathize,andhaveselfknowledgeaboutagiventopic.[5]
UnderstandingbyDesignreliesonwhatWigginsandMcTighecall"backwarddesign"(alsoknownas
"backwardsplanning").Teachers,accordingtoUbDproponents,traditionallystartcurriculumplanning
withactivitiesandtextbooksinsteadofidentifyingclassroomlearninggoalsandplanningtowardsthat
goal.Inbackwarddesign,theteacherstartswithclassroomoutcomesandthenplansthecurriculum,
choosingactivitiesandmaterialsthathelpdeterminestudentabilityandfosterstudentlearning. [6]
TheBackwarddesignapproachisdevelopedinthreestages.Stage1startswitheducatorsidentifyingthe
desiredresultsoftheirstudentsbyestablishingtheoverallgoalofthelessonsbyusingcontentstandards,
commoncoreorstatestandards.Inaddition,UbD'sstage1defines"Studentswillunderstandthat..."and
listsessentialquestionsthatwillguidethelearnertounderstanding.Stage1alsofocusesonidentifying
"whatstudentswillknow"andmostimportantly"whatstudentswillbeabletodo".
Difficulty Index - Teachers produce a difficulty index for a test item by

calculating the proportion of students in class who got an item correct.
(The name of this index is counter-intuitive, as one actually gets a
measure of how easy the item is, not the difficulty of the item.) The
larger the proportion, the more students who have learned the content
measured by the item.
C. Item Analysis
After you create your objective assessment items and give your test, how can
you be sure that the items are appropriate (not too difficult and not too easy)?
How will you know if the test effectively differentiates between students who
do well on the overall test and those who do not? An item analysis is a
valuable, yet relatively easy, procedure that teachers can use to answer both of
these questions.
To determine the difficulty level of test items, a measure called the Difficulty
Index is used. This measure asks teachers to calculate the proportion of students
who answered the test item accurately. By looking at each alternative (for
multiple choice), we can also find out if there are answer choices that should be
replaced. For example, let's say you gave a multiple choice quiz and there were
four answer choices (A, B, C, and D). The following table illustrates how many
students selected each answer choice for Question #1 and #2.
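[Table: number of students (out of 30) selecting each answer choice A-D for Questions #1 and #2.
Values recoverable from this copy: Question #1, correct answer 24*, choice A 0; Question #2,
choice A 12*, choice B 13.]
*Denotes correct answer.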
For Question #1, we can see that A was not a very good distractor: no one
selected that answer. We can also compute the difficulty of the item by dividing
the number of students who chose the correct answer (24) by the number of
total students (30). Using this formula, the difficulty of Question #1 (referred to
as p) is equal to 24/30 or .80. A rough "rule of thumb" is that if the item
difficulty is more than .75, it is an easy item; if the difficulty is below .25, it is a
difficult item. Given these parameters, this item could be regarded as moderately
easy, since lots (80%) of students got it correct. In contrast, Question #2 is much
more difficult (12/30 = .40). In fact, on Question #2, more students selected an
incorrect answer (B) than selected the correct answer (A). This item should be
carefully analyzed to ensure that B is an appropriate distractor.
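The difficulty calculation above can be sketched in a few lines of Python. This is only an illustration: the function names `difficulty_index` and `classify_item` are made up for this example, and the .75 and .25 cutoffs are the rough rule of thumb quoted in the text, not fixed standards.

```python
def difficulty_index(num_correct, num_students):
    """Return p, the proportion of students who answered the item correctly."""
    if num_students <= 0:
        raise ValueError("num_students must be positive")
    return num_correct / num_students


def classify_item(p, easy_cutoff=0.75, hard_cutoff=0.25):
    """Label an item using the rough rule of thumb (cutoffs are adjustable)."""
    if p > easy_cutoff:
        return "easy"
    if p < hard_cutoff:
        return "difficult"
    return "moderate"


# Counts taken from the worked example: 30 students in total.
p1 = difficulty_index(24, 30)  # Question #1
p2 = difficulty_index(12, 30)  # Question #2
print(f"Question #1: p = {p1:.2f} ({classify_item(p1)})")
print(f"Question #2: p = {p2:.2f} ({classify_item(p2)})")
```

Remember that a larger p means an easier item, so the "difficulty" index really measures easiness.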
Another measure, the Discrimination Index, refers to how well an assessment
differentiates between high and low scorers. In other words, you should be able
to expect that the high-performing students would select the correct answer for
each question more often than the low-performing students. If this is true, then
the assessment is said to have a positive discrimination index (between 0 and 1),
indicating that students who received a high total score chose the correct
answer for a specific item more often than the students who had a lower overall
score. If, however, you find that more of the low-performing students got a
specific item correct, then the item has a negative discrimination index
(between -1 and 0). Let's look at an example.
Table 2 displays the results of ten questions on a quiz. Note that the students are
arranged with the top overall scorers at the top of the table.
Student    Total Score (%)
Asif       90
Sam        90
Jill       80
Charlie    80
Sonya      70
Ruben      60
Clay       60
Kelley     50
Justin     50
Tonya      40
[The per-question columns of Table 2 (Question 1 onward) are not recoverable from this copy.]
"1" indicates the answer was correct; "0" indicates it was incorrect.
FollowthesestepstodeterminetheDifficultyIndexandtheDiscrimination
Index.
1. After the students are arranged with the

highest overall scores at the top, count the
number of students in the upper and lower
group who got each item correct. For Question
#1, there were 4 students in the top half who
got it correct, and 4 students in the bottom
half.
2. Determine the Difficulty Index by dividing the
number who got it correct by the total number
of students. For Question #1, this would be
8/10 or p=.80.
3. Determine the Discrimination Index by
subtracting the number of students in the
lower group who got the item correct from the
number of students in the upper group who
got the item correct. Then, divide by the
number of students in each group (in this case,
there are five in each group). For Question #1,
that means you would subtract 4 from 4, and
divide by 5, which results in a Discrimination
Index of 0.
4. The answers for Questions 1-3 are provided in

Table 2.
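The steps above can be sketched in Python as follows. The response pattern for Question #1 is hypothetical: it was chosen only so that four students in each half answered correctly, matching the counts in step 1. The function name `item_statistics` is also made up for this example.

```python
def item_statistics(responses):
    """Compute (difficulty, discrimination) for one item.

    `responses` holds 1 (correct) or 0 (incorrect) per student, ordered from
    the highest overall scorer to the lowest. The class is split into an
    upper and a lower half; with an odd class size the middle student is
    left out of both groups.
    """
    n = len(responses)
    half = n // 2
    if half == 0:
        raise ValueError("need at least two students")
    upper_correct = sum(responses[:half])
    lower_correct = sum(responses[-half:])
    difficulty = sum(responses) / n
    discrimination = (upper_correct - lower_correct) / half
    return difficulty, discrimination


# Hypothetical 0/1 pattern for Question #1 (4 correct in each half of 10).
question_1 = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
p, d = item_statistics(question_1)
print(f"Difficulty Index p = {p:.2f}, Discrimination Index = {d:.2f}")
```

An index of 0 means the item did not separate the stronger students from the weaker ones at all.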
A. Bloom's Taxonomy
Questions (items) on quizzes and exams can demand different levels of
thinking skills. For example, some questions might be simple
memorization of facts, and others might require the ability to
synthesize information from several sources to select or construct a
response. Benjamin Bloom created a hierarchy of cognitive skills
(called Bloom's taxonomy) that is often used to categorize the levels
of cognitive involvement (thinking skills) in educational settings. The
taxonomy provides a good structure to assist teachers in writing
objectives and assessments. It can be divided into two levels -- Level I
(the lower level) contains knowledge, comprehension and application;
Level II (the higher level) includes application, analysis, synthesis, and

evaluation (see the diagram below).
Figure1.Bloom'sTaxonomy.
Bloom's taxonomy is also used to guide the development of standardized assessments. For example, in
Florida, about 65% of the questions on the statewide reading test (FCAT) are designed to measure Level II
thinking skills (application, analysis, synthesis, and evaluation). To prepare students for these
standardized tests, classroom assessments must also demand both Level I and II thinking skills.
Integrating higher-level skills into instruction and assessment increases the likelihood that students
will succeed on tests and become better problem solvers.
Sometimes objective tests (such as multiple choice) are criticized because the questions emphasize only
lower-level thinking skills (such as knowledge and comprehension). However, it is possible to address
higher-level thinking skills via objective assessments by including items that focus on genuine
understanding, such as "how" and "why" questions. Multiple choice items that involve scenarios, case
studies, and analogies are also effective for requiring students to apply, analyze, synthesize, and
evaluate information.
B. Writing Selected Response Assessment Items

Selected response (objective) assessment items are very efficient: once the items are created, you can
assess and score a great deal of content rather quickly. Note that the term objective refers to the fact
that each question has a right and wrong answer and that they can be impartially scored. In fact, the
scoring can be automated if you have access to an optical scanner for scoring paper tests or a computer
for computerized tests. However, the construction of these objective items might well include subjective
input by the teacher/creator.
Before you write the assessment items, you should create a blueprint that outlines the content areas and
the cognitive skills you are targeting. One way to do this is to list your instructional objectives,
along with the corresponding cognitive level. For example, the following table has four different
objectives and the corresponding levels of assessment (relative to Bloom's taxonomy). For each objective,
five assessment items will be written, some at Level I and some at Level II. This approach helps to
ensure that all objectives are covered and that several higher-level thinking skills are included in the
assessment.
Objective    Number of Items at Level I (Bloom's Taxonomy)    Number of Items at Level II (Bloom's Taxonomy)
After you have determined how many items you need for each level, you can begin writing the
assessments. There are several forms of selected response assessments, including multiple choice,
matching, and true/false. Regardless of the form you select, be sure the items are clearly worded at the
appropriate reading level and do not include unintentional clues. The validity of your test will suffer
tremendously if the students can't comprehend or read the questions! This section includes a few
guidelines for constructing objective assessment items.
Multiple Choice
Multiple choice questions consist of a stem (question or statement) with several answer choices (the
correct answer plus distractors).
All answer choices should be plausible and homogeneous.
Answer choices should be similar in length and grammatical form.
List answer choices in logical (alphabetical or numerical) order.
Avoid using "All of the Above" options.
Matching
Matching items consist of two lists of words, phrases, or images (often referred to as stems and
responses). Students review the list of stems and match each with a word, phrase, or image from the list
of responses.
Answer choices should be short, homogeneous and arranged in logical order.
Responses should be plausible and similar in length and grammatical form.
Include more response options than stems.
As a general rule, the stems should be longer and the responses should be shorter.
True/False
True/false questions can appear to be easier to write; however, it is difficult to write effective
true/false questions. Also, the reliability of T/F questions is not generally very high because of the
high possibility of guessing. In most cases, T/F questions are not recommended.
Statements should be completely true or completely false.
Use simple, easy-to-follow statements.
Avoid using negatives -- especially double negatives.
Avoid absolutes such as "always; never."
Test Topics
Step 9. Conduct the Item Analysis
Introduction
The item analysis is an important phase in the development of an exam program. In this phase statistical
methods are used to identify any test items that are not working well. If an item is too easy, too
difficult, failing to show a difference between skilled and unskilled examinees, or even scored
incorrectly, an item analysis will reveal it. The two most common statistics reported in an item analysis
are the item difficulty, which is a measure of the proportion of examinees who responded to an item
correctly, and the item discrimination, which is a measure of how well the item discriminates between
examinees who are knowledgeable in the content area and those who are not. An additional analysis that is
often reported is the distractor analysis. The distractor analysis provides a measure of how well each of
the incorrect options contributes to the quality of a multiple choice item. Once the item analysis
information is available, an item review is often conducted.
Item Analysis Statistics

Item Difficulty Index
The item difficulty index is one of the most useful, and most frequently reported, item analysis
statistics. It is a measure of the proportion of examinees who answered the item correctly; for this
reason it is frequently called the p-value. As the proportion of examinees who got the item right, the
p-value might more properly be called the item easiness index, rather than the item difficulty. It can
range between 0.0 and 1.0, with a higher value indicating that a greater proportion of examinees
responded to the item correctly, and it was thus an easier item. For criterion-referenced tests (CRTs),
with their emphasis on mastery testing, many items on an exam form will have p-values of .9 or above.
Norm-referenced tests (NRTs), on the other hand, are designed to be harder overall and to spread out the
examinees' scores. Thus, many of the items on an NRT will have difficulty indexes between .4 and .6.
Item Discrimination Index
The item discrimination index is a measure of how well an item is able to distinguish between examinees
who are knowledgeable and those who are not, or between masters and non-masters. There are actually
several ways to compute an item discrimination, but one of the most common is the point-biserial
correlation. This statistic looks at the relationship between an examinee's performance on the given item
(correct or incorrect) and the examinee's score on the overall test. For an item that is highly
discriminating, in general the examinees who responded to the item correctly also did well on the test,
while in general the examinees who responded to the item incorrectly also tended to do poorly on the
overall test.
The possible range of the discrimination index is -1.0 to 1.0; however, if an item has a discrimination
below 0.0, it suggests a problem. When an item is discriminating negatively, overall the most
knowledgeable examinees are getting the item wrong and the least knowledgeable examinees are getting
the item right. A negative discrimination index may indicate that the item is measuring something other
than what the rest of the test is measuring. More often, it is a sign that the item has been mis-keyed.
When interpreting the value of a discrimination it is important to be aware that there is a relationship
between an item's difficulty index and its discrimination index. If an item has a very high (or very low)
p-value, the potential value of the discrimination index will be much less than if the item has a
mid-range p-value. In other words, if an item is either very easy or very hard, it is not likely to be
very discriminating. A typical CRT, with many high item p-values, may have most item discriminations in
the range of 0.0 to 0.3.
A useful approach when reviewing a set of item discrimination indexes is to also view each item's p-value
at the same time. For example, if a given item has a discrimination index below .1, but the item's
p-value is greater than .9, you may interpret the item as being easy for almost the entire set of
examinees, and probably for that reason not providing much discrimination between high-ability and
low-ability examinees.
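As a rough illustration of the point-biserial correlation mentioned above, the sketch below uses the common textbook form r = (M1 - M0) / s * sqrt(p * q), where M1 and M0 are the mean total scores of examinees who got the item right and wrong, s is the population standard deviation of all total scores, p is the item p-value and q = 1 - p. The data and the function name `point_biserial` are hypothetical, and operational programs may use variants (for example, removing the item from the total score first).

```python
from math import sqrt
from statistics import mean, pstdev


def point_biserial(item_scores, total_scores):
    """Point-biserial correlation between a 0/1 item and total test scores."""
    if len(item_scores) != len(total_scores):
        raise ValueError("both lists must have one entry per examinee")
    right = [t for i, t in zip(item_scores, total_scores) if i == 1]
    wrong = [t for i, t in zip(item_scores, total_scores) if i == 0]
    spread = pstdev(total_scores)
    if not right or not wrong or spread == 0:
        return 0.0  # no variation, so the item cannot discriminate
    p = len(right) / len(item_scores)
    return (mean(right) - mean(wrong)) / spread * sqrt(p * (1 - p))


# Hypothetical data: six examinees, ordered by total score.
totals = [90, 80, 70, 60, 50, 40]
good_item = [1, 1, 1, 0, 0, 0]      # high scorers answer correctly
reversed_item = [0, 0, 0, 1, 1, 1]  # pattern you might see with a mis-keyed item

print(round(point_biserial(good_item, totals), 3))
print(round(point_biserial(reversed_item, totals), 3))
```

The second value is the negative of the first, which is the kind of result that should prompt a check of the answer key.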
Distractor Analysis
One important element in the quality of a multiple choice item is the quality of the item's distractors.
However, neither the item difficulty nor the item discrimination index considers the performance of the
incorrect response options, or distractors. A distractor analysis addresses the performance of these
incorrect response options.
Just as the key, or correct response option, must be definitively correct, the distractors must be
clearly incorrect (or clearly not the "best" option). In addition to being clearly incorrect, the
distractors must also be plausible. That is, the distractors should seem likely or reasonable to an
examinee who is not sufficiently knowledgeable in the content area. If a distractor appears so unlikely
that almost no examinee will select it, it is not contributing to the performance of the item. In fact,
the presence of one or more implausible distractors in a multiple choice item can make the item
artificially far easier than it ought to be.
In a simple approach to distractor analysis, the proportion of examinees who selected each of the
response options is examined. For the key, this proportion is equivalent to the item p-value, or
difficulty. If the proportions are summed across all of an item's response options they will add up to
1.0, or 100% of the examinees' selections.
The proportion of examinees who select each of the distractors can be very informative. For example, it
can reveal an item mis-key. Whenever the proportion of examinees who selected a distractor is greater
than the proportion of examinees who selected the key, the item should be examined to determine if it has
been mis-keyed or double-keyed. A distractor analysis can also reveal an implausible distractor. In CRTs,
where the item p-values are typically high, the proportions of examinees selecting all the distractors
are, as a result, low. Nevertheless, if examinees consistently fail to select a given distractor, this
may be evidence that the distractor is implausible or simply too easy.
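The simple approach described above might look like this in Python. The option counts are hypothetical (30 examinees, key A), and the function name `distractor_analysis` and the wording of the flags are invented for this sketch.

```python
def distractor_analysis(counts, key):
    """Return (proportions, flags) for one multiple choice item.

    `counts` maps each response option to the number of examinees who
    selected it; `key` is the option scored as correct.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no responses recorded")
    proportions = {option: n / total for option, n in counts.items()}
    flags = []
    for option, share in proportions.items():
        if option == key:
            continue
        if share > proportions[key]:
            flags.append((option, "selected more often than the key: check for a mis-key"))
        elif share == 0:
            flags.append((option, "never selected: may be implausible"))
    return proportions, flags


# Hypothetical counts for a four-option item answered by 30 examinees.
counts = {"A": 12, "B": 13, "C": 5, "D": 0}
proportions, flags = distractor_analysis(counts, key="A")
print(proportions)  # the share for the key equals the item p-value
for option, message in flags:
    print(option, "-", message)
```

A flag is only a prompt for human review; the statistics cannot say by themselves whether the key or the distractor is at fault.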
Item Review
Once the item analysis data are available, it is useful to hold a meeting of test developers,
psychometricians, and subject matter experts. During this meeting the items can be reviewed using the
information provided by the item analysis statistics. Decisions can then be made about item changes that
are needed or even items that ought to be dropped from the exam. Any item that has been substantially
changed should be returned to the bank for pretesting before it is again used operationally. Once these
decisions have been made, the exams should be rescored, leaving out any items that were dropped and
using the correct key for any items that were found to have been mis-keyed. This corrected scoring will
be used for the examinees' score reports.
Summary
In the item analysis phase of test development, statistical methods are used to identify potential item
problems. The statistical results should be used along with substantive attention to the item content to
determine if a problem exists and what should be done to correct it. Items that are functioning very
poorly should usually be removed from consideration and the exams rescored before the test results are
released. In other cases, items may still be usable, after modest changes are made to improve their
performance on future exams.
In statistics, a bimodal distribution is a continuous probability

distribution with two different modes. These appear as distinct peaks
(local maxima) in the probability density function, as shown in Figure 1.
How to Compute Mean, Median, Mode, Range, and Standard Deviation
In statistics and data analysis, the mean, median, mode, range, and standard deviation tell researchers
how the data is distributed. Each of the five measures can be calculated with simple arithmetic. The mean
and median indicate the "center" of the data points. The mode is the value or values that occur most
frequently. Range is the span between the smallest value and largest value. Standard deviation measures
how far the data "deviates" from the center, on average. Knowing how to calculate these statistical
measures will help you analyze data from surveys and experiments.
Mean
The arithmetic mean or average of a set of numbers is the expected
value. The mean is calculated by adding up all the values, and then
dividing that sum by the number of values.
For example, suppose a teacher has seven students and records the
following seven test scores for her class: 98, 96, 96, 84, 81, 81, and 73.
The average test score is
(98+96+96+84+81+81+73)/7 = 609/7 = 87.
If one more student entered her class and took the test, the expected
score would be 87.
Median
The median is the middle value in a set of values. To find the median,
order the numbers from largest to smallest, and then choose the value
in the middle. For example, consider the following set of nine numbers:
10, 13, 4, 25, 8, 12, 9, 19, 18
If we arrange them in descending order, we get
25, 19, 18, 13, 12, 10, 9, 8, 4
The middle value is 12, so the median = 12. What if we have a set with
an even number of values? For example, consider the set
1, 2, 3, 4, 5, 6.
Both 3 and 4 are in the middle. In this case, we must take the average
of the two middle numbers. Since (3+4)/2 = 3.5, the median = 3.5.
Mode
The mode of a set is the value or values that occur most frequently.
There can be more than one mode in a set. If there is more than one
mode, you simply list all of the modes; you do not have to average
them. For example, consider the set
10, 10, 4, 8, 10, 8, 3, 9, 14
The number 10 occurs three times, and no other numbers occur as
frequently. Therefore, the mode = 10
Now consider this set
10, 10, 4, 8, 10, 8, 3, 8, 14
Both 10 and 8 occur three times each, and no other numbers occur as
often. Therefore, the modes are 8 and 10.
Range
The range of a set of numbers is the maximum distance between any
two values. In other words, it's the difference between the largest and
smallest values. Knowing the range gives you an idea of how close
together the data points are. For example, consider the set of test
scores
78, 88, 67, 90, 92, 83, 97
The highest test score is 97 and the lowest is 67, therefore the range is
97-67 = 30.
Standard Deviation
The standard deviation is another way to measure how close together
the elements are in a set of data. The s.d. is, roughly, the typical
distance between each data point and the mean (formally, the square
root of the average squared distance). Knowing the standard
deviation gives a more complete picture of the distribution of elements
in a data set. Suppose you have N data points and you label them X1,
X2, X3, ..., XN, and you call the mean μ. There are two formulas for
standard deviation depending on whether your data is a complete set,
or a sample taken from a larger set.
For example, suppose your data is all of the ACT scores of the students
in a small class. Then the standard deviation formula is
$$\sigma = \sqrt{\frac{(X_1-\mu)^2 + (X_2-\mu)^2 + \cdots + (X_N-\mu)^2}{N}}$$
Suppose the scores are 15, 21, 21, 21, 25, 30, and 35. The mean of
this set is 24. The s.d. is
sqrt[((15-24)^2+(21-24)^2+(21-24)^2+(21-24)^2+(25-24)^2+(30-24)^2+(35-24)^2)/7]
= sqrt[266/7]
= sqrt[38]
= 6.16
If you take a random sample of ACT scores from a large school, the
standard deviation formula is
$$s = \sqrt{\frac{(X_1-\mu)^2 + (X_2-\mu)^2 + \cdots + (X_N-\mu)^2}{N-1}}$$
For example, suppose you select ten students at random from a high
school, and their ACT scores are 17, 20, 24, 25, 26, 26, 29, 29, 30 and
32. The average of this set is 25.8. The standard deviation is
sqrt[((17-25.8)^2+(20-25.8)^2+(24-25.8)^2+(25-25.8)^2+(26-25.8)^2
+(26-25.8)^2+(29-25.8)^2+(29-25.8)^2+(30-25.8)^2+(32-25.8)^2)/(10-1)]
= sqrt[(191.6)/(10-1)]
= sqrt[191.6/9]
= sqrt[21.2889]
= 4.61
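The five measures can be checked with Python's standard `statistics` module. The sketch below reuses the numbers from the worked examples; the test scores are the values that appear in the worked sum (609/7). Note that `pstdev` divides by N (complete data set) while `stdev` divides by N-1 (sample).

```python
from statistics import mean, median, multimode, pstdev, stdev

# Mean: the seven test scores as used in the worked sum.
test_scores = [98, 96, 96, 84, 81, 81, 73]
print("mean:", mean(test_scores))

# Median: odd and even numbers of values.
print("median (9 values):", median([10, 13, 4, 25, 8, 12, 9, 19, 18]))
print("median (6 values):", median([1, 2, 3, 4, 5, 6]))

# Mode: multimode returns every value tied for most frequent.
print("mode(s):", multimode([10, 10, 4, 8, 10, 8, 3, 9, 14]))
print("mode(s):", multimode([10, 10, 4, 8, 10, 8, 3, 8, 14]))

# Range: largest value minus smallest value.
range_scores = [78, 88, 67, 90, 92, 83, 97]
print("range:", max(range_scores) - min(range_scores))

# Standard deviation: whole class (population) versus random sample.
whole_class = [15, 21, 21, 21, 25, 30, 35]
random_sample = [17, 20, 24, 25, 26, 26, 29, 29, 30, 32]
print("population s.d.:", round(pstdev(whole_class), 2))
print("sample s.d.:", round(stdev(random_sample), 2))
```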
Mean, Mode, Median, and Standard Deviation
The Mean and Mode
The sample mean is the average and is computed as the sum of all the observed outcomes from the sample
divided by the total number of events. We use x̄ as the symbol for the sample mean. In math terms,
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
where n is the sample size and the x_i correspond to the observed values.
Example
Suppose you randomly sampled six acres in the Desolation Wilderness for a non-indigenous weed and
came up with the following counts of this weed in this region:
34, 43, 81, 106, 106 and 115
We compute the sample mean by adding and dividing by the number of samples, 6.
(34 + 43 + 81 + 106 + 106 + 115) / 6 = 80.83
We can say that the sample mean of non-indigenous weed is 80.83.
The mode of a set of data is the number with the highest frequency. In the above example 106 is the mode,
since it occurs twice and the rest of the outcomes occur only once.
The population mean is the average of the entire population and is usually impossible to compute. We use
the Greek letter μ for the population mean.
Median, and Trimmed Mean
One problem with using the mean is that it often does not depict the typical outcome. If there is one
outcome that is very far from the rest of the data, then the mean will be strongly affected by this
outcome. Such an outcome is called an outlier. An alternative measure is the median. The median is the
middle score. If we have an even number of events we take the average of the two middles. The median is
better for describing the typical value. It is often used for income and home prices.
Example
Suppose you randomly selected 10 house prices in the South Lake Tahoe area. You are interested in the
typical house price. In units of $100,000 the prices were
2.7, 2.9, 3.1, 3.4, 3.7, 4.1, 4.3, 4.7, 4.7, 40.8
If we computed the mean, we would say that the average house price is $744,000. Although this number is
true, it does not reflect the price for available housing in South Lake Tahoe. A closer look at the data
shows that the house valued at 40.8 x $100,000 = $4.08 million skews the data. Instead, we use the
median. Since there is an even number of outcomes, we take the average of the middle two:
(3.7 + 4.1) / 2 = 3.9
The median house price is $390,000. This better reflects what house shoppers should expect to spend.
There is an alternative value that is also resistant to outliers. This is called the trimmed mean, which
is the mean after getting rid of the outliers, or of the top 5% and the bottom 5% of the data. We can
also use the trimmed mean if we are concerned with outliers skewing the data; however, the median is used
more often since more people understand it.
Example:
At a ski rental shop data was collected on the number of rentals on each of ten consecutive Saturdays:
44, 50, 38, 96, 42, 47, 40, 39, 46, 50.
To find the sample mean, add them and divide by 10:
(44 + 50 + 38 + 96 + 42 + 47 + 40 + 39 + 46 + 50) / 10 = 49.2
Notice that the mean value is not a value of the sample.
To find the median, first sort the data:
38, 39, 40, 42, 44, 46, 47, 50, 50, 96
Notice that there are two middle numbers, 44 and 46. To find the median we take the average of the two.
Median = (44 + 46) / 2 = 45
Notice also that the mean is larger than all but three of the data points. The mean is influenced by
outliers while the median is robust.
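To see how the mean, median, and trimmed mean react to the outlier in the ski rental data, here is a small Python sketch. The helper `trimmed_mean` is written for this example; it trims a chosen fraction from each end, and 10% is used below (rather than the 5% mentioned earlier) only because the sample has just ten values, so 10% removes exactly one value per end.

```python
from statistics import mean, median


def trimmed_mean(values, fraction):
    """Mean after dropping `fraction` of the values from each end."""
    if not 0 <= fraction < 0.5:
        raise ValueError("fraction must be at least 0 and below 0.5")
    data = sorted(values)
    k = int(len(data) * fraction)  # number of values dropped per end
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)


rentals = [44, 50, 38, 96, 42, 47, 40, 39, 46, 50]
print("mean:        ", mean(rentals))
print("median:      ", median(rentals))
print("trimmed mean:", trimmed_mean(rentals, 0.10))
```

With the single busy Saturday (96) removed along with the quietest one, the trimmed mean lands close to the median.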
Variance, Standard Deviation and Coefficient of Variation
The mean, mode, median, and trimmed mean do a nice job in telling where the center of the data set is,
but often we are interested in more. For example, a pharmaceutical engineer develops a new drug that
regulates sugar in the blood. Suppose she finds out that the average sugar content after taking the
medication is the optimal level. This does not mean that the drug is effective. There is a possibility
that half of the patients have dangerously low sugar content while the other half have dangerously high
content. Instead of the drug being an effective regulator, it is a deadly poison. What the engineer needs
is a measure of how far the data is spread apart. This is what the variance and standard deviation do.
First we show the formulas for these measurements. Then we will go through the steps on how to use the
formulas.
We define the variance to be
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$$
and the standard deviation to be
$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2}$$
Variance and Standard Deviation: Step by Step
1. Calculate the mean, x̄.
2. Write a table that subtracts the mean from each observed value.
3. Square each of the differences.
4. Add this column.
5. Divide by n - 1, where n is the number of items in the sample. This is the variance.
6. To get the standard deviation we take the square root of the variance.
Example
The owner of the Ches Tahoe restaurant is interested in how much people spend at the restaurant. He
examines 10 randomly selected receipts for parties of four and writes down the following data.
44, 50, 38, 96, 42, 47, 40, 39, 46, 50
He calculated the mean by adding and dividing by 10 to get
x̄ = 49.2
Below is the table for getting the standard deviation:
x        x - 49.2    (x - 49.2)^2
44        -5.2          27.04
50         0.8           0.64
38       -11.2         125.44
96        46.8        2190.24
42        -7.2          51.84
47        -2.2           4.84
40        -9.2          84.64
39       -10.2         104.04
46        -3.2          10.24
50         0.8           0.64
Total                 2599.6
Now
2599.6 / (10 - 1) = 288.8
Hence the variance is about 289 and the standard deviation is the square root of 289 = 17.
Since the standard deviation can be thought of as measuring how far the data values lie from the mean, we
take the mean and move one standard deviation in either direction. The mean for this example was about
49.2 and the standard deviation was 17. We have:
49.2 - 17 = 32.2
and
49.2 + 17 = 66.2
What this means is that most of the patrons probably spend between $32.20 and $66.20.
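The six steps can be followed literally in Python. The sketch below uses the restaurant receipts from the example; the function name `sample_variance_steps` is invented here, and the standard library's `statistics.variance` is used only as a cross-check.

```python
from math import sqrt
from statistics import variance


def sample_variance_steps(values):
    """Follow the step-by-step recipe and return (mean, variance, std dev)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean_value = sum(values) / n                       # step 1
    differences = [x - mean_value for x in values]     # step 2
    squared = [d ** 2 for d in differences]            # step 3
    total = sum(squared)                               # step 4
    var = total / (n - 1)                              # step 5
    return mean_value, var, sqrt(var)                  # step 6


receipts = [44, 50, 38, 96, 42, 47, 40, 39, 46, 50]
mean_value, var, sd = sample_variance_steps(receipts)
print(f"mean = {mean_value:.1f}, variance = {var:.1f}, s.d. = {sd:.1f}")

# Cross-check against the standard library's sample variance.
print("matches statistics.variance:", abs(var - variance(receipts)) < 1e-9)
```

Rounded to whole numbers this gives a variance of about 289 and a standard deviation of about 17, as in the example.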

The sample standard deviation will be denoted by s and the population standard deviation will be denoted
by the Greek letter σ.
The sample variance will be denoted by s^2 and the population variance will be denoted by σ^2.
The variance and standard deviation describe how spread out the data is. If the data all lies close to
the mean, then the standard deviation will be small, while if the data is spread out over a large range
of values, s will be large. Having outliers will increase the standard deviation.
One of the flaws involved with the standard deviation is that it depends on the units that are used. One
way of handling this difficulty is called the coefficient of variation, which is the standard deviation
divided by the mean times 100%:
$$CV = \frac{s}{\bar{x}} \cdot 100\%$$
In the above example, it is
(17 / 49.2) x 100% = 34.6%
This tells us that the standard deviation of the restaurant bills is 34.6% of the mean.
Chebyshev's Theorem
A mathematician named Chebyshev came up with bounds on how much of the data must lie close to the
mean. In particular, for any positive k, the proportion of the data that lies within k standard
deviations of the mean is at least
$$1 - \frac{1}{k^2}$$
(the bound is only informative for k greater than 1). For example, if k = 2 this number is
$$1 - \frac{1}{2^2} = .75$$
This tells us that at least 75% of the data lies within 2 standard deviations of the mean. In the above
example, we can say that at least 75% of the diners spent between
49.2 - 2(17) = 15.2
and
49.2 + 2(17) = 83.2
dollars.
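Chebyshev's bound can be compared with what actually happens in the restaurant data. In this sketch the names `chebyshev_bound` and `fraction_within` are made up for illustration, and the sample standard deviation is used, as in the example above.

```python
from statistics import mean, stdev


def chebyshev_bound(k):
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 0:
        raise ValueError("k must be positive")
    return 1 - 1 / k ** 2


def fraction_within(values, k):
    """Observed proportion of values within k sample s.d. of the sample mean."""
    center = mean(values)
    spread = stdev(values)
    inside = [x for x in values if abs(x - center) <= k * spread]
    return len(inside) / len(values)


receipts = [44, 50, 38, 96, 42, 47, 40, 39, 46, 50]
print("guaranteed by the theorem (k=2):", chebyshev_bound(2))
print("observed in the sample (k=2):   ", fraction_within(receipts, 2))
```

The observed share is at least as large as the bound; the theorem is a worst-case guarantee, so real data usually does better.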
Skewed Data
Data can be "skewed", meaning it tends to have a long tail on one side or the other:
[Figure: three distribution curves labeled Negative Skew, No Skew, and Positive Skew]
Negative Skew?
Why is it called negative skew? Because the long "tail" is on the negative side of the peak.
People sometimes say it is "skewed to the left" (the long tail is on the left hand side).
The mean is also on the left of the peak.
The Normal Distribution has No Skew
A Normal Distribution is not skewed. It is perfectly symmetrical. And the Mean is exactly at the peak.
Positive Skew
And positive skew is when the long tail is on the positive side of the peak, and some people say it is
"skewed to the right". The mean is on the right of the peak value.
Example: Income Distribution
Here is some data I extracted from a recent Census. As you can see it is positively skewed... in fact
the tail continues way past $100,000.
Calculating Skewness
"Skewness"(theamountofskew)canbecalculated,forexampleyoucouldusetheSKEW()functionin
ExcelorOpenOfficeCalc.
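For readers without a spreadsheet, the sketch below computes the adjusted sample skewness, n / ((n-1)(n-2)) times the sum of ((x - mean) / s)^3, which is the formula Excel documents for SKEW(). The function name `sample_skewness` and the data sets are illustrative only.

```python
from statistics import mean, stdev


def sample_skewness(values):
    """Adjusted sample skewness (the formula documented for Excel's SKEW)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    center = mean(values)
    spread = stdev(values)  # sample standard deviation (divides by n - 1)
    if spread == 0:
        raise ValueError("all values are identical")
    cubes = sum(((x - center) / spread) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubes


symmetric = [1, 2, 3, 4, 5]
long_right_tail = [44, 50, 38, 96, 42, 47, 40, 39, 46, 50]
long_left_tail = [-x for x in long_right_tail]

print("symmetric:      ", sample_skewness(symmetric))
print("long right tail:", round(sample_skewness(long_right_tail), 2))
print("long left tail: ", round(sample_skewness(long_left_tail), 2))
```

A positive result means a long tail on the right, a negative result a long tail on the left, and a value near zero means roughly symmetric data.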
Normal Distribution
ThenormaldistributionisaprobabilitydistributionthatassociatesthenormalrandomvariableXwitha
cumulativeprobability.Thenormaldistributionisdefinedbythefollowingequation:
Normal equation. The value of the random variable Y is:

$$Y = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-(x-\mu)^2 / (2\sigma^2)}$$

where X is a normal random variable, μ is the mean, σ is the standard
deviation, π is approximately 3.14159, and e is approximately 2.71828.
The graph of the normal distribution depends on two factors: the mean and the standard deviation. The
mean of the distribution determines the location of the center of the graph, and the standard deviation
determines the height and width of the graph. When the standard deviation is large, the curve is short and wide; when
the standard deviation is small, the curve is tall and narrow. All normal distributions look like a symmetric,
bell-shaped curve, as shown below.
[Figure: two normal curves with different standard deviations]
The curve on the left is shorter and wider than the curve on the right, because the curve on the left has a
bigger standard deviation.
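The effect of the standard deviation on the height of the curve can be checked numerically. The sketch below evaluates the normal equation directly; the function name normal_pdf and the sample values (mean 0, standard deviations 1 and 2) are illustrative assumptions.

```python
import math


def normal_pdf(x, mu, sigma):
    """Height Y of the normal curve at x for mean mu and std dev sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)


# Peak height (at x = mu) for a small and a large standard deviation
peak_narrow = normal_pdf(0, 0, 1)
peak_wide = normal_pdf(0, 0, 2)
print(f"{peak_narrow:.4f}")  # prints 0.3989
print(f"{peak_wide:.4f}")    # prints 0.1995
```

Doubling the standard deviation halves the peak height, which is why the curve with the bigger standard deviation looks shorter and wider.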
The Rorschach test[3] (also known as the Rorschach inkblot test, the Rorschach technique, or simply the
inkblot test) is a psychological test in which subjects' perceptions of inkblots are recorded and then
analyzed using psychological interpretation, complex algorithms, or both. Some psychologists use this test
to examine a person's personality characteristics and emotional functioning. It has been employed to detect
underlying thought disorder, especially in cases where patients are reluctant to describe their thinking
processes openly.[4] The test is named after its creator, Swiss psychologist Hermann Rorschach.
In the 1960s, the Rorschach was the most widely used projective test.[5] In a national survey in the U.S., the
Rorschach was ranked eighth among psychological tests used in outpatient mental health facilities.[6] It is
the second most widely used test by members of the Society for Personality Assessment, and it is requested
by psychiatrists in 25% of forensic assessment cases,[6] usually in a battery of tests that often include the
MMPI-2 and the MCMI-III.[7] In surveys, the use of Rorschach ranges from a low of 20% by correctional
psychologists[8] to a high of 80% by clinical psychologists engaged in assessment services, and 80% of
psychology graduate programs surveyed teach it.[9]
Although the Exner Scoring System (developed since the 1960s) claims to have addressed and often refuted
many criticisms of the original testing system with an extensive body of research,[10] some researchers
continue to raise questions. The areas of dispute include the objectivity of testers, inter-rater reliability, the
verifiability and general validity of the test, bias of the test's pathology scales towards greater numbers of
responses, the limited number of psychological conditions which it accurately diagnoses, the inability to
replicate the test's norms, its use in court-ordered evaluations, and the proliferation of the ten inkblot
images, potentially invalidating the test for those who have been exposed to them.[11]
Existence of God
There are several main positions with regard to the existence of God that one might take:
1. Theism - the belief in the existence of one or more divinities or deities.
   1. Pantheism - the belief that God exists as all things of the
      cosmos, that God is one and all is God; God is immanent.
   2. Panentheism - the belief that God encompasses all things
      of the cosmos but that God is greater than the cosmos;
      God is both immanent and transcendent.
   3. Deism - the belief that God does exist but does not
      interfere with human life and the laws of the universe; God
      is transcendent.
   4. Monotheism - the belief that a single deity exists which
      rules the universe as a separate and individual entity.
   5. Polytheism - the belief that multiple deities exist which rule
      the universe as separate and individual entities.
   6. Henotheism - the belief that multiple deities may or may
      not exist, though there is a single supreme deity.
   7. Henology - the belief that multiple avatars of a deity exist,
      which represent unique aspects of the ultimate deity.
2. Agnosticism - the belief that the existence or non-existence of
   deities or God is currently unknown or unknowable and cannot
   be proven. A weaker form of this might be defined as simply a
   lack of certainty about gods' existence or non-existence.
3. Atheism - the rejection of belief in the existence of deities.[12][13]
   1. Strong atheism is specifically the position that there are no
      deities.[14][15]
   2. Weak atheism is simply the absence of belief that any
      deities exist.[15][16][17]
4. Apatheism - the lack of caring whether or not any supreme being exists.
5. Possibilianism
These are not mutually exclusive positions. For example, agnostic theists choose to believe God exists
while asserting that knowledge of God's existence is inherently unknowable. Similarly, agnostic atheists
reject belief in the existence of all deities, while asserting that whether any such entities exist or not is
inherently unknowable.
Hinduism, Buddhism, Confucianism, and Taoism

The four major religions of the Far East are Hinduism, Buddhism,
Confucianism, and Taoism.
Hinduism
Hinduism, a polytheistic religion and perhaps the oldest of the great
world religions, dates back about 6,000 years. Hinduism comprises so
many different beliefs and rituals that some sociologists have
suggested thinking of it as a grouping of interrelated religions.
Hinduism teaches the concept of reincarnation, the belief that all living organisms continue eternally in
cycles of birth, death, and rebirth. Similarly, Hinduism teaches the caste system, in which a person's
previous incarnations determine that person's hierarchical position in this life. Each caste comes with its
own set of responsibilities and duties, and how well a person executes these tasks in the current life
determines that person's position in the next incarnation.
Hindus acknowledge the existence of both male and female gods, but they believe that the ultimate divine
energy exists beyond these descriptions and categories. The divine soul is present and active in all living
things.
More than 600 million Hindus practice the religion worldwide, though most reside in India. Unlike
Muslims and Christians, Hindus do not usually proselytize (attempt to convert others to their religion).
Buddhism, Confucianism, and Taoism

Three other religions of the Far East are Buddhism, Confucianism,
and Taoism. These ethical religions have no gods like Yahweh or
Allah, but espouse ethical and moral principles designed to improve
the believer's relationship with the universe.
Buddhism originates in the teachings of the Buddha, or the Enlightened One (Siddhartha Gautama), a
6th-century B.C. Hindu prince of southern Nepal. Humans, according to the Buddha, can escape the cycles
of reincarnation by renouncing their earthly desires and seeking a life of meditation and self-discipline. The
ultimate objective of Buddhism is to attain Nirvana, which is a state of total spiritual satisfaction. Like
Hinduism, Buddhism allows religious divergence. Unlike Hinduism, though, Buddhism rejects ritual and the caste
system. While a global religion, Buddhism today is most commonly found in such areas of the Far East as
China, Japan, Korea, Sri Lanka, Thailand, and Burma. A recognized denomination of Buddhism is Zen
Buddhism, which attempts to transmit the ideas of Buddhism without requiring acceptance of all of the
teachings of Buddha.
Confucius, or K'ung Fu-tzu, lived at the same time as the Buddha. Confucius's followers, like those of
Lao-tzu, the founder of Taoism, saw him as a moral teacher and wise man, not a religious god, prophet, or
leader. Confucianism's main goal is the attainment of inner harmony with nature. This includes the
veneration of ancestors. Early on, the ruling classes of China widely embraced Confucianism. Taoism
shares similar principles with Confucianism. The teachings of Lao-tzu stress the importance of meditation
and nonviolence as means of reaching higher levels of existence. While some Chinese still practice
Confucianism and Taoism, these religions have lost much of their impetus due to resistance from today's
Communist government. However, some concepts of Taoism, like reincarnation, have found an expression
in modern New Age religions.
