The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software

By Herb Sutter

The biggest sea change in software development since the OO revolution is knocking at the door, and its name is Concurrency.

This article appeared in Dr. Dobb's Journal, 30(3), March 2005. A much briefer version under the title "The Concurrency Revolution" appeared in C/C++ Users Journal, 23(2), February 2005.

Update note: The CPU trends graph last updated August 2009 to include current data and show the trend continues as predicted. The rest of this article including all text is still original as first posted here in December 2004.

Your free lunch will soon be over. What can you do about it? What are you doing about it?

The major processor manufacturers and architectures, from Intel and AMD to Sparc and PowerPC, have run out of room with most of their traditional approaches to boosting CPU performance. Instead of driving clock speeds and straight-line instruction throughput ever higher, they are turning en masse to hyperthreading and multicore architectures. Both of these features are already available on chips today; in particular, multicore is available on current PowerPC and Sparc IV processors, and is coming in 2005 from Intel and AMD. Indeed, the big theme of the 2004 In-Stat/MDR Fall Processor Forum was multicore devices, as many companies showed new or updated multicore processors. Looking back, it's not much of a stretch to call 2004 the year of multicore.

And that puts us at a fundamental turning point in software development, at least for the next few years and for applications targeting general-purpose desktop computers and low-end servers (which happens to account for the vast bulk of the dollar value of software sold today). In this article, I'll describe the changing face of hardware, why it suddenly does matter to software, and how specifically the concurrency revolution matters to you and is going to change the way you will likely be writing software in the future.

Arguably, the free lunch has already been over for a year or two, only we're just now noticing.

The Free Performance Lunch

There's an interesting phenomenon that's known as "Andy giveth, and Bill taketh away." No matter how fast processors get, software consistently finds new ways to eat up the extra speed. Make a CPU ten times as fast, and software will usually find ten times as much to do (or, in some cases, will feel at liberty to do it ten times less efficiently). Most classes of applications have enjoyed free and regular performance gains for several decades, even without releasing new versions or doing anything special, because the CPU manufacturers (primarily) and memory and disk manufacturers (secondarily) have reliably enabled ever-newer and ever-faster mainstream systems. Clock speed isn't the only measure of performance, or even necessarily a good one, but it's an instructive one: We're used to seeing 500MHz CPUs give way to 1GHz CPUs give way to 2GHz CPUs, and so on. Today we're in the 3GHz range on mainstream computers.

The key question is: When will it end? After all, Moore's Law predicts exponential growth, and clearly exponential growth can't continue forever before we reach hard physical limits; light isn't getting any faster. The growth must eventually slow down and even end. (Caveat: Yes, Moore's Law applies principally to transistor densities, but the same kind of exponential growth has occurred in related areas such as clock speeds. There's even faster growth in other spaces, most notably the data storage explosion, but that important trend belongs in a different article.)

If you're a software developer, chances are that you have already been riding the "free lunch" wave of desktop computer performance. Is your application's performance borderline for some local operations? "Not to worry," the conventional (if suspect) wisdom goes; "tomorrow's processors will have even more throughput, and anyway today's applications are increasingly throttled by factors other than CPU throughput and memory speed (e.g., they're often I/O-bound, network-bound, database-bound)." Right?

Right enough, in the past. But dead wrong for the foreseeable future.

The good news is that processors are going to continue to become more powerful. The bad news is that, at least in the short term, the growth will come mostly in directions that do not take most current applications along for their customary free ride.

Over the past 30 years, CPU designers have achieved performance gains in three main areas, the first two of which focus on straight-line execution flow:

clock speed
execution optimization
cache

Increasing clock speed is about getting more cycles. Running the CPU faster more or less directly means doing the same work faster.

Optimizing execution flow is about doing more work per cycle. Today's CPUs sport some more powerful instructions, and they perform optimizations that range from the pedestrian to the exotic, including pipelining, branch prediction, executing multiple instructions in the same clock cycle(s), and even reordering the instruction stream for out-of-order execution. These techniques are all designed to make the instructions flow better and/or execute faster, and to squeeze the most work out of each clock cycle by reducing latency and maximizing the work accomplished per clock cycle.

Brief aside on instruction reordering and memory models: Note that some of what I just called "optimizations" are actually far more than optimizations, in that they can change the meaning of programs and cause visible effects that can break reasonable programmer expectations. This is significant. CPU designers are generally sane and well-adjusted folks who normally wouldn't hurt a fly, and wouldn't think of hurting your code, normally. But in recent years they have been willing to pursue aggressive optimizations just to wring yet more speed out of each cycle, even knowing full well that these aggressive rearrangements could endanger the semantics of your code. Is this Mr. Hyde making an appearance? Not at all. That willingness is simply a clear indicator of the extreme pressure the chip designers face to deliver ever-faster CPUs; they're under so much pressure that they'll risk changing the meaning of your program, and possibly break it, in order to make it run faster. Two noteworthy examples in this respect are write reordering and read reordering: Allowing a processor to reorder write operations has consequences that are so surprising, and break so many programmer expectations, that the feature generally has to be turned off because it's too difficult for programmers to reason correctly about the meaning of their programs in the presence of arbitrary write reordering. Reordering read operations can also yield surprising visible effects, but that is more commonly left enabled anyway because it isn't quite as hard on programmers, and the demands for performance cause designers of operating systems and operating environments to compromise and choose models that place a greater burden on programmers because that is viewed as a lesser evil than giving up the optimization opportunities.

Finally, increasing the size of on-chip cache is about staying away from RAM. Main memory continues to be so much slower than the CPU that it makes sense to put the data closer to the processor, and you can't get much closer than being right on the die. On-die cache sizes have soared, and today most major chip vendors will sell you CPUs that have 2MB and more of on-board L2 cache. (Of these three major historical approaches to boosting CPU performance, increasing cache is the only one that will continue in the near term. I'll talk a little more about the importance of cache later on.)

Okay. So what does this mean?

A fundamentally important thing to recognize about this list is that all of these areas are concurrency-agnostic. Speedups in any of these areas will directly lead to speedups in sequential (nonparallel, single-threaded, single-process) applications, as well as applications that do make use of concurrency. That's important, because the vast majority of today's applications are single-threaded, for good reasons that I'll get into further below.

Of course, compilers have had to keep up; sometimes you need to recompile your application, and target a specific minimum level of CPU, in order to benefit from new instructions (e.g., MMX, SSE) and some new CPU features and characteristics. But, by and large, even old applications have always run significantly faster, even without being recompiled to take advantage of all the new instructions and features offered by the latest CPUs.

That world was a nice place to be. Unfortunately, it has already disappeared.

Obstacles, and Why You Don't Have 10GHz Today

CPU performance growth as we have known it hit a wall two years ago. Most people have only recently started to notice.

You can get similar graphs for other chips, but I'm going to use Intel data here. Figure 1 graphs the history of Intel chip introductions by clock speed and number of transistors. The number of transistors continues to climb, at least for now. Clock speed, however, is a different story.

Figure 1: Intel CPU Introductions (graph updated August 2009; article text original from December 2004)

Myths and Realities: 2 x 3GHz < 6GHz

So a dual-core CPU that combines two 3GHz cores practically offers 6GHz of processing power. Right?

Wrong. Even having two threads running on two physical processors doesn't mean getting two times the performance. Similarly, most multithreaded applications won't run twice as fast on a dual-core box. They should run faster than on a single-core CPU; the performance gain just isn't linear, that's all.

Why not? First, there is coordination overhead between the cores to ensure cache coherency (a consistent view of cache, and of main memory) and to perform other handshaking. Today, a two- or four-processor machine isn't really two or four times as fast as a single CPU even for multithreaded applications. The problem remains essentially the same even when the CPUs in question sit on the same die.

Second, unless the two cores are running different processes, or different threads of a single process that are well-written to run independently and almost never wait for each other, they won't be well utilized. (Despite this, I will speculate that today's single-threaded applications as actually used in the field could actually see a performance boost for most users by going to a dual-core chip, not because the extra core is actually doing anything useful, but because it is running the adware and spyware that infest many users' systems and are otherwise slowing down the single CPU that user has today. I leave it up to you to decide whether adding a CPU to run your spyware is the best solution to that problem.)

If you're running a single-threaded application, then the application can only make use of one core. There should be some speedup as the operating system and the application can run on separate cores, but typically the OS isn't going to be maxing out the CPU anyway so one of the cores will be mostly idle. (Again, the spyware can share the OS's core most of the time.)

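One common way to put rough numbers on the sidebar's claim is Amdahl's law. The article does not name it; it is brought in here as an outside model. The sketch assumes a fixed fraction of the running time parallelizes perfectly and ignores the coordination overhead described above, so real gains would be lower still. The function name and the sample fractions are illustrative assumptions.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup when only part of the work can use extra cores.

    parallel_fraction: share of the running time that parallelizes (0.0 to 1.0).
    cores: number of cores available.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical workloads, from fully sequential to fully parallel.
    for fraction in (0.0, 0.5, 0.9, 1.0):
        print(f"{fraction:.0%} parallel on 2 cores: "
              f"{amdahl_speedup(fraction, 2):.2f}x")
```

Under this model only a program whose work is entirely parallel reaches 2x on two cores; a single-threaded program (fraction 0.0) gains nothing.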
Around the beginning of 2003, you'll note a disturbing sharp turn in the previous trend toward ever-faster CPU clock speeds. I've added lines to show the limit trends in maximum clock speed; instead of continuing on the previous path, as indicated by the thin dotted line, there is a sharp flattening. It has become harder and harder to exploit higher clock speeds due to not just one but several physical issues, notably heat (too much of it and too hard to dissipate), power consumption (too high), and current leakage problems.

Quick: What's the clock speed on the CPU(s) in your current workstation? Are you running at 10GHz? On Intel chips, we reached 2GHz a long time ago (August 2001), and according to CPU trends before 2003, now in early 2005 we should have the first 10GHz Pentium-family chips. A quick look around shows that, well, actually, we don't. What's more, such chips are not even on the horizon; we have no good idea at all about when we might see them appear.

Well, then, what about 4GHz? We're at 3.4GHz already; surely 4GHz can't be far away? Alas, even 4GHz seems to be remote indeed. In mid-2004, as you probably know, Intel first delayed its planned introduction of a 4GHz chip until 2005, and then in fall 2004 it officially abandoned its 4GHz plans entirely. As of this writing, Intel is planning to ramp up a little further to 3.73GHz in early 2005 (already included in Figure 1 as the upper-right-most dot), but the clock race really is over, at least for now; Intel's and most processor vendors' future lies elsewhere as chip companies aggressively pursue the same new multicore directions.

We'll probably see 4GHz CPUs in our mainstream desktop machines someday, but it won't be in 2005. Sure, Intel has samples of their chips running at even higher speeds in the lab, but only by heroic efforts, such as attaching hideously impractical quantities of cooling equipment. You won't have that kind of cooling hardware in your office any day soon, let alone on your lap while computing on the plane.

TANSTAAFL: Moore's Law and the Next Generation(s)

"There ain't no such thing as a free lunch." (R. A. Heinlein, The Moon Is a Harsh Mistress)

Does this mean Moore's Law is over? Interestingly, the answer in general seems to be no. Of course, like all exponential progressions, Moore's Law must end someday, but it does not seem to be in danger for a few more years yet. Despite the wall that chip engineers have hit in juicing up raw clock cycles, transistor counts continue to explode and it seems CPUs will continue to follow Moore's Law-like throughput gains for some years to come.

The key difference, which is the heart of this article, is that the performance gains are going to be accomplished in fundamentally different ways for at least the next couple of processor generations. And most current applications will no longer benefit from the free ride without significant redesign.

For the near-term future, meaning for the next few years, the performance gains in new chips will be fueled by three main approaches, only one of which is the same as in the past. The near-term future performance growth drivers are:

hyperthreading
multicore
cache

Hyperthreading is about running two or more threads in parallel inside a single CPU. Hyperthreaded CPUs are already available today, and they do allow some instructions to run in parallel. A limiting factor, however, is that although a hyperthreaded CPU has some extra hardware including extra registers, it still has just one cache, one integer math unit, one FPU, and in general just one each of most basic CPU features. Hyperthreading is sometimes cited as offering a 5% to 15% performance boost for reasonably well-written multithreaded applications, or even as much as 40% under ideal conditions for carefully written multithreaded applications. That's good, but it's hardly double, and it doesn't help single-threaded applications.

Multicore is about running two or more actual CPUs on one chip. Some chips, including Sparc and PowerPC, have multicore versions available already. The initial Intel and AMD designs, both due in 2005, vary in their level of integration but are functionally similar. AMD's seems to have some initial performance design advantages, such as better integration of support functions on the same die, whereas Intel's initial entry basically just glues together two Xeons on a single die. The performance gains should initially be about the same as having a true dual-CPU system (only the system will be cheaper because the motherboard doesn't have to have two sockets and associated "glue" chippery), which means something less than double the speed even in the ideal case, and just like today it will boost reasonably well-written multithreaded applications. Not single-threaded ones.

Finally, on-die cache sizes can be expected to continue to grow, at least in the near term. Of these three areas, only this one will broadly benefit most existing applications. The continuing growth in on-die cache sizes is an incredibly important and highly applicable benefit for many applications, simply because space is speed. Accessing main memory is expensive, and you really don't want to touch RAM if you can help it. On today's systems, a cache miss that goes out to main memory often costs 10 to 50 times as much as getting the information from the cache; this, incidentally, continues to surprise people because we all think of memory as fast, and it is fast compared to disks and networks, but not compared to on-board cache which runs at faster speeds. If an application's working set fits into cache, we're golden, and if it doesn't, we're not. That is why increased cache sizes will save some existing applications and breathe life into them for a few more years without requiring significant redesign: As existing applications manipulate more and more data, and as they are incrementally updated to include more code for new features, performance-sensitive operations need to continue to fit into cache. As the Depression-era old-timers will be quick to remind you, "Cache is king."

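The arithmetic behind "space is speed" can be sketched as a weighted average of hits and misses. Only the 10-to-50-times miss penalty comes from the text; the miss rates are made-up illustrations, a cache hit is taken as the unit of cost, and the function name is an assumption.

```python
def average_access_cost(miss_rate, miss_penalty, hit_cost=1.0):
    """Average cost of one memory access, in units of a cache hit.

    miss_rate: fraction of accesses that go out to main memory (0.0 to 1.0).
    miss_penalty: cost of a miss relative to a hit (the text cites 10 to 50).
    """
    return (1.0 - miss_rate) * hit_cost + miss_rate * miss_penalty * hit_cost


if __name__ == "__main__":
    # Hypothetical miss rates: a working set that fits in cache vs. one that spills.
    for miss_rate in (0.01, 0.10, 0.50):
        for penalty in (10, 50):
            cost = average_access_cost(miss_rate, penalty)
            print(f"miss rate {miss_rate:.0%}, penalty {penalty}x: "
                  f"average cost {cost:.2f}x a cache hit")
```

Even a modest miss rate dominates the average once each miss costs tens of hits, which is why a working set that stops fitting in cache hurts so much.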
(Aside: Here's an anecdote to demonstrate "space is speed" that recently hit my compiler team. The compiler uses the same source base for the 32-bit and 64-bit compilers; the code is just compiled as either a 32-bit process or a 64-bit one. The 64-bit compiler gained a great deal of baseline performance by running on a 64-bit CPU, principally because the 64-bit CPU had many more registers to work with and had other code performance features. All well and good. But what about data? Going to 64 bits didn't change the size of most of the data in memory, except that of course pointers in particular were now twice the size they were before. As it happens, our compiler uses pointers much more heavily in its internal data structures than most other kinds of applications ever would. Because pointers were now 8 bytes instead of 4 bytes, a pure data size increase, we saw a significant increase in the 64-bit compiler's working set. That bigger working set caused a performance penalty that almost exactly offset the code execution performance increase we'd gained from going to the faster processor with more registers. As of this writing, the 64-bit compiler runs at the same speed as the 32-bit compiler, even though the source base is the same for both and the 64-bit processor offers better raw processing throughput. Space is speed.)

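To see how pointer size alone can inflate a working set, here is a small back-of-the-envelope sketch. The record layout (two pointers plus a 4-byte payload, padded to pointer alignment) is a hypothetical stand-in, not the compiler's actual data structures, and the function names are assumptions.

```python
def node_size(pointer_bytes, pointer_count, payload_bytes):
    """Size of one record in bytes, rounded up to pointer alignment."""
    raw = pointer_bytes * pointer_count + payload_bytes
    # Ceiling division, so the next record's pointers stay aligned.
    return -(-raw // pointer_bytes) * pointer_bytes


def working_set_bytes(nodes, pointer_bytes, pointer_count=2, payload_bytes=4):
    """Total bytes for `nodes` records of the hypothetical layout."""
    return nodes * node_size(pointer_bytes, pointer_count, payload_bytes)


if __name__ == "__main__":
    nodes = 1_000_000
    for bits, pointer_bytes in ((32, 4), (64, 8)):
        size = working_set_bytes(nodes, pointer_bytes)
        print(f"{bits}-bit build: {size / 1e6:.0f} MB for {nodes:,} nodes")
```

For this pointer-heavy layout the same data doubles in size on the 64-bit build, so a working set that used to fit in cache may no longer do so.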
But cache is it. Hyperthreading and multicore CPUs will have nearly no impact on most current applications.

So what does this change in the hardware mean for the way we write software? By now you've probably noticed the basic answer, so let's consider it and its consequences.

What This Means For Software: The Next Revolution

In the 1990s, we learned to grok objects. The revolution in mainstream software development from structured programming to object-oriented programming was the greatest such change in the past 20 years, and arguably in the past 30 years. There have been other changes, including the most recent (and genuinely interesting) naissance of web services, but nothing that most of us have seen during our careers has been as fundamental and as far-reaching a change in the way we write software as the object revolution.

Until now.

Starting today, the performance lunch isn't free any more. Sure, there will continue to be generally applicable performance gains that everyone can pick up, thanks mainly to cache size improvements. But if you want your application to benefit from the continued exponential throughput advances in new processors, it will need to be a well-written concurrent (usually multithreaded) application. And that's easier said than done, because not all problems are inherently parallelizable and because concurrent programming is hard.

I can hear the howls of protest: "Concurrency? That's not news! People are already writing concurrent applications." That's true. Of a small fraction of developers.

Remember that people have been doing object-oriented programming since at least the days of Simula in the late 1960s. But OO didn't become a revolution, and dominant in the mainstream, until the 1990s. Why then? The reason the revolution happened was primarily that our industry was driven by requirements to write larger and larger systems that solved larger and larger problems and exploited the greater and greater CPU and storage resources that were becoming available. OOP's strengths in abstraction and dependency management made it a necessity for achieving large-scale software development that is economical, reliable, and repeatable.

Similarly, we've been doing concurrent programming since those same dark ages, writing coroutines and monitors and similar jazzy stuff. And for the past decade or so we've witnessed incrementally more and more programmers writing concurrent (multi-threaded, multi-process) systems. But an actual revolution marked by a major turning point toward concurrency has been slow to materialize. Today the vast majority of applications are single-threaded, and for good reasons that I'll summarize in the next section.

By the way, on the matter of hype: People have always been quick to announce "the next software development revolution," usually about their own brand-new technology. Don't believe it. New technologies are often genuinely interesting and sometimes beneficial, but the biggest revolutions in the way we write software generally come from technologies that have already been around for some years and have already experienced gradual growth before they transition to explosive growth. This is necessary: You can only base a software development revolution on a technology that's mature enough to build on (including having solid vendor and tool support), and it generally takes any new software technology at least seven years before it's solid enough to be broadly usable without performance cliffs and other gotchas. As a result, true software development revolutions like OO happen around technologies that have already been undergoing refinement for years, often decades. Even in Hollywood, most genuine "overnight successes" have really been performing for many years before their big break.

Concurrency is the next major revolution in how we write software. Different experts still have different opinions on whether it will be bigger than OO, but that kind of conversation is best left to pundits. For technologists, the interesting thing is that concurrency is of the same order as OO both in the (expected) scale of the revolution and in the complexity and learning curve of the technology.

Benefits and Costs of Concurrency

There are two major reasons for which concurrency, especially multithreading, is already used in mainstream software. The first is to logically separate naturally independent control flows; for example, in a database replication server I designed it was natural to put each replication session on its own thread, because each session worked completely independently of any others that might be active (as long as they weren't working on the same database row). The second and less common reason to write concurrent code in the past has been for performance, either to scalably take advantage of multiple physical CPUs or to easily take advantage of latency in other parts of the application; in my database replication server, this factor applied as well and the separate threads were able to scale well on multiple CPUs as our server handled more and more concurrent replication sessions with many other servers.

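A minimal sketch of the one-thread-per-session structure described above. The `Session` class, `run_sessions` helper, and the upper-casing "replication" step are hypothetical placeholders, not the author's server. Note that in standard CPython the global interpreter lock keeps pure-Python threads from executing bytecode in parallel, so this shows the separation of control flows rather than multi-CPU scaling.

```python
import threading


class Session(threading.Thread):
    """One independent replication session running on its own thread."""

    def __init__(self, session_id, rows):
        super().__init__()
        self.session_id = session_id
        self.rows = rows
        self.replicated = []

    def run(self):
        # Each session touches only its own rows and its own result list.
        self.replicated = [row.upper() for row in self.rows]


def run_sessions(work):
    """Start one thread per session, wait for all, and collect the results."""
    sessions = [Session(sid, rows) for sid, rows in work.items()]
    for session in sessions:
        session.start()
    for session in sessions:
        session.join()
    return {s.session_id: s.replicated for s in sessions}


if __name__ == "__main__":
    print(run_sessions({"site-a": ["r1", "r2"], "site-b": ["r3"]}))
```

Because no session reads or writes another session's data, no locks are needed; the main thread reads the results only after `join()` returns.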
There are, however, real costs to concurrency. Some of the obvious costs are actually relatively unimportant. For example, yes, locks can be expensive to acquire, but when used judiciously and properly you gain much more from the concurrent execution than you lose on the synchronization, if you can find a sensible way to parallelize the operation and minimize or eliminate shared state.

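One way to use locks judiciously is to let each thread work on private data and touch the shared state only once, at the end. The sketch below illustrates that idea under simple assumptions; `parallel_sum` and its chunking scheme are illustrative choices, not something the article prescribes.

```python
import threading


def parallel_sum(values, workers=4):
    """Sum `values` with several threads, taking the lock once per thread."""
    total = 0
    lock = threading.Lock()

    def work(chunk):
        nonlocal total
        partial = sum(chunk)   # private to this thread, so no lock is needed
        with lock:             # one short critical section per thread
            total += partial

    # Give every worker its own slice of the input.
    chunks = [values[i::workers] for i in range(workers)]
    threads = [threading.Thread(target=work, args=(chunk,)) for chunk in chunks]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return total


if __name__ == "__main__":
    print(parallel_sum(list(range(1000))))
```

Locking around every single addition would also be correct, but it would put the shared counter on the critical path of every step instead of once per thread.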
Perhaps the second-greatest cost of concurrency is that not all applications are amenable to parallelization. I'll say more about this later on.

Probably the greatest cost of concurrency is that concurrency really is hard: The programming model, meaning the model in the programmer's head that he needs to reason reliably about his program, is much harder than it is for sequential control flow.

Everybody who learns concurrency thinks they understand it, ends up finding mysterious races they thought weren't possible, and discovers that they didn't actually understand it yet after all. As the developer learns to reason about concurrency, they find that usually those races can be caught by reasonable in-house testing, and they reach a new plateau of knowledge and comfort. What usually doesn't get caught in testing, however, except in shops that understand why and how to do real stress testing, is those latent concurrency bugs that surface only on true multiprocessor systems, where the threads aren't just being switched around on a single processor but where they really do execute truly simultaneously and thus expose new classes of errors. This is the next jolt for people who thought that surely now they know how to write concurrent code: I've come across many teams whose application worked fine even under heavy and extended stress testing, and ran perfectly at many customer sites, until the day that a customer actually had a real multiprocessor machine and then deeply mysterious races and corruptions started to manifest intermittently. In the context of today's CPU landscape, then, redesigning your application to run multithreaded on a multicore machine is a little like learning to swim by jumping into the deep end: going straight to the least forgiving, truly parallel environment that is most likely to expose the things you got wrong. Even when you have a team that can reliably write safe concurrent code, there are other pitfalls; for example, concurrent code that is completely safe but isn't any faster than it was on a single-core machine, typically because the threads aren't independent enough and share a dependency on a single resource which re-serializes the program's execution. This stuff gets pretty subtle.

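The re-serialization pitfall can be made visible with a small experiment that counts how many threads are ever inside the work at the same moment. This is a sketch under stated assumptions: `peak_concurrency` is a hypothetical helper, and the barrier is only a test-rig device that holds the independent threads in place so the overlap is observable; production code would not do this.

```python
import threading


def peak_concurrency(n_threads, share_one_lock):
    """Return the most threads ever inside `work` at the same moment."""
    resource = threading.Lock()       # the single shared dependency
    bookkeeping = threading.Lock()    # protects the two counters below
    all_inside = threading.Barrier(n_threads)
    active = 0
    peak = 0

    def work():
        nonlocal active, peak
        with bookkeeping:
            active += 1
            peak = max(peak, active)
        if not share_one_lock:
            # Test rig only: wait until every thread has entered.
            all_inside.wait(timeout=5)
        with bookkeeping:
            active -= 1

    def worker():
        if share_one_lock:
            with resource:            # the whole task runs under one lock
                work()
        else:
            work()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return peak


if __name__ == "__main__":
    print("one shared lock:", peak_concurrency(4, share_one_lock=True))
    print("independent:    ", peak_concurrency(4, share_one_lock=False))
```

With the shared lock the program is perfectly safe, yet at most one thread is ever doing work, so extra cores would sit idle no matter how many threads are started.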
Just as it is a leap for a structured programmer to learn OO (what's an object? what's a virtual function? how should I use inheritance? and beyond the "whats" and "hows," why are the correct design practices actually correct?), it's a leap of about the same magnitude for a sequential programmer to learn concurrency (what's a race? what's a deadlock? how can it come up, and how do I avoid it? what constructs actually serialize the program that I thought was parallel? how is the message queue my friend? and beyond the "whats" and "hows," why are the correct design practices actually correct?).

The vast majority of programmers today don't grok concurrency, just as the vast majority of programmers 15 years ago didn't yet grok objects. But the concurrent programming model is learnable, particularly if we stick to message- and lock-based programming, and once grokked it isn't that much harder than OO and hopefully can become just as natural. Just be ready and allow for the investment in training and time, for you and for your team.

(I deliberately limit the above to message- and lock-based concurrent programming models. There is also lock-free programming, supported most directly at the language level in Java 5 and in at least one popular C++ compiler. But concurrent lock-free programming is known to be very much harder for programmers to understand and reason about than even concurrent lock-based programming. Most of the time, only systems and library writers should have to understand lock-free programming, although virtually everybody should be able to take advantage of the lock-free systems and libraries those people produce. Frankly, even lock-based programming is hazardous.)

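As a small illustration of message-based programming, the sketch below has one thread own its data outright and receive work only through a queue. The names `consumer`, `square_all`, and the `STOP` sentinel are illustrative assumptions; the queue itself is Python's standard thread-safe `queue.Queue`.

```python
import queue
import threading

STOP = object()  # sentinel telling the consumer there is no more work


def consumer(inbox, results):
    """Own `results` exclusively; receive work only through the queue."""
    while True:
        message = inbox.get()
        if message is STOP:
            break
        results.append(message * message)


def square_all(numbers):
    """Post each number as a message and return the consumer's results."""
    inbox = queue.Queue()
    results = []
    worker = threading.Thread(target=consumer, args=(inbox, results))
    worker.start()
    for n in numbers:      # producer side: just post messages
        inbox.put(n)
    inbox.put(STOP)
    worker.join()          # after join, results is safe to read
    return results


if __name__ == "__main__":
    print(square_all([1, 2, 3]))
```

The application code contains no explicit locks: the only shared object is the queue, and the library handles its synchronization internally.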
What It Means For Us

Okay, back to what it means for us.

1. The clear primary consequence we've already covered is that applications will increasingly need to be concurrent if they want to fully exploit CPU throughput gains that have now started becoming available and will continue to materialize over the next several years. For example, Intel is talking about someday producing 100-core chips; a single-threaded application can exploit at most 1/100 of such a chip's potential throughput. "Oh, performance doesn't matter so much, computers just keep getting faster" has always been a naïve statement to be viewed with suspicion, and for the near future it will almost always be simply wrong.

Now, not all applications (or, more precisely, important operations of an application) are amenable to parallelization. True, some problems, such as compilation, are almost ideally parallelizable. But others aren't; the usual counterexample here is that just because it takes one woman nine months to produce a baby doesn't imply that nine women could produce one baby in one month. You've probably come across that analogy before. But did you notice the problem with leaving the analogy at that? Here's the trick question to ask the next person who uses it on you: Can you conclude from this that the Human Baby Problem is inherently not amenable to parallelization? Usually people relating this analogy err in quickly concluding that it demonstrates an inherently nonparallel problem, but that's actually not necessarily correct at all. It is indeed an inherently nonparallel problem if the goal is to produce one child. It is actually an ideally parallelizable problem if the goal is to produce many children! Knowing the real goals can make all the difference. This basic goal-oriented principle is something to keep in mind when considering whether and how to parallelize your software.

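The analogy's two readings, latency for one result versus throughput for many, can be put into a few lines of arithmetic. This is only a sketch of the idea: the function name is made up, and it assumes each task takes a fixed, indivisible time.

```python
import math


def elapsed_months(children, mothers, months_per_child=9):
    """Wall-clock time when each child takes a fixed, indivisible time."""
    rounds = math.ceil(children / mothers)
    return rounds * months_per_child


if __name__ == "__main__":
    print("1 child, 1 mother:      ", elapsed_months(1, 1), "months")
    print("1 child, 9 mothers:     ", elapsed_months(1, 9), "months")
    print("9 children, 1 mother:   ", elapsed_months(9, 1), "months")
    print("9 children, 9 mothers:  ", elapsed_months(9, 9), "months")
```

Adding workers never shortens the time to the first result, but it divides the time to produce many results, which is exactly the distinction the goal-oriented question draws.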
2. Perhaps a less obvious consequence is that applications are likely to become increasingly CPU-bound. Of course, not every application operation will be CPU-bound, and even those that will be affected won't become CPU-bound overnight if they aren't already, but we seem to have reached the end of the "applications are increasingly I/O-bound or network-bound or database-bound" trend, because performance in those areas is still improving rapidly (gigabit WiFi, anyone?) while traditional CPU performance-enhancing techniques have maxed out. Consider: We're stopping in the 3GHz range for now. Therefore single-threaded programs are likely not to get much faster any more for now except for benefits from further cache size growth (which is the main good news). Other gains are likely to be incremental and much smaller than we've been used to seeing in the past, for example as chip designers find new ways to keep pipelines full and avoid stalls, which are areas where the low-hanging fruit has already been harvested. The demand for new application features is unlikely to abate, and even more so the demand to handle vastly growing quantities of application data is unlikely to stop accelerating. As we continue to demand that programs do more, they will increasingly often find that they run out of CPU to do it unless they can code for concurrency.

There are two ways to deal with this sea change toward concurrency. One is to redesign your applications for concurrency, as above. The other is to be frugal, by writing code that is more efficient and less wasteful. This leads to the third interesting consequence:

3. Efficiency and performance optimization will get more, not less, important. Those languages that already lend themselves to heavy optimization will find new life; those that don't will need to find ways to compete and become more efficient and optimizable. Expect long-term increased demand for performance-oriented languages and systems.

4. Finally, programming languages and systems will increasingly be forced to deal well with concurrency. The Java language has included support for concurrency since its beginning, although mistakes were made that later had to be corrected over several releases in order to do concurrent programming more correctly and efficiently. The C++ language has long been used to write heavy-duty multithreaded systems well, but it has no standardized support for concurrency at all (the ISO C++ standard doesn't even mention threads, and does so intentionally), and so typically the concurrency is of necessity accomplished by using nonportable platform-specific concurrency features and libraries. (It's also often incomplete; for example, static variables must be initialized only once, which typically requires that the compiler wrap them with a lock, but many C++ implementations do not generate the lock.) Finally, there are a few concurrency standards, including pthreads and OpenMP, and some of these support implicit as well as explicit parallelization. Having the compiler look at your single-threaded program and automatically figure out how to parallelize it implicitly is fine and dandy, but those automatic transformation tools are limited and don't yield nearly the gains of explicit concurrency control that you code yourself. The mainstream state of the art revolves around lock-based programming, which is subtle and hazardous. We desperately need a higher-level programming model for concurrency than languages offer today; I'll have more to say about that soon.

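The "initialize only once" requirement mentioned for static variables can be illustrated by analogy. The sketch below is Python, not what any C++ compiler emits, and `OnceOnly` is a hypothetical helper. It deliberately takes the lock on every call rather than attempting a lock-avoiding shortcut, trading a little speed for reasoning that stays simple.

```python
import threading


class OnceOnly:
    """Run a factory at most once, even when many threads ask together."""

    def __init__(self, factory):
        self._factory = factory
        self._lock = threading.Lock()
        self._ready = False
        self._value = None

    def get(self):
        with self._lock:              # every caller takes the lock
            if not self._ready:
                self._value = self._factory()
                self._ready = True
            return self._value


if __name__ == "__main__":
    calls = []
    once = OnceOnly(lambda: calls.append("init") or "settings")
    threads = [threading.Thread(target=once.get) for _ in range(8)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    print(once.get(), "initialized", len(calls), "time(s)")
```

Without the lock, two threads could both see the value as not ready and both run the factory, which is the hazard the parenthetical describes.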
Conclusion

If you haven't done so already, now is the time to take a hard look at the design of your application, determine what operations are CPU-sensitive now or are likely to become so soon, and identify how those places could benefit from concurrency. Now is also the time for you and your team to grok concurrent programming's requirements, pitfalls, styles, and idioms.

A few rare classes of applications are naturally parallelizable, but most aren't. Even when you know exactly where you're CPU-bound, you may well find it difficult to figure out how to parallelize those operations; all the more reason to start thinking about it now. Implicitly parallelizing compilers can help a little, but don't expect much; they can't do nearly as good a job of parallelizing your sequential program as you could do by turning it into an explicitly parallel and threaded version.

Thanks to continued cache growth and probably a few more incremental straight-line control flow optimizations, the free lunch will continue a little while longer; but starting today the buffet will only be serving that one entrée and that one dessert. The filet mignon of throughput gains is still on the menu, but now it costs extra: extra development effort, extra code complexity, and extra testing effort. The good news is that for many classes of applications the extra effort will be worthwhile, because concurrency will let them fully exploit the continuing exponential gains in processor throughput.

Copyright 2009 Herb Sutter
