Qualcomm Details Hexagon 680 DSP in Snapdragon 820: Accelerated Imaging
by Joshua Ho on August 24, 2015 9:00 AM EST
Posted in Mobile, Snapdragon, Qualcomm, SOC
Although we tend not to focus too much on the tertiary aspects of a SoC, they are often important to enabling many aspects of the user experience. DSPs are important for a number of unique applications such as voice processing, audio processing, and other input processing applications. Before we get into the meat of the article, though, it's important to note that the above image is not a die shot or an actual block diagram, but is a very rough approximation of the relative size of each component in the SoC.
Today at Hot Chips, Qualcomm elected to reveal a number of details about their Hexagon 680 DSP, which will ship in the Snapdragon 820. Those that have followed our coverage regarding the Snapdragon 820 ISP features will probably be able to guess that a number of features on the Snapdragon 820 Spectra ISP are enabled through the use of this newer DSP.
http://www.anandtech.com/show/9552/qualcomm-details-hexagon-680-dsp-in-snapdragon-820-accelerated-imaging
1/11
8/21/2016
For those that are unfamiliar with DSPs, the basic idea behind DSPs is that they are a sort of in-between point in architecture design between highly efficient fixed function hardware (think: video decoders) and highly flexible CPUs. DSPs are programmable, but are rigid in design and are designed to do a limited number of tasks well, making them efficient at those tasks relative to a CPU, but more flexible than fixed function hardware. These design goals are typically manifested in DSPs as in-order architectures, which means that there's much less power and area dedicated on silicon to parallelizing code on the fly. This means that while a DSP can do a number of workloads that would otherwise be impossible on a completely fixed function block, you wouldn't want to try to use one to replace a CPU. It's important to emphasize that DSPs are generally more focused on instruction-level parallelism (single-core performance) rather than thread-level parallelism (multi-core performance), so you won't see hundreds or thousands of "cores" in a DSP like you would in a GPU architecture like Maxwell.
Consequently the architecture of DSPs like the Hexagon 680 is relatively alien compared to standard CPUs, as optimization is everything in the applications where DSPs make sense. For example, DSP instruction sets are often VLIW (very long instruction word), in which multiple execution units are driven in parallel with a single instruction. Certain arithmetic operations are also highly accelerated with special instructions in order to enable key algorithms for signal processing such as the Fast Fourier Transform (FFT).
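To make the VLIW idea a little more concrete, here is a minimal Python sketch of how a single VLIW packet executes. It assumes a hypothetical four-slot machine with a tiny made-up instruction set (`add`, `mul`, `mov`); this is not the Hexagon ISA. The property it models is that every slot in a packet reads its operands from the register state as it was before the packet, and all results are committed together, so finding the parallelism is the compiler's job rather than the hardware's.

```python
from typing import Dict, List, Tuple

# Hypothetical instruction format: (opcode, destination, source_a, source_b).
Instr = Tuple[str, str, str, str]

MAX_SLOTS = 4  # assumed 4-wide packet, matching a "4-way VLIW" design

OPS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "mov": lambda a, _b: a,  # second operand ignored
}


def execute_packet(regs: Dict[str, int], packet: List[Instr]) -> Dict[str, int]:
    """Execute one VLIW packet: all slots read the old state, then commit together."""
    if len(packet) > MAX_SLOTS:
        raise ValueError("packet has more instructions than issue slots")
    dests = [dst for _, dst, _, _ in packet]
    if len(set(dests)) != len(dests):
        raise ValueError("two slots in one packet write the same register")
    # Read/execute phase: every slot sees the register file from before the packet.
    results = {dst: OPS[op](regs[a], regs[b]) for op, dst, a, b in packet}
    # Commit phase: all writes land at once in a new register file.
    new_regs = dict(regs)
    new_regs.update(results)
    return new_regs


regs = {"r0": 1, "r1": 2, "r2": 3}
# Swap r0/r1 and multiply them in a single packet; no temporary register is needed
# because each slot reads the pre-packet values.
packet = [("mov", "r0", "r1", "r1"), ("mov", "r1", "r0", "r0"), ("mul", "r2", "r0", "r1")]
print(execute_packet(regs, packet))  # {'r0': 2, 'r1': 1, 'r2': 2}
```

On a conventional sequential machine the same three instructions would clobber `r0` before the second `mov` read it; the packet semantics are what let a single wide instruction drive several execution units at once.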
In the case of the Hexagon 680, one of the key features Qualcomm is focusing on for this launch is the Hexagon Vector Extensions (HVX). HVX is designed to handle significant compute workloads for image processing applications such as virtual reality, augmented reality, image processing, video processing, and computer vision. This means that tasks that might otherwise be running on a relatively power-hungry CPU or GPU can run on a comparatively efficient DSP instead.
The HVX extension to Hexagon has 1024-bit vector data registers, with the ability to address up to four of these slots per instruction, which allows for up to 4096 bits per cycle. It's important to keep in mind that the instruction width is much smaller than this, as this is a single instruction, multiple data (SIMD) unit which applies one operation over multiple chunks of data. There are 32 of these vector registers, which appear to be split between two HVX contexts. There is support for up to 32-bit fixed-point operations, but floating point is not supported in order to reduce die size and power consumption, as the previously mentioned applications for Hexagon 680 don't need floating point support. As DSPs tend to have ISAs tailored for the application, the Hexagon 680 HVX units support sliding window filters, LUTs, and histogram acceleration at the ISA level. The performance of these units is said to be sufficient for 4K video post-processing, 20MP camera burst processing, and other applications with similar compute requirements.
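The register widths above translate directly into lane counts. The sketch below works through that arithmetic and then applies a 3-tap sliding-window filter across one 1024-bit vector's worth of 8-bit pixels using integer math only, as an illustration of the kind of fixed-point SIMD work being described. It is plain Python emulating the lanes one at a time, not HVX intrinsics; the (1, 2, 1) filter taps, the border handling, and the rounding shift are illustrative assumptions.

```python
from typing import List

VECTOR_BITS = 1024          # HVX vector register width, per the text
SLOTS_PER_INSTRUCTION = 4   # up to four vector registers addressed per instruction


def lanes(element_bits: int) -> int:
    """Number of SIMD lanes in one 1024-bit vector for a given element width."""
    return VECTOR_BITS // element_bits


def sliding_window_3tap(pixels: List[int]) -> List[int]:
    """Apply a (1, 2, 1) / 4 smoothing filter to 8-bit pixels with integer math only.

    Borders are handled by clamping the index (replicating the edge pixel).
    """
    n = len(pixels)
    out = []
    for i in range(n):
        left = pixels[max(i - 1, 0)]
        right = pixels[min(i + 1, n - 1)]
        acc = left + 2 * pixels[i] + right   # at most 1020, fits in a 16-bit lane
        out.append((acc + 2) >> 2)           # divide by 4, rounding half up
    return out


print(lanes(8), lanes(16), lanes(32))       # 128 64 32
print(VECTOR_BITS * SLOTS_PER_INSTRUCTION)  # 4096
row = list(range(lanes(8)))                 # one vector's worth of 8-bit pixels
print(sliding_window_3tap(row)[:4])         # [0, 1, 2, 3]
```

A real SIMD unit would compute all 128 outputs in a handful of instructions rather than a Python loop; the point here is only the lane arithmetic and the integer-only filtering.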
Beyond these per-context details, the threading model and memory hierarchy of the Hexagon 680 are quite unique. For scalar instructions, four threads are available with a 4-way VLIW architecture running at 500MHz per thread. These scalar units all share an L1 instruction cache, L1 data cache, and L2 cache. The two HVX contexts in the Hexagon 680 can be controlled by any two scalar threads and also run at 500MHz without stalling the other scalar units not involved in controlling the vector units. This level of hardware multithreading, along with QoS systems and per-thread L2 soft partitioning, helps to make sure audio and imaging tasks aren't fighting for execution time on the Hexagon DSP.
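As a back-of-the-envelope check on those figures, multiplying the stated thread count, VLIW width, and per-thread clock gives an upper bound on scalar issue slots per second. This assumes every slot of every thread is filled on every cycle, which real code will not achieve, so the result is a theoretical ceiling rather than a measured or Qualcomm-quoted throughput.

```python
SCALAR_THREADS = 4   # from the text
VLIW_SLOTS = 4       # 4-way VLIW, from the text
CLOCK_HZ = 500e6     # 500MHz per thread, from the text


def peak_issue_slots_per_second(threads: int, slots: int, clock_hz: float) -> float:
    """Theoretical ceiling: every slot of every thread filled on every cycle."""
    return threads * slots * clock_hz


peak = peak_issue_slots_per_second(SCALAR_THREADS, VLIW_SLOTS, CLOCK_HZ)
print(f"{peak / 1e9:.0f} billion scalar issue slots per second (theoretical ceiling)")
```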
Meanwhile the vector units are fed exclusively from the L2 cache that is shared with the scalar units, a choice Qualcomm made due to the overhead that comes with an L1 cache for image processing workloads. This L2 cache can do load-to-use in a single cycle though, so one could argue that it is technically an L1 cache at times anyhow. The Hexagon 680 in the Snapdragon 820 will also be able to have data from the camera sensor streamed directly to the L2 cache and shared with the ISP, avoiding the power cost of going off-die to DRAM. There's also an SMMU (System Memory Management Unit), which allows for no-copy data sharing with the CPU for multiple simultaneous applications. DSP memory writes will also snoop-invalidate the CPU cache without the CPU needing to do any cache maintenance work, which reduces power consumption and improves performance.
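The snoop-invalidate behavior can be illustrated with a toy write-invalidate model. In the sketch below, which is a simplified assumption and not Qualcomm's actual coherence protocol, the CPU cache and a DSP writer share one memory dictionary by reference (standing in for no-copy sharing). When the DSP writes an address, the hypothetical `Interconnect` class invalidates the CPU's cached copy, so the CPU's next read misses and fetches the fresh value without the CPU ever issuing a cache-maintenance operation. All class and method names are illustrative.

```python
from typing import Dict


class CpuCache:
    """A toy CPU data cache mapping address -> cached value."""

    def __init__(self, memory: Dict[int, int]) -> None:
        self.memory = memory              # shared with the DSP, not copied
        self.lines: Dict[int, int] = {}
        self.misses = 0

    def read(self, addr: int) -> int:
        if addr not in self.lines:        # miss: fetch from shared memory
            self.misses += 1
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def snoop_invalidate(self, addr: int) -> None:
        self.lines.pop(addr, None)        # drop the stale copy, if present


class Interconnect:
    """Hypothetical coherent interconnect: DSP writes snoop the CPU cache."""

    def __init__(self, memory: Dict[int, int], cpu_cache: CpuCache) -> None:
        self.memory = memory
        self.cpu_cache = cpu_cache

    def dsp_write(self, addr: int, value: int) -> None:
        self.memory[addr] = value
        # Hardware invalidates the CPU's copy; the CPU does no maintenance work.
        self.cpu_cache.snoop_invalidate(addr)


memory = {0x100: 1}
cpu = CpuCache(memory)
bus = Interconnect(memory, cpu)
print(cpu.read(0x100))    # 1 (miss, now cached)
bus.dsp_write(0x100, 42)  # DSP result lands in shared memory; CPU copy invalidated
print(cpu.read(0x100))    # 42 (miss again, fresh data)
```

Without the snoop, the second read would hit the stale cached `1`, and the CPU would have had to flush or invalidate the line itself before reading the DSP's output.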
Relative to a quad-core Krait, the advantages of running some workloads on a DSP are enormous based on Qualcomm's internal benchmarks. According to Qualcomm, the NEON units in the Krait CPU are generally representative of NEON units within the industry, which is why they've been used as the reference point here. Within a single logical core, Krait only supports 128-bit NEON with a single SIMD pipeline, compared to the 4-way, 1024-bit SIMD units of the Hexagon 680. SIMD threads also run out of a 512KB cache that is nominally L2 but behaves almost like an L1, as opposed to the 32KB L1 instruction/data cache of Krait, which helps to hide the latency effects of DRAM. The NEON units of Krait and many other ARM CPUs are capable of floating point, but in a workload like low-light video enhancement, Hexagon 680 will be able to complete the same amount of work at three times the speed while using an order of magnitude less power, due to the inherent advantages of a task-specific DSP architecture. The four scalar threads available in the DSP also mean that entire algorithms can be offloaded to the DSP instead of partially running on the CPU, which further reduces power consumption and makes it easier for developers to take advantage of the DSP.
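Taking Qualcomm's figures at face value, the energy saving compounds. Energy is power multiplied by time, so finishing 3× faster at one tenth of the power implies roughly 1/30th of the energy per task. Reading "an order of magnitude less power" as exactly 10× is an assumption made for this worked example; the actual ratio may differ. The sketch also shows the per-register width ratio between 128-bit NEON and a 1024-bit HVX register.

```python
from fractions import Fraction

NEON_BITS = 128   # Krait NEON width, per the text
HVX_BITS = 1024   # HVX vector register width, per the text

speedup = Fraction(3)          # "three times the speed" (Qualcomm figure)
power_ratio = Fraction(1, 10)  # assumption: "an order of magnitude less power" == 1/10

# Energy = power x time, and time scales as 1 / speedup.
energy_ratio = power_ratio / speedup

print(f"vector width ratio: {HVX_BITS // NEON_BITS}x")           # 8x
print(f"energy per task on the DSP: {energy_ratio} of the CPU's")  # 1/30
```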
While Hexagon 680's vector and scalar engines are useful for heavy-duty signal processing workloads, the addition of the low power island (LPI) DSP makes it possible to do away with separate sensor hubs in smartphones. According to Qualcomm, this DSP is completely separate from the scalar and vector compute DSP previously discussed (yet still part of the overall Hexagon DSP design), and sits on its own power island so the rest of the SoC can be shut down while keeping the LPI on. The LPI also shouldn't use a different process technology or a radically different standard cell library, as the advantages of the leading-edge FinFET process should help significantly with power consumption.
It's said that this low power island with an independent DSP and newer process node is enough to improve power efficiency by up to three times in certain workloads compared to Snapdragon 808. I suspect that this comparison was chosen instead of one against the MSM8974/Snapdragon 800 generation because the Hexagon DSP was updated in the move from Snapdragon 805 to 808. Qualcomm emphasized the choice of a DSP over an MCU for this task, as in their internal testing a DSP delivers better power efficiency than a Cortex-M class MCU for more advanced sensor algorithms. The software stack for all of these features is already said to be quite complete, with a framework and algorithms included for OEM development. The broader Hexagon 600 series SDK is also quite extensive, with a number of utilities to allow for faster and easier development.
If you're like me, after going through all of this information you might be wondering what the value of these vector DSP extensions is. In discussions with Qualcomm, it seems that the reasoning behind pushing a number of image processing tasks to the Hexagon DSP core is mostly that the algorithms behind things such as HDR video, HDR image merging, low-light image enhancement, and other advanced techniques are still in flux, even from software update to software update. As a result, it isn't viable to implement these parts of the imaging pipeline in fixed function hardware. Without the Hexagon DSP, these tasks could potentially end up running on the CPU or GPU, affecting the user experience in the form of higher shot-to-shot latency, reduced battery life when using the camera, and higher skin temperatures. It remains to be seen whether OEMs using Snapdragon 820 will use these DSPs to the fullest extent, but the Snapdragon 820 is shaping up to be a promising 2016 high-end SoC.
madwolfa - Monday, August 24, 2015
"Today at Hot Chips..." ... I smell the irony.
sandy105 - Monday, August 24, 2015
The only first comment you'll ever need to read!! lol
ddriver - Monday, August 24, 2015
Hmm... the DSP is bigger than the CPU in terms of die area...
ddriver - Monday, August 24, 2015
On a second glance... it seems the primary motivation is to promote their DSP architecture. I mean the GPU is most likely OpenCL 2 compliant, and offers easily 10 times the CPU compute performance at the same power. Oh, and it has FP, both 32 and 64 bit, which is a must for professional image, video or sound processing. That DSP is of very limited use and special purpose, complicated and not readily available for programming by the user; that die area could be better invested into a bigger GPU, which will deliver better graphics performance and can easily handle all the tasks that would typically be handled by the DSP.
saratoga4 - Monday, August 24, 2015
This isn't really the right way to think about it. DSPs and GPUs are related in that they are both heavily application-optimized general purpose processors, but they are optimized in very, very different ways. A GPU is massively thread parallel, high latency, and very good at floating point multiply-add instructions, while very, very bad at branches. A DSP is usually fixed (or sometimes floating) point optimized, very low latency, very fast at branches, and with very high single-threaded performance per watt. For tasks the GPU is good at, you really don't want to use the DSP. For tasks the DSP is good at, you really don't want to use the GPU.
ddriver - Monday, August 24, 2015
GPU latency has historically been high because on a typical desktop system with a discrete GPU you have to transfer data back and forth over PCIe. On this chip the memory is shared between the GPU and the CPU, so the latency penalty of having to transfer data is eliminated.
They put focus on sensors and image processing. Neither is so demanding that the GPU latency on this chip would be an issue.
Naturally, there is no benefit from pushing sensor data to the GPU; that could be handled by a dedicated MCU, like Apple did. And while it may be true that it might be a tad more efficient with a dedicated DSP, I doubt the difference will in fact be tangible on the device scale.
As for image processing, I reckon the GPU is a much better fit; it has more power and more features. And it can run OpenCL, which means applications will be portable and run on every platform which supports OpenCL, whereas with the DSP you must explicitly target it, and that code will do you no good on other platforms.
All in all, as I said, it seems like Qualcomm are pushing a proprietary technology in the hope of locking in developers and subsequently the user base. That's bad. There are better ways to invest that chip die area.
An MCU for the sensors and a GPU for image processing is simple, more flexible, and more portable. It is a full and feature-rich chain to power the device: the MCU can run a variety of tasks, even a simple OS, making it possible to suspend the main CPU completely; the CPU ALUs offer low latency and poor throughput; the CPU SIMDs medium latency and medium throughput; and the GPU "high" latency and the highest throughput.
saratoga4 - Monday, August 24, 2015
> As for image processing, I reckon the GPU is a much better fit,
Pretty sure you've never programmed a GPU or DSP then :)
Problems with a GPU for these applications: very poor power efficiency, lack of good fixed point hardware support (modern GPUs are floating point), and extremely poor handling of branches. You can do some simple image processing applications efficiently on a GPU (e.g. filtering is very natural) but more complex operations are very hard to implement. You really
"lack of good fixed point hardware support (modern GPUs are floating point)" shows what you know. GPUs have no problem with integers whatsoever. Image processing is a parallel workload; there is little to no branching involved. And it is not just filtering, but also blending, noise reduction, a wide variety of image effects (blur, sharpen, edge detection, shadows), transformations, you name it... image processing benefits from GPU compute tremendously. I think you are CONFUSING image processing with computer vision, which are completely different areas. CV can still benefit a lot from GPU compute, but certainly not as much as image processing.
OpenCL 2 massively improves the versatility of GPU compute. As someone who does image, audio and video processing for a living, I am extremely excited about it becoming widely adopted. I've had to pay mad bucks in the past for "special purpose hardware" with DSPs for real-time or accelerated media processing, but today's GPUs completely destroy those, not only in terms of price-performance ratio, but also in peak performance.
saratoga4 - Monday, August 24, 2015
> "lack of good fixed point hardware support (modern GPUs are floating point)" shows what you know. GPUs have no problem with integers whatsoever.
Fixed point is not the same as integer. No offense, but you shouldn't be arguing about this if you don't know what the words mean ;)
name99 - Monday, August 24, 2015
One day you're going to look back at this comment and wish you'd never said it...
How do you imagine fixed point differs materially from integer? The way everyone handles fixed point is to imagine a virtual binary point, which is usually effected by multiplying by pre-shifted coefficients. Then at the end of the process you shift right, or do something similar, to round to integer.
Look at something like http://halicery.com/jpeg/idct.html for an example.
The only way a device might handle fixed point differently is if it automatically downshifted appropriately when multiplying together two fixed point numbers. In principle Hexagon could do this, but I'm unaware of any devices that do, beyond trivial cases like a multiply-high instruction that takes 16.16 input and returns the high 32 bits.
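For readers following this exchange, here is a minimal Python sketch of Q16.16 fixed-point arithmetic carried out entirely with integer operations: values are scaled by 2^16 (the "virtual binary point"), multiplied as plain integers, and the product, which then carries 32 fractional bits, is shifted right by 16 to renormalize. The helper names are hypothetical, and rounding to nearest (ties toward positive infinity) in the multiply is one choice among several.

```python
FRAC_BITS = 16
ONE = 1 << FRAC_BITS  # 1.0 in Q16.16


def to_fixed(x: float) -> int:
    """Convert a float to Q16.16 by scaling and rounding to the nearest integer."""
    return round(x * ONE)


def to_float(q: int) -> float:
    """Convert a Q16.16 value back to a float."""
    return q / ONE


def fixed_mul(a: int, b: int) -> int:
    """Multiply two Q16.16 values using only integer operations.

    The raw product has 32 fractional bits, so half an LSB is added and the
    result is arithmetically shifted right by 16 to get back to Q16.16.
    """
    product = a * b
    return (product + (1 << (FRAC_BITS - 1))) >> FRAC_BITS


a = to_fixed(1.5)
b = to_fixed(-2.25)
print(to_float(fixed_mul(a, b)))  # -3.375
```

The integer multiplier does all the work; what makes it "fixed point" is purely the convention about where the binary point sits and the renormalizing shift after each multiply.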