8/21/2016

Qualcomm Details Hexagon 680 DSP in Snapdragon 820: Accelerated Imaging

by Joshua Ho on August 24, 2015 9:00 AM EST
Posted in Mobile, Snapdragon, Qualcomm, SoC, Snapdragon 820


Although we tend not to focus too much on the tertiary aspects of a SoC, they are often important to enabling many aspects of the user experience. DSPs are important for a number of unique applications such as voice processing, audio processing, and other input processing applications. Before we get into the meat of the article though it's important to note that the above image is not a die shot or an actual block diagram, but is very roughly approximating the relative size of each component in the SoC.

Today at Hot Chips, Qualcomm elected to reveal a number of details about their Hexagon 680 DSP, which will ship in the Snapdragon 820. Those that have followed our coverage regarding the Snapdragon 820 ISP features will probably be able to guess that a number of features on the Snapdragon 820 Spectra ISP are enabled through the use of this newer DSP.

http://www.anandtech.com/show/9552/qualcomm-details-hexagon-680-dsp-in-snapdragon-820-accelerated-imaging

For those that are unfamiliar with DSPs, the basic idea behind DSPs is that they are a sort of in-between point in architecture design between highly efficient fixed function hardware (think: video decoders) and highly flexible CPUs. DSPs are programmable, but are rigid in design and are designed to do a limited number of tasks well, making them efficient at those tasks relative to a CPU, but more flexible than fixed function hardware. These design goals are typically manifested in DSPs as in-order architectures, which means that there's much less power and area dedicated on silicon to parallelize code on the fly. This means that while a DSP can do a number of workloads that would otherwise be impossible on a completely fixed function block, you wouldn't want to try and use one to replace a CPU. It's important to emphasize that DSPs are generally more focused on instruction level parallelism (single core performance) rather than thread level parallelism (multicore performance), so you won't see hundreds/thousands of "cores" in a DSP like you would in a GPU architecture like Maxwell.

Consequently the architecture of DSPs like the Hexagon 680 is relatively alien compared to standard CPUs, as optimization is everything in the applications where DSPs make sense. For example, DSP instruction sets are often VLIW (very long instruction word), in which multiple execution units are driven in parallel with a single instruction. Certain arithmetic operations are also highly accelerated with special instructions in order to enable key algorithms for signal processing such as Fast Fourier Transform (FFT).
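
To make the VLIW idea concrete, here is a minimal sketch of how a compiler might statically pack independent operations into fixed-width packets for an in-order machine. The 4-slot width, the `Op` type, the register names, and the greedy packing rule are all illustrative assumptions; this is not Hexagon's actual ISA or toolchain behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    dest: str          # register written by the operation
    srcs: tuple        # registers read by the operation
    name: str = "op"   # label used only for printing


def pack_vliw(ops, width=4):
    """Pack an ordered list of ops into packets of at most `width` ops.

    An op joins the current packet only if it neither reads nor writes a
    register already written inside that packet. Ops are never reordered,
    which mimics static, in-order scheduling: the hardware issues one
    packet at a time and does no dependency analysis of its own.
    """
    packets, current, written = [], [], set()
    for op in ops:
        conflict = op.dest in written or any(s in written for s in op.srcs)
        if current and (conflict or len(current) == width):
            packets.append(current)
            current, written = [], set()
        current.append(op)
        written.add(op.dest)
    if current:
        packets.append(current)
    return packets


# Hypothetical instruction stream: three independent ops, then one that
# depends on two of them, then another independent op.
example_ops = [
    Op("r0", ("a", "b"), "add"),
    Op("r1", ("c", "d"), "mul"),
    Op("r2", ("e", "f"), "sub"),
    Op("r3", ("r0", "r1"), "add"),
    Op("r4", ("g", "h"), "mul"),
]

packets = pack_vliw(example_ops)
for index, packet in enumerate(packets):
    print(index, [op.name for op in packet])
```

In this toy stream the dependent `add` forces a new packet, so five operations issue as two packets rather than five separate instructions. How well real code fills its slots depends on the compiler and the algorithm.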


In the case of the Hexagon 680, one of the key features Qualcomm is focusing on for this launch is Hexagon Vector Extensions (HVX). HVX is designed to handle significant compute workloads for image processing applications such as virtual reality, augmented reality, image processing, video processing, and computer vision. This means that tasks that might otherwise be running on a relatively power hungry CPU or GPU can run on a comparatively efficient DSP instead.

The HVX extension to Hexagon has 1024-bit vector data registers, with the ability to address up to four of these slots per instruction, which allows for up to 4096 bits per cycle. It's important to keep in mind that the instruction width is much smaller than this, as this is a single instruction, multiple data (SIMD) unit which uses one operation over multiple chunks of data. There are 32 of these vector registers, which appear to be split between two HVX contexts. There is support for up to 32-bit fixed point decimal operations, but floating point is not supported to reduce die size and power consumption, as the previously mentioned applications for Hexagon 680 don't need floating point support. As DSPs tend to have ISAs tailored for the application, the Hexagon 680 HVX units support sliding window filters, LUTs, and histogram acceleration at the ISA level. The performance of these units is said to be sufficient for 4K video post-processing, 20MP camera burst processing, and other applications with similar compute requirements.
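
As a rough illustration of the numbers and primitives in this paragraph, the sketch below works out how many elements fit in a 1024-bit register and models the three operations mentioned (sliding window filter, LUT, histogram) in plain Python on 8-bit pixels. The function names, the 3-tap kernel, and the edge handling are assumptions made for illustration. They are not HVX intrinsics, and real vector code would process a whole register of lanes per instruction rather than one element at a time.

```python
VECTOR_BITS = 1024         # vector register width quoted in the article
MAX_VECTORS_PER_INSN = 4   # up to four registers addressed per instruction


def lanes(elem_bits, vector_bits=VECTOR_BITS):
    """Number of elements of `elem_bits` that fit in one vector register."""
    return vector_bits // elem_bits


def sliding_window(pixels, kernel, shift):
    """Integer FIR filter: multiply-accumulate over each window, then
    shift right to rescale and clamp to the 8-bit range. Only positions
    where the whole window fits are produced (an assumed edge policy)."""
    k = len(kernel)
    out = []
    for i in range(len(pixels) - k + 1):
        acc = sum(p * c for p, c in zip(pixels[i:i + k], kernel))
        out.append(min(255, max(0, acc >> shift)))
    return out


def apply_lut(pixels, table):
    """Table lookup per pixel, e.g. for tone curves or gamma."""
    return [table[p] for p in pixels]


def histogram(pixels, bins=256):
    """Count occurrences of each pixel value."""
    counts = [0] * bins
    for p in pixels:
        counts[p] += 1
    return counts


row = [10, 20, 30, 40, 250]                   # made-up 8-bit samples
smoothed = sliding_window(row, [1, 2, 1], 2)  # kernel sums to 4 = 1 << 2
inverted = apply_lut(row, [255 - v for v in range(256)])
counts = histogram(row)

print(lanes(8), lanes(16), lanes(32))         # elements per register
print(MAX_VECTORS_PER_INSN * VECTOR_BITS)     # bits touched per instruction
print(smoothed, inverted, sum(counts))
```

All three operations use only integer arithmetic, which is consistent with the point that these workloads do not require floating point hardware.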


Outside of these details on a per-context basis, the threading model and memory hierarchy of the Hexagon 680 are quite unique. For scalar instructions, four threads are available with a 4-way VLIW architecture running at 500 MHz per thread. These scalar units all share an L1 instruction cache, L1 data cache, and L2 cache. The two HVX contexts in the Hexagon 680 can be controlled by any two scalar threads and also run at 500 MHz without stalling other scalar units not involved in controlling the vector units. This hardware-level multithreading, along with QoS systems and L2 soft partitioning on a per-thread basis, helps to make sure audio and imaging tasks aren't fighting for execution time on the Hexagon DSP.
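
The relationship between the four scalar threads and the two HVX contexts can be pictured as a small resource pool: any scalar thread may claim a vector context, but only two can hold one at a time, and the others keep running scalar code. The toy model below is only a way to visualize that constraint. The class and method names are hypothetical, and the actual hardware arbitration, QoS, and L2 partitioning are not described in enough detail here to model.

```python
SCALAR_THREADS = 4   # hardware scalar threads, per the article
HVX_CONTEXTS = 2     # vector contexts, per the article


class HvxContextPool:
    """Toy bookkeeping for which scalar thread controls which context."""

    def __init__(self, contexts=HVX_CONTEXTS, threads=SCALAR_THREADS):
        self.free = list(range(contexts))
        self.owner = {}            # scalar thread id -> context id
        self.threads = threads

    def acquire(self, thread_id):
        """Return a context id for `thread_id`, or None if none is free.

        A None result models a thread that simply continues with
        scalar-only work; it does not block the other threads.
        """
        if not 0 <= thread_id < self.threads:
            raise ValueError("no such scalar thread")
        if thread_id in self.owner:
            return self.owner[thread_id]
        if not self.free:
            return None
        ctx = self.free.pop(0)
        self.owner[thread_id] = ctx
        return ctx

    def release(self, thread_id):
        """Hand the context held by `thread_id` back to the pool."""
        ctx = self.owner.pop(thread_id, None)
        if ctx is not None:
            self.free.append(ctx)


pool = HvxContextPool()
first = pool.acquire(0)     # thread 0 takes a context
second = pool.acquire(3)    # any thread may take the other one
third = pool.acquire(1)     # no context left: stays scalar-only
pool.release(0)
retry = pool.acquire(1)     # succeeds once a context is released
print(first, second, third, retry)
```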


Meanwhile the vector units are fed exclusively from the L2 cache that is shared with the scalar units, a choice Qualcomm made due to the overhead that comes with an L1 cache for image processing workloads. This L2 cache can do load-to-use in a single cycle though, so one could argue that this is technically an L1 cache at times anyhow. The Hexagon 680 in the Snapdragon 820 will also be able to have data from the camera sensor directly streamed to the L2 cache and shared with the ISP to avoid the power cost of going off-die to DRAM. There's also an SMMU (System Memory Management Unit) which allows for no-copy data sharing with the CPU for multiple simultaneous applications. DSP memory writes will also snoop-invalidate CPU cache without the need for the CPU to do any work involving cache maintenance to reduce power consumption and improve performance.


Relative to a quad-core Krait, the advantages of running some workloads on a DSP are enormous based on Qualcomm's internal benchmarks. According to Qualcomm, the NEON units in the Krait CPU are generally representative of NEON units within the industry, which is the reason why they've been used as the reference point here. Within a single logical core, Krait will only support 128-bit NEON with a single SIMD pipeline, compared to the 4-way, 1024-bit SIMD units of the Hexagon 680. SIMD threads also run on a 512KB L2 but almost L1 cache, as opposed to the 32KB L1 instruction/data cache of Krait, which helps to hide latency effects of DRAM. The NEON units of a Krait and many other ARM CPUs are capable of floating point, but in a workload like low light video enhancement Hexagon 680 will be able to complete the same amount of work at three times the speed, while using an order of magnitude less power due to the inherent advantages of a task-specific DSP architecture. The four scalar threads available in the DSP also mean that entire algorithms can be offloaded to the DSP instead of partially running on the CPU, which also reduces power consumption and makes it easier for developers to take advantage of the DSP.
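
The raw width comparison and the speed/power claim can be put into numbers. The short calculation below uses only the figures quoted above. Treating "an order of magnitude less power" as exactly 10x, and as average power over the run, is an assumption made here for illustration, so the resulting energy ratio is a ballpark and not a Qualcomm figure. Raw SIMD width is also not the same thing as delivered performance.

```python
NEON_BITS = 128     # Krait: one 128-bit SIMD pipeline per core
KRAIT_CORES = 4     # quad-core reference point
HVX_BITS = 1024     # Hexagon 680 vector register width
HVX_WAYS = 4        # "4-way" per the article


def simd_bits(vector_bits, ways=1, cores=1):
    """Total SIMD bits that can be operated on at once."""
    return vector_bits * ways * cores


def relative_energy(speedup, power_reduction):
    """Energy = power x time, relative to a baseline of 1.0."""
    return (1.0 / power_reduction) * (1.0 / speedup)


hvx_width = simd_bits(HVX_BITS, ways=HVX_WAYS)
per_core_ratio = hvx_width // simd_bits(NEON_BITS)
quad_core_ratio = hvx_width // simd_bits(NEON_BITS, cores=KRAIT_CORES)

# Assumed reading of the claim: 3x faster at one tenth of the power.
energy = relative_energy(speedup=3.0, power_reduction=10.0)

print(hvx_width, per_core_ratio, quad_core_ratio)
print(round(energy, 4))
```

Under these assumptions the DSP would use roughly one thirtieth of the energy for the same work, which is why offloading matters so much for battery life in sustained camera workloads.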


While Hexagon 680's vector and scalar engines are useful for heavy-duty signal processing workloads, the addition of the low power island (LPI) DSP makes it possible to do away with separate sensor hubs in smartphones. According to Qualcomm, this DSP is completely separate from the scalar and vector compute DSP previously discussed (yet still part of the overall Hexagon DSP design), and sits on its own power island so the rest of the SoC can be shut down while keeping the LPI on. This also shouldn't have a different process technology or a radically different standard cell library, as the advantages from the leading edge FinFET process should help significantly with power consumption.


It's said that this low power island with an independent DSP and newer process node is enough to improve power efficiency by up to three times in certain workloads compared to Snapdragon 808. I suspect that this was done instead of a comparison to the MSM8974/Snapdragon 800 generation because the Hexagon DSP was updated in the move from Snapdragon 805 to 808. Qualcomm emphasized the choice of a DSP over an MCU for this task, as in their internal testing a DSP delivers better power efficiency than a Cortex M-class MCU for more advanced sensor algorithms. The software stack for all of these features is already said to be quite complete, with a framework and algorithms included for OEM development. The broader Hexagon 600 series SDK is also quite extensive, with a number of utilities to allow for faster and easier development.

If you're like me, after going through all of this information you might be wondering what the value of these vector DSP extensions is. In discussions with Qualcomm, it seems that the reasoning behind pushing a number of image processing tasks to the Hexagon DSP core is mostly because the algorithms behind things such as HDR video, HDR image merging, low light image enhancement, and other advanced algorithms are still in flux even from software update to software update. As a result, it isn't viable to implement these aspects of the imaging pipeline in fixed function hardware. Without the use of the Hexagon DSP, these tasks could potentially end up running on the CPU or GPU, affecting user experience in the form of higher shot-to-shot latency, reduced battery life when using the camera, and higher skin temperatures. It remains to be seen whether OEMs using Snapdragon 820 will use these DSPs to the fullest extent, but the Snapdragon 820 is shaping up to be a promising 2016 high-end SoC.

39 Comments


madwolfa - Monday, August 24, 2015

"Today at Hot Chips..." ... I smell the irony.

sandy105 - Monday, August 24, 2015

The only first comment you'll ever need to read!! lol

ddriver - Monday, August 24, 2015

Hmm... the DSP is bigger than the CPU in terms of die area...

ddriver - Monday, August 24, 2015

On a second glance... it seems the primary motivation is to promote their DSP architecture. I mean the GPU is most likely OpenCL 2 compliant, and offers easily 10 times the CPU compute performance at the same power. Oh, and it has FP, both 32 and 64 bit, which is a must for professional image, video or sound processing. That DSP is very limited use and special purpose, complicated and not readily available for programming by the user, that die area could be better invested into a bigger GPU, which will deliver better graphics performance and can easily handle all the tasks that would typically be handled by the DSP.

saratoga4 - Monday, August 24, 2015

This isn't really the right way to think about it. DSPs and GPUs are related in that they are both heavily application optimized general purpose processors, but they are optimized in very, very different ways. A GPU is massively thread parallel, high latency, and very good at floating point multiply add instructions, while very, very bad at branches. A DSP is usually fixed (or sometimes floating point) optimized, very low latency, very fast at branches, and with very high single threaded performance per watt. For tasks the GPU is good at you really don't want to use the DSP. For tasks the DSP is good at, you really don't want to use the GPU.

ddriver - Monday, August 24, 2015

GPU latency has historically been high because on a typical desktop system with discrete GPU you have to transfer data back and forth over PCIE. On this chip the memory is shared between the GPU and the CPU, so all the latency penalty of having to transfer data is eliminated.

They put focus on sensors and image processing. Neither are so demanding that the GPU latency on this chip would be an issue.

Naturally, there is no benefit from pushing sensor data to the GPU, that could be handled by a dedicated MCU, like Apple did. And while it may be true that it might be a tad more efficient with a dedicated DSP, I doubt the difference will in fact be tangible on the device scale.

As for image processing, I reckon the GPU is a much better fit, it has more power and more features. And it can run OpenCL, which means applications will be portable and run on every platform which supports OpenCL, whereas with the DSP you must explicitly target it, and that code will do you no good on other platforms.

All in all, as I said, seems like Qualcomm are pushing for a proprietary technology with the hopes of locking in developer and subsequently user base. That's bad. There are better ways to invest that chip die area.

An MCU for the sensors and a GPU for image processing is simple, more flexible, more portable. It is a full and feature rich chain to power the device - the MCU can run a variety of tasks, even a simple OS, making it possible to suspend the main CPU completely, the CPU ALUs - low latency, poor throughput, the CPU SIMDs - medium latency, medium throughput, and the GPU - "high" latency, highest throughput.

saratoga4 - Monday, August 24, 2015

> As for image processing, I reckon the GPU is a much better fit,

Pretty sure you've never programmed a GPU or DSP then :)

Problems with a GPU for these applications: very poor power efficiency, lack of good fixed point hardware support (modern GPUs are floating point), and extremely poor handling of branches. You can do some simple image processing applications efficiently on a GPU (e.g. filtering is very natural) but more complex operations are very hard to implement. You really don't want to use a GPU for this stuff. It makes even less sense than the CPU for a lot of things.

ddriver - Monday, August 24, 2015

"lack of good fixed point hardware support (modern GPUs are floating point)" - shows what you know. GPUs have no problem with integers whatsoever. Image processing is a parallel workload, there is little to no branching involved. And it is not just filtering, but also blending, noise reduction, a wide variety of image effects - blur, sharpen, edge detection, shadows, transformations - you name it... image processing benefits from GPU compute tremendously. I think you are CONFUSING image processing with computer vision, which are completely different areas. CV can still benefit a lot from GPU compute, but certainly not as much as image processing.

OpenCL 2 massively improves the versatility of GPU compute. As someone who does image, audio and video processing for a living, I am extremely excited about it becoming widely adopted. I've had to pay mad bucks in the past for "special purpose hardware" with DSPs for real time or accelerated media processing, but today's GPUs completely destroy those not only in terms of price performance ratio, but also peak performance.

saratoga4 - Monday, August 24, 2015

> "lack of good fixed point hardware support (modern GPUs are floating point)" shows what you know. GPUs have no problem with integers whatsoever.

Fixed point is not the same as integer. No offense, but you shouldn't be arguing about this if you don't know what the words mean ;)

name99 - Monday, August 24, 2015

One day you're going to look back at this comment and wish you'd never said it...

How do you imagine fixed point matters materially from integer? The way everyone handles fixed point is to imagine a virtual binary point, which is usually effected by multiplying by pre-shifted coefficients. Then at the end of the process you shift right or something similar to round to integer.

Look at something like http://halicery.com/jpeg/idct.html for an example.

The only way a device might handle fixed point differently is if it automatically downshifted appropriately when multiplying together two fixed point numbers. In principle Hexagon could do this, but I'm unaware of any devices that do this, beyond trivial cases like a multiply-high instruction that takes 16.16 input and returns the high 32 bits.
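
The scheme described in this comment, integers with an implied binary point and a shift after multiplying, can be sketched as follows. The Q16.16 format matches the 16.16 example mentioned above. The helper names and the rounding choice are assumptions for illustration and do not represent any particular DSP's instructions.

```python
FRAC_BITS = 16            # Q16.16: 16 integer bits, 16 fractional bits
ONE = 1 << FRAC_BITS      # the value 1.0 in Q16.16


def to_fixed(x):
    """Convert a float to Q16.16 by scaling and rounding to an integer."""
    return int(round(x * ONE))


def to_float(q):
    """Convert a Q16.16 integer back to a float for display."""
    return q / ONE


def fixed_mul(a, b):
    """Multiply two Q16.16 values with ordinary integer arithmetic.

    The raw product carries 32 fractional bits, so shifting right by 16
    moves the implied binary point back to the Q16.16 position.
    """
    return (a * b) >> FRAC_BITS


def mul_high32(a, b):
    """Model of a multiply-high: keep only the top 32 bits of the
    64-bit product. For Q16.16 inputs that is the integer part."""
    return (a * b) >> 32


a = to_fixed(1.5)
b = to_fixed(2.25)
product = fixed_mul(a, b)

print(a, b, product)
print(to_float(product))      # 1.5 * 2.25
print(mul_high32(a, b))       # integer part of the same product
```

Everything here is plain integer multiplication and shifting, which is the commenter's point: the "fixed point" part lives in how the programmer interprets the bits, unless the hardware performs the downshift automatically.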
