Radio-Electronics.com (http://www.radio-electronics.com/)
27 Sep 2016
In 2014, Google entered the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) with a network called GoogLeNet. It is an interesting case study because it is a
22-layer deep convolutional network that includes nine inception modules, creating a very rich and complex topology.
In the GoogLeNet network, the data on each connection between layers can potentially travel back and forth through DDR. Handling this in an embedded system is a challenge.
The complex topology of the network must be divided into batches of layers to run on a DSP or dedicated hardware. We call this subnetwork division.
In our CEVA network generator tool, all analysis is done automatically, without user intervention. The network is divided into subnetworks, and each subnetwork runs
on the DSP according to the execution order set by the network generator. For example, let's take a look at the inception part of the GoogLeNet network after it has
gone through our network generator tool.
As you can see in the above image, the network generator created four subnetworks. Of these subnetworks, three run at different execution times, but two can run in
parallel on different cores. Additionally, the network generator is designed to create long layer sequences, which can potentially stay entirely within internal memory.
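The partitioning algorithm used by the network generator is not described in this article. As a rough illustration of the idea only, the sketch below groups a layer graph into maximal linear chains, so that each chain is a long layer sequence that could run back to back. The function name `split_into_subnetworks`, the layer names and the one-to-one chaining rule are all assumptions made for this example, and the result is not expected to match the four subnetworks described above.

```python
from collections import defaultdict


def split_into_subnetworks(layers, edges):
    """Group layers into maximal linear chains (hypothetical rule).

    layers: layer names listed in a valid execution (topological) order.
    edges:  (producer, consumer) pairs describing the connections.
    Returns a list of chains, each a list of layer names.
    """
    preds = defaultdict(list)
    succs = defaultdict(list)
    for src, dst in edges:
        succs[src].append(dst)
        preds[dst].append(src)

    chains = []
    chain_of = {}  # layer name -> index of the chain that holds it
    for layer in layers:
        p = preds[layer]
        # Extend the producer's chain only for a one-to-one connection:
        # a single producer whose only consumer is this layer.
        if len(p) == 1 and len(succs[p[0]]) == 1:
            idx = chain_of[p[0]]
            chains[idx].append(layer)
        else:
            idx = len(chains)
            chains.append([layer])
        chain_of[layer] = idx
    return chains


# Illustrative inception-style block; the layer names are made up.
inception_layers = ["input", "1x1", "3x3_reduce", "3x3",
                    "5x5_reduce", "5x5", "pool", "pool_proj", "concat"]
inception_edges = [
    ("input", "1x1"),
    ("input", "3x3_reduce"), ("3x3_reduce", "3x3"),
    ("input", "5x5_reduce"), ("5x5_reduce", "5x5"),
    ("input", "pool"), ("pool", "pool_proj"),
    ("1x1", "concat"), ("3x3", "concat"),
    ("5x5", "concat"), ("pool_proj", "concat"),
]
subnetworks = split_into_subnetworks(inception_layers, inception_edges)
for index, chain in enumerate(subnetworks):
    print(index, chain)
```

In this toy example the branches between `input` and `concat` do not depend on each other, which is the kind of structure that allows subnetworks to run in parallel on different cores.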
Reducing bandwidth
Because bandwidth is tightly constrained on embedded platforms, implementing convolutional neural networks will inevitably raise bandwidth issues.
These are caused either by the network filter weights or by the data transfer from layer to layer.
Here are two rules that can help reduce the bandwidth significantly:
1. Each output map is created by running the same filter at different positions in the input map. Relying on this rule, we can avoid repeatedly loading the
weight data, reducing unnecessary bandwidth usage.
2. Each output is calculated from the same input data. Applying this rule, the input can be loaded once and used for all the outputs, without accessing the DDR
more than once.
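To give a feel for the scale of the saving, here is a back-of-the-envelope sketch that counts elements fetched from DDR for one convolutional layer, with and without the two rules. The function `conv_ddr_traffic`, the stride-1 "same" padding assumption, the worst-case "no reuse" model and the layer dimensions are all assumptions chosen for illustration, not measurements from any real platform.

```python
def conv_ddr_traffic(h, w, c_in, c_out, k, reuse):
    """Rough count of elements fetched from DDR for one conv layer.

    Assumes stride 1 and 'same' padding, so each output map is h x w.
    """
    weights = k * k * c_in * c_out  # all filter coefficients
    inputs = h * w * c_in           # all input map elements
    if reuse:
        # Rule 1: each filter is loaded once and reused at every position.
        # Rule 2: each input element is loaded once and reused for all outputs.
        return weights + inputs
    # Worst case without reuse: every output element re-fetches its whole
    # filter and its whole input window.
    outputs = h * w * c_out
    per_output = k * k * c_in
    return outputs * per_output * 2


# Hypothetical layer: 8x8 maps, 4 input maps, 2 output maps, 3x3 filters.
naive = conv_ddr_traffic(8, 8, 4, 2, 3, reuse=False)
reused = conv_ddr_traffic(8, 8, 4, 2, 3, reuse=True)
print(naive, reused)  # 9216 328
```

Even for this tiny layer the modelled traffic drops by more than an order of magnitude, and the gap widens as the maps and filter counts grow.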
http://www.radioelectronics.com/articles/processingembedded/deeplearningchallengesinembeddedplatforms195 1/3
2017613 DeepLearningChallengesinEmbeddedPlatforms::RadioElectronics.com
1. A low number of large input maps
In the first case, we prefer to complete the filter calculation for each input map before moving on to the next map. This way we benefit from overlapping filter
windows, at the cost of some MAC utilization loss at the edges of the map. As shown in the formula below, width and height are calculated first in this case. We call this
approach local filter calculation.
Formula for local filter calculation, used for large-sized maps
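The formula itself did not survive in this copy of the article. As a stand-in, the sketch below shows one possible loop order for local filter calculation: the input map is the outermost loop, so all rows and columns of one map are processed before the next map is touched. The function name `conv_local`, the "valid" (no padding) convolution and the nested-list data layout are assumptions for illustration, not the original formula.

```python
def conv_local(inputs, filters):
    """Accumulate one output map, finishing each input map in turn.

    inputs:  list of C input maps, each an H x W list of lists.
    filters: list of C kernels, each K x K, one per input map.
    """
    c = len(inputs)
    h, w = len(inputs[0]), len(inputs[0][0])
    k = len(filters[0])
    out_h, out_w = h - k + 1, w - k + 1
    out = [[0] * out_w for _ in range(out_h)]
    for ch in range(c):             # one input map at a time
        for y in range(out_h):      # height and width are swept first
            for x in range(out_w):
                acc = 0
                for ky in range(k):
                    for kx in range(k):
                        acc += inputs[ch][y + ky][x + kx] * filters[ch][ky][kx]
                out[y][x] += acc    # add this map's contribution
    return out


# Two small 3x3 input maps with 2x2 kernels (made-up values).
maps = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
]
kernels = [
    [[1, 0], [0, 1]],
    [[1, 1], [1, 1]],
]
local_result = conv_local(maps, kernels)
print(local_result)  # [[10, 12], [16, 18]]
```

Neighbouring output positions read overlapping windows of the same map, which is where the benefit of this ordering for large maps comes from.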
In the second case, where there are many small input maps, the calculation should be performed across the maps. Different input maps are combined
into one output map. In this case, partial filter results are calculated, and at the end of the process all the partial results are summed into one result; the linearity
of the convolutional filter makes this possible. As shown in the next formula, channels are calculated first. We call this approach cross map filter calculation.
Formula for cross map filter calculation, used for a large number of small-sized maps (last layers)
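This formula is also missing from this copy of the article. The sketch below illustrates one way to read the description: the channel loop is innermost, each input map keeps its own partial result, and the partial results are summed once at the end. The function name `conv_cross_map`, the "valid" (no padding) convolution and the nested-list layout are assumptions for illustration, not the original formula.

```python
def conv_cross_map(inputs, filters):
    """Accumulate one output map with the channel loop innermost.

    inputs:  list of C input maps, each an H x W list of lists.
    filters: list of C kernels, each K x K, one per input map.
    """
    c = len(inputs)
    h, w = len(inputs[0]), len(inputs[0][0])
    k = len(filters[0])
    out_h, out_w = h - k + 1, w - k + 1
    out = [[0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            partial = [0] * c  # one partial result per input map
            for ky in range(k):
                for kx in range(k):
                    for ch in range(c):  # channels are swept first
                        partial[ch] += (inputs[ch][y + ky][x + kx]
                                        * filters[ch][ky][kx])
            out[y][x] = sum(partial)  # combine the partial results at the end
    return out


# Four tiny 2x2 maps, map i filled with the value i + 1 (made-up values).
small_maps = [[[v, v], [v, v]] for v in (1, 2, 3, 4)]
ones = [[[1, 1], [1, 1]] for _ in range(4)]
cross_result = conv_cross_map(small_maps, ones)
print(cross_result)  # [[40]]
```

Because the sum over channels can be reordered freely, this ordering gives the same values as a per-map ordering; only the memory access pattern changes.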
All these problems and their solutions are clearly something that users would like to avoid dealing with when implementing deep learning on an embedded
platform. At CEVA, we believe it should be a basic requirement that a real-time system handles them without the user's involvement, or even awareness. This is a core
responsibility of the CEVA deep neural network framework and the CEVA network generator.
Recognize when the focus should be on the weights and when it should be on the map size (network dependent)
Compress and decompress better over time (learn from frame by frame execution)
Conclusion
As you can see, there is a lot that can be done on the technical side of deep convolutional neural networks for embedded systems. Once the challenges of deep
learning in embedded systems have been overcome, many opportunities open up.
CEVA is the leading licensor of signal processing IP for a smarter, connected world. We partner with semiconductor companies and OEMs
worldwide to create power-efficient, intelligent and connected devices for a range of end markets, including mobile, consumer, automotive,
industrial and IoT. Our ultra-low-power IPs for vision, audio, communications and connectivity include comprehensive DSP-based platforms for
LTE/LTE-A/5G baseband processing in handsets, infrastructure and machine-to-machine devices, computer vision and computational photography
for any camera-enabled device, audio/voice/speech and ultra-low power always-on/sensing applications for multiple IoT markets. CEVA can be
found at www.ceva-dsp.com
Radio-Electronics.com is operated and owned by Adrio Communications Ltd and edited by Ian Poole. All information is Adrio Communications Ltd and may not be copied except for individual
personal use. This includes copying material in whatever form into website pages. While every effort is made to ensure the accuracy of the information on Radio-Electronics.com, no liability is
accepted for any consequences of using it. This site uses cookies. By using this site, these terms including the use of cookies are accepted. More explanation can be found in our Privacy Policy