Deep Learning Challenges in Embedded Platforms

Radio-Electronics.com (http://www.radio-electronics.com/)
27 Sep 2016
Liran Bar, Director of Product Marketing for the Imaging & Vision DSP core product line at CEVA, looks at how to overcome the deep
learning challenges in embedded systems.
The successful spread of artificial intelligence (AI) into everyday applications will depend on how easy it is to deploy deep neural networks in small, low-power
devices rather than in large server networks.
In this post we look at ways to deal with the challenges that this raises.
GoogLeNet deep convolutional neural network
In 2014, Google submitted an entry titled GoogLeNet to the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). It is an interesting case study because it is a
22-layer deep convolutional network that includes nine inception modules, creating a very rich and complex topology.
In the GoogLeNet network, the data on each connection between layers can potentially go back and forth through DDR memory, which is a challenge to handle in an embedded system.
The complex topology of the network must be divided into batches of layers to run on a DSP or dedicated hardware. We call this subnetwork division.
In our CEVA network generator tool, all analysis is done automatically without user intervention. The network is divided into subnetworks, and each subnetwork runs
on the DSP according to the execution order set by the network generator. For example, let's take a look at the inception part of the GoogLeNet network after it has gone
through our network generator tool.
[Figure: CEVA network generator tool]
As you can see in the above image, the network generator created four subnetworks. Three of these subnetworks run at different execution times, while two can run in
parallel on different cores. Additionally, the network generator is designed to create long layer sequences, which can potentially run entirely out of internal memory.
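As a rough illustration of subnetwork division, the sketch below groups the layers of a small inception-style graph into linear chains (long layer sequences) and then assigns each chain an execution step, so that chains sharing a step could run in parallel on different cores. This is not the CEVA network generator's algorithm, which the article does not describe. The layer names, the graph, and the helpers `split_into_chains` and `schedule` are hypothetical, and the resulting grouping is not meant to match the four subnetworks in the figure.

```python
from collections import defaultdict

def split_into_chains(layers, edges):
    """Group layers into maximal linear chains (candidate subnetworks).

    `layers` is assumed to be listed in topological order.
    """
    succ, pred = defaultdict(list), defaultdict(list)
    for a, b in edges:
        succ[a].append(b)
        pred[b].append(a)
    chains, seen = [], set()
    for layer in layers:
        if layer in seen:
            continue
        chain = [layer]
        seen.add(layer)
        # Extend while the tail has exactly one successor and that
        # successor has exactly one predecessor (no branch, no merge).
        while len(succ[chain[-1]]) == 1 and len(pred[succ[chain[-1]][0]]) == 1:
            nxt = succ[chain[-1]][0]
            chain.append(nxt)
            seen.add(nxt)
        chains.append(chain)
    return chains

def schedule(chains, edges):
    """Give each chain an execution step; equal steps have no mutual dependency."""
    owner = {layer: i for i, chain in enumerate(chains) for layer in chain}
    step = [0] * len(chains)
    for i in range(len(chains)):  # chains come out in dependency order
        for a, b in edges:
            if owner[b] == i and owner[a] != i:
                step[i] = max(step[i], step[owner[a]] + 1)
    return step

# Hypothetical inception-style module: four branches merged by a concat layer.
layers = ["prev", "1x1", "3x3_reduce", "3x3", "5x5_reduce", "5x5",
          "pool", "pool_proj", "concat"]
edges = [("prev", "1x1"), ("prev", "3x3_reduce"), ("3x3_reduce", "3x3"),
         ("prev", "5x5_reduce"), ("5x5_reduce", "5x5"), ("prev", "pool"),
         ("pool", "pool_proj"), ("1x1", "concat"), ("3x3", "concat"),
         ("5x5", "concat"), ("pool_proj", "concat")]

chains = split_into_chains(layers, edges)
steps = schedule(chains, edges)
for chain, s in zip(chains, steps):
    print(f"step {s}: {' -> '.join(chain)}")
```

In this toy graph the four branches all land in step 1, so they are the candidates for parallel execution, while each two-layer branch stays together as one sequence that could be kept in internal memory.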
Overcoming the Challenges

Next, let's take a look at methods designed to overcome some of the most significant challenges of deep learning in embedded platforms.
Reducing bandwidth
Because bandwidth is tightly constrained in embedded platforms, implementing convolutional neural networks on them will inevitably raise bandwidth issues.
These are caused either by loading the network filter weights or by transferring data from layer to layer.
Here are two rules that can help reduce the bandwidth significantly:
1. Each output map is created by running the same filter at every position in the input map. Relying on this rule, the filter weights can be loaded once and reused
across all positions, which avoids a massive reload of weight data and the bandwidth it would consume.
2. Every output map is calculated from the same input data. Applying this rule, the input can be loaded once and used for all the outputs, without reading it from DDR
more than once.
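To get a feel for how much these two rules can save, here is a minimal counting sketch. It assumes a stride-1 convolution without padding and counts only words read from external memory, ignoring output writes and any caching. The function `ddr_traffic`, its parameters, and the layer dimensions are hypothetical, and the "no reuse" case is a deliberately pessimistic baseline rather than a measurement of any real system.

```python
def ddr_traffic(c_in, c_out, h, w, k, reuse_weights, reuse_input):
    """Count words read from external memory for one convolution layer."""
    out_positions = (h - k + 1) * (w - k + 1)  # stride 1, no padding
    weight_words = c_out * c_in * k * k
    input_words = c_in * h * w
    # Rule 1: the same filter is applied at every position, so load it once.
    weights = weight_words if reuse_weights else weight_words * out_positions
    # Rule 2: every output map reads the same input, so load it once.
    inputs = input_words if reuse_input else input_words * c_out
    return weights + inputs

# Hypothetical layer: 3 input maps of 32x32, 8 output maps, 3x3 filters.
naive = ddr_traffic(3, 8, 32, 32, 3, reuse_weights=False, reuse_input=False)
reused = ddr_traffic(3, 8, 32, 32, 3, reuse_weights=True, reuse_input=True)
print(naive, reused)  # 218976 3288
```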
Multiply and Accumulate Utilization

A powerful feature of DSP architecture is the ability to perform single cycle multiply-accumulate (MAC) instructions for intense computations. In order to maximize
efficiency, it is beneficial to have a continuous sequence of MAC instructions. This can be handled differently in two distinct cases:
1. A low number of large input maps
2. A high number of small input maps
In the first case, it is preferable to complete the filter calculation for each input map before going on to the next map. This way we benefit from overlapping filter
windows, at the cost of some MAC utilization loss at the edges of the map. As shown in the formula below, width and height are calculated first in this case. We call this
approach local filter calculation.
[Formula: local filter calculation, used for large maps]
In the second case, where there is a large number of small input maps, the calculation should be performed across the maps. Different input maps contribute
to one output map. In this case, partial filter results are calculated, and at the end of the process all the partial results are summed into one result, which is
possible because convolution is linear. As shown in the next formula, channels are calculated first. We call this approach cross map filter calculation.
[Formula: cross map filter calculation, used for a large number of small maps (last layers)]
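The two calculation orders can be sketched in plain Python as follows. This is a minimal illustration rather than DSP code: the function names `conv_local` and `conv_cross_map`, the single output map, stride 1, and the absence of padding are all assumptions made for the example. Both functions compute the same result; only the loop order, and therefore the shape of the MAC sequence, differs.

```python
def conv_local(inp, wts):
    """Local order: finish each input map's 2-D filter pass before the next map."""
    C, H, W = len(inp), len(inp[0]), len(inp[0][0])
    K = len(wts[0])
    out = [[0] * (W - K + 1) for _ in range(H - K + 1)]
    for c in range(C):                       # one input map at a time
        for y in range(H - K + 1):           # height and width are swept first
            for x in range(W - K + 1):
                for i in range(K):
                    for j in range(K):
                        out[y][x] += inp[c][y + i][x + j] * wts[c][i][j]
    return out

def conv_cross_map(inp, wts):
    """Cross-map order: accumulate across channels first, then sum the partials."""
    C, H, W = len(inp), len(inp[0]), len(inp[0][0])
    K = len(wts[0])
    out = [[0] * (W - K + 1) for _ in range(H - K + 1)]
    for y in range(H - K + 1):
        for x in range(W - K + 1):
            for i in range(K):
                for j in range(K):
                    partial = 0
                    for c in range(C):       # channels innermost: MAC run across maps
                        partial += inp[c][y + i][x + j] * wts[c][i][j]
                    out[y][x] += partial     # partial results summed into one output
    return out

# Three 4x4 input maps and one 2x2 filter slice per map (made-up integer data).
inp = [[[c + 2 * y + x for x in range(4)] for y in range(4)] for c in range(3)]
wts = [[[c - i + j for j in range(2)] for i in range(2)] for c in range(3)]
same = conv_local(inp, wts) == conv_cross_map(inp, wts)
print(same)  # True
```

With a few large maps, the inner loops of the local order are long; with many small maps, the channel loop of the cross-map order is the long one, which is why each order suits a different part of the network.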
Utilizing internal memory

To use the embedded resources efficiently, we must have all the input maps in the internal memory, each loaded only once. But what if we don't have enough
memory to preserve this rule? In this case we need to divide the input into tiles, while still preserving the rule. After the division, we have the same
number of inputs, but in tiles. The cost of this division is that the weights must be loaded once per tile, so weight traffic grows in proportion to the number of tiles.
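The trade-off can be sketched with a small helper that splits an input map into horizontal tiles and counts the resulting weight loads. The helper `tile_rows`, the row-based tiling, and the numbers are hypothetical. The sketch also assumes that adjacent tiles overlap by k - 1 rows so that every filter window stays whole, which is one simple way to tile and not necessarily what the CEVA framework does.

```python
def tile_rows(height, k, max_rows):
    """Split `height` input rows into tiles of at most `max_rows` rows.

    Returns (first_row, end_row) pairs with end_row exclusive. Adjacent
    tiles overlap by k - 1 rows so no filter window straddles two tiles.
    """
    if max_rows < k:
        raise ValueError("a tile must hold at least one full filter window")
    tiles = []
    out_total = height - k + 1            # output rows for stride 1, no padding
    out_done = 0
    while out_done < out_total:
        out_rows = min(max_rows - k + 1, out_total - out_done)
        tiles.append((out_done, out_done + out_rows + k - 1))
        out_done += out_rows
    return tiles

# Hypothetical case: 32 input rows, 3x3 filters, room for 10 rows at a time.
tiles = tile_rows(32, 3, 10)
weight_words = 8 * 3 * 3 * 3              # 8 output maps, 3 input maps, 3x3 filters
weight_traffic = weight_words * len(tiles)  # weights reloaded once per tile
print(tiles)           # [(0, 10), (8, 18), (16, 26), (24, 32)]
print(weight_traffic)  # 864
```

Here four tiles mean the 216 weight words are fetched four times, so larger tiles (fewer of them) directly reduce weight bandwidth.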
All these problems and their solutions are clearly something that the user would like to avoid dealing with when implementing deep learning on an embedded
platform. At CEVA, we believe it should be a basic requirement for a real-time system to handle them without the user's involvement, or even awareness. This is a core
responsibility of the CEVA deep neural network framework and CEVA network generator.
What else can be done?

We've covered a few embedded algorithmic solutions that change the convolution calculation to our benefit. In addition to these, more can be done on
the algorithmic level by understanding how neural networks work. Here are a few examples that use a compression approach and prior knowledge to reduce bandwidth and
improve performance:
Using compression algorithms such as Huffman coding
Working in a pipeline to save bandwidth
Identifying when part of the calculation can be skipped
Sharing data between calculations
Recognizing when the focus should be on the weights and when it should be on the map size, which is network dependent
Compressing and decompressing better over time by learning from frame-by-frame execution
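As an example of the first item, the sketch below builds Huffman code lengths for a set of quantized weights and compares the total size with fixed-width coding. The article names Huffman coding only as an example and does not say how it is applied, so the weight values, the helper `huffman_code_lengths`, and the choice to compress weights are all assumptions made for illustration.

```python
import heapq
from collections import Counter

def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code over `symbols`."""
    freq = Counter(symbols)
    if len(freq) == 1:                     # degenerate case: one distinct symbol
        return {next(iter(freq)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(n, i, {s: 0}) for i, (s, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, d1 = heapq.heappop(heap)    # two least frequent subtrees
        n2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]

# Made-up quantized weights: mostly zeros, as is common after pruning.
weights = [0] * 12 + [1] * 2 + [-1] + [2]
lengths = huffman_code_lengths(weights)
huffman_bits = sum(lengths[w] for w in weights)
fixed_bits = len(weights) * 2              # 4 distinct values need 2 bits each
print(lengths)                  # code lengths: 0 -> 1, 1 -> 2, -1 -> 3, 2 -> 3
print(huffman_bits, fixed_bits)  # 22 32
```

The saving depends entirely on how skewed the value distribution is; for uniformly distributed weights Huffman coding gives no gain over fixed-width coding.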
Conclusion
As you can see, there is a lot that can be done in the technical aspects of deep convolutional neural networks for embedded systems. Once the challenges of deep
learning in embedded systems have been overcome, many opportunities open up.
About the author

Liran Bar is Director of Product Marketing for the Imaging & Vision DSP core product line at CEVA. Liran has more than fifteen years of experience in the imaging semiconductor
industry. He holds a B.Sc. in Electrical Engineering from Ben-Gurion University.
CEVA is the leading licensor of signal processing IP for a smarter, connected world. We partner with semiconductor companies and OEMs
worldwide to create power-efficient, intelligent and connected devices for a range of end markets, including mobile, consumer, automotive,
industrial and IoT. Our ultra-low-power IPs for vision, audio, communications and connectivity include comprehensive DSP-based platforms for
LTE/LTE-A/5G baseband processing in handsets, infrastructure and machine-to-machine devices, computer vision and computational photography
for any camera-enabled device, audio/voice/speech and ultra-low power always-on/sensing applications for multiple IoT markets. CEVA can be
found at www.ceva-dsp.com