ABSTRACT
In image processing, when we want to extract an object of interest from the rest of the image data, the first stage is detecting the edges of all the objects in the image and then filtering for the objects with the required features. Edge detection is therefore a necessary step in image processing, and it is normally done on a computer. In this project we set out to design a customized edge-detection system to be embedded in digital cameras. The algorithm we chose for edge detection is Canny, which is simple and easy to implement. We designed the entire system using the SpecC language, and the simulation gave good results.
CONTENTS

1. Introduction
   a. System Level Modeling
   b. System Level Description Languages
2. Case Study on a Canny Edge Detector SoC
   a. Canny Application Reference C Code
   b. System Level Model in SpecC
   c. Estimation, Optimization and Refinement using SCE
3. Conclusion
4. References
1. INTRODUCTION
In designing a system, we need to make a lot of decisions, and the two main decisions that influence all the others are the choice of the model of computation and the selection of a description language. These two parameters determine the design flow and the tools required, and their selection depends on the nature of the system being developed. The following subsections discuss system level modeling and system level description languages.
This clearly shows that at the highest level of abstraction the number of components we work with is very small compared to the lowest level, which results from adding more and more details and requirements to the design. Accuracy also improves as we move toward lower levels of abstraction, because we can individually specify the behavior of the components that perform the most basic operations. For example, at the transistor level of abstraction we know the width and length of each transistor, which helps us predict the exact timing of the gates and eventually the timing of the entire system. The following figure illustrates the models as we move toward lower levels of abstraction.
As you can see, we start with the requirements and the specification model, which is a pure functional description. We then add more detail step by step: the different processing elements (architecture model), the on-chip communication networks (communication model), and finally the choice of technology and the RTL (implementation model). In systems where the number of tasks is high, the scheduling of those tasks plays an important role in the efficiency of the system.
A few of these languages are:

- C: Good for functional representation; cannot be used for hardware-level modeling.
- C++: Same as C, with the additional feature of exception handling.
- Java: Like C++, with features for concurrency and synchronization.
- VHDL: A hardware description language; has almost all the features required to synthesize hardware with structural hierarchy.
- Verilog: Another hardware description language, comparable to VHDL.
- SpecC: Well suited for capturing the system; has the features that are missing in the languages above.
- SystemC: More a library for C++ than a language; it also has all the features of SpecC.
The following figure shows the capabilities of each language in the context of system level modeling.
Structural hierarchy of the model:

behavior Main
|------ Monitor monitor
|------ Platform platform
|       |------ DUT canny
|       |------ DataIn din
|       |------ DataOut dout
|       |------ c_img_queue q1
|       \------ c_img_queue q2
|------ Stimulus stimulus
|------ c_img_queue q1
\------ c_img_queue q2
[Figure: block diagram of the model. STIMULUS (Read_pgm(), P.Send(img)) feeds the PLATFORM through Queue q1. Inside the PLATFORM, DATA IN (P1.Read(img), P2.Send(img)) forwards the image to the DUT (Gaussian(), Canny(), Hysteresis()), and DATA OUT passes the result on. The PLATFORM sends the processed image through Queue q2 to the MONITOR (P.Read(img), Write_pgm(img), Exit()).]
Stimulus: This behavior uses Read_pgm() to read the image and sends it to the Platform behavior through the port P. The communication channel between Stimulus and Platform is a simple Queue, q1.
Platform: The Platform has its own Data_in and Data_out interfaces to communicate with the other behaviors instead of communicating directly with Stimulus and Monitor. These modules are included to make future modifications easier: if we intend to change the interface between the Stimulus or Monitor and the Platform, we need not disturb the entire code; we can simply modify Data_in or Data_out. Data_in is the interface between Platform and Stimulus, and Data_out is the interface between Platform and Monitor. The DUT is the main behavior that implements the full functionality of the Canny application; all functions related to edge detection are implemented in the DUT behavior.

Monitor: This behavior reads the processed image from the Platform and writes it to a file using the Write_pgm() function. The channel between Platform and Monitor is also a simple Queue, q2.
This clearly shows that the Gaussian_smooth function dominates the entire computation time. Gaussian_smooth has two main parts: blurring in X and blurring in Y. Each part is internally data-independent, so each can be parallelized on its own. Four instances were created for BlurX and four for BlurY. Before parallelization this part took 400 ms; now it takes 100 ms.
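The independence claim can be sketched in plain C: a horizontal blur touches each row on its own, so the rows can be split into four disjoint stripes, one per BlurX instance. This is an illustration under assumptions (a toy 16x16 image and a simple 1-2-1 kernel), not the project's actual Gaussian code.

```c
#define W 16
#define H 16

/* Horizontal blur of rows [y0, y1) with a simple 1-2-1 kernel;
 * edge pixels are clamped. Each row is processed independently. */
static void blur_x_stripe(unsigned char in[H][W],
                          unsigned char out[H][W], int y0, int y1) {
    for (int y = y0; y < y1; y++)
        for (int x = 0; x < W; x++) {
            int l = in[y][x > 0 ? x - 1 : x];
            int r = in[y][x < W - 1 ? x + 1 : x];
            out[y][x] = (unsigned char)((l + 2 * in[y][x] + r) / 4);
        }
}

/* Whole-image blur done as four independent stripes, mirroring the four
 * BlurX hardware instances; each call touches disjoint rows, so the four
 * calls could run in parallel with no synchronization between them. */
static void blur_x_4way(unsigned char in[H][W], unsigned char out[H][W]) {
    for (int i = 0; i < 4; i++)
        blur_x_stripe(in, out, i * (H / 4), (i + 1) * (H / 4));
}
```

Because the stripes write disjoint rows, running them on four hardware units gives the same result as one sequential pass, which is consistent with the observed 400 ms to 100 ms improvement.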
Architectural Refinement:
With the optimized model in hand, the next step is to decide which hardware units to allocate for the behaviors. The SCE tool offers different processors such as the ARM7TDMI, Motorola cores, the ARM9, various DSPs, and many custom hardware options. The code has no DSP requirement, so the ARM7TDMI was chosen for the main control. In selecting hardware for the Blur functions there were two options: individual custom units, or a single shared unit for BlurX and BlurY, since BlurY executes after BlurX. This is up to the designer, and the decision is a tradeoff between performance and chip cost. In this project, the decision was made in favor of individual hardware units for the BlurX and BlurY behaviors, and Data_in and Data_out were allocated virtual hardware. The resulting mapping of behaviors to processing elements is:

Behavior             Processing Element
Canny                ARM7TDMI
BlurX                Custom hardware (4 instances)
BlurY                Custom hardware (4 instances)
Data_in, Data_out    Virtual hardware
Scheduling Refinement
Once the decision on processing elements is made, the next step is to schedule the tasks that share the same hardware unit. Fortunately, in this project no scheduling is needed, because the functions are sequential and the only parallel part, the Gaussian smoothing, is given its own custom hardware units.
Network Refinement
After scheduling, the communication channels between the individual hardware units have to be defined. In this project the hardware units are one ARM core, the virtual hardware units, and eight custom hardware units. The ARM core comes with the AMBA bus architecture, so any communication to and from the ARM can be done over AMBA. The communication between the custom hardware units then needs to be finalized. We do not need a complex protocol, so a simple double-handshake protocol was selected. Each BlurX hardware unit has two ports: one for the AMBA bus (input from the ARM) and one for a double-handshake bus (output to the BlurY units). Each BlurY unit has five ports: one for the AMBA bus (output to the ARM core) and four input ports for the double-handshake buses from the BlurX units.
3. CONCLUSION
Starting with just the requirements and a sample C source code, the system has been developed step by step through all the levels of abstraction. Although the final RTL refinement required for synthesis was not done due to the short duration of the project, the results obtained are satisfactory. The individual execution times of the processing elements are:

ARM core - 501.9 ms
BlurX HW - 47.9 ms
BlurY HW - 51.3 ms
The next issue is that all the simulation results we obtained are only estimates; they are not real measurements, and there can be variations after synthesis, which may further lower the effective processing power. Is 320x240 an acceptable resolution these days? No. Most digital cameras now capture video at a minimum resolution of 1280x720, and our design cannot process even a single image at this resolution. We would therefore need to parallelize our approach much more aggressively, as in a graphics card, using as many cores as possible to process the image. It is possible to address these issues through proper selection of hardware units and optimization of the code, with some tradeoff in accuracy of results, and make the system work for real-time video processing.
4. REFERENCES

1. ftp://figment.csee.usf.edu/pub/Edge_Comparison/source_code/canny.src
2. http://en.wikipedia.org/wiki/Canny_edge_detector
3. http://www.cecs.uci.edu/~doemer/publications/SpecC_LRM_20.pdf
4. http://www.cecs.uci.edu/~cad/sce.html