
A Network-on-Chip Simulation Framework for Homogeneous Multi-Processor System-on-Chip


Yuan Wen Hau, M. N. Marsono, Chia Yee Ooi, M. Khalil-Hani
VeCAD Research Laboratory
Faculty of Electrical Engineering, Universiti Teknologi Malaysia,
81310 Skudai, Johor, Malaysia.


hauyuanwen@gmail.com, {nadzir, ooichiayee, khalil}@fke.utm.my

Abstract: This paper presents a Network-on-Chip (NoC) simulation framework at the Electronic System Level (ESL) design abstraction based on SystemC. The proposed ESL NoC framework extends the NIRGAM NoC simulator by integrating ARM Instruction Set Simulators (ISSs) as its application Intellectual Property (IP) cores. This enables the modelling of complex homogeneous Multi-Processor System-on-Chip (MPSoC) designs by simulating the behaviour of embedded cores using ISSs attached to NoC tiles. The actual traffic patterns are extracted according to the target application for NoC performance analysis. In this paper, we describe the development of the extended NoC framework, which includes the definitions of the synchronization and data communication protocols, the interprocess communication module, the network interface architecture design, and the device driver. Experimental results show that the extended platform enables early estimation of NoC-based MPSoC system functionality and provides NoC performance analysis with higher accuracy by considering the actual traffic trace of the target application.

Index Terms: Electronic System Level, Homogeneous Multiprocessor System-on-Chip, Instruction Set Simulator, Network-on-Chip, Simulation framework, SystemC

I. INTRODUCTION
Multiprocessor system-on-chip (MPSoC) is the integration of multiple processors or IP cores into a single chip. A bus-based architecture inhibits system scalability and degrades overall MPSoC performance. As a result, network-on-chip (NoC) [1] interconnect architectures have been proposed as an on-chip communication architecture (OCCA) in which processor cores pass messages in the form of packets. One important aspect of NoC is the decoupling of communication from the computational cores. Although simulated traffic patterns allow NoC design-space exploration without the processing cores actually being available, the results may be inaccurate because the communication performance is underestimated or overestimated with respect to that of the actual SoC implementation.
An MPSoC design requires a trade-off analysis among performance, power, area and reliability to meet the requirements of the target application. To enable early system functionality verification and design-space exploration, several design frameworks have been proposed to describe and simulate complex NoC-based MPSoCs at a higher abstraction level, namely the Electronic System Level (ESL) [2]-[6]. However, most available NoC simulation frameworks focus mainly on the analysis of communication traffic: each processor sends traffic at a constant bit rate (CBR), at a variable bit rate, or in bursts. The computation performance of each processor core is assumed to be ideal, and the actual times at which packets are transmitted are not taken into consideration in the NoC performance analysis. Hence, MPSoC design-space exploration and system functionality verification cannot be carried out until the MPSoC is implemented.

978-1-61284-193-9/11/$26.00 ©2011 IEEE
Nostrum [4], written in SystemC, is limited to the mesh topology and a fixed routing algorithm; moreover, it does not allow IP core integration. NOXIM [3] and NIRGAM [6] are both extensible SystemC NoC simulation platforms. These platforms focus only on the automated generation of the NoC architecture, that is, of the communication structure alone. Hence, they cannot perform early system functionality verification of an MPSoC, and design-space exploration is carried out without considering the performance of the computation cores. Reference [7] presented an approach that integrates an ISS into a SoC simulation platform; however, the targeted SoC architecture is limited to a single-master bus-based architecture. Reference [8] presented an approach that integrates a SimpleScalar ISS into an MPSoC, using the GNU Debugger interface for communication between the SystemC simulator and the ISS. However, this approach only considers the integration of the ISS into the MPSoC, without complex NoC communication structures. Reference [9] presented the approach most similar to our work, which integrates an ISS into an MPSoC based on shared memory in an NoC simulation platform. However, it neither forms a homogeneous MPSoC by attaching an ISS to each network tile nor extracts the actual traffic trace of the target application.
In this paper, we present an extension of an available open-source NoC platform, NIRGAM [6], that integrates ARM instruction set simulators (ISSs) as its application IP cores. This paper discusses the platform extension methodology, including the definition of the synchronization and data communication protocols, the design of the interprocess communication module and the network interface architecture, and the development of the device driver. This extension enables the NoC platform to simulate the behaviour of the embedded software running on each ISS. Hence, it enables cycle-accurate early system functionality verification of a complex homogeneous MPSoC design. In addition, by including the computation performance in the evaluation of overall MPSoC performance, early design-space exploration can be achieved with higher accuracy than the original estimation on the NIRGAM platform.
The rest of the paper is organized as follows. Section II presents the platform development. Section III presents a case study based on a cryptographic application, together with a comparison of the simulation results between the original NIRGAM platform and the proposed extended platform. Section IV concludes the paper and discusses recommendations for future enhancement.
II. EXTENDED SIMULATION PLATFORM ARCHITECTURE
NIRGAM [6] is a discrete-event, cycle-accurate simulator targeted at modelling NoCs at the ESL using SystemC. It provides substantial support for experimenting with NoC design, with options available at every stage of NoC design, such as topology, switching technique, virtual channels, buffer parameters, routing mechanism, and application traffic modelling. The simulator is also modular and extensible, so it can easily be extended with new applications and routing algorithms. For a given set of choices, it reports performance metrics such as latency, throughput, and estimated power consumption.
In this work, we propose integrating the SimIt-ARM ISS [10] as the core in each NoC tile to simulate the StrongARMv7 architecture while executing embedded software with clock-cycle accuracy. This enables early system functionality verification and design-space exploration of a homogeneous MPSoC. It also enables analysis of the packet traffic distribution.
The main challenges in integrating distinct SimIt-ARM ISSs into the NIRGAM platform are: (1) to control the simulation synchronization between the SimIt-ARM ISS and the NIRGAM NoC simulator within the SystemC simulation kernel, and (2) to enable the SimIt-ARM ISS to exchange data with other IP cores/ISSs via the NoC OCCA with correct functionality and proper synchronization in the GALS (globally asynchronous, locally synchronous) paradigm. Both tasks require the definition of the interprocess communication (IPC) and data communication protocols between the network interface (NI) and the ISS kernel as well as for end-to-end IP communication, the design of the IPC module and the NI architecture, and the development of the device driver.

A. Inter-process Communication Protocol: Shared Memory

The synchronization of data communication of the SimIt-ARM ISS needs to be considered at two different stages, as shown in Figure 1: (a) the data synchronization between the ISS simulation kernel and the network interface, and (b) the end-to-end data synchronization between the ISS and the other IP cores/ISSs.

Figure 1: Data synchronization mechanism in homogeneous NoC-based MPSoC (diagram blocks: ISS Kernel, ISS Network Interface/Wrapper)

The main issue arises when the ISS kernel tries to receive the correct data from the NI. Although buffers are allocated to each input port of the router, no buffer is allocated to the output port. Therefore, whenever a router receives data destined for a processing element (PE) from any port, it sends the data directly to the PE via the network interface through the local output port, independently of the PE's readiness to receive the data. As a result, passing the data from the local output port of the router to the input port of the NI, and hence to the PE (the SimIt-ARM ISS in this case), with proper synchronization during the execution of the application software becomes a challenging task.

The end-to-end data synchronization between the ISS and the other IP cores arises from the nature of the MPSoC architecture. In a bus-based SoC, at any given time only a single master can capture and hold ownership of the bus to transfer data to multiple slave modules. In an NoC-based MPSoC, data transmission may happen concurrently between multiple master modules, as long as the link is available according to the flit-level flow control. Therefore, IP-to-IP data synchronization needs to be carefully considered.

In this work, single-host IPC based on shared memory is chosen as the communication protocol, because shared memory is the fastest form of IPC: once two or more processes share the same memory region, no kernel involvement is needed when they exchange information. Shared memory communication can be seen as a memory block in which all data are stored in the communication link/memory [11].

Shared memory declares a given section of memory to be used by several processors in parallel. Because the processes share the same memory region, they may try to alter it at the same time. To avoid destroying or missing messages, a synchronization mechanism such as semaphores is required between the processes that store information into or fetch it from the shared memory. In this work, the shared memory is embedded within each ISS (i.e., it is also considered the local memory of the ISS). The shared memory is divided into three sections:

• Normal memory section for temporary data storage: the normal memory locations used for temporary data storage during the execution of the application software.
• User-defined memory section for data exchange with other IPs: dedicated to data exchange between the ISS and the other IPs via the NoC OCCA.
• Reserved memory section for communication handshaking protocol: reserved memory locations dedicated to the high-level end-to-end handshaking protocol among IPs, or to the low-level handshaking protocol between the ISS simulation kernel and the NI via the IPC module.
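The three-section layout can be pictured with a small C sketch of one ISS's address map. It is illustrative only: the 16 MiB size, the section boundaries and the offsets of the reserved words (named after the SENDCNT_ADDRESS, RECVCNT_ADDRESS, SrcID_ADDRESS, READMEM_ADDRESS, ACK_ADDRESS, RDY_ADDRESS and IPACK_ADDRESS locations used by the protocols in this section) are assumptions, not values taken from the platform.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative memory map of one ISS local memory. The paper names the
 * three sections and the reserved words but not their addresses, so every
 * numeric value here is an assumption. */
#define MEM_SIZE       0x01000000u  /* 16 MiB local memory (assumed)       */
#define USERMEM_BASE   0x00F00000u  /* start of user-defined exchange area */
#define RESERVED_BASE  0x00FF0000u  /* start of reserved handshake words   */

/* Reserved handshake words named in the text (offsets are assumed). */
#define SENDCNT_ADDRESS (RESERVED_BASE + 0x00u)
#define RECVCNT_ADDRESS (RESERVED_BASE + 0x04u)
#define SrcID_ADDRESS   (RESERVED_BASE + 0x08u)
#define READMEM_ADDRESS (RESERVED_BASE + 0x0Cu)
#define ACK_ADDRESS     (RESERVED_BASE + 0x10u)
#define RDY_ADDRESS     (RESERVED_BASE + 0x14u)
#define IPACK_ADDRESS   (RESERVED_BASE + 0x18u)

_Static_assert(USERMEM_BASE < RESERVED_BASE && RESERVED_BASE < MEM_SIZE,
               "sections must be ordered normal < user-defined < reserved");
_Static_assert(IPACK_ADDRESS + 4u <= MEM_SIZE, "reserved words must fit");

typedef enum {
    SECTION_NORMAL,       /* temporary data of the application software  */
    SECTION_USER_DEFINED, /* payload exchanged with other IPs via the NoC */
    SECTION_RESERVED,     /* ISS<->NI and IP<->IP handshake words         */
    SECTION_INVALID       /* outside the local memory                     */
} mem_section;

/* Return the section that a byte address falls into. */
mem_section classify_address(uint32_t addr)
{
    if (addr >= MEM_SIZE)      return SECTION_INVALID;
    if (addr >= RESERVED_BASE) return SECTION_RESERVED;
    if (addr >= USERMEM_BASE)  return SECTION_USER_DEFINED;
    return SECTION_NORMAL;
}
```

For example, classify_address(ACK_ADDRESS) yields SECTION_RESERVED, while any address below USERMEM_BASE is treated as ordinary temporary storage.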

1) Data Communication Protocol between ISS and NoC NI: There are two types of data transmission between the ISS and the NoC via the NI based on the shared memory IPC protocol: message sending from the ISS to the NoC, and message receiving from the NoC to the ISS.


Message Sending Process from ISS to the NoC via the NI
Figure 2 shows the message sending process from the ISS to the NI. The ISS first determines the message size and writes it to the SENDCNT_ADDRESS memory location, from which the NI later reads it. The ISS then starts sending data, while the NI keeps reading data from the user-defined memory location until the whole message, as given by the message size, has been received. This sending process involves a low-level handshake protocol among the ISS kernel, the IPC module and the NI. Note that whenever the SimIt-ARM ISS writes data into the shared memory, it must do so via the IPC module arm_source, whose detailed architecture is discussed in Section II-B.
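The sending sequence can be modelled cycle by cycle in C, with the ISS acting first and the NI second within each clock cycle. This is a sketch under stated assumptions: arm_source_model, iss_write() and ni_read() are hypothetical stand-ins for the arm_source IPC module, and the rule that the ISS must retry while the previous word is still unread is only one plausible form of the unspecified low-level handshake.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the arm_source IPC module: one data register
 * plus the "written" flag that the NI clears after consuming the word. */
typedef struct {
    uint32_t data_io;
    bool     interface_written;
} arm_source_model;

/* ISS side: store one word; fails (ISS must retry) if the NI has not yet
 * consumed the previous word. */
static bool iss_write(arm_source_model *s, uint32_t word)
{
    if (s->interface_written) return false;
    s->data_io = word;
    s->interface_written = true;
    return true;
}

/* NI side: consume one word if available, then clear the flag. */
static bool ni_read(arm_source_model *s, uint32_t *word)
{
    if (!s->interface_written) return false;
    *word = s->data_io;
    s->interface_written = false;
    return true;
}

/* Cycle-by-cycle model of the sending process: the ISS writes the message
 * size (SENDCNT_ADDRESS) and then each data word; the NI reads the size
 * first and then collects that many words. Returns the clock cycles used;
 * *received holds the number of words collected into out[]. */
size_t simulate_send(const uint32_t *msg, size_t n,
                     uint32_t *out, size_t *received)
{
    assert(msg != NULL || n == 0);
    arm_source_model sendcnt = {0}, senddata = {0};
    bool size_written = false, size_known = false;
    size_t next = 0, expected = 0, cycles = 0;
    *received = 0;

    while (!size_known || *received < expected) {
        /* ISS step (device driver send_data) */
        if (!size_written)
            size_written = iss_write(&sendcnt, (uint32_t)n);
        else if (next < n && iss_write(&senddata, msg[next]))
            next++;

        /* NI step (NoC_ARM::send) */
        uint32_t w;
        if (!size_known && ni_read(&sendcnt, &w)) {
            expected = w;
            size_known = true;
        } else if (size_known && ni_read(&senddata, &w)) {
            out[(*received)++] = w;  /* would be copied into a flit */
        }
        cycles++;
    }
    return cycles;
}
```

Under these assumptions a 3-word message takes 4 cycles: one for the size word and one per data word.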

ISS

NoC Dev;c8 Driver:


send_dalBO
I
I

Wrllelhe

Shared MemOfY

I
N

size
OR

I Write the data to dedicated

I
I

10 the

Read the messa

memory.

address IOf data sending

I
I
I

size

from the

Read the data from dedICated


I
memory add ress fOf data sending II I

v
o

I
I
I
I
I
I

I
I

t-------------------.
, Write data In to flt( end
send to NoC OCCA

Figure 2: Message Sending Process

Message Receiving Process from NoC to ISS via NI
During system simulation, there is no guarantee on the execution speed of the application software simulated by the ISS relative to the arrival of data at the NoC router from the different neighboring directions. Therefore, a 2-way handshake protocol between the ISS and the NI via the shared memory IPC needs to be defined to ensure that data are read correctly with proper synchronization, as shown in Figure 3.

Referring to Figure 3, before the ISS starts reading data from the user-defined memory location, it first writes the tile ID of the source IP, the memory address to be read, and the message size into SrcID_ADDRESS, READMEM_ADDRESS, and RECVCNT_ADDRESS, respectively. The value in SrcID_ADDRESS is used to fetch the correct data from the input buffer of the NI, and the value in READMEM_ADDRESS is used to select the correct IPC module for writing the data into the associated shared memory location. Based on the message size stored in the RECVCNT_ADDRESS location, the ISS initiates the 2-way handshake protocol by resetting ACK to 0, indicating that it is ready to receive data. Once the NI detects this ACK value, it writes the data into the shared memory location and asserts RDY to 1, indicating that the data have been written to the shared memory. After the ISS detects the RDY value, it reads the data from the shared memory and asserts ACK to 1, which in turn causes the NI to reset RDY to 0, indicating that a read cycle has been completed. The read cycle is repeated as many times as required by the message size.
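The 2-way handshake can be checked with a lock-step C model in which, in every cycle, the ISS driver (recv_data) acts first and the NI (ARM_data_receive) second. Everything here is hypothetical: the rx_words layout, the idle state ACK = 1 / RDY = 0, the input-buffer representation and the timeout guard are assumptions chosen to mirror the sequence in Figure 3, not code from the platform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RX_TIMEOUT_CYCLES 10000u  /* guard against a missing sender (assumed) */

/* Reserved shared-memory words used by the receive handshake. */
typedef struct {
    uint32_t src_id, recv_cnt;  /* written by the ISS before the transfer */
    uint32_t ack, rdy;          /* ACK_ADDRESS / RDY_ADDRESS */
    uint32_t data;              /* dedicated memory address for data receiving */
} rx_words;

/* One NI input-buffer entry: a payload word tagged with its source tile. */
typedef struct { uint32_t src; uint32_t word; } ni_entry;

typedef enum { ISS_SETUP, ISS_WAIT_RDY_HIGH, ISS_WAIT_RDY_LOW, ISS_DONE } iss_state;

/* Fetch the next buffered word whose source matches src, scanning from *pos. */
static bool next_from(const ni_entry *buf, size_t len, size_t *pos,
                      uint32_t src, uint32_t *word)
{
    for (; *pos < len; (*pos)++)
        if (buf[*pos].src == src) { *word = buf[(*pos)++].word; return true; }
    return false;
}

/* Lock-step model of the 2-way handshake; returns cycles used, 0 on timeout. */
size_t simulate_receive(const ni_entry *buf, size_t len,
                        uint32_t src, size_t n, uint32_t *out)
{
    rx_words m = { .ack = 1, .rdy = 0 };  /* idle: no transfer pending */
    iss_state st = ISS_SETUP;
    size_t iss_count = 0, ni_count = 0, pos = 0, cycles = 0;

    while (st != ISS_DONE) {
        if (cycles == RX_TIMEOUT_CYCLES) return 0;
        switch (st) {                      /* ISS step: recv_data() */
        case ISS_SETUP:
            m.src_id = src;
            m.recv_cnt = (uint32_t)n;
            if (n == 0) { st = ISS_DONE; break; }
            m.ack = 0;                     /* ready to receive */
            st = ISS_WAIT_RDY_HIGH;
            break;
        case ISS_WAIT_RDY_HIGH:
            if (m.rdy == 1) {
                out[iss_count++] = m.data; /* read the word */
                m.ack = 1;                 /* acknowledge */
                st = ISS_WAIT_RDY_LOW;
            }
            break;
        case ISS_WAIT_RDY_LOW:
            if (m.rdy == 0) {              /* NI has seen the ACK */
                if (iss_count == n) st = ISS_DONE;
                else { m.ack = 0; st = ISS_WAIT_RDY_HIGH; }
            }
            break;
        case ISS_DONE:
            break;
        }
        uint32_t w;                        /* NI step: ARM_data_receive() */
        if (m.rdy == 0 && m.ack == 0 && ni_count < m.recv_cnt &&
            next_from(buf, len, &pos, m.src_id, &w)) {
            m.data = w;
            m.rdy = 1;                     /* data written to shared memory */
            ni_count++;
        } else if (m.rdy == 1 && m.ack == 1) {
            m.rdy = 0;                     /* read cycle completed */
        }
        cycles++;
    }
    return cycles;
}
```

With a matching word always available, each word costs two cycles (RDY rising, then RDY falling) plus one final cycle for the ISS to observe RDY = 0, so three words take 7 cycles in this model. Entries from other source tiles are skipped, which is the role the text assigns to SrcID_ADDRESS.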
2) End-to-End Data Communication Protocol among ISSs: The NoC-based MPSoC architecture enables multiple master or slave modules to access the NoC OCCA and send data to each other, in particular in one-to-many, many-to-one, and many-to-many data exchanges. Hence, synchronization between data sending and receiving for all IP cores becomes critical to ensure the correct functionality of the MPSoC for the target application.
In this work, the authors implement a simple high-level handshake protocol between the source ISS and the destination ISS, as shown in Figure 4. Consider two ISSs involved in a message exchange. In this protocol, the destination ISS sends a handshake signal to the IPACK_ADDRESS of the source ISS to indicate that it is ready to receive data. Conversely, before the source ISS starts sending data to any core, it always checks the content of its IPACK_ADDRESS to verify that the destination ISS is ready to receive messages.
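A minimal C sketch of the IPACK check is given here. The bit-mask encoding of IPACK_ADDRESS (one bit per destination tile) and the rule that a ready signal is consumed by the send it enables are assumptions; the paper only states that the destination signals readiness through the source's IPACK_ADDRESS and that the source checks it before sending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_TILES 4  /* 2x2 mesh, as in the case study of Section III */

/* Per-ISS view of the reserved IPACK_ADDRESS word. Encoding it as a bit
 * mask of ready destinations is an assumption of this sketch. */
typedef struct {
    uint32_t ipack;  /* bit d set => tile d is ready to receive from us */
} iss_node;

/* Destination side: tell tile `src` that tile `dst` is ready to receive.
 * On the platform this word travels over the NoC; here it is a direct
 * store into the source ISS's IPACK word. */
void signal_ready(iss_node tiles[NUM_TILES], int dst, int src)
{
    assert(dst >= 0 && dst < NUM_TILES && src >= 0 && src < NUM_TILES);
    tiles[src].ipack |= 1u << dst;
}

/* Source side: check IPACK before sending. On success the ready signal is
 * consumed, so the next message needs a fresh handshake (assumed). */
bool try_begin_send(iss_node tiles[NUM_TILES], int src, int dst)
{
    assert(dst >= 0 && dst < NUM_TILES && src >= 0 && src < NUM_TILES);
    uint32_t mask = 1u << dst;
    if ((tiles[src].ipack & mask) == 0)
        return false;              /* destination not ready: poll again */
    tiles[src].ipack &= ~mask;     /* consume the ready signal */
    return true;
}
```

A per-destination bit lets one source track readiness of several destinations at once, which is convenient for the one-to-many exchanges mentioned above; a single flag word would be enough for a strictly pairwise exchange.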

Figure 3: Message Receiving Process

Figure 4: High-Level End-to-End Data Communication Protocol between an ISS and the other IPs

B. Interprocess Communication (IPC) Module Design

The IPC modules act as third-party modules that send/receive data between the shared memory and the NI. This requires modifying the ISS to extend its embedded memory to include the IPC modules. Two IPC modules are inserted, named arm_source and arm_sink. The arm_source module sends data from the shared memory to the NI, and arm_sink does the reverse.

Figure 5 shows the class diagram of the arm_source and arm_sink IPC modules.

Figure 5: Class Diagram of IPC modules: arm_source and arm_sink
arm_source: + int interface_id; + unsigned long data_io; + bool interface_written; + unsigned long access_count; + arm_source(); + void write_device(); + void reset_flag(void)
arm_sink: + int interface_id; + unsigned long data_io; + unsigned long access_count; + arm_sink(); + void read_device()
Before these IPC modules can be used to access data in the shared memory, each user-defined or reserved memory address, together with its associated IPC module, must first be registered with the ISS kernel. This marks these memory locations as reserved for data exchange between the ISS and the other IPs via the NoC OCCA, which prevents the ISS kernel from storing any temporary data in them during the execution of the application software.
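The effect of registration can be sketched in C as an address-routing table inside a model of the ISS memory. iss_memory, ipc_port, iss_register() and iss_store() are hypothetical names; they only mimic the behaviour described here (a registered address is served by its IPC module, never by ordinary memory) and do not reproduce SimIt-ARM's actual registration interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REGISTERED 16
#define NORMAL_WORDS   256  /* tiny "normal" memory, word-addressed */

/* Minimal stand-in for an IPC module's data register (cf. Figure 5). */
typedef struct {
    uint32_t      data_io;
    bool          interface_written;
    unsigned long access_count;
} ipc_port;

/* Hypothetical ISS memory: a registration table plus ordinary memory. */
typedef struct {
    uint32_t  addr[MAX_REGISTERED];
    ipc_port *port[MAX_REGISTERED];
    size_t    count;
    uint32_t  normal[NORMAL_WORDS];
} iss_memory;

static ipc_port *lookup(iss_memory *m, uint32_t addr)
{
    for (size_t i = 0; i < m->count; i++)
        if (m->addr[i] == addr)
            return m->port[i];
    return NULL;
}

/* Reserve `addr` for `port`; fails if the table is full or the address
 * is already registered. */
bool iss_register(iss_memory *m, uint32_t addr, ipc_port *port)
{
    if (m->count == MAX_REGISTERED || lookup(m, addr) != NULL)
        return false;
    m->addr[m->count] = addr;
    m->port[m->count] = port;
    m->count++;
    return true;
}

/* ISS store: registered addresses go to their IPC module, everything else
 * to normal memory, so temporary data never lands in an exchange slot. */
void iss_store(iss_memory *m, uint32_t addr, uint32_t value)
{
    ipc_port *p = lookup(m, addr);
    if (p != NULL) {
        p->data_io = value;
        p->interface_written = true;
        p->access_count++;
        return;
    }
    assert(addr / 4u < NORMAL_WORDS);
    m->normal[addr / 4u] = value;
}
```

Only stores are modelled; a matching load path would route registered addresses to an arm_sink-like module in the same way.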
Note that these IPC modules act only as an interface between the ISS kernel and the NI for exchanging data with the other PEs via the shared memory. Data synchronization is not handled by the IPC modules; it is achieved through the collaboration of the device driver running on the ISS, the IPC modules, and the NI according to the predefined communication protocol presented in Section II-A1.
C. Network Interface Design for ISS

The authors control the execution of the SimIt-ARM ISS and the NoC OCCA with a single SystemC master clock. With this single-clock control, the execution of the application software within the SimIt-ARM ISS is guaranteed to be synchronized with the other NIRGAM internal components. The NI wraps the ISS, and each SystemC clock trigger updates the internal state of the ISS.
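The single-clock scheme can be reduced to a short C sketch. In SystemC the kernel would invoke processes such as ARM_processing() because they are sensitive to the clock; here a plain loop plays the role of the master clock, and clocked_component, iss_stub and ni_stub are hypothetical stand-ins rather than platform classes.

```c
#include <assert.h>
#include <stddef.h>

/* A component driven by the master clock (ISS kernel, NI, router...). */
typedef struct {
    void (*on_clock)(void *state, unsigned long cycle);
    void  *state;
} clocked_component;

/* Deliver every clock edge to every component in a fixed order, so no
 * ISS can run ahead of (or fall behind) the network model. */
void run_master_clock(clocked_component *c, size_t n, unsigned long cycles)
{
    assert(c != NULL || n == 0);
    for (unsigned long t = 0; t < cycles; t++)
        for (size_t i = 0; i < n; i++)
            c[i].on_clock(c[i].state, t);
}

/* Hypothetical stubs: an ISS that retires one instruction per cycle and
 * an NI that records the last edge it saw. */
typedef struct { unsigned long instr_retired; } iss_stub;
typedef struct { unsigned long edges, last_cycle; } ni_stub;

void iss_on_clock(void *s, unsigned long cycle)
{
    (void)cycle;
    ((iss_stub *)s)->instr_retired++;
}

void ni_on_clock(void *s, unsigned long cycle)
{
    ni_stub *ni = s;
    ni->edges++;
    ni->last_cycle = cycle;
}
```

Because every component observes the same edge sequence, the ISS never executes more cycles than the NoC model has simulated, which is the synchronization property the single master clock is meant to provide.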
Figure 6: Functional Block Diagram of NoC_ARM

Figure 7: NoC-based Crypto MPSoC: (a) System architecture; (b) Characteristic graph

Figure 6 shows the functional block diagram of the SimIt-ARM ISS NI, NoC_ARM, which inherits from the ipcore parent class. The SimIt-ARM ISS is instantiated as a submodule within NoC_ARM and contains the simulation kernel and the shared memory module (which in turn contains the arm_source and arm_sink IPC modules). NoC_ARM also contains an input buffer that stores incoming data received from flit_inport.
The NoC_ARM() constructor initializes the property values and registers the SystemC processes. ARM_preprocessing() loads various parameters from the configuration file, creates the IPC module instances, and registers them, together with their associated memory addresses, with the ISS for data exchange. ARM_processing() updates the internal state of the SimIt-ARM ISS kernel during the application software simulation under the single SystemC master clock, so this process is sensitive to clock events. send() implements the data communication protocol of the message sending process from the SimIt-ARM ISS to the NI, as illustrated in Figure 2; it then generates flits, copies the data into them, and sends them to flit_outport. recv() receives incoming flits from flit_inport, extracts the data and command from the flit structure, and pushes the data into the input buffer. ARM_data_receive() implements the data communication protocol of the data receiving process from the input buffer within the NI to the SimIt-ARM ISS, as illustrated in Figure 3. This process is also sensitive to the clock.
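The split between send() and recv() can be illustrated with a packetization sketch in C. The flit structure below is hypothetical (NIRGAM defines its own flit class with more fields), as are the one-word-per-flit payload and the choice to carry the word count in the head flit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NI_BUF_CAP 64

typedef enum { FLIT_HEAD, FLIT_DATA, FLIT_TAIL } flit_type;

/* Hypothetical flit layout: one 32-bit payload word per flit. */
typedef struct {
    flit_type type;
    uint8_t   src, dst;   /* tile IDs */
    uint32_t  payload;
} flit;

/* One NI input-buffer entry: a payload word tagged with its source tile. */
typedef struct { uint32_t src; uint32_t word; } ni_entry;
typedef struct { ni_entry e[NI_BUF_CAP]; size_t len; } ni_input_buffer;

/* send() side: wrap n words as head + n data flits + tail. The head
 * carries the word count. Returns the flit count, or 0 if `cap` is too
 * small. */
size_t packetize(uint8_t src, uint8_t dst, const uint32_t *words, size_t n,
                 flit *out, size_t cap)
{
    if (cap < n + 2) return 0;
    out[0] = (flit){FLIT_HEAD, src, dst, (uint32_t)n};
    for (size_t i = 0; i < n; i++)
        out[i + 1] = (flit){FLIT_DATA, src, dst, words[i]};
    out[n + 1] = (flit){FLIT_TAIL, src, dst, 0u};
    return n + 2;
}

/* recv() side: called once per incoming flit; only data flits carry a
 * payload here. Returns false if the input buffer is full. */
bool ni_recv(ni_input_buffer *b, const flit *f)
{
    assert(b != NULL && f != NULL);
    if (f->type != FLIT_DATA) return true;   /* head/tail: routing only */
    if (b->len == NI_BUF_CAP) return false;
    b->e[b->len++] = (ni_entry){f->src, f->payload};
    return true;
}
```

In this sketch, the source tag stored with each buffered word is what would let ARM_data_receive() serve only the words requested through SrcID_ADDRESS.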
D. NoC Device Driver
The device driver acts as the Hardware Abstraction Layer (HAL) that allows the SimIt-ARM ISS to exchange data with other PEs via the NoC OCCA. In this work, the NoC device driver is developed in C and is tightly coupled with the NI according to the predefined data communication protocols described in Sections II-A1 and II-A2.
III. CASE STUDY AND SIMULATION RESULTS
To verify the platform extension, a simple case study of a homogeneous NoC-based crypto MPSoC providing data security services is developed, as shown in Figure 7a. It is mapped onto a 2x2 mesh network and consists of four ISSs, one attached to each tile. ISS_MAN acts as the master controller of the overall system, ISS_XOR performs 512-bit XOR data encryption, and ISS_SHA computes SHA-1 hashes to produce message digests. ISS_ECC performs ECC key pair generation, as well as digital signature signing and verification based on the Elliptic Curve Digital Signature Algorithm over a 160-bit prime finite field.
Each ISS executes its embedded software and exchanges data according to the application sequence diagram shown in Figure 8. The total number of packets sent from each source to each destination is shown in the characteristic graph in Figure 7b.

During the simulation, the time at which each ISS generates a packet for any destination is traced. The traffic distribution graph, together with the software computation time, is then generated, as shown in Figure 9a. From this original traffic distribution graph, the computation time of the system is then excluded by considering only the traffic information, as shown in Figure 9b. This is because only the NoC OCCA performance metrics are evaluated here, rather than both the computation and the communication aspects of the homogeneous MPSoC.
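Building the traffic distribution graph from the traced packet-generation times amounts to a per-source histogram over clock-cycle bins. The C sketch below assumes a hypothetical trace format (one record per packet) and fixed-width bins; drop_idle_bins() shows only one possible reading of "excluding the computation time", namely removing the intervals in which no ISS generates any packet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_ISS  4   /* ISS_MAN, ISS_XOR, ISS_SHA, ISS_ECC */
#define NUM_BINS 16

/* One traced packet-generation event (hypothetical trace format). */
typedef struct {
    uint8_t  src;    /* index of the generating ISS */
    uint64_t cycle;  /* clock cycle at which the packet was generated */
} packet_event;

/* Count packets per source ISS in fixed-width clock-cycle bins; events
 * past the last bin are accumulated in the last bin. */
void traffic_histogram(const packet_event *ev, size_t n, uint64_t bin_width,
                       unsigned hist[NUM_ISS][NUM_BINS])
{
    assert(bin_width > 0);
    for (size_t s = 0; s < NUM_ISS; s++)
        for (size_t b = 0; b < NUM_BINS; b++)
            hist[s][b] = 0;
    for (size_t i = 0; i < n; i++) {
        assert(ev[i].src < NUM_ISS);
        uint64_t b = ev[i].cycle / bin_width;
        if (b >= NUM_BINS) b = NUM_BINS - 1;
        hist[ev[i].src][b]++;
    }
}

/* Remove bins in which no ISS generates traffic (pure computation time),
 * shifting the remaining bins left. Returns the number of bins kept. */
size_t drop_idle_bins(unsigned hist[NUM_ISS][NUM_BINS])
{
    size_t kept = 0;
    for (size_t b = 0; b < NUM_BINS; b++) {
        unsigned total = 0;
        for (size_t s = 0; s < NUM_ISS; s++) total += hist[s][b];
        if (total == 0) continue;
        for (size_t s = 0; s < NUM_ISS; s++) hist[s][kept] = hist[s][b];
        kept++;
    }
    for (size_t b = kept; b < NUM_BINS; b++)
        for (size_t s = 0; s < NUM_ISS; s++) hist[s][b] = 0;
    return kept;
}
```

The first function corresponds to a "with computation time" view and the second to a compressed, traffic-only view; the actual procedure used to derive Figure 9b may differ.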

Figure 8: Sequence Diagram of NoC-based Crypto MPSoC

Figure 9: Traffic Distribution (packet count versus clock cycle for ISS_MAN, ISS_XOR, ISS_SHA and ISS_ECC): (a) With Computation Time; (b) Without Computation Time

Based on the traffic distribution shown in Figure 9b, an equivalent traffic generator for each ISS is then modelled in the NIRGAM platform to analyze the NoC performance. For comparison, a CBR traffic generator for each ISS, based on the characteristic graph shown in Figure 7b, is also modelled; the resulting NoC performance is compared in Table I.
Table I shows that the actual traffic model extracted from the extended NIRGAM platform after ISS integration gives a more realistic estimate of the NoC performance metrics than the CBR model. For example, the average packet latency of the actual traffic model is 4.75 cycles/packet, whereas that of the CBR model is 9.07 cycles/packet, almost twice as large (9.07/4.75 ≈ 1.91). This is because the actual traffic generation accounts for the traffic load at each simulation cycle by considering the data dependency and computation time of each ISS according to the target system application.

Table I: Performance comparison between the actual traffic model and the CBR model

Performance Metric        Actual Traffic Model    CBR Model
Average Throughput        1.46 Gbps               1.58 Gbps
Average Packet Latency    4.75 cycles/packet      9.07 cycles/packet
Average Flit Latency      1.29 cycles/flit        2.27 cycles/flit
Total Network Power       7.54 mW                 7.87 mW

IV. CONCLUSION
This paper has presented an extension of the NIRGAM NoC simulation platform that integrates SimIt-ARM ISSs to model application IP cores and simulate the communication behaviour of homogeneous MPSoCs. The platform extension includes the definition of data communication and synchronization protocols, the development of the IPC modules, and the design of the network interface for the ISS. These extensions enable the modelling of complex homogeneous MPSoC designs and early system functionality verification of both the computation and the communication entities in the homogeneous MPSoC. Results show that the extended platform provides more realistic NoC traffic performance analysis by considering the data dependency and computation time of each ISS. In addition, the platform is expected to facilitate design-space exploration for HW/SW co-design and system partitioning in future work.
REFERENCES

[1] L. Benini and G. De Micheli, "Networks on chips: A new SoC paradigm," IEEE Computer, vol. 35, pp. 70-78, Jan. 2002.
[2] D. Bertozzi and L. Benini, "Xpipes: A network-on-chip architecture for gigascale systems-on-chip," IEEE Circuits and Systems Magazine, vol. 4, pp. 18-31, Sept. 2004.
[3] NOXIM. Available at http://noxim.sourceforge.net/.
[4] Z. Lu, R. Thid, M. Millberg, and A. Jantsch, "NNSE: Nostrum network-on-chip simulation environment," in Proceedings of the 5th Swedish System-on-Chip Conference (SSoCC'05), Stockholm, Sweden, pp. 1-4, 18-19 April 2005.
[5] D. Siguenza-Tortosa, T. Ahonen, and J. Nurmi, "Issues in the development of a practical NoC: the Proteo concept," Integration, the VLSI Journal, vol. 38, pp. 95-105, October 2004.
[6] NIRGAM. Available at http://nirgam.ecs.soton.ac.uk.
[7] Y. W. Hau and M. K. Hani, "SystemC-based HW/SW co-simulation platform for system-on-chip (SoC) design space exploration," International Journal of Information and Communication Technology, vol. 2, no. 1, pp. 108-119, 2009.
[8] L. Benini, D. Bertozzi, D. Bruni, N. Drago, F. Fummi, and M. Poncino, "SystemC cosimulation and emulation of multiprocessor SoC designs," IEEE Computer, vol. 36, pp. 53-59, April 2003.
[9] T. Schonwald, J. Zimmermann, O. Bringmann, and W. Rosenstiel, "Network-on-chip architecture exploration framework," in 12th Euromicro Conference on Digital System Design, Architectures, Methods and Tools, Patras, pp. 375-382, 27-29 August 2009.
[10] SimIt-ARM. Available at http://sourceforge.net/projects/simit-arm.
[11] W. Wolf, Computers as Components. Morgan Kaufmann Publishers Inc., 2000.
