
A FULL DUPLEX IMPLEMENTATION OF INTERNET PROTOCOL VERSION 4

IN AN FPGA DEVICE
Paulo César C. de Aguirre, Lucas Teixeira, Crístian Müller, Fernando Luís Herrmann,
Leandro Z. Pieper, Josué de Freitas, Gustavo Dessbesell, João Baptista Martins
Electrical Engineering Course, Microelectronics Group
Federal University of Santa Maria
Santa Maria, Brazil
email: paulocomassetto@gmail.com, lucasteixeira@mail.ufsm.br,
cristianmuller.50@gmail.com, herrmann@mail.ufsm.br, leandrozaf@gmail.com,
josue.freitas@mail.ufsm.br, gfd@mail.ufsm.br, batista@inf.ufsm.br
ABSTRACT
This paper describes a hardware implementation of
Internet Protocol version 4. Routing and addressing
features were integrated with network interfaces and
synthesized to a Stratix II FPGA device. Two
implementations of a full duplex Internet Protocol
version 4 are presented: the first is a Reference
design, and the second uses the same design but with
more buffer space. We present the advantages and
disadvantages of each implementation and compare
them in terms of throughput, frame loss rate and
power dissipation. The implementation with more
buffer space achieves a lower frame loss rate, but
dissipates more power than the Reference design.
Both implementations presented similar results in the
throughput tests.
1. INTRODUCTION

The development of a network protocol in hardware
brings numerous benefits to the performance of any
network. Latency, the waiting time in the
communication between two computers (hosts), is one
of the main issues in the Internet [1]. A decrease in
latency and an increase in data throughput in switches
and routers can be achieved by implementing the
Internet Protocol (IP) in hardware.
Companies like Intel and CISCO have devoted
great efforts to communication networks. Currently,
these companies market several communication
protocols implemented in ASICs. Intel has already
developed applications that join Gigabit Ethernet and
PHY (Physical Layer) in a single integrated circuit
[2], as well as other solutions in network processors
[3]. CISCO, on the other hand, offers the G8000
Packet Processor Family, which supports the Gigabit
Ethernet and 10 Gigabit Ethernet standards, besides
working with IPv4 and IPv6 (Internet Protocol
version 6).
In this context, the goal of this work is the
exploration of the design space of an IPv4 hardware
module, where the MAC buffer size is the variable
that has been taken into account.
First, a Reference design was developed. It works
as a gateway between networks, performing routing
and addressing functions on data packets. Then, a
second design with increased buffer area (Buffer
Increased design) was derived from the first, and their
performances were compared.
The next section presents the reference IPv4
hardware core and its main features. Next, the full
system built in the FPGA device and the
modifications to the reference IP-core are described
in Section 3. The tests used to verify and quantify the
performance of the designs are described in Section 4,
and their results are discussed in Section 5. Finally,
conclusions are drawn in Section 6.
978-1-4244-6311-4/10/$26.00 2010 IEEE

Fig. 1. Block diagram of the complete system.

2. IPV4 DEVELOPMENT
The IP protocol is responsible for sending and
receiving data packets through the Internet and is
described in RFC 791 [4]. This protocol is not entirely
reliable, but it is a fast mechanism for data transfer.
Version 4 of the protocol was chosen because it is the
most used worldwide and gives a more area-effective
design compared to the more recent IPv6 protocol.

Table 3. Power dissipation in the Reference and Buffer Increased designs

                         Total Power (mW)           Routing Dynamic Power (mW)
Block Type           Reference   Buffer Increased    Reference   Buffer Increased
PLL                    13.41          13.41             0.00           0.00
I/O                    38.43          38.36             0.52           0.55
Dedicated memory       62.30         432.74             4.37          18.94
Combinational cell     55.68          65.17            10.00          10.02
Register cell         127.23         193.91            65.55         122.36

The developed IP-core, whose architecture is
shown in the shaded area of Fig. 1, is divided into
three main blocks, each responsible for a single task.
The Receiver and Sender blocks evaluate and
forward, with the necessary modifications, the
received datagrams.
The ARP (Address Resolution Protocol) is intrinsic
to the addressing function. A partial implementation
was made, which performs address resolution of
known hosts. The ARP protocol is responsible for
locating and storing the MAC (Media Access
Control) address of each host in the network. A static
ARP table was used due to schedule constraints, since
a dynamic one would require more effort (thus time),
besides having no impact, positive or negative, on the
measurements performed here.
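The static ARP table described above behaves like a fixed mapping from the IPv4 addresses of known hosts to their MAC addresses. A minimal behavioral sketch in Python (not the SystemVerilog implementation; all addresses below are illustrative placeholders, not values from this work):

```python
# Hypothetical static ARP table: IPv4 address -> MAC address of known hosts.
ARP_TABLE = {
    "192.168.0.1": "00:1a:2b:3c:4d:5e",
    "192.168.0.2": "00:1a:2b:3c:4d:5f",
}

def resolve_mac(ip_addr: str):
    """Return the MAC address of a known host, or None if unresolved."""
    return ARP_TABLE.get(ip_addr)
```

Because the table is static, a lookup for a host outside it simply fails, which is acceptable here since only known hosts take part in the measurements.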
3. FPGA PROTOTYPING

The developed IP-core was implemented in a Stratix
II EP2S60-F672C3N FPGA device and some tests
were performed. The main functions performed in
this implementation are routing and addressing.

3.1. Reference Design

The communication stack can be described as a
5-layer engine, the 5 Conceptual Layers [5]:
Application (upper), Transport, Internet, Network
Interface and Hardware (lower). The prototype of the
design in an FPGA development board covers the
three lower communication layers of this conceptual
stack: Internet, Network Interface and Hardware.
The IP is responsible for the Internet layer. Any
datagram arriving in this layer, if forwarded, must
have its data field kept intact; the IP is allowed to
modify only its own header values during the routing
process.
The layer designed in this work is the Internet
layer, while the other two were obtained from Altera
and from a commercial ASIC.
The Network Interface layer is Altera's Triple
Speed Ethernet MegaCore version 9.0 [6]. It was
generated using the 10/100/1000 Ethernet MAC core
variation with MII/GMII interfaces, including internal
FIFOs.
The Hardware layer consists of an ASIC (a High
Performance Triple-Speed Marvell 88E1111
10/100/1000 Ethernet PHY [7]), available in an
expansion board connected to the development board
which contains the FPGA device. Two daughter
boards were used in this implementation, each
containing one network interface and a PHY.
The HDL (Hardware Description Language) used
to code the IP-core was SystemVerilog. After all
functional requirements of the design had been
verified, a few more blocks were coded to allow
prototyping on a development board containing an
FPGA device and network interfaces.
The full system that was built to test the Internet
layer in a real application is shown in Fig. 1. Drivers
were coded to adapt the MAC interfaces, which use
the Avalon Streaming [8] communication protocol, to
the IPv4 interfaces, which use the AMBA AXI [9]
communication protocol. The AMBA AXI protocol is
used in the IPv4 block because it is simple to
implement and allows high-frequency operation.
Each input driver contains a buffer responsible for
storing the frame incoming from the MAC. These
buffers can store only one frame, so the received
frame is stored and just the Internet Protocol
datagram is sent to the IPv4 core; the MAC header is
discarded by the driver. Analogous to the input
drivers, the output drivers also have buffers with the
same storage capacity, and build a frame header that
is sent to the Network Interface layer.
A development kit named Nios II Development
Board, Stratix II Edition, containing a Stratix II
EP2S60-F672C3N FPGA device was used. Both
logic synthesis and power dissipation estimation were
performed using the Altera Quartus II 9.0 tool.
Synthesis results are shown in Table 2, while power
dissipation estimates are depicted in Table 3.

3.2. Buffer Increased Design

We proposed a buffer size increase in the Network
Interface layer: the Receiver and Sender FIFOs were
set to 64 KB and 32 KB, respectively. This
modification aims to store more frames in order to
decrease the frame loss.
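As noted in Section 3.1, a forwarded datagram keeps its data field intact and only the IP header is modified. The canonical header modifications during routing are decrementing the TTL and recomputing the header checksum (per RFC 791, using the one's-complement sum of RFC 1071). A behavioral Python sketch of that step, not the SystemVerilog code of this work:

```python
import struct

def ipv4_checksum(header: bytes) -> int:
    """RFC 1071 one's-complement sum over the 16-bit words of the header.
    The caller must zero the checksum field before calling."""
    total = 0
    for (word,) in struct.iter_unpack("!H", header):
        total += word
    while total >> 16:                      # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def forward(header: bytearray) -> bytearray:
    """Decrement TTL (byte 8) and rewrite the checksum (bytes 10-11),
    leaving every other header field, and the data payload, untouched.
    Assumes TTL > 0 (a datagram with expired TTL would be dropped)."""
    header[8] -= 1                          # TTL
    header[10:12] = b"\x00\x00"             # zero the old checksum
    chk = ipv4_checksum(bytes(header))
    header[10:12] = chk.to_bytes(2, "big")
    return header
```

A header processed this way still validates: summing all its 16-bit words, checksum included, folds to 0xFFFF, whose complement is zero.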

Table 2. Total logic blocks used in the FPGA device

Type                        Reference design    Buffer Increased design
Combinational ALUTs         9% (4,313)          10% (4,874)
Dedicated logic registers   9% (4,187)          11% (5,133)
Total block memory bits     3% (77,392)         77% (1,956,576)
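The jump in block memory bits in Table 2 is consistent with the FIFO sizes of Section 3.2, assuming (our assumption, not a figure from the paper) one 64 KB receive FIFO and one 32 KB transmit FIFO per network interface, with two interfaces in the system:

```python
# Back-of-the-envelope estimate of the enlarged MAC FIFO storage
# (assumed configuration: 2 interfaces, 64 KB RX + 32 KB TX FIFO each).
rx_fifo_bits = 64 * 1024 * 8      # 64 KB receive FIFO per interface
tx_fifo_bits = 32 * 1024 * 8      # 32 KB transmit FIFO per interface
interfaces = 2

fifo_bits = interfaces * (rx_fifo_bits + tx_fifo_bits)
print(fifo_bits)                  # 1,572,864 bits
```

Under these assumptions the FIFOs alone account for roughly 1.57 of the 1.96 million bits reported, with the remainder presumably taken by the Reference design's own buffers and other memories.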

Fig. 2. Maximum throughput reached with each frame size.

4. PERFORMANCE MEASUREMENTS

In order to compare the performance of both
implementations, some tests were performed. These
tests are based on RFC 2544 [10], which discusses
and defines a number of tests that may be used to
describe the performance characteristics of a network
interconnecting device.

4.1. Test Environment

The tester was implemented in a development board
containing a Xilinx Virtex 4 XC4VSX35-10FF668
FPGA device. It was in charge of sending bursts of
frames to the Device Under Test (DUT) and then
receiving the DUT answers.
Tests were performed using frames of 64, 128,
256, 512, 1024, 1280 and 1518 bytes. These are the
standard sizes for Ethernet testing proposed in that
RFC.
Frames were injected in bursts of five seconds with
pauses of three seconds between bursts. Each test was
repeated five times and the average values were
computed.
Only one Gigabit Ethernet interface was used
during the test sessions, since the same path is
traversed by any datagram handled by the IPv4
hardware core, regardless of the source and
destination network interfaces. Injected frames
carried UDP datagrams in the IP data field, so they
were sent to the IP-core and routed by it.

4.2. Throughput Test

The Throughput test determines the maximum frame
rate at which the DUT does not lose any frame. A
pre-determined number of frames was sent at the full
(100%) frame rate. At full frame rate the maximum
throughput of a Gigabit Ethernet interface is reached;
this means that between two frames there is only the
minimum Inter Frame Gap (IFG) [11], which is 96 ns
(12 clock cycles of 8 ns).
The Throughput test was carried out by sending a
determined number of frames at a specific rate to the
DUT and then counting the frames successfully
transmitted by it. In case of frame loss (fewer frames
received back from the DUT than sent to it), the
stream rate is decreased and the test is repeated.

4.3. Frame Loss Rate Test

In a test similar to the one previously described, the
frame loss rate was evaluated. This test begins at the
100% frame rate by sending a pre-determined number
of frames and reporting the percentage of lost frames.
The frame rate was then reduced by 5% at each step,
reporting the percentage of lost frames at each rate.

5. RESULTS AND EVALUATION

Starting the comparison between the two
implementations from a resource utilization and
power consumption point of view, the Buffer
Increased design, as expected, has increased
requirements in both aspects when compared with the
original design. As shown in Table 2, although logic
requirements are almost identical for both designs,
block memory requirements increase from 3% to
77%. Similarly, according to Table 3, the estimated
power consumption increases from 297.05 mW to
743.59 mW. Although presenting drawbacks
regarding power and area requirements, performance
results are considerably better in the Buffer Increased
design when it comes to frame loss rate.

Fig. 3. Frame loss rate results for the Reference design.

Fig. 4. Frame loss rate results for the Buffer Increased design.

Looking at Fig. 3 and Fig. 4, one can notice that
both designs present similar results up to a 40% input
frame rate. From this point onwards, however, the
frame loss rate is not only significantly smaller in the
Buffer Increased design, but also follows a more
regular pattern. Taking into account input rates over
45%, the frame loss rate of the modified design is
almost 14 times smaller in the best case (50% input
rate and 1024-byte frames), 1.2 times smaller in the
worst case (85% input rate and 64-byte frames) and
around 3 times smaller on average.
Results concerning throughput are shown in Fig. 2.
As mentioned in Section 4, they refer to cases where
the frame is successfully handled by the DUT (not
lost). One can notice that the results are similar in
most cases. In one particular case, however, the
Reference design presented a throughput almost 2
times higher (for 512-byte frames).
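The 100% frame rates used above follow directly from Gigabit Ethernet timing: each byte takes 8 ns on the wire, and every frame is preceded by an 8-byte preamble/SFD and followed by the 12-byte (96 ns) minimum IFG cited in Section 4.2. A small sketch of that arithmetic (the preamble and IFG sizes are standard IEEE 802.3 values, not figures from this paper):

```python
BYTE_NS = 8    # one byte time at 1 Gbit/s
PREAMBLE = 8   # preamble + start-of-frame delimiter, bytes
IFG = 12       # minimum inter-frame gap, bytes (96 ns)

def max_frame_rate(frame_bytes: int) -> float:
    """Frames per second at 100% line rate for a given frame size."""
    return 1e9 / ((frame_bytes + PREAMBLE + IFG) * BYTE_NS)

for size in (64, 512, 1518):
    print(size, round(max_frame_rate(size)))
# 64-byte frames give about 1,488,095 fps; 1518-byte frames about 81,274 fps.
```

This is why small frames stress the DUT hardest: the per-frame processing budget at 100% rate shrinks from roughly 12.3 µs for 1518-byte frames to only 672 ns for 64-byte frames.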

6. CONCLUSIONS

This work presented implementation and
performance results for a reference design and a
modified (with more buffer area) design of a
hardware IPv4 block featuring a full communication
stack. The modified design consumes 2.5 times more
power and around 26 times more area (mainly
memory blocks) than the original one.
However, the frame loss rate of the modified
design proved to be significantly lower in some
circumstances, ranging from around 14 times lower in
the best case to 1.2 times lower in the worst case, for
input rates over 45%.
Throughput, on the other hand, is quite similar for
both designs in most cases, but diverges (for better
and for worse) in some cases. It is believed that an
issue in the memory management block is preventing
the modified design from presenting better throughput
results than the original one. This issue is under
investigation, and better throughput performance is
expected from the modified design in the near future.

REFERENCES

[1] M. S. Borella, A. Sears and J. A. Jacko, "The effects
of Internet latency on user perception of information
content," IEEE Global Telecommunications
Conference, 1997.

[2] Intel. 82544EI Gigabit Ethernet Controller Datasheet.
Available:
http://download.intel.com/design/network/datashts/82544ei.pdf.

[3] Intel. Intel Network Processors. Available:
http://www.intel.com/design/network/products/npfamily/index.htm?iid=ncdcnav2+proc_netproc.

[4] Internet Protocol. Defense Advanced Research
Projects Agency, Request for Comments 791,
Virginia, U.S.A., 1981. Available:
http://www.ietf.org/rfc/rfc0791.txt?number=791.

[5] D. E. Comer, "The TCP/IP 5-Layer Reference
Model," in Internetworking with TCP/IP, 4th Edition,
vol. 1, pp. 183-185. ISBN 0-13-018380-6.

[6] Altera. Triple Speed Ethernet MegaCore Function.
Available:
http://www.altera.com/products/ip/iup/ethernet/m-alt-ethernet-mac.html.

[7] More Than IP. 10/100/1000 PHY Daughter Board
88E1111. Available:
http://www.morethanip.com/boards_10_100_1000_88E1111.htm.

[8] Altera. Avalon Interface Specification. Available:
http://www.altera.com/literature/manual/mnl_avalon_spec.pdf.

[9] ARM. AMBA AXI Protocol Specification. Available:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ihi0022b/index.html.

[10] Network Working Group. Benchmarking
Methodology for Network Interconnect Devices,
RFC 2544, Harvard, U.S.A., 1999. Available:
http://www.ietf.org/rfc/rfc2544.txt.

[11] IEEE 802.3, LAN/MAN Carrier Sense Multiple
Access with Collision Detection (CSMA/CD) Access
Method and Physical Layer Specifications, 2008.
Available:
http://standards.ieee.org/getieee802/download/802.3-2008_section1.pdf.