Author
Yogesh Mittal yogesh.mittal@transwitch.com
ABSTRACT
[Figure: testbench block diagram with Stimulus Generator, Response Checker, and Device (RTL)]
A testbench needs additional flexibility if it has to drive high-level functional models of
the SoC design. The architecture of the testbench should support working with transactions at the
same levels of abstraction as are used to model the SoC design. With such a testbench, a golden
test suite can then be defined to ensure the equivalence of the various models at different levels
of abstraction during the SoC development cycle.
Several challenges arise in getting the testbench to work with high-level scenarios at different
levels of abstraction. The scenarios, which take the form of cycle-based transactions when they
drive the architecture model, have to be mapped onto functional signals to connect
to the RTL world in the case of RTL verification.
– Synthesizable Testbench (STB) Approach: The design of such a testbench can be time
consuming. It might need additional resources, and in some cases these testbenches can be
hard to debug. In addition, these testbenches cannot take advantage of advanced testbench
techniques, such as constrained randomization, if the complete testbench is made
synthesizable. This approach also creates a lot of duplicated work between the teams. Moreover,
emulation equipment is very expensive.
[Figure: Block Envs and SubEnvs, device modeling, device APIs, and a toplevel Env, targeting the Virtual Platform, the RTL Platform, and the Emulation/TBA Platform]
Throughout this process, functional coverage is monitored at different abstraction layers. This
methodology supports a top-down approach to building a verification environment. The approach
allows a team to build a complete verification environment early in the development process,
even before any RTL code has been developed. The environment then becomes the "golden
reference" for verifying additional verification components and the RTL design. If the
architecture model is cycle-accurate, it is easy to add in the RTL when it is ready.
To perform all these tasks effectively, powerful features such as constrained-random stimulus
generation, coverage metrics and assertions already required for the RTL verification, combined
with an object-oriented programming style, provide a highly effective environment to implement
the higher layers of the testbench as well. Users can also verify the architecture models as well as
the virtual platform models, ensuring that all models stay mutually consistent during the SoC
development process. The following figure shows how the Command layer and the signal layer
could be replaced in the above layout by a Transaction level Model.
The transaction level model (TLM) shown in Figure 6 can be replaced by the RTL at the
appropriate time. The next sections describe the essential components of
such a system.
o The requirement was to create an untimed 'C' model of the packet processor quickly
enough to validate some critical algorithms as well as to provide feedback to the
detailed RTL design later.
o It also helped us in analyzing device behavior under complex network scenarios.
– Device critical parameter characterization
o For validating performance-critical features such as latencies, bandwidth, and buffer
sizes, a detailed architecture model is created as a first step. A cycle-accurate timed model is
then created based on the first step. This model is used to ensure that the right
architectural trade-offs are made, such as decisions on the bus infrastructure, buffer
sizing and so forth, before committing to the RTL implementation phase.
– Model as RTL Checker
o The characterized device model also served as a checker later in the RTL verification cycle,
as shown in the following figure.
[Figure: testbench (Testcase, Scenario Gen, Driver, Monitor, Coverage, Transactor) connected through the SV Interface to the DUV [RTL] and through DPI and the SW Interface to the DUV [Model] (Device Model)]
Txc_Egress_PktEditTask(in) {
    ..........
    // Call to the exported function
    sv_return(pkt->u.base.pduPtr, pkt->u.base.fragSize + 12);
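The checker role of the device model can be sketched in C as follows. This is a minimal illustration and not the project's actual checker: the type `pkt_t`, the function `check_pkt`, and the byte-wise comparison policy are assumptions introduced here. The packet produced by the device model for a given stimulus is treated as the expected value, and the packet captured from the RTL by the monitor is compared against it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packet view: a byte buffer and its length. */
typedef struct {
    const unsigned char *data;
    size_t len;
} pkt_t;

/* Outcome of one model-versus-RTL comparison. */
typedef enum {
    CHK_MATCH = 0,
    CHK_LEN_MISMATCH,
    CHK_DATA_MISMATCH
} chk_result_t;

/*
 * Compare the packet predicted by the device model (expected) with the
 * packet captured from the RTL (actual). On a data mismatch, *bad_offset
 * (if non-NULL) receives the index of the first differing byte.
 */
chk_result_t check_pkt(const pkt_t *expected, const pkt_t *actual,
                       size_t *bad_offset)
{
    if (expected->len != actual->len)
        return CHK_LEN_MISMATCH;

    for (size_t i = 0; i < expected->len; i++) {
        if (expected->data[i] != actual->data[i]) {
            if (bad_offset)
                *bad_offset = i;
            return CHK_DATA_MISMATCH;
        }
    }
    return CHK_MATCH;
}
```

In a flow like the one described here, a function of this kind would sit on the C side and be called with the model output and the monitored RTL output for each packet; how the two packets are paired up is left out of the sketch.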
[Figure: TBA method. Labels: Sync time, Communication Channel, HW, SW, VCS, Multi Client Handler, Socket Handler Intf, USB]
The test case developed in SystemVerilog includes only the configuration of the device VIP,
the configuration of the simulation environment, and the initialization of the sockets. The test case is run
in the native RTL simulation environment, where the simulation control resides with the VCS
simulator. After completing the initialization of the server component, the testbench waits for
socket connections to be established by the client residing in the “Server Host”. The “Server
Host” can be the same host machine on which the simulation is running, or it can be a different machine
to balance the load. Once the connection is established, the “Server Host” can initiate
transactions towards the device through the physical channel (USB in our case).
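The "initialize the server, then wait for a client" step can be sketched with POSIX sockets in C. This is only an illustration under stated assumptions: the paper does not show its socket code, so the function names `tb_server_listen` and `tb_server_wait_client`, the use of TCP, and the loopback-only binding (which covers just the same-host case) are all hypothetical choices made here.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Open a TCP listening socket on the loopback address.
 * port == 0 asks the OS to pick a free port; the port actually bound is
 * written to *bound_port (if non-NULL).
 * Returns the listening descriptor, or -1 on error.
 */
int tb_server_listen(unsigned short port, unsigned short *bound_port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int on = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    socklen_t len = sizeof addr;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 4) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    if (bound_port)
        *bound_port = ntohs(addr.sin_port);
    return fd;
}

/* Block until one client connects; returns the connection descriptor. */
int tb_server_wait_client(int listen_fd)
{
    return accept(listen_fd, NULL, NULL);
}
```

A SystemVerilog test case would typically reach functions like these through DPI imports. Supporting a client on a different machine would require binding to a non-loopback interface instead.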
The overall HW-SW partitioning is done to maximize performance and reuse. The HW side
partition is driven by clocks, while the SW side is transaction-based and less clock dependent.
The SW side can still make sparing use of timed constructs, such as waiting on time, to allow the SW
side and HW side to synchronize and exchange transactions.
The packet generated through 'scenario gen' is encapsulated in 'messages' going through the
physical media. These 'messages' are decoded by the hardware transactor residing on the
prototype board and are converted into the relevant protocol by the BFM. The message encoding
and decoding is outside the purview of this document.
The following figure presents a simplified view of the SW/HW testbench partitioning.
[Figure: simplified SW/HW testbench partitioning. Labels: Stimulus Generator, SW Transactor, USB, USB HW Transactor, Rx FIFO, Tx FIFO, DUV]
Buffering Mechanism:
In order to minimize HW-SW interactions on a cycle-by-cycle basis, FIFOs are provisioned to
store the stimulus before it is read out by the DUT clock or clocks. In the case of a reactive transactor,
communication between HW and SW needs to be established before any further stimulus is
applied, which can considerably slow down the simulation. The HW-SW interaction required in
this case can be decoupled through these FIFOs. A programmable threshold is maintained in the
FIFO, and further transactions are fetched from the SW side as soon as the FIFO depth falls below the
configured threshold. Similarly, transactions are sent to the SW side when the FIFO depth goes above
the configured threshold.
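The threshold behaviour described above can be sketched in C as follows. This is a software illustration only, under stated assumptions: the actual FIFOs sit on the HW side, and the capacity, the type `tx_fifo_t`, and all function names below are hypothetical. The sketch shows the two decisions the text describes: request more stimulus from SW when occupancy falls below the threshold, and push responses to SW when occupancy rises above it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FIFO_CAPACITY 16 /* hypothetical depth */

typedef struct {
    unsigned int slots[FIFO_CAPACITY]; /* transaction handles (illustrative) */
    size_t head;      /* index of the oldest entry */
    size_t count;     /* current occupancy */
    size_t threshold; /* programmable threshold */
} tx_fifo_t;

void fifo_init(tx_fifo_t *f, size_t threshold)
{
    f->head = 0;
    f->count = 0;
    f->threshold = threshold;
}

/* Push one transaction; returns false if the FIFO is full. */
bool fifo_push(tx_fifo_t *f, unsigned int txn)
{
    if (f->count == FIFO_CAPACITY)
        return false;
    f->slots[(f->head + f->count) % FIFO_CAPACITY] = txn;
    f->count++;
    return true;
}

/* Pop the oldest transaction; returns false if the FIFO is empty. */
bool fifo_pop(tx_fifo_t *f, unsigned int *txn)
{
    if (f->count == 0)
        return false;
    *txn = f->slots[f->head];
    f->head = (f->head + 1) % FIFO_CAPACITY;
    f->count--;
    return true;
}

/* Stimulus FIFO: fetch more from SW once occupancy is below threshold. */
bool fifo_needs_refill(const tx_fifo_t *f)
{
    return f->count < f->threshold;
}

/* Response FIFO: send to SW once occupancy is above threshold. */
bool fifo_needs_drain(const tx_fifo_t *f)
{
    return f->count > f->threshold;
}
```

Because the SW side is contacted only when a threshold is crossed, the DUT clock can keep consuming buffered stimulus between exchanges instead of waiting on SW every cycle.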
[Figure labels: Server Host, VCS Host, Client Host Machine, SW TX, HW TX, BFM, DUV, USB, ETH, Uncontrolled Clock]
Figure 144: Emulation setup with synthesizable TB mapped to HAPS FPGA board
5 Results
Based on our experiments with the transaction-based hardware-accelerated testbench, we observed
speed improvements in the range of 10X. However, we took a simplified approach in the first phase to
establish the proof of concept. Our USB transactor had some limitations, which can be
addressed to provide further speed gains.
In the next phase we plan to use HAL (Hardware Abstraction Layer) APIs for better
control over the HW-SW partition, along with writing a synthesizable transactor. We also need to
assess the multi-clock handling capability of the platform.
7 Acknowledgements
I would like to thank Dinesh for his immense contribution to the development of this project.
Special thanks go to Amit Sharma from Synopsys and Parag Goel from Transwitch for helping me
in writing this paper. Last but not least, thanks to our management, especially Santanu, for
the encouragement and practical support.