Zhonghai Lu, Yuan Yao, Dynamic Traffic Regulation in NoC-Based Systems, IEEE
Transactions on Very Large Scale Integration (VLSI) Systems, vol. 25, no. 2, Feb. 2017,
pp. 556-569.
They proposed a dynamic traffic regulation scheme to improve system performance for
NoC-based multi/many-processor systems-on-chip (MPSoC) and chip multi/many-core
processor (CMP) designs. It can be applied to MPSoCs for intellectual property
integration in an open-loop fashion, by injecting traffic according to its run-time profiled
characteristics. It can also be applied to CMPs in a closed-loop fashion, by admitting
traffic in a way that is fully adaptive to the traffic and network states. Through extensive
experiments, they show that both the open-loop and closed-loop dynamic regulation
techniques can significantly improve network and system performance.
1) We propose a flit serialization (FS) method to efficiently utilize partially faulty links.
The FS approach divides each link into a number of equal-width sections and serializes
sections of adjacent flits so that they are transmitted on all fault-free link sections,
mitigating the imbalance between the flit size and the actual link bandwidth;
2) we propose link augmentation with one redundant section as a low-cost mechanism
to mitigate the FS drawback that a link's available bandwidth is reduced even if it contains
only one faulty wire;
3) we deactivate HD links when their fault level exceeds a certain threshold to diminish
the congestion caused by HD links. The optimal threshold is derived by comparing the
zero-load packet transmission latency on the HD links with that on the shortest alternative path.
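The bandwidth effect of FS and of the redundant section can be illustrated with a small calculation. The following is only a sketch under simplifying assumptions that are not taken from the paper: every flit is cut into as many pieces as the link has sections, pieces of adjacent flits are packed back to back onto the working sections, a link never carries more than one flit per cycle, and control overhead is ignored. The function name and parameters are hypothetical.

```python
import math

def cycles_to_send(num_flits, sections, faulty, redundant=0):
    """Cycles needed to push num_flits over a partially faulty link.

    Assumption: each flit is split into `sections` equal pieces, and pieces
    of adjacent flits are packed onto all fault-free sections every cycle.
    `redundant` spare sections can stand in for faulty ones.
    """
    # Usable sections per cycle, capped at one full flit per cycle.
    working = min(sections, sections + redundant - faulty)
    if working <= 0:
        raise ValueError("link has no usable section")
    total_pieces = num_flits * sections
    return math.ceil(total_pieces / working)

print(cycles_to_send(4, sections=4, faulty=1))               # 6
print(cycles_to_send(4, sections=4, faulty=1, redundant=1))  # 4
```

In this toy model, a 4-section link with one faulty section needs 6 cycles for 4 flits, and adding one spare section restores the fault-free figure of 4 cycles.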
Our proposal is evaluated with synthetic traffic and PARSEC benchmarks. Experimental
results indicate that the FS method achieves a lower area*power/saturation_throughput
value than all state-of-the-art link fault-tolerant strategies.
With a redundant section in each link, the NoC saturation throughput improves
substantially over FS alone, e.g., by 18% when 10% of the NoC wires are broken.
Simulation results obtained at various wire-failure rates indicate that the highest
saturation throughput is achieved when 4- or 8-section links with a flit transmission
latency longer than four cycles are deactivated.
5. Michael Opoku Agyeman ; Quoc-Tuan Vien ; Gary Hill ; Scott Turner ; Terrence Mak
An Efficient Channel Model for Evaluating Wireless NoC Architectures Computer
Architecture and High Performance Computing Workshops (SBAC-PADW), 2016
International Symposium on 26-28 Oct. 2016
The proposed channel model demonstrates that the total path loss of the wireless channel in
WiNoCs suffers not only from dielectric propagation loss (DPL) but also from molecular
absorption attenuation (MAA), which reduces the reliability of the system.
6. Michael Opoku Agyeman ; Wen Zong An Efficient 2D Router Architecture for
Extending the Performance of Inhomogeneous 3D NoC-Based Multi-Core
Architectures Computer Architecture and High Performance Computing Workshops
(SBAC-PADW), 2016 International Symposium on 26-28 Oct. 2016
In this paper, they proposed a low-latency adaptive router with a low-complexity
single-cycle bypassing mechanism to alleviate the performance degradation caused by the
slow 2D routers in inhomogeneous 3D NoCs. By combining the low-complexity bypassing
technique with adaptive routing, the proposed router is able to balance the traffic in the
network and reduce the average packet latency under various traffic loads. Simulation shows
that the proposed router can reduce the average packet delay by an average of 45% in 3D
NoCs.
7. Gwangsun Kim, Michael Mihn-Jong Lee, John Kim, Member, IEEE, Jae W. Lee, Dennis
Abts, and Michael Marty Low-Overhead Network-on-Chip Support for Location-Oblivious
Task Placement IEEE TRANSACTIONS ON COMPUTERS, VOL. 63, NO. 6, JUNE 2014.
Many-core processors will have many processing cores with a network-on-chip (NoC) that
provides access to shared resources such as main memory and on-chip caches. However,
locally-fair arbitration in a multi-stage NoC can lead to globally unfair access
to shared resources and impact system-level performance depending on where each task is
physically placed. In this work, we propose an arbitration scheme to provide equality-of-service
(EoS) in the network and to support location-oblivious task placement. We propose
using probabilistic arbitration combined with distance-based weights to achieve EoS and
overcome the limitation of the round-robin arbiter.
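A minimal sketch of probabilistic arbitration with distance-based weights is given below. It assumes that each competing request carries a positive weight equal to the number of hops its packet has already travelled, and that the grant probability is proportional to that weight. The weight definition and the function name are illustrative assumptions, not details taken from the paper.

```python
import random

def probabilistic_arbiter(requests, rng=random):
    """Pick one requester with probability proportional to its weight.

    `requests` maps an input-port name to a positive distance-based weight
    (assumed here to be the hop count travelled so far).
    """
    ports = list(requests)
    weights = [requests[p] for p in ports]
    return rng.choices(ports, weights=weights, k=1)[0]

# A packet that has come from far away (weight 3) wins about three times
# as often as a local one (weight 1), instead of the 50/50 split that a
# round-robin arbiter would give.
rng = random.Random(0)
grants = [probabilistic_arbiter({"far": 3, "near": 1}, rng) for _ in range(10000)]
print(grants.count("far") / len(grants))
```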
However, the complexity of probabilistic arbitration results in high area and long latency,
which negatively impacts performance. In order to reduce the hardware complexity, we
propose a hybrid arbiter that switches between a simple arbiter at low load and a complex
arbiter at high load. The hybrid arbiter is enabled by the observation that arbitration only
impacts the overall performance and global fairness at high load. We evaluate our arbitration
scheme with synthetic traffic patterns and GPGPU benchmarks. Our results show that a hybrid
arbiter that combines a round-robin arbiter with probabilistic distance-based arbitration reduces
performance variation as task placement is varied and also improves average IPC.
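The switching idea behind the hybrid arbiter can be sketched as follows. The abstract does not say how load is detected, so this sketch assumes a hypothetical rule: the number of simultaneous requesters is compared with `load_threshold`. The class and parameter names are likewise illustrative.

```python
import random

class HybridArbiter:
    """Round-robin at low load, weighted probabilistic grant at high load."""

    def __init__(self, ports, load_threshold=2, seed=None):
        self.ports = list(ports)
        self.load_threshold = load_threshold  # assumed load metric
        self.next_idx = 0                     # round-robin pointer
        self.rng = random.Random(seed)

    def _round_robin(self, requesting):
        n = len(self.ports)
        for offset in range(n):
            idx = (self.next_idx + offset) % n
            if self.ports[idx] in requesting:
                self.next_idx = (idx + 1) % n
                return self.ports[idx]
        return None

    def grant(self, requests):
        """requests: dict mapping port -> positive distance weight."""
        if not requests:
            return None
        if len(requests) <= self.load_threshold:
            return self._round_robin(requests)   # cheap path
        ports = list(requests)
        weights = [requests[p] for p in ports]
        return self.rng.choices(ports, weights=weights, k=1)[0]

arb = HybridArbiter(["N", "E", "S", "W"], load_threshold=2, seed=0)
print(arb.grant({"E": 1, "S": 4}))  # E (low load: weights are ignored)
print(arb.grant({"E": 1, "S": 4}))  # S
```

With few requesters the weights are ignored and the cheap round-robin pointer decides, which mirrors the observation that arbitration only matters for fairness at high load.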
8. Junwen Luo, Graeme Coapes, Terrence Mak, Tadashi Yamazaki, Chung Tin, and Patrick
Degenaar Real-Time Simulation of Passage-of-Time Encoding in Cerebellum Using a
Scalable FPGA-Based System IEEE TRANSACTIONS ON BIOMEDICAL CIRCUITS
AND SYSTEMS, VOL. 10, NO. 3, JUNE 2016.
The cerebellum plays a critical role in sensorimotor control and learning. However,
dysmetria or delays in movement onset resulting from damage to the cerebellum cannot be
cured completely at the moment. Neuroprosthesis is an emerging technology that can
potentially substitute for such a motor control module in the brain. A prerequisite for this to
become practical is the capability to simulate the cerebellum model in real time, with low
timing distortion for proper interfacing with the biological system. In this paper, we present a
frame-based network-on-chip (NoC) hardware architecture for implementing a bio-realistic
cerebellum model with neurons, which has been used for studying timing control or
passage-of-time (POT) encoding mediated by the cerebellum. The simulation results verify that our
implementation reproduces the POT representation by the cerebellum properly. Furthermore,
our field-programmable gate array (FPGA)-based system demonstrates high
computational speed: it can complete 1 s of real-world activity within 25.6 ms. It is also
highly scalable, in that it can maintain approximately the same computational speed even if
the number of neurons increases by one order of magnitude. Our design is shown to outperform
three alternative approaches previously used for implementing spiking neural network models.
Finally, we show a hardware electronic setup and illustrate how the silicon cerebellum can be
adapted as a potential neuroprosthetic platform for future biological or clinical
applications.
In photonic integrated networks on chip (NoCs), microrings are commonly used for adding or
dropping a single optical signal to be switched in the NoC. This paper
demonstrates the feasibility of adding or dropping two optical signals at the same wavelength
in the same microring of NoCs with bus and ring topologies. More specifically, the same
microring can be used to support simultaneous bidirectional transmission of two signals to
be coupled in the NoC topology, leading to two different configurations, called shared
source-microring and shared destination-microring. Spectral characterization shows good agreement
between simulations and measurements taken on a silicon-based integrated NoC. Bit-error-rate
(BER) measurements indicate that the shared source-microring configuration performs better,
achieving a penalty as low as 1.5 dB for a BER of 10^-9 at 10 Gb/s in the bus NoC. A higher
penalty in the ring NoC for both configurations is due to higher crosstalk in the
interconnecting ring.
The general-purpose network-on-chip (GP-NoC) has recently attracted the attention of
research and industry as a way to support the growing demands of computing systems. The
design and development of the communications and networking functions for such
large-scale versatile systems require knowledge of the traffic exchanged between the
computing nodes. The object of the study in this paper is the
last-level shared cache interface, which is likely to be a traffic bottleneck in future GP-NoC
architectures. First, using direct measurements, we report on the stochastic traffic
properties at large scales and provide the first two moments and distribution functions.
Complementing measurements with fine-grained cycle-accurate CPU simulations, we then
analyze the small-scale traffic behavior. We show that even for the simplest applications, such
as reading or writing of data, the nature of the traffic is stochastic, depends on the number of
active cores, and, irrespective of the application type, has an explicit batch structure. We
further reveal that the batch sizes and inter-batch intervals can be well approximated by a
geometric distribution, and that the approximation becomes better when the number of active
cores increases. These properties identify a simple arrival model that can be used in analytical
or simulation-based performance evaluation studies of the shared interface technologies in
prospective NoCs.
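A batch arrival model of the kind described here can be sketched as a small trace generator. The sketch assumes a discrete-time (cycle-based) process in which both the batch size and the gap between batches are geometric on {1, 2, ...}. The parameters `p_size` and `p_gap` and the function names are hypothetical, and the values used are not fitted to any measurement.

```python
import random

def geometric(p, rng):
    """Sample k >= 1 with P(k) = (1 - p)**(k - 1) * p, for 0 < p <= 1."""
    k = 1
    while rng.random() >= p:
        k += 1
    return k

def batch_arrivals(num_batches, p_size, p_gap, seed=None):
    """Return (cycle, batch_size) pairs of a geometric batch arrival process."""
    rng = random.Random(seed)
    cycle = 0
    trace = []
    for _ in range(num_batches):
        cycle += geometric(p_gap, rng)                  # inter-batch interval
        trace.append((cycle, geometric(p_size, rng)))   # requests in this batch
    return trace

trace = batch_arrivals(5, p_size=0.5, p_gap=0.25, seed=1)
print(trace)
```

The mean batch size is 1/p_size and the mean inter-batch interval is 1/p_gap, so the two parameters can be set from the first moments of a measured trace.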
13. Emmanuel Abbe and Emre Telatar Polar Codes for the m-User
Multiple Access Channel IEEE TRANSACTIONS ON INFORMATION THEORY, VOL.
, NO. , MONTH YEAR.
In this paper, polar codes for the m-user multiple access channel (MAC) with binary inputs
are constructed. It is shown that Arıkan's polarization technique, applied individually to each
user, transforms independent uses of an m-user binary-input
MAC into successive uses of extremal MACs. This transformation
has a number of desirable properties: (i) the uniform sum rate of the original MAC is
preserved, (ii) the extremal MACs have uniform rate regions that are not only polymatroids
but matroids, and thus (iii) their uniform sum rate can be reached
by each user transmitting either uncoded or fixed bits; in this sense they are easy to
communicate over. A polar code can then be constructed with an encoding and decoding
complexity of O(n log n) (where n is the block length) and a block error probability
of o(exp(-n^{1/2-ε})), capable of achieving the uniform sum rate of any binary-input
MAC with arbitrarily many users. Applications of this polar code construction to channels with
a finite field input alphabet and to the AWGN channel are also
discussed.
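The O(n log n) encoding cost comes from the butterfly structure of Arıkan's transform, which in the MAC setting is applied to each user's input block separately. The sketch below shows only this per-user transform x = u F^(⊗k) over GF(2), with F = [[1, 0], [1, 1]] and n = 2^k. The bit-reversal permutation is omitted, and the choice of which positions carry fixed bits (the part that is specific to the MAC construction) is not shown.

```python
def polar_transform(u):
    """Apply x = u * F^(Kronecker power) over GF(2), F = [[1, 0], [1, 1]].

    len(u) must be a power of two. Uses (n/2) * log2(n) XOR operations.
    """
    n = len(u)
    if n == 0 or n & (n - 1):
        raise ValueError("block length must be a power of two")
    x = list(u)
    step = 1
    while step < n:
        for start in range(0, n, 2 * step):
            for i in range(start, start + step):
                x[i] ^= x[i + step]   # one butterfly: (a, b) -> (a xor b, b)
        step *= 2
    return x

print(polar_transform([1, 0, 1, 1]))  # [1, 1, 0, 1]
```

Over GF(2) the transform is its own inverse, so applying `polar_transform` twice returns the original block.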
14. Michael Opoku Agyeman, Member, IEEE, Quoc-Tuan Vien, Member, IEEE,
Ali Ahmadinia, Member, IEEE, Alexandre Yakovlev, Senior Member, IEEE,
Kin-Fai Tong, Member, IEEE, and Terrence Mak, Member, IEEE A Resilient 2-D
Waveguide Communication Fabric for Hybrid Wired-Wireless NoC Design
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL.
28, NO. 2, FEBRUARY 2017.
16. Gwangsun Kim, Michael Mihn-Jong Lee, John Kim, Member, IEEE, Jae W. Lee, Dennis
Abts, and Michael Marty Mapping of Irregular IP onto NoC Architecture with
Optimal Energy Consumption Received August 2, 2016, accepted August 29, 2016, date of
publication October 4, 2016, date of current version October 31, 2016.
Network-on-chip (NoC) architectures have been proposed to resolve complex on-chip
communication problems. An NoC-based mapping algorithm is presented in this paper. It can
map irregular intellectual property (IP) cores onto regular-tile 2-D mesh NoC architectures.
The basic idea is to decompose a large IP into several dummy IPs, or to integrate several small
IPs into one dummy IP, such that each dummy IP fits into a single tile. It can also allocate
buffer space according to the input/output degree and avoid connection congestion by
adapting the communication density. Experimental data indicate that, using the algorithm
proposed in this paper, the communication energy can be reduced by about 7%. Key words:
network on chip (NoC); communication matrix; router weight; communication density.
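The decompose-or-integrate step can be illustrated with a toy area-based sketch. It assumes that an IP is described only by its area, that an oversized IP is cut into equal parts, and that small IPs are merged with a first-fit rule. These heuristics and all names are hypothetical stand-ins: the abstract does not describe the actual procedure, which also takes communication into account.

```python
import math

def make_dummy_ips(ip_areas, tile_area):
    """Return a list of dummy IPs, each a list of (ip_name, area) parts.

    IPs larger than a tile are split into equal parts; smaller IPs are
    merged with a first-fit rule (an illustrative heuristic only).
    """
    dummies = []
    small = []
    for name, area in ip_areas.items():
        if area > tile_area:
            parts = math.ceil(area / tile_area)
            dummies.extend([[(name, area / parts)] for _ in range(parts)])
        else:
            small.append((name, area))
    bins = []
    for name, area in sorted(small, key=lambda item: -item[1]):
        for b in bins:
            if sum(a for _, a in b) + area <= tile_area:
                b.append((name, area))   # merge into an existing dummy IP
                break
        else:
            bins.append([(name, area)])  # open a new dummy IP
    return dummies + bins

ips = {"dsp": 2.5, "uart": 0.3, "timer": 0.4, "cpu": 1.0}
for dummy in make_dummy_ips(ips, tile_area=1.0):
    print(dummy)
```

In this example the large `dsp` core becomes three dummy IPs, `cpu` fills a tile on its own, and `timer` and `uart` are merged into one dummy IP.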
18. Ruilian Xie, Jueping Cai and Xin Xin Simple fault-tolerant method to balance
load in network-on-chip Received August 2, 2016, accepted August 29, 2016, date of
publication October 4, 2016, date of current version October 31, 2016.
20. Chen Wu, Chenchen Deng, Leibo Liu, Jie Han, Jiqiang Chen, Shouyi Yin, and Shaojun
Wei An Efficient Application Mapping Approach for the Co-Optimization of Reliability,
Energy, and Performance in Reconfigurable NoC Architectures
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS
AND SYSTEMS, VOL. 34, NO. 8, AUGUST 2015.
In this paper, an efficient application mapping approach is proposed for the co-optimization
of reliability, communication energy, and performance (CoREP) in network-on-chip
(NoC)-based reconfigurable architectures. A cost model
for the CoREP is developed to evaluate the overall cost of a mapping. In this model,
communication energy and latency (as a measure of performance) are first combined in an
energy latency product (ELP), and then the ELP is co-optimized with reliability
by a weight parameter that defines the optimization priority. Both transient and intermittent
errors in the NoC are modeled in CoREP. Based on CoREP, a mapping approach, referred to as
priority and ratio oriented branch and bound (PRBB),
is proposed to derive the best mapping by enumerating all the candidate mappings organized
in a search tree. Two techniques, branch node priority recognition and partial cost ratio
utilization, are adopted to improve the search efficiency. Experimental results show that the
proposed approach achieves significant improvements
in reliability, energy, and performance. Compared with the state-of-the-art methods in the
same scope, the proposed approach has the following distinctive advantages: 1) CoREP is
highly flexible to address various NoC topologies and routing
algorithms, while others are limited to some specific topologies and/or routing algorithms; 2)
general quantitative evaluations for reliability, energy, and performance are made,
respectively, before being integrated into a unified cost model in a general context, while other
similar models only touch upon two of them; and
3) CoREP-based PRBB attains a competitive processing speed, which is faster than other
mapping approaches.
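The role of the weight parameter can be illustrated with a toy cost function. The abstract does not give the CoREP formula, so the form used below is an assumption made only for illustration: a weighted sum of unreliability and a normalized ELP, with weight `alpha` in [0, 1]. The names `corep_cost`, `best_mapping`, and `elp_ref` are hypothetical, and the exhaustive `min` stands in for the PRBB search, which is not reproduced here.

```python
def corep_cost(energy, latency, reliability, alpha, elp_ref=1.0):
    """Weighted cost of one mapping; lower is better.

    ASSUMED form (not from the paper):
        alpha * (1 - reliability) + (1 - alpha) * (energy * latency) / elp_ref
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    elp = energy * latency  # energy latency product (ELP)
    return alpha * (1.0 - reliability) + (1.0 - alpha) * elp / elp_ref

def best_mapping(candidates, alpha, elp_ref=1.0):
    """candidates: dict mapping name -> (energy, latency, reliability)."""
    return min(candidates,
               key=lambda m: corep_cost(*candidates[m], alpha, elp_ref))

candidates = {"A": (2.0, 3.0, 0.90), "B": (3.0, 3.0, 0.99)}
print(best_mapping(candidates, alpha=0.0, elp_ref=10.0))  # A (ELP only)
print(best_mapping(candidates, alpha=1.0, elp_ref=10.0))  # B (reliability only)
```

Moving `alpha` from 0 to 1 shifts the preferred mapping from the low-ELP candidate to the more reliable one, which is the sense in which the weight defines the optimization priority.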