
A Collective Study on Recent Implementations of FPGA-based Artificial Neural Networks

Abstract—This paper explores some recent applications of FPGA-based ANNs (Field Programmable Gate Array-based Artificial Neural Networks). The ANN is a research field and a tool in many areas of data processing systems. In a scholarly manner, it gives a description of the organism brain (the biological nervous system) based on mathematical models and nonlinear processing units. Even though an ANN can be programmed in software, a hardware implementation can give more accurate modeling for the ANN, revealing the inherent parallelism embedded in ANN dynamics. The FPGA is one of the hardware implementations of ANNs; it provides a reconfigurable, programmable electronic platform and offers robust flexibility. Based on this study, this paper covers different applications (biomedicine, robotics, neuromorphic computation, analog circuit simulators, ...) and presents their merits and demerits.

Keywords—Artificial Neural Network, Computing, FPGA, Neuromorphic, Parallelism, Resistive Switching.

I. INTRODUCTION

An Artificial Neural Network (ANN), a biologically inspired nonparametric and nonlinear artificial intelligence algorithm, provides processing and learning abilities: training, classification, simulation, pattern recognition, prediction, recall, and hybrid forecasting of information. Based on the biological nervous system, it models a complex system as a parallel, distributed network of simple nonlinear processing units [1]–[3].

Hardware (HW) implementations of ANNs are preferable to their software counterparts because of execution time and execution pattern. Software consumes more time to simulate an ANN as its size becomes large (a speed constraint), and it executes the ANN in a sequential (serial) pattern. Implementing an ANN at the circuit level allows higher-speed computation and a parallel execution pattern. An ANN consists of input/hidden/output layers (analogous to the brain's ~10^11 neurons) and weighted connections among them (analogous to its ~10^15 synapses); the former denote the nodes, while the latter denote the weights. Any node between the input and output layers belongs to a hidden layer (built of artificial neurons), where the nonlinear mapping between input nodes and output nodes is performed. At the analog circuit level, sensors convert the inputs to a form suitable for processing; the converted inputs are then weighted and summed (a linear combination) and passed through a nonlinear activation function to produce the output. The weighted sum can be computed using either a multiplier and summer or a weighted-summer op-amp. Another adder adds the bias to the output of the weighted summer; the bias increases or decreases the net input of the activation function [4]–[5]. The activation function is the most important, expensive, and difficult part of any hardware implementation of an ANN. It is popularly represented by the sigmoid function, which is applied to the output of the linear combination of the input signals: the neuron computes the weighted sum and then decides whether to activate or not. Recently, memristive crossbar arrays have been shown to be a good model for the weights [6], and analog ANN computation can be achieved using memristors as in [7].

VLSI CMOS technology can offer the required nonlinearity characteristic, at the cost of inaccurate computation, thermal drift, lack of flexibility (non-reconfigurable and non-reprogrammable), and limited scaling. An ASIC can realize a compact parallel architecture for an ANN and provide high-speed performance; however, it is expensive, computes inaccurately, and lacks design flexibility. Microprocessors and DSPs do not support parallel designs. To support parallelism, the FPGA is used, as it is a good candidate for reconfigurability and flexibility in designing ANNs. It supports repeated, iterative learning with modified weights and a reconfigurable structure. In addition, it offers a more compact density with lower cost and a shorter design cycle. An FPGA-based ANN can offer good modularity and dynamic adaptation for neural computation. The most important feature of the FPGA is parallel computing, which matches the architecture of an ANN well [8]–[9]. Generally, in any parallel computing, one challenge is how best to map applications onto the HW. More specifically, in an FPGA the basic computing cells have rigid interconnection structures, and a large ANN needs more HW resources (an increased number of units such as multipliers and activation functions).

There are several types of ANN for various problems in the literature, and the operation of an ANN can be divided into two phases. The first phase is the learning process (training process, or off-line phase), in which the ANN learns the data set obtained from the system model; it includes diverse optimization algorithms to decide the values of the weights and the biases. The second phase is the recalling process (prediction process, or real-time phase), where the optimized values of the artificial neurons are verified using the test dataset (which differs from the training dataset, so the ANN may encounter data it did not see during training).

In this brief, miscellaneous modern implementations of ANNs on FPGAs are reviewed. It will be shown that different FPGA families accommodate rapid prototyping of several ANN architectures and implementation approaches.
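To make the neuron computation described above concrete, the following minimal Python sketch (an illustration for this survey, not code from any of the cited implementations) computes the weighted sum of the inputs, adds the bias, and applies the sigmoid activation:

import math

def neuron_output(inputs, weights, bias):
    # Linear combination of the inputs (the weighted sum) plus the bias
    net = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Sigmoid activation decides how strongly the neuron activates
    return 1.0 / (1.0 + math.exp(-net))

# Example: one neuron with three inputs (arbitrary values)
print(neuron_output([0.5, -1.0, 2.0], [0.8, 0.2, -0.4], bias=0.1))

A hardware implementation parallelizes exactly these two stages: multiplier/adder trees (or a weighted-summer op-amp in the analog case) for the sum, and a dedicated unit for the activation function.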
The design frameworks of these FPGA implementations can be compared with respect to total time response, precision, total area (CLBs), maximum clock frequency, weight updates per second, and HW resource utilization. The organization of this paper is as follows. Section II surveys relevant previous works. Section III discusses and reports the results of the different studies. Finally, Section IV concludes the paper.

II. FPGA-BASED HW IMPLEMENTATION OF ANNS

In this section, a number of case studies of FPGA-based ANN applications are presented. These cases are targeted towards two categories: i. applications (biomedicine, sensor networks, magnetism, power systems, security, ...) and ii. emulated modeling (memristor modeling, power amplifiers, chaotic circuits, ...).

A. Applications based on ANN implemented on FPGA

The FPGA can be applied to a widespread range of applications and can be part of a data acquisition system. In [10], [12], [24]–[25], [30]–[31], and [40], the systems are modeled using ANNs, and these systems are suited for medicine/biomedicine applications. Reference [10] deals with ECG anomalies and designs a system implemented on an FPGA to detect such arrhythmias. The signal records are taken from the MIT-BIH arrhythmia database and trained using the resilient back propagation (RPROP) algorithm for a Multi-Layer Perceptron (MLP) ANN architecture. The activation functions used are a piecewise linearly approximated hyperbolic tangent function (for the hidden layer) and the hyperbolic tangent function (for the hidden layer and output node), which reduce computation time and complexity and have a simple form. Various data types and different numbers of input/hidden layers are tested to obtain suitable FPGA performance. It was shown that a 12-6-2 MLP with 24-bit data classifies the records with an accuracy of 99.82% (the best accuracy among the tested records).

In [12], identification of human blood based on image processing is modeled using a feedforward neural network. This is helpful when the number of blood samples is large, since the conventional method may take a long time for large sample sets. The back propagation algorithm is deployed for the training process, and the sigmoid function is used as the activation function. The FPGA achieves 97.5% accuracy for 80×80 pixel resolution.

Neural inverse optimal control for glucose level estimation on an FPGA platform is carried out in [24], and an Extended Kalman Filter (EKF)-trained recurrent high-order neural network (RHONN) is chosen as the neural identifier architecture. The EKF is selected as the training algorithm because of the minimal error of the optimized weight values. A 16-bit fixed-point representation is used, and the total power consumption of the system is 142.18 mW.

Positron Emission Tomography (PET) is a scan imaging technique used to detect and treat cancer staging. In [25], such a system is modeled using an ANN operating in real time to process triple coincidences (triplets) in a PET scanner by identifying the true line of response (LOR). The operating frequency was 50 MHz, and the pipelined ANN model processes the triplets without exceeding 6000 FPGA slices. The neurons are activated using the hyperbolic tangent activation function for 6-2-1 layers (10, 5, and 1 neurons, respectively). The result for the LOR selection is 97.99%, indicating a precise selection of the time.

Literature [30] shows a comparative work on two ANN models: one is a 32-bit floating-point FPGA-based model activated by the sigmoid activation function, while the other is a 16-bit fixed-point FPGA-based model activated by a piecewise linear sigmoid (PLS) activation function. Each model has 1-1-1 input-hidden-output layers with eight input neurons, two hidden neurons, and one output neuron. The first model gives 97.66% accuracy and the second model reaches 96.54% accuracy; nevertheless, the resource utilization of the second model is lower than that of the first. The classification times are 1.07 μs and 1.01 μs for the first and second models, respectively.

An embedded system-on-chip for Atrial Fibrillation (AFIB) detection in heart screening is presented in [31]. The detection algorithm flow starts with pre-processing units, including a bandpass filter cascaded with a stationary wavelet transform; then feature extraction is done, and the flow ends with an ANN to classify the AFIB pattern. Overall, this detection system achieves an accuracy of 95.3% while using 40,830 logic elements.

Biomedical imaging techniques need to be designed accurately to characterize tissue types (normal or abnormal). Such a technique is intravascular optical coherence tomography (OCT), covered in [40] for heart diseases. A feed-forward ANN is used after collecting relevant information from the image (using feature extraction): the ANN processes these image features as a classifier trained to distinguish them, and the ANN is then implemented on an FPGA platform. Operating at 180 MHz, the ANN process finishes real-time classification within 0.7 s.

Literature [27] proposes a design of an FPGA-based multilayer feed-forward ANN using a SOC (system-on-chip) design approach to take advantage of hardware reuse and sharing, which obviously reduces the on-chip area. The FFNN architecture is based on the hyperbolic tangent activation function and the back propagation training algorithm with a 16-bit representation. The activation function is approximated by a piecewise linear function to allow a combinational-logic-only implementation.

For detection and recognition purposes, the works in [13], [15], [22]–[23], [26], [32], [35], [37]–[39], and [42] are modeled using different ANN topologies, each targeting specific applications. An ANN on an FPGA for forest fire detection is presented in [15]. Five sensors as input neurons are used to detect the firestorm, activated by the sigmoid function in the eight feed-forward hidden layers. A wireless sensor network (WSN) with an ANN gives better detection and reduces the delay compared to the WSN alone. The maximum operating frequency of the used FPGA (Virtex-5, Altera 6.4a starter) is 604.27 MHz.

Literature [22] shows detection of faults in automotive systems, i.e., fault diagnosis in a diesel engine, through an FPGA-accelerated MLP ANN. The activation function used is the sigmoid function, owing to the fewer nodes needed in the mapping relationship. The training algorithm for the hidden layer neurons is back propagation, applied offline. The effective area on a Spartan-6 XC6SLX150 is 60,183 for a 1/6 input layer (8 neurons) and six hidden layers, obtained as the number of DSPs times 512 plus the number of look-up tables (LUTs).
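Both [27] and [30] replace the exact sigmoid with a piecewise linear approximation so that the activation can be evaluated with shifts and adders alone. The Python sketch below contrasts the exact sigmoid with a PLAN-style piecewise linear sigmoid from the general hardware literature (illustrative breakpoints; the exact coefficients used in [27] and [30] are not reported in this survey):

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def pls(x):
    # Piecewise linear sigmoid: the slopes are powers of two, so a
    # fixed-point hardware version needs only shifts and adders.
    ax = abs(x)
    if ax >= 5.0:
        y = 1.0
    elif ax >= 2.375:
        y = 0.03125 * ax + 0.84375
    elif ax >= 1.0:
        y = 0.125 * ax + 0.625
    else:
        y = 0.25 * ax + 0.5
    return y if x >= 0 else 1.0 - y   # symmetry around (0, 0.5)

for x in (-6.0, -1.5, 0.0, 0.8, 3.0, 6.0):
    print(f"x={x:+.1f}  sigmoid={sigmoid(x):.4f}  pls={pls(x):.4f}")

The approximation error stays within a few percent of the exact sigmoid, which is consistent with the modest accuracy drop (97.66% to 96.54%) that [30] trades for lower resource utilization.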
An end-user programming platform [23] is used to facilitate the interface with the FPGA for an ANN model of an e-Nose system that distinguishes the odor of four types of coffee. The feed-forward ANN is trained using the back propagation algorithm and implemented on a Virtex-4 FPGA operating at a 122.489 MHz frequency for a 1-1-1 FFANN layer structure. The end-user programming platform relieves the designer from knowing low-level hardware description languages (HDLs). This is provided through a built-in, collaborative graphical user interface (GUI) that maintains rapid prototyping.

In [26], an ANN-based Trax solver with a 64×64 square playing board area was developed. The ANN was trained offline using the back propagation algorithm. The use of such weights, combined with a binary activation function, considerably reduced the digital logic circuit design. The design was implemented on a 40-nm process FPGA (Arria II GX; Altera Corp.) and was capable of operating at a clock frequency of 75 MHz.

An approach for autonomous robot navigation is presented in [32] as an efficient execution of a feed-forward ANN, trained on 3000 samples using the back propagation algorithm. The goal of the design is to obtain the shortest and safest path for the robot, avoiding obstacles with the assistance of the ANN as the navigation technique. The ANN is implemented on a Xilinx Virtex-II Pro FPGA. The results are obtained with a 357.5 MHz clock frequency, which compares well with previous works.

Speech signal recognition depends on gender recognition and emotional recognition. The speech signal in [35] is recognized and classified using an FPGA-based ANN trained with the back propagation algorithm. This work shows that using the ANN as a classifier can achieve better results than the existing classifier, Latent Dirichlet Allocation (LDA), in terms of used logic elements and processing time.

For hardware implementations of ANNs in recognition applications, the data type is critical: fixed-point representation can improve execution performance in the feed-forward computation of the network. In [37], two different recognition tasks are used: the MNIST handwritten digit recognition benchmark and a phoneme recognition task on the TIMIT corpus. The new contribution of this work is mapping the ANN onto FPGA hardware using fixed point without the need for external DRAM memory. Back propagation is used as the training algorithm, while the unsupervised greedy RBM (Restricted Boltzmann Machine) algorithm is applied during learning for digit classification.

Literature [13] is also directed toward MNIST handwritten digit recognition (28×28 pixel images) using an FPGA-based multi-layer feed-forward ANN. The Coordinate Rotation Digital Computer (CORDIC) algorithm approximates the activation function (the sigmoid function). The data type is the 32-bit single-precision floating-point standard, used for good hardware-implementation accuracy compared to fixed point, since the fixed-point accuracy loss comes from the truncation error of fixed-point operations.

MNIST handwritten digit recognition (28×28 pixel images = 784 pixels) is also visited in [42], in which two recognition systems are tested. The first system uses feature extraction (Principal Component Analysis, PCA) to map the complexity of the input image data to a lower-dimensional space before pattern recognition with the ANN, while the second one straightforwardly uses the ANN to recognize the input image. The first system is preferred in an ANN environment with limited hardware computing and memory resources. With the development of artificial intelligence, the second system emerges as a good candidate for multiple hidden layers. Electing the second system, literature [42] works on two schemes. One is called the multi-hardware-layer ANN (MHL), which implements all computing layers (considering one input layer, one hidden layer, and one output layer) on multiple hardware units, achieving better performance (each layer on separate hardware). The second scheme is the single-hardware-layer ANN (SHL), which implements a single layer in hardware, allowing better hardware sharing and thus better area utilization (one input layer, multiple hidden layers, and one output layer run on the same hardware). A control unit is needed to steer the forward computation by multiplexing the proper weights and the correct inputs. Each computing neuron in the hidden layer consists of an accumulating multiplier, a ROM storage unit, and a logistic sigmoid activation function. Using the MNIST database to set up the experiment in [42], it was shown that the customized SHL ANN and its scalability (different numbers of hidden layers) perform efficiently as a hardware scheme with respect to storage resources, supporting a 16-bit half-precision floating-point representation.
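The SHL scheme of [42] time-multiplexes one physical layer engine across all logical layers. The following Python sketch models only that control idea (a hypothetical software analogue with made-up layer sizes; the actual design multiplexes ROM-stored weights into hardware multiply-accumulate units):

import math

def layer_engine(inputs, weights, biases):
    # The single physical "hardware layer": one reusable compute unit
    # (multiply-accumulate plus logistic sigmoid per neuron).
    return [1.0 / (1.0 + math.exp(-(sum(x * w for x, w in zip(inputs, row)) + b)))
            for row, b in zip(weights, biases)]

def shl_forward(x, all_weights, all_biases):
    # Control unit: run every logical layer through the same engine,
    # multiplexing in that layer's weights and the previous outputs.
    for weights, biases in zip(all_weights, all_biases):
        x = layer_engine(x, weights, biases)
    return x

# Toy 3-2-1 network (arbitrary values, for illustration only)
w = [[[0.2, -0.5, 0.1], [0.7, 0.3, -0.2]],   # hidden layer: 2 neurons
     [[0.6, -0.4]]]                          # output layer: 1 neuron
b = [[0.0, 0.1], [-0.3]]
print(shl_forward([1.0, 0.5, -1.0], w, b))

An MHL design would instead instantiate one such engine per layer and pipeline them, trading area for throughput.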
Currently, data mining with parallel computational intelligence provides better solutions for cloud databases. One emerging issue is to use a good model that matches the requirements (such as unsupervised learning and compressed dimensionality). The model suggested in [38] is the self-organizing map (SOM) ANN, whose principle depends on data compression. To achieve an accurate hardware realization on an FPGA with fixed-point representation, the conscience SOM (CSOM) is used to offer optimized entropy well suited to high-dimensionality data structures, and it can be implemented on different processing platforms.

Another SOM ANN is proposed in [39], supporting on-line training for video categorization in autonomous surveillance systems. The on-chip FPGA hardware achieves high speed, compact parallelism, and low power consumption. It enables a flexible implementation and works well under a continuous input data flow. Two SOMs are tested experimentally: one for a 2D data stream application (telecommunications: Quadrature Amplitude Modulation identification, operating at a maximum frequency of 2.38 MHz) and the other for a 3D data stream application (real-time video compression, operating at a maximum frequency of 1.51 MHz).
References [16] to [20] focus on the detection of air showers, which are generated by ultra-high-energy cosmic rays (10^18–10^20 eV) and can be observed from the Pierre Auger Observatory, located in western Argentina and considered the world's largest cosmic-ray observatory. Inclined air showers arise from neutrinos undergoing charged- and neutral-current weak interactions with other particles. Neutrinos are the fingerprint of inclined air showers in the atmosphere; they feature a very small interaction cross section and sensitive detection at high zenith angles. Therefore, the surface detector for neutrino-induced air showers should provide accurate pattern recognition, which can be achieved via an ANN as done in [16]–[17]. The FPGA triggers based on an 8-6-1 ANN in [16]–[17], a 12-8-1 ANN in [18], and a 12-10-1 ANN in [19]–[20] were trained using the Levenberg-Marquardt algorithm, and the tests reveal adequate results suitable for the front-end boards of the Pierre Auger surface detector system.
Power systems are among the systems that need accurate modeling to efficiently achieve the desired optimized results. In [33], a compact virtual anemometer for MPPT (maximum power point tracking) of wind turbines (horizontal-axis and vertical-axis) is implemented on an FPGA. This system is based on a Growing Neural Gas (GNG) ANN, which provides a good trade-off among accuracy, resource utilization, computational speed, and implementation complexity. This ANN is a special type of self-supervised neural network that adds new neurons (units) progressively. The maximum number of units is elected considering the nature of the data and the size of the required area.

Literature [36] suggests a new smart sensor for on-line detection and classification of power quality (PQ) and power quality disturbances (PQD) using basic power sensors, monitoring the electrical installation in a non-disrupting manner and using HOS processing (high-order statistics: first-order mean, second-order variance, third-order skewness, and fourth-order kurtosis). This FPGA-based smart sensor can behave like a waveform analyzer (a device for power quality monitoring and measurement). In addition, it has a precise PQD classifier, which makes it able to classify different types of single PQD and some combinations of PQDs. The HOS processing adds a distinguishing feature to the system, as it uses very few FPGA resources while remaining compatible with high-performance signal processing techniques for power system smart sensors. Simplicity, low cost, and on-line testing of three-phase power systems are the main features of the proposed sensor. The activation function of the FFNN hidden layer is the log-sigmoid function, and the networks are trained with the Levenberg-Marquardt algorithm.
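The HOS features in [36] are the first four statistical moments of the sampled waveform. A compact Python sketch of these four statistics, using their textbook definitions (the fixed-point pipeline of the cited sensor is not reproduced), is:

import math

def hos_features(signal):
    # First four high-order statistics of a sampled waveform
    n = len(signal)
    mean = sum(signal) / n                                   # 1st order
    var = sum((s - mean) ** 2 for s in signal) / n           # 2nd order
    std = math.sqrt(var)
    skew = sum(((s - mean) / std) ** 3 for s in signal) / n  # 3rd order
    kurt = sum(((s - mean) / std) ** 4 for s in signal) / n  # 4th order
    return mean, var, skew, kurt

# 50 Hz test waveform sampled at 1 kHz (ten full cycles)
wave = [math.sin(2 * math.pi * 50 * t / 1000.0) for t in range(200)]
print(hos_features(wave))

A disturbance such as a sag or a transient shifts these four numbers in characteristic ways, which is what the FFNN classifier of [36] learns to separate.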
B. Analog circuit simulators using FPGA-based hardware implementation of ANNs

Various techniques are utilized to design some analog circuits such as memristors, power amplifiers, and chaotic circuits. The selected case studies are [11], [14], [21], [28]–[29], [34], and [41]. In [11], a memristor HW simulator is used along with weighted-summation (KCL) electronic modules as the basic blocks to implement ANN hardware on an FPGA for a Single-Layer Perceptron (SLP). The KCL module serves as the weighted-summation function that updates the weighted values. Spike-timing-dependent plasticity (STDP) unsupervised spatiotemporal pattern learning is chosen because it facilitates correct extraction of the input pattern for the ANN. The FPGA implementation is done for several input-neuron counts; the logic data for eight input neurons give 306 total logic elements and 3751 total registers.

A chaotic generator is a good candidate representation to capture the chaotic behavior of brain activity, since EEG (electroencephalogram) signal processing is realized as a stochastic process. The chaotic system considered in [14] is the Hénon map, and the model suited to biological brain activity is an artificial neural network with an MLP-BP topology. The 3-input, 4-hidden-neuron ANN models for the chaotic generator are mapped onto FPGA hardware with fixed-point representation. The FPGA hardware operates at a maximum frequency of 35.15 MHz while consuming 0.182 W of on-chip power.

Also in [21], the author designs a Lorenz chaotic generator using an ANN model implemented on an FPGA for secure communication. The ANN layers are trained using three different algorithms: Levenberg-Marquardt, Bayesian Regularization, and Scaled Conjugate Gradient. One hidden layer is used with 1 to 16 neurons, where the minimal number of neurons in the hidden layer needs to meet the precision requirement. The three training algorithms are therefore used to find the optimal number of hidden neurons for the FPGA hardware implementation; the optimal value was 8 neurons. The sigmoid activation function is used for the neurons in the hidden layer, while the ramp function is used as the activation function of the output layer.

Literature [28] presents the Pehlivan–Uyaroglu Chaotic System (PUCS), a novel approach modeled using a 3-8-3 Feed Forward Neural Network (FFNN) trained by the back propagation algorithm. The FFNN-based PUCS is trained offline using the Matlab Neural Network Toolbox. Hardware implementation is then done on an FPGA operating at a maximum frequency of 266.429 MHz, with the weights and biases written into VHDL code for 32-bit single-precision floating point.

In [29], another chaotic system, the MVPDOC (Modified Van der Pol–Duffing Oscillator Circuit), is designed and mapped to an FPGA. The modeled system is based on wavelet decomposition and an MLP ANN trained by the Levenberg–Marquardt (LM) back propagation algorithm. The implemented resources show good utilization in terms of logic elements, making the hardware MVPDOC system suitable for use in any nonlinear dynamic chaotic system.

The inverse characteristics of power amplifiers (a GaN class-F power amplifier working with an LTE signal centered at 2 GHz) are modeled using a NARX neural network under the requirement of AM/AM and AM/PM characteristics [34]. NARX is a type of recurrent neural network and can linearize microwave power amplifiers using the digital pre-distortion (DPD) method, which is based on modifying the baseband signal according to the inverse function of the power amplifier. The FPGA implementation, via the Verilog language, operates at a 95.511 MHz maximum frequency.

Similarly, [41] models the behavior of RF power amplifiers using an MLP ANN trained with the back propagation algorithm, considering two aspects: one is nonlinearity effects, while the other is memory effects. AM/AM and AM/PM characteristics demonstrate how accurate the model is. Implementing the model on a DSP-FPGA kit shows small complexity with a high-throughput behavioral model for RF power amplifiers.

III. DISCUSSION OF THE ANN FEATURES AND FPGA RESOURCES IN THE PRESENTED CASE STUDIES

FPGA-based reconfigurable computing hardware architectures are well suited for the implementation of ANNs, as one can rapidly achieve configurability to adjust the weights and topologies of an ANN, which helps fast prototyping [43]. The density of FPGA reconfiguration allows high numbers of elements to reach the proper functionality within the unit chip area. Further, as a feature of the ANN, it needs to be learnt using an off-line learning/training algorithm, and it requires adaptation through the recalling phase. These features work together to customize the topology of the ANN and the computational accuracy.
Table I and Table II summarize the main features of the ANNs and the FPGA hardware resources, respectively, for the 32 collective studies published from 2014 to 2018 (4 in 2014, 7 in 2015, 5 in 2016, 13 in 2017, and 3 in 2018). There are 23 works published as conference papers, 8 as journal articles, and one as a book (Outstanding PhD Research).

A. Results Analysis of ANN Features

The ANN has grasped many practical applications and implementations in medicine/biomedicine [10], [12], [24]–[25], [30]–[31], [40], analog circuit simulators [11], [14], [21], [28]–[29], [34], [41], pattern detection and recognition [13], [15], [22]–[23], [26]–[27], [32], [35], [37]–[39], air showers [16]–[20], and power systems [33], [36]. ANN structural design requires a massive amount of parallel computing and storage resources; thus, it demands parallel computing devices such as FPGAs.

Data representation plays a significant role in implementing an ANN on FPGA hardware. The choice of the data type format for the weights and the activation functions is important to the recognition rate and the performance, and numerous data type representations can be considered: fixed point [10], [14], [21], [24]–[25], [29], [33], [37]–[39], floating point [10], [13], [22], [28], [34], or integer/binary representations. Fixed-point and integer/binary representations can reach improved execution performance in the forward computation of the networks, but it is very difficult to train a deep neural network (many hidden layers) for recognition applications and then map the optimized weights onto FPGA hardware. A multilayered ANN refers to three or more layers: one input layer, one or more hidden layer(s), and one output layer [25]. As for the accuracy reduction of the FPGA hardware, it is primarily caused by the truncation error of fixed-point operations. Some works compare fixed-point with floating-point operations to gauge the effect of the round-off errors, and the CSOM can be used for the fixed-point selection, as in [38].
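To illustrate the truncation error discussed above, the following Python sketch quantizes a weight to a generic signed fixed-point format with a chosen number of fractional bits (an illustrative round-toward-zero model; each surveyed design picks its own word length and rounding):

def to_fixed(x, frac_bits=8, word_bits=16):
    # Quantize x to signed fixed point (truncation toward zero),
    # saturate to the word range, and convert back to a float.
    scale = 1 << frac_bits
    q = int(x * scale)                   # truncation drops the residue
    lo = -(1 << (word_bits - 1))
    hi = (1 << (word_bits - 1)) - 1
    q = max(lo, min(hi, q))
    return q / scale

w = 0.7853981633974483  # pi/4, an example weight value
for frac in (4, 8, 12):
    wq = to_fixed(w, frac_bits=frac)
    print(f"{frac} fractional bits: {wq:.6f}  truncation error = {w - wq:.6f}")

Each added fractional bit halves the worst-case truncation error (bounded by 2^-frac_bits), which is the trade-off behind the word-length choices collected in Table I.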
In contrast, the floating-point data type format can bring easier ANN training processes in software, and it can possibly give suitable recognition accuracy and execution performance. It has been shown that a reduced-precision floating-point representation is a candidate for the hardware realization of ANNs on FPGAs [42].

B. Results Analysis of FPGA HW Resources

Different types of FPGA technology platforms are used in the selected literature. The two main companies are Xilinx and Intel/Altera. Each one has several FPGA families operating at various frequencies with miscellaneous available resources.

IV. SUMMARY

Several FPGA-based ANNs targeted towards different applications are discussed in this brief. FPGA hardware implementations make ANNs more convenient to realize and reconfigure. Choosing the FPGA technology platform depends on the available resources, the ANN topology, and the data type representation.

REFERENCES

[1] V. Ntinas, et al., IEEE Trans. Neural Networks Learning Systems, 2018.
[2] D. Korkmaz, et al., AWERProcedia Information Technology & Computer Science, 2013, pp. 342–348.
[3] I. Li, et al., In IEEE Inter. SoC Design Conf. (ISOCC), 2016, pp. 297–298.
[4] M. Alçın, et al., Optik, vol. 127, pp. 5500–5505, 2016.
[5] L. Gatet, et al., IEEE Sensors Journal, vol. 8, no. 8, pp. 1413–1421, 2008.
[6] O. Krestinskaya, et al., In IEEE Inter. Symp. Circ. Syst. (ISCAS), 2018, pp. 1–5.
[7] L. Hu, et al., Adv. Mater., 1705914, 2018.
[8] Z. Hajduk, Neurocomputing, vol. 247, pp. 59–61, 2017.
[9] R. Tuntas, et al., Applied Soft Computing, vol. 35, pp. 237–246, 2015.
[10] M. Wess, et al., In 2017 IEEE Inter. Symp. Circuits and Systems (ISCAS), 2017, pp. 1–4.
[11] V. Ntinas, et al., IEEE Trans. Neural Networks Learning Systems, 2018.
[12] D. Darlis, et al., In IEEE 2018 Inter. Conf. Signals and Systems (ICSigSys), 2018, pp. 142–145.
[13] Z. F. Li, et al., In IEEE 2017 Inter. Conf. Electron Devices Solid-State Circuits (EDSSC), 2017, pp. 1–2.
[14] L. Zhang, In 2017 IEEE XXIV Inter. Conf. Electron. Electrical Engineering Computing (INTERCON), 2017, pp. 1–4.
[15] S. Anand, et al., In 2017 2nd Inter. Conf. Comput. Commun. Technologies (ICCCT), 2017, pp. 265–270.
[16] Z. Szadkowski, et al., IEEE Trans. Nucl. Sci., vol. 64, no. 6, pp. 1271–1281, 2017.
[17] Z. Szadkowski, et al., In IEEE Progress in Electromag. Research Symp. (PIERS), 2016, pp. 1517–1521.
[18] Z. Szadkowski, et al., In 2015 4th IEEE Inter. Conf. Advance. Nuclear Instrum. Measurement Methods their Applic. (ANIMMA), 2015, pp. 1–8.
[19] Z. Szadkowski, et al., In 2015 IEEE Federated Conf. Comput. Science Inform. Systems (FedCSIS), 2015, pp. 693–700.
[20] Z. Szadkowski, et al., IEEE Trans. Nuclear Science, vol. 62, no. 3, pp. 1002–1009, 2015.
[21] L. Zhang, In 2017 IEEE 30th Canadian Conf. Electr. Comput. Eng. (CCECE), 2017, pp. 1–4.
[22] S. Shreejith, et al., In IEEE Design, Automation & Test in Europe Conf. & Exhibition (DATE), 2016, pp. 37–42.
[23] A. Tisan, et al., IEEE Trans. Indus. Informatics, vol. 12, no. 3, 2016.
[24] J. C. Romero-Aragon, et al., In 2014 IEEE Symp. Computat. Intellig. Control Automat. (CICA), 2014, pp. 1–7.
[25] C. Geoffroy, et al., IEEE Trans. Nuclear Science, vol. 62, no. 3, 2015.
[26] T. Fujimori, et al., In 2015 IEEE Inter. Conf. Field Programmable Technology (FPT), 2015, pp. 260–263.
[27] R. Biradar, et al., In IEEE Inter. Conf. Cogn. Comput. Inform. Process. (CCIP), 2015, pp. 1–6.
[28] M. Alçın, et al., Optik, vol. 127, pp. 5500–5505, 2016.
[29] R. Tuntas, Applied Soft Computing, vol. 35, pp. 237–246, 2015.
[30] A. T. Özdemir, et al., Turkish J. Electr. Eng. Comput. Sciences, vol. 23, Sup. 1, pp. 2089–2106, 2015.
[31] H. W. Lim, et al., In IEEE Inter. SoC Design Conf. (ISOCC), 2017, pp. 90–91.
[32] N. Aamer, et al., In Proc. 2nd Inter. Conf. Communication and Electronics Systems (ICCES 2017), 2017, pp. 935–942.
[33] A. Accetta, In 2017 IEEE 26th Inter. Symp. Industrial Electron. (ISIE), 2017, pp. 926–933.
[34] J. A. Renteria-Cedano, et al., In IEEE 57th Inter. Midwest Symp. Circuits and Systems (MWSCAS), 2014, pp. 209–212.
[35] B. Rajasekhar, et al., In 2017 3rd Inter. Conf. Biosignals, Images and Instrumentation (ICBSII), 2017, pp. 1–6.
[36] G. D. J. Martinez-Figueroa, et al., IEEE Access, vol. 5, pp. 14259–14274, 2017.
[37] J. Park, et al., In IEEE Inter. Conf. Acoustics Speech Signal Process. (ICASSP), 2016, pp. 1011–1015.
[38] J. Lachmair, et al., In IEEE 2017 Inter. Joint Conf. Neural Networks (IJCNN), 2017, pp. 4299–4308.
[39] M. A. A. de Sousa, et al., In IEEE Inter. Joint Conf. Neural Networks (IJCNN), 2017, pp. 3930–3937.
[40] P. Antonik, Springer Theses, Recognizing Outstanding Ph.D. Research, 2018.
[41] J. C. Núñez-Perez, et al., In Inter. Conf. Electron. Commun. Comput. (CONIELECOMP), 2014, pp. 237–242.
[42] H. M. Vu, et al., In V. Bhateja, B. Nguyen, N. Nguyen, S. Satapathy, and D. N. Le (eds.), Information Systems Design and Intelligent Applications, Advances in Intelligent Systems and Computing, vol. 672, Springer, Singapore, 2018.
[43] J. Zhu, et al., In Inter. Conf. Field Programmable Logic Applicat., Springer, Berlin, Heidelberg, 2003, pp. 1062–1066.
TABLE I. ARTIFICIAL NEURAL NETWORK PROPERTIES FOR THE LITERATURE REVIEW

Work/Year | ANN type | Weighted summation (WS) / Activation function (AF) | Input/hidden layers (neurons) | Data type | Test data set | Training algorithm (TA) / Learning algorithm (LA) | Application
10/2017 | Multi-Layer Perceptron (MLP) | AF: piecewise linearly approximated hyperbolic tangent (hidden layer) / hyperbolic tangent (hidden layer and output node) | 8/6 layers | a. 12/16/24-bit fixed point; b. floating point | 104 | TA: Resilient backpropagation (RPROP) | ECG anomaly detection
11/2018 | Single-Layer Perceptron (SLP) | WS: KCL computation | 1/1 layer (8 neurons/-) | 14-bit | - | LA: spike-timing-dependent plasticity (STDP) unsupervised spatiotemporal pattern learning | Memristor simulator
12/2018 | Feed-forward back propagation (FFBP) | AF: sigmoid function | 1/1 layer | - | 40 | TA: Back Propagation (BP) | Human blood identification device
13/2017 | Multilayer feed-forward | AF: logistic sigmoid function via the Coordinate Rotation Digital Computer (CORDIC) algorithm | -/1 layer (-/300 neurons) | 32-bit single-precision floating point | 60000 | - | MNIST handwritten digit recognition
14/2017 | MLP-BP topology | AF: bipolar sigmoid (hidden neurons) and ramp (output neurons) | 1/1 layer (3 neurons/4 neurons) | fixed point | 6000 | TA: Back Propagation (BP) | Brain research
15/2017 | Multilayer Feedforward Neural Network (FFNN) | AF: sigmoid function | 1/8 layers (5 neurons/-) | - | - | - | Forest fire detection in WSN
16/2017, 17/2016, 18/2015, 19/2015, 20/2014 | - | AF: tangent sigmoid function | [16]-[17]: 8/6 layers; [18]: 12/8 layers; [19]-[20]: 12/10 layers | 14-bit | - | TA: Levenberg-Marquardt (LM) | Detection of neutrino-induced air showers
21/2017 | - | - | 1/1 layer (3 neurons/8 neurons) | 32-bit fixed point | 10000 | TA: Levenberg-Marquardt (LM), Bayesian Regularization (BR), and Scaled Conjugate Gradient (SCG) | Secure communication
22/2016 | MLP | AF: sigmoid function | 1/6 layers (8 neurons/-) | floating point | 1000 | TA: Back Propagation (BP) | Fault detection in automotive systems (fault diagnosis of a diesel engine)
23/2016 | Feedforward | AF: sigmoid function | 1/1 layer | - | - | TA: Back Propagation (BP) | Pattern recognition module for an artificial olfactory system to recognize different types of coffee (e-Nose)
24/2014 | Recurrent high-order neural network (RHONN) | AF: hyperbolic tangent function | - | 16-bit fixed point | 1400 | TA: Extended Kalman Filter (EKF) | Glucose level regulation for Diabetes Mellitus Type 1 patients
25/2015 | Pipelined architecture | AF: hyperbolic tangent function | 6/2 layers (10 neurons/5 neurons) | 18-bit fixed point | - | TA: Back Propagation (BP) | Positron Emission Tomography (PET)
26/2015 | MLP | - | 1/1 layer (1 neuron/7 neurons) | - | - | TA: Back Propagation (BP) | Trax solver (game solver)
27/2015 | Multilayer Feedforward Neural Network | AF: hyperbolic tangent function | 1/2 layers (3 neurons/9 neurons) | 16-bit | - | TA: Back Propagation (BP) | Function approximation
28/2016 | FFNN | AF: log-sigmoid function | 1/1 layer (3 neurons/8 neurons) | 32-bit single-precision floating point | 200,000 | TA: Back Propagation (BP) | Pehlivan–Uyaroglu chaotic system
29/2015 | MLP | AF: tangent sigmoid (hidden neurons) and linear (output neurons) | 1/2 layers (4 neurons/19 neurons) | fixed point | - | TA: Levenberg–Marquardt (LM) back propagation | Modified Van der Pol–Duffing oscillator circuit
30/2015 | MLP-BP | AF: piecewise linear sigmoid (PLS) / sigmoid | 1/1 layer (8 neurons/2 neurons) | 16-bit fixed point / 32-bit single-precision floating point | - | TA: Back Propagation (BP) | Mobile ANN-based automatic ECG arrhythmia classifier
31/2017 | - | - | - | - | 14 | - | Atrial fibrillation classifier
32/2017 | FFNN | - | - | - | 3000 | TA: Back Propagation (BP) | VLSI approach for autonomous robot navigation
33/2017 | Self-supervised neural network: Growing Neural Gas | - | - | fixed point | - | TA: Growing Neural Gas algorithm | Virtual anemometer for MPPT of wind energy conversion systems
34/2014 | Recurrent neural network: NARX | AF: hyperbolic tangent function (tanh) | 1/1 layer | floating point | - | - | Modeling the inverse characteristics of power amplifiers (GaN class-F PA working with an LTE signal centered at 2 GHz)
35/2017 | FFNN | AF: sigmoid function | 1/1 layer (-/20 neurons) | - | - | TA: Back Propagation (BP) | Emotion recognition from speech signals
36/2017 | FFNN | AF: log-sigmoid function | 1/1 layer (3 neurons/20 neurons) | - | 1000 | TA: Levenberg-Marquardt algorithm | Smart sensor for detection and classification of power quality disturbances
37/2016 | Feed-forward deep neural networks | AF: logistic sigmoid function | 1/3 layers | 8-bit fixed point | - | TA: Back Propagation (BP); LA: unsupervised greedy RBM (Restricted Boltzmann Machine) learning algorithm | MNIST handwritten digit recognition benchmark and a phoneme recognition task on the TIMIT corpus
38/2017 | Self-Organizing Map (SOM) ANN | - | - | 16-bit fixed point | - | LA: self-organizing map (SOM) | Data mining
39/2017 | Self-Organizing Map (SOM) ANN, SOM1/SOM2 | - | - | 16-bit fixed point | - | LA: self-organizing map (SOM) | Telecommunication / video categorization in autonomous surveillance systems
40/2018 | FFNN | AF: tangent sigmoid function | 1/2 layers | 16-bit | 600 | - | Intravascular OCT scans
41/2014 | MLP | AF: tangent sigmoid function | - | - | - | TA: Levenberg–Marquardt (LM) back propagation | RF power amplifier
42/2018 | MHL-ANN (multiple-hardware-layer) / SHL-ANN (single-hardware-layer) | AF: logistic sigmoid function | MHL: 1/1 layer (20 neurons/12 neurons); SHL: 1/2 layers (784 neurons/80 neurons) | 16-bit half-precision floating point | 10000 | TA: Back Propagation (BP) with the stochastic gradient descent (SGD) algorithm | Handwritten digit recognition with the MNIST database (28×28 pixel image = 784 pixels)
TABLE II. FPGA HW RESOURCES

Work/Year | FPGA family / type | Total elements / other features | Implementation tool on FPGA | Accuracy
10/2017 | ARM processor Zynq | DSP: a. 28 / b. 42; flip-flops: a. 1772 / b. 9295; LUTs: a. 1895 / b. 15163; latency: a. 87 / b. 1208 | Vivado HLS tool | a. 99.81% / b. 99.59%
11/2018 | Cyclone II EP2C70F672C6 Altera | total logic elements: 306; total registers: 3751 | Quartus II and ModelSim tools | -
12/2018 | Xilinx FPGA Spartan 3S1000 | - | Very High Speed Integrated Circuit (VHSIC) Hardware Description Language | 97.5% for 96×96 pixel sizes (resolution)
13/2017 | Cyclone IV EP4CE115 Altera | total logic elements: 6618; total combinational functions: 5906; dedicated logic registers: 2772; embedded multiplier 9-bit elements: 21; maximum frequency: 35.15 MHz | Quartus II with Verilog | -
14/2017 | Zynq 7020 | 4-input LUTs: 1954; registers: 364; slices: 605; DSPs: 24; BRAMs: 20; total on-chip power: 0.182 W | - | -
15/2017 | Virtex-5 (Altera 6.4a starter) | maximum frequency: 604.27 MHz; slice registers: 45; slice LUTs: 16; logic elements: 16; LUT-FF pairs: 53; bonded IOBs: 45 | Verilog HDL code utilizing ModelSim | -
16/2017, 17/2016, 18/2015, 19/2015, 20/2014 | Cyclone V E FPGA 5CEFA9F31I7 | [16]-[17]: maximum frequency: 172.98 MHz @100°C; registers: 3839; DSP (18×18): 92; adaptive logic modules (ALMs): 2189. [18]: multipliers in ALMs: 1247; multipliers in ALMs and DSP: 107/8. [19]-[20]: multipliers in ALMs: 41151 | CORSIKA and Offline simulation packages; AHDL code | -
21/2017 | - | - | - | -
22/2016 | Spartan-6 Xilinx XC6SLX45T | flip-flops: 11401; LUTs: 17175; BRAMs: 0; DSPs: 84; latency: 105 | STM32 platform (STM32F407FZ) | -
23/2016 | Virtex-4 SX 4VSX35 | LUTs: 332; RAMB16s: 4; DSP48s: 23; maximum frequency: 122.489 MHz | End-user programming platform; VHDL | -
24/2014 | Altera DE2-115 Cyclone IV EP4CE115F29C7 | logic elements: 14262; registers: 3059; embedded multiplier 9-bit elements: 40; power consumption: 142.18 mW | Verilog | -
25/2015 | Virtex-2 Pro series XC2VP50 | maximum frequency: 50 MHz; slices: <6000 (5463 slices); memory blocks: 19; multipliers: 45 | - | 97.99%
26/2015 | 40-nm process FPGA (Arria II GX; Altera Corp.) | maximum frequency: 75 MHz; combinational ALUTs: 79015; memory ALUTs: 473; dedicated logic registers: 25620; total block memory bits: 4208006; total DSP blocks: 0; total PLLs: 1 | Very High Speed Integrated Circuit (VHSIC) Hardware Description Language (VHDL) and Quartus II ver. 14.1 | -
27/2015 | Virtex-5 XUPV5-LX110T | slice LUTs: 2875; slice registers: 2014; bonded IOBs: 4; block RAM/FIFO: 2; DSP48Es: 3; memory: 72 kB | - | -
28/2016 | Xilinx Virtex-6 XC6VCX240T | slice registers: 86329; slice LUTs: 87207; fully used LUT-FF pairs: 67624; bonded IOBs: 195; maximum frequency: 266.429 MHz | VHDL / Matlab | -
29/2015 | Xilinx Virtex-II Pro XC2V1000 | CLKs: 2; slices: 1236; slice flip-flops: 329; MULT18X18s: 13; 4-input LUTs: 2134; bonded IOBs: 82; frequency: 50 MHz | VHDL | -
30/2015 | Altera Cyclone III EP3C120F780 | logic elements: 1814/23189; DSP elements: 40/220; logic registers: 784/10816 | - | 96.54% / 97.66%
31/2017 | Cyclone IV | logic elements: 40830 | Altera DE2-115 | 95.3%
32/2017 | Xilinx Virtex-II Pro | maximum frequency: 357.5 MHz; slices: 492; slice flip-flops: 371; 4-input LUTs: 942; bonded IOBs: 44 | Matlab | -
33/2017 | Altera Cyclone III EP3C25F324 | logic elements: 22148; registers: 2265; memory bits: 6528; embedded 9-bit multipliers: 32 | VHDL; Matlab | -
34/2014 | Virtex-6 FPGA ML605 Evaluation Kit | slice registers: 38572; slice LUTs: 29057; LUT-FF pairs: 21491; bonded IOBs: 3; block RAM: 225; DSP: 64; maximum frequency: 95.511 MHz | Verilog language; Matlab; SystemVue; Xilinx ISE tool | -
35/2017 | - | slices: 2817; FFs: 2916; LUTs: 2900; bonded IOBs: 16; GCLKs: 1 | - | -
36/2017 | Altera DE2-115 Cyclone IV E EP4CE115F297C | logic elements: ~2000; registers: 580; multiplier 9-bit elements: 4; memory bits: 153396; maximum frequency: 61.75 MHz | VHDL; Matlab | -
37/2016 | Xilinx XC7Z045 | (digit recognition / phoneme recognition) FFs: 136677/161923; LUTs: 213593/137300; BRAMs: 750.5/378; DSPs: 900/0 | - | -
38/2017 | Xilinx Virtex-5 V5FX100T / Xilinx Virtex-7 V7FX690T | slice registers: 45036/211793; slice LUTs: 57679/273055; BRAM/FIFOs: 226/1421; DSPs: 100/700; power consumption: 23.5 W/44 W | - | -
39/2017 | Xilinx ISE platform, Virtex-5 XC5VLX50T | LUTs: 8845/17945; chip utilization: 30%/62%; maximum frequency: 2.38 MHz/1.51 MHz | Xilinx ISim simulator tool | -
40/2018 | Xilinx VC707 evaluation board, Virtex-7 XC7VX485T-2FFG1761C | FFs: 29609; LUTs: 21065; block RAM: 1028 (16 kb); DSP48E: 201; maximum frequency: 180 MHz | VHDL | -
41/2014 | Cyclone III Edition, Altera | logic elements: 2720; memory bits: 1549824; logic registers: 1348; multiplier 9-bit elements: 170 | Matlab-Simulink software | -
42/2018 | Xilinx Virtex-5 XC5VLX-110T | FFs: 24025/44079; LUTs: 28340/63454; block RAM: 22/40; DSP: 22/40; maximum frequency: 205 MHz/197 MHz | VHDL | 90.88% / 97.20% recognition rate
