
DESIGN OF FPGA HARDWARE FOR A REAL-TIME

BLIND SOURCE SEPARATION OF FETAL ECG SIGNALS

Charayaphan Chareonsak, Yu Wei, Xiong Bing, and Farook Sattar


School of Electrical and Electronic Engineering, Nanyang Technological University,
Nanyang Avenue, Singapore 639798.
E-mail: {ecchara, efsattar}@ntu.edu.sg

ABSTRACT

In monitoring the fetal ECG (FECG) signal, the maternal ECG (MECG) is an unavoidable and therefore major source of interference. The fetal heart is very small, and thus the electrical current it generates is much lower than that of the mother [8]. To extract the fetal ECG for proper clinical diagnosis, adaptive filtering techniques can be used to remove or suppress the maternal ECG [9]. Often, a number of electrodes are placed around the general area of the fetus to pick up multiple FECG signals. In this case, a Blind Source Separation (BSS) algorithm can deal with the problem much more effectively.

Blind source separation of independent sources from their mixtures is a common problem in many real-world multi-sensor applications. However, the algorithm requires very high computing power, and thus a real-time implementation in software is not practical. We present a low-cost real-time FPGA (Field Programmable Gate Array) implementation of an improved BSS algorithm based on the ICA (Independent Component Analysis) technique. The separation is performed by implementing noncausal filters instead of causal filters within the feedback network. This reduces the required length of the unmixing filters and provides better separation and faster convergence. Results of FPGA testing using real ECG signals are reported.

1. INTRODUCTION
Blind signal separation, or BSS, refers to performing inverse channel estimation despite having no knowledge about the true channel (or mixing filter) [1,2,3,4,5]. BSS methods based on the ICA (independent component analysis) technique have been found effective and are thus commonly used. A limitation of the ICA technique is the need for long unmixing filters in order to estimate inverse channels [1]. Here, we propose the use of noncausal filters [10] to shorten the filter length. In addition, using noncausal filters in the feedback network allows good separation even if the direct channel filters do not have stable inverses. A variable step-size parameter for adaptation of the learning process is used to improve the convergence.
The FPGA (Field Programmable Gate Array) architecture allows the optimal parallelism needed to handle the high computation load of a real-time DSP. Being fully custom-programmable, an FPGA offers rapid hardware prototyping and algorithm investigation. Here, we present an FPGA design of a real-time ICA-based BSS for the application of separating the FECG from the MECG.

2. THEORY

2.1 Separation of Convolutive Mixture
The architecture proposed by Torkkola for separation of convolutive mixtures is shown in Fig. 1 [3]. Minimizing the mutual information between outputs u1 and u2 is achieved by maximizing the total entropy at the output. By forcing W11 and W22 to be mere scaling coefficients, the architecture is simplified:

$$u_1(t) = x_1(t) + \sum_{k=0}^{L_{12}} w_{12}(k)\, u_2(t-k) \tag{1}$$

$$u_2(t) = x_2(t) + \sum_{k=0}^{L_{21}} w_{21}(k)\, u_1(t-k) \tag{2}$$

And the learning rule for the separation matrix is:

$$\Delta w_{ij} \propto (1 - 2 y_i)\, u_j(t-k) \tag{3}$$

Fig. 1. Torkkola's feedback network for BSS.

2.2 Improved ICA Based BSS Method
Torkkola's algorithm works only when stable inverses of the direct channel filters exist, which is not guaranteed. It was shown that the algorithm can be modified to use noncausal filters. The relationships between the signals are now changed to:
$$u_1(t) = x_1(t+M) + \sum_{k=-M}^{M-1} w_{12}(k)\, u_2(t-k) \tag{4}$$

$$u_2(t) = x_2(t+M) + \sum_{k=-M}^{M-1} w_{21}(k)\, u_1(t-k) \tag{5}$$

where M is half of the filter length L, i.e. L = 2M+1, and the learning rule is:

$$\Delta w_{ij}^{(t_1 - p_1 + M)} = \Delta w_{ij}^{(t_0 - p_0 + M)} + K(u_i(t_0))\, u_j(p_0) \tag{6}$$

where

$$K(u_i(t_0)) = \text{stepsize} \cdot (1 - 2 y_i(t_0)) \tag{7}$$

$$y_i(t_0) = \frac{1}{1 + e^{-u_i(t_0)}} \tag{8}$$

and t1 = t0+1, p0 = t0-k and p1 = t1-k for k = -M, -M+1, …, M.
The variable learning step size, stepsize, in Equation 7 is explained in more detail in Subsection 3.1.4.

3. ARCHITECTURE OF FPGA DESIGN FOR BSS
In this section, we describe the architecture of the FPGA design of the ICA-based BSS algorithm using Torkkola's feedback network. The system-level design is shown, followed by detailed FPGA simulations on real ECG signals. Then, topics on hardware realization of the FPGA are discussed and the FPGA synthesis results are given.
In our work, the FPGA design tools used were Xilinx™ System Generator version 2.3 [6] and MATLAB™ version 6.5 from MathWorks. The FPGA synthesis tool used was Xilinx™ ISE 5.2i. System Generator provides bit-true and cycle-true FPGA blocksets for simulation under MATLAB Simulink™, thus offering a convenient and realistic system-level FPGA simulation.

3.1 Practical Implementation of Torkkola's Network for FPGA Realization

As a result of our earlier experimentation [7][10], we propose that, in order to minimize the FPGA resources needed and to ensure real-time BSS separation given the limited FPGA clock speed, the specifications shown below be used. Subsections 3.1.1 to 3.1.5 explain the impact of each parameter on the hardware requirement.
• Filter length, L = 321 taps,
• Buffer size for iterative convolution, N = 640 (implemented using an overlapped window to shorten the latency; see Subsection 3.1.1),
• Maximum number of iterations, I = 200,
• Approximation of the exponential learning step size using linear piecewise approximation.
The linear piecewise approximation is used to avoid the complex circuitry needed to implement the exponential function in hardware (see Subsection 3.1.4 for more explanation). The MATLAB simulation in our earlier work shows that using piecewise approximation does not affect the performance of the BSS algorithm significantly [6].

3.1.1 Three-buffer technique
In a real-time hardware implementation, to achieve uninterrupted processing, the hardware must process the input and output as continuous streams of samples. However, this conflicts with the batch processing required by the BSS algorithm: to perform the separation, a block of buffered data has to be filtered iteratively. Here, we implement a buffering mechanism using three 640-sample (N = 640) buffers per input source. While one buffer is being filled with the input, the second buffer is being filtered, and the third is being streamed out.
A side effect of this three-buffer technique is that the system produces a processing delay equivalent to twice the time needed to fill up a buffer. For example, if the signal sampling frequency is 100 Hz, the time to fill up one buffer is 640/100 = 6.4 seconds. The system will then need another 6.4 seconds for processing before the result is ready for output. The total delay is then 6.4 + 6.4 = 12.8 seconds. This processing delay is too long for practical real-time ECG monitoring, and thus we applied an overlapped window technique. In our implementation, the 640-sample block is sampled with overlap of 32 samples. In this case the processing delay is reduced to (64/100)*2 = 1.28 seconds.

3.1.2 Implementation of mechanism for the feedback network
According to Equations 4 and 5, there is a need to refer to negative addresses for the values of w12(i) when i < 0. The equation can be modified to include only positive addresses:

$$u_1(t) = x_1(t+M) + \sum_{i=-M}^{M} w_{12}(i+M)\, u_2(t-i) \tag{9}$$

Equation 9 performs the same noncausal filtering on u2 as in Equation 4 without the need for negative addressing of w12. Equation 5 is also modified accordingly.

Fig. 2. Implementation of (9) for Torkkola's feedback network

The block diagram shown in Fig. 2 depicts the hardware implementation of Equation 9. Note that the FIR filtering of w12 is implemented through a multiply-accumulate (MAC) unit, which significantly reduces the number of multipliers and adders needed compared to a direct implementation (see Subsection 3.1.5).
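
The re-addressing in Equation 9 can be illustrated with a minimal Python sketch. It is not the System Generator design itself: the function names, the dictionary used as a signed-index reference, and the sample values are assumptions made for illustration only. The sketch computes the cross-channel sum once with signed tap indices and once with a coefficient memory addressed 0..2M in a single multiply-accumulate loop.

```python
def cross_term_signed(w_signed, u2, t, M):
    """Reference sum over i = -M..M of w(i) * u2(t - i), with w keyed by signed i."""
    return sum(w_signed[i] * u2[t - i] for i in range(-M, M + 1))


def cross_term_mac(w_mem, u2, t, M):
    """Summation of Equation 9: tap i is read from address i + M (0..2M).

    One multiply-accumulate per loop pass, mimicking a single MAC unit.
    The caller must keep t within [M, len(u2) - M - 1] so that every
    u2(t - i) lies inside the buffered block (an assumption of this sketch).
    """
    acc = 0.0
    for addr in range(2 * M + 1):
        i = addr - M                    # recover the signed tap index
        acc += w_mem[addr] * u2[t - i]  # multiply-accumulate step
    return acc


# Illustrative values only (M = 2, so L = 2M + 1 = 5 taps).
M = 2
u2_block = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
w_by_tap = {-2: 0.5, -1: -1.0, 0: 2.0, 1: 0.25, 2: 3.0}
w_memory = [w_by_tap[i] for i in range(-M, M + 1)]  # address = i + M

signed_result = cross_term_signed(w_by_tap, u2_block, 3, M)
mac_result = cross_term_mac(w_memory, u2_block, 3, M)
```

Both calls visit the same products in the same order, so the two results agree; only the addressing of the coefficient store changes.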

3.1.3 Mechanism for learning the filter coefficients

The mechanism for learning the filter coefficients was implemented according to Equation 6.
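
As a rough software illustration of Equations 6 to 8 (a sketch under stated assumptions, not the circuit of Fig. 4), the update for one pair of channels can be accumulated over a buffered block as follows. The function names are hypothetical, and skipping samples whose partner index falls outside the block is an assumption, since the text does not describe edge handling.

```python
import math


def sigmoid(u):
    """Equation 8: y = 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + math.exp(-u))


def accumulate_delta_w(u_i, u_j, M, stepsize):
    """Accumulate the coefficient update of Equation 6 over one block.

    Tap k (k = -M..M) is stored at the non-negative index k + M, which
    matches the superscript t0 - p0 + M in Equation 6 when p0 = t0 - k.
    """
    n = len(u_i)
    delta_w = [0.0] * (2 * M + 1)
    for t0 in range(n):
        # Equation 7: gain depends only on the output sample u_i(t0).
        gain = stepsize * (1.0 - 2.0 * sigmoid(u_i[t0]))
        for k in range(-M, M + 1):
            p0 = t0 - k
            if 0 <= p0 < n:  # assumed edge handling: skip out-of-block samples
                delta_w[k + M] += gain * u_j[p0]
    return delta_w


# Illustrative values only.
example_delta = accumulate_delta_w([0.0, 50.0], [2.0, 3.0], M=1, stepsize=0.5)
```

When u_i(t0) = 0 the sigmoid output is 0.5, so the gain in Equation 7 vanishes and that sample contributes nothing to the update.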

Fig. 3. Top-level design of BSS using System Generator

Fig. 4. Detailed circuit for updating the filter coefficients

3.1.4 Implementation of variable learning step size

In order to speed up the learning of the filter coefficients shown in Equation 6, we implement a variable step size technique. In our application, the variable learning step size in Equation 7, i.e. the parameter stepsize, is implemented using Equation 10 below, where n is the iteration level, initstep is the initial step size, and I is the maximum number of iterations, i.e. 200.

$$\text{stepsize} = \exp(-u_0 - n/I) \tag{10}$$

where

$$u_0 = -\log_2(\text{initstep}) - \frac{1}{I} \tag{11}$$

The exponential term is difficult to implement in digital hardware. A look-up table could be used but would require a large block of ROM (Read Only Memory). An alternative to the look-up ROM is the CORDIC (COordinate Rotation DIgital Computer) algorithm. However, the CORDIC algorithm imposes a long latency (if not heavily pipelined), which results in the need for a higher FPGA clock speed. Instead, we used a linearly decreasing variable step size as shown in Equation 12.

$$\text{stepsize} = 0.0006 - 0.000012\,n \tag{12}$$
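
The appeal of Equation 12 for hardware is that it needs no exponential. As a hedged illustration, the Python sketch below evaluates Equation 12 literally and also produces the same values with one subtraction per iteration. The running-subtraction form is an assumption about how such a schedule could be realized, not a description of the actual circuit, and the function names are hypothetical.

```python
def linear_stepsize(n, init=0.0006, slope=0.000012):
    """Equation 12 evaluated literally for iteration level n."""
    return init - slope * n


def stepsize_by_subtraction(num_iterations, init=0.0006, slope=0.000012):
    """Same schedule generated with a running subtraction (no multiplier).

    Returns the step sizes for n = 0 .. num_iterations - 1.
    """
    step = init
    schedule = []
    for _ in range(num_iterations):
        schedule.append(step)
        step -= slope  # one subtraction per iteration
    return schedule


first_ten = stepsize_by_subtraction(10)
```

Taken literally, the expression in Equation 12 reaches zero at n = 50. How later iterations are treated is not stated in the text, so the sketch does not clamp or otherwise handle that range.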

3.1.5 Calculation of required FPGA clock speed

As mentioned earlier, the multiply-accumulate (MAC) technique is used in order to save hardware resources. The MAC operation has to be done at a much higher rate than the input sampling frequency. This MAC operating frequency determines the frequency of the FPGA clock, which can be calculated using Equation 13, where Fs is the sampling frequency of the input signals, L is the tap length of the FIR filter, and I is the number of iterations.

$$\text{FPGA Clock Frequency} = L \cdot I \cdot F_s \tag{13}$$

With filter length L = 321 taps, I = 200 iterations, and sampling frequency Fs = 100 Hz, the required FPGA clock frequency is thus 321*200*100*(640/64) = 64.2 MHz.
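
The timing figures quoted in Subsections 3.1.1 and 3.1.5 can be reproduced with a short Python sketch. It assumes that each 640-sample block advances by 64 new samples, which is inferred from the (64/100) and (640/64) factors in the text rather than stated explicitly. The function and parameter names are illustrative.

```python
def total_delay_seconds(new_samples_per_block, fs_hz):
    """Delay of the three-buffer scheme: one fill period plus one processing period."""
    fill_time = new_samples_per_block / fs_hz
    return 2 * fill_time


def required_clock_hz(taps, iterations, fs_hz, block_len, new_samples_per_block):
    """Equation 13 scaled by the block re-processing factor used in the text.

    block_len / new_samples_per_block is the number of times each input
    sample is re-filtered because consecutive blocks overlap (assumed reading).
    """
    reprocess_factor = block_len / new_samples_per_block
    return taps * iterations * fs_hz * reprocess_factor


delay_no_overlap = total_delay_seconds(640, 100)            # non-overlapped buffers
delay_overlap = total_delay_seconds(64, 100)                # 64 new samples per block
clock_hz = required_clock_hz(321, 200, 100, 640, 64)
```

With the paper's parameters these evaluate to 12.8 seconds, 1.28 seconds, and 64.2 MHz respectively, which shows the trade-off: the overlapped window cuts the latency by a factor of ten at the cost of a tenfold higher MAC rate.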

4. SIMULATION OF THE FPGA DESIGN USING ECG SIGNALS

The top level of the BSS FPGA design using System Generator is shown in Fig. 3. A more detailed diagram of the circuit for updating the filter coefficients is shown in Fig. 4. The FPGA design was then simulated using ECG signals, and the results are given in the following paragraphs.

Fig. 5. (a) Original MECG, (b) original FECG, (c) and (d) are the mixed and noisy ECG signals used for BSS
In order to create the mixtures of maternal ECG and fetal ECG, we mixed two ECG signals downloaded from PhysioNet (www.physionet.org). The sampling frequency of the ECG signals is 100 Hz. The signals were mixed at different ratios between 0.4 and 0.9 in order to simulate two electrodes placed at two locations. Two low-pass filtered Gaussian noise sources were then added to the mixtures in order to simulate the flicker (or 1/f) noise that commonly appears in low-frequency signals. The two noise sources were scaled to provide an SNR of approximately 30 dB. Fig. 5 (a) shows the original ECG signal used to represent the MECG, and the one in Fig. 5 (b) is used for the FECG. Fig. 5 (c) and (d) show the two mixed, noisy ECG signals used in the FPGA simulation of the BSS algorithm. It can be seen in Fig. 5 (c) and (d) that the FECG can hardly be identified due to the much larger MECG. Although some of the QRS complexes of the FECG are still visible, most are hidden by the larger MECG.
Fig. 6 (a) and (b) show the separated output ECG signals resulting from the FPGA simulation. It can be seen that the separated FECG signal in Fig. 6 (b) is clean and the QRS complex, as well as other components, can be easily detected. Comparing Fig. 6 (b) to the original FECG signal in Fig. 5 (b), it can be seen that the BSS algorithm preserves the shape of the signal well. A similar conclusion can be drawn from inspecting the MECG result shown in Fig. 6 (a).

Fig. 6. Result of FPGA simulation: (a) separated MECG and (b) separated FECG

5. FPGA SYNTHESIS RESULTS
After the successful simulation, the VHDL code was automatically generated from the design using System Generator. The VHDL code was then synthesized using Xilinx ISE 5.2i and targeted for a Virtex-E device with 600,000 gates. Table 1 details the gate requirement of the FPGA design. The total gate requirement reported by ISE is approximately 100 Kgates. Table 2 shows the reported maximum path delay and the maximum clock frequency. The maximum FPGA operating frequency of 71.2 MHz is higher than the required 64.2 MHz, and thus the design will operate in real time.

Table 1: Detailed gate requirement of the BSS FPGA design
Number of Slices for Logic                       550
Number of Slices for Flip Flops                  405
Number of 4-input LUTs                         3,002
  - used as LUTs                               2,030
  - used as a route-thru                         450
  - used as Shift registers                      522
Total equivalent gate count for the design   100,213

Table 2: Maximum combinational path delay and operating frequency of the FPGA design for BSS
Maximum path delay from/to any node    15.8 ns
Maximum operating frequency            71.2 MHz

6. CONCLUSION
In this paper, we have shown that our FPGA design performs the improved BSS algorithm and successfully separates the maternal ECG (MECG) and the fetal ECG (FECG) from mixtures of recorded ECG signals. The algorithm is robust against flicker (or 1/f) noise and preserves the components in the ECG signals. A simple and practical implementation of an ICA-based blind source separation circuit using an FPGA is described. The FPGA design achieves real-time speed using a relatively low system clock of 64.2 MHz.

7. REFERENCES
[1] T-W Lee, "Independent Component Analysis - Theory and Applications", Kluwer Academic Publishers, 1998.
[2] R.M. Gray, "Entropy and Information Theory", New York: Springer-Verlag, 1990.
[3] P. Comon, "Independent component analysis, a new concept?", Signal Processing, vol. 36, 1994, pp. 287-314.
[4] K. Torkkola, "Blind Source Separation For Audio Signals - Are We there yet?", IEEE Workshop on Independent Component Analysis and Blind Signal Separation, Aussois, France, Jan 1999.
[5] T-W Lee, A.J. Bell, and R. Orglmeister, "Blind source separation of real world signals", Proc. IEEE Int. Conf. Neural Networks, June 97, Houston, pp. 2129-2135.
[6] Xilinx Inc., Xilinx System Generator v2.3 for The MathWorks Simulink: Quick Start Guide, February 2002.
[7] F. Sattar and C. Charayaphan, "Low-Cost Design and Implementation of an ICA-Based Blind Source Separation Algorithm", IEEE ASIC/SoC Conference, Rochester, NY, Sept 25-28, 2002, pp. 15-19.
[8] D. Adam and D. Shavit, "Complete foetal ECG morphology recording by synchronized adaptive filtration", Medical and Biological Engineering and Computing, 28, pp. 287-292, 1990.
[9] A. Kam and A. Cohen, "Maternal ECG elimination and Foetal ECG Detection - Comparison of Several Algorithms", Proc. of the 20th Ann. Int. Conf. IEEE EMBS, Hong-Kong, 1998.
[10] Charayaphan Charoensak and Farook Sattar, "Hardware for real-time ICA-based blind source separation," in Proc. 15th IEEE Int. Conf. SOCC, Sept. 12-15, 2004.