
An Implementation Study of Airborne Medium PRF Doppler Radar Signal Processing on a Massively Parallel SIMD Processor Architecture
Anders Astrom, Mattias Johannesson, Anders Edman
Linkoping University, S-581 83 Linkoping, Sweden
Tor Ehlersson, Ulf Nasstrom, Bo Lyckegård
Ericsson Microwave Systems AB, S-431 84 Molndal, Sweden
email: andersa@isy.liu.se
ABSTRACT
We show in this paper that the linear SIMD architecture FVIP is capable of performing typical MPD radar signal processing, including the FFT and the resolving algorithm. A conventional DSP system clocked at 50 MHz would require approximately 200 DSPs to reach the same performance as our FVIP system. We have estimated the size and power consumption of our FVIP system to be 3 dm3 and 200 W. With these MPD studies, together with previous LPD studies, we are confident that a "VIP-type" architecture can handle all typical pulse Doppler radar wave forms currently used in airborne radar.
I GENERAL
Airborne Doppler radar wave forms are categorized as LPD (Low Pulse-repetition-frequency Doppler), MPD (Medium PRF Doppler) and HPD (High PRF Doppler). Figure 1 shows these three modes, where (a) corresponds to HPD, (b) to MPD, and (c) to LPD. A typical airborne Doppler radar must be able to handle all of these wave forms.
[Figure 1 vertical axis: pulses per coherent processing interval (CPI).]
For ground-based search radars we have studied the efficiency of a linear SIMD array for normal LPD wave forms. SIMD stands for Single Instruction stream Multiple Data stream, and it means that all processors in the array perform the same instruction but on different data. This study showed that the RVIP architecture outperformed conventional digital signal processors, DSPs [1][3]. Ericsson Microwave Systems AB is currently manufacturing the RVIP chip, which will be ready in early 1995.
Ericsson Microwave Systems AB has developed and is producing the radars for the Swedish Air Force JA37 Viggen and JAS39 Gripen fighters, and is currently developing an AEW radar based on the ERIEYE concept. Ericsson Microwave Systems AB also develops and produces ground-based radar systems.
After the successful study of an LPD implementation on the 1D SIMD array RVIP, it is reasonable to ask whether this concept also works for MPD and HPD. Since both LPD and HPD have a large inherent parallelism suitable for 1D linear arrays, as shown in Figure 1, it is reasonable to assume that HPD can also be implemented as efficiently as LPD. How to get an efficient implementation for MPD is not quite as obvious. This has now been studied, and this paper describes the result of an MPD implementation on a VIP-like architecture. The objectives of this study have been to take the existing RVIP, make some smaller redesigns, and see how well an algorithm which is non-trivial to parallelize can be mapped onto the VIP concept. The system, which has to fit in an airplane, was limited in size and power to 20 dm3 and 1500 W respectively.

[Figure 1 horizontal axis: range, bin number.]

The problem of parallelizing the MPD algorithm is shown in Figure 2. A frame consists of approximately 100x100 = 10000 elements, which makes it suitable for parallelization. However, the format of the matrix changes from frame to frame, as indicated in Figure 2, which makes the parallelization less obvious.

Figure 1, Typical coherent processing video matrices.

[Figure 2 axis: range, with extents of 30, 100 and 300 bins.]
Figure 2, Different matrix sizes in the MPD case.

0-7803-2120-0/95/0000-0551 $4.00 (c) 1995 IEEE
IEEE INTERNATIONAL RADAR CONFERENCE

II THE FVIP ARCHITECTURE
The FVIP architecture described in this paper is a development of the RVIP, which was primarily designed for LPD applications. The FVIP concept is built up from several FVIP chips. Each of these chips (or multi-chip modules) contains 512 processing elements, PEs. Each PE, which is bit-serial, consists of four 32-bit I/O registers, a 16-bit shift register, 512-2048 bits of memory, a 16-bit serial-parallel multiplier, an ALU, a Global Logic Unit (GLU), and a 32-bit accumulator. A floor plan of a chip is shown in Figure 3.

[Figure 3 block labels: 32-bit I/O registers, 16-bit shift register, 512-bit memory (FVIPa) or 2048-bit memory (FVIPb), 16-bit serial multiplier, ALU, GLU, 32-bit accumulator, data I/O.]
Figure 3, Layout of the FVIP.

The 512-PE chips are connected together forming a large linear array, shown in Figure 4. In this application we use 16,000 PEs, which corresponds to 32 512-PE chips. The clock frequency of the system is 50 MHz.

[Figure 4 label: instruction.]
Figure 4, Array design.

The execution times for some typical 16-bit operations are shown in Table 1.

Multiplication             1.5 Mops    20 Gops
Multiply-And-Accumulate    1.5 Mops    20 Gops
Table 1

III ALGORITHM
An overview of the MPD algorithm we have studied is shown in Figure 5. It has been designed to represent an imaginary case of a state-of-the-art MPD radar algorithm in computational complexity. We separate the computations into two separate FVIP arrays. The first system emulates a 2D architecture with a linear SIMD array, and each PE processes one sample in each CPI. In the second system, which performs the binary integration in the ambiguity resolving, each PE is used to process data from one range bin but all data for different Doppler frequencies. Also, the second system integrates results from the nine latest CPIs.

[Figure 5 block labels: detection of ADC saturation, vector adjustment, amplitude normalization, pulse code compression, envelope detection, CFAR. The data precision between blocks is given as channels*2*bits, starting at 4*2*16 for the channels CH1 to CH4.]
Figure 5, Overview of the MPD algorithm.

The 3 CPI formats used are collected in Table 2. Not all samples are processed, as indicated in the table. The samples which are removed are those that are received either when the radar is transmitting or just after a switch to a new PRF, when not enough pulses have been sent out. The number of samples actually processed is selected so that the number of correct data points for each range bin is an exact power of two at the time of the Fourier transform. This gives between 7710 and 12900 samples in each CPI, and thus 13312 PEs (26 chips of 512 PEs) are needed in the first system. The sample frequency is 2 MHz, which gives between 225 000 and 315 000 clock cycles for the processing of a CPI (at a clock frequency of 50 MHz). For the resolving algorithm a separate 3072 PE FVIPb array is used. The only difference between the FVIPa and FVIPb arrays is the internal RAM size. The delay between acquisition of a CPI and the result from the integration of the nine latest CPIs is allowed to be up to 100 ms, but here it never exceeds 2*CPI time, which is less than 9 ms.

CPI    Range bins    Pulses processed    Samples processed
1-3    300           33                  9900
4-6    100           129                 12900
7-9    30            257                 7710
Table 2, Different CPI sizes.

From the input, i.e. the AD-converters, we receive four channels, each of which is a 2*16-bit complex video signal. This means that the total input data is 4*2*16 = 128 bits. The data precision after each operation is shown in Figure 5. Each PE is assigned to process four input samples, one from each of the four channels. The data is written into the array so that neighboring PEs have data from the same range bin but from different pulses. This is no problem since the I/O registers can be addressed arbitrarily.

Henceforth, we will denote a complex sample as
$S(r,p) = I(r,p) + jQ(r,p)$    (1)
where r is the index in range and p is the index in pulse.

In the ADC saturation detection, (a) in Figure 5, all samples from a particular range bin are set to zero if there is at least one sample from any pulse in that range bin where
$\sqrt{I^2(r,p) + Q^2(r,p)} > \mathrm{Threshold}$    (2)
This is done by first computing a standard approximation [5] of the envelope, then subtracting the threshold and studying the sign of the result. The broadcast of the result to all other samples in the same range bin is made with the GLU function MARK [2].
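
The saturation test in step (a) can be illustrated with a short Python sketch. The paper only says that "a standard approximation" of the envelope is used, so the max + min/2 rule below is an assumption chosen for illustration, and the function names `envelope_approx` and `blank_saturated_bins` are hypothetical. The `any(...)` over a row plays the role of the broadcast that the array performs with MARK.

```python
def envelope_approx(i, q):
    """Approximate sqrt(i*i + q*q) as max + min/2 (assumed approximation)."""
    a, b = abs(i), abs(q)
    return max(a, b) + min(a, b) / 2


def blank_saturated_bins(matrix, threshold):
    """matrix[r][p] is the complex sample for range bin r and pulse p.

    A range bin is set to zero if at least one of its samples has an
    approximate envelope above the threshold (sign test of the difference).
    """
    result = []
    for row in matrix:
        saturated = any(
            envelope_approx(s.real, s.imag) - threshold > 0 for s in row
        )
        result.append([0j] * len(row) if saturated else list(row))
    return result


video = [[1 + 1j, 2 + 0j], [100 + 0j, 1 + 1j]]
cleaned = blank_saturated_bins(video, threshold=50)
```

In this usage example the second range bin contains one sample above the threshold, so the whole bin is zeroed while the first bin is left untouched.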
In the next step all the complex vectors in the matrix will be rotated by an angle $\phi$, which is common for the whole matrix and is determined by the system control, (b) in Figure 5. The operation is described as
$S_{(b)}(r,p) = S_{(a)}(r,p) \cdot e^{j\phi}$    (3)

To reduce ground clutter we perform a high pass filtering (c) in the pulse direction as
[Equation (4)]
This operation requires that each PE accesses the previous pulse sample within the same range bin, which is found in the neighboring PE. After this operation the first sample in each pulse is discarded, giving a total number of samples within each pulse which is an exact power of two.

The next operation is a windowing (d) where the first and last pulses in each range bin are weighted down to reduce filter sidelobe levels in the FFT. The window is real-valued and varies in the pulse direction but is common for all range bins. The operation is described as
[Equation (5)]
This operation is a MAC operation where N1(p) is found in the internal RAM.

To be able to compute the Doppler frequency of the target echoes we must compute the Fourier transform in the pulse direction (e). The FFT is performed as described in [4] with a radix-2 decimation-in-time algorithm where each PE performs half a butterfly. The long shift operations required for the bit-reverse sorting, before each butterfly layer, give the operation a large complexity.

After the FFT a second normalization (windowing) (f) with a real-valued coefficient is performed, this time varying in the frequency direction
[Equation (7)]

The pulse code compression (g) is a correlation with the transmitted radar wave form to find the target echoes. The convolution kernel length is approximately 10% of the number of range bins and the coefficients are complex. The operation is
[Equation (8)]

After the pulse code compression a third normalization (windowing) (h) is applied, this time with a global real-valued constant
$S_{(h)}(r,f) = S_{(g)}(r,f) \cdot N_3$    (9)

The next step is to compute the envelope, or magnitude, of the complex value (i) as
[Equation (10)]
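
As an illustration of the windowing (d) and Fourier transform (e) steps of this processing chain, the following Python sketch applies a real-valued window in the pulse direction and then runs a radix-2 decimation-in-time FFT on one range bin. It is a sequential sketch under assumptions: the bit-reverse sorting is done once up front rather than before each butterfly layer as on the PE array, and the names `bit_reverse_permute`, `fft_radix2_dit` and `doppler_filter` are hypothetical.

```python
import cmath


def bit_reverse_permute(x):
    """Reorder x so that element i moves to the bit-reversed index of i."""
    n = len(x)
    bits = n.bit_length() - 1
    out = [0j] * n
    for i, value in enumerate(x):
        rev = int(format(i, "0{}b".format(bits))[::-1], 2) if bits else 0
        out[rev] = value
    return out


def fft_radix2_dit(x):
    """Radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = bit_reverse_permute(x)
    size = 2
    while size <= n:
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                # One butterfly: two outputs from two inputs and a twiddle.
                u = a[start + k]
                t = w * a[start + k + half]
                a[start + k] = u + t
                a[start + k + half] = u - t
                w *= w_step
        size *= 2
    return a


def doppler_filter(pulses, window):
    """Window one range bin in the pulse direction, then transform it."""
    return fft_radix2_dit([s * w for s, w in zip(pulses, window)])


tone = [cmath.exp(2j * cmath.pi * 3 * p / 8) for p in range(8)]
spectrum = doppler_filter(tone, [1.0] * 8)
peak_bin = max(range(8), key=lambda k: abs(spectrum[k]))
```

In the usage example a complex tone with three cycles over eight pulses ends up in Doppler bin 3. On the array every range bin is transformed in parallel, which this sequential sketch does not attempt to model.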

The constant false alarm rate, CFAR, processing is computed in a 6x6 neighborhood (j). For a single sample, the following three criteria must be fulfilled in order to declare it as a detection:
The CH1*k1 sample must be larger than 7 of the 35 neighbor CH1 samples.
The CH1*k2 sample must be larger than the CH2 sample.
The CH1 sample must be a local maximum in a 3x3 neighborhood.
The constants k1 and k2 come from the CFAR definition on the logarithm of the data, for instance that log(CH1) + 10 dB > log(CH2), which gives the second condition if k2 = a^10 where a is the base of the logarithm.
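
The three CFAR criteria can be written down as a small Python sketch. Several details are assumptions made for illustration only: the 6x6 neighborhood is taken as offsets -3 to +2 around the sample in both directions, "larger than 7 of the 35" is read as "at least 7", samples whose neighborhood would leave the matrix are skipped, and the function name `cfar_detections` is hypothetical.

```python
def cfar_detections(ch1, ch2, k1, k2, min_exceeded=7):
    """Return (row, column, amplitude) for every sample passing all tests.

    ch1 and ch2 are equally sized 2D lists of envelope values.
    """
    rows, cols = len(ch1), len(ch1[0])
    hits = []
    for r in range(3, rows - 2):
        for c in range(3, cols - 2):
            x = ch1[r][c]
            # Criterion 1: compare the scaled sample with its 35 neighbors.
            neighbors = [
                ch1[r + dr][c + dc]
                for dr in range(-3, 3)
                for dc in range(-3, 3)
                if (dr, dc) != (0, 0)
            ]
            exceeded = sum(1 for v in neighbors if x * k1 > v)
            # Criterion 2: the scaled CH1 sample must exceed the CH2 sample.
            above_ch2 = x * k2 > ch2[r][c]
            # Criterion 3: local maximum in the 3x3 neighborhood.
            local_max = all(
                x > ch1[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            )
            if exceeded >= min_exceeded and above_ch2 and local_max:
                hits.append((r, c, x))
    return hits


ch1 = [[1.0] * 8 for _ in range(8)]
ch1[4][4] = 10.0
ch2 = [[1.0] * 8 for _ in range(8)]
found = cfar_detections(ch1, ch2, k1=1.0, k2=1.0)
```

The usage example places a single strong sample in a flat background, and only that sample is reported.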
The result of the CFAR processing is a number of preliminary detections, and the position and strength of these are extracted and input into the FVIPb array. We use the GLU to extract the target position and amplitude in the way described in the pseudo code below.

G = Data                         /* Load the extraction data vector to GLU */
num = COUNT                      /* Read number of targets */
for (i=num; i>0; i--)            /* For all targets */
    LFILL
    pos = COUNT                  /* Find the position of "1" */
    LEDGE                        /* Find leftmost "1" */
    ST A,TMP                     /* Store vector with a "1" */
    for (j=0; j<b; j++)          /* Read the amplitude bits */
        G = Amplitude(j) & TMP   /* Load bit to GLU */
        Bit = GOR                /* Read value to ext unit */
    endfor
    XOR TMP,Data                 /* Remove processed one */
endfor
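
The extraction loop can be emulated in Python to show the data flow. This is a sketch under assumptions: the GLU instructions are not modelled individually, their combined effect (count the ones, isolate the leftmost one, OR a masked bit plane across the array, clear the processed one) is approximated with ordinary list operations, and the name `extract_targets` is hypothetical.

```python
def extract_targets(data, amplitude_planes):
    """Emulate the target extraction on a linear array.

    data[pe] is 1 where a PE holds a preliminary detection.
    amplitude_planes[j][pe] is bit j (least significant first) of the
    amplitude stored in that PE.
    """
    remaining = list(data)
    targets = []
    num = sum(remaining)  # corresponds to COUNT on the whole vector
    for _ in range(num):
        pos = remaining.index(1)  # position of the leftmost one
        tmp = [0] * len(remaining)
        tmp[pos] = 1  # vector holding only that one
        amplitude = 0
        for j, plane in enumerate(amplitude_planes):
            # Global OR of the masked bit plane gives bit j of the amplitude.
            bit = int(any(p & t for p, t in zip(plane, tmp)))
            amplitude |= bit << j
        targets.append((pos, amplitude))
        # XOR removes the processed one from the data vector.
        remaining = [d ^ t for d, t in zip(remaining, tmp)]
    return targets


flags = [0, 1, 0, 0, 1]
planes = [
    [0, 1, 0, 0, 1],  # bit 0
    [0, 0, 0, 0, 1],  # bit 1
    [0, 1, 0, 0, 0],  # bit 2
]
extracted = extract_targets(flags, planes)
```

In the usage example the PEs at positions 1 and 4 hold detections with amplitudes 5 and 3, and they are read out leftmost first.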
Since we need to be able to implement a gliding resolve window, i.e. one complete resolving after each CPI, we need to store data from the 9 latest CPIs. These data are stored in a memory which is handled by the FVIPb array controller, but the writing into the table is made with the data extracted from the FVIPa array.

Thus, the input to the resolving process, which produces the result of the algorithm, is nine matrices of data. These matrices are grouped into the three main formats shown in Figure 2. One group consists, for example, of matrices with sizes 29x256, 30x256, and 31x256.


Since the range resolving is extended to 3000 samples, we need approximately 3000 FVIPb processor elements, where each PE stores all binary target data for one range bin and 9 CPIs. This means that each binary datum must be spread to one PE, and to all other PEs where the range is the same modulo the Nyquist range, as shown in Figure 6.


We know that the number of ones, i.e. targets, in each CPI is at most in the order of 100 (if there are many more, the resolving will be impossible and probably uninteresting), and we do the input only for the non-zero data. To do this we use the shift register and the PE in the following manner. First we store all possible k2 values in the internal RAM of the array, or rather the vectors with ones at the k2 interval. We then shift the appropriate k2 vectors k1 steps and read the result to the PE RAM. Since k1 can be up to 300 we also store phase shifted versions of k2 so that a shift longer than 5 is never needed.


Reset RAM                        /* Reset all RAM cells */
while (Data in table)            /* While any 1 in the extraction memory */
    /* Find range of this 1 */
    pos = R                      /* Find new PE position */
    k1 = LUT(pos)                /* Find correct vector */
    phase = round(k1/10)
    shift = remainder(k1/10)     /* Find shift length */
    SREG = RAM(k2+phase)
    Shift(shift)
    RAM(k3) = SREG               /* Load the result to RAM */
endwhile
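
The spreading of one detection to all PEs with the same range modulo the ambiguity interval can be sketched in Python. The reading of the symbols is an assumption: k2 is taken as the spacing between the ones in a stored comb vector and k1 as the offset of the detection within that spacing. The function name `spread_target` and the guard cells used to emulate the shift register are hypothetical; the split of k1 into a stored phase (multiples of 10) and a short residual shift follows the pseudo code.

```python
def spread_target(k1, k2, length=3000, phase_step=10):
    """Return a bit vector with ones at every position equal to k1 modulo k2.

    The vector is built from a stored comb with a coarse phase and is then
    shifted by a residual of at most phase_step / 2 positions.
    """
    phase = round(k1 / phase_step)
    shift = k1 - phase * phase_step
    # Guard cells on both sides so that bits shifted in from outside the
    # visible window are correct (stands in for the stored RAM vector).
    pad = phase_step
    stored = [
        1 if (i - pad - phase * phase_step) % k2 == 0 else 0
        for i in range(length + 2 * pad)
    ]
    # A shift by `shift` positions is a change of the window start.
    start = pad - shift
    return stored[start:start + length]


vector = spread_target(k1=123, k2=300)
ones = [i for i, bit in enumerate(vector) if bit]
```

In the usage example the ones land at positions 123, 423, 723 and so on up to the end of the 3000-cell array, which is the pattern that is then written into the PE RAM for the current CPI.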

[Figure 6 labels: FVIPa, extracted data memory, FVIPb; axes range and frequency (RAM), with one PE marked; spacing k2.]
Figure 6, The data extraction of FVIPa and data input to FVIPb.

The resolving is performed using data from 9 different input matrices (CPIs). The aim is to detect the unambiguous range and velocity for each target. This is possible since all CPIs are obtained using different PRFs, which gives different frequency sampling distances in the different CPIs. By expanding all matrices around their respective Nyquist frequency, the coincidences in the large matrix indicate the true target range and velocity. The nine matrices are unfolded between 10 and 100 times to a size of 3000x1000.

The resolving algorithm needs to detect all coincidences with 9 occurrences and then remove those echoes from the matrices. Thereafter all coincidences of 8 are tried, and so forth until all coincidences of 3 or more have been found. Additional constraints, for instance a similarity in echo strength, might be applied for a target to be determined as correct. This is done by the controller, which has access to all amplitude information stored in the memories.


The resolving in frequency is made by addressing the current CPI frequency address in software for the different CPIs. When a target is detected its position is read out with the GLU, and then the shift register and k2 vectors are used to spread the range position to all PEs where the target data is found, so that it can be removed completely from the array. In pseudo code the resolving can be described as
for (b=9; b>2; b--)                       /* Test all hits w. 3-9 */
    for (f=0; f<fmax; f++)                /* Test all freqs */
        Add-bits(b)                       /* Add the ones in each PE */
        AND b                             /* And with b -> 1 if result is b */
        Store Data                        /* Save vector */
        num = COUNT                       /* Read number of targets */
        for (i=num; i>0; i--)             /* For all targets */
            LFILL                         /* Find the position of this (target) */
            range = COUNT
            LEDGE                         /* Find leftmost 1 */
            ST A,TMP
            for (j=1; j<=9; j++)          /* Remove bit from RAM */
                Load (RAM(j) AND TMP)     /* Look at leftmost one */
                if GOR                    /* Find k2 vector shift */
                    RAM(k3) = RAM(k3) EXOR bit 1
                endif
            endfor
            AND LFILL,Data                /* Remove processed one */
        endfor
    endfor
endfor

What we do is count the number of hits for each frequency. When a hit is detected the used bit-planes are spread to the corresponding locations and removed with an exor operation.
The time to compute the resolving depends, as shown above, on the number of target echoes in the CPIs. Simulations will show the limit in number of targets, but the system can handle at least 100 targets with fmax = 1000.
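
The coincidence search can be illustrated with a simplified Python sketch that resolves range only. The assumptions are stated plainly: the frequency loop is left out, each CPI is represented by its folded detections and its ambiguity interval, and the names `unfold` and `resolve` are hypothetical. The order of the search, from the highest number of coincidences down to three with removal of every resolved echo, follows the description in the text.

```python
def unfold(folded_hits, period, length):
    """Spread folded detections to every range cell with the same residue."""
    plane = [0] * length
    for hit in folded_hits:
        for r in range(hit % period, length, period):
            plane[r] = 1
    return plane


def resolve(hits_per_cpi, periods, length, min_hits=3):
    """Return (range, number of coinciding CPIs) for each resolved target."""
    planes = [unfold(h, p, length) for h, p in zip(hits_per_cpi, periods)]
    targets = []
    for b in range(len(planes), min_hits - 1, -1):
        for r in range(length):
            count = sum(plane[r] for plane in planes)
            if count >= b:
                targets.append((r, count))
                # Remove the echo from every CPI in which it was seen.
                for plane, period in zip(planes, periods):
                    if plane[r]:
                        for k in range(r % period, length, period):
                            plane[k] = 0
    return targets


periods = [97, 101, 103, 107, 109]
hits = [[500 % p, 2100 % p] for p in periods]
resolved = resolve(hits, periods, length=3000)
```

The usage example uses five CPIs with hypothetical ambiguity intervals and two targets at ranges 500 and 2100, and both are recovered with five coincidences each. Removing a resolved echo before testing lower coincidence counts is what keeps its ambiguous copies from combining with other echoes into false targets.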

In this loop it is also possible to check the amplitudes in the external table of each "hit" in the resolving and only allow (and remove from the array) hits which fulfill a certain condition. Approximately 90% of the loop time is the time to compute if a hit is encountered, and the algorithm can probably be sped up by keeping track of all possible targets with 3-8 "hits" encountered in the first lap of the loop and only testing those in the subsequent processing.
IV RESULT
The FVIP complexity for the three different CPI cases is collected in Table 3. We see that the processing in the FVIPa array is well below the 225 000 clock cycle limit. The complexity of the FVIPb array computations is collected in Table 4, and we see that we are much closer to the 225 000 cycle limit, but still about 25% below it.
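
The cycle budget comparison can be checked with a few lines of Python. The cycle counts are the totals quoted in Tables 3 and 4 and the 50 MHz clock is the one given for the system; the helper names `load` and `budget_time_ms` are hypothetical.

```python
CLOCK_HZ = 50e6
BUDGET_CYCLES = 225_000

# Totals per CPI format (number of pulses N) for FVIPa, and the sum for FVIPb.
fvipa_totals = {32: 76570, 128: 100106, 256: 119748}
fvipb_sum = 171_900


def load(cycles, budget=BUDGET_CYCLES):
    """Fraction of the cycle budget that is used."""
    return cycles / budget


def budget_time_ms(budget=BUDGET_CYCLES, clock_hz=CLOCK_HZ):
    """Duration of the cycle budget in milliseconds."""
    return budget / clock_hz * 1e3


fvipa_loads = {n: load(c) for n, c in fvipa_totals.items()}
fvipb_margin = 1 - load(fvipb_sum)
```

With these figures the budget corresponds to 4.5 ms, the heaviest FVIPa case uses a little over half of it, and the FVIPb array keeps a margin of roughly 24%.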
Procedure            N=32      N=128     N=256
(a)                  410       410       410
(b)                  160       160       160
(c)                  384       1152      2176
(d)                  136       136       136
(e)                  23184     47872     79456
(f)                  132       132       132
(g)                  6798      7398      2278
[label illegible]    528       528       528
[label illegible]    5673      7977      11049
Total, 4 channels    76570     100106    119748
Table 3, Processing cycles in FVIPa.

Resolving    161,100
Sum          171,900
Table 4, Processing cycles in FVIPb.

V CONCLUSIONS
We have shown that the FVIP is capable of performing typical MPD radar signal processing. The main signal processing part of the algorithm is parallel 1D Fourier transforms of different lengths. The key to the success is the good neighbor communication, which makes it possible to do the 2D parts of the signal processing without much performance loss and which also makes the FFT implementation effective.
The resolving is solved with a separate, slightly different FVIP array with extended RAM to handle the large amounts of data. The reason for implementing two different arrays is that it is possible to integrate more PEs on each chip if the memory is decreased.
If we approximate the computational complexity of the MPD algorithm we find that it is approximately 50 + P + log P + R/10 complex operations/sample. With 4 operations for each complex operation and 4 channels this gives a maximum of approximately 5000 operations for each sample acquired at 2 MHz. With a DSP system clocked at 50 MHz, approximately 200 DSPs would then be required to give enough performance.
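
The estimate can be reproduced with a short Python calculation. The interpretation is an assumption: P is taken as the number of pulses, the last term is read as R/10 with R the number of range bins, the logarithm is taken to base 2, and each DSP is assumed to complete one operation per clock cycle. The pulse and range bin pairs are the three CPI formats used in the study.

```python
import math

SAMPLE_RATE_HZ = 2e6
DSP_CLOCK_HZ = 50e6


def complex_ops_per_sample(pulses, range_bins):
    """Complexity estimate 50 + P + log P + R/10 (base-2 logarithm assumed)."""
    return 50 + pulses + math.log2(pulses) + range_bins / 10


def ops_per_sample(pulses, range_bins, ops_per_complex=4, channels=4):
    """Operations per acquired sample over all channels."""
    return complex_ops_per_sample(pulses, range_bins) * ops_per_complex * channels


formats = [(32, 300), (128, 100), (256, 30)]
worst_case = max(ops_per_sample(p, r) for p, r in formats)
# One operation per DSP clock cycle is assumed here.
dsps_needed = worst_case * SAMPLE_RATE_HZ / DSP_CLOCK_HZ
```

Under these assumptions the worst case is the 256-pulse format with 5072 operations per sample, which corresponds to about 203 DSPs and agrees with the approximate figures of 5000 operations and 200 DSPs.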
This study has shown that the FVIP architecture can handle
the data from an MPD radar system under the given time
constraints. We have estimated the size and power
consumption of this system to be 3 dm3 and 200 W which
should be compared to the limits 20 dm3 and 1500 W. With
these MPD studies together with previous LPD studies we
are confident that a "VIP-type" architecture can handle all
typical pulse Doppler radar wave forms currently used in
airborne radar.
The current work with the FVIP architecture is to
implement algorithms for space-time processing in a
phased array radar system.

VI ACKNOWLEDGEMENT
The authors would like to thank the Swedish Board for Technical Development (NUTEK) for financial support.
REFERENCES
[1] Arvidsson R., Massively parallel SIMD processor for search radar signal processing, Proc. of RADAR 94, Paris, 1994.
[2] Astrom A., Smart Image Sensors, PhD Thesis no. 319, Linkoping University, 1993.
[3] Johannesson M., Astrom A., Ingelhag P., The RVIP Image Processing Array, Proc. of CAMP 93, New Orleans, USA.
[4] Johannesson M., FFTs on a Linear VIP SIMD Array, Internal report, Linkoping University, Sweden, 1995.
[5] Stimson G.W., Introduction to Airborne Radar, Hughes Aircraft, 1983.
