Babak Daneshrad
I. INTRODUCTION
In recent years, multiple-input multiple-output (MIMO)
wireless communication has received widespread attention in the communications community. To date, the majority
of the work in this area has been theoretical [1],
[2], [3], and little attention has been paid to the implementation
requirements of MIMO systems. Recently, the UCLA Wireless
Integrated Research (WISR) group embarked on a project to
develop a wideband (25 MHz) real-time MIMO-OFDM testbed
at 5.2 GHz RF. The ultimate objective is to develop both a system
solution and a novel VLSI architecture that enable real-time gigabit-per-second indoor wireless communications.
One of the challenges in building a wideband MIMO system
is the tremendous processing power required at the receiver.
While coded MIMO schemes offer better performance
than separate channel coding and modulation by fully
exploiting the tradeoff between multiplexing and diversity
[4], their hardware complexity can be formidable in practice,
especially for wideband systems with more than 4 antennas
at both the transmitter and the receiver. On the other hand,
it is much easier to find a VLSI solution using traditional
channel coding schemes such as convolutional codes and Turbo
Fig. 1. [Block diagram: MIMO Tx, MIMO Rx, MIMO Detect, Channel Decoding, Output bits.]
$$\hat{x} = \arg\min_{x} \; e_y^* e_y, \qquad e_y = y - Hx \quad (6)$$
i.e., the most likely transmitted signal is the one that yields the smallest
squared error with respect to the received signal. The problem can be solved by enumerating all possible x and
finding the one that gives the smallest $e_y^* e_y$. As the
constellation size M and $N_t$ increase, the number of candidates, $M^{N_t}$, grows exponentially, and the computational complexity can become prohibitively
high for practical applications.
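The exhaustive search described above can be sketched as follows. This is a minimal illustration rather than the implementation used in the paper: the function name `ml_detect`, the 2x2 channel values, and the unnormalized 4-QAM alphabet are assumptions chosen only for the example, and noise is omitted so the result is easy to check.

```python
from itertools import product

def ml_detect(H, y, constellation):
    """Exhaustive-search ML detection: return the x minimizing ||y - Hx||^2."""
    n_r, n_t = len(H), len(H[0])
    best_x, best_err = None, float("inf")
    # Enumerate all M**n_t candidate transmit vectors.
    for x in product(constellation, repeat=n_t):
        err = 0.0
        for r in range(n_r):
            e = y[r] - sum(H[r][t] * x[t] for t in range(n_t))
            err += abs(e) ** 2
        if err < best_err:
            best_x, best_err = x, err
    return list(best_x), best_err

# Hypothetical 2x2 example with an unnormalized 4-QAM alphabet (M = 4).
QAM4 = [complex(a, b) for a in (-1, 1) for b in (-1, 1)]
H = [[1 + 0.5j, 0.3 - 0.2j], [0.2 + 0.1j, 0.9 - 0.4j]]
x_true = [1 + 1j, -1 + 1j]
y = [sum(H[r][t] * x_true[t] for t in range(2)) for r in range(2)]  # noiseless
x_hat, err = ml_detect(H, y, QAM4)
```

The loop visits $M^{N_t}$ candidates (16 here), which is the source of the exponential growth noted above.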
$$n_c = -N_c/2, \ldots, N_c/2 - 1. \quad (3)$$
With $N_c$ sufficiently large, the subchannel at each
subcarrier can be regarded as flat-fading. Therefore, when
using OFDM, MIMO detection over frequency-selective
channels is transformed into MIMO detection over $N_c$ narrowband flat-fading channels. For this reason, we focus only
on MIMO detection algorithms for flat-fading channels in
the rest of the paper.
$$y = Hx + v \quad (4)$$
D. Linear Adaptive MIMO Detection
Instead of assuming a known channel matrix H, which usually requires channel probing before each transmission and
then calculating W in a bursty manner, adaptive algorithms
estimate W directly by iterating over a known
training sequence at the beginning of each transmission.
1) Least Mean-Square (LMS): LMS is a stochastic approximation of the
steepest descent algorithm [5] and updates W according to (7).
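As a hedged illustration of training W with LMS, the sketch below uses the standard textbook stochastic-gradient form $W_i = W_{i-1} + \mu (x_i - W_{i-1} y_i) y_i^*$, which may differ in detail from the update used in the paper. The step size `mu`, the function name `lms_train`, the 2x2 channel, the noiseless setting, and the training length are all assumptions made for the example.

```python
import random

def lms_train(xs, ys, n_t, n_r, mu):
    """Run LMS over training pairs (x_i, y_i); return the n_t-by-n_r matrix W."""
    W = [[0j] * n_r for _ in range(n_t)]
    for x, y in zip(xs, ys):
        # A-priori error e = x_i - W_{i-1} y_i
        e = [x[t] - sum(W[t][r] * y[r] for r in range(n_r)) for t in range(n_t)]
        # Rank-one update W_i = W_{i-1} + mu * e * y_i^H
        for t in range(n_t):
            for r in range(n_r):
                W[t][r] += mu * e[t] * y[r].conjugate()
    return W

# Hypothetical noiseless 2x2 example with unit-power 4-QAM training symbols.
rng = random.Random(0)
H = [[1.0 + 0j, 0.2 + 0j], [0.1 + 0j, 1.0 + 0j]]
qam4 = [complex(a, b) / 2 ** 0.5 for a in (-1, 1) for b in (-1, 1)]
xs = [[rng.choice(qam4) for _ in range(2)] for _ in range(500)]
ys = [[sum(H[r][t] * x[t] for t in range(2)) for r in range(2)] for x in xs]
W = lms_train(xs, ys, n_t=2, n_r=2, mu=0.05)
# With no noise, W H should approach the identity as training proceeds.
WH = [[sum(W[a][k] * H[k][b] for k in range(2)) for b in range(2)] for a in range(2)]
```

Each iteration costs only one matrix-vector product and one rank-one update, which is why LMS is attractive for hardware, at the price of slow convergence when the step size is small.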
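2) Recursive Least-Squares (RLS):
$$P_i = \lambda^{-1}\left[P_{i-1} - \frac{\lambda^{-1} P_{i-1} y_i y_i^* P_{i-1}}{1 + \lambda^{-1} y_i^* P_{i-1} y_i}\right], \quad (9)$$
where $0 \ll \lambda \le 1$ is the exponential forgetting factor, and
$P_i = \left(\sum_{k=0}^{i} \lambda^{i-k} y_k y_k^*\right)^{-1}$ is the inverse of the weighted
correlation matrix of $y_i$ with initial condition $P_{-1} = \epsilon_0 I$.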
The scalar $\epsilon_0$ is usually a large positive number and $\lambda$
is very close to 1. Compared to the stochastic estimation
problems given previously, which require signal statistics
such as the correlation matrix, the LS problem is deterministic [5],
[6]. Therefore, RLS can be used to find the LS solution for a
non-stationary process; in other words, RLS can track a non-stationary process in the LS sense. When $x_i$, H, and $v_i$
are all stationary, the RLS solution is the weighted time-average estimate of the
MMSE solution as $i \to \infty$ if $R_{xy}$ and $R_y$ in (5) are replaced by
$\sum_{k=0}^{i} \lambda^{i-k} x_k y_k^*$ and $\sum_{k=0}^{i} \lambda^{i-k} y_k y_k^*$, respectively.
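A minimal sketch of RLS training follows, under stated assumptions. The update of the inverse correlation matrix uses the usual exponentially weighted recursion with forgetting factor `lam`. The weight update $W_i = W_{i-1} + (x_i - W_{i-1} y_i) y_i^* P_i$ is the standard textbook form and is assumed here, not taken from the paper. The function name `rls_train`, the parameter values, the 2x2 channel, and the noiseless setting are illustrative only.

```python
import random

def rls_train(xs, ys, n_t, n_r, lam=0.99, eps0=1e4):
    """Run RLS over training pairs (x_i, y_i); return the n_t-by-n_r matrix W."""
    idx = range(n_r)
    W = [[0j] * n_r for _ in range(n_t)]
    # Initial condition P_{-1} = eps0 * I, with eps0 a large positive scalar.
    P = [[complex(eps0 if a == b else 0.0) for b in idx] for a in idx]
    for x, y in zip(xs, ys):
        Py = [sum(P[a][b] * y[b] for b in idx) for a in idx]              # P_{i-1} y_i
        yP = [sum(y[a].conjugate() * P[a][b] for a in idx) for b in idx]  # y_i^* P_{i-1}
        denom = 1 + sum(y[a].conjugate() * Py[a] for a in idx) / lam
        # P_i = (1/lam) [P_{i-1} - (1/lam) P_{i-1} y y^* P_{i-1} / denom]
        P = [[(P[a][b] - Py[a] * yP[b] / (lam * denom)) / lam for b in idx] for a in idx]
        # A-priori error and weight update (assumed textbook form).
        e = [x[t] - sum(W[t][r] * y[r] for r in idx) for t in range(n_t)]
        gain = [sum(y[a].conjugate() * P[a][b] for a in idx) for b in idx]  # y_i^* P_i
        for t in range(n_t):
            for r in idx:
                W[t][r] += e[t] * gain[r]
    return W

# Hypothetical noiseless 2x2 example with unit-power 4-QAM training symbols.
rng = random.Random(1)
H = [[1.0 + 0j, 0.2 + 0j], [0.1 + 0j, 1.0 + 0j]]
qam4 = [complex(a, b) / 2 ** 0.5 for a in (-1, 1) for b in (-1, 1)]
xs = [[rng.choice(qam4) for _ in range(2)] for _ in range(50)]
ys = [[sum(H[r][t] * x[t] for t in range(2)) for r in range(2)] for x in xs]
W = rls_train(xs, ys, n_t=2, n_r=2)
WH = [[sum(W[a][k] * H[k][b] for k in range(2)) for b in range(2)] for a in range(2)]
```

In this noiseless setting a few dozen training vectors are enough for W H to approach the identity, at the cost of a quadratic number of operations per update for the P recursion.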
Fig. 2. [Plot versus SNR (dB); legend: ZF, ZF-VBLAST, MMSE, MMSE-VBLAST; curves labeled by antenna configuration.]
A. Simulation Setup
Simulations of performance are conducted within a MIMO-OFDM framework using 25 MHz bandwidth. The packet structure used in the simulation can be found in [7] and is similar
to the IEEE 802.11a/g standard in the frequency domain. In the time
domain, each packet has 400 OFDM blocks and a duration of
1.28 ms, excluding the training blocks used by the adaptive algorithms.
4-QAM is assumed unless otherwise noted. A simulation run
ends, and the uncoded bit/packet error rate
(BER/PER) is calculated, once 400 packets with errors have been collected, subject to
the total number of simulated packets being no less than 1,000 and
no more than 40,000. Perfect sampling and carrier frequency
offset synchronization are assumed throughout the simulation.
The channel is assumed to be quasi-static, i.e., constant over an
entire packet but independent across packets. Each
channel path is generated independently using the exponentially
decaying Rayleigh fading channel model [8]. The impulse
response of the channel is composed of equally spaced i.i.d.
complex Gaussian taps with a power-delay profile of
(10)
$$\mathrm{SNR} = N_t \sigma_x^2 / \sigma_v^2.$$
Fig. 3. [Plot versus SNR (dB).]
B. BER Performance
Fig. 4. [Plot versus SNR (dB); curve labels include 4QAM and 16QAM.]
Fig. 5. [Plot versus SNR (dB); legend: MMSE, MMSE-VBLAST.]
C. PER Performance
The PER performance curves are shown in Fig. 5. ZF and
MMSE yield very close PER performance, as do ZF-VBLAST and MMSE-VBLAST (except for $N_t = N_r > 1$).
For this reason, the curves in Fig. 5 only illustrate MMSE
and MMSE-VBLAST. The performance of VBLAST is consistently better than that of ZF/MMSE because, by the time the PER falls below
100%, the SNR is already high enough for error
propagation to be infrequent. For equal numbers of transmit and receive antennas ($N_t = N_r$), the PER for MMSE increases with the
number of antennas, in contrast to the roughly overlapping
curves previously observed in the BER plot.
For $\tau_{rms} = 50$ ns, the channel selectivity leads to a degradation in PER compared to the flat-fading channel. This is because,
for each packet, it is more likely to see bit errors caused by
a deep null in the channel frequency response (corresponding
to lower SNR) than in channels with smaller $\tau_{rms}$. This is
readily mitigated via interleaved channel coding techniques.
On the other hand, the BER stays the same for different $\tau_{rms}$
Fig. 6. [Plot versus number of iterations; curve labels: LMS and RLS at SNR = 20 dB and 30 dB; antenna configurations 1x1, 2x2, 4x4, 8x8.]
TABLE I
COMPARISON OF COMPUTATIONAL COMPLEXITY

Algorithm         Real Multiplications                          4×4    8×8       16×16
Wy                4 Nt Nr                                                        26
LMS               8 Nt Nr + 2 Nt                                       13        52
RLS               14 Nr^2 + 8 Nt Nr + 6 Nr                             36        143
ZF/MMSE           4 Nt^3 + 8 Nt^2 Nr                            19     154       1,229
ZF/MMSE-VBLAST    Nt^4 + (8/3) Nt^3 Nr + 2 Nt^3 + 4 Nt^2 Nr     33     452       6,622
ML                4 Nt Nr M^Nt                                  410    4×10^5    10^11
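To make the growth rates in Table I concrete, the following sketch evaluates the real-multiplication formulas for a given antenna configuration. It returns raw counts from the formulas only. The numeric columns of Table I are on a different scale, which is not reproduced here. The function name `real_multiplications` and the default constellation size `m=4` (4-QAM) are assumptions for illustration.

```python
from fractions import Fraction

def real_multiplications(n_t, n_r, m=4):
    """Evaluate the Table I formulas for n_t transmit and n_r receive antennas."""
    return {
        "Wy": 4 * n_t * n_r,
        "LMS": 8 * n_t * n_r + 2 * n_t,
        "RLS": 14 * n_r ** 2 + 8 * n_t * n_r + 6 * n_r,
        "ZF/MMSE": 4 * n_t ** 3 + 8 * n_t ** 2 * n_r,
        # Fraction keeps the 8/3 coefficient exact.
        "ZF/MMSE-VBLAST": (n_t ** 4 + Fraction(8, 3) * n_t ** 3 * n_r
                           + 2 * n_t ** 3 + 4 * n_t ** 2 * n_r),
        "ML": 4 * n_t * n_r * m ** n_t,
    }

# Raw counts for the three square configurations listed in the table.
counts = {(n, n): real_multiplications(n, n) for n in (4, 8, 16)}
for config, row in counts.items():
    print(config, {name: float(value) for name, value in row.items()})
```

The ML entry grows as $M^{N_t}$, while the linear and adaptive detectors grow polynomially in the number of antennas.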