Enhanced MIMO Detection with Parallel V-BLAST
Arsène Pankeu Yomi and Bruce F. Cockburn

Department of Electrical and Computer Engineering
University of Alberta
Edmonton, AB T6G 2V4, Canada
{pankeuyo|cockburn}@ualberta.ca

Abstract. Foschini quantified the large capacity of the multiple-input multiple-output (MIMO) wireless channel and showed how this capacity could be achieved using a layered and coded space-time architecture. Unfortunately, his proposed diagonal Bell Laboratories Space-Time (D-BLAST) detection algorithm has proved awkward to implement. Attention has instead focused on the simpler vertically-layered architecture. The well-known vertical MIMO detectors, such as zero forcing (ZF), minimum mean squared error (MMSE), maximum likelihood (ML), vertical BLAST (V-BLAST), and several versions of sphere decoding (SD), offer different trade-offs between computational complexity and performance. V-BLAST offers intermediate, but clearly suboptimal, performance with a computational complexity that grows linearly in the number of transmitted layers and in the size M of the symbol constellation. Fouladi Fard, Alimohammad and Cockburn recently proposed a parallel V-BLAST algorithm, which we call F-BLAST. F-BLAST offers performance approaching that of optimal ML at the cost of performing V-BLAST in parallel for all M possible values of the symbol in the layer with the weakest expected signal-to-noise ratio. Here we revisit the performance of F-BLAST and show how the degree of parallelism can be reduced while maintaining performance that greatly exceeds that of V-BLAST. The data-parallel structure of the new detection algorithms, and their simpler control structure compared to SD, should offer implementation advantages.
Keywords: MIMO, V-BLAST, sphere decoding, near-optimal detection, parallel detection.
I. INTRODUCTION
An $n_T \times n_R$ single-user multiple-input multiple-output (MIMO) communication system has $n_T > 1$ transmitting antennas and $n_R > 1$ receiving antennas [1]. When an $n_T$-element symbol vector $s$ is transmitted over a flat-fading radio channel, the $n_R$-element complex received signal vector $y$ can be expressed as $y = Hs + n$. Here $H$ is the $n_R \times n_T$ channel matrix and $n$ is additive white Gaussian noise (AWGN). The capacity of the $n_T \times n_R$ MIMO channel has been shown to approach $\min(n_T, n_R)$ times the Shannon capacity of a conventional $1 \times 1$ channel [2]. This holds provided that $H$ is sufficiently white and is accurately known at the receiver. The obvious benefits of this capacity gain have led to MIMO technology being incorporated in most of the latest wireless standards, including IEEE 802.11n, WiMAX and LTE [3].
II. REVIEW OF MIMO DETECTORS
In Foschini's original MIMO scheme, the transmitted data is demultiplexed into $n_T$ substreams of equal data rate [4]. Each substream is then encoded separately using block encoders. The resulting $n_T$ substreams, called layers, are mapped to the $n_T$ transmitting antennas by a diagonal encoder. This encoder periodically rotates the mapping from the layers to the transmitting antennas. In the receiver, the D-BLAST algorithm uses successive interference cancellation to iteratively detect the symbols in each diagonal layer. Specifically, D-BLAST uses symbol nulling and symbol cancellation to minimize the interference in the present signal vector $y$. This interference is caused by the yet-to-be-detected and already-detected symbols, respectively, that belong to different layers but are transmitted in the same symbol time. Unfortunately, the rotating mapping used in Foschini's scheme and the diagonal arrangement of the layers in space-time lead to a relatively complicated detection algorithm. Most subsequent work has therefore considered simpler vertically-layered MIMO schemes, where each symbol vector lies within a single symbol time. This allows each symbol vector to be recovered from the corresponding single received signal vector.
A variety of vertical MIMO detection algorithms have been proposed in the literature that offer trade-offs between detection accuracy and computational complexity in the receiver. Statistically optimal detection using the maximum likelihood (ML) algorithm [1] is impractical for most systems. Its computational complexity increases exponentially with the number $n_T$ of transmit antennas and with the number of bits encoded in the transmitted symbol vector [2]. Zero forcing (ZF) and minimum mean squared error (MMSE) detectors offer fast detection that is linear in both $n_R$ and $n_T$. Both involve premultiplying $y$ by a conditioning matrix $G$, computed from the receiver's estimate $H'$ of $H$, and then slicing each component to the nearest constellation point. The conditioning matrix $G$ is chosen to minimize the inter-layer interference. For ZF detectors, the conditioning matrix is the inverse of $H'$ when $n_R = n_T$, or its pseudo-inverse when $n_R > n_T$ [1]. MMSE detectors achieve better detection accuracy than ZF detectors by using a conditioning matrix that exploits both $H'$ and an estimate of the signal-to-noise ratio of the channel. Sphere decoding (SD) detectors search an $n_T$-dimensional hypersphere subset of the universe of all possible signal vectors $s$. Estimates of the signal power are used to dynamically adjust the size of the hypersphere, which is centered on an initial estimate of $s$. SD provides near-optimal detection at the cost of a relatively complicated control algorithm that ensures that the hypersphere is sized correctly and searched efficiently [5, 6]. The algorithmic complexity of SD and the variability in its computational load have motivated more easily implemented algorithms derived from SD, such as the tree-search-based K-best algorithm [7, 8]. Other researchers have investigated ways of reducing the complexity of SD by restricting the search space along some dimensions [9]. These SD-inspired detectors, however, remain challenging to implement in silicon because of their algorithmic complexity.

This work was supported by iCORE and by the Natural Sciences and Engineering Research Council (NSERC) of Canada under grant OGP0105567.

978-1-4577-0253-2/11/$26.00 ©2011 IEEE
The vertical BLAST (V-BLAST) detection algorithm offers greater accuracy than either ZF or MMSE. Its computational complexity grows linearly in the number $n_T$ of transmitting antennas [10, 11]. Like D-BLAST, V-BLAST uses an iterative successive interference cancellation strategy. The first symbol to be detected in a signal vector $y$ must be recovered with no knowledge of any other symbol in the corresponding symbol vector $s$. Moreover, any detection error in the first symbol will increase the interference experienced during the detection of the remaining symbols in $s$, and this can cause error propagation. To minimize the symbol error rate (SER) of the first and subsequent layers, V-BLAST maintains estimates of the post-detection signal-to-noise ratio (SNR) for each layer. It then orders the layers from the greatest to the least estimated SNR. The first symbol, which has the largest expected SNR, is then usually detected after premultiplication by either a ZF or (most commonly) an MMSE conditioning vector. This vector attempts to null the interference from the $n_T - 1$ other layers.
As in D-BLAST, V-BLAST assumes that the receiver has an accurate estimate $H'$ of the channel matrix $H$. Let $s$ denote the column vector comprising the $n_T$ symbols that arrive in the same symbol time. These $n_T$ symbols are combined linearly by $H$ and corrupted with AWGN $n$ to form the $n_R$ complex elements of the received vector $y = Hs + n$.
Given an estimate $H'$ of the channel matrix, one can compute a nulling matrix $G$ as follows [10]:

$$G = \left(H'^{H} H' + \frac{\sigma_n^2}{\sigma_s^2} I_{n_T}\right)^{-1} H'^{H}, \tag{1}$$

where $H'^{H}$ denotes the Hermitian (conjugate transpose) of $H'$, $\sigma_s^2/\sigma_n^2$ denotes the estimated SNR at the receiver, and $I_{n_T}$ denotes the $n_T \times n_T$ identity matrix. This definition of $G$ specifies a linear prefilter. When applied to the received signal vectors $y$, it allows the symbol vectors $s$ to be estimated with minimum mean squared error (MMSE), assuming identically distributed transmitted symbols; that is, the expectation $E\{\|Gy - s\|^2\}$ is minimized. The norms of the rows of $G$ are inversely proportional to the expected post-detection SNRs of the symbols in $s$. Thus $G$ can be used to reorder the detection of the signal layers from $(1, 2, \ldots, n_T)$ to some permutation $(k_1, k_2, \ldots, k_{n_T})$ according to their SNRs.
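The nulling matrix of Eq. (1) and the smallest-row-norm rule for choosing the first layer can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names `mmse_nulling_matrix` and `first_layer_to_detect` are hypothetical, and the symbol energy $\sigma_s^2$ defaults to 1. The example channel has i.i.d. unit-variance complex Gaussian entries.

```python
import numpy as np

def mmse_nulling_matrix(H_est, noise_var, signal_var=1.0):
    """Eq. (1): G = (H'^H H' + (sigma_n^2 / sigma_s^2) I)^(-1) H'^H."""
    n_t = H_est.shape[1]
    H_herm = H_est.conj().T                          # Hermitian (conjugate transpose) of H'
    A = H_herm @ H_est + (noise_var / signal_var) * np.eye(n_t)
    return np.linalg.solve(A, H_herm)                # A^(-1) H'^H without forming A^(-1)

def first_layer_to_detect(G):
    """Row of G with the smallest norm, i.e. the layer with the largest expected SNR."""
    return int(np.argmin(np.linalg.norm(G, axis=1)))

rng = np.random.default_rng(0)
H = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)
G = mmse_nulling_matrix(H, noise_var=0.01)           # estimated SNR of 100 (20 dB)
print(np.linalg.norm(G, axis=1))                     # row norms used to order the layers
print(first_layer_to_detect(G))
```

Using `np.linalg.solve` instead of an explicit inverse is a common numerical choice. As the noise variance tends to zero, the result approaches the ZF inverse of $H'$.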
(Step 0) Ordering: Define the row ordering $(k_1, k_2, \ldots, k_{n_T})$ recursively as follows. Let $G = G_1$ be the nulling matrix for $H'$, and define $k_1 \in \{1, \ldots, n_T\}$ to be the index of its row with the smallest norm. For $j = 1, \ldots, n_T - 1$:

- Define $H_{j+1}$ to be the matrix obtained by deleting columns $k_1, k_2, \ldots, k_j$ from $H'$.
- Define $G_{j+1}$ to be the nulling matrix for $H_{j+1}$.
- Define $k_{j+1}$ to be the original row index of $G$ that corresponds to the row of $G_{j+1}$ with the smallest norm.

After the row/layer ordering has been determined, let $s_1$ denote the symbol from layer $k_1$ that is detected first from $y$. Let $s_2$ denote the symbol from layer $k_2$ that is detected second, and so on, with $s_{n_T}$ denoting the symbol from layer $k_{n_T}$ that is detected last. Without loss of generality, the symbols in $s$ and the columns of $H$ can be permuted so that $k_1 = 1, k_2 = 2, \ldots, k_{n_T} = n_T$.

The detection of symbol $s_j$, where $1 \leq j \leq n_T$, occurs in three steps given an $n_R$-element vector $y^{(j)}$. Here $y^{(1)}$ is the initial signal vector $y$. For $j \geq 2$, $y^{(j)}$ is a processed signal vector that results from canceling the predicted interference of the $j - 1$ previously detected symbols of $s$. The first symbol $s_1$ to be detected is not preceded by a symbol cancellation step. The three steps in V-BLAST are as follows, for $j$ iterating from 1 to $n_T$:
(Step 1) Nulling: Vector $y^{(j)}$ contains interference from symbols $s_{j+1}, \ldots, s_{n_T}$. This interference can be minimized by premultiplying $y^{(j)}$ by the nulling vector $g_j$, which is the row of $G_j$ that corresponds to layer $j$.
(Step 2) Slicing: Symbol $s_j$ is detected by selecting the constellation symbol that minimizes the difference $|g_j y^{(j)} - s_j|$ over all $M$ possible values of $s_j$ in the constellation.
(Step 3) Cancellation: Vector $y^{(j+1)}$ is computed by subtracting $H'[s_1, s_2, \ldots, s_j, 0, \ldots, 0]^T$, formed from the detected values of $s_1, \ldots, s_j$, from $y$. This step is not required after detecting the last symbol $s_{n_T}$.
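Steps 0 to 3 can be combined into a compact ordered MMSE V-BLAST loop. The sketch below is a hedged illustration and rests on three assumptions:

- square QAM symbols normalized to unit average energy, so $\sigma_n^2/\sigma_s^2$ equals the noise variance;
- perfect channel knowledge ($H' = H$);
- hypothetical helper names.

At each iteration it recomputes $G_j$ from the remaining columns of $H'$, which reproduces the recursive ordering of Step 0. It cancels each detected symbol from $y^{(j)}$ one at a time, which is equivalent to subtracting $H'[s_1, \ldots, s_j, 0, \ldots, 0]^T$ from $y$.

```python
import numpy as np

def qam_constellation(M):
    """Square M-QAM points normalized to unit average energy (M must be a perfect square)."""
    m = int(round(np.sqrt(M)))
    levels = np.arange(-(m - 1), m, 2)                        # e.g. -3, -1, 1, 3 for 16-QAM
    points = (levels[:, None] + 1j * levels[None, :]).ravel()
    return points / np.sqrt(np.mean(np.abs(points) ** 2))

def mmse_nulling_matrix(H_est, noise_var):
    """Eq. (1) with unit symbol energy, so sigma_n^2 / sigma_s^2 equals noise_var."""
    H_herm = H_est.conj().T
    return np.linalg.solve(H_herm @ H_est + noise_var * np.eye(H_est.shape[1]), H_herm)

def vblast_detect(y, H_est, constellation, noise_var):
    """Ordered MMSE V-BLAST (Steps 0-3); returns detected symbols in original layer order."""
    n_t = H_est.shape[1]
    s_hat = np.zeros(n_t, dtype=complex)
    remaining = list(range(n_t))                              # layers not yet detected
    y_j = np.asarray(y, dtype=complex)                        # y^(1) = y
    while remaining:
        G_j = mmse_nulling_matrix(H_est[:, remaining], noise_var)  # Step 0: nulling matrix for H_j
        row = int(np.argmin(np.linalg.norm(G_j, axis=1)))          # smallest row norm = strongest layer
        k = remaining.pop(row)                                     # original index of that layer
        z = G_j[row] @ y_j                                         # Step 1: nulling
        s_hat[k] = constellation[np.argmin(np.abs(z - constellation))]  # Step 2: slicing
        y_j = y_j - H_est[:, k] * s_hat[k]                         # Step 3: cancellation
    return s_hat

rng = np.random.default_rng(1)
C = qam_constellation(16)
H = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)
s = rng.choice(C, size=4)
s_hat = vblast_detect(H @ s, H, C, noise_var=1e-9)           # noiseless input as a sanity check
print(np.allclose(s_hat, s))
```

The noiseless received vector serves only as a sanity check. With noise present, detection errors (and error propagation) can occur.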
The performance of detection algorithms at high SNRs is usefully characterized by their diversity order. This is the negative of the slope of the logarithm of the bit error rate (BER) plotted against the logarithm of the SNR per bit [1]. By fully exploiting all available information at the receiver, the ML detector achieves a diversity order of $n_R$, the same as a maximum ratio combiner of the $n_R$ received signals. Linear detectors, such as ZF and MMSE, achieve a diversity order of $n_R - n_T + 1$ [1]. V-BLAST is limited to this same diversity order because a linear detector, usually MMSE, is used to recover the first symbol in each symbol vector [12]. This observation motivated us to investigate ways to further reduce the interference affecting the first layer while keeping the algorithmic simplicity of V-BLAST.
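The diversity-order definition and the closed-form orders quoted above can be checked with a little arithmetic. In this sketch (function names hypothetical), the slope is measured between two high-SNR points of an error-rate curve. SNR is given in dB, so $\log_{10}$ SNR equals dB/10. The example curve is synthetic, not simulation data from this paper.

The sketch also assumes that a layer detected by nulling $k$ interfering layers has diversity $n_R - k$. This is a standard result for ZF nulling, taken here to also describe MMSE at high SNR. Under this rule, linear detection of the first layer nulls $n_T - 1$ interferers and so gives $n_R - n_T + 1$.

```python
import math

def diversity_order(snr_db, ber):
    """Negative slope of log10(BER) versus log10(SNR) between two points.

    snr_db and ber are (lower-SNR, higher-SNR) pairs; log10(SNR) = SNR_dB / 10.
    """
    (x1, x2), (b1, b2) = snr_db, ber
    return -(math.log10(b2) - math.log10(b1)) / ((x2 - x1) / 10.0)

def ml_diversity(n_r):
    return n_r                                # same as maximum ratio combining of n_R signals

def nulling_diversity(n_r, n_interferers):
    return n_r - n_interferers                # layer detected after nulling n_interferers layers (ZF rule)

def linear_diversity(n_r, n_t):
    return nulling_diversity(n_r, n_t - 1)    # ZF/MMSE: n_R - n_T + 1

# Synthetic BER curve that drops four decades per decade of SNR (not data from this paper)
print(diversity_order((20, 30), (5e-9, 5e-13)))                          # approximately 4
print(ml_diversity(4), linear_diversity(4, 4), nulling_diversity(4, 2))  # 4 1 2
```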
Figure 1. Near-optimal performance of F-BLAST [13].


Figure 2. Symbol Error Rate vs. SNR for F-BLAST for different parallel-search layers and increasing numbers of antennas.

III. F-BLAST: AN IMPROVED PARALLEL V-BLAST
The performance of V-BLAST can indeed be improved by increasing the detection accuracy of the first symbol. Fouladi Fard et al. proposed doing this by considering all $M$ possible values of the symbol in the weakest layer [13]. For each value, the detector cancels its predicted interference on the signal vector $y$ and performs conventional V-BLAST to detect the $n_T - 1$ remaining layers. The detected signal vector is the candidate $\hat{s}$, among the $M$ candidates, that minimizes the squared error $\|H'\hat{s} - y\|^2$. We will refer to this parallelized and improved V-BLAST as F-BLAST. The computational complexity of F-BLAST is greater than that of conventional V-BLAST by a factor of roughly $M(n_T - 1)/n_T$.
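A serial sketch of F-BLAST, built on the same V-BLAST loop, is given below. It is an illustration under assumptions, not the code of [13]:

- The weakest layer is taken to be the one whose row of $G$ has the largest norm.
- The $M$ "parallel" subdetectors are run one after another.
- Symbols have unit average energy.
- All function names are hypothetical.

```python
import numpy as np

def qam_constellation(M):
    m = int(round(np.sqrt(M)))
    levels = np.arange(-(m - 1), m, 2)
    points = (levels[:, None] + 1j * levels[None, :]).ravel()
    return points / np.sqrt(np.mean(np.abs(points) ** 2))    # unit average energy

def mmse_nulling_matrix(H_est, noise_var):
    H_herm = H_est.conj().T                                   # Eq. (1) with sigma_s^2 = 1
    return np.linalg.solve(H_herm @ H_est + noise_var * np.eye(H_est.shape[1]), H_herm)

def vblast_detect(y, H_est, constellation, noise_var):
    s_hat = np.zeros(H_est.shape[1], dtype=complex)
    remaining, y_j = list(range(H_est.shape[1])), np.asarray(y, dtype=complex)
    while remaining:
        G_j = mmse_nulling_matrix(H_est[:, remaining], noise_var)
        row = int(np.argmin(np.linalg.norm(G_j, axis=1)))    # strongest remaining layer
        k = remaining.pop(row)
        s_hat[k] = constellation[np.argmin(np.abs(G_j[row] @ y_j - constellation))]
        y_j = y_j - H_est[:, k] * s_hat[k]                    # cancel its interference
    return s_hat

def fblast_detect(y, H_est, constellation, noise_var):
    """Run V-BLAST once per hypothesised value of the weakest layer; keep the best fit."""
    n_t = H_est.shape[1]
    G = mmse_nulling_matrix(H_est, noise_var)
    weakest = int(np.argmax(np.linalg.norm(G, axis=1)))      # largest row norm = weakest layer (assumed)
    others = [k for k in range(n_t) if k != weakest]
    best, best_metric = None, np.inf
    for c in constellation:                                   # the M subdetectors, run serially here
        y_c = y - H_est[:, weakest] * c                       # cancel the hypothesised symbol
        candidate = np.empty(n_t, dtype=complex)
        candidate[weakest] = c
        candidate[others] = vblast_detect(y_c, H_est[:, others], constellation, noise_var)
        metric = np.linalg.norm(y - H_est @ candidate) ** 2   # ||H' s_hat - y||^2
        if metric < best_metric:
            best, best_metric = candidate, metric
    return best

rng = np.random.default_rng(2)
C = qam_constellation(16)
H = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)
s = rng.choice(C, size=4)
s_hat = fblast_detect(H @ s, H, C, noise_var=1e-9)           # noiseless sanity check
print(np.allclose(s_hat, s))
```

In hardware the loop body would be replicated $M$ times. For $M = 16$ and $n_T = 4$, the complexity factor $M(n_T - 1)/n_T$ quoted above evaluates to 12.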
The three outer plots in Fig. 1 are SER vs. SNR curves obtained by simulating a 4×4 MIMO system that transmits 16-QAM symbols from each antenna. The elements of H are complex Gaussian-distributed with zero mean and unit variance. The received signals are corrupted with AWGN according to the given SNR. The upper plot was obtained with an MMSE detector, the middle plot with a conventional V-BLAST detector, and the lowest plot with an F-BLAST detector. The transmitted symbols were formed from $2 \times 10^4$ blocks. Each symbol block contains 10 frames, and each frame contains 2M symbol vectors, where M denotes the symbol constellation size (e.g., 16, 64, 256). A new matrix H was generated for each frame. To ensure reliable statistics even at high SNRs, the simulations were run long enough that each SER point represented more than 1000 symbol errors. These plots confirm the significantly improved performance of F-BLAST compared to both MMSE and V-BLAST, as reported in [13].
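The simulation procedure described above can be sketched for the MMSE curve as follows. It uses a new channel matrix per frame, frames of 2M symbol vectors, 10 frames per block, and runs at least a minimum number of blocks and until more than 1000 symbol errors are collected. This is a hedged reconstruction with these assumptions:

- The SNR is the average received signal power per receive antenna divided by the noise variance.
- The receiver knows H exactly.
- The function name and parameters are hypothetical.

With the default `min_blocks=20_000` the run is slow, so the usage line runs a much shorter simulation.

```python
import numpy as np

def qam_constellation(M):
    m = int(round(np.sqrt(M)))
    levels = np.arange(-(m - 1), m, 2)
    points = (levels[:, None] + 1j * levels[None, :]).ravel()
    return points / np.sqrt(np.mean(np.abs(points) ** 2))      # unit average energy

def mmse_nulling_matrix(H_est, noise_var):
    H_herm = H_est.conj().T                                     # Eq. (1) with sigma_s^2 = 1
    return np.linalg.solve(H_herm @ H_est + noise_var * np.eye(H_est.shape[1]), H_herm)

def simulate_mmse_ser(M=16, n=4, snr_db=20.0, frames_per_block=10,
                      min_blocks=20_000, min_errors=1000, max_blocks=10**6, seed=0):
    """Monte Carlo SER of an n x n linear MMSE detector with a new channel per frame."""
    rng = np.random.default_rng(seed)
    C = qam_constellation(M)
    noise_var = n * 10 ** (-snr_db / 10)    # ASSUMED SNR definition: per-antenna received power / noise
    errors = symbols = 0
    for block in range(max_blocks):
        for _ in range(frames_per_block):
            H = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
            S = rng.choice(C, size=(n, 2 * M))                  # one frame = 2M symbol vectors (columns)
            noise = rng.standard_normal(S.shape) + 1j * rng.standard_normal(S.shape)
            Y = H @ S + np.sqrt(noise_var / 2) * noise
            Z = mmse_nulling_matrix(H, noise_var) @ Y           # perfect channel knowledge assumed
            S_hat = C[np.argmin(np.abs(Z[..., None] - C), axis=-1)]   # slice to nearest point
            errors += int(np.count_nonzero(S_hat != S))
            symbols += S.size
        if block + 1 >= min_blocks and errors > min_errors:     # enough blocks and > min_errors errors
            break
    return errors / symbols, errors

ser, errors = simulate_mmse_ser(snr_db=20.0, min_blocks=5, min_errors=1000)
print(ser, errors)
```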
The inset in Fig. 1 compares the performance of an ML detector with that of F-BLAST. The SER of F-BLAST approaches the optimal performance of ML for SNRs ranging from 0 dB to at least 24 dB. The near-optimal performance of F-BLAST has been confirmed in simulation for 4×4, 6×6 and 8×8 MIMO configurations, for constellation sizes M ranging from 16 to 256, and for SNRs from 0 dB up to 40 dB.
Figs. 2(a), (b) and (c) show the simulated performance of F-BLAST for 4×4, 6×6 and 8×8 MIMO systems, respectively, for all possible choices of the exhaustively searched layer. In the plots, W1 denotes the weakest layer, W2 the second weakest layer, and so on. Similarly, S1 denotes the strongest layer, S2 the second strongest layer, and so on. MI denotes the layer, other than layer S1, that is expected to cause the maximum interference on S1. Let $h_{S1}$ denote the column of H corresponding to S1. The MI layer is the layer, other than S1, that corresponds to the element of $h_{S1}$ that has the largest magnitude. F(W2)-BLAST and F(MI)-BLAST refer to F-BLAST where the exhaustively searched layer is W2 and MI, respectively. In all cases, the best performance was obtained when the exhaustively searched layer was the weakest layer W1. Henceforth, F-BLAST always means F(W1)-BLAST.
The diversity order of the M parallel V-BLAST subdetectors in F-BLAST will be increased by 1 relative to a conventional V-BLAST detector for the original problem. This is because the first layer in the V-BLAST subdetector that is most likely to succeed experiences interference from $n_T - 2$ layers rather than $n_T - 1$. Instead of using the weakest layer as the exhaustively searched layer in F-BLAST, one could in fact use any of the other layers (including the strongest layer). However, our simulation results confirm that the weakest layer is the best choice for the exhaustively searched layer. As the channel SNR increases, the interference experienced by the strongest layer (and by the $n_T - 2$ intervening layers) becomes increasingly dominated by the interference from the weakest layer rather than by the channel AWGN. Exhaustive search over the weakest layer effectively eliminates its interference contribution to the $n_T - 1$ other layers. The near-optimal performance of F-BLAST shows that this increases the overall diversity gain by an additional factor approaching 2, for a total diversity gain approaching $n_T$ (i.e., the same as ML and SD).

Figure 3. Search Windows of Size W = 8 (dark grey regions) and 16 (dark grey and surrounding shade) for the QAM Constellation of Size M = 64

Figure 4. Symbol Error Rate vs. SNR for MMSE, V-BLAST, FR-BLAST of Various Reduced Search Windows, and F-BLAST

IV. FR-BLAST: F-BLAST WITH REDUCED PARALLELISM
F-BLAST has a convenient parallel structure that could be exploited in implementations (e.g., simplified pipelining and sharing of hardware blocks). Note also that the parallel V-BLAST subdetectors share the same nulling matrix G. However, the M-fold parallelism of the subdetectors would likely be considered expensive in hardware cost and power for M = 16, let alone for M = 64 or larger. We therefore investigated trading off some of the excellent performance of F-BLAST to reduce the degree of parallelism in the searched layer, and hence the cost. We call the resulting family of reduced-parallelism detectors FR-BLAST (described with early results in [14]).
Two questions immediately arise in the design of FR-BLAST. First, should the searched layer continue to be the weakest layer? Second, how should the restricted window of symbols be constructed as a subset of the symbol constellation for the searched layer? Simulation results (obtained after [14]) convinced us that the weakest layer W1 was not the best choice for the searched layer in FR-BLAST, even though it is the best choice in F-BLAST. Rather, the second weakest layer W2 and the maximum interference layer MI give the best (and very similar) results. We also determined that the W2 and MI layers differed more than 65% of the time.
The position of the search window in the constellation can be determined in various ways, but we found that the MMSE estimate of the symbol in the searched layer was a reasonable choice. The search window size (i.e., the number of considered symbols) was fixed. The window shape for each symbol position $s_X$ was then determined empirically by simulation experiments. These experiments collected histograms of the (assumed near-optimal) F-BLAST estimate given that the MMSE estimate was $s_X$. From each histogram, a search window of size W was constructed by selecting the W most likely F-BLAST decisions for that $s_X$. The M windows were then stored in look-up tables. (The number of tables can be reduced by exploiting constellation symmetry.) As one might expect, the resulting optimized windows included points that were relatively close in Euclidean distance to each $s_X$. However, the precise window shapes did not have an appreciable effect on FR-BLAST performance. Figure 3 shows the ten unique search windows for W = 8 and 16 for M = 64. The number of windows has been reduced to ten in this figure by exploiting all possible symmetries.
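Window construction and the symmetry reduction can be sketched with the Python standard library. The histogram below is invented purely for illustration. Constellation points are represented as hypothetical integer (I, Q) level pairs rather than normalized complex values, and `build_window` and `symmetry_classes` are hypothetical names. The sketch shows the selection rule ("the W most likely F-BLAST decisions"). It also confirms that the 64 points of 64-QAM fall into ten classes under the eight symmetries of the square, consistent with the ten unique windows of Figure 3.

```python
from collections import Counter

def build_window(fblast_decisions, W):
    """Return the W F-BLAST decisions seen most often for one MMSE estimate s_X."""
    return [symbol for symbol, _ in Counter(fblast_decisions).most_common(W)]

def symmetry_classes(M):
    """Partition square M-QAM points (odd-integer I/Q levels) into orbits of the 8 square symmetries."""
    m = int(round(M ** 0.5))
    levels = range(-(m - 1), m, 2)
    unvisited = {(i, q) for i in levels for q in levels}
    classes = []
    while unvisited:
        i, q = min(unvisited)                                  # deterministic representative
        orbit = {(sx * a, sy * b) for a, b in ((i, q), (q, i))  # reflections, swaps and rotations
                 for sx in (1, -1) for sy in (1, -1)}
        classes.append(sorted(orbit))
        unvisited -= orbit
    return classes

# Hypothetical histogram of F-BLAST decisions when the MMSE estimate was s_X = (1, 1)
histogram = {(1, 1): 900, (3, 1): 40, (1, 3): 35, (-1, 1): 20, (1, -1): 5}
print(build_window(histogram, 3))        # [(1, 1), (3, 1), (1, 3)]
print(len(symmetry_classes(64)))         # 10
```

Only one window per symmetry class needs to be stored. The window for any other $s_X$ can be obtained by applying the symmetry that maps $s_X$ to its class representative.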
Figs. 4(a), (b) and (c) were obtained with MMSE, various versions of FR-BLAST, and F-BLAST when detecting 64-, 128- and 256-QAM symbols in a 4×4 MIMO system. Note the large and stable diversity order of F-BLAST at high SNRs. This extends the near-optimal performance that was directly verified in simulation, in comparison with ML, at lower SNRs. The different versions of FR-BLAST were obtained by considering the three best choices (W1, W2 and MI) for the searched layer with various search window sizes. For each considered combination of constellation size M and window size, using either W2 or MI as the searched layer produced better performance than W1. For each combination of constellation size M and searched layer, the performance improved as the search window grew. Thus, by increasing the search window size (and hence the degree of parallelism), one can obtain SER performance that lies between that of V-BLAST and that of F-BLAST.
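FR-BLAST can be sketched by restricting the F-BLAST loop to a window of W candidates for the searched layer. The sketch makes several assumptions that are ours, not the paper's:

- The searched layer is W2, taken to be the layer whose row of $G$ has the second-largest norm.
- The window is centered on the sliced MMSE estimate $s_X$.
- In place of the histogram-derived look-up tables, it uses the W constellation points nearest to $s_X$ in Euclidean distance. The text suggests this is a reasonable stand-in, since the optimized windows clustered around $s_X$ and their precise shapes mattered little.
- All function names are hypothetical.

```python
import numpy as np

def qam_constellation(M):
    m = int(round(np.sqrt(M)))
    levels = np.arange(-(m - 1), m, 2)
    points = (levels[:, None] + 1j * levels[None, :]).ravel()
    return points / np.sqrt(np.mean(np.abs(points) ** 2))      # unit average energy

def mmse_nulling_matrix(H_est, noise_var):
    H_herm = H_est.conj().T                                     # Eq. (1) with sigma_s^2 = 1
    return np.linalg.solve(H_herm @ H_est + noise_var * np.eye(H_est.shape[1]), H_herm)

def vblast_detect(y, H_est, constellation, noise_var):
    s_hat = np.zeros(H_est.shape[1], dtype=complex)
    remaining, y_j = list(range(H_est.shape[1])), np.asarray(y, dtype=complex)
    while remaining:
        G_j = mmse_nulling_matrix(H_est[:, remaining], noise_var)
        row = int(np.argmin(np.linalg.norm(G_j, axis=1)))      # strongest remaining layer
        k = remaining.pop(row)
        s_hat[k] = constellation[np.argmin(np.abs(G_j[row] @ y_j - constellation))]
        y_j = y_j - H_est[:, k] * s_hat[k]
    return s_hat

def nearest_window(center, constellation, W):
    """Hypothetical stand-in for the look-up-table windows: the W points nearest to center."""
    return constellation[np.argsort(np.abs(constellation - center), kind="stable")[:W]]

def frblast_detect(y, H_est, constellation, noise_var, W):
    """F-BLAST restricted to W candidates for the second-weakest (W2) layer."""
    n_t = H_est.shape[1]
    G = mmse_nulling_matrix(H_est, noise_var)
    searched = int(np.argsort(np.linalg.norm(G, axis=1))[-2])  # second-largest row norm = W2 (assumed)
    s_x = constellation[np.argmin(np.abs(G[searched] @ y - constellation))]  # MMSE estimate s_X
    others = [k for k in range(n_t) if k != searched]
    best, best_metric = None, np.inf
    for c in nearest_window(s_x, constellation, W):            # W subdetectors instead of M
        candidate = np.empty(n_t, dtype=complex)
        candidate[searched] = c
        candidate[others] = vblast_detect(y - H_est[:, searched] * c, H_est[:, others],
                                          constellation, noise_var)
        metric = np.linalg.norm(y - H_est @ candidate) ** 2
        if metric < best_metric:
            best, best_metric = candidate, metric
    return best

rng = np.random.default_rng(3)
C = qam_constellation(64)
H = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)
s = rng.choice(C, size=4)
s_hat = frblast_detect(H @ s, H, C, noise_var=1e-9, W=8)       # noiseless sanity check
print(np.allclose(s_hat, s))
```

Setting W = M makes the window the whole constellation, so the loop reduces to an exhaustive search of layer W2. The number of loop iterations, W rather than M, sets the cost of FR-BLAST relative to F-BLAST.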
The FR-BLAST plots in Fig. 4 also illustrate how the diversity order in each configuration levels off and approaches that of V-BLAST as the SNR increases. This effect appears to arise because the correct value of the symbol in the searched layer becomes increasingly difficult to predict. The corresponding window histograms flatten out as the SNR increases. These histograms record the probability of the F-BLAST estimates for each given MMSE estimate $s_X$ of the symbol in the searched layer. Thus the initial symbol cancellation step no longer effectively reduces the interference experienced by the strongest layer (i.e., the first layer detected by the parallel V-BLAST subdetectors). Interestingly, F-BLAST avoids this problem, even at the highest SNRs, by simply considering all of the possible symbols in the searched layer. As long as the strongest layer continues to be detected near-optimally, the successive interference cancellation strategy of V-BLAST allows the remaining symbols to be recovered almost as accurately.
V. COMPUTATIONAL COMPLEXITY
The implementation cost of detectors involves several different interrelated quantities. The required hardware (e.g., the number of adders, multipliers and bits of intermediate storage) can be traded off against the detection latency and the data throughput. The latency is the processing delay between the arrival of a new signal vector and the output of the detected bits; the throughput is the number of bits detected per unit time. The energy per decoded bit is an especially important figure of merit for detectors used in battery-powered applications. A detailed discussion of the cost models is beyond the scope of this paper. However, Table 1 illustrates how the new F-BLAST and FR-BLAST detectors compare with MMSE, V-BLAST and ML. The last four table columns express the computational cost per decoded symbol vector. They give the number of real-valued multiplications, additions and reciprocal operations, and the number of clock cycles. The cycle count assumes one operation per clock cycle and arbitrarily large available hardware parallelism. ML detectors for M = 16, 64 and 256 are prohibitively expensive in terms of multiplications and additions. The fully parallel time cost is misleading for this detector, since the hardware parallelism would be enormous (proportional to $M^4$). In the case of M = 64, F-BLAST requires 4.8 times the real multiplications and 2.8 times the real additions compared to MMSE. With FR-BLAST, restricting the window size to W = 16 (25% of the exhaustive M = 64 search of the weakest layer in F-BLAST) reduces the multiplications and additions to 2.1 and 1.2 times, respectively, the operations required by MMSE. As shown in Fig. 4(a), the symbol error rate of this FR-BLAST detector is about two orders of magnitude lower than that of MMSE for signal-to-noise ratios exceeding 28 dB. Further restricting the FR-BLAST window size to W = 8 (12.5% of M = 64) reduces these relative numbers to 1.7 and 0.9, but it also increases the symbol error rate significantly.
VI. CONCLUSIONS
We described a restricted-search version of the F-BLAST (i.e., parallel V-BLAST) detector for MIMO systems. The resulting FR-BLAST detectors exploit the parallel structure of F-BLAST to gracefully trade off some of its near-optimal performance for a lower computational cost. They do this by reducing the degree of parallelism in the symbol layer that is being searched. In F-BLAST, the best choice for the searched layer is the one with the weakest estimated SNR. In the reduced-parallelism FR-BLAST detector, however (and unlike the version in [14]), the best choice is one of two layers: the layer with the second weakest SNR, or the layer likely to produce the greatest interference on the strongest layer. These two layers are often different. As with F-BLAST, the performance of FR-BLAST scales up well as the number of antennas and the constellation size grow. The convenient parallel structure of F-BLAST remains in FR-BLAST, although some additional complexity is required to construct the search windows. However, the fixed search windows can be computed off-line in advance and stored in a read-only memory. The parallel structure of FR-BLAST could be exploited in MIMO-OFDM systems, where MIMO decoders are required for a relatively large number of subcarriers.
General expressions for the computational complexity of ML, MMSE and F-BLAST appear in [14]. Consider the detection of 64-QAM symbols in a 4×4 system. Suppose the hard-decision detector is changed from MMSE to FR(W2,16)-BLAST. The numbers of real-valued additions, multiplications and reciprocal operations per detected signal vector (ignoring the nulling matrix computation) then rise by factors of 13.7, 12.9 and 50 (from 2 to 100), respectively. The result is a reduction in the SER by a factor of roughly 40 for SNRs exceeding 30 dB. The complexity of FR-BLAST relative to F-BLAST is determined mostly by the ratio of the number of parallel V-BLAST subdetectors in FR-BLAST to the size of the symbol constellation. The search windows required by FR-BLAST are precomputed and do not add to the run-time computational complexity. They do require some storage capacity, which can be minimized by exploiting symmetries in the constellation. FR-BLAST computes the same nulling matrix as F-BLAST, and in both cases this cost is incurred each time the receiver updates its estimate of the channel matrix.
The work on FR-BLAST is being extended in several different directions. Selection diversity could be exploited to combine the outputs of parallel FR(W2)-BLAST and FR(MI)-BLAST detectors. The rule for selecting the best layer to search (for each updated channel matrix estimate) could be made more sophisticated than simply selecting the second weakest layer W2 or the maximum interference layer MI. In particular, at the highest SNRs it might be more appropriate to choose the layer whose window histograms indicate that the optimal estimate of the searched symbol will, with high probability, be covered by the search window. That is, one would choose the layer where the profiles of the window histograms fit the available search window size well. Finally, we are extending F-BLAST and FR-BLAST to produce soft outputs that can then be decoded with an iterative decoder (e.g., a standard turbo decoder). In the longer term, it will be important to determine how the use of parallel V-BLAST detectors affects two quantities: the complexity of the total system (soft detector and soft bit decoder) and the expected energy per decoded bit.
ACKNOWLEDGMENT
The authors wish to thank Drs. Saeed Fouladi Fard and
Amirhossein Alimohammad for access to MATLAB
simulation modules and for fruitful discussions concerning
the implementation and performance of F-BLAST.
REFERENCES
[1] J. G. Proakis and M. Salehi, Digital Communications, 5th ed., New York, NY: McGraw-Hill, 2008.
[2] M. Sellathurai and S. Haykin, Space-time Layered Information Processing for Wireless Communications, Hoboken, NJ: John Wiley & Sons, 2009.
[3] S. Sesia, I. Toufik, and M. Baker, LTE: The UMTS Long Term Evolution: From Theory to Practice, Hoboken, NJ: John Wiley & Sons, 2009.
[4] G. J. Foschini, "Layered space-time architecture for wireless communication in a fading environment when using multiple antennas," Bell Laboratories Technical Journal, vol. 1, no. 1, pp. 41-59, Autumn 1996.
[5] M. O. Damen, H. El Gamal, and G. Caire, "On maximum-likelihood detection and the search for the closest lattice point," IEEE Trans. Inf. Theory, vol. 49, no. 10, pp. 2389-2402, Oct. 2003.
[6] B. Hassibi and H. Vikalo, "On the sphere-decoding algorithm II. Generalizations, second-order statistics, and applications to communications," IEEE Trans. Signal Process., vol. 53, no. 8, pp. 2819-2834, Aug. 2005.
[7] Z. Guo and P. Nilsson, "Algorithm and implementation of the K-best sphere decoding for MIMO detection," IEEE J. Sel. Areas Commun., vol. 24, no. 2, pp. 491-503, Mar. 2006.
[8] M. Shabany and P. G. Gulak, "A 0.13μm CMOS 655Mb/s 4×4 64-QAM K-Best MIMO detector," 2009 IEEE Int. Solid-State Circuits Conf., pp. 256-257, 257a.
[9] J. W. Choi, B. Shim, A. C. Singer, and N. I. Cho, "Low-complexity decoding via reduced dimension maximum-likelihood search," IEEE Trans. Signal Process., vol. 58, no. 3, pp. 1780-1793, Mar. 2010.
[10] P. W. Wolniansky, G. J. Foschini, G. D. Golden, and R. A. Valenzuela, "V-BLAST: An architecture for realizing very high data rates over the rich-scattering wireless channel," Proc. Int. Symp. Signals, Systems, and Electronics, 1998, pp. 295-300.
[11] G. D. Golden, G. J. Foschini, R. A. Valenzuela, and P. W. Wolniansky, "Detection algorithm and initial laboratory results using V-BLAST space-time communication architecture," Electronics Letters, vol. 35, no. 1, pp. 14-15, Jan. 1999.
[12] Y. Jiang, X. Zheng, and J. Li, "Asymptotic performance analysis of V-BLAST," IEEE GLOBECOM 2005, pp. 3882-3886.
[13] S. Fouladi Fard, A. Alimohammad, and B. F. Cockburn, "Improved layered MIMO detection algorithm with near-optimal performance," IET Electronics Letters, vol. 45, no. 13, pp. 675-677, June 18, 2009.
[14] A. Pankeu Yomi and B. F. Cockburn, "Near-optimal and efficient MIMO detectors for 64-QAM symbols," IEEE Cdn. Conf. Electrical and Computer Eng. (CCECE 2010), May 2-5, Calgary, AB, 6 pp.

Detector (4×4 MIMO)   M     Real Multiplications   Real Additions     Real Reciprocals   Fully Parallel Time (cycles)
ML                    16    4,718,592              6,881,312          0                  25
ML                    64    1,207,959,553          1,761,607,729      0                  33
ML                    256   309,237,645,313        450,971,566,145    0                  41
MMSE                  16    1,304                  1,138              2                  39
MMSE                  64    1,496                  1,730              2                  41
MMSE                  256   2,264                  4,050              2                  43
V-BLAST               16    2,080 (1.6)            1,518 (1.3)        4 (2)              128
V-BLAST               64    2,464 (1.7)            2,120 (1.2)        4 (2)              136
V-BLAST               256   4,000 (1.8)            4,440 (1.1)        4 (2)              144
F-BLAST               16    9,784 (7.5)            11,708 (10)        50 (25)            109
F-BLAST               64    54,712 (37)            72,636 (42)        194 (97)           117
F-BLAST               256   510,904 (226)          733,360 (181)      770 (385)          126
FR-BLAST, W = 8       64    7,767 (5.2)            9,846 (5.7)        50 (25)            117
FR-BLAST, W = 8       256   16,888 (7.5)           23,770 (5.9)       50 (25)            126
FR-BLAST, W = 16      64    14,392 (9.6)           18,816 (11)        98 (49)            117
FR-BLAST, W = 16      256   32,824 (15)            46,660 (12)        98 (49)            126

Table 1. Symbol Vector Computational Complexity of Alternative 4×4 MIMO Detectors
Note: Numbers in brackets give counts relative to the MMSE detector with the corresponding value of M.
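The bracketed entries in Table 1 are each operation count divided by the MMSE count for the same M. The small sketch below makes the rule explicit, using hypothetical names and counts copied from the table for M = 64. Rounded to two significant figures, its results give the bracketed values 37, 42 and 97 for F-BLAST, and 9.6, 11 and 49 for FR-BLAST with W = 16.

```python
# Operation counts per symbol vector for M = 64, copied from Table 1
MMSE_64 = {"mult": 1496, "add": 1730, "recip": 2}
DETECTORS_64 = {
    "F-BLAST": {"mult": 54712, "add": 72636, "recip": 194},
    "FR-BLAST, W = 16": {"mult": 14392, "add": 18816, "recip": 98},
}

def relative_to_mmse(counts, reference=MMSE_64):
    """Each count divided by the MMSE count for the same M, rounded to one decimal place."""
    return {op: round(counts[op] / reference[op], 1) for op in counts}

for name, counts in DETECTORS_64.items():
    print(name, relative_to_mmse(counts))
```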