

IEEE Transactions on Consumer Electronics, Vol. 62, No. 1, February 2016

Selective Gray-Coded Bit-Plane Based Low-Complexity Motion Estimation and its Hardware Architecture
Seda Yavuz, Anıl Çelebi, Member, IEEE, Muhammad Aslam, Oğuzhan Urhan, Member, IEEE
Abstract: Today, many consumer electronics devices have video capturing capability, which is one of the most time-, power- and memory-consuming applications. Motion estimation (ME) is the key part of the video coding process in terms of computational load. Thus, it is important to implement this process in a resource-efficient way without degrading the encoding quality or the real-time performance. Low bit-depth representation based ME methods draw a lot of attention in the consumer electronics area, mainly thanks to their highly efficient hardware and software implementations. However, these methods generally assume that the low bit-depth images are already available. Furthermore, they simply neglect the binarization cost, which is not a proper approach when the whole encoding architecture is of concern.
This paper presents a novel selective Gray-coding based ME method and its hardware architecture, together with an embedded system integration that makes use of one of the most common interconnect architectures in consumer electronics devices. Experimental results show that the proposed approach significantly reduces the computational load of the binarization stage while improving ME accuracy compared to methods in the same category¹.
Index Terms: Motion estimation, Gray-coding, One-bit transform, Low-complexity ME.

¹ S. Yavuz, A. Çelebi and M. Aslam are with Kocaeli University Integrated Systems Laboratory (KUTSAL), Electronics and Telecom. Eng. Dept., Umuttepe Campus, 41380, İzmit/Kocaeli, Turkey (e-mail: anilcelebi@kocaeli.edu.tr).
O. Urhan is with Kocaeli University Laboratory of Embedded and Vision Systems (KULE), Electronics and Telecom. Eng. Dept., Umuttepe Campus, 41380, İzmit/Kocaeli, Turkey (e-mail: urhano@kocaeli.edu.tr).

Contributed Paper
Manuscript received 12/31/15
Current version published 03/30/16
Electronic version published 03/30/16

I. INTRODUCTION
The number of devices with video capturing capability is increasing every day. Smart phones and tablets in particular are extensively used to capture and share video data. Efficient compression methods are therefore needed to store these videos in limited memory. Transmission of captured raw video also requires compression in order to utilize the available network bandwidth efficiently.
Since the introduction of the first video coding methods, the motion-compensated hybrid coding approach has been extensively utilized. Today, the H.264/AVC [1] standard and its successor HEVC [2] (High Efficiency Video Coding) are also based on this concept: intra-frame (spatial) redundancies are exploited by intra prediction and transform coding, whereas block-based motion estimation techniques are employed to take advantage of temporal redundancies. Statistical redundancies are exploited by entropy coding techniques such as CAVLC (context-adaptive variable-length coding) and CABAC (context-adaptive binary arithmetic coding). It is important to note that the ME part is generally the most time-consuming stage in a video encoder [3].
In block-based ME, each frame is divided into non-overlapping blocks, and each block in the current frame is searched within a wider area around the same location in the reference frame(s), which is referred to as the search window. The sum of squared differences (SSD) or sum of absolute differences (SAD) criterion is utilized to measure the similarity between the original and candidate blocks. Since the current block is searched at all possible candidate locations within the search range, the computational complexity of this process is quite high. This method is referred to as full-search (FS) based ME because all candidate locations are checked.
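The full-search procedure described above can be sketched as follows. This is a minimal illustration, not the implementation used in this paper: frames are assumed to be Python lists of rows of 8-bit integers, (m, n) is taken as a (row, column) offset, and candidates whose reference block would leave the frame are simply skipped (an illustrative boundary rule).

```python
import random


def sad(cur, ref, r0, c0, m, n, N):
    """SAD between the N x N current block at (r0, c0) and the reference
    block displaced by the candidate vector (m, n)."""
    return sum(
        abs(cur[r0 + i][c0 + j] - ref[r0 + m + i][c0 + n + j])
        for i in range(N)
        for j in range(N)
    )


def full_search(cur, ref, r0, c0, N, s):
    """Check every candidate -s <= m, n <= s; return ((m, n), best_sad)."""
    H, W = len(ref), len(ref[0])
    best = None
    for m in range(-s, s + 1):
        for n in range(-s, s + 1):
            if not (0 <= r0 + m <= H - N and 0 <= c0 + n <= W - N):
                continue  # reference block would leave the frame
            cost = sad(cur, ref, r0, c0, m, n, N)
            if best is None or cost < best[1]:  # keep the first minimum found
                best = ((m, n), cost)
    return best


rng = random.Random(0)
ref = [[rng.randrange(256) for _ in range(32)] for _ in range(32)]
# Current frame: the block at (8, 8) is found at (10, 5) in ref, i.e. vector (2, -3)
cur = [[ref[(r + 2) % 32][(c - 3) % 32] for c in range(32)] for r in range(32)]
print(full_search(cur, ref, 8, 8, 8, 4))  # ((2, -3), 0)
```

For a search range s the loop visits up to (2s + 1)² candidates per block, each costing N² absolute differences, which is the load the approaches discussed next try to reduce.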
There are several groups of approaches in the literature that reduce the computational load and hardware complexity of full-search based ME. The main motivation of the first group is to check only a subset of all candidate locations in the search window. Three-step search [4], diamond search [5] and hexagonal search [6] based ME methods belong to this category, where only pre-defined search locations are checked. Adaptive search range determination based approaches, such as the method presented by Lee et al. [7], can be put into this group as well, since only a limited number of candidates are checked based on a search range decided in advance for each block.
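A three-step-style search in the spirit of [4] can be sketched as below. The step schedule (largest power of two not exceeding (s + 1)/2, halved down to 1) and the callable cost(m, n) interface are assumptions made for this illustration; a real encoder would plug in SAD or another matching criterion.

```python
def three_step_search(cost, s=7):
    """Three-step-style search over -s <= m, n <= s.

    cost(m, n) returns the matching error (e.g. SAD) of candidate (m, n).
    At each step the 3 x 3 pattern around the current best candidate is
    checked, then the step is halved until it reaches 1.
    Returns ((m, n), best_cost)."""
    step = 1
    while step * 2 <= (s + 1) // 2:  # largest power of two <= (s + 1) // 2
        step *= 2
    best, best_cost = (0, 0), cost(0, 0)
    while step >= 1:
        cm, cn = best  # center of this step's pattern
        for dm in (-step, 0, step):
            for dn in (-step, 0, step):
                m, n = cm + dm, cn + dn
                if abs(m) > s or abs(n) > s:
                    continue  # outside the search range
                c = cost(m, n)
                if c < best_cost:
                    best, best_cost = (m, n), c
        step //= 2
    return best, best_cost


def toy_cost(m, n):
    """Unimodal toy error surface with its minimum at (3, -5)."""
    return (m - 3) ** 2 + (n + 5) ** 2


print(three_step_search(toy_cost))  # ((3, -5), 0)
```

Only a few dozen candidates are evaluated instead of all (2s + 1)², but on real, non-unimodal error surfaces such a search can settle in a local minimum.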
The second group of approaches reduces the number of pixels utilized for computing the matching criterion by making use of a specific sub-sampling pattern, such as quarter [8], quincunx [9], 8-Queen [10] and reconfigurable boundary [11] patterns.
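The sub-sampling idea can be illustrated with two simple patterns. The quarter and quincunx definitions below are the textbook ones and are our assumption about the exact patterns of [8] and [9]; the 8-Queen [10] and reconfigurable boundary [11] patterns are not reproduced.

```python
def quarter(i, j):
    """Quarter sub-sampling: one pixel out of every 2 x 2 group."""
    return i % 2 == 0 and j % 2 == 0


def quincunx(i, j):
    """Quincunx sub-sampling: checkerboard layout, half of the pixels."""
    return (i + j) % 2 == 0


def subsampled_sad(cur_block, ref_block, keep=None):
    """SAD restricted to positions (i, j) where keep(i, j) is true
    (keep=None uses every pixel)."""
    return sum(
        abs(a - b)
        for i, (row_c, row_r) in enumerate(zip(cur_block, ref_block))
        for j, (a, b) in enumerate(zip(row_c, row_r))
        if keep is None or keep(i, j)
    )


cur_block = [[10] * 16 for _ in range(16)]
ref_block = [[7] * 16 for _ in range(16)]
print(subsampled_sad(cur_block, ref_block))            # 768 (256 pixels)
print(subsampled_sad(cur_block, ref_block, quincunx))  # 384 (128 pixels)
print(subsampled_sad(cur_block, ref_block, quarter))   # 192 (64 pixels)
```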
0098-3063/16/$20.00 © 2016 IEEE

S. Yavuz et al.: Selective Gray-Coded Bit-Plane Based Low-Complexity Motion Estimation and its Hardware Architecture

The third group aims to skip the computation of the matching criterion for specific or all remaining candidate locations. For example, successive elimination algorithm (SEA) based methods, such as the approach presented by Li et al. [12], compute a lower bound of the matching criterion at lower complexity and thus skip impossible candidates before computing the full matching criterion for them. Early termination methods, such as the scheme proposed by Yang et al. [13], pursue the same target by checking the partial matching result against the lowest matching error found so far. Thus, it becomes possible to eliminate impossible candidate locations without computing the full matching criterion for the current block.
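The two pruning ideas can be combined in one loop, sketched below under stated assumptions: the SEA-style bound uses the triangle inequality |sum(cur) - sum(ref)| <= SAD, block sums are recomputed directly instead of with the running sums an efficient SEA implementation would use, and early termination checks the partial SAD after every row. The function names and return values are illustrative.

```python
import random


def get_block(frame, r0, c0, N):
    """N x N block of `frame` with top-left corner (r0, c0)."""
    return [row[c0:c0 + N] for row in frame[r0:r0 + N]]


def search_with_pruning(cur, ref, r0, c0, N, s):
    """Exhaustive-equivalent search over -s <= m, n <= s with two kinds of pruning.

    Returns ((m, n), best_sad, skipped, terminated):
      skipped    - candidates rejected by the SEA-style lower bound
      terminated - candidates abandoned by early termination
    """
    H, W = len(ref), len(ref[0])
    cur_blk = get_block(cur, r0, c0, N)
    cur_sum = sum(map(sum, cur_blk))
    best, best_sad = None, float("inf")
    skipped = terminated = 0
    for m in range(-s, s + 1):
        for n in range(-s, s + 1):
            if not (0 <= r0 + m <= H - N and 0 <= c0 + n <= W - N):
                continue  # reference block would leave the frame
            ref_blk = get_block(ref, r0 + m, c0 + n, N)
            # |sum(cur) - sum(ref)| <= SAD: if the bound already reaches the
            # best SAD, this candidate cannot beat the current best.
            if abs(cur_sum - sum(map(sum, ref_blk))) >= best_sad:
                skipped += 1
                continue
            partial = 0
            for row_c, row_r in zip(cur_blk, ref_blk):
                partial += sum(abs(a - b) for a, b in zip(row_c, row_r))
                if partial >= best_sad:  # partial SAD already too large
                    terminated += 1
                    break
            else:  # every row checked, so partial < best_sad
                best, best_sad = (m, n), partial
    return best, best_sad, skipped, terminated


rng = random.Random(1)
ref = [[rng.randrange(256) for _ in range(24)] for _ in range(24)]
# Block content at (8, 8) in cur sits at (7, 10) in ref, i.e. vector (-1, 2)
cur = [[ref[(r - 1) % 24][(c + 2) % 24] for c in range(24)] for r in range(24)]
mv, cost, skipped, terminated = search_with_pruning(cur, ref, 8, 8, 8, 3)
print(mv, cost)  # (-1, 2) 0
```

Because candidates are only discarded when they provably cannot be strictly better, the result matches an exhaustive search with the same scan order.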
The last group of approaches [14]-[23] utilizes matching criteria of lower complexity than SSD or SAD. These methods are generally referred to as bit-plane matching (BPM) based techniques, where image frames are represented at a lower bit-depth and Boolean operations are utilized for computing the matching criteria. Boolean operations are known to be carried out very efficiently in hardware implementations. Since this group of approaches, together with their hardware implementations, is the focus of this paper, they are discussed in detail in the following section. It is also important to note that BPM based ME methods can increase the efficiency of the single instruction multiple data (SIMD) infrastructure, which is available in almost all processing resources of current consumer electronics devices.
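As an illustration of the SIMD point, a binary 16×16 block can be packed into sixteen 16-bit row words, so that the NNMP criterion of Section II needs one XOR and one population count per row instead of one XOR per pixel. Python integers stand in for machine words here; this is a sketch, not a description of any particular instruction set.

```python
def pack_rows(bits_block):
    """Pack each row of a binary block into one integer word (bit j = column j)."""
    return [sum(bit << j for j, bit in enumerate(row)) for row in bits_block]


def nnmp_packed(cur_words, ref_words):
    """NNMP with one XOR and one population count per packed row."""
    return sum(bin(c ^ r).count("1") for c, r in zip(cur_words, ref_words))


def nnmp_plain(cur_bits, ref_bits):
    """Reference NNMP with one XOR per pixel."""
    return sum(a ^ b for rc, rr in zip(cur_bits, ref_bits) for a, b in zip(rc, rr))


cur_bits = [[1, 0, 1, 1], [0, 0, 1, 0]]
ref_bits = [[1, 1, 1, 0], [0, 0, 1, 1]]
print(pack_rows(cur_bits))                                    # [13, 4]
print(nnmp_packed(pack_rows(cur_bits), pack_rows(ref_bits)))  # 3
```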
It is also possible to combine different groups of methods to further speed up the ME process [24]-[31]. The low bit-depth representation based approaches explained above have been combined with sparse search [24], [25], early termination [26]-[28] and adaptive search [29]-[32] based techniques. However, these approaches might prevent efficient data scheduling in hardware implementations. Another group of techniques performs an additional local search around the best matching results of a BPM based method by making use of the SAD criterion [33]-[36]. Since the binary nature of the method is compromised, these approaches are not suitable for efficient hardware implementation.
A novel binarization technique and its hardware are proposed in this paper. The presented ME method benefits from the easy binarization and efficient pixel representation properties of Gray coding by constructing a single bit-plane with a novel selection scheme. Thus, the proposed method provides ME performance similar to or better than many existing multi-bit-depth BPM based ME methods. The hardware architecture developed for the proposed ME method can operate in real time and requires no on-chip memory for the current and reference blocks.
II. LOW BIT-DEPTH REPRESENTATION BASED ME METHODS
In full-search block-based motion estimation approaches, image frames are divided into non-overlapping blocks, and each block is searched within a search window in the reference frame. Let I_c and I_r denote the current and reference image frames, respectively. Then, the motion vector of a certain block of size N×N pixels can be decided as follows:

$$SSE(m,n) = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \left[ I_c(i,j) - I_r(i+m,\, j+n) \right]^2, \quad -s \le m, n \le s \tag{1}$$

where (m,n) denotes the candidate motion vector and s determines the search range. The candidate motion vector giving the lowest matching error (SSE) is assigned as the motion vector of the block.
As briefly described in the previous section, checking all possible candidate locations in the search window using SSD or SAD based matching criteria as in (1) causes a significant computational burden. Low bit-depth representation based methods aim to utilize low-complexity matching criteria by reducing the number of bits (bit-planes) used to represent image frames. Thus, the overall complexity of motion estimation can be reduced. Feng et al. [14] presented bit-plane matching based motion estimation as a preprocessing step to speed up the overall motion estimation process, where the block mean (T_bm) is utilized as a threshold for constructing binary image frames. In the method presented by Natarajan et al. [15], image frames are initially filtered by a multi-band-pass filter, and then each original frame is compared against its filtered version to determine the binary representation of the input frame. After the binarization step, motion vectors are decided based on the number of non-matching points (NNMP) criterion as follows:

$$NNMP(m,n) = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} B_c(i,j) \oplus B_r(i+m,\, j+n) \tag{2}$$

where B_c and B_r denote the binary forms of the current (I_c) and reference (I_r) frames obtained by comparing the original image frame against the filtered frame, and ⊕ denotes the Boolean EX-OR operation. The candidate location giving the lowest NNMP value is decided as the motion vector of the block. Natarajan et al. [15] also presented a hardware architecture to illustrate the effectiveness of this matching criterion. However, the cost of the binarization process is not assessed in that work; the binary frames are assumed to be already available for the block matching process. It is important to note that this method (i.e., one-bit transform (1BT) based ME) requires a total of 25 integer additions, 1 real division and 1 comparison per pixel to obtain the corresponding binary image frame.
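The 25 additions and the single division quoted above correspond to a kernel with 25 non-zero taps. The sketch below assumes the 17×17 multi-band-pass kernel commonly cited for 1BT, with weight 1/25 at rows and columns 0, 4, 8, 12 and 16 (offsets -8, -4, 0, 4, 8 from the center) and zero elsewhere; clamping coordinates at the frame border is our own assumption. Comparing 25·I(i, j) with the tap sum is equivalent to comparing I(i, j) with the divided filter output. The NNMP of Eq. (2) is included for two equally sized binary blocks.

```python
TAPS = (-8, -4, 0, 4, 8)  # offsets of the 5 x 5 = 25 non-zero kernel taps


def binarize_1bt(frame):
    """One-bit transform sketch: B(i, j) = 1 if I(i, j) >= filtered(i, j).

    filtered(i, j) is the mean of the 25 pixels at offsets TAPS x TAPS;
    border coordinates are clamped (an assumption)."""
    H, W = len(frame), len(frame[0])
    out = []
    for i in range(H):
        row = []
        for j in range(W):
            total = sum(
                frame[min(max(i + di, 0), H - 1)][min(max(j + dj, 0), W - 1)]
                for di in TAPS
                for dj in TAPS
            )
            row.append(1 if 25 * frame[i][j] >= total else 0)
        out.append(row)
    return out


def nnmp(b_cur, b_ref):
    """Eq. (2) for two equally sized binary blocks: number of EX-OR mismatches."""
    return sum(a ^ b for rc, rr in zip(b_cur, b_ref) for a, b in zip(rc, rr))


frame = [[0] * 17 for _ in range(17)]
frame[8][8] = 250                  # single bright pixel
b = binarize_1bt(frame)
print(b[8][8], b[8][12], b[8][9])  # 1 0 1
```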
A diamond-shaped binarization kernel that avoids the real division operation is proposed by Ertürk [16]. This kernel includes 16 non-zero components, so normalization is carried out by an integer shift operation. This method is referred to as multiplication-free one-bit transform (MF-1BT) based ME, and its binarization requires 16 integer additions, one 4-bit shift and 1 comparison per pixel. Ertürk [16] showed that MF-1BT provides motion estimation accuracy similar to that of 1BT based ME [15].


Fig. 1. Gray-coded bit-planes (g7 to g0) of Foreman frame #8.

Ertürk et al. [17] proposed to utilize two bit-planes for the ME process (2BT). The first bit-plane is constructed similarly to the approach presented by Feng et al. [14], whereas the second one is computed by utilizing the mean and standard deviation of a larger block around the current block. The binarization cost of this method is significantly high, as illustrated in Section V. Another two-bit-plane based representation (C-1BT) is proposed by Urhan et al. [18], where the first bit-plane is computed as in MF-1BT based ME and the second is constructed as a constraint mask that marks the pixels reliable enough to be included in the matching criterion. The matching criterion of this approach (constrained NNMP) also requires three Boolean operations, similar to the 2BT based method.
Gray-coded bit-plane matching (GCBPM) for global motion estimation is proposed by Ko et al. [21] and later applied to motion estimation for video coding by Urhan et al. [22].
The K-bit Gray code of a pixel value can be computed as

$$g_{K-1} = a_{K-1}, \qquad g_k = a_k \oplus a_{k+1}, \quad 0 \le k \le K-2 \tag{3}$$

where a denotes the natural binary code of the pixel value. The matching criterion (MC) for [21] is similar to (2) with a fixed Gray-coded bit-plane. On the other hand, the MC for the method presented by Çelebi et al. [23] is computed as

$$MC(m,n) = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \sum_{k=NTB}^{K-1} 2^{k-NTB} \left[ g_k^c(i,j) \oplus g_k^r(i+m,\, j+n) \right] \tag{4}$$

where NTB denotes the number of truncated bits. It is shown that the best ME results are obtained when NTB = 5, which means that only the 3 most significant bit-planes are utilized in the matching process. This method, called T-GCBPM based ME, outperforms the 1BT, MF-1BT, 2BT and C-1BT based approaches, mainly because it utilizes three bit-planes, similar to [20].
Kuo et al. [37] proposed to utilize an interlaced Gray-coding pattern to obtain a single bit-plane for global motion estimation. This approach enables lower-complexity binarization than the other low-complexity ME methods except T-GCBPM, since a selection operation is required for the interlacing process. Our experiments revealed that this method has ME accuracy similar to that of 1BT based ME when it is applied to video coding. In this paper, a novel selection and placement scheme for Gray-coded bit-planes is proposed to further improve the ME accuracy compared to the method presented by Kuo et al. [37]. After the proposed binarization process, (2) is employed to decide the motion vector.
III. PROPOSED BINARIZATION APPROACH
As described in the previous section, the first step of BPM based ME methods is to convert full bit-depth image frames into a lower bit-depth representation. Then, motion estimation is performed by making use of a suitable matching criterion and search range. The main advantages of BPM based ME methods are their higher speed and their smaller area and power footprint in hardware implementations. Many recent works [38]-[43] present efficient hardware architectures for BPM based ME methods. However, the computational and hardware cost of binarization is neglected for 1BT, MF-1BT, 2BT, C-1BT and WC-1BT based ME. The T-GCBPM based method has a significant advantage in this respect, since its binarization can be implemented with simple EX-OR operations or look-up tables (LUTs).
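Eq. (3) reduces to the familiar identity g = a XOR (a >> 1), which is why Gray-code binarization costs only EX-OR gates or a table look-up. The sketch below shows both forms for 8-bit pixels; the function and table names are illustrative.

```python
def gray_xor(a):
    """Gray code of a K-bit value: g = a XOR (a >> 1), i.e. Eq. (3) on all bits at once."""
    return a ^ (a >> 1)


GRAY_LUT = [gray_xor(v) for v in range(256)]  # 256-entry table for 8-bit pixels


def gray_bits(a, k_bits=8):
    """Eq. (3) bit by bit; returns [g_{K-1}, ..., g_0]."""
    bits = [(a >> k) & 1 for k in range(k_bits)]  # natural binary a_0 .. a_{K-1}
    g = [bits[k] ^ bits[k + 1] for k in range(k_bits - 1)] + [bits[k_bits - 1]]
    return g[::-1]


print(format(GRAY_LUT[182], "08b"))  # 11101101
print(gray_bits(182)[:3])            # [1, 1, 1], i.e. g7, g6, g5
# Neighboring levels 127 and 128 differ in all 8 natural-binary bits
# but in only one Gray-coded bit:
print(bin(GRAY_LUT[127] ^ GRAY_LUT[128]).count("1"))  # 1
```

The single-bit change between adjacent levels is the property that makes the most significant Gray-coded planes robust to small intensity variations.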


As described in Section II, Gray-coded bit-plane matching based methods [21], [23] employ a pre-selected single bit-plane or several bit-planes, respectively. Fig. 1 shows the eight Gray-coded bit-planes of an image frame from the Foreman sequence. As seen from this figure, the higher bit-planes contain most of the information available in the original frame. However, a single Gray-coded bit-plane on its own does not provide enough information about the original frame. Since the method in [21] utilizes only one fixed Gray-coded bit-plane, its ME performance may not be adequate for different image contents, although the single bit-plane keeps the computational complexity of the matching stage low. On the other hand, the T-GCBPM based method employs the 3 most significant bit-planes (i.e., g7, g6, g5) to represent images and thus provides better performance at the cost of additional computational complexity in the matching stage.
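The extra matching cost of T-GCBPM relative to single-plane matching is visible when Eq. (4) is written out. The sketch below, with illustrative names and pure-Python blocks, computes the weighted three-plane criterion (NTB = 5) next to a single-plane count on g7, as a fixed-plane GCBPM-style criterion would use.

```python
def gray(v):
    """Gray code of a pixel value, Eq. (3)."""
    return v ^ (v >> 1)


def mc_tgcbpm(cur_block, ref_block, ntb=5, k_bits=8):
    """Eq. (4): weighted count of Gray-code bit mismatches on planes NTB..K-1.

    Plane k contributes weight 2**(k - ntb); with ntb = 5 only g7, g6, g5
    are used, with weights 4, 2 and 1."""
    total = 0
    for row_c, row_r in zip(cur_block, ref_block):
        for pc, pr in zip(row_c, row_r):
            diff = gray(pc) ^ gray(pr)  # bit k set <=> g_k differs
            for k in range(ntb, k_bits):
                total += ((diff >> k) & 1) << (k - ntb)
    return total


def nnmp_single_plane(cur_block, ref_block, k=7):
    """Eq. (2) on one fixed Gray-coded plane g_k."""
    return sum(
        ((gray(pc) ^ gray(pr)) >> k) & 1
        for row_c, row_r in zip(cur_block, ref_block)
        for pc, pr in zip(row_c, row_r)
    )


print(mc_tgcbpm([[0]], [[255]]))         # 4: only g7 differs (weight 4)
print(mc_tgcbpm([[0]], [[96]]))          # 2: g6 differs (weight 2); g4 is truncated
print(nnmp_single_plane([[0]], [[96]]))  # 0: g7 alone misses the difference
```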
In this paper, we propose a novel combination of the methods presented by Çelebi et al. [23] and Kuo et al. [37] that constructs, for each candidate position, a single bit-plane containing Gray-coded pixel values from the 3 most significant bit-planes. By the proposed selection and placement of the 3 most significant bits of the pixel Gray code into a single bit-plane for matching, it becomes possible to exploit the advantages of both methods. Note that, unlike the method presented by Kuo et al. [37], where pixels from 4 bit-planes are utilized in an interlaced fashion as shown in Fig. 2, the proposed method uses a different bit-plane selection and placement scheme and constructs the bit-plane separately for each candidate location.
The bit-plane selection approach proposed in this paper improves ME accuracy compared to [37] and is shown in Fig. 3 for a 16×16 image block. Note that binary image blocks are constructed separately for each candidate location. Related works on GCBPM based ME [23], [43] show that the contribution of the five least significant bit-planes to ME accuracy is limited compared to that of the 3 most significant bit-planes.

Fig. 2. Bit-plane selection approach presented by Kuo et al. [37].

Fig. 3. Proposed bit-plane selection approach for a 16×16 block.

Thus, we prefer not to include g4 in our selection scheme. Additionally, compared to the method presented by Kuo et al. [37], the distributed placement of bit-planes enables more accurate matching, since neighboring pixels take their bits from different planes and positions that share a plane lie farther apart. Our experiments show that the proposed bit-plane selection and placement approach improves the ME accuracy of the method proposed by Kuo et al. [37].
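The proposed per-candidate construction can be sketched as follows. Because the exact layout of Fig. 3 is not given in the text, the selection map here is a hypothetical stand-in that cycles g7, g6, g5 along diagonals ((i + j) mod 3); the 3-to-1 selection per pixel and the rebuilding of the reference bit-plane for every candidate location follow the description above. All function names are illustrative.

```python
PLANES = (7, 6, 5)  # g7, g6, g5; g4 and below are not used


def selection_pattern(N=16):
    """HYPOTHETICAL selection map: plane index for each position in an N x N
    block. (i + j) mod 3 cycles through g7, g6, g5 so that no horizontal or
    vertical neighbors share a plane. Fig. 3 may use a different layout."""
    return [[PLANES[(i + j) % 3] for j in range(N)] for i in range(N)]


def selective_binarize(block, pattern):
    """One 3-to-1 selection per pixel: keep Gray bit g_k with k = pattern[i][j]."""
    return [
        [((p ^ (p >> 1)) >> k) & 1 for p, k in zip(row, prow)]
        for row, prow in zip(block, pattern)
    ]


def candidate_nnmp(cur, ref, r0, c0, m, n, pattern):
    """NNMP of Eq. (2) for candidate (m, n). The pattern is anchored to the
    block, so the reference block is re-binarized for every candidate."""
    N = len(pattern)
    cur_blk = [row[c0:c0 + N] for row in cur[r0:r0 + N]]
    ref_blk = [row[c0 + n:c0 + n + N] for row in ref[r0 + m:r0 + m + N]]
    bc = selective_binarize(cur_blk, pattern)
    br = selective_binarize(ref_blk, pattern)
    return sum(a ^ b for rc, rr in zip(bc, br) for a, b in zip(rc, rr))


print(selective_binarize([[255] * 3] * 3, selection_pattern(3)))
# [[1, 0, 0], [0, 0, 1], [0, 1, 0]]  (gray(255) = 0b10000000: only g7 is 1)
```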
IV. HARDWARE ARCHITECTURE
Low-complexity ME methods are suitable for hardware implementation, as shown in the literature. Compared to the hardware architectures proposed for SAD based ME methods, they are expected to occupy at least several orders of magnitude less chip area, since only a few bit-planes are utilized in BPM based ME methods. The power consumption and memory requirements of BPM based architectures are also expected to be lower than those of SAD based ME hardware architectures. The hardware architecture proposed for the BPM based ME method developed in this work is shown in Fig. 4. Spiral search, illustrated in Fig. 5, is utilized as the search method so that the architecture can later be extended to perform early termination or adaptive search range techniques. The main components of the architecture are the current block register array, the search window register array, the 2D processing element (PE) array, the parallel counter and the comparator. The controller is not shown, since it is not an essential part of the proposed architecture. The most important building block of the proposed architecture is the MUX array placed between the register arrays and the 2D PE array, since the novel selection scheme is implemented by this block.
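A spiral candidate order like the one in Fig. 5 can be generated as below. The direction sequence is an assumption (the figure fixes the real one); what matters is that candidates are visited ring by ring outward from (0, 0), where motion vectors of natural video tend to concentrate.

```python
def spiral_order(s):
    """Candidate vectors (m, n), -s <= m, n <= s, in an outward square spiral
    starting at (0, 0). The direction sequence (right, down, left, up in
    row/column terms) is an assumption."""
    directions = [(0, 1), (1, 0), (0, -1), (-1, 0)]
    total = (2 * s + 1) ** 2
    m = n = 0
    order = [(0, 0)]
    leg, d = 1, 0
    while len(order) < total:
        for _ in range(2):  # leg lengths run 1, 1, 2, 2, 3, 3, ...
            dm, dn = directions[d % 4]
            for _ in range(leg):
                m, n = m + dm, n + dn
                if abs(m) <= s and abs(n) <= s:  # skip points outside the range
                    order.append((m, n))
            d += 1
        leg += 1
    return order


order = spiral_order(16)
print(len(order))  # 1089
print(order[:9])
# [(0, 0), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1)]
```

For a ±16 range this yields (2·16 + 1)² = 1089 candidates, which is consistent with the 1089 clock cycles reported for the hardware if one candidate is processed per cycle (our interpretation).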
The current block and search window register arrays are composed of flip-flops with three- and four-direction shifting capabilities, similar to the architecture presented by Çelebi et al. [43]. Since 3 bit-planes are needed for the selection process, 3 register arrays are utilized for both the current block and the search window.

According to the proposed selection scheme, a 3-to-1 multiplexer is needed for each pixel to construct the bit-plane utilized in the matching process. This functionality is implemented by the 3×1 MUX array shown in Fig. 4. The 16×16 center block of the search window register array is sent to the 2D PE array after the appropriate bit selection is performed for each pixel by the 16×16 MUX array. In the 2D PE array, the reference block and the current block are compared using the Boolean exclusive-OR (XOR) operation. A parallel counter is utilized to calculate the number of non-matching pixels (NNMP) for each candidate location. The parallel counter is composed of seven stages of sub parallel counters of size 3|2, 7|3, 15|4, 31|5, 63|6, 127|7 and 255|8, respectively.
Each macroblock contains 256 pixels, but the parallel counter has 255 inputs. Experiments show that omitting one pixel from the NNMP computation does not affect the ME performance; therefore, to reduce complexity, the addition stage for one pixel is omitted [43]. The last stage is the comparator, where the comparison is performed and the motion vector of the candidate block with the minimum NNMP is generated.

Fig. 4. Proposed hardware architecture.

(Fig. 5 diagram: spiral candidate order over the search window; the shift register rotates upwards, left, downwards and right in turn, and data flows in the direction opposite to the registers' routing direction.)
Fig. 5. Spiral search diagram.

V. EXPERIMENTAL RESULTS
In general, an open-loop evaluation approach is utilized to assess the estimation performance of low bit-depth based ME methods: the current image frame is first estimated from the previous one, and then the Peak Signal-to-Noise Ratio (PSNR) between the original and estimated frames is computed. It would be possible to integrate these methods into a full encoder to see their effect on the overall coding performance. However, in this case it may not be possible to evaluate the performance of the ME method alone, since the other components of the encoder also affect the coding performance. We plan to investigate an encoder implementation of the proposed method as future work. In order to focus on the performance of the ME part, we have decided to utilize the open-loop scheme, similar to most of the low bit-depth based ME literature.
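The open-loop evaluation can be sketched in a few lines: the estimated frame is assembled from motion-compensated reference blocks and compared with the original through PSNR = 10·log10(255²/MSE). The dictionary-of-vectors interface and the assumption that every vector keeps its block inside the frame are illustrative choices, not details from the paper.

```python
import math


def motion_compensate(ref, mvs, N):
    """Estimated frame: the N x N block at (r0, c0) is copied from the reference
    frame displaced by its vector mvs[(r0, c0)] = (m, n)."""
    H, W = len(ref), len(ref[0])
    est = [[0] * W for _ in range(H)]
    for (r0, c0), (m, n) in mvs.items():
        for i in range(N):
            for j in range(N):
                est[r0 + i][c0 + j] = ref[r0 + m + i][c0 + n + j]
    return est


def psnr(orig, est, peak=255):
    """PSNR in dB between two equally sized frames; inf if they are identical."""
    count = len(orig) * len(orig[0])
    mse = sum(
        (a - b) ** 2 for row_o, row_e in zip(orig, est) for a, b in zip(row_o, row_e)
    ) / count
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


ref = [[10, 20], [30, 40]]
est = motion_compensate(ref, {(0, 0): (0, 0)}, 2)
print(psnr(ref, est))                                        # inf
print(round(psnr([[0, 0], [0, 0]], [[0, 0], [0, 255]]), 2))  # 6.02
```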
Table I shows PSNR results in dB for six different sequences with different motion characteristics. The block size and the search range are both set to 16 for the results given in this table. For a complete comparison among methods in the same category, ME results of the 1BT [15], 2BT [17], MF-1BT [16], C-1BT [18], GCBPM [22] and T-GCBPM [23] based methods are also given. Additionally, ME results obtained with a single Gray-coded bit-plane are provided to show the advantage of the proposed selective Gray-coded bit-plane based method.
As seen from Table I, when a single Gray-coded bit-plane is utilized, the ME performance depends significantly on the selected bit-plane and the image frame characteristics. For example, in the single Gray-coded bit-plane case, the best ME performance is obtained with the 7th bit-plane for the Football sequence, whereas the 5th bit-plane provides the best results for the Coastguard sequence. Thus, it is not reasonable to utilize a single fixed bit-plane to represent different types of image frames efficiently.
In order to assess the performance of the proposed selective Gray-coding based method, we investigate, in addition to the selection scheme presented by Kuo et al. [37], a further selective Gray-coding based configuration. As described in the previous section, the method presented by Kuo et al. [37] utilizes pixels from four different bit-planes (g7, g6, g5, g4) in a regular fashion to construct a single bit-plane, as shown in Fig. 2. In the second configuration (regular selection test pattern), pixels from three different bit-planes (g7, g6, g5) are utilized: the first column contains only pixels from the 7th bit-plane, while the 2nd and 3rd columns contain pixels from the 6th and 5th bit-planes, respectively.
In the proposed configuration, the g7, g6 and g5 bit-planes are utilized in a checkerboard style, which yields better ME accuracy than the regular selection test pattern, mainly because of the distributed utilization of the different bit-planes.
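One way to quantify "distributed utilization" is to count adjacent pixel pairs that read the same bit-plane. The sketch compares the regular column pattern described above (assuming its 3-column period repeats across the block) with a diagonal (i + j) mod 3 layout used here only as a hypothetical stand-in for the checkerboard style of Fig. 3; it measures placement, not ME accuracy.

```python
PLANES = (7, 6, 5)


def regular_pattern(N=16):
    """Regular selection test pattern: column j takes g7, g6, g5 cyclically
    (repeating the 3-column period across the block is an assumption)."""
    return [[PLANES[j % 3] for j in range(N)] for _ in range(N)]


def diagonal_pattern(N=16):
    """HYPOTHETICAL checkerboard-style pattern: (i + j) mod 3 picks the plane."""
    return [[PLANES[(i + j) % 3] for j in range(N)] for i in range(N)]


def same_plane_neighbors(pattern):
    """Count horizontally and vertically adjacent pairs that read the same plane."""
    N = len(pattern)
    horiz = sum(pattern[i][j] == pattern[i][j + 1] for i in range(N) for j in range(N - 1))
    vert = sum(pattern[i][j] == pattern[i + 1][j] for i in range(N - 1) for j in range(N))
    return horiz, vert


print(same_plane_neighbors(regular_pattern()))   # (0, 240)
print(same_plane_neighbors(diagonal_pattern()))  # (0, 0)
```

In the column pattern every vertical neighbor pair samples the same plane, whereas the diagonal layout mixes planes in both directions.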


TABLE I. PSNR PERFORMANCE (IN dB) OF DIFFERENT LOW-COMPLEXITY ME METHODS IN OPEN-LOOP SCHEME

Video sequences (frame size, sequence length): Football (352×240, 125 frames), Foreman (352×288, …00 frames), Tennis (352×240, 150 frames), Flowergarden (352×240, 115 frames), Mobile (352×240, 300 frames), Coastguard (352×288, 300 frames)

Method | Football | Foreman | Tennis | Flowergarden | Mobile | Coastguard | Average of six video sequences
SAD (8-bit depth) | 22.88 | 32.09 | 29.45 | 23.79 | 23.94 | 30.48 | 27.11
1BT [15] | 21.83 | 30.32 | 28.11 | 23.31 | 23.61 | 29.83 | 26.17
MF-1BT [16] | 21.81 | 30.38 | 28.18 | 23.26 | 23.63 | 29.88 | 26.19
2BT [17] | 22.06 | 30.70 | 28.46 | 23.43 | 23.66 | 29.94 | 26.38
C-1BT [18] | 22.10 | 30.86 | 28.71 | 23.38 | 23.69 | 29.98 | 26.45
GCBPM [22] | 21.87 | 30.96 | 28.24 | 23.26 | 23.51 | 29.78 | 26.27
T-GCBPM [23] | 22.59 | 31.32 | 28.78 | 23.67 | 23.81 | 30.16 | 26.72
Gray Coding 7th Bit Plane | 21.66 | 28.46 | 27.49 | 23.26 | 23.28 | 26.56 | 25.11
Gray Coding 6th Bit Plane | 20.79 | 27.92 | 27.34 | 22.56 | 22.42 | 27.84 | 24.81
Gray Coding 5th Bit Plane | 20.31 | 29.27 | 27.44 | 22.53 | 21.25 | 29.05 | 24.98
Gray Coding 4th Bit Plane | 19.54 | 28.70 | 26.52 | 20.35 | 20.48 | 28.23 | 23.97
Interlaced Gray-coding [37] | 21.94 | 30.92 | 28.47 | 23.17 | 23.18 | 29.79 | 26.25
Regular Selection Test Pattern | 22.09 | 30.62 | 28.46 | 23.29 | 23.33 | 29.38 | 26.22
Selective Gray-coding (Proposed) | 22.24 | 31.03 | 28.69 | 23.38 | 23.47 | 29.85 | 26.44

TABLE II. NUMBER OF OPERATIONS REQUIRED FOR THE LOW-COMPLEXITY ME APPROACHES

ME
Approach
1BT [15]
MF-1BT[16]
2BT [17]
C-1BT [18]
T-GCBPM [23]
I-GCBPM [37]
Proposed

Transform stage, operations per pixel (pp): Addition | Multiplication | Shift | Subtraction | Comparison | Boolean Op.
Matching stage, operations per pixel (pp): Boolean Op. | Shift | Addition
25
1
1
1
16
1
1
1
2.8125
1.0625
0.03125
3
1
3
16
1
1
2
3
2
3
3
3
1
4
2.5
1
1
5.6
2
1
-

It is also important to note that the proposed selection approach provides 0.2 dB higher PSNR on average than the method presented by Kuo et al. [37], which also suggests that the contribution of the 4th Gray-coded bit-plane may not be positive, since it might contain noisy binarization results.
When the proposed selective Gray-coded bit-plane based method is compared against other single bit-plane based methods such as 1BT and MF-1BT, it outperforms them by around 0.3 dB on average. When compared against the methods that use two bit-planes, such as 2BT and C-1BT, the proposed method provides similar or better ME performance in most of the sequences.
The computational complexity of the different methods is shown in Table II. As seen from this table, the proposed method has significantly lower complexity than the 1BT, MF-1BT, C-1BT and 2BT based approaches while providing similar or better performance. Since both the binarization and matching stages of the proposed method are computationally lightweight, it is suitable for efficient hardware and software implementations in mobile devices with limited computational and battery power.

The proposed hardware architecture is implemented on a 28 nm FPGA device. According to the synthesis results, the proposed architecture occupies 8747 LUTs and 7864 DFFs, which correspond to 6.5% and 2.92% of the total available resources of the target FPGA device, respectively.
The power and timing performance of the proposed hardware architecture is also evaluated to assess its efficiency compared to previously proposed architectures. Power analysis is performed at two different clock frequencies. Table III shows the power analysis results at clock periods of 20 ns and 10 ns, respectively. Three different motion characteristics are used to perform a fair comparison with the power consumption of the previously proposed architectures.
No dedicated memory is needed for either the current block or the search window, thanks to the register arrays. Since dedicated memories occupy a smaller physical area, it might seem better to use them as memory. However, these blocks do not allow the four-way movement that is essential for implementing the spiral search scheme. Thus, instead of dedicated block RAM resources, DFFs are used in an array-like fashion.
TABLE III. POWER ANALYSIS RESULTS
Power consumption (mW) at clock periods of 20 ns / 10 ns

Motion vector (hv_x, hv_y) | Signals (20 ns/10 ns) | Logic (20 ns/10 ns)
(-3, -3) | 10/14 | 8/12
(3, 2) | 12/17 | 10/14
(4, 0) | 8/10 | 7/9
(8, 7) | 11/16 | 10/14
(9, -7) | 8/12 | 7/10
(9, 9) | 6/10 | 5/8
(-15, -11) | 12/19 | 11/16
(14, -14) | 9/12 | 8/11
(-12, 12) | 6/10 | 5/8
Average | 9.11/13.3 | 7.89/11.3

Fig. 6. The data reuse scheme that the proposed hardware architecture can implement.

(Fig. 7 diagram blocks: Processor System, DMA IP and Motion Estimation IP, connected by video and data streams, control signals and the motion vector output.)

In the proposed hardware architecture, the Level-D data reuse scheme can be implemented thanks to the 4-way shift-register-array based memory implementation. The data reuse concept is illustrated in Fig. 6.
A detailed investigation of the impact of data reuse capability on the total memory bandwidth, and thus on the power consumption of ME hardware architectures, is performed in [45], where 4 levels of data reuse schemes are defined. According to [45], our architecture is capable of implementing the Level-D data reuse scheme, by which the off-chip memory bandwidth can be reduced by more than 20 times. Thus, low power consumption can easily be achieved for a whole video coding system in which the proposed hardware architecture is utilized, since the power consumption of the core logic is much lower than that of an off-chip dynamic memory.
According to Table IV, the proposed hardware architecture occupies the largest area in terms of the number of LUTs, but no on-chip memory is utilized. Since the proposed hardware architecture has the Level-D data reuse capability as stated in [44], it results in the lowest off-chip memory bandwidth among the works presented in Table IV. It is important to note that none of the architectures given in this table, except the proposed one, includes a binarization data-path. Thus, they should not be considered turnkey solutions for the ME methods they implement.
In consumer electronics devices, video encoders are usually implemented as accelerators connected to the processing system via a bus interconnect, in order to offload the computational load of the encoding process from the processor. Following this approach, we have wrapped the proposed hardware architecture with a common bus interconnect to illustrate that it can be easily integrated into a state-of-the-art consumer electronics device. This concept is illustrated in Fig. 7, where a DMA block is utilized to provide dense data transfer between the sensor and the ME accelerator developed in this work. Once the data is received by the accelerator through a buffer-like memory interface, the accelerator performs the matching process and then informs the processor about the result through an interrupt-like interface. The hardware architecture takes 1089 clock cycles to compute a motion vector, excluding the memory transfer time, which is a technology-specific parameter.

Fig. 7. Integrated diagram of the proposed hardware architecture.


TABLE IV. ME PERFORMANCE COMPARISON

Parameter | Proposed | [39] Recompiled | [41] | [44] Recompiled
Bit depth | 3 | 1 | 1 | 2
On chip memory | 0 | 24064 | 4096 | 0
Area | 8125 LUTs/7353 DFFs | 1121 LUTs/NA | 3914 LUTs/2517 DFFs | 5413 LUTs/NA
Power | 8.5 mW@50 MHz | 35.3 mW@50 MHz | NA | 30.7 mW@50 MHz
Maximum frequency | 243 MHz | 218 MHz | 192 MHz | 275 MHz
Technology | FPGA 28 nm | FPGA 45 nm | FPGA 65 nm | FPGA 45 nm
Search range | [-16, 16] | [-16, 16] | [-16, 16] | [-1, 1] to [-16, 16]
Search method | Spiral search | Full search | Full search | Spiral search


VI. CONCLUSIONS
In this paper, a selective Gray-coded bit-plane based
binarization approach for low-complexity motion estimation
and its hardware architecture are presented. The proposed BPM
based ME method outperforms the single bit-plane based methods
in the literature while providing similar or better
performance than methods that use two bit-planes. It is
important to note that the selective Gray-coded bit-plane based
method has a lower binarization cost than all compared methods
except the conventional Gray-coded BPM methods.
The proposed binarization approach is efficiently implemented
in hardware, and the proposed architecture is shown to be
suitable for seamless integration into state-of-the-art consumer
electronics devices through a common bus interconnect.
Experimental results show that the proposed architecture
supports data reuse, which substantially reduces both off-chip
data access time and power consumption.
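For readers unfamiliar with Gray-coded bit-plane matching, the following is a minimal sketch of the conventional form of the idea (in the spirit of [21], [22]); it is not the selective scheme proposed in this paper, whose plane-selection rule is not reproduced here. Each 8-bit pixel is Gray-coded with one shift and one XOR, a single bit-plane is extracted, and candidate blocks are compared by counting non-matching points (NNMP) with XOR. The choice of bit-plane (K = 4), the 8×8 block, the full search over [-4, 4] and the synthetic frames are illustrative assumptions only.

```python
from __future__ import annotations

import random


def gray_code(pixel: int) -> int:
    """Gray code of an 8-bit value: g = b XOR (b >> 1), one shift and one XOR."""
    return pixel ^ (pixel >> 1)


def gray_bit_plane(frame: list[list[int]], k: int) -> list[list[int]]:
    """Bit-plane k (0 = LSB) of the Gray-coded frame, as 0/1 values."""
    return [[(gray_code(p) >> k) & 1 for p in row] for row in frame]


def nnmp(cur: list[list[int]], ref: list[list[int]], bx: int, by: int,
         dx: int, dy: int, n: int) -> int:
    """Number of non-matching points between the n x n block at (bx, by) in
    cur and the block displaced by (dx, dy) in ref (XOR, then count ones)."""
    return sum(cur[by + i][bx + j] ^ ref[by + dy + i][bx + dx + j]
               for i in range(n) for j in range(n))


def full_search_bpm(cur: list[list[int]], ref: list[list[int]], bx: int,
                    by: int, n: int, p: int) -> tuple[int, int, int] | None:
    """Exhaustive search over [-p, p] x [-p, p]; returns (dx, dy, nnmp).
    Candidates outside the reference frame are skipped; ties keep the first
    candidate in raster-scan order."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + n > w or y + n > h:
                continue
            cost = nnmp(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


# Synthetic frames: cur[y][x] == ref[y + 1][x + 2], so the true displacement is (2, 1).
random.seed(0)
H, W, N, P, K = 32, 32, 8, 4, 4
ref_frame = [[random.randrange(256) for _ in range(W)] for _ in range(H)]
cur_frame = [[ref_frame[min(y + 1, H - 1)][min(x + 2, W - 1)] for x in range(W)]
             for y in range(H)]

mv = full_search_bpm(gray_bit_plane(cur_frame, K), gray_bit_plane(ref_frame, K),
                     bx=8, by=8, n=N, p=P)
print("motion vector and NNMP:", mv)
```

The XOR-based matching is what makes bit-plane methods attractive in hardware: each comparison is a 1-bit XOR followed by a population count instead of an 8-bit absolute difference and accumulation.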
REFERENCES
[1] Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG, "Draft ITU-T recommendation and final draft international standard of joint video specification (ITU-T Rec. H.264/ISO/IEC 14496-10 AVC)," JVT-G050, Mar. 2003.
[2] ISO/IEC 23008-2:2013, "High efficiency coding and media delivery in heterogeneous environments, Part 2: High efficiency video coding," International Organization for Standardization, Nov. 25, 2013.
[3] T.C. Chen, Y.H. Chen, S.F. Tsai, S.Y. Chien, L.G. Chen, "Fast algorithm and architecture design of low-power integer motion estimation," IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 5, pp. 568-577, May 2007.
[4] T. Koga, K. Iinuma, A. Hirano, Y. Iijima, T. Ishiguro, "Motion compensated interframe coding for video conferencing," in Proc. Nat. Telecommun. Conf., pp. C9.6.1-C9.6.5, 1981.
[5] S. Zhu, K.K. Ma, "A new diamond search algorithm for fast block-matching motion estimation," IEEE Trans. Image Process., vol. 9, no. 2, pp. 287-290, Feb. 2000.
[6] C. Zhu, X. Lin, L.P. Chau, "Hexagon-based search pattern for fast block motion estimation," IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 5, pp. 349-355, May 2002.
[7] J. Lee, M. Choi, Y. Cho, J. Kim, W.K. Cho, "Fast H.264/AVC motion estimation algorithm using adaptive search range," in Proc. 12th International Symposium on Integrated Circuits (ISIC '09), Singapore, pp. 336-339, Dec. 2009.
[8] M. Bierling, "Displacement estimation by hierarchical block matching," in Proc. SPIE Conference on Visual Communications and Image Processing, San Jose, CA, USA, pp. 942-951, Oct. 1998.
[9] K. Lengwehasatit, A. Ortega, "Probabilistic partial-distance fast matching algorithms for motion estimation," IEEE Trans. Circuits Syst. Video Technol., vol. 11, no. 2, pp. 139-152, Feb. 2001.
[10] C.N. Wang, S.W. Yang, C.M. Liu, T. Chiang, "A hierarchical n-queen decimation lattice and hardware architecture for motion estimation," IEEE Trans. Circuits Syst. Video Technol., vol. 14, no. 4, pp. 429-440, Apr. 2004.
[11] A. Saha, J. Mukherjee, S. Sural, "New pixel-decimation patterns for block matching in motion estimation," Signal Process.: Image Commun., vol. 23, no. 10, pp. 725-738, Oct. 2008.
[12] W. Li, E. Salari, "Successive elimination algorithm for motion estimation," IEEE Trans. Image Process., vol. 4, no. 1, pp. 105-107, Jan. 1995.
[13] L. Yang, K. Yu, J. Li, S. Li, "An effective variable block-size early termination algorithm for H.264 video coding," IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 6, pp. 784-788, June 2005.
[14] J. Feng, K.T. Lo, H. Mehrpour, A.E. Karbowiak, "Adaptive block matching motion estimation algorithm using bit plane matching," in Proc. IEEE Int. Conf. on Image Processing (ICIP), Washington DC, USA, pp. 496-499, Oct. 1995.


[15] B. Natarajan, V. Bhaskaran, and K. Konstantinides, "Low-complexity block-based motion estimation via one-bit transforms," IEEE Trans. Circuits Syst. Video Technol., vol. 7, no. 4, pp. 702-706, Aug. 1997.
[16] S. Ertürk, "Multiplication-free one-bit transform for low-complexity block-based motion estimation," IEEE Signal Process. Lett., vol. 14, no. 2, pp. 109-112, Feb. 2007.
[17] A. Ertürk and S. Ertürk, "Two-bit transform for binary block motion estimation," IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 7, pp. 938-946, July 2005.
[18] O. Urhan and S. Ertürk, "Constrained one-bit transform for low-complexity block motion estimation," IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 4, pp. 478-482, Apr. 2007.
[19] C. Choi, J. Jeong, "Enhanced two-bit transform based motion estimation via extension of matching criterion," IEEE Trans. Consum. Electron., vol. 56, no. 3, pp. 1883-1889, Aug. 2010.
[20] M.K. Güllü, "Weighted constrained one-bit transform based fast block motion estimation," IEEE Trans. Consum. Electron., vol. 57, no. 2, pp. 751-755, May 2011.
[21] S.J. Ko, S.H. Lee and K.H. Lee, "Fast digital image stabilizer based on Gray-coded bit-plane matching," IEEE Trans. Consum. Electron., vol. 45, no. 3, pp. 598-603, Aug. 1999.
[22] O. Urhan, S. Ertürk, "Gray coded bit-plane matching for block based motion estimation," in Proc. 10th Signal Processing and Communication Applications Conf. (SIU), Pamukkale, Denizli, Turkey, pp. 518-523, June 2002.
[23] A. Çelebi, O. Akbulut, O. Urhan, S. Ertürk, "Truncated gray-coded bit-plane matching based motion estimation and its hardware architecture," IEEE Trans. Consum. Electron., vol. 55, no. 3, pp. 1530-1536, Aug. 2009.
[24] O. Urhan, "Constrained one-bit transform based motion estimation using predictive hexagonal pattern," J. Electron. Imaging, vol. 16, no. 3, Article ID: 033019, July-Sep. 2007.
[25] E.S. Lee, O. Urhan, T.G. Chang, "Multiplication-free one-bit transform and diamond search combination for fast binary block motion estimation," in Proc. IEEE 15th Signal Processing and Communications Applications Conf., Eskisehir, Turkey, pp. 430-433, June 2007.
[26] H. Lee, J. Jeong, "Early termination scheme for binary block motion estimation," IEEE Trans. Consum. Electron., vol. 53, no. 4, pp. 1682-1686, Nov. 2007.
[27] H. Lee, S. Jin, J. Jeong, "Early termination algorithm for 2BT block motion estimation," Electronics Lett., vol. 45, no. 8, pp. 403-405, Apr. 2009.
[28] O. Urhan, S. Ertürk, "Constrained one-bit transform based motion estimation with early skip mode," in Proc. 19th IEEE Signal Processing and Communication Applications Conf., Antalya, Turkey, pp. 774-776, Apr. 2011.
[29] O. Urhan, "Constrained one-bit transform based fast block motion estimation using adaptive search range," IEEE Trans. Consum. Electron., vol. 56, no. 3, pp. 1868-1871, Aug. 2010.
[30] I. Kim, J. Kim, J. Jeong, G. Jeon, "Low-complexity block-based motion estimation algorithm using adaptive search range adjustment," Opt. Eng., vol. 51, no. 6, Article ID: 067010, June 2012.
[31] O. Urhan, "Truncated gray-coding based fast block motion estimation," J. Electron. Imaging, vol. 22, no. 2, Article ID: 023018, June 2013.
[32] I. Kim, J. Jeong, "Binary block motion estimation using an adaptive search range adjustment technique," J. Automation and Control Eng., vol. 4, no. 4, pp. 376-380, Dec. 2014.
[33] P. H. W. Wong and O. C. Au, "Modified one-bit transform for motion estimation," IEEE Trans. Circuits Syst. Video Technol., vol. 9, no. 7, pp. 1020-1024, Oct. 1999.
[34] B. Demir and S. Ertürk, "Block motion estimation using modified two-bit transform," Lect. Notes in Computer Science, vol. 4263, pp. 522-531, 2006.
[35] B. Demir and S. Ertürk, "Block motion estimation using adaptive modified two-bit transform," IET Image Process., vol. 1, no. 2, pp. 215-222, June 2007.
[36] H.-Y. Oh, D.-H. Kim, O. Urhan, T.-G. Chang, "Modified constrained one-bit transform based fast block motion estimation," IEEE Trans. Consum. Electron., vol. 53, no. 3, pp. 1093-1097, Aug. 2007.


[37] T.Y. Kuo, C.H. Wang, "Fast local motion estimation and robust global motion decision for digital image stabilization," in Proc. Int. Conf. on Intelligent Information Hiding and Multimedia Signal Processing, Harbin, China, pp. 442-445, Aug. 2008.
[38] A. Çelebi, O. Akbulut, O. Urhan, I. Hamzaoğlu, S. Ertürk, "An all binary sub-pixel motion estimation approach and its hardware architecture," IEEE Trans. Consum. Electron., vol. 54, no. 4, Nov. 2008.
[39] A. Çelebi, O. Urhan, I. Hamzaoğlu, S. Ertürk, "Efficient hardware implementations of low bit depth motion estimation algorithms," IEEE Signal Process. Lett., vol. 16, no. 6, pp. 513-516, June 2009.
[40] A. Akın, Y. Doğan, I. Hamzaoğlu, "High performance hardware architectures for one bit transform based motion estimation," IEEE Trans. Consum. Electron., vol. 55, no. 2, pp. 941-949, May 2009.
[41] A. Akın, G. Sayılar, I. Hamzaoğlu, "High performance hardware architectures for one bit transform based single and multiple reference frame motion estimation," IEEE Trans. Consum. Electron., vol. 56, no. 2, pp. 1144-1152, May 2010.
[42] S. K. Chatterjee, "Implementation of weighted constrained one-bit transformation based fast motion estimation," IEEE Trans. Consum. Electron., vol. 58, pp. 646-653, May 2012.
[43] A. Çelebi, H. J. Lee, S. Ertürk, "Bit plane matching based variable block size motion estimation method and its hardware architecture," IEEE Trans. Consum. Electron., vol. 56, pp. 1625-1633, Aug. 2010.
[44] A. Çelebi, O. Urhan, "High performance hardware architecture for constrained one-bit transform based motion estimation," in Proc. 19th European Signal Processing Conf. (EUSIPCO), 2011.
[45] J. C. Tuan, T. S. Chang, and C. W. Jen, "On the data reuse and memory bandwidth analysis for full-search block-matching VLSI architecture," IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 1, pp. 61-72, Jan. 2002.
BIOGRAPHIES
Seda Yavuz has been a bachelor's degree student in the
Department of Electronics and Telecommunications Engineering,
University of Kocaeli, Turkey, since 2011. Her current research
interests include motion estimation algorithms and
their implementation on FPGAs.

Anıl Çelebi (S'00, AM'09) was born in Ordu, Turkey.
He received the B.Sc., M.Sc. and Ph.D. degrees in
electronics and communication engineering from
Kocaeli University, Kocaeli, Turkey, in 2002, 2005,
and 2008, respectively. Since 2002, he has been with
the Department of Electronics and Telecommunications
Engineering, University of Kocaeli, Turkey, where he
is currently an Assistant Professor. He
worked as a BK21 postdoctoral research fellow at the
School of Electrical Engineering and Computer Science,
Seoul National University, Korea, from April to July 2009. His research
interests include very large scale integration (VLSI) design and
implementation for analog/mixed-signal systems, image processing, and video
coding systems.
Muhammad Aslam was born in Bahawalpur,
Pakistan. He received the B.Sc. degree in electronics
engineering from International Islamic University,
Islamabad, Pakistan, in 2014. Since 2015, he has been
a master's degree student in the Department of Electronics
and Telecommunications Engineering, University of
Kocaeli, Turkey. His current research interests include
video coding and motion estimation algorithms and their
implementation on FPGAs.
Oğuzhan Urhan (S'02-M'06) received his B.Sc., M.Sc.,
and Ph.D. degrees in Electronics and Telecommunication
engineering from the University of Kocaeli, Kocaeli,
Turkey, in 2001, 2003, and 2006, respectively. Since
2001, he has been with the Department of Electronics and
Telecommunications Engineering, University of Kocaeli,
Turkey, where he is currently a full professor. He was a
visiting professor at Chung-Ang University, South Korea, from 2006 to 2007.
He is the director of the Kocaeli University Laboratory of Embedded and Vision
Systems (KULE). His research interests include digital signal and image/video
processing and embedded systems.
