
Neurocomputing 216 (2016) 216–229

Contents lists available at ScienceDirect

Neurocomputing
journal homepage: www.elsevier.com/locate/neucom

A novel sparse-representation-based multi-focus image fusion approach
Hongpeng Yin a,b,*, Yanxia Li b, Yi Chai b,c, Zhaodong Liu b, Zhiqin Zhu b
a Key Laboratory of Dependable Service Computing in Cyber Physical Society, Ministry of Education, Chongqing University, Chongqing 400030, China
b College of Automation, Chongqing University, Chongqing 400030, China
c State Key Laboratory of Power Transmission Equipment and System Security and New Technology, College of Automation, Chongqing University, Chongqing 400030, China

Article info

Article history:
Received 5 December 2015
Received in revised form 5 July 2016
Accepted 14 July 2016
Communicated by Huaping Liu
Available online 27 July 2016

Keywords:
Multi-focus image fusion
Sparse representation
Dictionary learning
Batch-OMP

Abstract

In this paper, a novel multi-focus image fusion approach is presented. First, a joint dictionary is constructed by combining several sub-dictionaries that are adaptively learned from the source images using the K-singular value decomposition (K-SVD) algorithm. The proposed dictionary construction method needs no prior knowledge and no external, pre-collected training image data. Second, the sparse coefficients are estimated by the batch orthogonal matching pursuit (batch-OMP) algorithm, which effectively accelerates the sparse coding process. Finally, a maximum weighted multi-norm fusion rule is adopted to accurately reconstruct the fused image from the sparse coefficients and the joint dictionary; this rule enables the fused image to contain the most important information of the source images. To evaluate the performance of the proposed method comprehensively, comparison experiments are conducted on several multi-focus images and manually blurred images. Experimental results demonstrate that the proposed method outperforms many state-of-the-art techniques in terms of both visual and quantitative evaluations.

© 2016 Elsevier B.V. All rights reserved.

* Corresponding author.
E-mail address: yinhongpeng@gmail.com (H. Yin).

http://dx.doi.org/10.1016/j.neucom.2016.07.039
0925-2312/© 2016 Elsevier B.V. All rights reserved.

1. Introduction

Optical lenses of conventional cameras often suffer from a limited depth of field, which makes it impossible to acquire an image in which all relevant objects are in focus without expensive specialized optics and sensors. An image captured with a limited depth of field usually contains both clear and blurry parts: only the objects within the depth of field appear focused, while other objects are blurred. However, for human visual perception and computer processing, all-in-focus images are more desirable, since more information can be acquired from sharply focused images than from blurred ones. Multi-focus image fusion, which aims at integrating multiple images of the same scene captured at different focal settings into a single all-in-focus image, is an effective solution to this problem [1,2]. Compared with each individual image, the fused all-in-focus image provides more informative and comprehensive information about the scene. Nowadays, multi-focus image fusion has various application fields, such as military surveillance [3], medical imaging [4], remote sensing [5], and machine vision [6].

The existing multi-focus image fusion methods can be divided into three main categories: pixel-level fusion, feature-level fusion and decision-level fusion [7,8]. Currently, most fusion algorithms are pixel-level [9–11]. Pixel-level fusion can be performed in either the spatial domain or a transform domain [12]. In the spatial domain, pixels or regions are directly selected and combined in a linear or non-linear way to form a fused image. Typical spatial-domain-based fusion methods include the weighted average method, the principal component analysis method [13], and the independent component analysis method [14]. These spatial-domain-based fusion methods usually suffer from blocking artifacts and other undesired side effects. In the transform domain, a variety of transforms, such as the Laplacian pyramid [15], wavelet transform [16], curvelet transform [17], and nonsubsampled contourlet transform [18], are utilized to fuse images. All these transform-domain-based fusion methods perform three major steps. First, the source images are decomposed by a multi-scale transform. Second, the decomposed coefficients of the source images are integrated with a certain fusion rule. Finally, the fused image is obtained by the inverse multi-scale transform. Although transform coefficients can reasonably represent important features of an image, each transform has its own merits and limitations depending on the context of the input images; thus, selecting an optimal transform basis relies heavily on prior knowledge such as scene context and application. Moreover, there is no single transform that can completely represent all features, since the
content of an image is often complex and changeable.

Because the transform-domain-based fusion approach takes image details and directional coefficients into account, it has been successfully and widely used in the image fusion field. However, how to select an optimal transform basis remains a challenging problem. Clearly, extracting the underlying information of the original images effectively and completely would make the fused image more accurate. To extract the underlying information of the source images effectively, sparse-representation-based techniques have been widely studied in the image fusion field [19–22]. The sparse representation algorithm adopts an over-complete dictionary containing prototype signal atoms and describes signals as sparse linear combinations of these atoms [23]. Sparse-representation-based techniques are increasingly attracting attention in the computer vision area due to their state-of-the-art performance in many applications, such as image classification [24], face recognition [25], action recognition [26], and object recognition [27].

In the sparse model, the over-complete dictionary plays an essential role. There are two main approaches to obtaining a dictionary. The first is to pre-construct the dictionary from analytical models, such as the DCT, wavelets and curvelets. The second is to learn the dictionary from a large number of example image patches using a training algorithm such as the method of optimal directions (MOD) or K-SVD. Yang [28] was the first to apply sparse representation theory to the image fusion field; in his method, the image is decomposed with a redundant DCT dictionary. In [29], sparse representation is conducted with two kinds of typical over-complete dictionaries: over-complete DCT bases, and a hybrid dictionary consisting of DCT bases, wavelet bases, Gabor bases, and ridgelet bases. Based on sparse representations, a novel framework for simultaneous image fusion and super-resolution is adopted in [30], where six thousand patches taken from six images are used to learn the dictionaries. Liu [31] proposes a multi-focus image fusion method based on sparse representation, in which a database of forty high-quality natural images is utilized to learn the dictionary. Aharon presents an image fusion method based on the K-SVD algorithm, in which the redundant dictionary is trained on a set of images. Yin [32] proposes a novel multimodal image fusion scheme based on the joint sparsity model; similarly, the dictionary is trained on the USC-SIPI image database (http://sipi.usc.edu/database/) using the K-SVD algorithm.

The pre-constructed analytical dictionary has the advantage of fast implementation. However, this category of dictionary is restricted to signals of a certain type, and prior knowledge is needed when choosing the analytical bases. Moreover, it cannot be used for an arbitrary family of signals of interest. Compared with pre-constructed dictionaries, a learned dictionary contains much richer feature information, leading to better representation ability in image restoration and reconstruction. However, training a dictionary usually requires external, pre-collected training image data, and in practice, collecting a proper image set is not always feasible. Furthermore, because image content varies significantly across images, it is not surprising that the performance of typical learning-based methods depends significantly on the learned dictionary. Thus, how to construct an over-complete dictionary adaptive to the input image data is a crucial problem in sparse-representation-based image fusion schemes.

Exploiting the content diversity of images and the advantages of sparse representation theory, this paper proposes a novel sparse-representation-based multi-focus image fusion approach to address the aforementioned problems. First, a joint dictionary is constructed from several sub-dictionaries that are adaptively learned directly from the source images. This dictionary construction method needs no prior knowledge and no external, pre-collected training image data. Second, the sparse coefficients are estimated by the batch orthogonal matching pursuit algorithm, which effectively accelerates the sparse coding process. Finally, the fused image is accurately reconstructed from the sparse coefficients and the combined dictionary using a maximum weighted multi-norm fusion rule, which preserves and combines the most important information of the source images into the extended depth-of-focus fused image.

The main contribution of this work is twofold. (1) An innovative dictionary construction strategy is designed to build a joint dictionary. Unlike previous dictionary construction methods, the proposed method needs no prior knowledge and no external, pre-collected training image data. At the same time, the sub-dictionaries that constitute the joint dictionary are learned directly from the source images, which improves the adaptability of the constructed dictionary to the input image data. Furthermore, the combined dictionary ensures that each source image can be represented with the same subset of dictionary atoms. (2) A weighted multi-norm-based activity measure is utilized to calculate the activity level of each source image patch comprehensively. This improved activity measure preserves detail information, such as edges and lines, more effectively than other methods, since it does not seem feasible to calculate the activity level comprehensively using only a single measurement such as the ℓ1- or ℓ0-norm.

The rest of the paper is organized as follows. Section 2 presents the framework of the proposed image fusion approach. Comparative experimental results verifying the performance of the proposed method are presented in Section 3. Finally, Section 4 concludes this work and discusses future research.

2. The framework of sparse-representation-based multi-focus image fusion approach

The framework of the proposed sparse-representation-based multi-focus image fusion approach is shown in Fig. 1. The proposed algorithm mainly consists of three parts: dictionary construction, image representation, and integration and reconstruction. To make the dictionary adaptive to the input image data, a joint dictionary is constructed by combining several sub-dictionaries that are adaptively learned from source image patches using the K-SVD algorithm, as shown in Fig. 1(a). The joint dictionary ensures that every source image signal can be represented with the same subset of dictionary atoms; once it is constructed, the coefficient vectors of each source image are estimated by the batch-OMP algorithm. For coefficient fusion, a maximum weighted multi-norm-based fusion rule is utilized to obtain the fused coefficients. After all sparse coefficients are fused with this rule, the fused image is reconstructed from the fused coefficients and the combined dictionary. Fig. 1(b) gives an overview of the proposed method for the case of two source images. The following subsections describe these steps in detail.

2.1. Dictionary constructing

In this section, the dictionary construction algorithm is described in detail. The over-complete dictionary determines the signal representation ability of sparse coding. Generally, there are two main categories of offline approaches to obtaining a dictionary. The first directly uses analytical models such as over-complete wavelets, curvelets, and contourlets. The second applies machine learning techniques to obtain a dictionary from a large number of training image patches. The former is relatively simple but does not adapt to the complex and changeable structure of images; the latter has better adaptability.

Dictionary learning is a training process based on a series of sample data. Typical dictionary learning algorithms include PCA [33],
MOD [34], and K-SVD [35]. K-SVD is a standard unsupervised dictionary learning algorithm that has been widely investigated [36–38]. It combines K-means clustering with sparsity constraints. K-SVD training of a sparse dictionary alternates between two steps: (1) sparse reconstruction, which uses the given dictionary to solve for the sparse coefficients of the image under the current dictionary; and (2) dictionary updating, which updates the atoms of the dictionary sequentially. In this paper, the K-SVD algorithm is used to learn sub-dictionaries from the source images because of its simplicity and efficiency for this task.

For the case of two source images, assume that $I_A$ and $I_B$ denote two registered source images of size $M \times N$. Generally, a natural image contains complicated and non-stationary information as a whole, while a small local patch appears simple and has a consistent structure. For this reason, a sliding window technique is adopted to better capture local salient features. As shown in Fig. 1(a), the sliding window technique is first used to divide each source image, from top-left to bottom-right with a step length of one pixel, into patches of size $n \times n$. Then, all the patches are transformed into vectors via lexicographic ordering, and these vectors constitute one matrix $V_A$ (taking source image $I_A$ as an example), in which each column corresponds to one patch of $I_A$. The size of $V_A$ is $J \times L$, where $J = n \times n$ and $L = (M - n + 1) \times (N - n + 1)$.

Fig. 1. The framework of the sparse-representation-based multi-focus approach. (a) Procedure of the proposed dictionary learning method. (b) Overview of the proposed multi-focus image fusion approach.

Assume that $D_A \in \mathbb{R}^{J \times S}$ and $\alpha_A \in \mathbb{R}^{S \times L}$ denote the dictionary and the matrix of sparse representation coefficients of the training samples, respectively. The objective function is

$$\min_{D_A,\,\alpha_A} \left\{ \| V_A - D_A \alpha_A \|_F^2 \right\} \quad \text{s.t.} \quad \forall i,\ \| \alpha_A^i \|_0 \le T \qquad (1)$$

where $\|\cdot\|_F$ denotes the Frobenius norm, defined as $\|M\|_F = \sqrt{\sum_{i,j} M_{ij}^2}$, and $\alpha_A^i$ is the $i$th column of $\alpha_A$. $T$ is the sparsity constraint: each sparse representation may contain no more than $T$ nonzero coefficients. Problem (1) can be solved by alternating between a sparse coding stage and a dictionary updating stage. In the sparse coding stage, the dictionary $D_A$ is kept fixed and the sparse coefficient matrix $\alpha_A$ is efficiently computed by

$$\min_{\alpha_A} \left\{ \| V_A - D_A \alpha_A \|_F^2 \right\} \quad \text{s.t.} \quad \forall i,\ \| \alpha_A^i \|_0 \le T \qquad (2)$$

In the dictionary updating stage, the sparse coefficient matrix $\alpha_A$ is kept fixed and the dictionary is updated sequentially by

$$\min_{D_A} \left\{ \| V_A - D_A \alpha_A \|_F^2 \right\} \qquad (3)$$

After the dictionary is updated, all the samples are encoded again with the new dictionary. If the maximum number of iterations is reached or the sparsity requirement is met, the iteration
ends. Otherwise, the algorithm returns to the sparse coding stage. Once the sub-dictionaries of all input source images are obtained, they are united into a single dictionary $D \in \mathbb{R}^{J \times 2S}$ as follows:

$$D = [D_A,\ D_B] \qquad (4)$$

2.2. Sparse representation

Since sparse representation handles an image globally, it cannot be directly used for image fusion, which depends on the local information of the source images. To solve this problem, a sliding window technique that divides the source images into small patches is adopted. Let $I_A, I_B \in \mathbb{R}^{M \times N}$ represent the source images to be fused. Using the sliding window technique, each image is divided into $n \times n$ patches from upper left to lower right with a step length of one pixel. There are $L = (M - n + 1) \times (N - n + 1)$ patches, denoted as $\{p_A, p_B\}$ in $I_A$ and $I_B$, respectively. To facilitate the analysis, the $i$th patches $\{p_A^i, p_B^i\}$ are lexicographically ordered as vectors $\{v_A^i, v_B^i\}$, which can be expressed as follows:

$$v_A^i = D \alpha_A^i \qquad (5)$$

$$v_B^i = D \alpha_B^i \qquad (6)$$

where $D$ is the combined over-complete dictionary, and $\alpha_A^i$ and $\alpha_B^i$ are the sparse coefficients of the $i$th patches of source images $I_A$ and $I_B$.

When a large number of signals are sparse-coded, it is worthwhile to consider pre-computation to reduce the total amount of work involved in sparse coding. Fig. 2 shows the running time required to sparse-code a variable number of signals using both batch-OMP and OMP with an explicit dictionary. In this sparse coding test, the signal size is 64, the dictionary size is 256 × 384, and the target sparsity is 16. The running time is obtained by averaging 15 runs of the sparse coding procedure. Fig. 2 clearly shows that, as the number of signals increases, sparse coding with batch-OMP takes less time than with standard OMP, indicating that batch-OMP evidently accelerates the sparse coding process. Thus, in this paper, the batch-OMP algorithm is exploited to accelerate sparse coding. The iterations stop when the representation error drops below the specified tolerance.

Fig. 2. Running time of batch-OMP versus standard OMP for an explicit dictionary.

2.3. Integrating and reconstruction

Establishing fusion rules requires solving two key issues. One is how to measure the activity level, which reflects the salience of the sparse representation coefficients of the source images. The other is how to integrate the coefficients into their counterparts in the fused image. For the first issue, the ℓ1-norm of a sparse coefficient vector is considered to reflect how much detail information it carries: the larger the ℓ1-norm, the more significant the corresponding image patch. Meanwhile, the ℓ0-norm of a coefficient vector expresses the concentration of detail information: the larger the ℓ0-norm, the more detailed information the image patch contains. However, it does not seem feasible to calculate the activity level comprehensively using only a single measurement. Motivated by the recent work of Mertens et al. [39], this paper adopts a weighted multi-norm-based activity measure to calculate the activity level of each source image patch comprehensively. For each source image patch, the information from the ℓ1-norm and ℓ0-norm measures is combined into a scalar weight map by multiplication. Similar to the weighted terms of a linear combination, the influence of each measure can be controlled using a power function:

$$M_A^i = \left( \| \alpha_A^i \|_1 \right)^{\omega_1} \times \left( \| \alpha_A^i \|_0 \right)^{\omega_0} \qquad (7)$$

$$M_B^i = \left( \| \alpha_B^i \|_1 \right)^{\omega_1} \times \left( \| \alpha_B^i \|_0 \right)^{\omega_0} \qquad (8)$$

where $M_A^i$ and $M_B^i$ are the measure results of the $i$th patch in source images A and B. The notations $\|\cdot\|_1$ and $\|\cdot\|_0$ denote the ℓ1-norm and ℓ0-norm of the coefficient vectors, and $\omega_1$ and $\omega_0$ are the corresponding weighting exponents. If an exponent equals 0, the corresponding measure is not taken into account. The equally weighted activity measure ($\omega_1 = \omega_0 = 1$) is used in this paper.

For the second issue, integrating the coefficients into their counterparts in the fused image, the average value and the maximum absolute value are two frequently used fusion rules. The average-value rule forms the fused coefficients by averaging the corresponding coefficients with weights that depend on the activity level; it can therefore suppress noise, but it also smooths salient features such as edges and lines and reduces the contrast of the fused image. For coefficient combination, an ideal fusion rule would enable the fused image to contain all the visual information of the source images; in fact, finding this ideal rule seems impractical. An acceptable alternative is a rule that enables the fused image to reflect only the most important information of the source images, so selecting the coefficients with the maximum absolute value is an efficient alternative. Moreover, as mentioned above, the weighted multi-norm-based activity measure of the sparse coefficients comprehensively reflects the important information of an image. Taking these factors into account, the maximum weighted multi-norm fusion rule is selected as the fusion rule in this paper. The fused coefficients are obtained by selecting, for each patch, the coefficients with the maximum weighted multi-norm value:

$$\alpha_F^i = \begin{cases} \alpha_A^i, & \text{if } M_A^i > M_B^i \\ \alpha_B^i, & \text{otherwise} \end{cases} \qquad (9)$$

Then the fused patch vector $v_F^i$ is calculated by

$$v_F^i = D \alpha_F^i \qquad (10)$$
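The per-patch steps of Sections 2.2 and 2.3 can be summarized in a short NumPy sketch. It assumes a dictionary `D` with unit-norm columns is already available (for example, the joint dictionary of Eq. (4)); it substitutes plain OMP for batch-OMP, and, because the text does not state how the fused patch vectors of Eq. (10) are put back into an image, it assumes overlapping patches are averaged. All function names are hypothetical, and this is an illustration rather than the authors' MATLAB implementation.

```python
import numpy as np


def extract_patches(img, n):
    """Sliding window with a step of one pixel; returns a (n*n, L) matrix whose
    columns are lexicographically ordered patches, L = (M-n+1)*(N-n+1)."""
    M, N = img.shape
    cols = [img[r:r + n, c:c + n].reshape(-1)
            for r in range(M - n + 1) for c in range(N - n + 1)]
    return np.stack(cols, axis=1)


def omp(D, x, T, tol=1e-6):
    """Plain orthogonal matching pursuit (not batch-OMP). Selects at most T
    atoms and stops early once the residual norm drops below tol.
    Assumes the columns of D have unit l2-norm."""
    x = np.asarray(x, dtype=float)
    coef = np.zeros(D.shape[1])
    residual = x.copy()
    support = []
    sol = np.zeros(0)
    while len(support) < T and np.linalg.norm(residual) > tol:
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:  # numerical stall: no new atom reduces the residual
            break
        support.append(k)
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    if support:
        coef[support] = sol
    return coef


def activity(alpha, w1=1.0, w0=1.0, eps=1e-10):
    """Weighted multi-norm activity of Eqs. (7)-(8), one value per column:
    (||a||_1)**w1 * (||a||_0)**w0. An exponent of 0 ignores that norm."""
    l1 = np.abs(alpha).sum(axis=0)
    l0 = (np.abs(alpha) > eps).sum(axis=0).astype(float)
    return l1 ** w1 * l0 ** w0


def fuse_coefficients(alpha_a, alpha_b, w1=1.0, w0=1.0):
    """Maximum weighted multi-norm rule of Eq. (9), applied column by column."""
    take_a = activity(alpha_a, w1, w0) > activity(alpha_b, w1, w0)
    return np.where(take_a[None, :], alpha_a, alpha_b)


def patches_to_image(V, shape, n):
    """Write patch vectors back at their sliding-window positions and average
    the overlaps (an assumed aggregation step)."""
    M, N = shape
    acc = np.zeros(shape)
    cnt = np.zeros(shape)
    i = 0
    for r in range(M - n + 1):
        for c in range(N - n + 1):
            acc[r:r + n, c:c + n] += V[:, i].reshape(n, n)
            cnt[r:r + n, c:c + n] += 1
            i += 1
    return acc / cnt


def fuse_images(img_a, img_b, D, n=8, T=16):
    """Patches -> sparse codes -> Eq. (9) -> Eq. (10) -> fused image."""
    codes = []
    for img in (img_a, img_b):
        V = extract_patches(img, n)
        codes.append(np.stack([omp(D, v, T) for v in V.T], axis=1))
    alpha_f = fuse_coefficients(codes[0], codes[1])
    return patches_to_image(D @ alpha_f, img_a.shape, n)
```

With the settings chosen in Section 3.2, a call would look like `fuse_images(A, B, D, n=8, T=16)` with `D` of size 64 × 384. The per-patch Python loop is slow for 512 × 512 images, which is where a precomputation-based coder such as batch-OMP pays off.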
3. Experiments

This section first presents the detailed experimental settings and then investigates the effects of the sparsity level and dictionary size. Next, the experimental results are analyzed visually and quantitatively. Finally, the computational efficiency is briefly analyzed at the end of the section.

3.1. Experimental settings

The experiments aim at evaluating the fusion performance of the proposed approach and comparing it with other fusion methods. Accordingly, the experimental settings cover the source images, the compared methods and the quality metrics.

3.1.1. Source images

The fusion scheme is verified using two different types of source images: multi-focus images taken by a digital still camera, and blurred versions of reference images. For the first three experiments, three pairs of multi-focus images are selected as the source images to be fused. As shown in Fig. 3(a)–(f), the scenes of these images are Pepsi, Clock and Leaf, respectively. Pepsi and Clock have a size of 512 × 512, while Leaf is 268 × 204. All the source images and reference images are available at http://www.imagefusion.org.

In practice, obtaining a reference image is not always feasible, especially in multi-focus image fusion. Fortunately, it has been observed in the existing literature that in some special cases of multi-focus image fusion, an “ideal” fused image can be manually constructed and then used as a reference image to test multi-focus fusion algorithms [28,40]. Motivated by this work, in the remaining experiments the proposed method is evaluated on two pairs of simulated images. Two popular grayscale images, Lena and Barche, available from the CVG-UGR-Image dataset (http://decsai.ugr.es/cvg/dbimagenes/), are used as reference images. Both original images are 256 × 256. A Gaussian smoothing filter of size 3 × 3 with σ = 7 is used to blur the reference images. Different regions of the reference images are blurred to build the source image sets, as shown in Fig. 3(g)–(j). The blurred images with different focus regions are then taken as the source images, and the original image is taken as the reference image.

Fig. 3. Five source image sets. (a) and (b) Pepsi; (c) and (d) Clock; (e) and (f) Leaf; (g) and (h) Lena; (i) and (j) Barche.

3.1.2. Compared methods

The performance of the proposed method is evaluated against state-of-the-art, reproducible multi-focus fusion methods: the Average method (Av), the Select maximum method (Max), the PCA method (PCA), the Gradient Pyramid method (GP), the Laplacian Pyramid method (LP), the Curvelet method (CVT), a sparse-representation-based method with a pre-defined DCT dictionary (SR-DCT), a sparse-representation-based method with a dictionary trained on pre-collected image sets (SR-PRE), and a sparse-representation-based method using the Max-ℓ1 fusion rule (SR-ℓ1). Among these, Av, Max and PCA are typical spatial-domain fusion methods; GP, CVT and LP are transform-domain fusion methods; and SR-DCT, SR-PRE and SR-ℓ1 are three representative sparse-representation-based fusion methods.

3.1.3. Quality metrics

For the quantitative comparison between the proposed fusion method and the other methods mentioned above, eight popular quality metrics are used: Average Gradient (AG), Edge Intensity (EI), Figure Definition (FD), Edge Retention Degree ($Q^{AB/F}$), Mutual Information (MI), Cross Entropy (CE), Relatively Warp (RW) and Structural Similarity (SSIM). In particular, AG, EI, FD and MI are non-reference metrics, while CE, RW and SSIM are reference-based metrics. Larger values of AG, MI, FD and EI indicate a better fused image, while the opposite holds for CE and RW. For $Q^{AB/F}$ and SSIM, the closer the value is to 1, the better the fused image.

3.2. Effects of parameters

In this section, the effects of the sparsity level and dictionary size on fusion performance are evaluated on all the source images. The patch size is set to 8 × 8, which has been shown to be an appropriate setting for many image processing applications [41–43]. Four sparsity levels are considered: 2, 4, 8 and 16. The effect of dictionary size is analyzed with five dictionary sizes: 64 × 64, 64 × 128, 64 × 256, 64 × 384 and 64 × 512. Four popular quality metrics, FD, $Q^{AB/F}$, CE and SSIM, are calculated and normalized to [0,1] to assess the quality of the fused images. Fig. 4 shows the average values of these quality metrics over all source images as the sparsity level and dictionary size vary.

As shown in Fig. 4, fusion performance degrades when the dictionary is small. On the other hand, beyond 384 atoms the quality measures either do not improve significantly or decrease, while the computational cost increases with the dictionary size. It can also be observed from Fig. 4 that, for a dictionary size of K = 384, a sparsity factor of 16 achieves better objective performance. Therefore, the learned dictionary with K = 384 and a fixed sparsity factor T = 16 are used in all the experiments.

3.3. Image fusion results

In this section, the performance of the proposed method is presented and compared visually and numerically with that of the other methods listed above. Five pairs of multi-focus images are used to assess the performance of the proposed method. When constructing the dictionary, all available patches are used for the 256 × 256 and 268 × 204 images, while for the 512 × 512 images every second patch from every second row is used. All the experiments are implemented in MATLAB R2012a on a 2.3 GHz Intel(R) Core(TM) CPU with 4 GB RAM.

3.3.1. Fusion on multi-focus images “Pepsi”

In this section, comparison experiments are conducted on the multi-focus images “Pepsi” to illustrate the performance of the proposed method. The source images Pepsi A and Pepsi B are shown in Fig. 5(a) and (b). As shown in Fig. 5, Pepsi A focuses on the characters on the label, whereas Pepsi B focuses on the characters on the can. To illustrate the fusion results clearly and intuitively, Fig. 5(c)–(l) depict the fused images obtained by the Average, Max, PCA, GP, LP, CVT, SR-DCT, SR-ℓ1, SR-PRE and proposed methods, respectively.

As can be seen from Fig. 5(c)–(l), the fused images largely combine the complementary information from the source images; the characters on both the can and the label are clear. In addition, the SR-DCT, SR-ℓ1 and SR-PRE methods and the proposed fusion approach produce brighter images than the Average, Max, PCA, GP, CVT and LP methods. It is noteworthy that the fused images obtained by the Average, Max and PCA methods exhibit severe ringing artifacts around the characters on the label. Fig. 5(d) has a bright block in its middle, which does not exist in either source image and severely impairs image quality. The SR-DCT, SR-ℓ1 and SR-PRE methods work well in most parts of the fused image; however, artifacts can be observed in the label and characters of their fused images. For better comparison,
quantitative assessments are presented in Table 1, which lists the values of AG, EI, FD, $Q^{AB/F}$, MI, CE, RW and SSIM for Fig. 5(c)–(l); the best results are indicated in bold. From Table 1, one can see that the proposed fusion scheme obtains four best values, two second-best values and two third-best values. Although the proposed fusion approach does not achieve the best value on every quality index, its quality metrics are generally close to the best values. On the other hand, the fused result produced by the Max-based method has the best MI value; however, severe ringing artifacts can be observed around the can and label. Overall, it can be concluded that the fused image obtained by the proposed fusion approach contains
222 H. Yin et al. / Neurocomputing 216 (2016) 216–229

Fig. 4. Average value of fusion quality metrics with respect to the sparsity level and dictionary size.

much more abundant information, such as shapes and edges from 268  204. As shown in Fig. 7(a) and (b), each “Leaf” source image
the source images. At this point, it can be concluded that the proposed scheme works well and exhibits excellent fusion ability, both visually and quantitatively.

3.3.2. Fusion on multi-focus images “Clock”
The second experiment is conducted on the multi-focus images “Clock”. As shown in Fig. 6(a) and (b), the large clock in Clock A is out of focus and blurred, while the small clock is in focus and clear. Conversely, in Clock B the large clock is defocused and the small clock is in focus. Fig. 6(c)–(l) depict the fusion results obtained by the different methods to give a direct view. Compared with the source images in Fig. 6(a) and (b), the large clock and the small clock are equally clear in the fused images. As shown in Fig. 6, the Max-based fusion method blurs not only the top and bottom edges of the large clock but also the top edge of the small clock. The CVT-based method produces a sharp image but shows serious artifacts around edges, e.g. the clock borders. The fused results in Fig. 6(i)–(l), obtained by SR-DCT, SR-ℓ1, SR-PRE and the proposed fusion approach, are clearer than the other fused results. To further compare the performance of the different fusion methods, quantitative assessments are presented in Table 2. The objective comparison in Table 2 shows that the scores of the Average method are clearly lower than those of the LP and sparse-representation-based methods, particularly for AG, EI and FD. Numerically, the proposed method has the best values for AG, EI, Q^{AB/F} and SSIM, and the second-best values for FD, MI and RW. On the whole, the proposed method offers more comprehensive fusion performance than the other methods.

3.3.3. Fusion on multi-focus images “Leaf”
In this section, the corresponding experiments are carried out on a pair of multi-focus “Leaf” images with the size of …, which focus on different regions. In the source image Leaf A, the front leaves are in focus and clear, while the back leaves are out of focus and blurred. Conversely, the source image Leaf B focuses on the back leaves, and the front leaves are blurred. Fig. 7(c)–(l) present the fusion results obtained by the different fusion methods.

From the perspective of human visual perception, the Max method produces a fused image with considerable halo, especially in the region of the leaf veins, as shown in Fig. 7(d). The Average, PCA and GP methods generate better fused images than the Max method, but they lose considerable luminance information compared with the images acquired by the LP, CVT, SR-DCT, SR-ℓ1, SR-PRE and proposed methods. The fused image of the LP method (Fig. 7(g)) has some artifacts in the edge region, and Fig. 7(e) shows some blurring around the edges. Visually, as shown in Fig. 7(i)–(l), the fused images obtained by the SR-DCT, SR-ℓ1, SR-PRE and proposed methods perform well in luminance and detail information, and it is difficult for the human eye to distinguish them subjectively. Therefore, quantitative assessments are required for an objective comparison.

The values of AG, EI, FD, Q^{AB/F}, MI, CE, RW and SSIM are listed in Table 3, with the best values indicated in bold. As shown in Table 3, the proposed fusion approach achieves better fusion results, with six best values and two second-best values, illustrating that it captures more information from the source images. In brief, both the subjective and the objective evaluation demonstrate the superiority of the proposed fusion approach over many state-of-the-art methods.
H. Yin et al. / Neurocomputing 216 (2016) 216–229 223

Fig. 5. The “Pepsi” source images and fusion results by different fusion methods: (a) Source image Pepsi A; (b) Source image Pepsi B; (c) The fused image obtained by Average
method; (d) The fused image obtained by Max method; (e) The fused image obtained by PCA method; (f) The fused image obtained by GP method; (g) The fused image
obtained by LP method; (h) The fused image obtained by CVT method; (i) The fused image obtained by SR-DCT method; (j) The fused image obtained by SR-ℓ1 method; (k)
The fused image obtained by SR-PRE method; (l) The fused image obtained by the proposed method.

3.3.4. Fusion on multi-focus images “Lena”
Two “Lena” source images with different blurred regions are used to evaluate the fusion performance in the fourth experiment. Fig. 8(a) and (b) are the source images Lena A and Lena B. In Fig. 8(a), the left object is in focus and clearly depicted, while the right object is out of focus and blurred; Fig. 8(b) shows the opposite situation. To show the fusion results more explicitly, Fig. 8(c)–(l) present the fused images acquired by the Average, Max, PCA, GP, LP, CVT, SR-DCT, SR-ℓ1, SR-PRE and proposed methods, respectively.

Compared with the source images in Fig. 8(a) and (b), the fused images successfully preserve the focused parts of each source image and combine them to generate a clearer picture of the whole scene. As shown in Fig. 8, there is some luminance distortion in Fig. 8(c)–(f), which are obtained by the Average, Max, PCA and GP methods. Moreover, the fused image obtained by the Max method suffers from ringing to some degree, and it also loses contrast, particularly in edge regions of the source images such as Lena's hair. Fig. 8(e), produced by the PCA method, shows some blurring around the person. Visually, the fused images of the LP, SR-DCT, SR-ℓ1, SR-PRE and proposed methods perform better in both light intensity and detail information, including shapes and edges, and it is difficult for the human eye to distinguish them subjectively. Therefore, a further objective comparison is required. The results of the quantitative assessments are shown in Table 4.

Table 4 presents a quantitative comparison of the various methods in terms of AG, EI, FD, Q^{AB/F}, MI, CE, RW and SSIM. The best result for each metric is labeled in bold. As shown in Table 4, the proposed method has the best values (for CE and SSIM, tied with SR-ℓ1) for all adopted quality metrics except MI, demonstrating that the proposed fusion approach captures more significant information from the multi-focus images to produce an “ideal” result. Based on these results, it can be concluded that the proposed method consistently outperforms the other methods visually and quantitatively.

Table 1
Quantitative assessments of the compared methods and the proposed method (“Pepsi” images).

Methods   AG        EI        FD        Q^{AB/F}  MI        CE        RW        SSIM
Av        2.9786    31.3430   3.9672    0.6433    4.7459    0.0236    0.0329    0.9533
Max       2.9051    30.7171   4.0838    0.6143    6.2557    0.0322    0.0273    0.9325
PCA       2.9926    31.4870   3.9872    0.6477    4.7985    0.0215    0.0325    0.9536
GP        3.4143    35.7330   4.7610    0.7394    4.2526    0.0501    0.0575    0.9705
LP        3.7350    39.1940   5.0783    0.7604    4.7001    0.0129    0.0212    0.9774
CVT       3.8819    40.8700   5.2317    0.7299    4.6374    0.0136    0.0199    0.9720
SR-DCT    3.9881    42.5990   5.1259    0.7823    5.1810    0.0070    0.0111    0.9833
SR-ℓ1     3.9919    42.7390   5.1477    0.7785    5.1696    0.0066    0.0109    0.9818
SR-PRE    3.9837    42.3860   5.2005    0.7839    5.0993    0.0085    0.0128    0.9817
Proposed  4.0181    42.7380   5.2353    0.7851    5.1640    0.0069    0.0107    0.9832

Fig. 6. The “Clock” source images and fusion results by different fusion methods: (a) Source image Clock A; (b) Source image Clock B; (c) The fused image obtained by Average method; (d) The fused image obtained by Max method; (e) The fused image obtained by PCA method; (f) The fused image obtained by GP method; (g) The fused image obtained by LP method; (h) The fused image obtained by CVT method; (i) The fused image obtained by SR-DCT method; (j) The fused image obtained by SR-ℓ1 method; (k) The fused image obtained by SR-PRE method; (l) The fused image obtained by the proposed method.

Table 2
Quantitative assessments of the conventional methods and the proposed method (“Clock” images).

Methods   AG        EI        FD        Q^{AB/F}  MI        CE        RW        SSIM
Av        2.4611    26.9770   2.6193    0.5877    4.8379    0.4827    0.0607    0.9496
Max       2.3511    25.4080   2.5595    0.5304    6.4464    0.4804    0.0444    0.9053
PCA       2.4484    26.8260   2.6083    0.5823    4.8745    0.4134    0.0615    0.9482
GP        2.5458    27.3660   2.8132    0.5142    4.7031    0.4702    0.0257    0.9101
LP        3.3658    36.6190   3.7348    0.6665    4.9386    0.1426    0.0311    0.9731
CVT       2.5956    28.0670   2.9073    0.5866    4.7321    0.2060    0.0612    0.9505
SR-DCT    3.4346    37.5690   3.6982    0.6836    5.3803    0.2380    0.0307    0.9779
SR-ℓ1     3.4461    37.6690   3.7036    0.6926    5.4063    0.2448    0.0315    0.9781
SR-PRE    3.4550    37.6610   3.7607    0.7097    5.4590    0.2278    0.0322    0.9778
Proposed  3.4657    37.7940   3.7596    0.7106    5.4922    0.2311    0.0302    0.9783
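AG, the first metric in Tables 1–5, rewards sharp detail. A fused image that keeps the in-focus content of both sources has larger local intensity differences than one that mixes in defocused regions. The following is a minimal sketch that assumes one commonly cited definition of average gradient: the mean over pixels of sqrt((Δx² + Δy²)/2), where Δx and Δy are forward differences. The paper's own definition is given in its metrics section and may differ in detail. The function names and the 3×3 box blur used to mimic defocus are hypothetical.

```python
import numpy as np


def average_gradient(img: np.ndarray) -> float:
    """Average gradient with forward differences over an (M-1) x (N-1) grid."""
    f = np.asarray(img, dtype=float)
    dx = f[:-1, 1:] - f[:-1, :-1]   # horizontal forward difference
    dy = f[1:, :-1] - f[:-1, :-1]   # vertical forward difference
    return float(np.mean(np.sqrt((dx ** 2 + dy ** 2) / 2.0)))


def box_blur(img: np.ndarray) -> np.ndarray:
    """3x3 mean filter with edge replication, used here to mimic defocus."""
    f = np.asarray(img, dtype=float)
    p = np.pad(f, 1, mode="edge")
    m, n = f.shape
    return sum(p[i:i + m, j:j + n] for i in range(3) for j in range(3)) / 9.0


if __name__ == "__main__":
    checker = (np.indices((8, 8)).sum(axis=0) % 2) * 255.0
    print(average_gradient(checker))            # 255.0
    print(average_gradient(box_blur(checker)))  # smaller: blurring lowers AG
```

The blur comparison mirrors the reasoning in the text. The Average method's lower AG, EI and FD scores are consistent with it blending blurred content from the defocused source into every pixel.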

Fig. 7. The “Leaf” source images and fusion results by different fusion methods: (a) Source image Leaf A; (b) Source image Leaf B; (c) The fused image obtained by Average
method; (d) The fused image obtained by Max method; (e) The fused image obtained by PCA method; (f) The fused image obtained by GP method; (g) The fused image
obtained by LP method; (h) The fused image obtained by CVT method; (i) The fused image obtained by SR-DCT method; (j) The fused image obtained by SR-ℓ1 method; (k)
The fused image obtained by SR-PRE method; (l) The fused image obtained by the proposed method.

3.3.5. Fusion on multi-focus images “Barche”
To further evaluate the fusion performance, the fifth experiment is performed on blurred versions of “Barche”. The reference image Barche is blurred diagonally to build the source images Barche A and Barche B, as illustrated in Fig. 9(a) and (b). The upper-right corner of Barche A is in focus, while its lower-left corner is out of focus. Conversely, Barche B focuses on the lower-left corner and is defocused in the upper-right corner. The fused images obtained by the different methods are depicted in Fig. 9(c)–(l).

As in the previous examples, the focused parts of each source image are preserved in the fused results of the different fusion methods. However, there is some luminance distortion or loss of local information in Fig. 9(c)–(f). In particular, artifacts of varying severity appear in the images obtained by the Max and PCA methods, especially in edge regions such as the boats' masts. As shown in Fig. 9(g)–(l), the fused images obtained by the LP, SR-DCT, SR-ℓ1, SR-PRE and proposed methods are much clearer than the results of the other methods.

For an objective comparison, the quantitative assessments of the different methods for “Barche” are shown in Table 5, with the best values in bold. The proposed approach has the best values for all quality metrics except Q^{AB/F} and MI, indicating that its fused image is more similar to the reference image. The proposed approach therefore preserves more comprehensive information from the source images and produces a satisfactory fused result. Overall, both the visual comparison and the objective assessments show that the proposed method achieves competitive fusion performance compared with the other tested methods.
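The “Barche” pair is synthetic: one reference image is blurred in complementary diagonal halves, so the ideal all-in-focus result is known, and fused images can be compared against it. The section does not state the blur kernel or its strength. The following is a minimal sketch that assumes a Gaussian blur (sigma = 2) with reflected borders and a split along the main diagonal. `gaussian_blur` and `make_diagonal_pair` are hypothetical helpers, not part of the paper.

```python
import numpy as np


def gaussian_blur(img: np.ndarray, sigma: float = 2.0) -> np.ndarray:
    """Separable Gaussian blur with reflected borders (assumed defocus model)."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    f = np.pad(np.asarray(img, dtype=float), radius, mode="reflect")
    # Filter rows, then columns; "valid" mode trims the padding back off.
    f = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, f)
    f = np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, f)
    return f


def make_diagonal_pair(ref: np.ndarray, sigma: float = 2.0):
    """Build a multi-focus pair by blurring complementary diagonal halves."""
    ref = np.asarray(ref, dtype=float)
    m, n = ref.shape
    rows, cols = np.indices((m, n))
    upper_right = cols * m >= rows * n          # on or above the main diagonal
    blurred = gaussian_blur(ref, sigma)
    a = np.where(upper_right, ref, blurred)     # "A": upper-right in focus
    b = np.where(upper_right, blurred, ref)     # "B": lower-left in focus
    return a, b, upper_right


if __name__ == "__main__":
    ref = np.random.default_rng(1).uniform(0, 255, size=(32, 32))
    a, b, mask = make_diagonal_pair(ref)
    ideal = np.where(mask, a, b)                # perfect focus selection
    print(np.array_equal(ideal, ref))           # True
```

Because the focus map is known by construction, a perfect focus-selection rule recovers `ref` exactly. Any real fusion method can therefore be scored by how closely its output approaches this reference.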

Table 3
Quantitative assessments of the conventional methods and the proposed method (“Leaf” images).

Methods   AG        EI        FD        Q^{AB/F}  MI        CE        RW        SSIM
Av        6.8745    71.1460   7.8961    0.5430    3.1238    0.0678    0.0808    0.8855
Max       7.2379    73.3410   8.6569    0.5098    5.5801    0.1317    0.0187    0.8020
PCA       6.8924    71.3110   7.9212    0.5457    3.1425    0.0618    0.0812    0.8860
GP        8.5701    84.7380   10.6790   0.6690    2.9116    0.1470    0.1218    0.9491
LP        10.5398   105.8719  12.7073   0.6849    4.2174    0.0140    0.0197    0.9682
CVT       10.6650   107.8701  12.7460   0.6793    3.0792    0.0341    0.0149    0.9618
SR-DCT    10.7677   108.9888  12.8613   0.7275    4.6018    0.0034    0.0023    0.9924
SR-ℓ1     10.8101   109.4020  12.8670   0.7318    4.5623    0.0037    0.0007    0.9942
SR-PRE    10.8562   110.1643  12.8483   0.7289    4.1231    0.0059    0.0016    0.9958
Proposed  10.8850   110.1508  12.9613   0.7333    4.7578    0.0027    0.0004    0.9959
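The claim that the proposed method obtains six best and two second-best values on “Leaf” can be checked mechanically. The following is a minimal sketch that hard-codes the Table 3 values and computes, per metric, the rank of a given method. It assumes that lower is better for CE and RW and higher is better for the other six metrics, as the tallies in the text imply. Ties share the better rank, and the names `METRICS`, `TABLE3` and `ranks_of` are illustrative only.

```python
# Table 3 ("Leaf") values, copied from the table; columns follow METRICS.
METRICS = ["AG", "EI", "FD", "QABF", "MI", "CE", "RW", "SSIM"]
LOWER_IS_BETTER = {"CE", "RW"}  # assumption inferred from the text's tallies

TABLE3 = {
    "Av":       [6.8745, 71.1460, 7.8961, 0.5430, 3.1238, 0.0678, 0.0808, 0.8855],
    "Max":      [7.2379, 73.3410, 8.6569, 0.5098, 5.5801, 0.1317, 0.0187, 0.8020],
    "PCA":      [6.8924, 71.3110, 7.9212, 0.5457, 3.1425, 0.0618, 0.0812, 0.8860],
    "GP":       [8.5701, 84.7380, 10.6790, 0.6690, 2.9116, 0.1470, 0.1218, 0.9491],
    "LP":       [10.5398, 105.8719, 12.7073, 0.6849, 4.2174, 0.0140, 0.0197, 0.9682],
    "CVT":      [10.6650, 107.8701, 12.7460, 0.6793, 3.0792, 0.0341, 0.0149, 0.9618],
    "SR-DCT":   [10.7677, 108.9888, 12.8613, 0.7275, 4.6018, 0.0034, 0.0023, 0.9924],
    "SR-ℓ1":    [10.8101, 109.4020, 12.8670, 0.7318, 4.5623, 0.0037, 0.0007, 0.9942],
    "SR-PRE":   [10.8562, 110.1643, 12.8483, 0.7289, 4.1231, 0.0059, 0.0016, 0.9958],
    "Proposed": [10.8850, 110.1508, 12.9613, 0.7333, 4.7578, 0.0027, 0.0004, 0.9959],
}


def ranks_of(method: str, table: dict) -> dict:
    """Rank of `method` on each metric (1 = best); ties share the better rank."""
    ranks = {}
    for k, metric in enumerate(METRICS):
        mine = table[method][k]
        if metric in LOWER_IS_BETTER:
            better = sum(1 for row in table.values() if row[k] < mine)
        else:
            better = sum(1 for row in table.values() if row[k] > mine)
        ranks[metric] = 1 + better
    return ranks


ranks = ranks_of("Proposed", TABLE3)
best = [m for m, r in ranks.items() if r == 1]
second = [m for m, r in ranks.items() if r == 2]
print(len(best), len(second))  # → 6 2
print(best)    # ['AG', 'FD', 'QABF', 'CE', 'RW', 'SSIM']
print(second)  # ['EI', 'MI']
```

The two second places are EI, where SR-PRE is marginally higher, and MI, where the Max method leads.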

Fig. 8. The “Lena” source images and fusion results by different fusion methods: (a) Source image Lena A; (b) Source image Lena B; (c) The fused image obtained by Average
method; (d) The fused image obtained by Max method; (e) The fused image obtained by PCA method; (f) The fused image obtained by GP method; (g) The fused image
obtained by LP method; (h) The fused image obtained by CVT method; (i) The fused image obtained by SR-DCT method; (j) The fused image obtained by SR-ℓ1 method;
(k) The fused image obtained by SR-PRE method; (l) The fused image obtained by the proposed method.

Table 4
Quantitative assessments of the compared methods and the proposed method (“Lena” images).

Methods   AG        EI        FD        Q^{AB/F}  MI        CE        RW        SSIM
Av        4.9363    51.5450   5.7418    0.6380    4.4506    0.1147    0.0621    0.9298
Max       5.3788    54.9450   6.4673    0.6438    6.3938    0.1797    0.0634    0.9027
PCA       4.9335    51.5130   5.7403    0.6357    4.4895    0.1068    0.0622    0.9293
GP        7.2338    72.0330   9.0960    0.7140    4.9656    0.0075    0.0180    0.9918
LP        7.2547    72.2510   9.1234    0.7117    4.9238    0.0100    0.0197    0.9909
CVT       7.2059    71.6880   9.0740    0.7236    5.1538    0.0040    0.0090    0.9955
SR-DCT    7.2508    72.1310   9.1215    0.7455    5.9690    0.0007    0.0023    0.9990
SR-ℓ1     7.2718    72.3440   9.1464    0.7467    6.0544    0.0003    0.0012    0.9993
SR-PRE    7.2323    71.9620   9.0960    0.7447    5.8842    0.0010    0.0037    0.9986
Proposed  7.2721    72.3480   9.1465    0.7468    6.0569    0.0003    0.0011    0.9993
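MI is the only metric in Table 4 on which the proposed method is not best. A widely used form of the MI fusion metric adds the mutual information between the fused image and each source, estimated from joint grey-level histograms. The following is a minimal sketch that assumes 8-bit images, 256 histogram bins and base-2 logarithms. The paper's exact definition is given in its metrics section and may differ, and the function names are hypothetical.

```python
import numpy as np


def mutual_information(x: np.ndarray, y: np.ndarray, bins: int = 256) -> float:
    """Mutual information (bits) between two 8-bit images via a joint histogram."""
    joint, _, _ = np.histogram2d(
        x.ravel(), y.ravel(), bins=bins, range=[[0, 256], [0, 256]]
    )
    pxy = joint / joint.sum()              # joint probability
    px = pxy.sum(axis=1, keepdims=True)    # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)    # marginal of y, shape (1, bins)
    nz = pxy > 0                           # skip empty cells (0 * log 0 = 0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px * py)[nz])))


def fusion_mi(a: np.ndarray, b: np.ndarray, fused: np.ndarray) -> float:
    """MI fusion metric: information the fused image shares with each source."""
    return mutual_information(a, fused) + mutual_information(b, fused)


if __name__ == "__main__":
    img = np.repeat([0, 64, 128, 192], 16).reshape(8, 8)  # four equally likely grey levels
    print(mutual_information(img, img))   # 2.0 bits (the entropy of img)
    print(fusion_mi(img, img, img))       # 4.0
```

Under this definition, a rule that copies intensities from one source, such as the Max rule, can score highly on MI even when it introduces visible halos. This may help explain why the Max method leads the MI column in Tables 1–5 while scoring poorly on the other metrics.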

3.4. Computational efficiency analysis

In this section, a group of experiments is designed to estimate the operational speed of the proposed method. The running time, which is closely related to the efficiency of fusion approaches, is used as a quantitative measure of operational speed. In this work, the running time is obtained by averaging 15 runs of each fusion procedure. Table 6 presents the running time required to fuse Pepsi, Clock, Leaf, Lena and Barche by the different fusion methods.

As shown in Table 6, the proposed method takes more time than the other compared fusion approaches except the SR-PRE model. This mainly stems from the fact that, in sparse-representation-based fusion techniques, a sliding window, which aims at better capturing local salient features, is used to divide the source images into a large number of small patches. Sparse-coding this many patches in the dictionary learning and sparse representation stages takes a lot of time, which makes sparse-representation-based fusion techniques time-consuming.

Table 6 also shows that the spatial-domain-based fusion methods (Av, Max and PCA) need less time to form a fused image, since pixels or regions are directly selected and combined in a linear or nonlinear way. However, these methods cannot guarantee that the details of the source images are preserved, as illustrated in Section 3.3. Taking image details into account, the transform-domain-based fusion methods (GP, LP and CVT) can achieve better fusion performance at the cost of relatively more time than the spatial-domain-based methods. However, their fusion ability is limited, as shown in Tables 1–5, since no single transform can completely represent all features. To effectively extract the underlying information of the source images, sparse-representation-based fusion methods are preferred for producing high-quality fused images, at the cost of running time. Compared with the other sparse-representation-based fusion methods (SR-DCT, SR-ℓ1 and SR-PRE), the proposed method achieves better fused images with better quantitative assessments, as shown in Tables 1–5. Furthermore, in the proposed method the sub-dictionaries are learned simultaneously from the source images, which effectively speeds up the dictionary construction procedure compared with the SR-PRE model.

In brief, the proposed fusion method obtains better fused images at the cost of running time; this gain in image quality is especially important when dealing with images that contain rich information.

Fig. 9. The “Barche” source images and fusion results by different fusion methods: (a) Source image Barche A; (b) Source image Barche B; (c) The fused image obtained by Average method; (d) The fused image obtained by Max method; (e) The fused image obtained by PCA method; (f) The fused image obtained by GP method; (g) The fused image obtained by LP method; (h) The fused image obtained by CVT method; (i) The fused image obtained by SR-DCT method; (j) The fused image obtained by SR-ℓ1 method; (k) The fused image obtained by SR-PRE method; (l) The fused image obtained by the proposed method.

Table 5
Quantitative assessments of the conventional methods and the proposed method (“Barche” images).

Methods   AG        EI        FD        Q^{AB/F}  MI        CE        RW        SSIM
Av        5.8239    60.1980   7.2855    0.6722    4.1330    0.0231    0.0472    0.9287
Max       5.9005    60.0710   7.5456    0.6346    5.9758    0.0213    0.0494    0.8828
PCA       5.8291    60.2210   7.2975    0.6727    4.1709    0.0218    0.0471    0.9288
GP        6.5917    64.0100   9.3789    0.7131    3.2773    0.1241    0.1317    0.9575
LP        8.0329    79.7350   10.6420   0.7768    4.7346    0.0063    0.0043    0.9932
CVT       8.0072    79.6040   10.5740   0.7642    4.5660    0.0054    0.0046    0.9939
SR-DCT    8.0658    80.2530   10.5800   0.7816    5.3680    0.0022    0.0021    0.9976
SR-ℓ1     8.1023    80.4180   10.6870   0.7842    5.5267    0.0016    0.0023    0.9984
SR-PRE    8.0805    80.2260   10.6570   0.7845    5.3935    0.0022    0.0037    0.9981
Proposed  8.1041    80.4350   10.6890   0.7844    5.5456    0.0013    0.0019    0.9986

Table 6
The running time (in seconds) required to fuse Pepsi, Clock, Leaf, Lena and Barche using different fusion methods.

Methods   Pepsi      Clock      Leaf       Lena       Barche
Av        0.0019     0.0020     0.0005     0.0003     0.0006
Max       0.0107     0.0063     0.0015     0.0015     0.0017
PCA       0.0087     0.0079     0.0011     0.0022     0.0021
GP        0.1434     0.1447     0.0306     0.0308     0.0311
LP        0.0132     0.02067    0.0058     0.0066     0.0060
CVT       4.6369     3.4458     1.1937     1.2140     1.1256
SR-DCT    439.537    332.3798   106.4618   112.5463   122.3114
SR-ℓ1     743.3922   678.2953   460.4978   506.4105   522.0846
SR-PRE    1214.2324  951.9228   726.7140   717.5696   723.4113
Proposed  747.0962   680.6142   460.8295   506.8173   523.1878

4. Conclusions and discussions

Multi-focus image fusion plays a crucial role in military surveillance, medical imaging, remote sensing and machine vision. Sparse-representation-based techniques are attracting increasing attention in the multi-focus image fusion field. However, how to construct an over-complete dictionary adapted to the input image data is a crucial problem in sparse-representation-based image fusion. In this paper, a novel sparse-representation-based multi-focus image fusion approach is presented to address this problem. A joint dictionary is constructed by combining several sub-dictionaries that are adaptively learned from the source images using the K-SVD algorithm. The batch-OMP algorithm is used to estimate the sparse coefficients. Furthermore, a maximum weighted multi-norm fusion rule is used to reconstruct the fused all-in-focus image. The fusion results and objective measures show that the proposed fusion approach achieves competitive results compared with a number of state-of-the-art fusion methods.

However, much work remains for follow-up studies. First, the current learned dictionary is not computationally efficient because it still contains a large number of redundant atoms; a compact but informative dictionary construction method may further improve the performance of the developed scheme. Second, since color is a basic factor in the human visual system, extending the current fusion framework to color image fusion is another issue that requires further investigation. In addition, further research is clearly needed on the evaluation of different fusion methods. In practice, the ground truth is usually not known, yet many of the currently used performance measures require knowledge of it. One potential solution is to develop so-called objective performance measures, i.e. measures that are independent of both the ground truth and human subjective evaluation. These problems will be investigated in future work.

Acknowledgments

We would like to thank the National Natural Science Foundation of China (61374135 and 61203321), the China Postdoctoral Science Foundation (2012M521676), the China Central Universities Foundation (106112015CDJXY170003 and 106112016CDJZR175511), the Chongqing Natural Science Foundation of China (cstc2015jcyjB0569) and the Chongqing Graduate Student Research Innovation Project (CYB14023) for their support.

References

[1] J. Duan, G. Meng, S. Xiang, Multifocus image fusion via focus segmentation and region reconstruction, Neurocomputing 140 (2014) 193–209.
[2] B. Zhang, X. Lu, H. Pei, Multi-focus image fusion algorithm based on focused region extraction, Neurocomputing 130 (2014) 44–51.
[3] Z.D. Liu, H.P. Yin, B. Fang, A novel fusion scheme for visible and infrared images based on compressive sensing, Opt. Commun. 335 (2015) 168–177.
[4] A.P. James, B.V. Dasarathy, Medical image fusion: a survey of the state of the art, Inf. Fusion 19 (2014) 4–19.
[5] C.L. Chien, W.H. Tsai, Image fusion with no gamut problem by improved nonlinear IHS transforms for remote sensing, IEEE Trans. Geosci. Remote Sens. 52 (1) (2014) 651–663.
[6] V. Aslantas, A.N. Toprak, A pixel based multi-focus image fusion method, Opt. Commun. 332 (2014) 350–358.
[7] B. Yu, B. Jia, L. Ding, Hybrid dual-tree complex wavelet transform and support vector machine for digital multi-focus image fusion, Neurocomputing 182 (2016) 1–9.
[8] N. Wang, Y. Ma, K. Zhan, Spiking cortical model for multifocus image fusion, Neurocomputing 174 (2016) 733–748.
[9] Y. Jiang, M.H. Wang, Image fusion with morphological component analysis, Inf. Fusion 18 (2014) 107–118.
[10] Y. Liu, S.P. Liu, Z.F. Wang, A general framework for image fusion based on multi-scale transform and sparse representation, Inf. Fusion 24 (2015) 147–164.
[11] G. Bhatnagar, Q.J. Wu, Z. Liu, A new contrast based multimodal medical image fusion framework, Neurocomputing 157 (2015) 143–152.
[12] S.T. Li, B. Yang, J.W. Hu, Performance comparison of different multi-resolution transforms for image fusion, Inf. Fusion 12 (2) (2011) 74–84.
[13] P. Chavez, S.C. Sides, J.A. Anderson, Comparison of three different methods to merge multiresolution and multispectral data: Landsat TM and SPOT panchromatic, Photogramm. Eng. Remote Sens. 57 (3) (1991) 295–303.
[14] N. Cvejic, D. Bull, N. Canagarajah, Region-based multimodal image fusion using ICA bases, IEEE Sens. J. 7 (5) (2007) 743–751.
[15] P.J. Burt, E.H. Adelson, The Laplacian pyramid as a compact image code, IEEE Trans. Commun. 31 (4) (1983) 532–540.
[16] G. Pajares, J.M. De La Cruz, A wavelet-based image fusion tutorial, Pattern Recognit. 37 (9) (2004) 1855–1872.
[17] S.Q. Ren, J. Cheng, M. Li, Multiresolution fusion of PAN and MS images based on the curvelet transform, in: Proceedings of the IEEE International conference on Geoscience and Remote Sensing Symposium (IGARSS), 2010, pp. 472–475.
[18] A.L. Da Cunha, J. Zhou, M.N. Do, The nonsubsampled contourlet transform: theory, design, and applications, IEEE Trans. Image Process. 15 (10) (2006) 3089–3101.
[19] S.T. Li, H.T. Yin, L.Y. Fang, Remote sensing image fusion via sparse representations over learned dictionaries, IEEE Trans. Geosci. Remote Sens. 51 (9) (2013) 4779–4789.
[20] X.X. Zhu, R. Bamler, A sparse image fusion algorithm with application to pan-sharpening, IEEE Trans. Geosci. Remote Sens. 51 (5) (2013) 2827–2836.
[21] H. Liu, Y. Yu, F. Sun, Visual-tactile fusion for object recognition, IEEE Trans. Autom. Sci. Eng. 99 (2016) 1–13.
[22] Z. Zhu, Y. Chai, H. Yin, A novel dictionary learning approach for multi-modality medical image fusion, Neurocomputing (2016).
[23] H. Cheng, Z.C. Liu, L. Yang, X.W. Chen, Sparse representation and learning in visual recognition: theory and applications, Signal Process. 93 (6) (2013) 1408–1425.
[24] Y. Chen, N.M. Nasrabadi, T.D. Tran, Hyperspectral image classification via kernel sparse representation, IEEE Trans. Geosci. Remote Sens. 51 (1) (2013) 217–231.
[25] J. Wright, A.Y. Yang, A. Ganesh, Robust face recognition via sparse representation, IEEE Trans. Pattern Anal. Mach. Intell. 31 (2) (2009) 210–227.
[26] T. Guha, R.K. Ward, Learning sparse representations for human action recognition, IEEE Trans. Pattern Anal. Mach. Intell. 34 (8) (2012) 1576–1588.
[27] H. Liu, D. Guo, F. Sun, Object recognition using tactile measurements: kernel sparse coding methods, IEEE Trans. Instrum. Meas. 65 (3) (2016) 656–665.
[28] B. Yang, S.T. Li, Multifocus image fusion and restoration with sparse representation, IEEE Trans. Instrum. Meas. 59 (4) (2010) 884–892.
[29] B. Yang, J. Luo, S.T. Li, Color image fusion with extend joint sparse model, in: Proceedings of the IEEE International Conference on Pattern Recognition (ICPR), 2012, pp. 376–379.
[30] H.T. Yin, S.T. Li, L. Fang, Simultaneous image fusion and super-resolution using sparse representation, Inf. Fusion 14 (3) (2013) 229–240.
[31] Y. Liu, Z.F. Wang, Multi-focus image fusion based on sparse representation with adaptive sparse domain selection, in: Proceedings of the IEEE International Conference on Image and Graphics (ICIG), 2013, pp. 591–596.
[32] H.T. Yin, S.T. Li, Multimodal image fusion with joint sparsity model, Opt. Eng. 50 (6) (2011) 067007-067007-10.
[33] Q. Liu, C.M. Zhang, Q. Guo, Adaptive sparse coding on PCA dictionary for image denoising, The Visual Computer, 2015, pp. 1–15.
[34] K. Engan, S.O. Aase, J. Hakon Husoy, Method of optimal directions for frame design, in: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 1999, pp. 2443–2446.
[35] M. Aharon, M. Elad, A. Bruckstein, K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation, IEEE Trans. Signal Process. 54 (11) (2006) 4311–4322.
[36] Q. Zhang, B.X. Li, Discriminative K-SVD for dictionary learning in face recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010, pp. 2691–2698.
[37] R. Rubinstein, T. Peleg, M. Elad, Analysis K-SVD: a dictionary-learning algorithm for the analysis sparse model, IEEE Trans. Signal Process. 61 (3) (2013) 661–677.
[38] J. Jose, J.N. Patel, S. Patnaik, Application of regression analysis in K-SVD dictionary learning, Opt.-Int. J. Light Electron Opt. 126 (20) (2015) 2295–2299.
[39] T. Mertens, J. Kautz, F. Van Reeth, Exposure fusion, in: Proceedings of the IEEE Pacific Conference on Computer Graphics and Applications, 2007, pp. 382–390.
[40] K.L. Hua, H.C. Wang, A.H. Rusdi, A novel multi-focus image fusion algorithm based on random walks, J. Vis. Commun. Image Represent. 25 (5) (2014) 951–962.
[41] M. Elad, M. Aharon, Image denoising via sparse and redundant representations over learned dictionaries, IEEE Trans. Image Process. 15 (2) (2006) 3736–3745.
[42] M. Elad, M. Aharon, Image denoising via learned dictionaries and sparse representation, in: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2006, pp. 895–900.
[43] H. Liu, Y. Liu, F. Sun, Robust exemplar extraction using structured sparse coding, IEEE Trans. Neural Netw. Learn. Syst. 26 (8) (2015) 1816–1821.

H. Yin received the B.E. degree, the M.E. degree and the Ph.D. degree from the College of Automation, Chongqing University, China. Prof. Yin joined the College of Automation at Chongqing University in 2009. His research interests include intelligent image processing and machine vision. Prof. Yin serves as an Associate Editor of International Journal of Complex Systems and as an invited reviewer for IEEE Transaction of the ASABE, Neurocomputing and IEEE Transactions on Cybernetics.

Yanxia Li received the B.E. degree from the College of Automation, Chongqing University, China. She is currently a master's student in the College of Automation, Chongqing University. Her research interests include information fusion and machine vision.

Yi Chai received the B.E. degree from National University of Defense Technology in 1982. He received the M.Sc. and Ph.D. degrees from Chongqing University in 1994 and 2001, respectively. He is the associate dean of the College of Automation, Chongqing University. His research interests include information processing, integration and control, and computer network and system control.

Zhaodong Liu received the B.E. degree from the College of Automation, Chongqing University, China. He is currently working towards the Ph.D. degree in the College of Automation, Chongqing University. His research interests include intelligent image processing and machine vision.

Zhiqin Zhu received the B.E. degree in electronic engineering from Chongqing University in 2010. Currently, he is a Ph.D. candidate in the College of Automation, Chongqing University. His research interests include image processing and machine learning.