
IMPROVING IMAGE QUALITY: A TECHNIQUE FOR LONG

DISTANCE EARTH-BASED OBJECTS

Gabriel Scarmana
Department of Transport and Main Roads
Queensland Government
Gabriel.Z.Scarmana@tmr.qld.gov.au

Abstract

A technique for improving images of earth-based long distance objects is


discussed. The input data come from a large number of short-exposure image
frames of an object which appears distant within a scene imaged from a
remote location. The reconstruction of a sharper and/or improved image
is achieved in two steps:

(1) Sub-sets of short exposure images of the same scene are merged
separately using a method referred to as image stacking. This preliminary
step is used to increase the signal-to-noise ratio while effectively freezing
atmospheric distortions and thereby retaining high frequency spatial
information.

(2) Once the stacking process is complete, the composites obtained from each
subset in step (1) are combined via image super-resolution techniques.
Super-resolution is a term given to a single image product which has been
produced by combining images of the same scene, using algorithms that
purport to increase the resolution of the final product. The theory is that
subtle sub-pixel shifts in each image will, when combined, provide for
improved spatial resolution, as if the images were sampled at more points
than detected by the sensor array. In this second step the final higher-resolution
image is obtained by mapping a model of the image formation
process using local translations, or shifts, among the composite images of
step (1). These pixel shifts, if they exist, are determined by way of a
rigorous least-squares area-based image matching scheme.

This paper discusses the development of the above two step process in detail
and concludes with an evaluation of its implementation using practical
examples and/or experiments. The aim is to demonstrate the potential
application of this technique for long range surveillance systems.

1. Introduction

Image sequences are degraded by the loss of resolution due to down-sampling


(not meeting the sampling theorem) of the images and to the integration over
the sensor area. However, the knowledge of sub-pixel motions between frames
depicting the same scene usually allows the reconstruction of high resolution
images from low resolution image sequences.

The author acknowledges the rapid advances that have been made in
hardware solutions for image sensors to solve the problem of increasing the
resolution of digital imagery. The work presented here does not detract from
those advances; rather, it provides a complementary technique that is hardware
independent. This technique can increase the resolution of any image
sequence taken from any digital image-capturing device.

In this work the sequence of images is acquired using an off-the-shelf digital


camera, which under-samples an object of interest within the same scene. The
reconstruction of a sharper and/or improved image is achieved in two steps:

1. A large number of images in a scene of interest are taken with a digital


camera. Sub-sets of these images are thereafter pre-processed using a
method referred to as image stacking. Image stacking is the process of
merging aligned images together in order to enhance detail, suppress
noise, remove undesired motion effects, or otherwise leverage data
contained across multiple exposures of the same scene. In general
terms, image stacking consists of taking several hundred images of the
same scene, registering (aligning their centres) then stacking them
(adding them all together pixel-by-pixel then dividing each pixel by the
number of images).

2. Once the stacking process is complete the improved images generated


by each subset in step 1 are combined via image Super-Resolution (SR)
techniques. SR is a term given to a single image product or composite
that has been produced by combining images of the same scene, using
procedures that increase the resolution of the final product. The theory is
that accurate sub-pixel shifts in each image will, when the images are
combined, provide for a higher spatial resolution, as if the images were
sampled at more points than were detected by the sensor array
(Zhouchen and Heung-Yeung, 2004).

2. Notes on image stacking

Image stacking is a popular image-processing method amongst astronomical
photographers. The same technique can be applied to any situation where very
similar but not identical images can be captured over a period of time; in other
words, in situations where the scene is not changing drastically due to motion
or varying light and shadow. As mentioned earlier, this pre-processing step can
be used to: (1) reduce artifacts created by compression, (2) increase the
signal-to-noise ratio without compromising detail in the image, (3) increase the
dynamic range of an image and (4) effectively freeze atmospheric distortions
while retaining high frequency spatial information.

Image stacking works on the assumption that the noise in an image is truly
random. This way, random fluctuations above and below actual image data will
gradually even out as more and more images are stacked. In order to perform
a stacking operation the images must first be aligned. An alignment accuracy
of between 0.5 and 1 pixel may be sufficient for this operation (Bovik, 2005).
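The averaging principle behind stacking can be sketched as follows (a minimal illustration assuming the frames have already been registered; the function name is purely illustrative):

```python
import numpy as np

def stack_frames(frames):
    """Average a sequence of pre-aligned frames pixel-by-pixel.

    If the noise is zero-mean and independent between frames, the
    standard deviation of the residual noise falls as 1/sqrt(N).
    """
    stack = np.stack([np.asarray(f, dtype=np.float64) for f in frames])
    return stack.mean(axis=0)
```

Stacking 100 frames would thus reduce the random-noise level roughly tenfold, consistent with the gain in signal-to-noise ratio described above.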

Alignment techniques compare portions of images against one another. This is


carried out by way of a registration procedure referred to as normalised cross-
correlation (Russ, 2007). The technique allows images to be aligned without
using control points in the registration procedure. Tests by the author showed
that the alignment process is satisfactory when the correlation coefficient
between two aligned images is greater than or equal to 0.999. Images that do
not meet this figure are simply discarded from the process.
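This acceptance test can be sketched as follows (the normalised cross-correlation here is computed globally over two equal-size images; in practice it would be evaluated while searching over candidate alignments, and the function names are illustrative):

```python
import numpy as np

def ncc(a, b):
    """Normalised cross-correlation coefficient between two equal-size images."""
    a = np.array(a, dtype=np.float64).ravel()
    b = np.array(b, dtype=np.float64).ravel()
    a -= a.mean()
    b -= b.mean()
    return float((a @ b) / np.sqrt((a @ a) * (b @ b)))

def keep_for_stacking(reference, candidate, threshold=0.999):
    """Accept a frame for stacking only if its correlation with the
    reference meets the 0.999 figure quoted in the text."""
    return ncc(reference, candidate) >= threshold
```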

As mentioned earlier, image stacking is also useful for reducing the effects of
compression. This applies to imagery which has been compressed in a lossy
manner, such as by the JPEG (Joint Photographic Experts Group) protocol, in
order to reduce storage requirements. Lossy compression means that data are
lost during compression, so the quality after decoding is lower than that of the
original picture (Gonzalez and Woods, 2008).

Lossy compression protocols introduce several distortions which can


complicate the proposed enhancement process. For example, most
compression algorithms divide the original image into blocks which are
processed independently, thus creating problems of continuity between blocks
after decompression. Moreover, at high compression ratios (>20:1) the blocking
effect is especially obvious in flat areas of an image. In areas with lots of detail,
artefacts referred to as ringing or mosquito noise also become noticeable
(Zhouchen and Heung-Yeung, 2004).

3. Notes on image Super-Resolution (SR)

The majority of the literature on SR describes the use of three basic steps: (1)
accurate estimation of shifts among the different low-resolution images at a
sub-pixel level; (2) projecting or mapping the pixels of the low-resolution images
onto a higher resolution grid using the shifts detected and; (3) interpolating or
solving sets of equations derived from the geometric relationships existing
between low and high-resolution pixels. The method for estimating sub-pixel
shifts between images of the same scene is based on a first-order Taylor series
(Vandewalle et al., 2005) and can determine sub-pixel shifts between images
with an accuracy of 0.1 pixels.
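A gradient-based estimator of this kind can be sketched as follows (a simplified, translation-only version; the cited method of Vandewalle et al. works in the frequency domain and also recovers small rotations):

```python
import numpy as np

def estimate_subpixel_shift(ref, moved):
    """Estimate a small translation (dx, dy) such that
    moved(x, y) ~= ref(x + dx, y + dy), using the first-order
    Taylor expansion  moved - ref ~= dx * dI/dx + dy * dI/dy
    and solving for (dx, dy) by least squares."""
    ref = np.asarray(ref, dtype=np.float64)
    moved = np.asarray(moved, dtype=np.float64)
    gy, gx = np.gradient(ref)                      # image gradients
    A = np.column_stack([gx.ravel(), gy.ravel()])  # design matrix
    b = (moved - ref).ravel()                      # intensity differences
    (dx, dy), *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(dx), float(dy)
```

On smooth, low-pass-filtered imagery this recovers shifts of a fraction of a pixel, which is why the filtering step above precedes the matching.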

For correct detection of the shifts between two images, the images must
contain features that make it possible to match two under-sampled images.
Very sharp edges and small details are the most affected by aliasing, so they
are unreliable for estimating these shifts. Uniform areas are likewise
ineffective, since they are translation invariant (Farsiu et al., 2004).

The best features are slow transitions between two areas of grey values as
these areas are generally unaffected by aliasing. Such portions of an image
need not be detected specifically, although their presence is very important for
an accurate result. Hence, before attempting to match a given sequence of
images of the same scene to a sub-pixel level it is recommended to uniformly
apply a low-pass filter to each image. The purpose of a low-pass filter, as
shown in Figure 1, is to smooth:

• Sharp edges and small details


• Sudden changes of intensity values and
• Aliasing effects


Figure 1 – (a) A low-resolution image of the aerial view and (b) the effect of
applying a low-pass filter.

The sub-pixel motion estimator adopted here determines the x- and y-shifts and
slight rotations between any two images, but what is really required is the
accurate relative positions of a sequence of images. By calculating the shifts
with respect to a single reference image, only one realization of the relative
positions is obtained. By repeating the procedure for another reference image,
a second estimate for the relative positions is made.

Continuing to repeat this process for all images in the sequence, a better
estimate of the relative shifts, image to image, can be found. The statistical
measure used to determine the 'best' possible value for all possible
combinations of the motion vectors between a set of shifted low-resolution
images is the vector median. If the vector mean were taken instead of the
median, the final motion vector would be an entirely new vector, and not
one of the vectors originally estimated. In addition, the mean is less robust
than the median when outliers are present (Spiegel and Stephens, 1999).
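The vector median can be computed directly from its definition, i.e. the estimated motion vector whose summed distance to all the other estimates is smallest (a small illustrative sketch):

```python
import numpy as np

def vector_median(vectors):
    """Return the member of `vectors` that minimises the summed Euclidean
    distance to all other members. Unlike the component-wise mean, the
    result is always one of the originally estimated motion vectors."""
    v = np.asarray(vectors, dtype=np.float64)
    # pairwise distance matrix, summed per candidate vector
    totals = np.linalg.norm(v[:, None, :] - v[None, :, :], axis=-1).sum(axis=1)
    return v[int(np.argmin(totals))]
```

With one gross outlier among the shift estimates, the median stays on a genuinely observed vector, whereas the mean is dragged towards the outlier.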

4. Image reconstruction

Once all the improved low-resolution images (as obtained from the stacking
operation) have been processed and matched to a sub-pixel level, they are
projected or mapped onto a uniformly spaced high-resolution grid (see Figure 2).
The values of the randomly distributed pixels and shifts of these images can
then be processed to generate an image with a higher resolution. A weighted
arithmetic mean can be used for this purpose.

A weighted arithmetic mean associates each known pixel of the low-resolution
images with the high-resolution pixels. For example, in Figure 2 the low-resolution
pixel C1 can be related to the pixels of the high-resolution grid by way of
Equation 1. In Figure 2 the Xi (i=1…25) represent the high-resolution pixels
whereas the Cn (n=1…6) are the low-resolution pixels.

Figure 2: An idealized image enhancement set-up.

After C1 is related to the high-resolution grid, the process moves on to the next
low-resolution data pixel (i.e. C2) where another equation is constructed. This
sequence of equations may be thought of as “observation equations” where the
unknowns are the values of the high-resolution pixels (Xi). These linear
equations can be solved by traditional systems of simultaneous equations
(Fryer and McIntosh, 2001).

C1 = (w12X12 + w13X13 + w16X16 + w17X17 + … + w23X23) / (w12 + w13 + w16 + w17 + … + w23)    (1)

The weights (w) are defined by the inverse of the distance that separates the
low-resolution pixel from the unknown high-resolution pixels that fall within a
circle of constant radius (R). This circle is centred on each low-resolution pixel
as shown in Figure 2. The dimension of the radius R depends on the
magnification factor required. As a general rule, if the magnification factor is
chosen to be equal to 2 then the minimum radius for the circle required to
search all the high-resolution pixels is 2√2. On the other hand, if the chosen
magnification factor is n then the minimum search radius is taken as n√n, etc.
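Under these definitions, the inverse-distance weighting can be sketched as follows (for simplicity the sketch scatters each low-resolution sample's contribution onto the high-resolution pixels inside its search circle and normalises, rather than solving the full observation-equation system of Equation 1; names are illustrative):

```python
import numpy as np

def idw_scatter(samples, hr_shape, radius):
    """Approximate reconstruction: each low-resolution sample, given at a
    sub-pixel position (x, y) on the high-resolution grid, contributes to
    the HR pixels within `radius`, weighted by the inverse of the
    separating distance.

    `samples` is an iterable of (x, y, value) triples."""
    num = np.zeros(hr_shape)
    den = np.zeros(hr_shape)
    ys, xs = np.mgrid[0:hr_shape[0], 0:hr_shape[1]].astype(np.float64)
    for sx, sy, val in samples:
        d = np.hypot(xs - sx, ys - sy)
        # inverse-distance weights inside the circle, zero outside
        w = np.where(d <= radius, 1.0 / np.maximum(d, 1e-6), 0.0)
        num += w * val
        den += w
    out = np.zeros(hr_shape)
    covered = den > 0
    out[covered] = num[covered] / den[covered]
    return out
```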

The example in Figure 2 relates to a magnification factor of 4, where the final
high-resolution composite will have 4 times more pixels than any of the low-resolution
images. To comply with sampling theory, R must ensure that an
overlap occurs between the circles, as it is important that each of the
unknown high-resolution pixels (Xi) appears at least twice in different
observation equations (Scarmana, 2009).

Note that there will be one equation for each low-resolution pixel, the number
of equations being at least equal to or greater than the number of desired high-resolution
pixels in the final enhanced image. Hence, when (say) 10 suitably
overlapping images, each of modest size 320x320, are considered, it becomes
apparent that 320x320x10 ≈ 1.02 million observation equations could be
formed. If a magnification factor of 2 is chosen, then the resultant resolution-enhanced
image may require twice as many equations.

Although more computationally expensive than alternative reconstruction
techniques based on direct interpolation methods, this reconstruction system
gives accurate estimates of the error at each computed point, thus providing a
measure of confidence and reliability for the accuracy and precision of each
high-resolution pixel of the enhanced image.
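The observation-equation formulation itself can be sketched as a small dense least-squares solve (illustrative only; a practical implementation would use sparse matrices, and the per-pixel error estimates mentioned above would come from the least-squares covariance):

```python
import numpy as np

def solve_hr_pixels(observations, n_unknowns):
    """Solve stacked observation equations of the form of Equation 1.

    Each observation is (hr_indices, weights, lr_value): one equation per
    low-resolution pixel, whose value is modelled as the normalised
    weighted mean of the high-resolution pixels inside its circle."""
    A = np.zeros((len(observations), n_unknowns))
    b = np.zeros(len(observations))
    for row, (idx, w, val) in enumerate(observations):
        w = np.asarray(w, dtype=np.float64)
        A[row, list(idx)] = w / w.sum()   # normalised weights
        b[row] = val
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x
```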

5. Reconstruction with synthetic images

The proposed image enhancement process was first tested for digital imaging
applications using synthetic data. The performance of the method was
thereafter tested with real data as extracted from a sequence of images of an
object within a static scene. In this synthetic experiment, the ‘true’ image was
known prior to the enhancement and thus the accuracy of the enhancement
could be investigated and quantified.

A set of 600 grey-scale images of the aerial view (256 grey levels) was
derived by down-sampling the original image using a weighted mean average of
neighbouring pixels and pre-assigned sub-pixel shift values. Each of the
600 images was also JPEG compressed at the same compression ratio
(i.e., 10:1). The size of the original image was 512x512 pixels, whereas the size
of the under-sampled images was 128x128.
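A simplified version of this degradation model might look as follows (integer shifts applied on the high-resolution grid become sub-pixel shifts after down-sampling; the block average stands in for the weighted mean of neighbouring pixels, and the JPEG step is omitted):

```python
import numpy as np

def make_lr_frame(hr, factor, shift_x, shift_y):
    """Shift the high-resolution image by (shift_x, shift_y) HR pixels,
    then average factor x factor blocks. A shift of k HR pixels becomes
    a k/factor sub-pixel shift at low resolution."""
    shifted = np.roll(np.roll(hr, shift_y, axis=0), shift_x, axis=1)
    h, w = shifted.shape
    h, w = h - h % factor, w - w % factor
    blocks = shifted[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))
```

With a 512x512 original and factor 4, each call yields a 128x128 frame, matching the sizes quoted above.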

Random 'salt and pepper' noise was added to each of these 600 images: a
percentage (3%) of the total number of pixels was changed to either totally
black or totally white. As illustrated in Figure 3(a), the effect is similar to
sprinkling white and black dots on the image. One example where salt and
pepper noise arises is in transmitting images over noisy digital links (Russ, 2007).
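This noise model can be reproduced as follows (the 3% figure and the 0/255 extremes follow the description above; the RNG seed is arbitrary):

```python
import numpy as np

def add_salt_pepper(img, fraction=0.03, seed=None):
    """Set a random `fraction` of the pixels to pure black (0) or
    pure white (255), in equal proportion."""
    rng = np.random.default_rng(seed)
    out = img.copy()
    n = int(round(fraction * img.size))
    hit = rng.choice(img.size, size=n, replace=False)
    flat = out.ravel()            # view into `out` for a contiguous array
    flat[hit[: n // 2]] = 0       # "pepper"
    flat[hit[n // 2:]] = 255      # "salt"
    return out
```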


Figure 3 - (a) one of the 600 low-resolution, blurred and noisy images of the aerial
view and (b) one of the improved 20 images obtained from the stacking process.

No rotations were applied in this test. Rotations in the images would have
added extra parameters to the enhancement process and may have detracted
from the strength of the conclusions reached in the experiments. Correlation
obviously exists between the image plane of a digital camera and orientation
parameters such as tilts, rotations and affinity/obliquity of the sensor. In a
controlled experiment where the aim is to demonstrate the use of a process to
enhance image resolution per se, it was thought unwise to introduce such
complications.

The 600 images were then divided into subsets of 30 images each. Image
stacking was then applied to each subset so as to obtain 20 composites of
improved quality (see the example in Figure 3(b)). Each of these composites
displayed an improved image in which the effects of compression and noise
were virtually eliminated. Note that the stacking process was not used to
increase image resolution; its purpose is to improve quality and eliminate
unwanted distorting effects (i.e., noise) so as to prepare the images for the SR
step. Subsets of 30 images were selected because this number proved
sufficiently effective in a number of synthetic experiments conducted on images
with the same characteristics and distortions as used in this test. Figure 4
shows the result of combining the 20 composites via SR.

Figure 4 - The final high resolution image (512x512) as constructed using SR. The
enhancement is a result of combining 20 (128x128) image composites as shown in
3(b). The enhanced image contains 4 times more real pixels than any of the original
low-resolution images.

6. Application to real imaging

A set of 350 images was taken of a weather radar station located approximately
1 km from the camera position. The scene is shown in Figure 5. The
camera resolution was set to its maximum quality (5 megapixels), but only the
section of the scene containing the dome of the radar station, shown in the box,
was enhanced in this experiment. The dome is approximately 12 metres in
diameter and is located about 150 m above mean sea level. The images were
taken with the camera fixed on a tripod. An enlarged view of the radar dome,
as extracted from one of the raw images taken by the camera, is shown in
Figure 5(b).

The proposed enhancement technique produced the results shown in Figures
5(c) and 5(d). This enhancement was obtained following the same steps
applied in the synthetic example outlined in the previous section, the only
difference being that 7 subsets of 50 low-resolution images were created. The
resulting enhanced image of the radar station contains 4 times more pixels
(240x400) than any of the original low-resolution images (60x100).



Figure 5 – (a) A view of the scene as taken by a 5-megapixel digital camera located 1
km away from the object of interest and (b) the area of interest; (c) is the improved
view of the same scene shown in (a) and (d) is an enlarged improved view of the
radar's dome after processing 350 low-resolution images using the proposed method.

Although this experiment relates to a grey scale sequence, the same process
can be applied when using colour. Colour images can be considered as three
separate images containing red, green and blue components (RGB). Each of
these components or channels can be enhanced independently and then fused
to produce a colour image with enhanced resolution (Bovik, 2005 and Rees
2007).

The underlying premise is that for any colour image sequence, the motion
between adjacent frames for each colour channel should be exactly the same
(Farsiu et al., 2004). In other words, there is only one actual motion field which
describes the sub-pixel shifts from one frame to the next. In practice, however,
when the motion estimation is performed on each channel independently, the
motion vectors may differ slightly between the different colour channels, thus
requiring more complex statistical computations (Scarmana, 2009).
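The channel-wise extension is then a thin wrapper around whatever single-channel pipeline is used (a sketch; `enhance` stands in for the stacking-plus-SR chain applied to one grey-scale channel):

```python
import numpy as np

def enhance_colour(rgb, enhance):
    """Enhance the R, G and B channels independently with the supplied
    single-channel function, then re-fuse them into one colour image."""
    return np.stack([enhance(rgb[..., c]) for c in range(3)], axis=-1)
```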

7. Number of low-resolution images required

The required number of low-resolution images generally depends on the
distribution of the sub-pixel shifts as well as on the signal-to-noise ratio of the
imagery. For instance, to minimise the influence of noise it is important that the
distribution of the shifts between the low-resolution images be as complete as
possible. This is illustrated in an analogous manner in the example of Figure 6
(after Hendriks and van Vliet, 1999). In this figure, any three exact samples
define a circle (left diagram).

Figure 6 - The effect of noise on geometry determination.

However, if the samples contain noise and are situated close to one another,
almost any circle will fit (middle). Positioning these samples far apart ensures
a more correct representation in spite of the noise (right). The reconstruction of
a higher-resolution image from the minimum number of low-resolution images is
possible, but high accuracy should not always be expected, especially for
higher magnification factors (>4).

High magnification factors require large numbers of low-resolution images,
meaning that these images must lie relatively close to one another, that is,
with relatively small shifts between them. The accuracy with which those
offsets are detected will clearly affect the accuracy of the final high-resolution
image, as the uncertainty in a sub-pixel shift's determination may be of the
same magnitude as the shift itself.

Conclusions

The problem of aliasing reduction and resolution enhancement can be
addressed by exploiting multiple frames that offer unique perspectives of a
specific scene of interest. The focus here was to exploit frame-to-frame
translational shifts, or motions, that may result from line-of-sight jitter of a
sensor mounted on a still platform. However, exploiting these sub-pixel
motions requires accurate estimates of them.

In this context, a method for enhancing the resolution of a compressed and


noisy sequence using a multi-frame image enhancement approach has been

presented. The process uses two techniques, referred to as image stacking
followed by Super-Resolution. The technique does not rely on control points for
the accurate matching or registration of the images.

The registration or matching methodology and subsequent use of the proposed


enhancement technique may lead to a general approach to the problem of
generating a higher resolution image from compressed and noisy sequences of
slightly offset images. The application and effectiveness of the enhancement
process have been demonstrated in 2D tests, and refinements to the technique
are being undertaken to increase the accuracy achievable at larger image
magnifications.

This may extend the range of applications which could benefit from this
device-independent image enhancement process, possibly adapting the
method to a generalised scheme whereby both sensors and objects of interest
are dynamic and the illumination is non-uniform. It is the author's belief that this
enhancement technique may be ideal for surveillance systems.

References:

Farsiu S., Robinson D., Elad M. and Milanfar P., 2004. "Advances and
Challenges in SR", International Journal of Imaging Systems and Technology,
Vol. 14, No. 2, pp. 47-57, August.

Fryer J. and McIntosh K.L., 2001. "Enhancement of Image Resolution in Digital
Photogrammetry". Photogrammetric Engineering & Remote Sensing, Vol. 67,
No. 6, pp. 741-749.

Gonzalez R.C. and Woods R.E., 2008. "Digital Image Processing". 3rd Edition,
Prentice Hall. 954 pages.

Hendriks L.C. and van Vliet L.J., 1999. "Resolution Enhancement of a
Sequence of Undersampled Shifted Images". Proceedings of the 5th Annual
Conference of the Advanced School for Computing and Imaging (ASCI 1999),
Heijen, NL, June 15-17. ASCI, Delft, pp. 95-102.

Rees W. G. 2007. “Physical Principles of Remote Sensing”. Second edition.


Cambridge University Press.

Russ C. J., 2007. “The Image Processing Handbook”, Published by CRC Press,
ISBN 0849372542, 9780849372544, 817 pages.

Spiegel M.R. and Stephens L., 1999. "Theory and Problems of Statistics",
Schaum's Outline Series, McGraw-Hill Book Company, 556 pages.

Scarmana G., 2009. “High-resolution Image Generation Using Warping


Transformations”. SIGMAP 2009. Proceedings of the International Conference
on Signal Processing and Multimedia Applications, Milan, Italy, July 7-10.

Vandewalle P., Susstrunk S. and Vetterli M., 2005. “A Frequency Domain
Approach to Registration of Aliased Images with Application to SR”, EURASIP
Journal on Applied Signal Processing.

Zhouchen, L. and Heung-Yeung, S., 2004. "Fundamental Limits of
Reconstruction-Based SR Algorithms under Local Translation". IEEE
Transactions on Pattern Analysis and Machine Intelligence, Vol. 26, No. 1,
January.
