Delp, E.J., Allebach, J., Bouman, C.A., Rajala, S.A., Bose, N.K., Sibul, L.H., Wolf, W., Zhang, Y-Q. "Multidimensional Signal Processing"
The Electrical Engineering Handbook
Ed. Richard C. Dorf
Boca Raton: CRC Press LLC, 2000
2000 by CRC Press LLC
Multidimensional Signal Processing
17.1 Digital Image Processing
Image Capture • Point Operations • Image Enhancement • Digital Image Compression • Reconstruction • Edge Detection • Analysis and Computer Vision
17.2 Video Signal Processing
Sampling • Quantization • Vector Quantization • Video Compression • Information-Preserving Coders • Predictive Coding • Motion-Compensated Predictive Coding • Transform Coding • Subband Coding • HDTV • Motion Estimation Techniques • Token Matching Methods • Image Quality and Visual Perception • Visual Perception
17.3 Sensor Array Processing
Spatial Arrays, Beamformers, and FIR Filters • Discrete Arrays for Beamforming • Discrete Arrays and Polynomials • Velocity Filtering
17.4 Video Processing Architectures
Computational Techniques • Heterogeneous Multiprocessors • Video Signal Processors • Instruction Set Extensions
17.5 MPEG-4 Based Multimedia Information System
MPEG-4 Multimedia System
17.1 Digital Image Processing
Edward J. Delp, Jan Allebach, and Charles A. Bouman
What is a digital image? What is digital image processing? Why does the use of computers to process pictures seem to be everywhere? The space program, robots, and even people with personal computers are using digital image processing techniques. In this section we shall describe what a digital image is, how one obtains digital images, what the problems with digital images are (they are not trouble-free), and finally how these images are used by computers. A discussion of processing the images is presented later in the section. At the end of this section is a bibliography of selected references on digital image processing.
The use of computers to process pictures is about 30 years old. While some work was done more than 50 years ago, the year 1960 is usually the accepted date when serious work was started in such areas as optical character recognition, image coding, and the space program. NASA's Ranger moon mission was one of the first programs to return digital images from space. The Jet Propulsion Laboratory (JPL) established one of the early general-purpose image processing facilities using second-generation computer technology.
The early attempts at digital image processing were hampered by the relatively slow computers used (e.g., the IBM 7094), by the expense of computer time itself, and by the fact that image digitizers had to be built by the research centers. It was not until the late 1960s that image processing hardware was generally available (although expensive). Today it is possible to put together a small laboratory system for less than $60,000; a system based on a popular home computer can be assembled for about $5,000. As the cost of computer hardware decreases, more uses of digital image processing will appear in all facets of life. Some people have predicted that by the turn of the century at least 50% of the images we handle in our private and professional lives will have been processed on a computer.
Edward J. Delp, Purdue University
Jan Allebach, Purdue University
Charles A. Bouman, Purdue University
Sarah A. Rajala, North Carolina State University
N. K. Bose, Pennsylvania State University
L. H. Sibul, Pennsylvania State University
Wayne Wolf, Princeton University
Ya-Qin Zhang, Microsoft Research, China
Image Capture
A digital image is nothing more than a matrix of numbers. The question is: how does this matrix represent a real image that one sees on a computer screen?
As in all imaging processes, whether analog or digital, one first starts with a sensor (or transducer) that converts the original imaging energy into an electrical signal. These sensors, for instance, could be the photomultiplier tubes used in an x-ray system that convert the x-ray energy into a known electrical voltage. The transducer system used in ultrasound imaging is an example where sound pressure is converted to electrical energy; a simple TV camera is perhaps the most ubiquitous example. An important fact to note is that the process of conversion from one energy form to an electrical signal is not necessarily a linear process. In other words, a proportional change in the input energy to the sensor will not always cause the same proportional change in the output electrical signal. In many cases calibration data are obtained in the laboratory so that the relationship between the input energy and output electrical signal is known. These data are necessary because some transducer performance characteristics change with age and other usage factors.
The sensor is not the only thing needed to form an image in an imaging system. The sensor must have some spatial extent before an image is formed. By spatial extent we mean that the sensor must not be a simple point source examining only one location of energy output. To explain this further, let us examine two types of imaging sensors: a CCD video camera and the ultrasound transducer used in many medical imaging applications.
The CCD camera consists of an array of light sensors known as charge-coupled devices. The image is formed by examining the output of each sensor in a preset order for a finite time. The electronics of the system then form an electrical signal which produces an image that is shown on a cathode-ray tube (CRT) display. The image is formed because there is an array of sensors, each one examining only one spatial location of the region to be sensed.
The process of sampling the output of the sensor array in a particular order is known as scanning. Scanning is the typical method used to convert a two-dimensional energy signal or image to a one-dimensional electrical signal that can be handled by the computer. (An image can be thought of as an energy field with spatial extent.) Another form of scanning is used in ultrasonic imaging. In this application there is only one sensor instead of an array of sensors. The ultrasound transducer is moved or steered (either mechanically or electrically) to various spatial locations on the patient's chest or stomach. As the sensor is moved to each location, the output electrical signal of the sensor is sampled and the electronics of the system then form a television-like signal which is displayed. Nearly all the transducers used in imaging form an image by using either an array of sensors or a single sensor that is moved to each spatial location.
One immediately observes that both of the approaches discussed above are equivalent in that the energy is sensed at various spatial locations of the object to be imaged. This energy is then converted to an electrical signal by the transducer. The image formation processes just described are classical analog image formation, with the distance between the sensor locations limiting the spatial resolution in the system. In the array sensors, resolution is determined by how close together the sensors are located in the array. In the single-sensor approach, the spatial resolution is limited by how far the sensor is moved between samples. In an actual system spatial resolution is also determined by the performance characteristics of the sensor. Here we are assuming, for our purposes, perfect sensors.
In digital image formation one is concerned about two processes: spatial sampling and quantization. Sampling is quite similar to scanning in analog image formation. The second process is known as quantization or analog-to-digital conversion, whereby at each spatial location a number is assigned to the amount of energy the transducer observes at that location. This number is usually proportional to the electrical signal at the output of the transducer. The overall process of sampling and quantization is known as digitization. Sometimes the digitization process is just referred to as analog-to-digital conversion, or A/D conversion; however, the reader should remember that digitization also includes spatial sampling.
The digital image formation process is summarized in Fig. 17.1. The spatial sampling process can be considered as overlaying a grid on the object, with the sensor examining the energy output from each grid box and converting it to an electrical signal. The quantization process then assigns a number to the electrical signal; the result, which is a matrix of numbers, is the digital representation of the image. Each spatial location in the image (or grid) to which a number is assigned is known as a picture element or pixel (or pel). The size of the sampling grid is usually given by the number of pixels on each side of the grid, e.g., 256 × 256, 512 × 512, 488 × 380.
The quantization process is necessary because all information to be processed using computers must be represented by numbers. The quantization process can be thought of as one where the input energy to the transducer is represented by a finite number of energy values. If the energy at a particular pixel location does not take on one of the finite energy values, it is assigned to the closest value. For instance, suppose that we assume a priori that only energy values of 10, 20, 50, and 110 will be represented (the units are of no concern in this example). Suppose at one pixel an energy of 23.5 was observed by the transducer. The A/D converter would then assign this pixel the energy value of 20 (the closest one). Notice that the quantization process makes mistakes; this error in assignment is known as quantization error or quantization noise.
In our example, each pixel is represented by one of four possible values. For ease of representation of the data, it would be simpler to assign to each pixel the index value 0, 1, 2, 3, instead of 10, 20, 50, 110. In fact, this is typically done by the quantization process. One needs a simple table to know that a pixel assigned the value 2 corresponds to an energy of 50. Also, the number of possible energy levels is typically some integer power of two to aid in representation. This power is known as the number of bits needed to represent the energy of each pixel. In our example each pixel is represented by two bits.
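The four-level example above can be sketched in a few lines of Python. This is only an illustration of nearest-value quantization with a lookup table; the names `LEVELS` and `quantize` are hypothetical and do not come from the text.

```python
import math

# Representable energy values from the example in the text.
LEVELS = [10, 20, 50, 110]

def quantize(energy, levels=LEVELS):
    """Return (index, level) of the representable value closest to `energy`."""
    index = min(range(len(levels)), key=lambda i: abs(levels[i] - energy))
    return index, levels[index]

index, level = quantize(23.5)
error = 23.5 - level                                  # quantization error at this pixel
bits_per_pixel = math.ceil(math.log2(len(LEVELS)))    # four levels need two bits

print(index, level, error, bits_per_pixel)            # 1 20 3.5 2
```

The stored value is the index (here 1); the table `LEVELS` plays the role of the "simple table" that maps an index back to an energy.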
One question that immediately arises is how accurate the digital representation of the image is when one compares the digital image with a corresponding analog image. It should first be pointed out that after the digital image is obtained one requires special hardware to convert the matrix of pixels back to an image that can be viewed on a CRT display. The process of converting the digital image back to an image that can be viewed is known as digital-to-analog conversion, or D/A conversion.
FIGURE 17.1 Digital image formation: sampling and quantization.
The quality of representation of the image is determined by how close spatially the pixels are located and how many levels or numbers are used in the quantization, i.e., how coarse or fine the quantization is. The sampling accuracy is usually measured in how many pixels there are in a given area and is cited in pixels per unit length, e.g., pixels/cm. This is known as the spatial sampling rate. One would desire to use the lowest rate possible to minimize the number of pixels needed to represent the object. If the sampling rate is too low, then obviously some details of the object to be imaged will not be represented very well. In fact, there is a mathematical theorem which determines the lowest sampling rate possible to preserve details in the object. This rate is known as the Nyquist sampling rate (named after the late Bell Laboratories engineer Harry Nyquist). The theorem states that the sampling rate must be at least twice the highest spatial frequency (the finest detail) one expects to image in the object. If the object has details as fine as, say, 1 mm, one must take at least 2 pixels/mm. (The Nyquist theorem actually says more than this, but a discussion of the entire theorem is beyond the scope of this section.) If we sample at a lower rate than the theoretical lowest limit, the resulting digital representation of the object will be distorted. This type of distortion or sampling error is known as aliasing error. Aliasing errors usually manifest themselves in the image as moiré patterns (Fig. 17.2). The important point to remember is that there is a lower limit to the spatial sampling rate such that object detail can be maintained. The sampling rate can also be stated as the total number of pixels needed to represent the digital image, i.e., the matrix size (or grid size). One often sees these sampling rates cited as 256 × 256, 512 × 512, and so on. If the same object is imaged with a larger matrix size, the sampling rate has obviously increased. Typically, images are sampled on 256 × 256, 512 × 512, or 1024 × 1024 grids, depending on the application and type of modality. One immediately observes an important issue in digital representation of images: that of the large number of pixels needed to represent the image. A 256 × 256 image has 65,536 pixels and a 512 × 512 image has 262,144 pixels! We shall return to this point later when we discuss processing or storage of these images.
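Aliasing can be illustrated numerically in one dimension. The sketch below assumes a hypothetical sinusoidal pattern of 0.8 cycles/mm sampled at 1 pixel/mm, which is below the 1.6 pixels/mm that the Nyquist rate would require; the function name `sample` and the chosen numbers are illustrative only.

```python
import math

def sample(freq_cycles_per_mm, rate_pixels_per_mm, n_samples):
    """Sample cos(2*pi*f*x) at the positions x = k / rate, k = 0, 1, ..."""
    return [math.cos(2 * math.pi * freq_cycles_per_mm * k / rate_pixels_per_mm)
            for k in range(n_samples)]

detail = 0.8   # cycles/mm present in the object (hypothetical value)
rate = 1.0     # pixels/mm, below the required 2 * 0.8 = 1.6 pixels/mm

undersampled = sample(detail, rate, 10)
alias = sample(rate - detail, rate, 10)   # a much coarser 0.2 cycles/mm pattern

# The two sets of samples coincide: after sampling, the fine pattern is
# indistinguishable from a coarse one that was never in the object.
same = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(undersampled, alias))
print(same)   # True
```

In two dimensions the same effect produces the spurious low-frequency patterns described above.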
FIGURE 17.2 This image shows the effects of aliasing due to sampling the image at too low a rate. The image should be straight lines converging at a point. Because of undersampling, it appears as if there are patterns in the lines at various angles. These are known as moiré patterns.
The quality of the representation of the digital image is also determined by the number of levels or shades of gray that are used in the quantization. If one has more levels, then fewer mistakes will be made in assigning values at the output of the transducer. Figure 17.3 demonstrates how the number of gray levels affects the digital representation of an artery. When a small number of levels are used, the quantization is coarse and the quantization error is large. The quantization error usually manifests itself in the digital image by the appearance of false contouring in the picture. One usually needs at least 6 bits or 64 gray levels to represent an image adequately. Higher-quality imaging systems use 8 bits (256 levels) or even as many as 10 bits (1024 levels) per pixel. In most applications, the human observer cannot distinguish quantization error when there are more than 256 levels. (Many times the number of gray levels is cited in bytes. One byte is 8 bits, i.e., high-quality monochrome digital imaging systems use one byte per pixel.)
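The effect of using fewer gray levels can be sketched by requantizing 8-bit values to a smaller number of bits. This is a minimal illustration, assuming uniform quantization bins; `reduce_gray_levels` is a hypothetical helper name.

```python
def reduce_gray_levels(pixels, bits):
    """Requantize 8-bit gray values to `bits` bits, kept on the 0-255 scale."""
    step = 256 // (2 ** bits)          # width of each uniform quantization bin
    return [(p // step) * step for p in pixels]

ramp = list(range(256))                # a smooth ramp using all 256 gray values

for bits in (8, 4, 3, 2):
    coarse = reduce_gray_levels(ramp, bits)
    # The smooth ramp collapses into 2**bits flat steps; in a picture the
    # boundaries between such steps appear as false contours.
    print(bits, len(set(coarse)))
```

For each setting the number of distinct output values equals the number of levels (256, 16, 8, and 4).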
One of the problems briefly mentioned previously is the large number of pixels needed to represent an image, which translates into a large amount of digital data needed for the representation. A 512 × 512 image with 8 bits/pixel (1 byte/pixel) of gray level representation requires 2,097,152 bits of computer data to describe it. A typical computer file that contains 1000 words usually requires only about 56,000 bits to describe it. The 512 × 512 image is 37 times larger! (A picture is truly worth more than 1000 words.) This data requirement is one of the major problems with digital imaging, given that the storage of digital images in a computer file system is expensive. Perhaps another example will demonstrate this problem. Many computers and word processing systems have the capability of transmitting information over telephone lines to other systems at data rates of 2400 bits per second. At this speed it would require nearly 15 minutes to transmit a 512 × 512 image! Moving objects are imaged digitally by taking digital snapshots of them, i.e., digital video. True digital imaging would acquire about 30 images/s to capture all the important motion in a scene. At 30 images/s, with each image sampled at 512 × 512 and with 8 bits/pixel, the system must handle 62,914,560 bits/s. Only very expensive acquisition systems are capable of handling these large data rates.
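The figures quoted above follow from simple arithmetic, reproduced here as a short sketch. The helper name `image_bits` is hypothetical; the input numbers are those given in the text.

```python
def image_bits(rows, cols, bits_per_pixel):
    """Number of bits needed to store one uncompressed image."""
    return rows * cols * bits_per_pixel

bits = image_bits(512, 512, 8)
text_ratio = bits / 56_000          # compared with a 1000-word text file
modem_minutes = bits / 2400 / 60    # transmission time at 2400 bits/s
video_rate = 30 * bits              # bits/s at 30 images/s

print(bits)                         # 2097152
print(round(text_ratio, 1))         # 37.4
print(round(modem_minutes, 1))      # 14.6
print(video_rate)                   # 62914560
```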
The greatest advantage of digital images is that they can be processed on a computer. Any type of operation that one can do on a computer can be done to a digital image. Recall that a digital image is just a (huge) matrix of numbers. Digital image processing is the process of using a computer to extract useful information from this matrix. Processing that cannot be done optically or with analog systems (such as early video systems) can be easily done on computers. The disadvantage is that a large amount of data needs to be processed and on some small computer systems this can take a long time (hours). We shall examine image processing in more detail in the next subsection and discuss some of the computer hardware issues in a later chapter.
FIGURE 17.3 This image demonstrates the effects of quantization error. The upper left image is a coronary artery image with 8 bits (256 levels or shades of gray) per pixel. The upper right image has 4 bits/pixel (16 levels). The lower left image has 3 bits/pixel (8 levels). The lower right image has 2 bits/pixel (4 levels). Note the false contouring in the images as the number of possible levels in the pixel representation is reduced. This false contouring is the quantization error, and as the number of levels increases the quantization error decreases because fewer mistakes are being made in the representation.
Point Operations
Perhaps the simplest image processing operation is that of modifying the values of individual pixels in an image. These operations are commonly known as point operations. A point operation might be used to highlight certain regions in an image. Suppose one wished to know where all the pixels in a certain gray level region were spatially located in the image. One would modify all those pixel values to 0 (black) or 255 (white) such that the observer could see where they were located.
Another example of a point operation is contrast enhancement or contrast stretching. The pixel values in a particular image may occupy only a small region of the gray level distribution. For instance, the pixels in an image may only take on values between 0 and 63, when they could nominally take on values between 0 and 255. This is sometimes caused by the way the image was digitized and/or by the type of transducer used. When this image is examined on a CRT display the contrast looks washed out. A simple point operation that multiplies each pixel value in the image by four will increase the apparent contrast in the image; the new image now has gray values between 0 and 252. This operation is shown in Fig. 17.4. Possibly the most widely used point operation in medical imaging is pseudo-coloring. In this point operation all the pixels in the image with a particular gray value are assigned a color. Various schemes have been proposed for appropriate pseudo-color tables that assign the gray values to colors. It should be mentioned that point operations are often cascaded, i.e., an image undergoes contrast enhancement and then pseudo-coloring.
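Both point operations, and their cascade, can be sketched as follows. The image is a nested list of gray values; the function names and the particular color table are assumptions made for illustration, not a table proposed in the literature.

```python
def contrast_stretch(image, gain=4, max_value=255):
    """Multiply every gray value by `gain`, clipping at `max_value`."""
    return [[min(max_value, p * gain) for p in row] for row in image]

def pseudo_color(image, table):
    """Replace each gray value by the (R, G, B) color `table` assigns to it."""
    return [[table[p] for p in row] for row in image]

# Hypothetical pseudo-color table: dark values map to blue, bright ones to red.
TABLE = [(g, 0, 255 - g) for g in range(256)]

washed_out = [[0, 10], [40, 63]]            # gray values confined to 0-63
stretched = contrast_stretch(washed_out)    # gray values now span 0-252
colored = pseudo_color(stretched, TABLE)    # cascaded point operations

print(stretched)       # [[0, 40], [160, 252]]
print(colored[1][1])   # (252, 0, 3)
```

Each output pixel depends only on the gray value of the corresponding input pixel, which is what makes these point operations.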
The operations described above can be thought of as operations (or algorithms) that modify the range of the gray levels of the pixels. An important feature that describes a great deal about an image is the histogram of the pixel values. A histogram is a table that lists how many pixels in an image take on a particular gray value. These data are often plotted as a function of the gray value. Point operations are also known as histogram modification or histogram stretching. The contrast enhancement operation shown in Fig. 17.4 modifies the histogram of the resultant image by stretching the gray values from a range of 0-63 to a range of 0-252. Some point operations are such that the resulting histogram of the processed image has a particular shape. A popular form of histogram modification is known as histogram equalization, whereby the pixels are modified such that the histogram of the processed image is almost flat, i.e., all the pixel values occur equally often.
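A histogram and one common form of histogram equalization can be sketched as below. The mapping through the scaled cumulative histogram is a widely used approach and is assumed here; the text does not specify which variant the authors have in mind, and the function names are hypothetical.

```python
def histogram(image, levels=256):
    """Count how many pixels take on each gray value."""
    counts = [0] * levels
    for row in image:
        for p in row:
            counts[p] += 1
    return counts

def equalize(image, levels=256):
    """Map gray values through the scaled cumulative histogram."""
    counts = histogram(image, levels)
    total = sum(counts)
    mapping, running = [], 0
    for c in counts:
        running += c                                   # cumulative pixel count
        mapping.append(round((levels - 1) * running / total))
    return [[mapping[p] for p in row] for row in image]

dark = [[0, 1], [2, 3]]          # all gray values bunched at the dark end
flat = equalize(dark)

print(histogram(dark)[:4])       # [1, 1, 1, 1]
print(flat[0][0], flat[1][1])    # 64 255
```

After equalization the four values are spread across the full 0-255 range rather than bunched at the dark end.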
It is impossible to list all possible types of point operations; however, the important thing to remember is that these operations process one pixel at a time by modifying the pixel based only on its gray level value and not where it is distributed spatially (i.e., location in the pixel matrix). These operations are performed to enhance the image, make it easier to see certain structures or regions in the image, or to force a particular shape to the histogram of the image. They are also used as initial operations in a more complicated image processing algorithm.
FIGURE 17.4 Contrast stretching. The image on the left has gray values between 0 and 63, causing the contrast to look washed out. The image on the right has been contrast enhanced by multiplying the gray levels by four.
Image Enhancement
Image enhancement is the use of image processing algorithms to remove certain types of distortion in an image. The image is enhanced by removing noise, making the edge structures in the image stand out, or any other operation that makes the image look better. Point operations discussed above are generally considered to be enhancement operations. Enhancement also includes operations that use groups of pixels and the spatial location of the pixels in the image.
The most widely used algorithms for enhancement are based on pixel functions that are known as window operations. A window operation performed on an image is nothing more than the process of examining the pixels in a certain region of the image, called the window region, and computing some type of mathematical function derived from the pixels in the window. In most cases the windows are square or rectangular, although other shapes have been used. After the operation is performed, the result of the computation is placed in the center pixel of the window. Consider the case where a 3 × 3 pixel window has been extracted from the image. The values of the pixels in the window, labeled $a_1, a_2, \ldots, a_9$, are used to compute a new pixel value which replaces the value of the center pixel $a_5$, and the window is moved to a new center location until all the pixels in the original image have been processed. As an example of a window operation, suppose we computed the average value of the pixels in the window. This operation is known as smoothing and will tend to reduce noise in the image, but unfortunately it will also tend to blur edge structures in the image.
Another window operation often used is the computation of a linear weighted sum of the pixel values. Let $a_5'$ be the new pixel value that will replace $a_5$ in the original image. We then form
$$a_5' = \sum_{i=1}^{9} \alpha_i a_i \tag{17.1}$$
where the $\alpha_i$'s are any real numbers. For the simple smoothing operation described above we set $\alpha_i = 1/9$ for all $i$. By changing the values of the $\alpha_i$ weights, one can perform different types of enhancement operations on an image. Any window operation that can be described by Eq. 17.1 is known as a linear window operation or convolution operator. If some of the $\alpha_i$ coefficients take on negative values, one can enhance the appearance of edge structures in the image.
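A linear window operation of the form in Eq. 17.1 can be sketched as follows for a 3 × 3 window. The handling of border pixels (copied unchanged) and the particular edge-enhancing weights are assumptions chosen for illustration; the text does not prescribe either.

```python
def window_operation(image, weights):
    """Apply a 3x3 weighted sum to every interior pixel.

    Border pixels are copied unchanged (an assumption made for brevity).
    """
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            total = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    total += weights[dr + 1][dc + 1] * image[r + dr][c + dc]
            out[r][c] = total          # new value of the center pixel
    return out

smooth = [[1 / 9] * 3 for _ in range(3)]            # all weights equal to 1/9
sharpen = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]     # hypothetical weights with negative entries

spot = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]            # one bright pixel on a dark background
print(window_operation(spot, smooth)[1][1])         # about 1.0: the spot is averaged away
print(window_operation(spot, sharpen)[1][1])        # 45.0: the spot is emphasized
```

Only the weights differ between the two calls, which is the sense in which changing the weights changes the type of enhancement.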
It is possible to compute a nonlinear function of the pixels in the window. One of the more powerful nonlinear window operations is that of median filtering. In this operation all the pixels in the window are listed in descending magnitude and the middle, or median, pixel is obtained. The median pixel then is used to replace $a_5$. The median filter is used to remove noise from an image and at the same time preserve the edge structure in the image. More recently there has been a great deal of interest in morphological operators. These are also nonlinear window operations that can be used to extract or enhance shape information in an image.
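A 3 × 3 median filter can be sketched in the same style. As before, leaving the border pixels unchanged is an assumption, and the test image (a vertical edge plus one noisy pixel) is made up for illustration.

```python
import statistics

def median_filter(image):
    """Replace each interior pixel by the median of its 3x3 window."""
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]    # border pixels are left unchanged
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            window = [image[r + dr][c + dc]
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            out[r][c] = statistics.median(window)
    return out

# A vertical edge between gray values 10 and 200, with one noisy pixel.
noisy = [[10, 10, 10, 200, 200] for _ in range(5)]
noisy[2][1] = 255

filtered = median_filter(noisy)
print(filtered[2])   # [10, 10, 10, 200, 200]
```

The isolated value 255 is removed while the step from 10 to 200 stays in place, in contrast to averaging, which would blur the edge.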
In the preceding discussion, all of the window operations were described on 3 × 3 windows. The current research in window operations is directed at using large window sizes, e.g., 9 × 9, 13 × 13, or 21 × 21. The philosophy in this work is that small window sizes only use local information and what one really needs to use is information that is more global in nature.
Image enhancement is often confused with image restoration. Image enhancement is the ad hoc application of various processing algorithms to enhance the appearance of the image. Image restoration is the application of algorithms that use knowledge of the degradation process to enhance or restore the image, e.g., deconvolution algorithms used to remove the effect of the aperture point spread function in blurred images. A discussion of image restoration is beyond the scope of this section.
Digital Image Compression
Image compression refers to the task of reducing the amount of data required to store or transmit a digital image. As discussed earlier, in its natural form, a digital image comprises an array of numbers. Each such number is the sampled value of the image at a pixel (picture element) location. These numbers are represented with finite precision using a fixed number of bits. Until recently, the dominant image size was 512 × 512 pixels with 8 bits or 1 byte per pixel. The total storage size for such an image is $512^2 \approx 0.25 \times 10^6$ bytes or 0.25 Mbytes. When digital image processing first emerged in the 1960s, this was considered to be a formidable amount of data, and so interest in developing ways to reduce this storage requirement arose immediately. Since that time, image compression has continued to be an active area of research. The recent emergence of standards for image coding algorithms and the commercial availability of very large scale integration (VLSI) chips that implement image coding algorithms are indicative of the present maturity of the field, although research activity continues.
With declining memory costs and increasing transmission bandwidths, 0.25 Mbytes is no longer considered to be the large amount of data that it once was. This might suggest that the need for image compression is not as great as previously. Unfortunately (or fortunately, depending on one's point of view), this is not the case because our appetite for image data has also grown enormously over the years. The old "512 × 512 pixels × 1 byte per pixel standard" was a consequence of the spatial and gray scale resolution of sensors and displays that were commonly available until recently. At this time, displays with more than $10^6$ pixels and 24 bits/pixel to allow full color representation (8 bits each for red, green, and blue) are becoming commonplace. Thus, our 0.25-Mbyte standard image size has grown to 3 Mbytes. This is just the tip of the iceberg, however. For example, in desktop printing applications, a 4-color (cyan, magenta, yellow, and black) image of an 8.5 × 11 in. page sampled at 600 dots per inch requires 134 Mbytes. In remote sensing applications, a typical hyperspectral image contains terrain irradiance measurements in each of 200 10-nm-wide spectral bands at 25-m intervals on the ground. Each measurement is recorded with 12-bit precision. Such data are acquired from aircraft or satellite and are used in agriculture, forestry, and other fields concerned with management of natural resources. Storage of these data from just a 10 × 10 km area requires 4800 Mbytes.
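The first three storage figures above can be reproduced with a short calculation. The sketch assumes one byte per color component and a hypothetical helper `storage_bytes`; results are shown in units of $10^6$ bytes, so the 512 × 512 case prints as about 0.26 rather than the rounded 0.25 quoted in the text.

```python
def storage_bytes(pixels, bytes_per_pixel):
    """Uncompressed storage for an image with the given pixel count."""
    return pixels * bytes_per_pixel

gray = storage_bytes(512 * 512, 1)                      # 8-bit gray scale image
display = storage_bytes(10**6, 3)                       # 24-bit color display
page = storage_bytes(int(8.5 * 600) * (11 * 600), 4)    # 4-color page at 600 dots/in.

for name, size in (("gray", gray), ("display", display), ("page", page)):
    print(name, size / 10**6)
```

The printed page works out to 5100 × 6600 dots at 4 bytes each, or roughly 134 × 10^6 bytes.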
Figure 17.5 shows the essential components of an image compression system. At the system input, the image is encoded into its compressed form by the image coder. The compressed image may then be subjected to further digital processing, such as error control coding, encryption, or multiplexing with other data sources, before being used to modulate the analog signal that is actually transmitted through the channel or stored in a storage medium. At the system output, the image is processed step by step to undo each of the operations that was performed on it at the system input. At the final step, the image is decoded into its original uncompressed form by the image decoder. Because of the role of the image encoder and decoder in an image compression system, image coding is often used as a synonym for image compression. If the reconstructed image is identical to the original image, the compression is said to be lossless. Otherwise, it is lossy.
FIGURE 17.5 Overview of an image compression system.
Image compression algorithms depend for their success on two separate factors: redundancy and irrelevancy. Redundancy refers to the fact that each pixel in an image does not take on all possible values with equal probability, and the value that it does take on is not independent of that of the other pixels in the image. If this were not true, the image would appear as a white noise pattern such as that seen when a television receiver is tuned to an unused channel. From an information-theoretic point of view, such an image contains the maximum amount of information. From the point of view of a human or machine interpreter, however, it contains no information at all. Irrelevancy refers to the fact that not all the information in the image is required for its intended application. First, under typical viewing conditions, it is possible to remove some of the information in an image without producing a change that is perceptible to a human observer. This is because of the limited ability of the human viewer to detect small changes in luminance over a large area or larger changes in luminance over a very small area, especially in the presence of detail that may mask these changes. Second, even though some degradation in image quality may be observed as a result of image compression, the degradation may not be objectionable for a particular application, such as teleconferencing. Third, the degradation introduced by image compression may not interfere with the ability of a human or machine to extract the information from the image that is important for a particular application. Lossless compression algorithms can only exploit redundancy, whereas lossy methods may exploit both redundancy and irrelevancy.
A myiiad of appioaches have been pioposed foi image compiession. To biing some semblance of oidei to
the feld, it is helpful to identify those key elements that piovide a ieasonably accuiate desciiption of most
encoding algorithms. These are shown in Fig. 17.6. The first step is feature extraction. Here the image is
partitioned into $N \times N$ blocks of pixels. Within each block, a feature vector is computed which is used to
represent all the pixels within that block. If the feature vector provides a complete description of the block, i.e.,
the block of pixel values can be determined exactly from the feature vector, then the feature is suitable for use
in a lossless compression algorithm. Otherwise, the algorithm will be lossy. For the simplest feature vector, we
let the block size $N = 1$ and take the pixel values to be the features. Another important example for $N = 1$ is
to let the feature be the error in the prediction of the pixel value based on the values of neighboring pixels
which have already been encoded and, hence, whose values would be known at the decoder. This feature forms
the basis for predictive encoding, of which differential pulse-code modulation (DPCM) is a special case. For
larger size blocks, the most important example is to compute a two-dimensional (2-D) Fourier-like transform
of the block of pixels and to use the $N^2$ transform coefficients as the feature vector. The widely used Joint
Photographic Experts Group (JPEG) standard image coder is based on the discrete cosine transform (DCT)
with a block size of $N = 8$. In all of the foregoing examples, the block of pixel values can be reconstructed
exactly from the feature vector. In the last example, the inverse DCT is used. Hence, all these features may form
the basis for a lossless compression algorithm. A feature vector that does not provide a complete description
of the pixel block is a vector consisting of the mean and variance of the pixels within the block and an $N \times N$
binary mask indicating whether or not each pixel exceeds the mean. From this vector, we can only reconstruct
an approximation to the original pixel block which has the same mean and variance as the original. This feature
is the basis for the lossy block truncation coding algorithm. Ideally, the feature vector should be chosen to
provide as nonredundant as possible a representation of the image and to separate those aspects of the image
that are relevant to the viewer from those that are irrelevant.
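The mean/variance/mask feature described above can be illustrated with a minimal sketch. The function names `btc_encode` and `btc_decode` are hypothetical, and the two-level reconstruction rule used in the decoder is the usual textbook choice for block truncation coding; the text only requires that the approximation preserve the block mean and variance, so treat the specific formula as an assumption.

```python
import math

def btc_encode(block):
    """Feature vector for one block: (mean, variance, binary mask)."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    # Mask bit is 1 where the pixel exceeds the block mean.
    mask = [[1 if p > mean else 0 for p in row] for row in block]
    return mean, var, mask

def btc_decode(mean, var, mask):
    """Two-level approximation with the same mean and variance as the original block."""
    m = sum(len(row) for row in mask)   # number of pixels in the block
    q = sum(sum(row) for row in mask)   # number of pixels above the mean
    if q == 0 or q == m:                # flat block: every pixel equals the mean
        return [[mean for _ in row] for row in mask]
    sigma = math.sqrt(var)
    low = mean - sigma * math.sqrt(q / (m - q))
    high = mean + sigma * math.sqrt((m - q) / q)
    return [[high if bit else low for bit in row] for row in mask]

block = [[2, 9, 12, 15],
         [2, 11, 11, 9],
         [2, 3, 12, 15],
         [3, 3, 4, 14]]
mean, var, mask = btc_encode(block)
approx = btc_decode(mean, var, mask)
```

The decoded block takes only two gray levels, so the original pixels cannot be recovered; this is what makes the feature lossy.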
The second step in image encoding is vector quantization. This is essentially a clustering step in which we
partition the feature space into cells, each of which will be represented by a single prototype feature vector.
Since all feature vectors belonging to a given cell are mapped to the same prototype, the quantization process
is irreversible and, hence, cannot be used as part of a lossless compression algorithm. Figure 17.7 shows an
example for a two-dimensional feature space. Each dot corresponds to one feature vector from the image. Each
X marks the prototype used to represent all the feature vectors contained within its quantization cell, the
boundary of which is indicated by the dashed lines. Despite the simplicity with which vector quantization may
be described, the implementation of a vector quantizer is a computationally complex task unless some structure
is imposed on it. The clustering is based on minimizing the distortion between the original and quantized
feature vectors, averaged over the entire image. The distortion measure can be chosen to account for the relative
sensitivity of the human viewer to different kinds of degradation. In one dimension, the vector quantizer reduces
to the Lloyd-Max scalar quantizer.
FIGURE 17.6 Key elements of an image encoder.
The final step in image encoding is coding. Here we convert the stream of prototype feature vectors
to a binary stream of 0's and 1's. Ideally, we would like to perform this conversion in a manner that yields the
minimum average number of binary digits per prototype feature vector.
In 1948, Claude Shannon proved that it is possible to code a discrete memoryless source using on the average
as few binary digits per source symbol as the source entropy, defined as
$$H = -\sum_{n} p_n \log_2 p_n$$
Here $p_n$ denotes the probability or relative frequency of occurrence of the $n$th symbol in the source alphabet,
and $\log_2(x) = \ln(x)/\ln(2)$ is the base 2 logarithm of $x$. The units of $H$ are bits/source symbol. The proof of
Shannon's theorem is based on grouping the source symbols into large blocks and assigning binary code words
of varying length to each block of source symbols. More probable blocks of source symbols are assigned shorter
code words, whereas less probable blocks are assigned longer code words. As the block length approaches
infinity, the bit rate tends to $H$. Huffman determined the optimum variable-length coding scheme for a discrete
memoryless source using blocks of any finite length.
FIGURE 17.7 Vector quantization of a 2-D feature space.
Table 17.1 provides an example illustrating the concept of source coding. The source alphabet contains eight
symbols with the probabilities indicated. For convenience, these symbols have been labeled in order of decreasing
probability. In the context of image encoding, the source alphabet would simply consist of the prototype feature
vectors generated by the vector quantizer. The entropy of this source is $H = 2.31$ bits/source symbol. If we were to
use a fixed-length code for this source, we would need to use three binary digits for each source symbol as
shown in Table 17.1. On the other hand, the code words for the Huffman code contain from 1 to 5 code letters
(binary digits). In this case, the average code word length is
$$\bar{l} = \sum_{n} p_n l_n = 2.31 \text{ binary digits}$$
Here $l_n$ is the number of code letters in the code word for the source symbol $a_n$. This
is the average number of binary digits per source symbol that would be needed to encode the source, and it is
equal to the entropy. Thus, for this particular source, the Huffman code achieves the lower bound. It can be
shown that in general the rate for the Huffman code will always be within 1 binary digit of the source entropy.
By grouping source symbols into blocks of length $L$ and assigning code words to each block, this maximum
distance can be decreased to $1/L$ binary digits. Note the subtle distinction here between bits, which are units
of information, a property of the source alone, and binary digits, which are units of code word length, and
hence only a property of the code used to represent the source. Also note that the Huffman code satisfies the
prefix condition, i.e., no code word is the prefix of another longer code word. This means that a stream of 0's
and 1's may be uniquely decoded into the corresponding sequence of source symbols without the need for
markers to delineate boundaries between code words.
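The figures quoted for Table 17.1 can be checked numerically. The sketch below uses the probabilities and Huffman code words listed in that table; the helper names (`entropy`, `average_length`, `is_prefix_free`, `decode`) are illustrative and not part of the original text.

```python
import math

# Probabilities and Huffman code words of the eight source symbols in Table 17.1.
probs = [1/2, 1/8, 1/8, 1/16, 1/16, 1/16, 1/32, 1/32]
huffman = ["0", "100", "101", "1100", "1101", "1110", "11110", "11111"]

def entropy(p):
    """Source entropy H in bits/source symbol."""
    return -sum(pn * math.log2(pn) for pn in p if pn > 0)

def average_length(p, code):
    """Average code word length in binary digits/source symbol."""
    return sum(pn * len(word) for pn, word in zip(p, code))

def is_prefix_free(code):
    """True if no code word is the prefix of another code word."""
    return not any(a != b and b.startswith(a) for a in code for b in code)

def decode(bits, code):
    """Decode a string of 0's and 1's into symbol indices without boundary markers."""
    lookup = {word: n for n, word in enumerate(code)}
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:        # a complete code word has been read
            symbols.append(lookup[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a code word")
    return symbols
```

Because every probability in this source is a negative power of 2, the average Huffman code word length equals the entropy exactly (2.3125, which rounds to the 2.31 quoted above).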
The Huffman code is determined from the binary tree shown in Fig. 17.8. This tree is constructed recursively
by combining the two least probable symbols in the alphabet into one composite symbol whose probability of
occurrence is the sum of the probabilities of the two symbols that it represents. The code words for these two
symbols are the same as that of the composite symbol with a 0 or a 1 appended at the end to distinguish
between them. This procedure is repeated until the reduced alphabet contains only a single composite symbol. Then
the code word for a particular source symbol is determined by traversing the tree from its root to the leaf node
for that source symbol.
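The recursive merging procedure can be sketched as follows. This is one possible implementation under stated assumptions, not the handbook's own: the function name `huffman_code` is hypothetical, a heap is used to find the two least probable symbols, and the distinguishing bit is prepended at each merge, which is equivalent to appending it to the (not yet known) code word of the composite symbol.

```python
import heapq
import itertools

def huffman_code(probs):
    """Return one binary code word (a string) per source symbol."""
    counter = itertools.count()   # tie-breaker so the heap never compares lists
    # Heap entry: (probability, tie-breaker, indices of the symbols in this composite).
    heap = [(p, next(counter), [n]) for n, p in enumerate(probs)]
    heapq.heapify(heap)
    codes = ["" for _ in probs]
    while len(heap) > 1:
        p0, _, group0 = heapq.heappop(heap)   # least probable symbol
        p1, _, group1 = heapq.heappop(heap)   # second least probable symbol
        for n in group0:
            codes[n] = "0" + codes[n]
        for n in group1:
            codes[n] = "1" + codes[n]
        # The composite symbol carries the summed probability.
        heapq.heappush(heap, (p0 + p1, next(counter), group0 + group1))
    return codes

codes = huffman_code([1/2, 1/8, 1/8, 1/16, 1/16, 1/16, 1/32, 1/32])
```

The assignment of 0 and 1 at each merge is arbitrary, so the code words produced here need not match those printed in Table 17.1, but the code word lengths do.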
TABLE 17.1 A Discrete Source with an Eight-Symbol Alphabet and Two Schemes for Encoding It

Source Symbol   Probability   Fixed-Length Code   Huffman Code
a_1             1/2           000                 0
a_2             1/8           001                 100
a_3             1/8           010                 101
a_4             1/16          011                 1100
a_5             1/16          100                 1101
a_6             1/16          101                 1110
a_7             1/32          110                 11110
a_8             1/32          111                 11111

Entropy: H = 2.31 bits/source symbol
Fixed-length code: average length 3 binary digits/source symbol
Huffman code: average length 2.31 binary digits/source symbol

FIGURE 17.8 Binary tree used to generate the Huffman code for the source shown in Table 17.1.
Reconstruction
The objective of image reconstruction is to compute an unknown image from many complex measurements
of the image. Usually, each measurement depends on many pixels in the image which may be spatially distant
from one another.
A typical reconstruction problem is tomography, in which each measurement is obtained by integrating the
pixel values along a ray through the image. Figure 17.9 illustrates the measurement of these ray integrals in the
projection process. For each angle $\theta$ a set of ray integrals is computed by varying the position $t$ at which the
ray passes through the image. The points along a ray are given by all the solutions $(x, y)$ to the equation
$$t = x\cos\theta + y\sin\theta$$
We may therefore compute the ideal projection integrals by the following expression known as the Radon
transform
$$p(\theta, t) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,\delta(t - x\cos\theta - y\sin\theta)\,dx\,dy$$
where $\delta(t - x\cos\theta - y\sin\theta)$ is an impulse function that is nonzero along the projection ray.
FIGURE 17.9 Projection data for angle $\theta$, resulting in the one-dimensional function $p(\theta, t)$.
In practice, these projection integrals may be measured using a variety of physical techniques. In transmission
tomography, $\lambda_T$ photons are emitted into an object under test. A detector then counts the number of photons,
$\lambda(\theta, t)$, which pass through the object without being absorbed. Collimators are used to ensure the detected
energy passes straight through the object along the desired path. Since the attenuation of energy as it passes
through the object is exponentially related to the integral of the object's density, the projection integral may
be computed from the formula
$$p(\theta, t) = -\log\left(\frac{\lambda(\theta, t)}{\lambda_T}\right)$$
In emission tomography, one wishes to measure the rate of photon emission at each pixel. In this case, various
methods may be used to collect and count all the photons emitted along a ray passing through the object.
Once the projections $p(\theta, t)$ have been measured, the objective is to compute the unknown cross section
$f(x, y)$. The image and projections may be related by first computing the Fourier transform of the 2-D image
and the 1-D projection for each angle
$$F(\omega_x, \omega_y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,e^{-j(\omega_x x + \omega_y y)}\,dx\,dy$$
$$P(\theta, \omega) = \int_{-\infty}^{\infty} p(\theta, t)\,e^{-j\omega t}\,dt$$
These two transforms are then related by the Fourier slice theorem.
$$F(\omega\cos\theta, \omega\sin\theta) = P(\theta, \omega)$$
In words, $P(\theta, \omega)$ corresponds to the value of the 2-D Fourier transform $F(\omega_x, \omega_y)$ along a 1-D line at an angle
of $\theta$ passing through the origin.
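A discrete analogue of the Fourier slice theorem is easy to verify for the angle $\theta = 0$, where the ray equation reduces to $t = x$ and the projection is a sum over $y$. The sketch below is only a sanity check under that assumption; it uses direct DFT sums rather than the continuous transforms of the text, and the function names are hypothetical.

```python
import cmath

def dft_1d(signal):
    """Direct 1-D discrete Fourier transform."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * u * t / n) for t in range(n))
            for u in range(n)]

def dft_2d_at(image, u, v):
    """One sample of the 2-D DFT of image[x][y] at frequency index (u, v)."""
    nx, ny = len(image), len(image[0])
    return sum(image[x][y] * cmath.exp(-2j * cmath.pi * (u * x / nx + v * y / ny))
               for x in range(nx) for y in range(ny))

def project_theta_zero(image):
    """Projection at angle 0: integrate (sum) over y for each ray position t = x."""
    return [sum(row) for row in image]

image = [[1, 2, 0, 1],
         [0, 3, 1, 0],
         [2, 1, 1, 4],
         [0, 0, 5, 1]]
P = dft_1d(project_theta_zero(image))
# Slice of the 2-D transform along the line at angle 0 through the origin.
slice_of_F = [dft_2d_at(image, u, 0) for u in range(len(image))]
```

The transform of the projection and the slice of the 2-D transform agree to within rounding error. Other angles require interpolation on a discrete grid, which this sketch does not attempt.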
The Fourier slice theorem may be used to develop two methods for inverting the Radon transform and
thereby computing the image $f$. The first method, known as filtered back projection, computes the inverse
Fourier transform in polar coordinates using the transformed projection data.
$$f(x, y) = \frac{1}{4\pi^2}\int_{0}^{\pi}\int_{-\infty}^{\infty} P(\theta, \omega)\,|\omega|\,e^{j\omega(x\cos\theta + y\sin\theta)}\,d\omega\,d\theta$$
Notice that the $|\omega|$ term accounts for the integration in polar coordinates.
A second inversion method results from performing all computations in the space domain rather than first
transforming to the frequency domain $\omega$. This can be done by expressing the inner integral of filtered back
projection as a convolution in the space domain.
$$\frac{1}{2\pi}\int_{-\infty}^{\infty} P(\theta, \omega)\,|\omega|\,e^{j\omega t}\,d\omega = \int_{-\infty}^{\infty} p(\theta, s)\,h(t - s)\,ds$$
Here $h(t)$ is the inverse Fourier transform of $|\omega|$. This results in the inversion formula known as convolution
back projection
$$f(x, y) = \frac{1}{2\pi}\int_{0}^{\pi}\int_{-\infty}^{\infty} p(\theta, s)\,h(x\cos\theta + y\sin\theta - s)\,ds\,d\theta$$
In practice, $h$ must be a low-pass approximation to the true inverse Fourier transform of $|\omega|$. This is necessary
to suppress noise in the projection data. The choice of $h$ is the most important element in the design
of the reconstruction algorithm.
Edge Detection
The ability to find gray level edge structures in images is an important image processing operation. We shall
define an edge to be a region in the image where there is a large change in gray level over a relatively small
spatial region. The process of finding edge locations in digital images is known as edge detection. Most edge
detection operators, also known as edge operators, use a window operator to first enhance the edges in the
image, followed by thresholding the enhanced image.
There has been a great deal of research performed in the area of edge detection. Some of the research issues
include robust threshold selection, window size selection, noise response, edge linking, and the detection of
edges in moving objects. While it is beyond the scope of this section to discuss these issues in detail, it is obvious
that such things as threshold selection will greatly affect the performance of the edge detection algorithm. If
the threshold is set too high, then many edge points will be missed; if set too low, then many "false" edge points
will be obtained because of the inherent noise in the image. The investigation of the "optimal" choice of the
threshold is an important research area. Selection of the particular window operation to enhance the edges of
an image, as an initial step in edge detection, has recently been based on using models of the performance of
the human visual system in detecting edges.
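The enhance-then-threshold structure described above can be sketched with a small example. The text does not name a particular window operator; the 3 x 3 Sobel windows used here are an assumption chosen for illustration, and the function names are hypothetical.

```python
def sobel_magnitude(image):
    """Enhance edges with 3x3 Sobel windows; border pixels are left at 0."""
    rows, cols = len(image), len(image[0])
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]    # responds to horizontal gray level change
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]    # responds to vertical gray level change
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = sum(kx[i][j] * image[r + i - 1][c + j - 1]
                     for i in range(3) for j in range(3))
            gy = sum(ky[i][j] * image[r + i - 1][c + j - 1]
                     for i in range(3) for j in range(3))
            out[r][c] = (gx * gx + gy * gy) ** 0.5
    return out

def detect_edges(image, threshold):
    """Threshold the enhanced image: 1 marks an edge point, 0 a non-edge point."""
    return [[1 if v > threshold else 0 for v in row] for row in sobel_magnitude(image)]

# A vertical step edge between columns 2 and 3.
step = [[0, 0, 0, 100, 100, 100] for _ in range(5)]
edges = detect_edges(step, threshold=200)
```

Raising the threshold above the response of the step (400 here) removes every edge point, which illustrates the sensitivity to threshold selection noted in the text.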
Analysis and Computer Vision
The process of extracting useful measurements from an image or sequence of images is known as image analysis
or computer vision. Before analysis can be performed one must first determine pertinent features or attributes
of the object in the scene and extract information about these features. The features to measure in the
image must be selected a priori, based on empirical results. Most features used consist of shape
properties, shape change properties, shading, texture, motion, depth, and color. After the features are extracted,
one must then use the feature measurements to determine scene characteristics such as object identification.
In the past, simple pattern recognition algorithms, e.g., nearest-neighbor classification, have been used to
compare the feature measurements of an image to a set of feature measurements that correspond to a known
object. A decision is then made as to whether or not the features of the image match those of the known type.
Recently, there has been work in the application of artificial intelligence techniques to image analysis. These
approaches differ markedly from classical statistical pattern recognition in that the feature measurements
are used in a different manner as part of a larger system that attempts to model the scene and then determine
what is in it based on the model.
Defining Terms
Digital image: An array of numbers representing the spatial distribution of energy in a scene which is obtained
by a process of sampling and quantization.
Edge: A localized region of rapid change in gray level in the image.
Entropy: A measure of the minimum amount of information required on the average to store or transmit
each quantized feature vector.
Image compression or coding: The process of reducing the number of binary digits or bits required to
represent the image.
Image enhancement: An image processing operation that is intended to improve the visual quality of the
image or to emphasize certain features.
Image feature: An attribute of a block of image pixels.
Image reconstruction: The process of obtaining an image from nonimage data that characterizes that image.
Lossless vs. lossy compression: If the reconstructed or decoded image is identical to the original, the
compression scheme is lossless. Otherwise, it is lossy.
Pixel: A single sample or picture element in the digital image which is located at specific spatial coordinates.
Point operation: An image processing operation in which individual pixels are mapped to new values
irrespective of the values of any neighboring pixels.
Projection: A set of parallel line integrals across the image oriented at a particular angle.
Quantization: The process of converting from a continuous-amplitude image to an image that takes on only
a finite number of different amplitude values.
Sampling: The process of converting from a continuous-parameter image to a discrete-parameter image by
discretizing the spatial coordinate.
Tomography: The process of reconstructing an image from projection data.
Vector quantization: The process of replacing an exact vector of features by a prototype vector that is used
to represent all feature vectors contained within a cluster.
Window operation: An image processing operation in which the new value assigned to a given pixel depends
on all the pixels within a window centered at that pixel location.
Related Topics
15.1 Coding, Transmission, and Storage • 73.6 Data Compression
References
H. C. Andrews and B. R. Hunt, Digital Image Restoration, Englewood Cliffs, N.J.: Prentice-Hall, 1977.
D. H. Ballard and C. M. Brown, Computer Vision, Englewood Cliffs, N.J.: Prentice-Hall, 1982.
H. Barrow and J. Tenenbaum, "Computational vision," Proc. IEEE, vol. 69, pp. 572-595, May 1981.
A. Gersho and R. M. Gray, Vector Quantization and Signal Compression, Norwell, Mass.: Kluwer Academic
Publishers, 1991.
R. C. Gonzalez and P. Wintz, Digital Image Processing, Reading, Mass.: Addison-Wesley, 1991.
G. T. Herman, Image Reconstruction from Projections, New York: Springer-Verlag, 1979.
T. S. Huang, Image Sequence Analysis, New York: Springer-Verlag, 1981.
A. K. Jain, Fundamentals of Digital Image Processing, Englewood Cliffs, N.J.: Prentice-Hall, 1989.
A. Kak and M. Slaney, Principles of Computerized Tomographic Imaging, New York: IEEE Press, 1988.
A. Macovski, Medical Imaging Systems, Englewood Cliffs, N.J.: Prentice-Hall, 1983.
M. D. McFarlane, "Digital pictures fifty years ago," Proc. IEEE, pp. 768-770, July 1972.
W. K. Pratt, Digital Image Processing, New York: Wiley, 1991.
A. Rosenfeld and A. Kak, Digital Picture Processing, vols. 1 and 2, San Diego: Academic Press, 1982.
J. Serra, Image Analysis and Mathematical Morphology, vols. 1 and 2, San Diego: Academic Press, 1982 and 1988.
Further Information
A number of textbooks are available that cover the broad area of image processing and several that focus on
more specialized topics within this field. The texts by Gonzalez and Wintz [1991], Jain [1989], Pratt [1991],
and Rosenfeld and Kak (Vol. 1) [1982] are quite broad in their scope. Gonzalez and Wintz's treatment is written
at a somewhat lower level than that of the other texts. For a more detailed treatment of computed tomography
and other medical imaging modalities, the reader may consult the texts by Herman [1979], Macovski [1983],
and Kak and Slaney [1988]. To explore the field of computer vision, the reader is advised to consult the text
by Ballard and Brown [1982]. Current research and applications of image processing are reported in a number
of journals. Of particular note are the IEEE Transactions on Image Processing, the IEEE Transactions on Pattern
Analysis and Machine Intelligence, the IEEE Transactions on Geoscience and Remote Sensing, the IEEE Transactions
on Medical Imaging, the Journal of the Optical Society of America A, Optical Engineering, the Journal of Electronic
Imaging, and Computer Vision, Graphics, and Image Processing.
17.2 Video Signal Processing
Sarah A. Rajala
Video signal processing is the area of specialization concerned with the processing of time sequences of image
data, i.e., video. Because of the significant advances in computing power and increases in available transmission
bandwidth, there has been a proliferation of potential applications in the area of video signal processing.
Applications such as high-definition television, digital video, multimedia, video phone, interactive video,
medical imaging, and information processing are the driving forces in the field today. As diverse as the
applications may seem, it is possible to specify a set of fundamental principles and methods that can be used
to develop the applications.
Considerable understanding of a video signal processing system can be gained by representing the system
with the block diagram given in Fig. 17.10. Light from a real-world scene is captured by a scanning system
and causes an image frame $f(x, y, t_0)$ to be formed on a focal plane. A video signal is a sequence of image frames
that are created when a scanning system captures a new image frame at periodic intervals in time. In general,
each frame of the video sequence is a function of two spatial variables $x$ and $y$ and one temporal variable $t$. An
integral part of the scanning system is the process of converting the original analog signal into an appropriate
digital representation. The conversion process includes the operations of sampling and quantization. Sampling
is the process of converting a continuous-time/space signal into a discrete-time/space signal. Quantization is
the process of converting a continuous-valued signal into a discrete-valued signal.
FIGURE 17.10 Video signal processing system block diagram.
Once the video signal has been sampled and quantized, it can be processed digitally. Processing can be
performed on special-purpose hardware or general-purpose computers. The type of processing performed
depends on the particular application. For example, if the objective is to generate high-definition television,
the processing would typically include compression and motion estimation. In fact, in most of the applications
listed above these are the fundamental operations. Compression is the process of compactly representing the
information contained in an image or video signal. Motion estimation is the process of estimating the
displacement of the moving objects in a video sequence. The displacement information can then be used to
interpolate missing frame data or to improve the performance of compression algorithms.
After the processing is complete, a video signal is ready for transmission over some channel or storage on
some medium. If the signal is transmitted, the type of channel will vary depending on the application. For
example, today analog television signals are transmitted one of three ways: via satellite, terrestrially, or by cable.
All three channels have limited transmission bandwidths and can adversely affect the signals because of the
imperfect frequency responses of the channels. Alternatively, with a digital channel, the primary limitation will
be the bandwidth.
The final stage of the block diagram shown in Fig. 17.10 is the display. Of critical importance at this stage is
the human observer. Understanding how humans respond to visual stimuli, i.e., the psychophysics of vision, will
not only allow for better evaluation of the processed video signals but will also permit the design of better systems.
Sampling
If a continuous-time video signal satisfies certain conditions, it can be exactly represented by and be
reconstructed from its sample values. The conditions which must be satisfied are specified in the sampling theorem.
The sampling theorem can be stated as follows:
Sampling Theorem:
Let $f(x, y, t)$ be a bandlimited signal with $F(\omega_x, \omega_y, \omega_t) = 0$ for $|\omega_x| > \omega_{xM}$, $|\omega_y| > \omega_{yM}$, and
$|\omega_t| > \omega_{tM}$. Then $f(x, y, t)$ is uniquely determined by its samples $f(jX_S, kY_S, lT_S) = f(j, k, l)$,
where $j, k, l = 0, \pm 1, \pm 2, \ldots$, if
$$\omega_{sx} > 2\omega_{xM}, \quad \omega_{sy} > 2\omega_{yM}, \quad \text{and} \quad \omega_{st} > 2\omega_{tM}$$
where
$$\omega_{sx} = 2\pi/X_S, \quad \omega_{sy} = 2\pi/Y_S, \quad \text{and} \quad \omega_{st} = 2\pi/T_S$$
$X_S$ is the sampling period along the $x$ direction, $\omega_{sx}$ is the spatial sampling frequency along the $x$
direction, $Y_S$ is the sampling period along the $y$ direction, $\omega_{sy}$ is the spatial sampling frequency along
the $y$ direction, $T_S$ is the sampling period along the temporal direction, and $\omega_{st}$ is the temporal
sampling frequency.
Given these samples, $f(x, y, t)$ can be reconstructed by generating a periodic impulse train in which
successive impulses have amplitudes that are successive sample values. This impulse train is then processed
through an ideal low-pass filter with appropriate gain and cut-off frequencies. The resulting output signal
will be exactly equal to $f(x, y, t)$. (Source: Oppenheim et al., 1983, p. 519.)
If the sampling theorem is not satisfied, aliasing will occur. Aliasing occurs when the signal is undersampled
and therefore no longer recoverable by low-pass filtering. Figure 17.11(a) shows the frequency spectrum of a
sampled bandlimited signal with no aliasing. Figure 17.11(b) shows the frequency spectrum of the same signal
with aliasing. The aliasing occurs at the points where there is overlap in the diamond-shaped regions. For video
signals aliasing in the temporal direction will give rise to flicker on the display. For television systems, the
standard temporal sampling rate is 30 frames per second in the United States and Japan and 25 frames per
second in Europe. However, these rates would be insufficient without the use of interlacing.
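Aliasing can be demonstrated in one dimension with a short sketch. The frequencies and rates below are arbitrary illustrative values, and the function names are hypothetical: a cosine sampled below twice its frequency produces exactly the same samples as a lower-frequency cosine, so the two cannot be told apart after sampling.

```python
import math

def sample_cosine(freq_hz, rate_hz, count):
    """Samples of cos(2*pi*f*t) taken at t = n / rate, n = 0 .. count-1."""
    return [math.cos(2 * math.pi * freq_hz * n / rate_hz) for n in range(count)]

def alias_frequency(freq_hz, rate_hz):
    """Apparent frequency, in [0, rate/2], of a real sinusoid after sampling."""
    folded = freq_hz % rate_hz
    return min(folded, rate_hz - folded)

# A 7 Hz cosine sampled at 10 Hz violates the sampling theorem (10 < 2 * 7).
undersampled = sample_cosine(7, 10, 20)
# Its samples coincide with those of a 3 Hz cosine.
impostor = sample_cosine(alias_frequency(7, 10), 10, 20)
```

The same folding applies in the temporal direction of a video signal: under these assumptions, a 28 Hz temporal variation sampled at 30 frames per second would appear as a 2 Hz variation.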
If the sampling rate (spatial and/or temporal) of a system is fixed, a standard approach for minimizing the
effects of aliasing for signals that do not satisfy the sampling theorem is to use a presampling filter. Presampling
filters are low-pass filters whose cut-off frequencies are chosen to be less than half the sampling frequencies
$\omega_{sx}$, $\omega_{sy}$, and $\omega_{st}$. Although the signal
still cannot be reconstructed exactly, the degradations are less annoying. Another problem in a real
system is the need for an ideal low-pass filter to reconstruct an analog signal. An ideal filter is not physically
realizable, so in practice an approximation must be made. Several very simple filter structures are common in
video systems: sample and hold, bilinear, and raised cosine.
FIGURE 17.11 (a) Frequency spectrum of a sampled signal with no aliasing. (b) Frequency spectrum of a sampled signal
with aliasing.
Quantization
Quantization is the process of converting the continuous-valued amplitude of the video signal into a
discrete-valued representation, i.e., a finite set of numbers. The output of the quantizer is characterized by quantities
that are limited to a finite number of values. The process is a many-to-one mapping, and thus there is a loss
of information. The quantized signal can be modeled as
$$f_q(j, k, l) = f(j, k, l) + e(j, k, l)$$
where $f_q(j, k, l)$ is the quantized video signal and $e(j, k, l)$ is the quantization noise. If too few bits per sample
are used, the quantization noise will produce visible false contours in the image data.
The quantizer is a mapping operation which generally takes the form of a staircase function (see Fig. 17.12).
A rule for quantization can be defined as follows: Let $\{d_i, i = 1, 2, \ldots, N + 1\}$ be the set of decision levels with
$d_1$ the minimum amplitude value and $d_{N+1}$ the maximum amplitude value of $f(j, k, l)$. If $f(j, k, l)$ is contained in
the interval $(d_i, d_{i+1})$, then it is mapped to the $i$th reconstruction level $r_i$. Methods for designing quantizers
can be broken into two categories: uniform and nonuniform. The input-output function for a typical uniform
quantizer is shown in Fig. 17.12. The mean square value of the quantizing noise can be easily calculated if it is
assumed that the amplitude probability distribution is constant within each quantization step. The quantization
step size for a uniform quantizer is
$$q = \frac{d_{N+1} - d_1}{N}$$
and all errors between $q/2$ and $-q/2$ are equally likely. The mean square quantization error is given by:
$$E[e^2(j, k, l)] = \frac{1}{q}\int_{-q/2}^{q/2} e^2\,de = \frac{q^2}{12}$$
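The $q^2/12$ result can be checked numerically with a minimal sketch of a uniform quantizer. The function names are hypothetical, the reconstruction levels are assumed to sit at the midpoint of each step, and the input is assumed to lie between the minimum and maximum decision levels.

```python
def uniform_quantize(value, d_min, d_max, levels):
    """Map a value in [d_min, d_max] to the midpoint of its quantization step."""
    q = (d_max - d_min) / levels                        # quantization step size
    index = min(int((value - d_min) / q), levels - 1)   # clamp the top edge d_max
    return d_min + (index + 0.5) * q

def mean_square_error(d_min, d_max, levels, samples=100000):
    """Measured and predicted mean square error for evenly spread inputs."""
    q = (d_max - d_min) / levels
    total = 0.0
    for n in range(samples):
        # Inputs spread evenly over the range, so errors are uniform within each step.
        x = d_min + (n + 0.5) * (d_max - d_min) / samples
        total += (x - uniform_quantize(x, d_min, d_max, levels)) ** 2
    return total / samples, q * q / 12

measured, predicted = mean_square_error(0.0, 1.0, 8)
```

With eight levels on the unit interval, the step size is 0.125 and the measured error closely matches the predicted value of $q^2/12$.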
If one takes into account the exact amplitude probability distribution, an optimal quantizer can be designed.
Here the objective is to choose a set of decision levels and reconstruction levels that will yield the minimum
quantization error. If $f$ has a probability density function $p(f)$, the mean square quantization error is
$$E[e^2(j, k, l)] = \sum_{i=1}^{N}\int_{d_i}^{d_{i+1}} (f - r_i)^2\,p(f)\,df$$
where $N$ is the number of quantization levels. To minimize, the mean square quantization error is differentiated
with respect to $d_i$ and $r_i$. This results in the Max quantizer:
$$d_i = \frac{r_i + r_{i-1}}{2}$$
$$r_i = \frac{\int_{d_i}^{d_{i+1}} f\,p(f)\,df}{\int_{d_i}^{d_{i+1}} p(f)\,df}$$
Thus, the decision levels need to be midway between the reconstruction levels, and the reconstruction
levels are at the centroid of that portion of $p(f)$ between $d_i$ and $d_{i+1}$. Unfortunately these requirements do not
lead to an easy solution. Max used an iterative numerical technique to obtain solutions for various quantization
levels assuming a zero-mean Gaussian input signal. These results and the quantization levels for other standard
amplitude distributions can be found in Jain [1989].
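The two Max conditions suggest a simple alternating iteration, sketched below. This is not Max's original numerical procedure: it replaces the density $p(f)$ with a finite set of samples, so the centroid integrals become sample averages. The function names, the sample count, and the iteration count are all illustrative assumptions.

```python
import random

def gaussian_samples(count, seed=0):
    """Zero-mean, unit-variance Gaussian samples from a seeded generator."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(count)]

def lloyd_max(samples, levels, iterations=50):
    """Alternate between midpoint decision levels and centroid reconstruction levels."""
    data = sorted(samples)
    lo, hi = data[0], data[-1]
    # Start from uniformly spaced reconstruction levels.
    r = [lo + (i + 0.5) * (hi - lo) / levels for i in range(levels)]
    for _ in range(iterations):
        # Decision levels midway between neighbouring reconstruction levels.
        d = [lo] + [(r[i] + r[i + 1]) / 2 for i in range(levels - 1)] + [hi]
        # Reconstruction levels at the centroid (sample mean) of each cell.
        for i in range(levels):
            last = i == levels - 1
            cell = [x for x in data if d[i] <= x < d[i + 1] or (last and x == hi)]
            if cell:
                r[i] = sum(cell) / len(cell)
    d = [lo] + [(r[i] + r[i + 1]) / 2 for i in range(levels - 1)] + [hi]
    return d, r

def quantization_mse(samples, r):
    """Mean square error when each sample maps to its nearest reconstruction level."""
    return sum(min((x - level) ** 2 for level in r) for x in samples) / len(samples)

samples = gaussian_samples(2000)
d, r = lloyd_max(samples, levels=4)
```

Each pass cannot increase the mean square error on the sample set, so the resulting levels are at least as good as the uniform starting point for this data.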
A more common and less computationally intense approach to nonuniform quantization is to use a
compandor (compressor-expander). The input signal is passed through a nonlinear compressor before being
quantized uniformly. The output of the quantizer must then be expanded to the original dynamic range (see
Fig. 17.13). The compression and expansion functions can be determined so that the compandor approximates
a Max quantizer.
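The compress, quantize uniformly, then expand chain can be sketched as follows. The text does not specify the compression function; the logarithmic mu-law curve used here, the constant `MU = 255`, and the function names are assumptions chosen only to illustrate the structure. The input is assumed to be normalized to the range -1 to 1.

```python
import math

MU = 255.0  # assumed companding constant, not taken from the text

def compress(x):
    """Nonlinear compressor (mu-law curve) for x in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def expand(y):
    """Expander: exact inverse of compress, restoring the original dynamic range."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def compand_quantize(x, levels):
    """Compress, quantize uniformly on [-1, 1], then expand."""
    y = compress(x)
    q = 2.0 / levels                               # uniform step in the compressed domain
    index = min(int((y + 1.0) / q), levels - 1)    # clamp the top edge
    y_hat = -1.0 + (index + 0.5) * q               # midpoint of the selected step
    return expand(y_hat)

small_input = 0.01
approx = compand_quantize(small_input, levels=16)
```

Because the compressor stretches small amplitudes, the effective step size is finer near zero and coarser near full scale, which is the nonuniform behavior the compandor is meant to provide.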
FIGURE 17.12 Characteristics of a uniform quantizer.
Vector Quantization
Quantization does not have to be done on a single pixel at a time. In fact, better results can be achieved if the
video data are quantized on a vector (block) basis. In vector quantization, the image data are first processed
into a set of vectors. A code book (set of code words or templates) that best matches the data to be quantized
is then generated. Each input vector is then quantized to the closest code word. Compression is achieved by
transmitting only the indices for the code words. At the receiver, the images are reconstructed using a table
look-up procedure. Two areas of ongoing research are finding better methods for designing the code books
and developing better search and update techniques for matching the input vectors to the code words.
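The encode and table look-up steps can be sketched with a fixed code book. Code book design is not shown; the two-word code book below is made up for illustration, squared Euclidean distance is assumed as the measure of closeness, and the function names are hypothetical.

```python
def image_to_vectors(image, n):
    """Split an image (list of rows) into n x n blocks, each flattened to a tuple."""
    rows, cols = len(image), len(image[0])
    return [tuple(image[r + i][c + j] for i in range(n) for j in range(n))
            for r in range(0, rows - n + 1, n)
            for c in range(0, cols - n + 1, n)]

def nearest_index(vector, codebook):
    """Index of the code word closest to the vector (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))

def vq_encode(vectors, codebook):
    """Transmit only the code word indices."""
    return [nearest_index(v, codebook) for v in vectors]

def vq_decode(indices, codebook):
    """Receiver side: reconstruct each block by table look-up."""
    return [codebook[i] for i in indices]

image = [[0, 1, 9, 9],
         [1, 0, 9, 8]]
codebook = [(0, 0, 0, 0), (9, 9, 9, 9)]   # illustrative code book only
indices = vq_encode(image_to_vectors(image, 2), codebook)
blocks = vq_decode(indices, codebook)
```

Here each 2 x 2 block of four pixels is replaced by a single index, so the amount of compression depends on the block size and the number of code words.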
Video Compression
Digital representations of video signals typically require a very large number of bits. If the video signal is to be
transmitted and/or stored, compression is often required. Applications include conventional and high-definition
television, video phone, video conferencing, multimedia, remote-sensed imaging, and magnetic resonance
imaging. The objective of compression (source encoding) is to find a representation that maximizes picture
quality while minimizing the data per picture element (pixel). A wealth of compression algorithms have been
developed during the past 30 years for both image and video compression. However, the ultimate choice of an
appropriate algorithm is application dependent. The following summary will provide some guidance in that
selection process.
Compression algorithms can be divided into two major categories: information-preserving, or lossless, and
lossy techniques. Information-preserving techniques introduce no errors in the encoding/decoding process;
thus, the original signal can be reconstructed exactly. Unfortunately, the achievable compression rate, i.e., the
reduction in bit rate, is quite small, typically on the order of 3:1. On the other hand, lossy techniques introduce
errors in the coding/decoding process; thus, the received signal cannot be reconstructed exactly. The advantage
of the lossy techniques is the ability to achieve much higher compression ratios. The limiting factor on the
compression ratio is the required quality of the video signal in a specific application.
One approach to compression is to reduce the spatial and/or temporal sampling rate and the number of
quantization levels. Unfortunately, if the sampling is too low and the quantization too coarse, aliasing,
contouring, and flickering will occur. These distortions are often much greater than the distortions introduced by
more sophisticated techniques at the same compression rate. Compression systems can generally be modeled
by the block diagram shown in Fig. 17.14. The first stage of the compression system is the mapper. This is an
operation in which the input pixels are mapped into a representation that can be more effectively encoded.
This stage is generally reversible. The second stage is the quantizer and performs the same type of operation
as described earlier. This stage is not reversible. The final stage attempts to remove any remaining statistical
redundancy. This stage is reversible and is typically achieved with one of the information-preserving coders.
FIGURE 17.13 Nonuniform quantization using a compandor.
Information-Preserving Coders
The data rate required for an original digital video signal may not represent its average information rate. If the
original signal is represented by $M$ possible independent symbols with probabilities $p_i$, $i = 0, 1, \ldots, M - 1$, then
the information rate as given by the first-order entropy of the signal $H$ is
$$H = -\sum_{i=0}^{M-1} p_i \log_2 p_i \text{ bits per sample}$$
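According to Shannon's coding theorem [see Jain, 1989], it is possible to perform lossless coding of a source
with entropy $H$ bits per symbol using $H + \epsilon$ bits per symbol, where $\epsilon$ is a small positive quantity. The maximum
obtainable compression rate $C$ is then given by:
$$C = \frac{\text{average bit rate of the original data}}{\text{average bit rate of the encoded data}}$$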
Huffman Coding
One of the most efficient information-preserving (entropy) coding methods is Huffman coding. Construction
of a Huffman code involves arranging the symbol probabilities in decreasing order and considering them as
leaf nodes of a tree. The tree is constructed by merging the two nodes with the smallest probability to form a
new node. The probability of the new node is the sum of the two merged nodes. This process is continued
until only two nodes remain. At this point, 1 and 0 are arbitrarily assigned to the two remaining nodes. The
process now moves down the tree, decomposing probabilities and assigning 1's and 0's to each new pair. The
process continues until all symbols have been assigned a code word (string of 1's and 0's). An example is given
in Fig. 17.15. Many other types of information-preserving compression schemes exist (see, for example,
Gonzalez and Wintz [1987]), including arithmetic coding, Lempel-Ziv algorithm, shift coding, and run-length coding.
Predictive Coding
Traditionally one of the most popular methods for reducing the bit rate has been predictive coding. In this
class, differential pulse-code modulation (DPCM) has been used extensively. A block diagram for a basic DPCM
system is shown in Fig. 17.16. In such a system the difference between the current pixel and a predicted version
of that pixel gets quantized, coded, and transmitted to the receiver. This difference is referred to as the prediction
error and is given by
$$e_i = f_i - \hat{f}_i$$
where $f_i$ is the current pixel and $\hat{f}_i$ is its predicted value.
The prediction is based on previously transmitted and decoded spatial and/or temporal information and can
be linear or nonlinear, fixed or adaptive. The difference signal $e_i$ is then passed through a quantizer. The signal
at the output of the quantizer is the quantized prediction error $e_{q,i}$, which is entropy encoded for transmission. The
first step at the receiver is to decode the quantized prediction error. After decoding, the decoded error $d_{q,i}$ is added to the predicted
value of the current pixel $\hat{f}_i$ to yield the reconstructed pixel value. Note that as long as a quantizer is included
in the system, the output signal will not exactly equal the input signal.
FIGURE 17.14 Three-stage model of an encoder.
FIGURE 17.15 An example of constructing a Huffman code.
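A one-dimensional sketch may help make the DPCM loop concrete. It assumes the simplest possible predictor (the previously reconstructed sample) and a uniform quantizer with step size `step`; both choices, and the function names, are illustrative assumptions rather than the system of Fig. 17.16.

```python
import numpy as np

def dpcm_encode(signal, step):
    """Return quantizer indices for a previous-sample DPCM predictor."""
    prediction = 0.0  # the decoder starts from the same value
    indices = []
    for sample in signal:
        error = sample - prediction          # prediction error
        index = int(np.round(error / step))  # uniform quantizer
        indices.append(index)
        # Predict from the reconstructed value so encoder and decoder agree.
        prediction = prediction + index * step
    return indices

def dpcm_decode(indices, step):
    """Rebuild the signal by adding each decoded error to the prediction."""
    prediction = 0.0
    reconstructed = []
    for index in indices:
        prediction = prediction + index * step
        reconstructed.append(prediction)
    return np.array(reconstructed)

# Hypothetical scan line used only for illustration.
line = np.array([10.0, 12.0, 15.0, 15.0, 9.0])
step = 2.0
decoded = dpcm_decode(dpcm_encode(line, step), step)
max_error = np.max(np.abs(decoded - line))
```

Because the encoder predicts from reconstructed values, the quantization error does not accumulate: each reconstructed sample differs from the input by at most half a quantizer step, which illustrates why the output cannot exactly equal the input once a quantizer is present.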
The predictors can include pixels from the present frame as well as those from previous frames (see
Fig. 17.17). If the motion and the spatial detail are not too high, frame (or field) prediction works well. If the
motion is high and/or the spatial detail is high, intrafield prediction generally works better. A primary reason
is that there is less correlation between frames and fields when the motion is high.
For more information on predictive coding, see [Musmann et al., 1985] or [Jain, 1989].
Motion-Compensated Predictive Coding
Significant improvements in image quality, at a fixed compression rate, can be obtained when adaptive prediction
algorithms take into account the frame-to-frame displacement of moving objects in the sequence. Alternatively,
one could increase the compression rate for a fixed level of image quality. The amount of increase in
performance will depend on one's ability to estimate the motion in the scene. Techniques for estimating the
motion are described in a later subsection.
Motion-compensated prediction algorithms can be divided into two categories. One category estimates the
motion on a block-by-block basis and the other estimates the motion one pixel at a time. For the block-based
methods an estimate of the displacement is obtained for each block in the image. The block matching is achieved
by finding the maximum correlation between a block in the current frame and a somewhat larger search area
in the previous frame. A number of researchers have proposed ways to reduce the computational complexity,
including using a simple matching criterion and using logarithmic searches for finding the peak value of the
correlation.
The second category obtains a displacement estimate at each pixel in a frame. These techniques are referred
to as pel recursive methods. They tend to provide more accurate estimates of the displacement but at the
expense of higher complexity. Both categories of techniques have been applied to video data; however, block
matching is used more often in real systems. The primary reason is that more efficient implementations have
been feasible. It should be noted, however, that every pixel in a block will be assigned the same displacement
estimate. Thus, the larger the block size the greater the potential for errors in the displacement estimate for a
given pixel. More details can be found in [Musmann et al., 1985].
FIGURE 17.16 Block diagram of a basic DPCM system.
FIGURE 17.17 Transform coding system.
Transform Coding
In transform coding, the video signal $f(x,y,t)$ is subjected to an invertible transform, then quantized and encoded
(see Fig. 17.17). The purpose of the transformation is to convert statistically dependent picture elements into
a set of statistically independent coefficients. In practice, one of the separable fast transforms in the class of
unitary transforms is used, e.g., cosine, Fourier, or Hadamard. In general, the transform coding algorithms can
be implemented in 2-D or 3-D. However, because of the real-time constraints of many video signal processing
applications, it is typically more efficient to combine a 2-D transform with a predictive algorithm in the temporal
direction, e.g., motion compensation.
For 2-D transform coding the image data are first subdivided into blocks. Typical block sizes are 8 × 8 or
16 × 16. The transform independently maps each image block into a block of transform coefficients; thus, the
processing of each block can be done in parallel. At this stage the data have been mapped into a new
representation, but no compression has occurred. In fact, with the Fourier transform there is an expansion in
the amount of data. This occurs because the transform generates coefficients that are complex-valued. To
achieve compression the transform coefficients must be quantized and then coded to remove any remaining
redundancy.
Two important issues in transform coding are the choice of transformation and the allocation of bits in the
quantizer. The most commonly used transform is the discrete cosine transform (DCT). In fact, many of the
proposed image and video standards utilize the DCT. The reasons for choosing a DCT include: its performance
is superior to the other fast transforms and is very close to the optimal Karhunen-Loeve transform, it produces
real-valued transform coefficients, and it has good symmetry properties, thus reducing the blocking artifacts
inherent in block-based algorithms. One way to reduce these artifacts is by using a transform whose basis
functions are even, i.e., the DCT, and another is to use overlapping blocks. For bit allocation, one can
determine the variance of the transform coefficients and then assign the bits so the distortion is minimized.
An example of a typical bit allocation map is shown in Fig. 17.18.
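The block transform step can be illustrated with a small NumPy sketch. It builds an orthonormal DCT-II matrix directly and applies it separably to one block; the uniform quantizer with a single step size is an assumption made for brevity and is not the variance-based bit allocation described above. All function names are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so that C @ C.T is the identity."""
    k = np.arange(n).reshape(-1, 1)  # frequency index (rows)
    m = np.arange(n).reshape(1, -1)  # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)       # DC row has its own normalization
    return c

def block_dct2(block):
    """Separable 2-D DCT of one image block."""
    rows, cols = block.shape
    return dct_matrix(rows) @ block @ dct_matrix(cols).T

def block_idct2(coeffs):
    """Inverse of block_dct2 (the transform is orthonormal)."""
    rows, cols = coeffs.shape
    return dct_matrix(rows).T @ coeffs @ dct_matrix(cols)

def quantize(coeffs, step):
    """Uniform quantization of the coefficients (illustrative only)."""
    return np.round(coeffs / step) * step

# A flat 8 x 8 block: all of its energy lands in the DC coefficient.
flat = np.full((8, 8), 8.0)
flat_coeffs = block_dct2(flat)

# A smooth ramp block, transformed, quantized, and reconstructed.
ramp = np.add.outer(np.arange(8.0), np.arange(8.0))
step = 4.0
ramp_rec = block_idct2(quantize(block_dct2(ramp), step))
rms_error = np.sqrt(np.mean((ramp_rec - ramp) ** 2))
```

The transform alone is lossless; the loss (and the compression) comes from the quantizer. Because the transform is orthonormal, the RMS reconstruction error in the pixel domain cannot exceed half the quantizer step.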
Subband Coding
Recently, subband coding has proved to be an effective technique for image compression. Here, the original
video signal is filtered into a set of bandpass signals (subbands), each sampled at successively lower rates. This
process is known as the subband analysis stage. Each of the bandpass images is then quantized and encoded
for transmission/storage. At the receiver, the signals must be decoded and then an image reconstructed from
the subbands. The process at the receiver is referred to as the subband synthesis stage. A one-level subband
analysis results in 4 subbands and a 2-level analysis in 16 equal subbands or 7 unequal subbands. A block
diagram for a separable two-dimensional subband analysis system is shown in Fig. 17.19.
FIGURE 17.18 A typical bit allocation for 16 × 16 block coding of an image using the DCT.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0
2 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0
2 2 2 2 1 1 1 1 1 1 1 0 0 0 0 0
2 2 2 2 2 2 1 1 1 1 1 0 0 0 0 0
3 3 2 2 2 2 2 1 1 1 1 1 0 0 0 0
3 3 3 3 2 2 2 1 1 1 1 1 0 0 0 0
5 4 3 3 3 2 2 2 1 1 1 1 1 0 0 0
6 5 4 3 3 2 2 2 1 1 1 1 1 0 0 0
7 6 5 4 3 3 2 2 1 1 1 1 1 0 0 0
8 7 6 5 3 3 4 4 4 1 1 1 1 1 0 0
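The subband counts quoted above (4, 16, and 7) can be checked with a short sketch. It uses two-tap Haar filters as the analysis filter pair, which is an assumption made only to keep the example small; practical subband coders use longer filters, and the function names are hypothetical.

```python
import numpy as np

def haar_split(x, axis):
    """Filter along one axis into low and high bands, each decimated by 2."""
    even = np.take(x, np.arange(0, x.shape[axis], 2), axis=axis)
    odd = np.take(x, np.arange(1, x.shape[axis], 2), axis=axis)
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def analyze_one_level(image):
    """Separable one-level analysis: horizontal split, then vertical split."""
    low, high = haar_split(image, axis=1)
    ll, lh = haar_split(low, axis=0)
    hl, hh = haar_split(high, axis=0)
    return {"LL": ll, "LH": lh, "HL": hl, "HH": hh}

image = np.arange(64, dtype=float).reshape(8, 8)
bands = analyze_one_level(image)

# Two-level analysis applied to every band gives 16 equal subbands.
uniform = [b for band in bands.values() for b in analyze_one_level(band).values()]

# Two-level analysis applied only to the LL band gives 7 unequal subbands.
octave = [bands[name] for name in ("LH", "HL", "HH")]
octave += list(analyze_one_level(bands["LL"]).values())
```

With orthonormal filters such as these, the total energy of the subbands equals the energy of the input image, so the analysis stage by itself neither loses information nor compresses it.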
HDTV
High-definition television (HDTV) has received much attention in the past few years. With the recent push for
all-digital implementations of HDTV, the need for video signal processing techniques has become more obvious.
In order for the digital HDTV signal to fit in the transmission bandwidth, there is a need for a compression
ratio of approximately 10:1, with little or no degradation introduced. The goal of HDTV is to produce high-quality
video signals by enhancing the detail and improving the aspect ratio and the viewing distance. The detail
is enhanced by increasing the video bandwidth. The proposed aspect ratio of 16/9 will allow for a wide-screen
format which is more consistent with the formats used in the motion-picture industry. The eye's ability to
resolve fine detail is limited. To achieve full resolution of the detail, the HDTV image should be viewed at a
distance of approximately three times the picture height. To accommodate typical home viewing environments,
larger displays are needed.
Motion Estimation Techniques
Frame-to-frame changes in luminance are generated when objects move in video sequences. The luminance
changes can be used to estimate the displacement of the moving objects if an appropriate model of the motion
is specified. A variety of motion models have been developed for dynamic scene analysis in machine vision and
for video communications applications. In fact, motion estimates were first used as a control mechanism for
the efficient coding of a sequence of images in an effort to reduce the temporal redundancy. Motion estimation
algorithms can be classified in two broad categories: gradient or differential-based methods and token matching
or correspondence methods. The gradient methods can be further divided into pel recursive, block matching,
and optical flow methods.
Pel Recursive Methods
[Netravali and Robbins, 1979] developed the first pel recursive method for television signal compression. The
algorithm begins with an initial estimate of the displacement, then iterates recursively to update the estimate.
The iterations can be performed at a single pixel or at successive pixels along a scan line. The true displacement
D at each pixel is estimated by
$$\hat{D}^{i} = \hat{D}^{i-1} + U^{i}$$
where $\hat{D}^{i}$ is the displacement estimate at the ith iteration and $U^{i}$ is the update term. $U^{i}$ is an estimate of
$D - \hat{D}^{i-1}$. They then used the displaced frame difference (DFD):
$$\mathrm{DFD}(x, y, \hat{D}^{i-1}) = I(x, y, t) - I\big((x, y) - \hat{D}^{i-1},\ t - T\big)$$
to obtain a relationship for the update term $U^{i}$. In the previous equation, T is the temporal sample spacing.
If the displacement estimate is updated from sample to sample using a steepest-descent algorithm to minimize
the weighted sum of the squared displaced frame differences over a neighborhood, then $\hat{D}^{i}$ becomes
$$\hat{D}^{i} = \hat{D}^{i-1} - \frac{\epsilon}{2}\,\nabla_{\hat{D}} \sum_{j} W_j \left[\mathrm{DFD}(x_j, y_j, \hat{D}^{i-1})\right]^2$$
where $\epsilon$ is a positive step size, $W_j \ge 0$, and $\sum_j W_j = 1$.
A graphical representation of pel recursive motion estimation is shown in Fig. 17.20.
A variety of methods to calculate the update term have been reported. The advantage of one method over
another is mainly in the improvement in compression. It should be noted that pel recursive algorithms assume
that the displacement to be estimated is small. If the displacement is large, the estimates will be poor. Noise
can also affect the accuracy of the estimate.
FIGURE 17.19 A two-dimensional subband analysis system for generating four equal subbands.
Block Matching
Block matching methods estimate the displacement within an M × N block in an image frame. The estimate is
determined by finding the best match between the M × N block in a frame at time $t$ and its best match from
the frame at $t - T$. An underlying assumption in the block matching techniques is that each pixel within a
block has the same displacement. A general block matching algorithm is given as follows:
1. Segment the image frame at time $t$ into a fixed number of blocks of size M × N.
2. Specify the size of the search area in the frame at time $t - 1$. This depends on the maximum expected
displacement. If $D_{\max}$ is the maximum displacement in either the horizontal or vertical direction, then
the size of the search area, SA, is
$$SA = (M + 2D_{\max}) \times (N + 2D_{\max})$$
Figure 17.21 illustrates the search area in the frame at time $t - 1$ for an M × N block at time $t$.
3. Using an appropriately defined matching criterion, e.g., mean-squared error or sum of absolute difference,
find the best match for the M × N block.
4. Proceed to the next block in frame $t$ and repeat step 3 until displacement estimates have been determined
for all blocks in the image.
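Steps 2 and 3 of the algorithm above can be sketched as an exhaustive search. The sketch below assumes the sum of absolute differences as the matching criterion and reports the offset, in the previous frame, of the best-matching block; the function name, the argument layout, and the synthetic test frames are all assumptions made for illustration.

```python
import numpy as np

def block_match(prev, curr, top, left, m, n, d_max):
    """Full-search match for the m x n block of `curr` at (top, left).

    Returns ((dy, dx), sad): the offset into `prev` with the smallest
    sum of absolute differences, searched over +/- d_max pixels.
    """
    block = curr[top:top + m, left:left + n]
    best_offset, best_sad = (0, 0), np.inf
    for dy in range(-d_max, d_max + 1):
        for dx in range(-d_max, d_max + 1):
            r, c = top + dy, left + dx
            # Skip candidates that fall outside the previous frame.
            if r < 0 or c < 0 or r + m > prev.shape[0] or c + n > prev.shape[1]:
                continue
            sad = np.abs(block - prev[r:r + m, c:c + n]).sum()
            if sad < best_sad:
                best_offset, best_sad = (dy, dx), sad
    return best_offset, best_sad

# Synthetic frames: the current frame is the previous one shifted
# down by 2 rows and left by 3 columns.
rng = np.random.default_rng(0)
prev_frame = rng.random((32, 32))
curr_frame = np.roll(prev_frame, shift=(2, -3), axis=(0, 1))
offset, sad = block_match(prev_frame, curr_frame, top=8, left=8, m=8, n=8, d_max=4)
```

The search visits $(2D_{\max} + 1)^2$ candidate positions per block, which is the cost that the simplified matching criteria and logarithmic searches mentioned earlier are meant to reduce.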
Optical Flow Methods
The optical flow is defined as the apparent motion of the brightness patterns from one frame to the next. The
optical flow is an estimate of the velocity field and hence requires two equations to solve for it. Typically a
constraint is imposed on the motion model to provide the necessary equations. Optical flow can give useful
information about the spatial arrangement of the objects in a scene, as well as the rate of change of those
objects. [Horn, 1986] also defines a motion field, which is a two-dimensional velocity field resulting from the
projection of the three-dimensional velocity field of an object in the scene onto the image plane. The motion
field and the optical flow are not the same.
In general, the optical flow has been found difficult to compute because of the algorithm sensitivity to noise.
Also, the estimates may not be accurate at scene discontinuities. However, because of its importance in assigning
a velocity vector at each pixel, there continues to be research in the field.
The optical flow equation is based on the assumption that the brightness of a pixel at location (x,y) is constant
over time; thus,
$$\frac{\partial I}{\partial x}\frac{dx}{dt} + \frac{\partial I}{\partial y}\frac{dy}{dt} + \frac{\partial I}{\partial t} = 0$$
where $dx/dt$ and $dy/dt$ are the components of the optical flow. Several different constraints have been used with
the optical flow equation to solve for $dx/dt$ and $dy/dt$. A common constraint to impose is that the velocity field
is smooth.
FIGURE 17.20 A graphical illustration of pel recursive motion estimation. The distance between the x and o
pixels in the frame at $t - 1$ is $\hat{D}^{i-1}$.
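The step from brightness constancy to the optical flow equation, and the role of a smoothness constraint, can be written out as follows. The shorthand $I_x$, $I_y$, $I_t$, $u$, $v$ and the weighting parameter $\alpha$ are notation introduced here for illustration. The smoothness functional shown is the formulation usually attributed to Horn and Schunck; the text does not say which smoothness constraint it has in mind, so this is one possible instance only.

```latex
% Brightness constancy along the motion path of a pixel:
I(x + dx,\ y + dy,\ t + dt) = I(x, y, t)

% A first-order Taylor expansion of the left-hand side, divided by dt, gives
% the optical flow equation with u = dx/dt and v = dy/dt:
I_x u + I_y v + I_t = 0,
\qquad I_x = \frac{\partial I}{\partial x},\quad
       I_y = \frac{\partial I}{\partial y},\quad
       I_t = \frac{\partial I}{\partial t}

% This is one equation in two unknowns (u, v). A smoothness constraint supplies
% the missing information, for example by minimizing
E(u, v) = \iint \Big[ (I_x u + I_y v + I_t)^2
        + \alpha^2 \big( \lVert \nabla u \rVert^2 + \lVert \nabla v \rVert^2 \big) \Big]\, dx\, dy
```

A larger $\alpha$ favors a smoother velocity field at the cost of fidelity to the brightness constraint, which is one reason the estimates degrade at scene discontinuities.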
Token Matching Methods
Token matching methods are often referred to as discrete methods since the goal is to estimate the motion only
at distinct image features (tokens). The result is a sparse velocity field. The algorithms attempt to match the
set of discrete features in the frame at time $t - 1$ with a set that best resembles them in the frame at time $t$.
Most of the algorithms in this group assume that the estimation will be achieved in a two-step process. In the
first step, the features are identified. The features could be points, corners, centers of mass, lines, or edges. This
step typically requires segmentation and/or feature extraction. The second step determines the various velocity
parameters. The velocity parameters include a translation component, a rotation component, and the rotation
axis. The token matching algorithms fail if there are no distinct features to use.
All of the methods described in this subsection assume that the intensity at a given pixel location is reasonably
constant over time. In addition, the gradient methods assume that the size of the displacements is small. Block
matching algorithms have been used extensively in real systems, because the computational complexity is not
too great. The one disadvantage is that there is only one displacement estimate per block. To date, optical flow
algorithms have found limited use because of their sensitivity to noise. Token matching methods work well for
applications in which the features are well defined and easily extracted. They are probably not suitable for most
video communications applications.
FIGURE 17.21 An illustration of block matching.
Image Quality and Visual Perception
An important factor in designing video signal processing algorithms is that the final receiver of the video
information is typically a human observer. This has an impact on how the quality of the final signal is assessed
and how the processing should be performed. If our objective is video transmission over a limited bandwidth
channel, we do not want to waste unnecessary bits on information that cannot be seen by the human observer.
In addition, it is undesirable to introduce artifacts that are particularly annoying to the human viewer.
Unfortunately, there are no perfect quantitative measures of visual perception. The human visual system is quite
complicated. In spite of the advances that have been made, no complete model of human perception exists.
Therefore, we often have to rely on subjective testing to evaluate picture quality. Although no comprehensive
model of human vision exists, certain functions can be characterized and then used in designing improved
solutions. For more information, see [Netravali and Haskell, 1988].
Subjective Quality Ratings
There are two primary categories of subjective testing: category-judgment (rating-scale) methods and comparison
methods. Category-judgment methods ask the subjects to view a sequence of pictures and assign each picture
(video sequence) to one of several categories. Categories may be based on overall quality or on visibility of
impairment (see Table 17.2).
Comparison methods require the subjects to compare a distorted test picture with a reference picture.
Distortion is added to the test picture until both pictures appear of the same quality to the subject. Viewing
conditions can have a great impact on the results of such tests. Care must be taken in the experimental design
to avoid biases in the results.
Visual Perception
In this subsection, a review of the major aspects of human psychophysics that have an impact in video signal
processing is given. The phenomena of interest include light adaptation, visual thresholding and contrast
sensitivity, masking, and temporal phenomena.
Light Adaptation
The human visual system (HVS) has two major classes of photoreceptors, the rods and the cones. Because these
two types of receptors adapt to light differently, two different adaptation time constants exist. Furthermore,
these receptors respond at different rates going from dark to light than from light to dark. It should also be
noted that although the HVS has an ability to adapt to an enormous range of light intensity levels, on the order
of $10^{10}$ in millilamberts, it does so adaptively. The simultaneous range is on the order of $10^{3}$.
Visual Thresholding and Contrast Sensitivity
Determining how sensitive an observer is to small changes in luminance is important in the design of video
systems. One's sensitivity will determine how visible noise will be and how accurately the luminance must be
represented. The contrast sensitivity is determined by measuring the just-noticeable difference (JND) as a
function of the brightness. The JND is the amount of additional brightness needed to distinguish a patch from
the background. It is a visibility threshold. What is significant is that the JND is dependent on the background
and surrounding luminances, the size of the background and surrounding areas, and the size of the patch, with
the primary dependence being on the luminance of the background.
TABLE 17.2 Quality and Impairment Ratings
Quality        Impairment                        Comparison
5 Excellent    5 Imperceptible                    3 Much better
4 Good         4 Perceptible but not annoying     2 Better
3 Fair         3 Slightly annoying                1 Slightly better
2 Poor         2 Annoying                         0 Same
1 Bad          1 Very annoying                   -1 Slightly worse
                                                 -2 Worse
                                                 -3 Much worse
Masking
The response to visual stimuli is greatly affected by what other visual stimuli are in the immediate neighborhood
(spatially and temporally). An example is the reduced sensitivity of the HVS to noise in areas of high spatial
activity. Another example is the masking of details in a new scene by what was present in the previous scene.
In both cases, the masking phenomenon can be used to improve the quality of image compression systems.
Temporal Effects
One relevant temporal phenomenon is the flicker fusion frequency. This is a temporal threshold which
determines the point at which the HVS fuses the motion in a sequence of frames. Unfortunately this frequency
varies as a function of the average luminance. The HVS is more sensitive to flicker at high luminances than at
low luminances. The spatial-temporal frequency response of the HVS is important in determining the sensitivity
to small-amplitude stimuli. In both the temporal and spatial directions, the HVS responds as a bandpass filter
(see Fig. 17.22). Also significant is the fact that the spatial and temporal properties are not independent of one
another, especially at low frequencies.
For more details on image quality and visual perception see [Schreiber, 1991] and [Netravali and Haskell, 1988].
Defining Terms
Aliasing: Distortion introduced in a digital signal when it is undersampled.
Compression: Process of compactly representing the information contained in a signal.
Motion estimation: Process of estimating the displacement of moving objects in a scene.
Quantization: Process of converting a continuous-valued signal into a discrete-valued signal.
Sampling: Process of converting a continuous-time/space signal into a discrete-time/space signal.
Scanning system: System used to capture a new image at periodic intervals in time and to convert the image
into a digital representation.
FIGURE 17.22 A perspective view of the spatio-temporal threshold surface.
Related Topics
8.5 Sampled Data 15.1 Coding, Transmission, and Storage
References
R. C. Gonzalez and P. Wintz, Digital Image Processing, Reading, Mass.: Addison-Wesley, 1987.
R. A. Haddad and T. W. Parsons, Digital Signal Processing: Theory, Applications, and Hardware, New York:
Computer Science Press, 1991.
B. K. P. Horn, Robot Vision, Cambridge, Mass.: The MIT Press, 1986.
A. K. Jain, Fundamentals of Digital Image Processing, Englewood Cliffs, N.J.: Prentice-Hall, 1989.
N. Jayant, "Signal compression: Technology targets and research directions," IEEE Journal on Selected Areas in
Communications, vol. 10, no. 5, pp. 796-818, 1992.
H. G. Musmann, P. Pirsch, and H.-J. Grallert, "Advances in picture coding," Proc. IEEE, vol. 73, no. 4,
pp. 523-548, 1985.
A. N. Netravali and B. G. Haskell, Digital Pictures: Representation and Compression, New York: Plenum Press,
1988.
A. N. Netravali and J. D. Robbins, "Motion-compensated television coding: Part I," Bell Syst. Tech. J., vol. 58,
no. 3, pp. 631-670, 1979.
A. V. Oppenheim, A. S. Willsky, and I. T. Young, Signals and Systems, Englewood Cliffs, N.J.: Prentice-Hall, 1983.
W. F. Schreiber, Fundamentals of Electronic Imaging Systems, Berlin: Springer-Verlag, 1991.
Further Information
Other recommended sources of information include IEEE Transactions on Circuits and Systems for Video
Technology, IEEE Transactions on Image Processing, and the Proceedings of the IEEE, April 1985, vol. 73, and
Multidimensional Systems and Signal Processing Journal, 1992, vol. 3.
17.3 Sensor Array Processing
N. K. Bose and L. H. Sibul
Multidimensional signal processing tools apply to aperture and sensor array processing. Planar sensor arrays
can be considered to be sampled apertures. Three-dimensional or volumetric arrays can be viewed as
multidimensional spatial filters. Therefore, the topics of sensor array processing, aperture processing, and
multidimensional signal processing can be studied under a unified format. The basic function of the receiving array is
transduction of propagating waves in the medium into electrical signals. Propagating waves are fundamental
in radar, communication, optics, sonar, and geophysics. In electromagnetic applications, basic transducers are
antennas and arrays of antennas. A large body of literature that exists on antennas and antenna arrays can be
exploited in the areas of aperture and sensor array processing. Much of the antenna literature deals with
transmitting antennas and their radiation patterns. Because of the reciprocity of transmitting and receiving
transducers, key results that have been developed for transmitters can be used for analysis of receiver aperture
and/or array processing. Transmitting transducers radiate energy in desired directions, whereas receiving
apertures/arrays act as spatial filters that emphasize signals from a desired look direction while discriminating against
interferences from other directions. The spatial filter wavenumber response is called the receiver beam pattern.
Transmitting apertures are characterized by their radiation patterns.
Conventional beamforming deals with the design of fixed beam patterns for given specifications. Optimum
beamforming is the design of beam patterns to meet a specified optimization criterion. It can be compared to
optimum filtering, detection, and estimation. Adaptive beamformers sense their operating environment (for
example, noise covariance matrix) and adjust beamformer parameters so that their performance is optimized
[Monzingo and Miller, 1980]. Adaptive beamformers can be compared with adaptive filters.
Multidimensional signal processing techniques have found wide application in seismology, where a group
of identical seismometers, called seismic arrays, are used for event location, studies of the earth's sedimentation
structure, and separation of coherent signals from noise (which sometimes may also propagate coherently across
the array but with different horizontal velocities) by employing velocity filtering [Claerbout, 1976]. Velocity
filtering is performed by multidimensional filters and allows also for the enhancement of signals which may
occupy the same wavenumber range as noise or undesired signals do. In a broader context, beamforming can
be used to separate signals received by sensor arrays based on frequency, wavenumber, and velocity (speed as
well as direction) of propagation. Both the transfer and unit impulse-response functions of a velocity filter are
two-dimensional functions in the case of one-dimensional arrays. The transfer function involves frequency and
wavenumber (due to spatial sampling by equally spaced sensors) as independent variables, whereas the unit
impulse response depends upon time and location within the array. Two-dimensional filtering is not limited
to velocity filtering by means of seismic array. Two-dimensional spatial filters are frequently used, for example,
in the interpretation of gravity and magnetic maps to differentiate between regional and local features. Input
data for these filters may be observations in the survey of an area conducted over a planar grid over the earth's
surface. Two-dimensional wavenumber digital filtering principles are useful for this purpose. Velocity filtering
by means of two-dimensional arrays may be accomplished by properly shaping a three-dimensional response
function $H(k_1, k_2, \omega)$. Velocity filtering by three-dimensional arrays may be accomplished through a
four-dimensional function $H(k_1, k_2, k_3, \omega)$ as explained in the following subsection.
Spatial Arrays, Beamformers, and FIR Filters
A propagating plane wave, $s(\mathbf{x},t)$, is, in general, a function of the three-dimensional space variables
$(x_1, x_2, x_3) \triangleq \mathbf{x}$ and the time variable $t$. The 4-D Fourier transform of the stationary signal $s(\mathbf{x},t)$ is
$$S(\mathbf{k},\omega) = \int\!\!\int\!\!\int\!\!\int s(\mathbf{x},t)\, e^{-j\left(\omega t - \sum_{i=1}^{3} k_i x_i\right)}\, dx_1\, dx_2\, dx_3\, dt \qquad (17.3)$$
which is referred to as the wavenumber-frequency spectrum of $s(\mathbf{x},t)$, and $(k_1, k_2, k_3) \triangleq \mathbf{k}$ denotes the
wavenumber variables in radians per unit distance and $\omega$ is the frequency variable in radians per second. If $c$ denotes
the velocity of propagation of the plane wave, the following constraint must be satisfied
$$k_1^2 + k_2^2 + k_3^2 = \frac{\omega^2}{c^2}$$
If the 4-D Fourier transform of the unit impulse response $h(\mathbf{x},t)$ of a 4-D linear shift-invariant (LSI) filter is
denoted by $H(\mathbf{k},\omega)$, then the response $y(\mathbf{x},t)$ of the filter to $s(\mathbf{x},t)$ is the 4-D linear convolution of $h(\mathbf{x},t)$ and
$s(\mathbf{x},t)$, which is uniquely characterized by its 4-D Fourier transform
$$Y(\mathbf{k},\omega) = H(\mathbf{k},\omega)\,S(\mathbf{k},\omega) \qquad (17.4)$$
The inverse 4-D Fourier transform, which forms a 4-D Fourier transform pair with Eq. (17.3), is
$$s(\mathbf{x},t) = \frac{1}{(2\pi)^4} \int\!\!\int\!\!\int\!\!\int S(\mathbf{k},\omega)\, e^{\,j\left(\omega t - \sum_{i=1}^{3} k_i x_i\right)}\, dk_1\, dk_2\, dk_3\, d\omega \qquad (17.5)$$
It is noted that $S(\mathbf{k},\omega)$ in Eq. (17.3) is product separable, i.e., expressible in the form
$$S(\mathbf{k},\omega) = S_1(k_1)\, S_2(k_2)\, S_3(k_3)\, S_4(\omega) \qquad (17.6)$$
where each function on the right-hand side is a univariate function of the respective independent variable, if
and only if $s(\mathbf{x},t)$ in Eq. (17.3) is also product separable. In beamforming, $S_1(k_1)$ in Eq. (17.6) would be the
far-field beam pattern of a linear array along the $x_1$-axis. For example, the normalized beam pattern of a uniformly
weighted (shaded) linear array of length L is
$$S(k,\theta) = \frac{\sin\left(\frac{kL}{2}\sin\theta\right)}{\frac{kL}{2}\sin\theta} \qquad (17.7)$$
where $\lambda = (2\pi/k)$ is the wavelength of the propagating plane wave and $\theta$ is the angle of arrival at array site as
shown in Fig. 17.23. Note that $\theta$ is explicitly admitted as a variable in $S(k,\theta)$ to allow for the possibility that
for a fixed wavenumber, the beam pattern could be plotted as a function of the angle of arrival. In that case,
when $\theta$ is zero, the wave impinges the array broadside and the normalized beam pattern evaluates to unity.
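A short numerical check of the uniformly weighted aperture may be useful here. The sketch assumes the standard sinc-shaped pattern $\sin(u)/u$ with $u = (kL/2)\sin\theta$ and $k = 2\pi/\lambda$; the function name and the aperture length of eight wavelengths are illustrative assumptions.

```python
import numpy as np

def uniform_aperture_pattern(theta, length, wavelength):
    """Normalized beam pattern sin(u)/u with u = (k L / 2) sin(theta).

    Since k = 2*pi/wavelength, u = pi * (L / wavelength) * sin(theta),
    and np.sinc(x) is defined as sin(pi x) / (pi x).
    """
    return np.sinc(length / wavelength * np.sin(theta))

length, wavelength = 8.0, 1.0  # aperture of eight wavelengths (hypothetical)
broadside = uniform_aperture_pattern(0.0, length, wavelength)

# The first null occurs where sin(theta) = wavelength / length.
first_null_angle = np.arcsin(wavelength / length)
at_null = uniform_aperture_pattern(first_null_angle, length, wavelength)
```

The pattern equals unity at broadside, and its first null moves toward broadside as the aperture grows relative to the wavelength, which is the sense in which beamwidth is inversely proportional to aperture.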
The counterpart, in aperture and sensor array processing, of the use of window functions in spectral analysis
for reduction of sidelobes is the use of aperture shading. In aperture shading, one simply multiplies a uniformly
weighted aperture by the shading function. The resulting beam pattern is, then, simply the convolution of the
beam pattern of the uniformly shaded volumetric array and the beam pattern of the shading function. The Fourier
transform relationship between the stationary signal $s(\mathbf{x},t)$ and the wavenumber-frequency spectrum $S(\mathbf{k},\omega)$
allows one to exploit high-resolution spectral analysis techniques for the high-resolution estimation of the
direction of arrival [Pillai, 1989]. The superscripts *, t, and H denote, respectively, complex conjugate, transpose,
and conjugate transpose.
Discrete Arrays for Beamforming
An array of sensors could be distributed at distinct points in space in various ways. Line arrays, planar arrays,
and volumetric arrays could be either uniformly spaced or nonuniformly spaced, including the possibility of
placing sensors randomly according to some probability distribution function. Uniform spacing along each
coordinate axis permits one to exploit the well-developed multidimensional signal processing techniques
concerned with filter design, DFT computation via FFT, and high-resolution spectral analysis of sampled signals
[Dudgeon, 1977]. Nonuniform spacing sometimes might be useful for reducing the number of sensors, which
otherwise might be constrained to satisfy a maximum spacing between uniformly placed sensors to avoid
grating lobes due to aliasing, as explained later. A discrete array, uniformly spaced, is convenient for the synthesis
of a digital filter or beamformer by the performing of digital signal processing operations (namely delay, sum,
and multiplication or weighting) on the signal received by a collection of sensors distributed in space. The
sequence of the nature of operations dictates the types of beamformer. Common beamforming systems are of
the straight summation, delay-and-sum, and weighted delay-and-sum types. The geometrical distribution of
sensors and the weights $w_n$ associated with each sensor are crucial factors in the shaping of the filter
characteristics. In the case of a linear array of N equispaced sensors, which are spaced D units apart, starting at the origin
$x_1 = 0$, the function
$$W(k_1) = \sum_{n=0}^{N-1} w_n\, e^{-j k_1 n D} \qquad (17.8)$$
becomes the array pattern, which may be viewed as the frequency response function for a finite impulse
response (FIR) filter, characterized by the unit impulse response sequence $\{w_n\}$. In the case when $w_n = 1$,
Eq. (17.8) simplifies to
$$W(k_1) = \frac{\sin\left(\frac{k_1 N D}{2}\right)}{\sin\left(\frac{k_1 D}{2}\right)}\, \exp\left\{-j\,\frac{(N-1)\,k_1 D}{2}\right\} \qquad (17.9)$$
If the N sensors are symmetrically placed on both sides of the origin, including one at the origin, and the sensor
weights are $w_n = 1$, then the linear array pattern becomes
$$W(k_1) = \frac{\sin\left(\frac{k_1 N D}{2}\right)}{\sin\left(\frac{k_1 D}{2}\right)} \qquad (17.10)$$
For planar arrays, direct generalizations of the preceding linear array results can be obtained. To wit, if the
sensors with unity weights are located at coordinates $(kD, lD)$, where $k = 0, \pm 1, \pm 2, \ldots, \pm[(N-1)/2]$, and
$l = 0, \pm 1, \pm 2, \ldots, \pm[(M-1)/2]$, for odd integer values of N and M, then the array pattern function becomes
$$W(k_1, k_2) = \sum_{k} \sum_{l} \exp\{-j(k_1 k D + k_2 l D)\} = \frac{\sin\left(\frac{k_1 N D}{2}\right)}{\sin\left(\frac{k_1 D}{2}\right)} \cdot \frac{\sin\left(\frac{k_2 M D}{2}\right)}{\sin\left(\frac{k_2 D}{2}\right)} \qquad (17.11)$$
Routine generalizations to 3-D spatial arrays are also possible. The array pattern functions for other geometrical
distributions may also be routinely generated. For example, if unit weight sensors are located at the six vertices
and the center of a regular hexagon, each of whose sides is D units long, then the array pattern function can
be shown to be
$$W(k_1, k_2) = 1 + 2\cos(k_1 D) + 4\cos\left(\frac{k_1 D}{2}\right)\cos\left(\frac{\sqrt{3}\,k_2 D}{2}\right) \qquad (17.12)$$
FIGURE 17.23 Uniformly weighted linear array.
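The view of a uniformly spaced line array as an FIR filter can be checked numerically. The sketch below assumes the array pattern is the weighted sum of $e^{-jknD}$ over sensors $n = 0, \ldots, N-1$ and compares it, for unit weights, with the closed form $e^{-jk(N-1)D/2}\,\sin(kND/2)/\sin(kD/2)$. The sign convention of the exponent, the function names, and the numerical values are assumptions made for this illustration.

```python
import numpy as np

def array_pattern(k, weights, spacing):
    """Direct evaluation of the array pattern: sum of w_n * exp(-j k n D)."""
    n = np.arange(len(weights))
    return np.sum(np.asarray(weights) * np.exp(-1j * k * n * spacing))

def uniform_closed_form(k, num, spacing):
    """Closed form for unit weights (valid where sin(k D / 2) != 0)."""
    phase = np.exp(-1j * k * (num - 1) * spacing / 2)
    return phase * np.sin(k * num * spacing / 2) / np.sin(k * spacing / 2)

num, spacing = 8, 0.5  # hypothetical array: 8 sensors, D = 0.5
k = 1.3                # arbitrary wavenumber in radians per unit distance
direct = array_pattern(k, np.ones(num), spacing)
closed = uniform_closed_form(k, num, spacing)

# The pattern repeats every 2*pi/D in wavenumber (grating lobes).
shifted = array_pattern(k + 2 * np.pi / spacing, np.ones(num), spacing)
main_lobe_peak = array_pattern(0.0, np.ones(num), spacing)
```

Replacing `np.ones(num)` with a tapered weight sequence (for example a Hamming window) is the discrete counterpart of aperture shading: sidelobes drop at the cost of a wider main lobe.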
The array pattern function reveals how selective a particular beamforming system is. In the case of a typical
array function shown in Eq. (17.9), the beamwidth, which is the width of the main lobe of the array pattern,
is inversely proportional to the array aperture. Because of the periodicity of the array pattern function, the
main lobe is repeated at intervals of $2\pi/D$. These repetitive lobes are called grating lobes, whose existence may
be interpreted in terms of spatial frequency aliasing resulting from a sampling interval D due to the N receiving
sensors located at discrete points in space. If the spacing D between sensors satisfies
$$D \le \frac{\lambda}{2} \qquad (17.13)$$
where $\lambda$ is the smallest wavelength component in the signal received by the array of sensors, then the grating
lobes have no effect on the received signal. A plane wave of unit amplitude which is incident upon the array
at bearing angle $\theta$ degrees, as shown in Fig. 17.23, produces outputs at the sensors given by the vector
$$\mathbf{s}(\theta) \triangleq \mathbf{s}_\theta = [\exp(j0)\ \ \exp(jk_0 D \sin\theta)\ \ \ldots\ \ \exp(jk_0 (N-1) D \sin\theta)]^t$$
where $k_0 = 2\pi/\lambda$ is the wavenumber. In array processing, the array output $y_\theta$ may be viewed as the inner product
of an array weight vector $\mathbf{w}$ and the steering vector $\mathbf{s}_\theta$. Thus, the beamformer response along a direction
characterized by the angle $\theta$ is, treating $\mathbf{w}$ as complex,
$$y_\theta = \langle \mathbf{w}, \mathbf{s}_\theta \rangle = \sum_{k=0}^{N-1} w_k^{*} \exp(jk_0 k D \sin\theta) \qquad (17.14)$$
The beamforming system is said to be robust if it performs satisfactorily despite certain perturbations [Ahmed
and Evans, 1982]. It is possible for each component $s_{k\theta}$ of $\mathbf{s}_\theta$ to belong to an interval
$[\underline{s}_{k\theta}, \overline{s}_{k\theta}]$ between a lower and an upper bound, and
a robust beamformer will require the existence of at least one weight vector $\mathbf{w}$ which will guarantee the output
to belong to an output envelope for each $\mathbf{s}_\theta$ in the input envelope. The robust beamforming problem can be
translated into an optimization problem, which may be tackled by minimizing the value of the array output power
$$P(\theta) = \mathbf{w}^H(\theta)\,\mathbf{R}\,\mathbf{w}(\theta) \qquad (17.15)$$
when the response to a unit amplitude plane wave incident at the steering direction $\theta$ is constrained to be unity,
i.e., $\mathbf{w}^H(\theta)\,\mathbf{s}(\theta) = 1$, and $\mathbf{R}$ is the additive noise-corrupted signal autocorrelation matrix. The solution is called
the minimum variance beamformer and is given by
$$\mathbf{w}_{MV}(\theta) = \frac{\mathbf{R}^{-1}\mathbf{s}(\theta)}{\mathbf{s}^H(\theta)\,\mathbf{R}^{-1}\mathbf{s}(\theta)} \qquad (17.16)$$
and the corresponding power output is
$$P_{MV}(\theta) = \frac{1}{\mathbf{s}^H(\theta)\,\mathbf{R}^{-1}\mathbf{s}(\theta)} \qquad (17.17)$$
The minimum vaiiance powei as a function of 0 can be used as a foim of the data-adaptive estimate of the
diiectional powei spectium. Howevei, in this mode of solution, the coeffcient vectoi is unconstiained except
V | | | D
| D | D
( , ) cos cos cos
1 2 1
1 2
1 2 4
+ +

D s
y w ,| |D

), exp( sin )

( )
( )
( ) ( )

( )
( ) ( )


2000 by CRC Press LLC
at the steeiing diiection. Consequently, a signal tends to be iegaided as an unwanted inteifeience and is,
theiefoie, suppiessed in the beamfoimed output unless it is almost exactly aligned with the steeiing diiection.
Theiefoie, it is desiiable to bioaden the signal acceptance angle while at the same time pieseiving the optimum
beamfoimei`s ability to ieject noise and inteifeience outside this iegion of angles. One way of achieving this
is by the application of the piinciple of supeidiiectivity.
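The minimum variance solution can be checked numerically. The following is a minimal sketch rather than the authors' implementation: it assumes a uniform linear array, builds R from one hypothetical interferer plus white sensor noise, and relies on NumPy. The function names, sensor count, angles, and powers are all illustrative assumptions.

```python
import numpy as np

def steering_vector(theta, n_sensors, spacing, wavelength):
    """s(theta) for a uniform linear array; theta is in radians."""
    k0 = 2.0 * np.pi / wavelength
    n = np.arange(n_sensors)
    return np.exp(1j * k0 * n * spacing * np.sin(theta))

def mv_beamformer(R, s):
    """Minimum variance weights and output power for steering vector s."""
    r_inv_s = np.linalg.solve(R, s)        # R^{-1} s without forming the inverse
    denom = np.real(np.vdot(s, r_inv_s))   # s^H R^{-1} s, real for Hermitian R
    return r_inv_s / denom, 1.0 / denom

# Assumed scenario: 8 sensors at half-wavelength spacing, look direction 10 degrees,
# one interferer of power 10 at 40 degrees, unit-power white noise.
N, D, LAM = 8, 0.5, 1.0
s_look = steering_vector(np.deg2rad(10.0), N, D, LAM)
s_int = steering_vector(np.deg2rad(40.0), N, D, LAM)
R = 10.0 * np.outer(s_int, s_int.conj()) + np.eye(N)

w, p = mv_beamformer(R, s_look)
gain_look = abs(np.vdot(w, s_look))   # |w^H s| in the steering direction
gain_int = abs(np.vdot(w, s_int))     # |w^H s| toward the interferer
```

Under these assumptions the weights keep unit response in the steering direction, the output power equals $w^{H} R w$, and the response toward the interferer is strongly attenuated, showing how the beamformer suppresses arrivals away from the steering direction.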
Discrete Arrays and Polynomials
It is common practice to relate discrete arrays to polynomials for array synthesis purposes [Steinberg, 1976].
For volumetric equispaced arrays (it is only necessary that the spacing be uniform along each coordinate axis
so that the spatial sampling periods $D_i$ and $D_j$ along, respectively, the ith and jth coordinate axes could be
different for $i \ne j$), the weight associated with sensors located at coordinate $(i_1 D_1, i_2 D_2, i_3 D_3)$ is denoted by
$w(i_1, i_2, i_3)$. The function in the complex variables ($z_1$, $z_2$, and $z_3$) that is associated with the sequence
$\{w(i_1, i_2, i_3)\}$ is the generating function for the sequence and is denoted by
$$V(z_1, z_2, z_3) = \sum_{i_1} \sum_{i_2} \sum_{i_3} w(i_1, i_2, i_3)\, z_1^{i_1} z_2^{i_2} z_3^{i_3}$$
In the electrical engineering and geophysics literature, the generating function $V(z_1, z_2, z_3)$ is sometimes called
the z-transform of the sequence $\{w(i_1, i_2, i_3)\}$. When there are a finite number of sensors, a realistic assumption
for any physical discrete array, $V(z_1, z_2, z_3)$ becomes a trivariate polynomial. In the special case when
$w(i_1, i_2, i_3)$ is product separable, the polynomial $V(z_1, z_2, z_3)$ is also product separable. In particular, this
separability property holds when the shading is uniform, i.e., $w(i_1, i_2, i_3) = 1$. When the support of the uniform
shading function is defined by $i_1 = 0, 1, \ldots, N_1 - 1$, $i_2 = 0, 1, \ldots, N_2 - 1$, and $i_3 = 0, 1, \ldots, N_3 - 1$, the
associated polynomial becomes
$$V(z_1, z_2, z_3) = \sum_{i_1=0}^{N_1-1} \sum_{i_2=0}^{N_2-1} \sum_{i_3=0}^{N_3-1} z_1^{i_1} z_2^{i_2} z_3^{i_3}$$
In this case, all results developed for the synthesis of linear arrays become directly applicable to the synthesis
of volumetric arrays. For a linear uniform discrete array composed of $N_1$ sensors with intersensor spacing $D_1$
starting at the origin and receiving a signal at a known fixed wavenumber $k_1$ at a receiving angle $\theta$, the far-field
beam pattern
$$S(k_1, \theta) \triangleq S(\theta) = \sum_{r=0}^{N_1-1} e^{jk_1 r D_1 \sin\theta}$$
may be associated with a polynomial $\sum_{r=0}^{N_1-1} z_1^{r}$, by setting $z_1 = e^{jk_1 D_1 \sin\theta}$. This polynomial has all its zeros
on the unit circle in the $z_1$-plane. If the array just considered is not uniform but has a weighting factor $w_r$, for
$r = 0, 1, \ldots, N_1 - 1$, the space factor,
$$Q(\theta) \triangleq \sum_{r=0}^{N_1-1} w_r e^{jk_1 D_1 r \sin\theta}$$
may again be associated with a polynomial $\sum_{r=0}^{N_1-1} w_r z_1^{r}$. By the pattern multiplication theorem, it is possible to
get the polynomial associated with the total beam pattern of an array with weighted sensors by multiplying the
polynomial associated with the array element pattern and the polynomial associated with the space factor
$Q(\theta)$. The array factor $|Q(\theta)|^2$ may also be associated with the polynomial spectral factor
$$|Q(\theta)|^2 \leftrightarrow \sum_{r=0}^{N_1-1} w_r z_1^{r} \sum_{r=0}^{N_1-1} w_r^{*} (z_1^{*})^{r}$$
where the weighting (shading) factor is allowed to be complex. Uniformly distributed apertures and uniformly
spaced volumetric arrays which admit product separable sensor weightings can be treated by using the
well-developed theory of linear discrete arrays and their associated polynomials. When the product separability
property does not hold, there is scope for applying results from multidimensional systems theory [Bose, 1982]
concerning multivariate polynomials to the synthesis problem of volumetric arrays.
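The link between a uniform linear array and its polynomial can be illustrated numerically. The sketch below is an illustration under stated assumptions, not part of the original text: it uses a hypothetical six-sensor array with uniform shading, evaluates the polynomial by Horner's rule, checks that the nontrivial sixth roots of unity are zeros of the polynomial (so all zeros lie on the unit circle), and compares the beam magnitude with the closed form $|\sin(N_1\psi/2)/\sin(\psi/2)|$, where $\psi = k_1 D_1 \sin\theta$ is shorthand introduced here.

```python
import cmath
import math

def array_polynomial(weights, z):
    """Evaluate sum_r w_r * z**r by Horner's rule (w_0 is the constant term)."""
    value = 0
    for w in reversed(weights):
        value = value * z + w
    return value

N1 = 6                     # hypothetical number of sensors
uniform = [1.0] * N1       # uniform shading, w_r = 1

# Candidate zeros of 1 + z + ... + z^(N1-1): the N1-th roots of unity except z = 1.
zeros = [cmath.exp(2j * math.pi * m / N1) for m in range(1, N1)]
residuals = [abs(array_polynomial(uniform, z)) for z in zeros]

def beam_magnitude(weights, psi):
    """Magnitude of the space factor at electrical angle psi = k1 * D1 * sin(theta)."""
    return abs(array_polynomial(weights, cmath.exp(1j * psi)))

psi = 0.7                  # arbitrary test angle in radians
closed_form = abs(math.sin(N1 * psi / 2) / math.sin(psi / 2))
```

Replacing `uniform` with any other list of real or complex weights evaluates the space factor of a shaded array in the same way.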
Velocity Filtering
Combination of individual sensor outputs in a more sophisticated way than the delay-and-sum technique leads
to the design of multichannel velocity filters for linear and planar as well as spatial arrays. Consider, first, a
linear (1-D) array of sensors, which will be used to implement velocity discrimination. The pass and rejection
zones are defined by straight lines in the $(k_1, \omega)$-plane, where
$$k_1 = \frac{\omega}{V} = \frac{\omega \sin\theta}{v}$$
is the wavenumber, $\omega$ the angular frequency in radians/second, V the apparent velocity on the earth's surface along
the array line, v the velocity of wave propagation, and $\theta$ the horizontal arrival direction. The transfer function
$$H(\omega, k_1) = \begin{cases} 1, & -\omega/V \le k_1 \le \omega/V \\ 0, & \text{otherwise} \end{cases}$$
of a "pie-slice" or "fan" velocity filter [Bose, 1985] rejects totally wavenumbers outside the range
$-\omega/V \le k_1 \le \omega/V$ and passes completely wavenumbers defined within that range. Thus, the transfer function defines a
high-pass filter which passes signals with apparent velocities of magnitude greater than V at a fixed frequency $\omega$. If
the equispaced sensors are D units apart, the spatial sampling results in a periodic wavenumber response with
period $k_1 = 1/(2D)$. Therefore, for a specified apparent velocity V, the resolvable wavenumber and frequency
bands are, respectively, $-1/(2D) \le k_1 \le 1/(2D)$ and $-V/(2D) \le \omega \le V/(2D)$, where $V/(2D)$ represents the folding
frequency in radians/second.
Linear arrays are subject to the limitation that the source is required to be located on the extended line of
sensors so that plane wavefronts approaching the array site at a particular velocity excite the individual sensors,
assumed equispaced, at arrival times which are also equispaced. In seismology, the equispaced interval between
successive sensor arrival times is called a move-out or step-out and equals $(D \sin\theta)/v = D/V$. However, when
the sensor-to-source azimuth varies, two or more independent signal move-outs may be present. Planar (2-D)
arrays are then required to discriminate between velocities as well as azimuth. Spatial (3-D) arrays provide
additional scope for enhancing discriminating capabilities when sensor/source locations are arbitrary.
In such cases, an array origin is chosen and the mth sensor location is denoted by a vector $(x_{m1}, x_{m2}, x_{m3})$, and
the frequency wavenumber response of an array of sensors is given by
$$H(\omega, k_1, k_2, k_3) = \frac{1}{N} \sum_{m=1}^{N} H_m(\omega) \exp\left\{-j \sum_{i=1}^{3} k_i x_{mi}\right\}$$
where $H_m(\omega)$ denotes the frequency response of a filter associated with the mth recording device (sensor). The
sum of all N filters provides a flat frequency response so that waveforms arriving from the estimated directions
of arrival at estimated velocities are passed undistorted and other waveforms are suppressed. In the planar
specialization, the 2-D array of sensors leads to the theory of 3-D filtering involving a transfer function in the
frequency wavenumber variables f, $k_1$, and $k_2$. The basic design equations for the optimum, in the least-mean-square
error sense, frequency wavenumber filters have been developed [Burg, 1964]. This procedure of Burg
can be routinely generalized to the 4-D filtering problem mentioned above.
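The pass and rejection zones of the ideal fan filter can be sketched in a few lines. This is an illustrative example only: the velocities, frequency, and arrival angles are hypothetical, and taking absolute values so that negative frequencies are handled symmetrically is an assumption made here rather than something stated in the text.

```python
import math

def apparent_velocity(v, theta):
    """V = v / sin(theta): apparent velocity along the array line."""
    return v / math.sin(theta)

def fan_filter(omega, k1, v_cut):
    """Ideal pie-slice response: 1 if |k1| <= |omega| / v_cut, else 0."""
    return 1.0 if abs(k1) <= abs(omega) / v_cut else 0.0

V_PROP = 3.0   # hypothetical propagation velocity, km/s
V_CUT = 6.0    # hypothetical cut-off apparent velocity, km/s
OMEGA = 20.0   # angular frequency, rad/s

# An arrival at 20 degrees has a high apparent velocity (about 8.8 km/s) and is passed;
# an arrival at 60 degrees has a low apparent velocity (about 3.5 km/s) and is rejected.
k_fast = OMEGA / apparent_velocity(V_PROP, math.radians(20.0))
k_slow = OMEGA / apparent_velocity(V_PROP, math.radians(60.0))
passed = fan_filter(OMEGA, k_fast, V_CUT)
rejected = fan_filter(OMEGA, k_slow, V_CUT)
```

The example makes the high-pass behavior in apparent velocity concrete: only arrivals whose apparent velocity exceeds the cut-off fall inside the fan.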
N.K. Bose and L.H. Sibul acknowledge the support provided by the Office of Naval Research under, respectively,
Contract N00014-92-J-1755 and the Fundamental Research Initiatives Program.
Defining Terms
Array pattern: Fourier transform of the receiver weighting function taking into account the positions of the
receivers.
Beamformers: Systems commonly used for detecting and isolating signals that are propagating in a particular
direction.
Grating lobes: Repeated main lobes in the array pattern interpretable in terms of spatial frequency aliasing.
Velocity filtering: Means for discriminating signals from noise or other undesired signals because of their
different apparent velocities.
Wavenumber: $2\pi$ (spatial frequency in cycles per unit distance).
Related Topic
14.3 Design and Implementation of Digital Filters
References
K.M. Ahmed and R.J. Evans, "Robust signal and array processing," IEE Proceedings, F: Communications, Radar,
and Signal Processing, vol. 129, no. 4, pp. 297-302, 1982.
N.K. Bose, Applied Multidimensional Systems Theory, New York: Van Nostrand Reinhold, 1982.
N.K. Bose, Digital Filters, New York: Elsevier Science North-Holland, 1985. Reprint ed., Malabar, Fla.: Krieger
Publishing, 1993.
J.P. Burg, "Three-dimensional filtering with an array of seismometers," Geophysics, vol. 23, no. 5, pp. 693-713,
1964.
J.F. Claerbout, Fundamentals of Geophysical Data Processing, New York: McGraw-Hill, 1976.
D.E. Dudgeon, "Fundamentals of digital array processing," Proc. IEEE, vol. 65, pp. 898-904, 1977.
R.A. Monzingo and T.W. Miller, Introduction to Adaptive Arrays, New York: Wiley, 1980.
S.M. Pillai, Array Signal Processing, New York: Springer-Verlag, 1989.
B.D. Steinberg, Principles of Aperture and Array System Design, New York: Wiley, 1976.
Further Information
Adaptive Signal Processing, edited by Leon H. Sibul, includes papers on adaptive arrays, adaptive algorithms
and their properties, as well as other applications of adaptive signal processing techniques (IEEE Press, New
York, 1987).
Adaptive Antennas: Concepts and Applications, by R. T. Compton, Jr., emphasizes adaptive antennas for
electromagnetic wave propagation applications (Prentice-Hall, Englewood Cliffs, N.J., 1988).
Array Signal Processing: Concepts and Techniques, by D. H. Johnson and D. E. Dudgeon, incorporates results
from discrete-time signal processing into array processing applications such as signal detection, estimation of
direction of propagation, and frequency content of signals (Prentice-Hall, Englewood Cliffs, N.J., 1993).
Neural Network Fundamentals with Graphs, Algorithms, and Applications, by N. K. Bose and P. Liang, contains
the latest information on adaptive-structure networks, growth algorithms, and adaptive techniques for learning
and capability for generalization (McGraw-Hill, New York, N.Y., 1996).
17.4 Video Processing Architectures
Wayne Wolf
Video processing has become a major application of computing: personal computers display multimedia data,
digital television provides more channels, etc. The characteristics of video algorithms are very different from
those of traditional computer applications, and these demands require new architectures.
Two fundamental characteristics of video processing make it challenging and different from applications like
database processing. First, the video processor must handle streaming data that arrives constantly. Traditional
applications assume that data has a known, fixed location. In video processing, not only are new input samples
always arriving, but our time reference in the stream is constantly changing. At one time instant, we may consider
a sample $x_t$, but at the next sampling interval that sample becomes $x_{t-1}$. The need to sweep through the data
stream puts additional demands on the memory system. Second, streaming data must be processed in realtime. If
the deadline for completing an output is missed, the results will be visible on the screen. When designing realtime
systems, it is not sufficient to look at aggregate throughput, because data can become backed up for a period and
still meet some long-term timing requirements. Processing must complete every realtime result by the appointed
deadline. Architectures must provide underlying support for predictable computation times.
The challenges of processing streaming data in realtime are made greater by the fact that video processing
algorithms are becoming very complex. Video compression algorithms make use of several different techniques
and complex search algorithms to maximize their ability to compress the video data; video display systems
provide much more sophisticated controls to the user; content analysis systems make use of multiple complex
algorithms working together; mixed computer graphics-video systems combine geometric algorithms with
traditional video algorithms. Expect video processing algorithms to become more complex in the future. This
complexity puts greater demands on the realtime nature of the video architecture: more complex algorithms
generally have less predictable execution times. The architecture should be designed so that algorithms can take
advantage of idle hardware caused by early completions of functions, rather than letting hardware sit idle while
it waits for other operations to complete.
Luckily, VLSI technology is also advancing rapidly and allows us to build ever more sophisticated video
processing architectures. The state of video processing architectures will continue to advance as VLSI allows us
to integrate more transistors on a chip; in particular, the ability to integrate a significant amount of memory
along with multiple processing elements will provide great strides in video processing performance over the
next several years. However, the basic techniques for video processing used today [Pir98] will continue to be
the basis for video architectures in the long run.
This section first reviews two basic techniques for performing video operations: single instruction
multiple data (SIMD) and vectorization. It then looks at the three major styles of video architectures:
heterogeneous multiprocessors, video signal processors, and microprocessor instruction set extensions.
Computational Techniques
Many of the fundamental operations in video processing are filters that can be described as linear equations;
for example,
$$\sum_{1 \le i \le n} c_i x_i$$
There are two techniques for implementing such equations: single-instruction multiple data (SIMD) processing
and vector processing. The two are similar in underlying hardware structure; the most important differences
come in how they relate to the overall computer architecture of which they are a part.
The term SIMD comes from Flynn's classification of computer architectures, based on the number of data
elements they process simultaneously and the number of instructions used to control the operations on those
data. In a SIMD machine, a single instruction is used to control the operation performed on many data elements.
Thus, the same operation is performed simultaneously on all that data. Figure 17.24 shows a SIMD structure:
each of several function units has its own register file and an ALU for performing operations on data; the
controller sends identical signals to all function units so that the same operation is performed on all function
units at the same time; there is also a network that allows processing elements to pass data among themselves.
Consider how to use a SIMD machine to perform the filtering operation given at the beginning of this
section. The multiplications are all independent, so we can perform N multiplications in parallel on the
N processing elements. We need to perform N - 1 additions on the multiplication results; by properly arranging
the computation in a tree, many of those operations can be performed in parallel as well. We will need to use
the data transfer network in two ways: to transfer x values between processing elements for the data streaming
time shift; and to transfer the partial addition results in the addition tree. SIMD architectures can of course be
used to implement multidimensional functions as well. For example, two-dimensional correlation is used in
video compression, image recognition, etc., and can easily be mapped onto a SIMD machine.
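The parallel multiply followed by an addition tree can be emulated sequentially to count the steps involved. The sketch below is a software illustration under stated assumptions, not a model of any particular machine: each list comprehension stands in for one operation issued to all processing elements at once, and the function name and test data are hypothetical.

```python
def simd_filter(c, x):
    """Return (sum of c[i] * x[i], number of parallel addition steps)."""
    partial = [ci * xi for ci, xi in zip(c, x)]   # N independent multiplications
    steps = 0
    while len(partial) > 1:
        if len(partial) % 2:                      # odd count: pad with a zero
            partial.append(0)
        # One parallel step: element 2m is added to element 2m + 1.
        partial = [partial[m] + partial[m + 1] for m in range(0, len(partial), 2)]
        steps += 1
    return partial[0], steps

c = [1, 2, 3, 4, 5, 6, 7, 8]
x = [8, 7, 6, 5, 4, 3, 2, 1]
y, add_steps = simd_filter(c, x)
```

For eight coefficients the seven additions collapse into three parallel steps, which is the benefit of arranging the additions as a tree.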
SIMD architectures provide a high degree of parallelism at high speeds. Instruction distribution and decoding
is not a bottleneck. Furthermore, each processing element has its own data registers, and the communication
network between the processing elements can be designed to be fast. However, not all algorithms can be
efficiently mapped onto SIMD architectures. Global computation is difficult in SIMD machines. Operations
that cause global changes to the machine state also create problems.
Vector instructions were originally invented for supercomputers to improve the performance of scientific
calculations. Although video operations are generally done in fixed-point rather than floating-point arithmetic,
vector instructions are well-suited to the many video operations that can be expressed in linear algebra.
Vectorization was used in many early video processors. More recently, SIMD has become more popular, but
with vectorization becoming more popular in general-purpose microprocessors, there may be a resurgence of
vector units for multimedia computation.
A vector is a data structure supported by hardware. The vector is stored in memory as a set of memory
locations; special vector registers are also provided to hold the vectors for arithmetic operations. Our filter
example could be implemented as a single vector instruction (after loading the vector registers with the c and
x vectors): a vector multiply-accumulate instruction, similar to scalar multiply-accumulate instructions in DSPs,
could multiply the $x_i$'s by the $c_i$'s and accumulate the result.
The motivation for supporting vector instructions is pipelining the arithmetic operations. If an arithmetic
operation takes several clock cycles, pipelining allows high throughput at a high clock rate at the cost of latency.
As shown in Fig. 17.25, vectors are well-suited to pipelined execution because all the operations in the vector
are known to be independent in advance.
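A back-of-the-envelope cycle count shows why independent vector elements suit a pipeline. The sketch below is a simplified model with assumed numbers: it supposes one new operation can enter the pipeline every cycle and ignores stalls, memory traffic, and start-up overhead beyond the pipeline depth.

```python
def cycles_unpipelined(n_ops, latency):
    """Each operation must finish before the next one starts."""
    return n_ops * latency

def cycles_pipelined(n_ops, latency):
    """The first result appears after `latency` cycles, then one result per cycle."""
    return latency + (n_ops - 1)

VECTOR_LENGTH = 64   # hypothetical vector length
STAGES = 4           # hypothetical latency of one arithmetic operation, in cycles

slow = cycles_unpipelined(VECTOR_LENGTH, STAGES)
fast = cycles_pipelined(VECTOR_LENGTH, STAGES)
```

Under these assumptions the 64-element vector takes 256 cycles without pipelining and 67 cycles with it, while the latency of any single operation remains four cycles.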
Vector units allow linear algebra to be performed at very high speeds with high hardware utilization.
Furthermore, because they have a long history in scientific computing, compiling high-level languages into
vector instructions is well understood. However, the latencies of integer arithmetic operations for video
operations are smaller than those of the floating-point operations typically used in scientific vector processors.
FIGURE 17.24 A SIMD architecture.
Heterogeneous Multiprocessors
The earliest style of video processor is the heterogeneous multiprocessor. These machines cannot execute
arbitrary programs; they are restricted to a single algorithm or variations on that algorithm. The
microarchitecture of the machine is tuned to the target application. In the early days of digital video, special-purpose
heterogeneous multiprocessors were the only way to implement VLSI video processing because chips were not
large enough to support the hardware required for instruction-set processors. Today, heterogeneous
multiprocessors are used to implement low-cost video systems, since by specializing the hardware for a particular
application, less hardware is generally required, resulting in smaller, less-expensive chips.
A simple heterogeneous architecture is shown in Fig. 17.26. This machine implements a sum-of-absolute-differences
correlation in two dimensions for block motion estimation. The architecture of this machine is
derived from the data flow of the computation, where for each offset (r, s), the sum-of-absolute differences
between an $n \times n$ macroblock and a $T \times T$ reference area can be computed:
$$\sum_{1 \le i \le n} \sum_{1 \le j \le n} |M(i, j) - R(i + r, j + s)|$$
The machine executes one column of the computation per clock cycle: n absolute differences are formed and
then passed on to a summation unit. This machine is not a SIMD architecture because it does not execute
instructions; it is designed to perform one algorithm.
Heterogeneous architectures can also be used for more complex algorithms. Figure 17.27 shows a sketch for
a possible architecture for MPEG-style video compression [MPE]. The unit has separate blocks for the major
operations: block motion estimation, discrete cosine transform (DCT) calculation, and channel coding; it also
has a processor used for overall control.
FIGURE 17.25 Pipelining to support vector operations.
FIGURE 17.26 A heterogeneous multiprocessor.
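The sum-of-absolute-differences search that the block motion estimator implements in hardware can be written out directly in software. The sketch below is a plain exhaustive search under stated assumptions, not a description of the machine in Fig. 17.26: offsets are zero-based indices into the reference area, and the ramp-shaped test image, sizes, and function names are hypothetical.

```python
def sad(macroblock, reference, r, s):
    """Sum of absolute differences between the macroblock and the window at (r, s)."""
    n = len(macroblock)
    return sum(
        abs(macroblock[i][j] - reference[i + r][j + s])
        for i in range(n)
        for j in range(n)
    )

def best_offset(macroblock, reference):
    """Exhaustive search over every offset at which the macroblock fits."""
    n, t = len(macroblock), len(reference)
    offsets = [(r, s) for r in range(t - n + 1) for s in range(t - n + 1)]
    return min(offsets, key=lambda rs: sad(macroblock, reference, rs[0], rs[1]))

T, N = 8, 4
reference = [[8 * i + j for j in range(T)] for i in range(T)]   # ramp image, all values distinct
macroblock = [row[3:3 + N] for row in reference[2:2 + N]]       # block cut out at offset (2, 3)
found = best_offset(macroblock, reference)
```

The search recovers the offset at which the block was cut, where the sum of absolute differences is zero. The inner double sum is the part that the hardware evaluates one column per clock cycle.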
Heterogeneous architectures are designed by careful examination of the algorithm to be implemented. The
most time-critical functions must be identified early. Those operations are typically implemented as
special-purpose function units. For example, the block motion estimation engine of Fig. 17.26 can be used as a
special-purpose function unit in a more complex application like an MPEG video compressor. Communication links
must be provided between the function units to provide adequate bandwidth for the data transfers. In structured
communication architectures, data transfers can be organized around buses or more general communication
networks like crossbars. Heterogeneous communication systems make specialized connections as required by
the algorithm. Many modern heterogeneous video processors use as much structured communication as
possible but add specialized communication links as required to meet performance requirements. Many modern
heterogeneous processors are at least somewhat programmable. Basic architectures may use registers to control
certain parameters of the algorithm. More complex algorithms may use general-purpose microprocessors as
elements of the architecture. Small microcontrollers are frequently used for system interfacing, such as talking
to a keyboard or other controlling device. Larger microprocessors can be used to run algorithms that do not
benefit from special-purpose function units.
Heterogeneous multiprocessors will continue to dominate high-volume, low-cost markets for video and
multimedia functions. When an application is well-defined, it is often possible to design a special-purpose
architecture that performs only that operation but is significantly cheaper than a system built from a
programmable processor. Furthermore, heterogeneous multiprocessors may require significantly less power than
programmable solutions and, therefore, are used in an increasing number of battery-operated multimedia devices.
However, heterogeneous multiprocessors are not well-suited to other application areas. If the algorithm is not
well-defined, if the system must be able to execute a variety of algorithms, or if the size of the market will not
support the cost of designing an application-specific solution, heterogeneous multiprocessors are not appropriate.
Video Signal Processors
The term digital signal processor (DSP) is generally reserved for microprocessors that are optimized for signal
processing algorithms and that run at audio rates. A video signal processor (VSP) is a DSP that is capable of running
at video rates. Using separate names for audio and video rate processors is reasonable because VSPs provide
much greater parallelism and significantly different microarchitectures.
Many early video processors were vector machines because vector units provide high throughput with
relatively small amounts of hardware. Today, most VSPs make use of very-long instruction word (VLIW)
processor technology, as shown in Fig. 17.28. The architecture has several function units connected to a single
register file. The operations on all the function units are controlled by the instruction decoder based on the
current instruction. A VLIW machine differs from a SIMD machine in two important ways. First, the VLIW
machine connects all function units to the same register file, while the SIMD machine uses separate registers
for the function units. The common register file gives the VLIW machine much more flexibility; for example,
a data value can be used on one function unit on one cycle and on another function unit on the next cycle
without having to copy the value. Second, the function units in the VLIW machine need not perform the same
operation. The instruction is divided into fields, one for each unit. Under control of its instruction field, each
function unit can request data from the register file and perform operations as required.
FIGURE 17.27 A heterogeneous architecture for MPEG-style compression.
Although having a common register file is very flexible, there are physical limitations on the number of
function units that can be connected to a single register file. A single addition requires three ports to the register
file: one to read each operand and a third to write the result back to the register file. Register files are built
from static random access memories (SRAMs) and slow down as the number of read/write ports grows. As a
result, VLIW machines are typically built in clusters as shown in Fig. 17.29. Each cluster has its own register
file and function units, with three or four function units per cluster typical in today's technology. A separate
interconnection network allows data transfers between the clusters. When data held in one register file is needed
in a different cluster, an instruction must be executed to transfer the data over the interconnection network to
the other register file.
The major difference between VLIW architectures and the superscalar architectures found in modern
microprocessors is that VLIW machines have statically scheduled operations. A superscalar machine has hardware that
examines the instruction stream to determine what operations can be performed in parallel; for example, when
two independent operations appear in consecutive instructions, those instructions can be executed in parallel.
A VLIW machine relies on a compiler to identify parallelism in the program and to pack those operations into
instruction words. This requires sophisticated compilers that can extract parallelism and effectively make use of
it when generating instructions. Video is especially well-suited to VLIW because video programs have a great
deal of parallelism that is relatively easy to identify and take advantage of in a VLIW machine.
VLIW has potential performance advantages because its control unit is relatively simple. Because the work
of finding parallelism is performed by the compiler, a VLIW machine does not require the sophisticated
execution unit of a superscalar processor. This allows a VLIW video processor to run at high clock rates.
However, it does rely on the compiler's ability to find enough parallelism to keep the function units busy.
Furthermore, complex algorithms may have some sections that are not highly parallel and therefore will not
be sped up by the VLIW mechanism. If one is not careful, these sequential sections of code can come to limit
the overall performance of the application.
Practical video signal processors are not pure VLIW machines, however. In general, they are in fact hybrid
machines that use VLIW processing for some operations and heterogeneous multiprocessing techniques for
others. This is necessary to meet the high performance demand on video processing; certain critical operations
can be sped up with special-purpose function units, leaving the VLIW processor to perform the rest. An example
of this technique is the Trimedia TM-1 processor [Rat96] shown in Fig. 17.30. This machine has a VLIW
processor. It also has several function units for specialized video operations, principal among these being a
variable-length decoder for channel coding and an image coprocessor. The TM-1 also supports multiple DMA
channels to speed up data transfers as well as timers to support realtime operation.
FIGURE 17.28 A simple VLIW machine.
FIGURE 17.29 A clustered VLIW machine.
VLIW VSPs represent one end of the programmable video processor spectrum. These machines are designed
from the ground up to execute video algorithms. The VLIW architecture is very well-suited to video applications
due to the embarrassing levels of parallelism available in video programs. Special-purpose function units can
be used to speed up certain key operations. However, VLIW VSPs may not be as well-suited to executing code
that is more typically found on workstation microprocessors, such as error checking, bit-level operations, etc.
As a result, VLIW VSPs can be used in conjunction with standard microprocessors to implement a complex
video application, with the VSP performing traditional parallel video sections of the code and the
microprocessor performing the less regular computations.
Instruction Set Extensions
Both heterogeneous multiprocessors and VSPs are specialized architectures for video. However, there are many
applications in which it is desirable to execute video programs directly on a workstation or PC: programs that
are closely tied to the operating system, mixed video/graphics applications, etc. Traditional microprocessors
are fast but are not especially well-utilized by video programs. For these applications, microprocessor instruction
set extensions have been developed to allow video algorithms to be executed more efficiently on traditional
microprocessors.
The basic principle of instruction set extensions is subword parallelism [Lee95], as illustrated in Fig. 17.31.
This technique takes advantage of the fact that modern microprocessors support native 32- or 64-bit operations
while most video algorithms require much smaller data accuracy, such as 16 bits or even 8 bits. One can divide
the microprocessor data path, on which the instructions are executed, into subwords. This is a relatively simple
operation, mainly entailing adding a small amount of logic to cut the ALU's carry chain at the appropriate
points when subword operations are performed. When a 64-bit data path is divided for use by 16-bit subwords,
the machine can support four simultaneous subword operations. Subword parallelism is often referred to as
SIMD because a single microprocessor instruction causes the same operation to be performed on all the
subwords in parallel. However, there is no separate SIMD instruction unit; all the work is done by adding a
small amount of hardware to the microprocessor data path. Subword parallelism is powerful because it has a
very small cost in the microprocessor (both in terms of chip area and performance) and because it provides
substantial speedups on parallel code.
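Cutting the carry chain can be imitated in software on ordinary integers, which may help make the idea concrete. The sketch below is a software emulation under stated assumptions, not a description of any actual instruction set extension: it packs four 16-bit subwords into a 64-bit integer and adds all four lanes at once, masking the top bit of each lane so that no carry crosses a lane boundary. The helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1
HIGH_BITS = 0x8000_8000_8000_8000   # top bit of each 16-bit subword
LOW_BITS = MASK64 ^ HIGH_BITS       # the remaining 15 bits of each subword

def pack16(values):
    """Pack four 16-bit values into one 64-bit word, lane 0 in the low bits."""
    word = 0
    for lane, v in enumerate(values):
        word |= (v & 0xFFFF) << (16 * lane)
    return word

def unpack16(word):
    """Split a 64-bit word back into its four 16-bit lanes."""
    return [(word >> (16 * lane)) & 0xFFFF for lane in range(4)]

def add16x4(a, b):
    """Add four 16-bit lanes at once; each lane wraps modulo 2**16."""
    low_sum = (a & LOW_BITS) + (b & LOW_BITS)     # carries stop inside each lane
    return (low_sum ^ ((a ^ b) & HIGH_BITS)) & MASK64

a = pack16([1, 0xFFFF, 40000, 12345])
b = pack16([2, 1, 30000, 54321])
lanes = unpack16(add16x4(a, b))
```

A plain 64-bit addition of the same two words would let the overflow from one lane spill into its neighbor; the masked version keeps every subword independent, which is the effect of cutting the carry chain in hardware.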
A typical instruction set extension will of course support logical and arithmetic operations on subwords.
Such extensions may support saturation arithmetic as well as two's-complement arithmetic. Saturation arithmetic
generates the maximum value on overflow, more closely approximating physical devices. They may also support
permutation operations so that the order of subwords in a word can be shuffled. Loads and stores are performed
on words, not subwords.
FIGURE 17.30 The Trimedia TM-1 video signal processor.
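The difference between wrap-around and saturation arithmetic is easy to see on a single subword. The sketch below is illustrative only: it models signed 16-bit and unsigned 8-bit additions with plain Python integers, and the function names are hypothetical rather than taken from any instruction set.

```python
def add_wrap_s16(a, b):
    """Two's-complement 16-bit addition: the result wraps around on overflow."""
    return ((a + b + 32768) & 0xFFFF) - 32768

def add_sat_s16(a, b):
    """Saturating 16-bit addition: the result is clamped to the representable range."""
    return max(-32768, min(32767, a + b))

def add_wrap_u8(a, b):
    """Unsigned 8-bit addition with wrap-around."""
    return (a + b) & 0xFF

def add_sat_u8(a, b):
    """Unsigned 8-bit addition clamped at 255."""
    return min(a + b, 0xFF)

wrapped = add_wrap_s16(30000, 10000)     # overflows and becomes negative
saturated = add_sat_s16(30000, 10000)    # sticks at the maximum value
pixel_wrapped = add_wrap_u8(200, 100)    # a bright pixel turns dark
pixel_saturated = add_sat_u8(200, 100)   # a bright pixel stays at full brightness
```

For pixel data the saturated result is usually the more sensible one, since a sum that exceeds full brightness should remain bright rather than wrap to a small value.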
ISA extensions have been defined for the major microprocessor architectures. The MAX extension for the
HP PA-RISC architecture [Lee96] was the first ISA extension and introduced the notion of subword parallelism.
The VIS (Visual Instruction Set) extension [Tre96] has been added to the Sun SPARC architecture. The Intel
x86 architecture has been extended with the MMX instructions [Pel96].
The MMX extension is based on the well-known Intel architecture. It supports operations on 8-bit bytes,
16-bit words, 32-bit doublewords, and 64-bit quadwords. All these data types are packed into 64-bit words. All
MMX operations are performed in the floating-point registers; this means that the floating-point registers must
be saved at the beginning of MMX code and restored at its end. (Although floating-point operations access
these registers as a stack, MMX instructions can arbitrarily address the registers.) MMX supports addition,
subtraction, comparison, multiplication, shifts, and logical operations. Arithmetic can optionally be performed
in saturation mode. There are also instructions for packing and unpacking subwords into words. Some
conversion operations are provided so that intermediate calculations can be performed at higher precisions and
then converted to a smaller format.
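The value of performing intermediate calculations at higher precision can be seen in a small Python model. This is only a sketch of the idea, not of the MMX instructions themselves: the function names are hypothetical, and the example scales 8-bit pixel values by 3/2 either entirely in 8 bits or by widening to 16 bits and packing back with saturation.

```python
def scale_narrow(pixels, numerator, shift):
    """Keep every intermediate in 8 bits: the product wraps modulo 256."""
    return [((p * numerator) & 0xFF) >> shift for p in pixels]


def scale_widened(pixels, numerator, shift):
    """Widen to 16 bits, compute, then pack back to 8 bits with saturation."""
    wide = [(p * numerator) & 0xFFFF for p in pixels]  # 16-bit intermediates
    shifted = [w >> shift for w in wide]
    return [min(s, 0xFF) for s in shifted]             # saturating pack to 8 bits


pixels = [10, 100, 200, 250]
print(scale_narrow(pixels, 3, 1))   # bright pixels wrap to arbitrary values
print(scale_widened(pixels, 3, 1))  # bright pixels clamp at 255
```

With 8-bit intermediates the product of a bright pixel and the gain no longer fits, so the result is unrelated to the intended value. Widening first keeps the product exact, and the final saturating pack limits the damage to clipping.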
The Sun VIS extension also uses the floating-point registers. The MAX-2 extension is the latest extension
to the HP architecture. It uses integer registers rather than floating-point registers. It does not directly implement
multiplication, but instead provides a shift-and-add operation for software-driven multiplication. MAX-2 also
supports a permutation operation to allow subwords to be rearranged in a word.
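Software-driven multiplication by a constant can be built from shifts and adds alone. The Python sketch below shows the general technique for a single 16-bit subword; the function name and the choice of width are assumptions made for illustration, and the code does not model the exact semantics of the MAX-2 instruction.

```python
def shift_add_multiply(x, constant, width=16):
    """Multiply x by a non-negative integer constant using only shifts and adds.

    Each set bit k of the constant contributes (x << k) to the result.
    The result is truncated to `width` bits, as it would be in a subword.
    """
    mask = (1 << width) - 1
    acc = 0
    shift = 0
    while constant:
        if constant & 1:
            acc = (acc + (x << shift)) & mask
        constant >>= 1
        shift += 1
    return acc


print(shift_add_multiply(7, 10))   # 10 = 0b1010, so 7*10 = (7 << 1) + (7 << 3)
print(shift_add_multiply(300, 5))  # 5 = 0b101, so 300*5 = 300 + (300 << 2)
```

Because video kernels often multiply by fixed filter coefficients, a compiler or programmer can unroll this loop into a short, fixed sequence of shift-and-add steps for each coefficient.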
The ability to mix multimedia instructions with other instructions on a standard microprocessor is the clear
advantage of instruction set extensions. These extensions are very well-suited to the implementation of complex
algorithms because the microprocessor can efficiently execute the nonlinear algebra operations as well as the
highly parallel video operations. Furthermore, instruction set extensions take advantage of the huge resources
available to microprocessor manufacturers to build high-performance chips.
FIGURE 17.31 Implementing subword parallelism on a microprocessor.
The main disadvantages of instruction set extensions are related to the tight coupling of the video and
non-video instructions. First, the memory system is not changed to fit the characteristics of the video application.
The streaming data typical of video is not very well-suited to the caches used in microprocessors. Caches rely
on temporal and spatial locality; they assume that a variable is used many times after its initial use. In fact,
streaming data will be used a certain number of times and then be discarded, to be replaced with a new datum.
Second, the available parallelism is limited by the width of the data path. A 64-bit data path can exhibit at most
four-way parallelism when subdivided into 16-bit subwords. Other architectures can be more easily extended
for greater parallelism when technology and cost permit.
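Both limitations can be made concrete with a toy model. The Python sketch below is an assumption-laden illustration, not a description of any real memory system: it simulates a small fully associative LRU cache that holds one datum per entry (so it captures temporal locality only), and it computes the lane count for a given data path width. The names lru_hit_rate and max_lanes are hypothetical.

```python
from collections import OrderedDict


def lru_hit_rate(addresses, capacity):
    """Fraction of accesses that hit in a fully associative LRU cache."""
    cache = OrderedDict()
    hits = 0
    for addr in addresses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            cache[addr] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / len(addresses)


def max_lanes(datapath_bits, subword_bits):
    """Upper bound on subword parallelism for a given data path width."""
    return datapath_bits // subword_bits


streaming = list(range(1000))             # every datum is touched once
working_set = [i % 8 for i in range(1000)]  # eight variables reused repeatedly

print(lru_hit_rate(streaming, capacity=16))
print(lru_hit_rate(working_set, capacity=16))
print(max_lanes(64, 16))
```

The streaming access pattern never hits because nothing is reused before it is evicted, whereas the small working set hits on almost every access. The lane count shows why a 64-bit data path offers at most four-way parallelism on 16-bit subwords.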
There is no one best way to design a video processing architecture. The structure of the architecture depends
on the intended application environment, algorithms to be run, performance requirements, cost constraints,
and other factors. Computer architects have developed a range of techniques that span a wide range of this
design space: heterogeneous multiprocessors handle low-cost applications effectively; VLIW video signal
processors provide specialized video processing; instruction set extensions to microprocessors enhance video
performance on traditional microprocessors. As VLSI technology improves further, these techniques will be
extended to create machines that hold significant amounts of video memory on-chip with the processing
elements that operate on the video data.
Defining Terms
ALU: Arithmetic/logic unit.
MPEG: A set of standards for video compression.
Processing element: A computational unit in a parallel architecture.
SIMD (single-instruction multiple data): An architecture in which a single instruction controls the operation
of many separate processing elements.
Heterogeneous multiprocessors: An architecture in which several dissimilar processing units are connected
together to perform a particular computation.
Vector processor: A machine that operates on vector and matrix quantities in a pipelined fashion.
VLIW (very-long instruction word): An architecture in which several ALUs are connected to a common
register file, under the control of an instruction word that allows the ALU operations to be determined
References
[Lee95] R. B. Lee, Accelerating multimedia with enhanced microprocessors, IEEE Micro, April 1995, pp. 22-32.
[Lee96] R. B. Lee, Subword parallelism with MAX-2, IEEE Micro, August 1996, pp. 51-59.
[MPE] MPEG Web site, http://www.mpeg.org.
[Pel96] A. Peleg and U. Weiser, MMX technology extension to the Intel architecture, IEEE Micro, August 1996,
pp. 42-50.
[Pir98] P. Pirsch and J.-J. Stolberg, VLSI implementations of image and video multimedia processing systems,
IEEE Transactions on Circuits and Systems for Video Technology, 8(7), November 1998, pp. 878-891.
[Rat96] S. Rathnam and G. Slavenburg, An architectural overview of the programmable media processor,
TM-1, in Proc. Compcon, IEEE Computer Society Press, 1996, pp. 319-326.
[Tre96] M. Tremblay, J. M. O'Connor, Venkatesh Narayanan, and Liang He, VIS speeds new media processing,
IEEE Micro, August 1996, pp. 10-20.
Further Reading
Two journals, IEEE Transactions on Circuits and Systems for Video Technology and IEEE Micro, provide
up-to-date information on developments in video processing. A number of conferences cover this area, including
the International Solid State Circuits Conference (ISSCC) and the Silicon Signal Processing (SiSP) Workshop.
17.5 MPEG-4 Based Multimedia Information System
Ya-Qin Zhang
The recent creation and finalization of the Moving Picture Experts Group MPEG-4 international standard has
provided a common platform and unified framework for multimedia information representation. In addition
to providing highly efficient compression of both natural and synthetic audio-visual (AV) contents such as
video, audio, sound, texture maps, graphics, still images, MIDI, and animated structure, MPEG-4 enables
greater capabilities for manipulating AV contents in the compressed domain with object-based representation.
MPEG-4 is a natural migration of the technological convergence of several fields: digital television, computer
graphics, interactive multimedia, and Internet. This tutorial chapter briefly discusses some example features
and applications enabled by the MPEG-4 standard.
During the last decade, a spectrum of standards in digital video and multimedia has emerged for different
applications. These standards include: the ISO JPEG for still images [1]; ITU-T H.261 for video conferencing
from 64 kilobits per second (kbps) to 2 megabits per second (Mbps) [2]; ITU-T H.263 for PSTN-based video
telephony [3]; ISO MPEG-1 for CD-ROM and storage at VHS quality [4]; the ISO MPEG-2 standard for digital
television [5]; and the recently completed ISO/MPEG-4 international standard for multimedia representation
and integration [6]. Two new ISO standards are under development to address the next-generation still image
coding (JPEG-2000) and content-based multimedia information description (MPEG-7). Several special issues
of IEEE journals have been devoted to summarizing recent advances in digital image, video compression, and
advanced television in terms of standards, algorithms, implementations, and applications [7-11].
The successful convergence and implementation of MPEG-1 and MPEG-2 have become a catalyst for
propelling the new digital consumer markets such as Video CD, Digital TV, DVD, and DBS. While the MPEG-1
and MPEG-2 standards were primarily targeted at providing high compression efficiency for storage and
transmission of pixel-based video and audio, MPEG-4 aims to support a wide variety of multimedia
applications and new functionalities of object-based audio-visual (AV) contents. The recent completion of
MPEG-4 Version 1 is expected to provide a stimulus to the emerging multimedia applications in wireless
networks, Internet, and content creation.
The MPEG-4 effort was originally conceived in late 1992 to address very low bit rate (VLBR) video
applications at below 64 kbps such as PSTN-based videophone, video e-mail, security applications, and video over
cellular networks. The main motivations for focusing MPEG-4 at VLBR applications were:
• Applications such as PSTN videophone and remote monitoring were important, but not adequately
addressed by established or emerging standards. In fact, new products were introduced to the market
with proprietary schemes. The need for a standard at rates below 64 kbps was imminent.
• Research activities had intensified in VLBR video coding, some of which had gone beyond the boundary
of the traditional statistical-based and pixel-oriented methodology.
• It was felt that a new breakthrough in video compression was possible within a five-year time window. This
"quantum leap" would likely make compressed-video quality at below 64 kbps adequate for many applications
such as videophone.
Based on the above assumptions, a workplan was generated to have the MPEG-4 Committee Draft (CD)
completed in 1997 to provide a generic audio-visual coding standard at very low bit rates. Several MPEG-4
seminars were held in parallel with the WG11 meetings, many workshops and special sessions were
organized, and several special issues were devoted to such topics. However, as of July 1994 in the Norway
WG11 meeting, there was still no clear evidence that a "quantum leap" in compression technology was going
to happen within the MPEG-4 timeframe. On the other hand, ITU-T had embarked on an effort to define the
H.263 standard for videophone applications in PSTN and mobile networks. The need for defining a pure
compression standard at very low bit rates was, therefore, not entirely justified.
The author was the director of Multimedia Technology Laboratory at Sarnoff Corporation in Princeton, New Jersey
when this work was performed.
In light of the situation, a change of direction was called for, refocusing on new or improved functionalities and
applications that are not addressed by existing and emerging standards. Examples include object-oriented
features for content-based multimedia databases, error-robust communications in wireless networks, and hybrid
natural and synthetic image authoring and rendering. With the technological convergence of digital video,
computer graphics, and Internet, MPEG-4 aims at providing an audiovisual coding standard allowing for
interactivity, high compression, and/or universal accessibility, with a high degree of flexibility and extensibility.
In particular, MPEG-4 intends to establish a flexible content-based audio-visual environment that can be
customized for specific applications and that can be adapted in the future to take advantage of new technological
advances. It is foreseen that this environment will be capable of addressing new application areas ranging from
conventional storage and transmission of audio and video to truly interactive AV services requiring
content-based AV database access, e.g., video games or AV content creation. Efficient coding, manipulation, and delivery
of AV information over Internet will be key features of the standard.
MPEG-4 Multimedia System
Figure 17.32 shows an architectural overview of MPEG-4. The standard defines a set of syntax to represent
individual audiovisual objects, with both natural and synthetic contents. These objects are first encoded
independently into their own elementary streams. Scene description information is provided separately; it defines
the location in space and time of the objects that are composed into the final scene presented to the user.
This representation includes support for user interaction and manipulation. The scene description uses a
tree-based structure, following the Virtual Reality Modeling Language (VRML) design. Moving far beyond the
capabilities of VRML, MPEG-4 scene descriptions can be dynamically constructed and updated, enabling much
higher levels of interactivity. Object descriptors associate the scene description components that refer to
digital video with the actual elementary streams that contain the corresponding coded data. As shown in Fig. 17.32,
these components are encoded separately and transmitted to the receiver. The receiving terminal then has the
responsibility of composing the individual objects for presentation and for managing user interaction.
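The relationship among the scene tree, object descriptors, and elementary streams can be sketched as a small data structure. The Python model below is purely illustrative: the class names, fields, and the compose function are assumptions made for this example and are not the syntax defined by the standard.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ObjectDescriptor:
    """Links a scene node to the elementary streams that carry its coded data."""
    od_id: int
    stream_ids: List[int]


@dataclass
class SceneNode:
    """One node of the tree-structured scene description."""
    name: str
    od_id: Optional[int] = None   # None for pure grouping nodes
    position: tuple = (0, 0)      # where the object is placed in the scene
    start_time: float = 0.0       # when the object becomes visible
    children: List["SceneNode"] = field(default_factory=list)


def compose(node, descriptors, streams, t):
    """Walk the tree and collect (name, position, coded data) for objects active at time t."""
    placed = []
    if node.od_id is not None and node.start_time <= t:
        descriptor = descriptors[node.od_id]
        data = [streams[s] for s in descriptor.stream_ids]
        placed.append((node.name, node.position, data))
    for child in node.children:
        placed.extend(compose(child, descriptors, streams, t))
    return placed


# Elementary streams arrive separately from the scene description.
streams = {101: b"video-bits", 102: b"audio-bits", 103: b"logo-bits"}
descriptors = {1: ObjectDescriptor(1, [101, 102]), 2: ObjectDescriptor(2, [103])}
root = SceneNode("scene", children=[SceneNode("presenter", od_id=1, position=(40, 20))])

# The scene can be updated dynamically, here by adding an object that appears later.
root.children.append(SceneNode("logo", od_id=2, start_time=5.0))
print([name for name, _, _ in compose(root, descriptors, streams, t=6.0)])
```

Keeping the scene tree separate from the coded data is what lets a receiving terminal move, add, or drop an object without re-encoding any of the streams.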
Following are eight MPEG-4 functionalities, defined and clustered into three classes:
• Content-based interactivity: Content-based manipulation and bit stream editing; content-based
multimedia data access tools; hybrid natural and synthetic data coding; improved temporal access.
• Compression: Improved coding efficiency; coding of multiple concurrent data streams.
• Universal access: Robustness in error-prone environments; content-based scalability.
FIGURE 17.32 MPEG-4 Overview. Audio-visual objects, natural audio, as well as synthetic media are independently coded
and then combined according to scene description information (courtesy of the ISO/MPEG-4 committee).
Some of the applications enabled by these functionalities include:
• Video streaming over Internet.
• Multimedia authoring and presentations.
• View of the contents of video data in different resolutions, speeds, angles, and quality levels.
• Storage and retrieval of multimedia database in mobile links with high error rates and low channel
capacity (e.g., Personal Digital Assistant).
• Multipoint teleconference with selective transmission, decoding, and display of "interesting" parties.
• Interactive home shopping with customers' selection from a video catalogue.
• Stereo-vision and multiview of video contents, e.g., sports.
• "Virtual" conference and classroom.
• Video email, agents, and answering machines.
Object-based Authoring Tool Example
Figure 17.33 shows an example of an object-based authoring tool for MPEG-4 AV contents, recently developed
by the Multimedia Technology Laboratory at Sarnoff Corporation in Princeton, New Jersey. This tool has the
following features:
• Compression/decompression of different visual objects into MPEG-4-compliant bitstreams.
• Drag-and-drop of video objects into a window while resizing the objects or adapting them to different
frame rates, speeds, transparencies, and layers.
• Substitution of different backgrounds.
• Mixing natural image and video objects with computer-generated, synthetic texture and animated
• Creating metadata information for each visual object.
FIGURE 17.33 An example of a multimedia authoring system using MPEG-4 tools and functionalities (courtesy of Sarnoff
This set of authoring tools can be used for interactive Web design, digital studio, and multimedia presentation.
It empowers users to compose and interact with digital video on a higher semantic level.
References
1. JPEG Still Image Coding Standard, ISO/IEC 10918-1, 1990.
2. Video Codec for Audiovisual Services at 64 to 1920 kbps, CCITT Recommendation H.261, 1990.
3. Recommendation H.263P Video Coding for Narrow Telecommunication Channels at below 64 kbps,
ITU-T/SG15/LBC, May 1995.
4. Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to about 1.5 Mbps,
ISO/IEC 11172, 1992.
5. Generic Coding of Moving Pictures and Associated Audio, ISO/IEC 13818, 1994.
6. MPEG-4 Draft International Standard, ISO/IEC JTC1/SC29/WG11, October 1998.
7. Y.-Q. Zhang, W. Li, and M. Liou, eds., Advances in Digital Image and Video Compression, Special Issue,
Proceedings of IEEE, Feb. 1995.
8. M. Kunt, ed., Digital Television, Special Issue, Proceedings of IEEE, July 1995.
9. Y.-Q. Zhang, F. Pereira, T. Sikora, and C. Reader, eds., MPEG-4, Special Issue, IEEE Transactions on Circuits
and Systems for Video Technology, Feb. 1997.
10. T. Chen, R. Liu, and A. Tekalp, eds., Multimedia Signal Processing, Special Issue, Proceedings of IEEE,
May 1998.
11. M.T. Sun, K. Ngan, T. Sikora, and S. Panchanathan, eds., Representation and Coding of Images and Video,
IEEE Transactions on Circuits and Systems for Video Technology, November 1998.
12. MPEG-4 Requirements Ad-Hoc Group, MPEG-4 Requirements, ISO/IEC JTC1/SC29/WG11/MPEG-4,
Maceio, Nov. 1996.