Imaging Hardware
With advances in image sensors and their supporting interfaces, image-based
measurement platforms will continue to offer higher performance and more
programmable features. While the evolving technologies will simplify the physical
connection and setting up of machine vision systems and open new application
areas, the level of benefits will continue to depend on the collective ability
of application engineers and system designers to quantify the characteristic
parameters of their target applications. Although software can enhance a noisy,
distorted, or defocused image, some of the measurement features embedded in the
target scene may be lost in the process. Thus, a good source image, rather than a
numerically enhanced image, is an essential building block of a successful machine
vision application. Capturing an image is not difficult, but acquiring an image with
the required characteristic features of the target requires insight into the imaging
hardware.
This chapter starts with an outline description of video signals and their
standards in the context of image display. This subject is followed by a description
of the key components of framegrabbers and their performance. With the
increasing use of images from moving targets for higher throughput and demands
on measurement accuracy, latency and resolution have become important in the
overall assessment of a machine vision system. This chapter concludes with some
definitions and concepts associated with these topics and illustrative examples.
This leads to two basic terms: (1) update rate, the number of new picture frames presented per second; and (2) refresh rate, the number of times any picture frame is presented per second (twice the update rate in cinematic films). Computer monitors may be interlaced or
noninterlaced. In noninterlaced displays, the picture frame is not divided into two
fields. Therefore, noninterlaced monitors have only a refresh rate, typically upward
of 70 Hz. The refresh rate and resolution in multisync monitors are programmable.
Since the refresh rate depends on the number of rows to scan, it restricts the
maximum resolution, which is in turn related to the physical size of the monitor.
Image frames have traditionally been displayed by a raster-based CRT monitor,
which consists of an electronic beam moving on a 2D plane (display screen) and
a beam intensity that varies along the perpendicular axis (Fig. 6.1). The display
signal may be considered to be a spatially varying luminous signal. The timing
for the horizontal and vertical deflection scan and the amplitude of the luminous
signal are specified by the Electronic Industries Alliance (EIA) Recommended
Standard-170 (commonly referred to as RS-170) and the CCIR standards. Table 6.1
lists some of the key parameters of RS-170 and three other video standards. The
RS-343A standard, originally created for high-resolution closed-circuit television
cameras, defines higher resolution as 675 to 1023 lines/image frame with timing
waveforms modified from the RS-170 to provide additional signal characteristics.
The RS-170A, a modification of the RS-170 standard, works with color video
signals by adding color information to the existing monochrome brightness signal;
Figure 6.1 (a) Main components of a CRT (interlaced) display. (b) Excitation signals for
the x- and y-deflection coils control the beam location on the display surface. The beam
intensity along the z axis contains the video signal (analog voltage); corresponding timing
and voltage values are given in Fig. 6.3.
Table 6.1 Operational parameters of three video broadcasting standards. EIA RS-343A
operates from 675 to 1023 lines; the recommended values for 875 lines are included for
comparison.3

Parameter                                        EIA RS-170   CCIR      SECAM     EIA RS-343A
Frame rate, fps                                  30           25        25        60
Number of lines/frame                            525          625       625       875
Total line time, μs                              63.49        64        64        38.09
  [= 1/(frame rate × lines per frame)]
Number of active lines/frame                     485          575       575       809
Nominal active line time*, μs                    52.59        52        52        31.09
Number of horizontal pixels                      437          569       620
Number of pixels/frame                           212,000      527,000   356,500
Line-blanking time, μs                           10.9         12        12        7
  [= total line time − nominal active line time]
Field-blanking time, ms                          1.27         1.6       1.6
Line frequency, kHz [= 1/(total line time)]      15.750       15.625    15.625    26.25

* Corresponds to the duration of the video signal (luminous intensity) in each horizontal line.
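The bracketed relations in Table 6.1 can be verified directly. A short sketch, using the RS-170 and CCIR column values (times in microseconds):

```python
# Derived timing parameters for two standards from Table 6.1.
standards = {
    "RS-170": {"frame_rate": 30, "lines_per_frame": 525, "active_line_time": 52.59},
    "CCIR":   {"frame_rate": 25, "lines_per_frame": 625, "active_line_time": 52.00},
}

for name, p in standards.items():
    # Total line time = 1/(frame rate x lines per frame), in microseconds.
    total_line_time = 1e6 / (p["frame_rate"] * p["lines_per_frame"])
    # Line blanking = total line time - nominal active line time.
    line_blanking = total_line_time - p["active_line_time"]
    # Line frequency = 1/(total line time), in kHz.
    line_frequency = 1e3 / total_line_time
    print(f"{name}: line time {total_line_time:.2f} us, "
          f"blanking {line_blanking:.1f} us, {line_frequency:.3f} kHz")
```

Running this reproduces the tabulated 63.49/64 μs line times, 10.9/12 μs blanking intervals, and 15.750/15.625 kHz line frequencies.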
RS-170A provides the color television NTSC standard. The color video standard in
Europe, PAL, was adapted from the CCIR standard. Séquentiel couleur à mémoire
(SECAM) uses techniques similar to NTSC and PAL to generate a composite color
video signal.
In raster-based video display systems, the picture frame is divided into two
fields: the odd field contains the odd-numbered (horizontal) scan lines, and the
even field contains the even-numbered scan lines (Fig. 6.2). By displaying the two
fields alternately, the effective CFF for the whole picture frame is doubled. The
frame rate refers to the number of complete pictures presented, while the field rate
indicates the rate (or field frequency) at which the electron beam scans the picture
from top to bottom. By dividing the whole frame (= one whole image or picture)
into two fields, the frame rate becomes one-half of the field frequency. By choosing
the mains frequency as the field frequency, the frame rates in the RS-170 and CCIR
standards become 30 fps and 25 fps, respectively. These give respective picture
frame updating times of 33.33 ms for RS-170 and 40 ms for CCIR.
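The field-to-frame arithmetic above can be sketched in a few lines (two interlaced fields per frame, field frequency tied to the mains frequency):

```python
# Frame rate and frame update time when the mains frequency is used as the
# field frequency and each frame is split into two interlaced fields.
def frame_rate(field_frequency_hz, fields_per_frame=2):
    return field_frequency_hz / fields_per_frame

for mains, name in [(60, "RS-170"), (50, "CCIR")]:
    fps = frame_rate(mains)
    print(f"{name}: {fps:.0f} fps, frame update time {1000 / fps:.2f} ms")
```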
In the RS-170 standard, the whole picture is made of 525 horizontal scan lines,
with two fields interlaced as in Fig. 6.2. The scanning process begins with the
odd field starting at the upper left top corner. The beam moves from the left to
the right across the screen, shifting downward slightly to give a slanted display
of each horizontal scan line. When the beam reaches the right edge of the screen,
it moves back to the left edge at the next odd line location on the screen. The time
needed by the beam to move from the end of one odd line to the beginning of the next
and settle down before beginning to scan again is known as the line flyback time
[Figs. 6.2(a) and (b), top]. When the beam reaches the very last line in the odd
field (end of screen), it moves to the starting point of the first line of the even field,
which is above the very first odd line at the top of the screen [Fig. 6.2(c), top]. The scanning
time between the end of one field and the beginning of the next is called the field
Figure 6.2 Superposition of the (a) odd fields and (b) even fields to generate (c) one
picture frame (courtesy of Philips Research, Eindhoven, The Netherlands).
flyback time. To ensure that the line flyback and the field flyback tracks do not
distract the viewer, the beam is made invisible (field blanking) by bringing its
luminous intensity down to the ground level.2 For the RS-170, the total time taken
for the field flyback is equivalent to 20 scan lines/field, giving 242.5 active (visible)
lines/field (25 field blanking lines and 287.5 active lines in CCIR). The number of
lines for field synchronization and the timing parameters in the video signal are
shown in Fig. 6.3.
Because of the line-scanning nature of a video signal, the spacing between two
consecutive horizontal lines defines the screen height necessary to display a full
picture. Thus, the number of horizontal lines becomes the default value of the
number of vertical pixels available for display. The width of the screen in turn
is related to its height through the aspect ratio (ratio of display width to height)
specified in the video standard. The aspect ratio for the RS-170 and CCIR is 4:3.
A composite video signal contains all timing as well as analog signals, as shown in
Fig. 6.3. During display, these signals are extracted to drive the respective parts of
the display unit [Fig. 6.4(a)].
In image-processing operations, the input is normally a square image (aspect
ratio of 1, or width:height = 1:1). This implies that the displayed image is required
to have the same number of pixels along the horizontal (x) and the vertical (y)
axes. Consequently, the duration of the active video along the horizontal axis must
be adjusted; this adjusted time is referred to as the active line time and denoted by
T_AC (= (3/4)T_VH). To obtain an aspect ratio of 1:1, the number of horizontal scan
lines is kept unchanged, but the sampling is delayed. In the RS-170, sampling starts
6.575 μs after the start of each incoming horizontal scan line and stops 6.575 μs
before the line reaches its end. In the CCIR standards, the sampling is delayed by
6.5 μs and terminated early by the same period. The consequence is that black
strips appear on the left and right sides of the display, giving a smaller active video
Figure 6.3 (a) Video lines for a raster-based video display. (b) One active line of monochrome
video line signal. (c) Timing parameters in images (a) and (b) and in Fig. 6.1(b). For
equal spatial resolution, T_AC = (3/4) × 52 = 39 μs; for 512 × 512-pixel resolution, the visible line
sampling time is 76 ns (13.13-MHz sampling frequency).
area. For 256 × 256 images, only one of the two fields is used; for higher resolutions,
768 × 576 or more video lines are captured.
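The square-pixel numbers in the Fig. 6.3 caption follow from the 3/4 relation; a sketch using the CCIR values (52-μs nominal active line, 512 pixels/line):

```python
# Active line time, end delays, and sampling interval for a 1:1 (square-pixel)
# sampled image, following T_AC = (3/4) * T_VH.
def square_pixel_sampling(nominal_active_line_time_us, pixels_per_line):
    t_ac = 0.75 * nominal_active_line_time_us          # active line time, us
    delay = (nominal_active_line_time_us - t_ac) / 2   # delay at each end, us
    t_sample = t_ac / pixels_per_line * 1e3            # sampling interval, ns
    return t_ac, delay, t_sample

t_ac, delay, t_s = square_pixel_sampling(52.00, 512)   # CCIR values
print(f"CCIR: T_AC = {t_ac:.1f} us, delay = {delay:.1f} us, "
      f"sample interval = {t_s:.1f} ns ({1e3 / t_s:.2f} MHz)")
```

This reproduces the caption's 39 μs, 76-ns sampling interval, and 13.13-MHz sampling frequency.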
For image digitization, each video line signal in each field during TAC is
sampled, quantized, and stored in memory as an individual pixel with its x and
y locations appropriately registered. The value of the ADC sampling time will
Figure 6.4 (a) Generation of timing pulses from a composite video signal. (b) Image
digitization from a composite video signal.
depend on the required resolution. If the required image size is 256 × 256, the
common practice is to capture only one field, discounting the half-line, which
corresponds to 242 lines/field in RS-170. The remainder is made up of blank
lines [AB plus BC in Fig. 6.3(a)]. For CCIR, excess lines are discarded equally
at the top and bottom of each field. A similar process follows for sampling to
generate a 512 × 512 image frame. The resolution of conventional TV cameras
limits the maximum possible vertical resolution available from the source image.
For the larger 512 × 512 image, both fields are captured. Adequate memory
space is necessary to store the entire digitized image frame. For the commonly used
machine vision image size of 512 × 512 pixels with an 8-bit gray-level resolution,
the size of one image frame is 262,144 bytes. A functional block diagram for image
digitization and the associated memory map are shown in Fig. 6.4(b).6–8
By definition, a pixel is the smallest area on the screen that can be displayed
with time-varying brightness. The pixel size gives a quantitative measure of the
number of individual pieces of brightness information conveyed to the observer or
the display system resolution (independent of the display screen size). However,
due to errors in the beam control circuitry, neighboring pixels may be illuminated,
which reduces the effective resolution. The parameter addressability (the number
of pixels per unit length of the horizontal scan line) is used for indicating the
ability of the beam location controller to select and activate a unique area within
the display screen.3
The vertical pixel size is the width of the horizontal scan line, i.e., display
height/number of active lines. Because of the phasing effect in the human visual
system, the average number of individual horizontal lines that can be perceived is
less than the actual number of active horizontal lines present in a picture frame. The
ratio of the average number of horizontal scan lines perceived to the total number
of horizontal lines present in the frame (Kell factor) is usually 0.7. This value gives
an average vertical resolution of 340 lines for the RS-170 and 402 for the CCIR
systems. The corresponding parameter for HDTV is over 1000 lines.
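The vertical-resolution figures quoted above follow directly from the Kell factor (485 × 0.7 = 339.5 and 575 × 0.7 = 402.5 lines, rounded in the text to 340 and 402):

```python
# Average perceived vertical resolution from the Kell factor (typically 0.7).
def perceived_lines(active_lines, kell_factor=0.7):
    return active_lines * kell_factor

for name, lines in [("RS-170", 485), ("CCIR", 575)]:
    print(f"{name}: {lines} active lines -> "
          f"about {perceived_lines(lines):.1f} perceived lines")
```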
Color displays contain three independently controlled electron beams that scan
small areas on the CRT face. Each area on the screen corresponds to a pixel location
that contains three different phosphor-coated dots: blue, red, and green (Fig. 6.5).
The three electron beams themselves do not control color, but the desired color
is produced by the combination of their intensities. The separation between the
adjacent dots of similar colors is known as the dot pitch and gives a quantitative
measure of the display resolution. To ensure that beams converge uniformly, a
dynamic convergence correction is employed to keep the three electron beams
together as they move across the 2D screen.3 In Sony Trinitron™ monitors, the
metal mask has vertical slots rather than circular holes. The geometric arrangement
of these vertical slots is such that the output of one gun can reach only one stripe
of color phosphor. Tables 6.2 and 6.3 summarize the display formats and typical
pixel densities of color monitors commonly used in machine vision platforms.
The brightness of the displayed pixels on the screen is a function of the intensity
of the electron beams and the luminosity of the coated phosphors. While the beam
intensity is linearly related to the applied voltage (the video signal), the luminous
output of the phosphors is related exponentially to the incident beam intensity,
generally with gamma (γ) as the exponential parameter:

Display brightness = (video signal amplitude)^γ. (6.1a)
Figure 6.5 In color CRT monitors, the source beams converge through holes in a metal
mask approximately 18 mm behind the glass display screen. These holes are clustered
either as (a) shadow-mask pattern or (b) precision inline (PIL). (c) Gamma correction with
normalized axis scaling.
images, this is essentially a gray-level mapping in which the captured pixel gray-level
values are rescaled by the transformation

Rescaled pixel gray level = (captured pixel gray-level value)^(1/γ), (6.1b)
and fed into the display.10 The collective result of Eqs. (6.1a) and (6.1b) is a
linearized relationship between the image pixel intensity and its brightness on
the CRT display. The gamma-corrected image generally has a brighter appearance
(Sec. 9.2). In addition to gamma correction for brightness, color displays require
variations in the color additive rules.
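The rescaling of Eq. (6.1b) is conveniently implemented as a 256-entry lookup table on normalized 8-bit values. A minimal sketch, assuming a typical CRT gamma of 2.2 (the exact value is display dependent):

```python
# Gamma-correction LUT per Eq. (6.1b): rescaled = captured ** (1/gamma),
# with 8-bit gray levels normalized to [0, 1] before exponentiation.
def gamma_lut(gamma, levels=256):
    return [round(((v / (levels - 1)) ** (1.0 / gamma)) * (levels - 1))
            for v in range(levels)]

lut = gamma_lut(2.2)            # assumed typical CRT gamma
print(lut[0], lut[64], lut[255])
```

Mid-range gray levels are lifted (lut[64] maps well above 64) while 0 and 255 are unchanged, which is why the gamma-corrected image has the brighter appearance noted above.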
CRT monitors are specified by the diagonal size of the screen. Since pixel
resolution depends on the number of vertical lines and hence on the monitor
height, resolution (pixels/inch) is related to the monitor size. With an aspect
ratio of 4:3, the screen height is 0.6 × the screen diagonal; allowing a margin
of around 7% for the dark areas around the edges, an approximate pixel
resolution value (also referred to as pixel density) is derived as 1.07 × (5 ×
monitor vertical resolution)/(3 × monitor size). Since the number of pixels in an
image frame (display resolution) defines the pixel density, an image on a larger
screen may appear coarser than on a smaller screen with similar resolution.
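The pixel-density relation above can be sketched directly; the 14- and 17-inch sizes below are illustrative choices, not values from the text:

```python
# Approximate pixel density (pixels/inch) of a 4:3 CRT monitor:
# density ~ 1.07 * (5 * vertical_resolution) / (3 * diagonal_size),
# where 1.07 allows ~7% for the dark margins around the edges.
def pixel_density(vertical_resolution, diagonal_inches):
    return 1.07 * (5 * vertical_resolution) / (3 * diagonal_inches)

# The same 480-line image appears coarser on the larger screen.
print(f"{pixel_density(480, 14):.1f} ppi on a 14-inch monitor")
print(f"{pixel_density(480, 17):.1f} ppi on a 17-inch monitor")
```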
Figure 6.6 Molecular structure of three types of liquid crystal materials: (a) nematic, (b)
twisted nematic, and (c) smectic. (d) Passive matrix display.11,12 The orientation of light as
it passes through the liquid crystal layers (e) in their natural state and (f) with an applied
electric field.
parallel (aligned) molecules may be created. Liquid crystal displays (LCDs) use
two basic properties of the liquid crystal material: (1) light follows the alignment
of the molecules, and (2) the molecules tend to orient with their long axes parallel
to an electric field.
If a thin layer of liquid crystal material is sandwiched between two glass surfaces
(alignment layers) with grooves at right angles, the liquid crystal molecules on
one surface will be aligned at right angles to the other surface while those in
between will be forced to assume a twisted state between 0 and 90 deg. Thus,
incident light on one glass surface will be twisted by 90 deg as it passes through
the sandwiched liquid crystal layer and will exit the second glass surface without
any loss of intensity [Fig. 6.6(e)]. A transparent indium-tin-oxide (ITO) layer
is placed by photolithography on each glass plate to act as electrodes. When a
voltage is applied to the sandwiched layer using these terminals, all liquid crystal
molecules align with the resulting electric field. This axial lining up of the twisted
nematic molecules blocks all incoming light rays [Fig. 6.6(f)]. A pair of orthogonal
polarized films is added to ensure that only rays twisted exactly by 90 deg are
transmitted through the LCD sandwich in both natural and excited states. The
source of the incoming light is a backlit fluorescent tube mounted on the top and
bottom edges of the display panel. Light guides distribute light across the scene.
The extent of optical blocking depends on the response time and transmissivity of
the liquid crystal material; the best results are obtained by super-twisted nematic
(STN) material. The orientation of the alignment layers in STN screens varies from
90 deg to 270 deg, depending on the total amount of rotation of the liquid crystals
sandwiched between the layers.
The response time of the early passive LCD was around 350 ms (rather slow
for rapidly varying brightness levels and a fast-moving mouse/cursor), which produced
ghosting or smear effects. Evolutionary developments in LCD technology,
for example, lowering the viscosity of the liquid crystal material, permitted a faster
switching time between states, increasing contrast and reducing response time (to
around 150 ms using hybrid passive display technology). In monochrome
LCDs, the brightness level of the individual pixel area is controlled by varying the
voltage through a row-column addressing method; in color displays, three color
pixels are used in each row-column location [Fig. 6.7(a)]. This row-column addressing
mode is limited because an N × N display has (N − 1) sneak paths around
a selected pixel [Fig. 6.7(b)] and because voltage applied to neighboring pixels
reduces contrast in a small neighborhood of the displayed image (crosstalk).
One way of reducing crosstalk12 is to drive the nonselected columns by V/b and
the nonselected rows by V/2b. For optimal design, b = √N + 1, and the ratio of
the rms voltage applied between the selected and nonselected columns is given
by √[(√N + 1)/(√N − 1)] → 1 as N → ∞. This addressing problem and the slow
response time are partially overcome by dividing the screen into two halves and
scanning them separately. Though the problem of variable intensity across the
screen is not completely eliminated, by doubling the number of lines scanned/s,
the dual-scan super-twisted nematic (DSTN) display provides a sharper image. The
Figure 6.7 (a) Passive-matrix display pixel addressing. (b) Sneak paths. (c) Addressing of
active-matrix display pixels. (d) Elements of one pixel in an active LCD display.12,13
problem of addressing and loss of contrast due to varying light levels at individual
pixel locations from the light guide distribution network is overcome by adding a
thin-film transistor (TFT) at each pixel location. In this active or TFT screen, one
row of pixels is selected by driving the corresponding transistor gates [ Fig. 6.7(c)].
TFT screens add to the manufacturing costs because they require three transistors
for each pixel location in a color display. However, TFT screens offer a uniform
brightness, increased viewing angle, and faster response (down to as low as 25
ms).12,13 Major limitations of these displays include the need for complex hardware
circuitry to uniformly distribute the backlight across the screen, and as much as
50% loss of brightness as the light rays pass through the various layers and the
polarizing films [Fig. 6.7(d)].
CRT displays are emissive devices in that the three electron beams converge
on the phosphors coated behind the display screen glass. For a sharp
image, all three beams must converge perfectly on the screen. The pixel intensity of
LCD panels depends on the transmissivity property of the liquid crystal molecules
behind the pixels, so it is not susceptible to imperfect convergence. However, a
limitation of backlit and transmissive displays is that the intensity of the individual
pixels varies with viewing angle in LCD devices, which reduces their overall
viewing angle compared with CRT displays. Some comparative figures are given
in Table 6.4.
6.3 Framegrabber14–18
The generic name framegrabber describes the interface and data conversion
hardware between a camera and the host processor computer. In analog cameras,
the image sensor generates an analog signal stream as a function of the incident
light, and the onboard timing circuits convert the sensor signal into a composite
or RGB video signal. An analog framegrabber receives the video signal from
the analog camera and performs all onboard signal-conditioning, digitization, and
elementary processing operations. A digital camera is essentially an analog camera
with all framegrabber hardware packaged within the camera casing that outputs
a digital image stream. Framegrabbers in low-end/high-volume applications
(e.g., security and web cameras) generally contain the minimal hardware and a
memory store provided by first-in first-out (FIFO) buffers (Fig. 6.8). A FIFO
buffer is essentially a collection of registers that can be written onto and read out
simultaneously, provided that new input data does not overwrite the existing data.
An important property of the FIFO buffer is that it does not need to be emptied
before new data is added. A FIFO buffer, with its read-and-write capacity and
Figure 6.8 (a) Analog camera. (b) Functional block diagram of basic analog framegrabber
hardware, including a multiplexer that reads camera outputs with different video formats.
operating with minimal attention from the host processor, can transfer image data
to the host almost as soon as it acquires them from the ADC (subject to its own
internal delay). Analog cameras are common in machine vision applications; as
some of the front-end electronics in analog framegrabbers are embedded within
digital cameras, an overview of analog framegrabber components is given in this
section.
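A hardware FIFO is a bank of registers, not software, but its read/write contract is easy to model. A minimal sketch of the behavior described above, in which reads and writes proceed independently and new data is refused rather than allowed to overwrite unread data:

```python
from collections import deque

class FIFOBuffer:
    """Software model of a framegrabber FIFO: concurrent-style write (from the
    ADC side) and read (to the host side), with no overwriting of unread data."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = deque()

    def write(self, sample):
        """Store one sample; refuse it if the buffer is full (no overwrite)."""
        if len(self.data) >= self.capacity:
            return False            # caller must retry or drop the sample
        self.data.append(sample)
        return True

    def read(self):
        """Remove and return the oldest sample, or None if empty."""
        return self.data.popleft() if self.data else None

fifo = FIFOBuffer(capacity=4)
for sample in [10, 20, 30]:
    fifo.write(sample)
print(fifo.read())                  # prints 10: oldest sample out first
```

Note that the buffer never has to be emptied before new samples arrive, which is the property that lets the host drain image data while the ADC is still filling it.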
Line-scan cameras are widely used in moving-target inspection systems, but
despite their superior pixel density and physical dimensions, the requirement for
relative motion between the camera and the target adds some complications to
the camera setup. For a constant scan rate (number of lines/second), the vertical
resolution is related to the target motion (the horizontal resolution is dictated by the
sensor resolution). This makes the vertical resolution finer at slow target speeds;
at higher speeds, the individual scan lines that make up the image may become
darker with insufficient exposure time. In most applications, an encoder is used as
part of the speed control system to synchronize target motion with the camera's
acquisition timing (Fig. 6.9).
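The speed-resolution coupling described above is simple arithmetic: at a constant scan rate, the distance the target travels between successive line scans sets the along-track line spacing. A sketch with an assumed (hypothetical) 10-kHz scan rate:

```python
# Along-track (vertical) line spacing of a line-scan camera: the distance the
# target moves during one scan period.
def line_spacing_mm(target_speed_mm_s, scan_rate_lines_s):
    return target_speed_mm_s / scan_rate_lines_s

# Assumed 10-kHz scan rate: slower targets give finer vertical resolution.
for speed in (100.0, 500.0, 1000.0):      # mm/s
    spacing_um = line_spacing_mm(speed, 10_000) * 1000
    print(f"{speed:7.1f} mm/s -> {spacing_um:.0f} um/line")
```

In practice the encoder in Fig. 6.9 closes this loop by triggering each scan after a fixed travel distance, so the spacing stays constant even if the speed drifts.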
By containing all analog electronics in a shielded case, digital cameras offer
enhanced noise immunity. The hardware architecture of a digital framegrabber is
comparatively simpler than that of an analog framegrabber. It usually contains an
application-specific IC (ASIC) or a field-programmable gate array (FPGA) for low-
level, real-time operations on the image data prior to transferring them to the host
(Fig. 6.10). Various types of serialized parallel data cables are used with digital
cameras, each with its own data throughput rate.19–22 These include:
the RS-644 low-voltage differential signaling (LVDS) cable with 28 single-
ended data signals (converted to four datastreams) and one single-ended clock
(up to 1.8 Gbits/sec),
Figure 6.9 Schematic configuration of a line-scan camera setup. The encoder measures
the target position and controls the camera trigger time to ensure that images are captured
as the target travels a fixed distance17 (courtesy of RVSI Acuity CiMatrix, Nashua, NH).
Figure 6.10 (a) Digital camera components. Generally a high-performance host computer
is required to make use of the higher data throughput of digital cameras. (b) Camera link
standard for camera-to-framegrabber connection. (c) Functional block diagram of National
Instruments NI-1428 digital framegrabber with a channel link capable of a sustained data
rate of 100 MB/sec; 28 bits of data and the status are transmitted with four pairs of wire, while
a fifth pair is used to transmit clock signals (compared to the 56 wires used in the RS-644
LVDS).14 CC: command and control channels use the same protocols as serial ports. MDR:
miniature delta ribbon. RTSI: real-time synchronization information. DMA: direct memory
address.
Figure 6.11 (a) Functional blocks in a conceptual framegrabber with full-frame buffer
memory and an onboard frame processor.7,8 (b) Diagrams of the dedicated hardware or
(c) general-purpose digital signal processor used in many commercial framegrabber boards
(adapted from the DT2858 block diagram, courtesy of Data Translation, Marlboro, MA). The
dedicated hardware blocks perform a wide variety of tasks; four sub-blocks are included in
(b) for illustration.15,16
and USB2 (up to 480 Mb/sec). USB2 is a four-wire cable (one pair for differential
data, and one pair for power and ground) for half-duplex transfer. By adding one
pair each for differential receive and transmit data (a total of eight wires in the
connection cable), the USB3 implements the full bidirectional data communication
protocol, resulting in an increased bandwidth, and a ten-fold improvement in data
transfer rate (up to 5 Gb/sec, design specification, November 2008).
In conventional instrumentation terms, framegrabber hardware acts as the
signal-conditioning and data-conversion unit with memory to store one or more
image frames. Thus, its specification plays a critical role in the overall performance
of a machine vision system. The input stage of a commercial framegrabber
card consists of an analog preprocessing block and a timing-and-control block
[Fig. 6.11(a)]. The analog front end picks up the analog video signal while the
sync stripper extracts the timing pulses to drive the digital modules within the
framegrabber hardware.
needs to be able to reset and resynchronize with the new video stream. (In this
context, resetting refers to abandoning the current operation, and resynchronization
implies detection of the horizontal sync pulses in the incoming image.) For this
reason, crystal-controlled digital clock synchronization is more appropriate for
resettable cameras than PLL-driven clock generators. Framegrabbers with digital
pixel clocks are able to resynchronize immediately to the first field after being
reset.18 For maximum image integrity (zero jitter), a digitally generated clock is
shared between the camera and the framegrabber so the framegrabber does not need
to extract the clock signal from the video sync signal. To avoid propagation delays,
unnecessary travel of the clock signal is eliminated by generating the pixel clock
in the camera and transmitting it to the framegrabber along with the video signal
cable. Excluding delays associated with line drivers and image sensor circuitry, the
propagation delay is estimated to be around 5.9 ns/m for typical camera cables.
Figure 6.12 (a) Address generation from video line number. (b) Memory mapping in an
8-bit LUT, where (s , r ) denotes 1 byte of data and its address location.7
operations in real time. Machine vision framegrabbers usually have two ALU
and LUT combinations around the frame store buffer (Fig. 6.11). By using them
independently, the user is able to transform image brightness by simply storing the
appropriate gray-level transformation map in the LUT RAM. In image processing,
the address bus of the RAM is used as the image input brightness, and the data
bus is connected to the image output; the size of the RAM is equal to the image
brightness resolution (256 words for 8-bit gray-level resolution). The memory
cycle time must be less than the pixel clock for real-time implementation of a LUT.
A key advantage of having two separate LUTs is that the output LUT can be used
for the sole purpose of modifying the image for display per a predefined intensity
map without affecting the data in the buffer store, which can be used as part of the
onboard processing operations.
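In the RAM-based LUT scheme just described, the input gray level acts as the address and the stored word is the transformed output, so any point transformation costs one memory read per pixel. A software sketch; the contrast-stretch range [50, 200] is an arbitrary example, not from the text:

```python
# RAM-based LUT model: precompute the transformation once, then apply it to
# each pixel with a single table lookup (one "memory read" per pixel).
def build_lut(transform, levels=256):
    return [transform(v) for v in range(levels)]

def stretch(v, lo=50, hi=200):
    """Example map: linear contrast stretch of the assumed range [lo, hi]."""
    v = min(max(v, lo), hi)
    return round((v - lo) * 255 / (hi - lo))

lut = build_lut(stretch)
pixels = [30, 50, 125, 200, 240]
print([lut[p] for p in pixels])     # each pixel is one table lookup
```

Changing the transformation only means reloading the 256-word table, which is why the output LUT can remap the display without touching the data in the buffer store.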
is very short, there is a potential for conflict if both the RAM and the SAM ports
demand access to the memory body at the same time. This conflict is prevented
by giving priority to uninterrupted video input and image display operations. The
SAM and RAM ports have different cycle time requirements. For the SAM port, the
read/write operation effectively involves shifting and latching. Because the RAM
port is connected to either the host processor or the dedicated onboard processor
hardware, its cycle time is related to the time required by the data to reach the
destination (data-transfer time). To optimize performance, the usual practice is to
use a zero-wait state dual-ported memory module as the image frame buffer. The
wait state refers to the period (in clock cycles) during which a bus remains idle
due to a mismatch between the access times of different devices on the bus. Wait
states (in clock cycles) are inserted when expansion boards or memory chips are
slower than the bus. A zero-wait-state memory permits the processor to work at
its full clock speed, regardless of the speed of the memory device. The size of the
memory body (image store) is usually a multiple of the image frame dimension;
one image frame with a spatial dimension of 512 × 512 and an 8-bit gray-level
resolution requires 262 KB.
Figure 6.13 2D sampling and a sampling grid of an image frame. The voltage levels
shown correspond to CCIR. For CCIR, x_D = 6.5 μs; the corresponding value for NTSC
is x_D = 6.575 μs. The analog signal voltage levels −0.3 V and 0.7 V are CCIR standard
values (Sec. 6.1). The corresponding voltage levels for NTSC are −0.286 V and 0.714 V.
These are the highest spatial resolutions of NTSC/CCIR signals. The older
generation of TV-monitor-based displays had more pixels (768 × 576, for example);
they did not provide higher resolution but did capture wider views.
The interpixel distance along the y axis is given by

Δy = n y_s, (6.2b)

where n is a positive integer; n = 1 when the whole image frame (i.e., the odd
and even fields together) is sampled, and n = 2 when the two fields are sampled
separately.
The highest spatial resolution that can be achieved in the two standards is
512 × 512 pixels. For this resolution, the whole image frame is sampled (n = 1).
The interpixel separation of a square sampling grid along the x and y axes is
then derived as

Δy = Δx = (3/4) × (52.59 μs/485) = 81.33 ns for NTSC,
Δy = Δx = (3/4) × (52.00 μs/575) = 67.83 ns for CCIR. (6.2c)
If the odd and the even fields are each used to produce one image frame (n = 2)
and sampled separately, the image acquisition rate increases to 60 or 50 fps at the
expense of an increased interpixel separation of Δx = 2y_s. The pixel clock rate then
reduces to

f_pixel = 6.15 MHz for NTSC,
f_pixel = 7.37 MHz for CCIR. (6.4)
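These figures can be checked numerically from the active-line times and line counts used in Eq. (6.2c):

```python
# Square-grid sampling interval (Eq. 6.2c) and the halved field-rate pixel
# clock (Eq. 6.4) for NTSC and CCIR.
def sampling_interval_ns(active_line_time_us, active_lines):
    # Delta_y = Delta_x = (3/4) * (active line time / active lines)
    return 0.75 * active_line_time_us * 1e3 / active_lines

for name, t_line, lines in [("NTSC", 52.59, 485), ("CCIR", 52.00, 575)]:
    dt = sampling_interval_ns(t_line, lines)
    # Doubling the interpixel separation halves the pixel clock.
    print(f"{name}: interval {dt:.2f} ns, "
          f"field-rate pixel clock {1e3 / (2 * dt):.2f} MHz")
```

This reproduces the 81.33/67.83-ns intervals of Eq. (6.2c) and the 6.15/7.37-MHz clocks of Eq. (6.4).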
Digitization of the video signal starts at the beginning of a field marked by the
vertical sync signal. For a 512-line operation and assuming an even field, the video
counter that registers the number of samples taken per horizontal line is reset to
zero. The first 11 lines of video are ignored. When the twelfth horizontal sync is
detected, digitization is initiated by resetting the horizontal pixel counter (to count
down from 512) and then timing out by 6.575 μs in NTSC (6.5 μs in CCIR) for a
1:1 aspect ratio. After this timeout period, sampling begins at the pixel clock rate
until the correct number of samples has been taken. Each sampled value is stored
as an image data intensity with the ADC resolution (typically 8 bits, but possibly as
high as 12 bits in some hardware). After the last sample has been taken, horizontal
pixel counting and sampling stop until the next vertical sync (next field, odd in this
case) arrives and the entire process is then repeated, i.e., discounting of the first 11
lines, timing out for a 1:1 aspect ratio, and sampling for the next 512 points in each
new horizontal line. If the resolution is 256 256, the sampling process takes place
after alternate vertical sync signal, thereby capturing only the odd or the even field.
Because of the fast clock rates, all framegrabbers use a flash AD (or video)
converter (conversion time is one clock period). The front-end analog block in
Fig. 6.11(a) conditions the video input signal, samples it at the pixel clock fre-
quency, quantizes it with the ADC resolution, and puts a video datastream into the
dedicated image-processing block. A reverse process takes place at the back end to
convert the processed datastream into an analog video signal for display. Since any
variation in the input or output clock frequencies will create horizontal distortion,
the ADC and the DAC clocks are driven by the same pixel clock. Table 6.5 lists
some of the key parameters in the specifications of a machine vision framegrabber.
Table 6.5 Typical specification list of a PC-based monochrome machine vision framegrab-
ber (courtesy of CyberOptics and Imagenation, Portland, OR).
Specification feature Parameters
Bus and image capture (form factor) PCI bus-master, real-time capture
Composite video inputs Monochrome, RS-170 (NTSC), CCIR (PAL)
Up to four video inputs (switch or trigger)
Video format Interlace, progressive scan, and resettable
Analog front end Programmable line offset and gain
Image resolution NTSC: 640 × 480 pixels (768 × 486 max)
CCIR: 768 × 576 pixels
ADC resolution 8-bit ADC, 256 gray-level resolution
LUTs 256-byte programmable I/O LUTs
Onboard memory * 8-MB FIFO
Onboard processing Typically none for mid-range framegrabbers
Acquisition rate ** Typically 25 MHz
Display Typically none in mid-range framegrabber
Sampling jitter ±2.6 ns with 1-line resync from reset;
0 with pixel clock input
Video noise ±0.5 least significant bit (LSB)
External trigger Optically isolated or transistor-transistor logic (TTL)
Strobe and exposure output One strobe and two exposure pulses (up to 59.99 min)
Digital I/O Four TTL inputs and four TTL outputs
Flexible memory Through scatter-gather technology
Image information Image stamp with acquisition status information
Framegrabber power requirement +5 V, PCI, 700 mA
Camera power requirement +12 V, 1 A for up to four cameras
Operating system Windows 98/98SE/2000/ME, NT4, XP
Programming language supported Visual C/C++
* Commercial framegrabbers offer more than 128 MB of onboard memory.
** For digital framegrabbers, the acquisition rate is given in Mbits/sec.
mode is equal to one frame transfer time [Fig. 6.14(a)]. Because of the limited
time, the continuous mode is used for offline measurement or analysis as part of
a statistical quality control process. In some very high-performance systems with
custom-built onboard processing hardware, a limited amount of online tasks may
be performed at the expense of missing a certain number of intermediate frames.
In applications that require particular image characteristics (for example,
high-contrast images taken in low ambient lighting, or variable contrast in the
target objects), a capture command/trigger is added to allow for a programmable
exposure time. In this pseudo-continuous (pseudo-synchronous) mode, capture
latency is increased because the camera outputs an image at a rate equal to the
exposure time plus the frame transfer time [Fig. 6.14(b)]. For a moving target,
interlaced cameras may produce an offset between the odd and even field images
due to the one-half frame delay at the start of scanning (uneven vertical edges or
motion tear). Motion tear for a target moving at a constant speed (pixel_tear) may be
estimated18 by Eq. (6.5) (in pixel units):

pixel_tear = target velocity × field time × (pixels per scan line)/(horizontal field of view). (6.5)
Figure 6.14 Continuous modes of camera operation: (a) without an external trigger and
(b) with an external trigger for exposure19 (courtesy of Matrox, Dorval, QC, Canada).
pixel_position^latency = V_part × T_acq. (6.6)
Variations or uncertainty in the location of the target parts in the FOV may
require either closed-loop control of the image acquisition timing or a larger FOV.
Equation (6.6) may be used as a design basis in both cases. The former requires
additional hardware, while the latter reduces the pixel resolution in the captured
image. One option is to optimize the FOV area with respect to the statistically
Figure 6.15 Capture cycle sequences for (a) resettable and (b) asynchronous operations19
(courtesy of Matrox, Dorval, QC, Canada).
collected data on positional uncertainty (pixel_position^uncertainty) within the target scene
for a given image resolution. The other option is to assign an acquisition latency
time and then compute the limit on positional variation for a given part velocity.
If the target is in motion during exposure, the captured image is likely to be
blurred. When the image of a target part moving at an axial velocity of V part is
captured with an exposure time of T exp , the magnitude of the image blur in units
of pixels is given by18
pixel_blur = V_part × T_exp × (number of pixels in the axial direction within the FOV)/(axial width of the FOV). (6.7)
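Equation (6.7) can be evaluated the same way as the motion-tear estimate; again, the part speed, exposure times, and FOV figures below are assumptions for illustration only.

```python
def blur_pixels(v_part_mm_s, t_exp_s, pixels_axial, fov_axial_mm):
    """Image blur in pixels for a part moving during exposure, Eq. (6.7)."""
    return v_part_mm_s * t_exp_s * pixels_axial / fov_axial_mm

# Halving the exposure halves the blur, which motivates the remedies below:
print(round(blur_pixels(200.0, 0.001, 1024, 256.0), 2))   # -> 0.8
print(round(blur_pixels(200.0, 0.0005, 1024, 256.0), 2))  # -> 0.4
```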
The practical way of reducing blur is to slow down the target motion or to
reduce the exposure time. Since the target scene or target part speed may not be
Figure 6.16 Control mode operation19 (courtesy of Matrox, Dorval, QC, Canada).
the camera scanning cycle is used to generate the strobe signals. The intensity
of the strobe light is usually related to the pulse frequency, which ranges from
1.25 to 4 MHz. Since strobe signals are generally very intense, care is required
to ensure that the average intensity of the pulsed illumination is comparable with
the ambient lighting of the target scene. An alternative to a high-intensity strobe
light is an electronic shutter (typically with a 20-μs duration) that is controlled
by either the camera or the camera-framegrabber interface. The signals required
to trigger the exposure and strobe as well as the vertical and horizontal controls
(known as the genlock signals) are normally grouped under digital I/O lines in
the framegrabber specification. Typically, eight digital I/O lines are included in
commercial machine vision framegrabber cards. Genlocking connects multiple
cameras to a single framegrabber, which ensures identical video timing as cameras
are sequentially switched into the video input stage.
Figure 6.17 Image transfer from framegrabber to host through PCI bus. The timing pulses
illustrate bus latency and the loss of a framegrabber image during processing by the host.18
between the PC's central processing unit (CPU) and memory. In some cases, the
motherboard RAM may not support the full PCI peak rate. Since the video input
from the image sensor is usually received at a constant rate, the video memory
acts as the buffer to accommodate bus sharing (control signals and memory data)
with other plugged-in devices, including multiple cameras. If the framegrabber in
an overloaded bus has insufficient video memory, the captured image data may be
lost or corrupted. Using dual-ported memory or FIFO buffers and scatter-gather
capability, PCI bus master devices can operate without the onboard shared memory
arrangement indicated earlier. (A bus master allows data throughput from the
external memory without the CPU's direct involvement. The scatter-gather feature
ensures that the image data received at the destination memory is contiguous.)
For high-speed applications, memory access latency may be reduced by
transferring the captured image to the host memory with the framegrabber
hardware operating as a PCI bus-master and managing the transfer itself. This
permits the host to handle the processing tasks using the captured image data. To
remove the need for data access through addressing, many framegrabbers use large
FIFO buffers that are capable of storing multiple image frames. In this operation,
the framegrabber issues an interrupt at the end of each frame transfer so that the
host CPU can proceed with its processing operations on the latest frame. In this
case the transfer latency is the time period between the image data transfer from the
camera and the conclusion of the framegrabber's end-of-frame interrupt servicing.
Data movement during the PCI bus transfer occurs in blocks during the time when
a target image is being captured from the camera, so the scatter-gather feature of
the PCI bus-master becomes relevant. When an application requests a block of
memory to hold image data (for example, in Pentium PCs, memory is available
as a collection of 4-KB pages), the required (logical) memory made available by
the operating system may not necessarily be physically contiguous. With scatter-
gather capability, the software driver for the board loads up a table to translate the
logical address to a physically contiguous address in the memory. In the absence
of scatter-gather capability, either the application software must ensure that the
destination memory is contiguous, or a software driver must be used to convert the
logical addresses issued by the processor to contiguous physical addresses in the
memory space. Accelerated graphics port (AGP) slots have made it possible
to access the host RAM at a very high bandwidth without any framegrabber FIFO.
The image acquisition latency with an AGP is equal to the latency with the end-of-
frame interrupt servicing from the framegrabber.
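The logical-to-physical translation that scatter-gather performs can be sketched in a few lines. The 4-KB page size follows the Pentium example above; the table format, function name, and page numbers are illustrative, not taken from any particular PCI device or driver.

```python
PAGE_SIZE = 4096  # 4-KB pages, as in the Pentium example above

def build_sg_table(logical_base, length, page_map):
    """Build a scatter-gather descriptor table: one (physical address,
    byte count) entry per page of a logically contiguous buffer.

    page_map maps logical page numbers to physical page numbers, standing
    in for the OS page tables consulted by the board's software driver.
    """
    table = []
    offset, remaining = logical_base, length
    while remaining > 0:
        page, page_off = divmod(offset, PAGE_SIZE)
        chunk = min(PAGE_SIZE - page_off, remaining)  # stay within one page
        table.append((page_map[page] * PAGE_SIZE + page_off, chunk))
        offset += chunk
        remaining -= chunk
    return table

# A 10-KB image buffer that is logically contiguous but physically scattered:
sg = build_sg_table(0, 10 * 1024, {0: 7, 1: 3, 2: 9})
print(sg)  # [(28672, 4096), (12288, 4096), (36864, 2048)]
```

Each entry tells the bus-master where the next run of physically contiguous bytes lives, so the DMA engine can deposit a contiguous image without CPU copies.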
Figure 6.18 Configuration of a part-dimension measuring setup (all parameters are given
in millimeters).
Image capture command: time interval between the vision system's receipt of the
capture signal and the actual start of image capture.
Strobe/shutter trigger: time between the start of image acquisition and the start of
a strobe pulse or opening of the shutter.
Exposure time: time required by the vision illumination system (e.g., pulsed light)
to create an exposure. In steady light, the exposure time corresponds to the
camera's exposure time.
Video transfer: time required to transfer a video image from the camera to the
framegrabber.
Transfer to host: time elapsed between the end of the video transfer and the end of
the image data transfer from the framegrabber to the host CPU. The time elapsed
is contingent on other devices competing with the framegrabber to communicate
with the host.
Image data processing: time taken by the host CPU to complete the assigned
processing tasks on a captured image frame. This latency is dependent on the
complexity of the image content and other demands on the processor resources. For
a given algorithm, this time may be computed from the host processor's parameters.
Resynchronization: In all image-processing work, the processing time is closely
related to image content. A very efficient and structured algorithm or code
may lead to a constant execution time, but various uncertainties within the
complete vision system may not permit a guaranteed cycle time for a given set
of numerical operations on the captured image data. A more pragmatic approach is
to resynchronize the processed results by tagging a time stamp on each input image.
A time stamp, which need not correspond to the actual time, is a sequential record
of the receipt of incoming images with respect to a time base, perhaps from the
operating system. This time-tag stamp remains with the image as it is processed and
placed on the output queue. Resynchronization of the processed results is achieved
by placing the outputs in the sequential order of their time stamps.
Time base: One time-based interval is added if the incoming image is time-tagged.
Output activation: time interval between the end of image processing (or
resynchronization) and the final event within the vision system. This term includes
all mechanical delays, processing overheads, and the signal propagation lag.
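The resynchronization step described above (time-tagging inputs and reordering processed outputs by stamp) can be sketched with a small priority queue; the class and its interface are illustrative, not from the text.

```python
import heapq

class Resequencer:
    """Reorder processed results by their input time stamps (sequence numbers)."""

    def __init__(self):
        self._heap = []  # (stamp, result) pairs, smallest stamp first
        self._next = 0   # next stamp expected on the output queue

    def put(self, stamp, result):
        heapq.heappush(self._heap, (stamp, result))

    def pop_ready(self):
        """Return results whose turn has come, in stamp order."""
        out = []
        while self._heap and self._heap[0][0] == self._next:
            out.append(heapq.heappop(self._heap)[1])
            self._next += 1
        return out

rs = Resequencer()
rs.put(1, "frame-1")   # finished early; must wait for frame-0
print(rs.pop_ready())  # [] -- frame-0 not processed yet
rs.put(0, "frame-0")
print(rs.pop_ready())  # ['frame-0', 'frame-1']
```

Because variable processing times reorder completions, the heap holds early finishers until every earlier-stamped frame has been released.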
While not all of the above factors may be present in a specific vision system, they
are related, and it is useful to refer to them when deriving the system specifications
for a given application. For the setup in Fig. 6.18, the first parameter to estimate
is the FOV. Assuming a resolution of 4 pixels/mm in the captured image (i.e., a
2-pixel minimum detectable feature size), FOV_H = 1000 pixels and FOV_V = 800 pixels.
If both FOVs are rounded up to 1024 × 1024 pixels, the FOV = 256 mm × 256 mm. If the
image blur is limited to 1 pixel and the motion tear to 1 mm (4 pixels), [combining
Eqs. (6.5) and (6.7)], then
The parameters listed in Table 6.7 indicate an image capture and processing
subtotal range of 179 to 206 ms or an uncertainty of 27 ms, which corresponds to a
linear distance of 5.4 mm or 20 pixels. This value may be improved through further
iterations and changes in the characteristics of the vision system. (Optimization for
latency parameters is an application-specific task.)18,20
6.5 Resolution2024
The captured image is the primary source of all processing operations, so the
quality of the processed image data is closely tied to the characteristics of the input
image data. In this respect, resolution is a key feature in the quantification of the
input image data.
Figure 6.19 Parameters for FOV computation. The solid FOV angle subtended by 1 pixel
is obtained by replacing the pixel width with its diameter (= √2 w for square pixels). The
addition of a lens extension moves the image capture plane farther from the objective.
With D_H^format and D_V^format denoting the horizontal and vertical dimensions
of the sensor format, the FOV dimensions are derived as

FOV_V = x_2|max = (1/M) x_i|max = (1/M) D_V^format,
FOV_H = y_2|max = (1/M) y_i|max = (1/M) D_H^format. (6.9b)
Also from Eq. (3.6a),

1/f = 1/z_1 + 1/z_2 = (1/z_2)(1 + 1/M),

giving

z_2 = (1 + 1/M) f = (1 + FOV_•/D_•^format) f, (6.9c)

where • stands for the horizontal or vertical dimensions. For a given square pixel
width w, the horizontal FOV for one pixel is given by [without an extension tube
(Fig. 6.19)]

θ_pixel = 2 tan^−1 [w/(2f)]. (6.9d)
Most standard CCTV lenses do not focus below 500 mm (focal lengths of
standard CCTV lenses are 8, 12.5, 16, 25, and 50 mm), so the available lens and
object distances in some applications may not permit perfect focusing on the focal
plane. In such cases, an extension tube is added to move the lens away from the
and

L_ext = z_1 − f = M f. (6.10b)
Table 6.8 Combinations of applications and lenses. A telecentric lens collimates incoming
light with reduced shadows at the expense of a narrow FOV.
Lens Type
Application CCTV lens with Telecentric lens Zoom lens 35-mm standard
C/CS mount * photographic lens
Defect detection
Refractive defect detection
Gauging of thick objects
(variable depth)
High magnification
inspection
Alignment
Part recognition
Optical character reading
Pattern matching
Flat-field gauging
High resolution gauging
and other applications
Surveillance
Optical feature Medium to Constant viewing Low Very low distortion
high image angle over FOV and distortion
distortion large depth of field
Overall optical Poor to good Fair to excellent; Good to Good to excellent
performance performance excellent
improves with cost
Relative cost Low High Mid-range Low to high
to high
* Parameters related to C and CS lens mounts are listed in Table 3.3.
where L_pH × L_pV is the nominal dimension of the target part, and ΔL_pH and ΔL_pV
are their likely variations as parts arrive within the FOV. These figures are generally
given in the application specifications. The alignment parameters F_alignment are
necessary to ensure that the FOV can encompass all likely variations in part size
with a reasonable degree of reliability (e.g., the part edges should not be too close
to the FOV boundary). Camera alignment is a major task in any vision system
installation, so an element of judgment is used to choose Falignment (a typical figure
is 10% around the nominal FOV). Spatial resolution is derived as the FOV length
along the vertical (or horizontal) direction divided by the pixel resolution of the
image sensor in the corresponding direction in units of mm/pixel (or inch/pixel).
Allowing dimensional tolerance limits of ±5% along both sides of an oblong
target part with nominal dimensions of 10 mm × 5 mm and F_alignment = 10%, from
Eq. (6.11), an FOV of 12.1 mm × 6.05 mm preserves the target object's aspect ratio.
However, if this image is to be captured on a standard TV camera with an aspect
ratio of 4:3, the FOV must be made larger in one direction for square image capture.
If this FOV is to be captured by a sensor with a pixel resolution of 512 × 512, the
spatial resolution in the captured image becomes 23.6 μm/pixel × 11.8 μm/pixel.
The camera alignment figures and camera location are thus dependent on the
lens/optics parameters as well as the image format.
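The FOV and spatial-resolution numbers in the worked example above can be reproduced in a few lines. The ±5%-per-side tolerance, 10% alignment margin, and 512 × 512 sensor follow that example; the function name and layout are my own.

```python
def fov_mm(nominal_mm, tol_per_side, alignment):
    """FOV per Eq. (6.11): nominal size grown by the dimensional tolerance
    on both sides, then by the camera-alignment margin."""
    return nominal_mm * (1.0 + 2.0 * tol_per_side) * (1.0 + alignment)

fov_h = fov_mm(10.0, 0.05, 0.10)  # 10-mm side, +/-5% per side, 10% alignment
fov_v = fov_mm(5.0, 0.05, 0.10)
res_h_um = fov_h / 512 * 1000.0   # spatial resolution on a 512 x 512 sensor
print(round(fov_h, 2), round(fov_v, 2), round(res_h_um, 1))  # -> 12.1 6.05 23.6
```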
Feature size is the linear dimension of the smallest object to be captured with a
reasonable degree of reliability. The analog video image is sampled in the capturing
process, and the Nyquist sampling theorem is used to provide the theoretical lower
limit, which is given as two pixels. For analogy with another capturing activity,
the smallest fish that a fishing net of uniform mesh size can catch is twice the net's
mesh diameter. Thus, for the above pixel resolutions, the theoretically detectable
feature size is 47.2 μm × 23.6 μm. However, allowing for the presence of noise in
any video signal, a size of three or four image pixels is considered more realistic
to recover a one-pixel-size feature in the object space with good contrast. Upward
adjustments are made for poor contrast and low SNRs. Although a captured image
has a resolution of one pixel, interpolation may be used to create subpixel accuracy
through numerical operations. While the achievable level of subpixel resolution
is limited by the accuracy of the numerical algorithms and computations and the
characteristics of the processor, 0.1 pixel is usually taken to be a realistic limit in
vision-based measurements. This limit in turn dictates the measurement accuracy
that may be expected from a given hardware setup.
In traditional metrology, measurement accuracy is better than the tolerance
by a factor of 10, with the measurement instrument's resolution (the smallest
measurable dimension) being 10 times better than the measurement accuracy.
Thus, the measurement resolution is 100 times better than the component's
dimensional tolerance. Although much of the human error is eliminated in machine
vision systems, a ratio of 1:20 for part dimension to measurement resolution is
considered more realistic than 1:100.
Table 6.9 and Eq. (6.12) do not include temporal variations, but these variations,
along with a few nominal parameters chosen by the designer, are adequate to
Table 6.9 Resolution parameters commonly used in machine vision applications (uncer-
tainties such as shock and vibration are excluded). The notation • is used in subscripts to
indicate the horizontal or vertical directions.
Specification parameter Notation Source
For a circular part, the resolutions and feature sizes are the same along the
horizontal and vertical directions.
While the amount of data that can be processed by a vision system depends on
several factors, including bus transfer capacity and latency, a figure of 10^7 pixels/sec
is considered to be typical in a mid-range PC-based machine vision system.
Applications with over 10^8 pixels/sec may require onboard vision processors along
with high-performance host workstations. For reference, data rates in various
standards are CCIR: 11 MB/sec; RS-170: 10.2 MB/sec; and line-scan cameras:
15 MB/sec.
Figure 6.20 shows the primary sources of noise inherent in the sensing
mechanism. Dark current is due to Si impurities and leads to the buildup of
thermally generated charge (hot pixels) during the integration time. This type of
noise is not separable from photon noise and is generally modeled with a Poisson
distribution (see Table 5.6).3 The level of thermal noise may be reduced with a
shorter integration time and cooling. In thermally cooled cameras, dark current
may be reduced by a factor of 2 for every 6 °C reduction in temperature. Air-cooled
cameras are susceptible to ambient humidity; cooling below 4 °C requires a vacuum
around the sensing element. Some IR image sensors are built to run at temperatures
around −40 °C using Peltier elements, which in turn are cooled by air or liquid
(e.g., ethylene glycol). In cooled slow-scan cameras, the noise floor is taken as the
readout noise.
Some image sensors are designed to operate in multiphase pinning mode, where
a smaller potential well size is used to lower the average dark current (at the
expense of quantum efficiency). A range of commercial devices estimate the level
of dark current from the output of calibration pixels (masked photosites around
the sensor edges) and subtract it from the active pixel output to increase the
overall dynamic range. Readout noise is primarily due to the on-chip electronics
Figure 6.20 Image sensor noise: (a) locations of noise introduction in the signal flow and
(b) illustration of noise effects on the signal levels. Photon (shot) noise is dependent on the
generated signal. Reset (or kTC) noise represents the uncertainty in the amount of charge
remaining on the capacitor following a reset. Amplifier noise (or 1/f noise) is an additive
noise that can be reduced by correlated double sampling (Sec. 5.4). A summary of
sensor noise definitions is given in Table 5.6.
and is assumed to be an additive noise affected by the readout rate. For readout
clock rates below 100 kHz, readout noise is taken to be constant; for higher
rates, it is modeled as a Gaussian distribution function of the signal intensity.
Quantization noise, the roundoff error due to the finite number of discrete levels
available in the video ADC, is taken as 1 LSB; for the commonly used 8-bit
ADC, 1 LSB = 1/(2^8 − 1) = 1/255 ≈ 0.4% of full-scale resolution (FSR), or 48 dB.
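The 8-bit figures quoted above can be verified directly; the helper below is a sketch whose name and return format are my own.

```python
import math

def quantization_floor(bits):
    """Return (1 LSB as a fraction of FSR, equivalent range in dB) for an ADC."""
    lsb = 1.0 / (2**bits - 1)            # smallest resolvable step
    return lsb, 20.0 * math.log10(1.0 / lsb)  # FSR-to-LSB ratio in dB

lsb, db = quantization_floor(8)
print(round(lsb * 100, 1), round(db))  # -> 0.4 (% of FSR) and 48 (dB)
```

A 12-bit ADC lowers the floor to about 0.02% of FSR (roughly 72 dB), which is why wide ADCs are paired with low-noise sensors in high-dynamic-range work.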
In applications that require a high dynamic range, the ADC width is matched to
the lowest signal level to be detected, the well capacity, and the quantum efficiency.
A smaller well size corresponds to lower quantum efficiency and less blooming
(Sec. 5.6.2).
In addition to sensor noise, captured image scenes may have measurement errors
due to nonuniform illumination and shading. If the level of dark current is known,
one way of reducing some of these errors is to calibrate (normalize) the target
image with respect to its background using Eq. (6.14):

g_calibrated(•) = G × [g_captured(•) − g_dark(•)] / [g_background(•) − g_dark(•)], (6.14)

where g_dark(•) and g_background(•) are the brightness levels associated with the dark
current and the background around the target image; g_captured(•) is the intensity in
the captured image; and G is a scaling factor. Although the end result is a high-contrast
image,24 this flat-field correction may not always be convenient due to the
added computational overhead and the need for two additional image frames for
each target scene.
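A minimal sketch of the flat-field correction in Eq. (6.14), using nested lists in place of a real image container; G = 255 and the pixel values are assumed for illustration.

```python
def flat_field(captured, background, dark, G=255.0):
    """Normalize a captured image against its background, per Eq. (6.14)."""
    return [
        [G * (c - d) / (b - d) for c, b, d in zip(crow, brow, drow)]
        for crow, brow, drow in zip(captured, background, dark)
    ]

captured   = [[60.0, 120.0]]
background = [[110.0, 110.0]]  # uniform-illumination reference frame
dark       = [[10.0, 10.0]]    # dark-current frame
print(flat_field(captured, background, dark))  # [[127.5, 280.5]]
```

Note the two extra frames (background and dark) that must be captured per scene, which is exactly the overhead the paragraph above cautions about; values above G would also need clipping in an 8-bit pipeline.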
Although the dynamic range is an important factor in quantifying a sensors
ability to retain intensity levels during image capture, the ability of the machine
vision hardware (lens and image sensor) to reproduce the spatial variation of
intensity is also critical in image-based measurement applications. The spatial
characteristics of any optical system are contained in its MTF, which is
considered in Chapter 7.
References
1. S. Hecht, S. Shlaer, and M. H. Pirenne, "Energy, quanta and vision,"
J. General Physiology 25, 819–840 (1942).
2. K. B. Benson, Television Engineering Handbook, McGraw-Hill, NY (1992).
3. G. C. Holst, CCD Arrays, Cameras, and Displays, Second ed., SPIE Press,
Bellingham, WA (1998).
4. G. A. Baxes, Digital Image Processing: A Practical Primer, Prentice Hall,
Englewood Cliffs, NJ (1984).
5. G. A. Baxes, Digital Image Processing: Principles and Applications, John
Wiley & Sons, New York (1994).
6. P. K. Sinha and F.-Y. Chen, "Real-time hardware for image edge thinning
using a new 11-pixel window," in Communicating with Virtual Worlds,
N. M. Thalmann and D. Thalmann, Eds., Springer Verlag, Berlin/Heidelberg,
pp. 508–516 (1993).
7. F.-Y. Chen, A Transputer-based Vision System for On-line Recognition, PhD
thesis, University of Reading, UK (1993).