Seth Hutchinson
Department of Electrical and Computer Engineering
The Beckman Institute for Advanced Science and Technology
University of Illinois at Urbana-Champaign
405 N. Mathews Avenue
Urbana, IL 61801
Email: seth@uiuc.edu
Greg Hager
Department of Computer Science
Yale University
New Haven, CT 06520-8285
Phone: 203 432-6432
Email: hager@cs.yale.edu
Peter Corke
CSIRO Division of Manufacturing Technology
P.O. Box 883,
Kenmore, Australia 4069.
pic@brb.dmt.csiro.au
May 14, 1996
Abstract
This paper provides a tutorial introduction to visual servo control of robotic manipulators. Since the topic spans many disciplines, our goal is limited to providing a basic conceptual framework. We begin by reviewing the prerequisite topics from robotics and computer vision, including a brief review of coordinate transformations, velocity representation, and a description of the geometric aspects of the image formation process. We then present a taxonomy of visual servo control systems. The two major classes of systems, position-based and image-based systems, are then discussed. Since any visual servo system must be capable of tracking image features in a sequence of images, we include an overview of feature-based and correlation-based methods for tracking. We conclude the tutorial with a number of observations on the current directions of the research field of visual servo control.
1 Introduction
Today there are over 800,000 robots in the world, mostly working in factory environments. This population continues to grow, but robots are excluded from many application areas where the work environment and object placement cannot be accurately controlled. This limitation is due to the inherent lack of sensory capability in contemporary commercial robot systems. It has long been recognized that sensor integration is fundamental to increasing the versatility and application domain of robots, but to date this has not proven cost effective for the bulk of robotic applications, which are in manufacturing. The `new frontier' of robotics, which is operation in the everyday world, provides new impetus for this research. Unlike the manufacturing application, it will not be cost effective to re-engineer `our world' to suit the robot.
Vision is a useful robotic sensor since it mimics the human sense of vision and allows for non-contact measurement of the environment. Since the seminal work of Shirai and Inoue [1] (who describe how a visual feedback loop can be used to correct the position of a robot to increase task accuracy), considerable effort has been devoted to the visual control of robot manipulators. Robot controllers with fully integrated vision systems are now available from a number of vendors. Typically visual sensing and manipulation are combined in an open-loop fashion, `looking' then `moving'. The accuracy of the resulting operation depends directly on the accuracy of the visual sensor and the robot end-effector.
An alternative to increasing the accuracy of these subsystems is to use a visual-feedback control loop, which will increase the overall accuracy of the system, a principal concern in any application. Taken to the extreme, machine vision can provide closed-loop position control for a robot end-effector; this is referred to as visual servoing. This term appears to have been first introduced by Hill and Park [2] in 1979 to distinguish their approach from earlier `blocks world' experiments where the system alternated between picture taking and moving. Prior to the introduction of this term, the less specific term visual feedback was generally used. For the purposes of this article, the task in visual servoing is to use visual information to control the pose of the robot's end-effector relative to a target object or a set of target features.
Since the first visual servoing systems were reported in the early 1980s, progress in visual control of robots has been fairly slow, but the last few years have seen a marked increase in published research. This has been fueled by personal computing power crossing the threshold which allows analysis of scenes at a sufficient rate to `servo' a robot manipulator. Prior to this, researchers required specialized and expensive pipelined pixel processing hardware. Applications that have been proposed or prototyped span manufacturing (grasping objects on conveyor belts and part mating), teleoperation, missile tracking cameras and fruit picking, as well as robotic ping-pong, juggling, balancing, car steering and even aircraft landing. A comprehensive review of the literature in this field, as well as the history and applications reported to date, is given by Corke [3] and includes a large bibliography.
Visual servoing is the fusion of results from many elemental areas including high-speed image processing, kinematics, dynamics, control theory, and real-time computing. It has much in common with research into active vision and structure from motion, but is quite different from the often described use of vision in hierarchical task-level robot control systems. Many of the control and vision problems are similar to those encountered by active vision researchers who are building `robotic heads'. However, the task in visual servoing is to control a robot to manipulate its environment using vision, as opposed to passively or actively observing it.
Given the current interest in this topic, it seems both appropriate and timely to provide a tutorial introduction. We hope that this tutorial will assist researchers by providing a consistent terminology and nomenclature, and assist others in creating visually servoed systems and gaining an appreciation of possible applications. The growing literature contains solutions and promising approaches to many of the theoretical and technical problems involved. We have attempted here to present the most significant results in a consistent way in order to present a comprehensive view of the area. Another difficulty we faced was that the topic spans many disciplines. Some of the issues that arise, such as the control problem (which is fundamentally nonlinear and for which there is no complete established theory) and visual recognition, tracking, and reconstruction (which are fields unto themselves), cannot be adequately addressed in a single article. We have thus concentrated on certain fundamental aspects of the topic, and a large bibliography is provided to assist the reader who seeks greater detail than can be provided here. Our preference is always to present those ideas and techniques which have been found to function well in practice in situations where high control and/or vision performance is not required, and which appear to have some generic applicability. In particular, we will describe techniques which can be implemented using a minimal amount of vision hardware, and which make few assumptions about the robotic hardware.
The remainder of this article is structured as follows. Section 2 establishes a consistent nomenclature and reviews the relevant fundamentals of coordinate transformations, pose representation, and image formation. In Section 3, we present a taxonomy of visual servo control systems (adapted from [4]). The two major classes of systems, position-based visual servo systems and image-based visual servo systems, are discussed in Sections 4 and 5, respectively. Since any visual servo system must be capable of tracking image features in a sequence of images, Section 6 describes some approaches to visual tracking that have found wide applicability and can be implemented using a minimum of special-purpose hardware. Finally, Section 7 presents a number of observations about the current directions of the research field of visual servo control.
2.1 Coordinate Transformations
In this paper, the task space of the robot, represented by T, is the set of positions and orientations that the robot tool can attain. Since the task space is merely the configuration space of the robot tool, the task space is a smooth m-manifold (see, e.g., [5]). If the tool is a single rigid body moving arbitrarily in a three-dimensional workspace, then T = SE(3) = R^3 × SO(3), and m = 6. In some applications, the task space may be restricted to a subspace of SE(3). For example, for pick and place, we may consider pure translations (T = R^3, for which m = 3), while for tracking an object and keeping it in view we might consider only rotations (T = SO(3), for which m = 3).
Typically, robotic tasks are specified with respect to one or more coordinate frames. For example, a camera may supply information about the location of an object with respect to a camera frame, while the configuration used to grasp the object may be specified with respect to a coordinate frame at the end-effector of the manipulator. We represent the coordinates of a point P with respect to coordinate frame x by the notation ^xP. Given two frames, x and y, the rotation matrix that represents the orientation of frame y with respect to frame x is denoted by ^xR_y. The location of the origin of frame y with respect to frame x is denoted by the vector ^xt_y. Together, the position and orientation of a frame are referred to as its pose, which we denote by the pair ^xx_y = (^xR_y, ^xt_y). If x is not specified, the world coordinate frame is assumed.
If we are given ^yP (the coordinates of point P relative to frame y) and ^xx_y = (^xR_y, ^xt_y), we can obtain the coordinates of P with respect to frame x by the coordinate transformation

    ^xP = ^xR_y ^yP + ^xt_y                  (1)
        = ^xx_y ∘ ^yP.                       (2)
Often, we must compose multiple poses to obtain the desired coordinates. For example, suppose that we are given poses ^xx_y and ^yx_z. If we are given ^zP and wish to compute ^xP, we may use the composition of transformations

    ^xP = ^xx_y ∘ ^yP                        (3)
        = ^xx_y ∘ (^yx_z ∘ ^zP)              (4)
        = ^xx_z ∘ ^zP                        (5)

where

    ^xx_z = (^xR_y ^yR_z, ^xR_y ^yt_z + ^xt_y).   (6)

Thus, we will represent the composition of two poses by ^xx_z = ^xx_y ∘ ^yx_z. We note that the operator ∘ is used to represent both the coordinate transformation of a single point and the composition of two coordinate transformations. The particular meaning should always be clear from the context.
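As a concrete illustration (our own sketch, not part of the original tutorial), the transformation (1)-(2) and the composition rule (6) can be written directly in Python with NumPy; the class name `Pose` and its method names are ours:

```python
import numpy as np

class Pose:
    """A pose ^x x_y = (^x R_y, ^x t_y): a rotation matrix and a translation."""
    def __init__(self, R, t):
        self.R = np.asarray(R, dtype=float)
        self.t = np.asarray(t, dtype=float)

    def transform(self, P):
        """Coordinate transformation (1): ^xP = ^xR_y ^yP + ^xt_y."""
        return self.R @ np.asarray(P, dtype=float) + self.t

    def compose(self, other):
        """Composition (6): ^xx_z = (^xR_y ^yR_z, ^xR_y ^yt_z + ^xt_y)."""
        return Pose(self.R @ other.R, self.R @ other.t + self.t)
```

Composing two poses and then transforming a point agrees with applying the two transformations in sequence, mirroring (3)-(5).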
In much of the robotics literature, poses are represented by homogeneous transformation matrices, which are of the form

    ^xT_y = [ ^xR_y   ^xt_y ]
            [   0       1   ].               (7)

To simplify notation throughout the paper, we will represent poses and coordinate transformations as defined in (1). Some coordinate frames that will be needed frequently are referred to by the following superscripts/subscripts:
When T = SE(3), we will use the notation x_e ∈ T to represent the pose of the end-effector coordinate frame relative to the world frame. In this case, we often prefer to parameterize a pose using a translation vector and three angles (e.g., roll, pitch and yaw [6]). Although such parameterizations are inherently local, it is often convenient to represent a pose by a vector r ∈ R^6, rather than by x_e ∈ T. This notation can easily be adapted to the case where T ⊂ SE(3). For example, when T = R^3, we will parameterize the task space by r = [x, y, z]^T. In the sequel, to maintain generality, we will assume that r ∈ R^m, unless we are considering a specific task.
This can be written concisely in matrix form by noting that the cross product can be represented in terms of the skew-symmetric matrix

    sk(P) = [  0  −z   y ]
            [  z   0  −x ]
            [ −y   x   0 ]

allowing us to write

    Ṗ = −sk(P)Ω + T.                        (12)
Together, T and Ω define what is known in the robotics literature as a velocity screw

    ṙ = [T_x, T_y, T_z, ω_x, ω_y, ω_z]^T.

Note that ṙ also represents the derivative of r when the angle parameterization is chosen to be the set of rotations about the coordinate axes (recall that r is a parameterization of x_e).
Define the 3 × 6 matrix A(P) = [I_3 | −sk(P)], where I_3 represents the 3 × 3 identity matrix. Then (12) can be rewritten in matrix form as

    Ṗ = A(P) ṙ.                             (13)

Suppose now that we are given a point expressed in end-effector coordinates, ^eP. Combining (1) and (13), we have

    Ṗ = A(x_e ∘ ^eP) ṙ.                     (14)
Occasionally, it is useful to transform velocity screws among coordinate frames. For example, suppose that ^eṙ = [^eT; ^eΩ] is the velocity screw of the end-effector in end-effector coordinates. Then the equivalent screw in base coordinates is

    ṙ = [ T ]  =  [ R_e ^eT − (R_e ^eΩ) × t_e ]
        [ Ω ]     [ R_e ^eΩ                  ].
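As a sketch of these operators in code (our own illustration; the function names are ours), the skew-symmetric matrix, the matrix A(P) of (13), and the screw transformation above can be written as:

```python
import numpy as np

def sk(P):
    """Skew-symmetric matrix of P, so that sk(P) @ w equals the cross product P x w."""
    x, y, z = P
    return np.array([[0.0, -z,  y],
                     [ z, 0.0, -x],
                     [-y,  x, 0.0]])

def A(P):
    """The 3x6 matrix [I_3 | -sk(P)] of (13), mapping a velocity screw to a point velocity."""
    return np.hstack([np.eye(3), -sk(P)])

def screw_to_base(R_e, t_e, eT, eOmega):
    """Transform an end-effector-frame velocity screw (eT, eOmega) to base coordinates."""
    Omega = R_e @ eOmega
    T = R_e @ eT - np.cross(Omega, t_e)
    return np.hstack([T, Omega])
```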
[Figure 1: Camera-centric coordinates. A point P = (x, y, z), expressed in the camera frame, projects through the viewpoint onto the image plane at image coordinates (u, v); the X and Y axes span the image plane and Z lies along the optic axis.]
corresponding to an image plane point. This information may come from multiple cameras, multiple views with a single camera, or knowledge of the geometric relationship between several feature points on the target. In this section, we describe three projection models that have been widely used to model the image formation process: perspective projection, scaled orthographic projection, and affine projection. Although we briefly describe each of these projection models, throughout the remainder of the tutorial we will assume the use of perspective projection.
For each of the three projection models, we assign the camera coordinate system with the x- and y-axes forming a basis for the image plane, the z-axis perpendicular to the image plane (along the optic axis), and with origin located at distance λ behind the image plane, where λ is the focal length of the camera lens. This is illustrated in Figure 1.
Perspective Projection. Assuming that the projective geometry of the camera is modeled by perspective projection (see, e.g., [7]), a point ^cP = [x, y, z]^T, whose coordinates are expressed with respect to the camera coordinate frame, will project onto the image plane with coordinates p = [u, v]^T, given by

    π(x, y, z) = [ u ]  =  (λ/z) [ x ]
                 [ v ]           [ y ].      (15)

If the coordinates of P are expressed relative to coordinate frame x, we must first perform the coordinate transformation ^cP = ^cx_x ∘ ^xP.
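In code, (15) is a one-liner (our own sketch; the default focal length below is purely illustrative):

```python
import numpy as np

def project(P, lam=1.0):
    """Perspective projection (15): camera-frame point [x, y, z] -> image point [u, v]."""
    x, y, z = P
    return np.array([lam * x / z, lam * y / z])
```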
[Figure: The coordinate frames and transformations (e.g., x_e, x_c, ^cx_t, ^cx_e) relating the world, end-effector, camera, and target in the camera configurations discussed below.]
feature parameters can be computed using the projective geometry of the camera. We will denote this mapping by F, where

    F : T → F.                               (18)

For example, if F ⊆ R^2 is the space of (u, v) image-plane coordinates for the projection of some point P onto the image plane, then, assuming perspective projection, f = [u, v]^T, where u and v are given by (15). The exact form of (18) will depend in part on the relative configuration of the camera and end-effector, as discussed in the next section.
[Figure: Dynamic position-based look-and-move structure. Feature extraction and pose determination produce a pose estimate that is compared with the desired pose ^cx_d; the control law commands the joint controllers and power amplifiers through the inverse kinematics.]

[Figure: Dynamic image-based look-and-move structure. Extracted image features f are compared directly with the desired features f_d to drive the control law.]
the robot controller entirely, replacing it with a visual servo controller that directly computes joint inputs, thus using vision alone to stabilize the mechanism.
For several reasons, nearly all implemented systems adopt the dynamic look-and-move approach. First, the relatively low sampling rates available from vision make direct control of a robot end-effector with complex, nonlinear dynamics an extremely challenging control problem. Using internal feedback with a high sampling rate generally presents the visual controller with idealized axis dynamics [27]. Second, many robots already have an interface for accepting Cartesian velocity or incremental position commands. This simplifies the construction of the visual servo system, and also makes the methods more portable. Third, look-and-move separates the kinematic singularities of the mechanism from the visual controller, allowing the robot to be considered as an ideal Cartesian motion device. Since many resolved-rate [28] controllers have specialized mechanisms for dealing with kinematic singularities [29], the system design is again greatly simplified. In this article, we will utilize the look-and-move model exclusively.
The second major classification of systems distinguishes position-based control from image-based control. In position-based control, features are extracted from the image and used in conjunction with a geometric model of the target and the known camera model to estimate the pose of the target with respect to the camera. Feedback is computed by reducing errors in estimated pose space. In image-based servoing, control values are computed on the basis of image features directly. The image-based approach may reduce computational delay, eliminate the necessity for image interpretation, and eliminate errors due to sensor modeling and camera calibration. However, it does present a significant challenge to controller design, since the plant is nonlinear and highly coupled.
In addition to these considerations, we distinguish between systems which only observe the target object and those which observe both the target object and the robot end-effector. The former are referred to as endpoint open-loop (EOL) systems, and the latter as endpoint closed-loop (ECL) systems. The primary difference is that EOL systems must rely on an explicit hand-eye calibration when translating a task specification into a visual servoing algorithm. Hence, the positioning accuracy of EOL systems depends directly on the accuracy of the hand-eye calibration. Conversely, systems that observe the end-effector as well as target features can perform with an accuracy that is independent of hand-eye calibration error [30-32]. Note also that ECL systems can easily deal with tasks that involve the positioning of objects within the end-effector, whereas EOL systems must use an inferred object location.
From a theoretical perspective, it would appear that ECL systems would always be preferable
to EOL systems. However, since ECL systems must track the end-eector as well as the target
object, the implementation of an ECL controller often requires solution of a more demanding vision
problem.
[Figure: Position-based visual servo structure. The estimated pose is compared with the desired pose ^cx_d; the control law commands the robot through the inverse kinematics, joint controllers, and power amplifiers.]

[Figure: Image-based visual servo structure. The extracted features f are compared with the desired features f_d, and the control law drives the power amplifiers directly.]
Notice that the terms involving x̂_e have dropped out. Thus (22) is not only simpler, but its positioning accuracy is also independent of the accuracy of the robot kinematics.
The above formulations presumed an EOL system. For an ECL system, we suppose that we can also directly observe ^eP and estimate its coordinates. In this case, (20) and (22) can be written:
As a second example of feature-based positioning, consider that some point on the end-effector, ^eP, is to be brought to the line joining two fixed points S_1 and S_2 in the world. Geometrically, the shortest path for performing this task is to move ^eP toward the line joining S_1 and S_2 along the perpendicular to the line. The error function describing this trajectory in base coordinates is:

    E_pl(x_e; S_1, S_2, ^eP) = (S_2 − S_1) × ((x_e ∘ ^eP − S_1) × (S_2 − S_1)).

Notice that although E_pl is a mapping from T to R^3, placing a point on a line is a constraint of degree 2. From the geometry of the problem, we see that defining

    u = −k A(x̂_e ∘ ^eP)^+ E_pl(x̂_e; Ŝ_1, Ŝ_2, ^eP)

is a proportional feedback law for this problem.
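As a sketch (ours, assuming the point coordinates are already available in the base frame), the point-to-line error and the proportional law above can be computed as follows; note that the double cross product vanishes exactly when the point lies on the line:

```python
import numpy as np

def sk(P):
    """Skew-symmetric matrix so that sk(P) @ w == np.cross(P, w)."""
    x, y, z = P
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

def E_pl(S1, S2, P):
    """Point-to-line error: proportional to the perpendicular from P to the line S1-S2."""
    d = S2 - S1
    return np.cross(d, np.cross(P - S1, d))

def feedback(S1, S2, P, k=1.0):
    """Proportional law u = -k A(P)^+ E_pl, with A(P) = [I_3 | -sk(P)]."""
    A = np.hstack([np.eye(3), -sk(P)])
    return -k * np.linalg.pinv(A) @ E_pl(S1, S2, P)
```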
Suppose that now we apply this constraint to two points on the end-effector:

    E_ppl(x_e; S_1, S_2, ^eP_1, ^eP_2) = [ E_pl(x_e; S_1, S_2, ^eP_1) ]
                                         [ E_pl(x_e; S_1, S_2, ^eP_2) ].

E_ppl defines a four degree-of-freedom positioning constraint which aligns the points on the end-effector with those in target coordinates. The error function is again overparameterized. Geometrically, it is easy to see that one way of computing feedback is to compute a translation T which moves ^eP_1 to the line through S_1 and S_2. Simultaneously, we can choose Ω so as to rotate ^eP_2 about ^eP_1 so that the line through ^eP_1 and ^eP_2 becomes parallel to that through S_1 and S_2. This leads to the proportional feedback law:

    Ω = −k_1 (S_2 − S_1) × [R_e (^eP_2 − ^eP_1)]                              (28)
    T = −k_2 (S_2 − S_1) × ((x̂_e ∘ ^eP_1 − S_1) × (S_2 − S_1)) − Ω × (x̂_e ∘ ^eP_1)   (29)
Note that we are still free to choose translations along the line joining S_1 and S_2, as well as rotations about it. Full six degree-of-freedom positioning can be attained by enforcing another point-to-line constraint using an additional point on the end-effector and an additional point in the world. See [35] for details.
These formulations can be adjusted for an end-effector mounted camera, and can be implemented as ECL or EOL systems. We leave these modifications as an exercise for the reader.
previous section in both ECL and EOL configurations. Similar remarks hold for systems utilizing free-standing cameras.
Given object pose, it is possible to directly define manipulator stationing in object coordinates. Let ^tx_e be a desired stationing pose for the end-effector relative to the target, and suppose the system employs free-standing cameras.
(Note that in order for this error function to be in accord with our definition of kinematic error, we must select a parameterization of rotations which is 0 when the end-effector is in the desired position.)
Using feature information and the camera calibration, we can directly estimate x̂_t = x̂_c ∘ ^cx̂_t. In order to compute a velocity screw, we first note that the rotation matrix R̂_e can be represented as a rotation through an angle θ about an axis defined by a unit vector k [6]. Thus, we can define
4.3 Estimation
Obviously, a key issue in position-based visual servoing is the estimation of the quantities used to parameterize the feedback. In this regard, position-based visual servoing is closely related to the problem of recovering scene geometry from one or more camera images. This encompasses problems including structure from motion, exterior orientation, stereo reconstruction, and absolute orientation. A comprehensive discussion of these topics can be found in a recent review article [36]. We divide the estimation problems that arise into single-camera and multiple-camera situations, which will be discussed in the following sections.
4.3.1 Single Camera
As noted previously, it follows from (15) that a point in a single camera image corresponds to a line in space. Although it is possible to perform geometric reconstruction using a single moving camera, the equations governing this process are often ill-conditioned, leading to stability problems [36]. Better results can be achieved if target features have some internal structure, or if the features come from a known object. Below, we briefly describe methods for performing both point estimation and pose estimation with a single camera, assuming such information is available.
Single Points. Clearly, extra information is needed in order to reconstruct the Cartesian coordinates of a point in space from a single camera projection. In particular, if the feature has a known scale, this information can be used to compute point position. An example of such a feature is a circular opening with known diameter d, whose image will be an ellipse. By estimating the parameters of the ellipse in the image, it is possible to compute the distance to the hole as well as the orientation of the plane of the hole relative to the camera imaging plane.
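The general case requires fitting the full ellipse, but the underlying depth relation can be sketched for the simplified case of a circle parallel to the image plane, where the image is again a circle whose diameter scales as λ/z (our own simplification, not the full method):

```python
def depth_from_circle(d_world, d_image, lam):
    """Depth to a circle of known diameter d_world whose image (assumed
    parallel to the image plane) has measured diameter d_image.
    From (15), d_image = lam * d_world / z, so z = lam * d_world / d_image."""
    return lam * d_world / d_image
```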
Object Pose. Accurate object pose estimation is possible if the vision system observes features of a known object, and uses those features to estimate object pose. This approach has been recently demonstrated by Wilson [37] for six DOF control of end-effector pose. A similar approach was recently reported in [38].
Briefly, such an approach proceeds as follows. Let ^tP_1, ^tP_2, ..., ^tP_n be a set of points expressed in an object coordinate system with unknown pose ^cx_t relative to an observing camera. The reconstruction problem is to estimate ^cx_t from the image locations of the corresponding observations p_1, p_2, ..., p_n. This is referred to as the pose estimation problem in the vision literature. Numerous methods of solution have been proposed, and [39] provides a recent review of several techniques. Broadly speaking, solutions divide into analytic solutions and least-squares solutions which employ a variety of simplifications and/or iterative methods. Analytic solutions for three and four points are given by [40-44]. Unique solutions exist for four coplanar, but not collinear, points. Least-squares solutions can be found in [45-51]. Six or more points always yield unique solutions. The camera calibration matrix can be computed from features on the target, then decomposed [49] to yield the target's pose.
The least-squares solution proceeds as follows. Using (15), we can define an objective function of the unknown pose between the camera and the object:

    O(^cx_t) = O(^cR_t, ^ct_t) = Σ_{i=1}^{n} || p_i − π(^cR_t ^tP_i + ^ct_t) ||².
This is a nonlinear optimization problem which has no known closed-form solution. Instead, iterative optimization techniques are employed. These techniques iteratively refine a nominal value of ^cx_t (e.g., the pose of the object in a previous image) to compute an updated value for the pose parameters. Because of the sensitivity of the reconstruction process to noise, it is often a good idea to incorporate some type of smoothing or averaging of the computed pose parameters, at the cost of some delay in response to changes in target pose. A particularly elegant formulation of this updating procedure results from the application of statistical techniques such as the extended Kalman filter [52]. The reader is referred to [37] for details.
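A minimal sketch of such an iterative refinement (ours, using a Gauss-Newton step with a numerical Jacobian and an axis-angle rotation parameterization; a practical system would use an analytic Jacobian or the Kalman filter formulation cited above):

```python
import numpy as np

def rodrigues(w):
    """Rotation matrix for a rotation vector w (axis times angle)."""
    th = np.linalg.norm(w)
    if th < 1e-12:
        return np.eye(3)
    k = w / th
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(th) * K + (1.0 - np.cos(th)) * (K @ K)

def residuals(params, tP, p_obs, lam=1.0):
    """Stacked reprojection errors proj(R tP_i + t) - p_i for the objective O."""
    R, t = rodrigues(params[:3]), params[3:]
    r = []
    for P, p in zip(tP, p_obs):
        c = R @ P + t
        r.extend(lam * c[:2] / c[2] - p)
    return np.array(r)

def refine_pose(params0, tP, p_obs, iters=25, eps=1e-6):
    """Gauss-Newton refinement of [rotation vector, translation]."""
    params = np.asarray(params0, dtype=float).copy()
    for _ in range(iters):
        r = residuals(params, tP, p_obs)
        J = np.zeros((r.size, 6))
        for j in range(6):  # numerical Jacobian, column by column
            dp = np.zeros(6)
            dp[j] = eps
            J[:, j] = (residuals(params + dp, tP, p_obs) - r) / eps
        params -= np.linalg.lstsq(J, r, rcond=None)[0]
    return params
```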
Single Points. Let ^{c1}x_a represent the location of a camera relative to an arbitrary base coordinate frame a. By inverting this transformation and combining (1) and (15), for a point ^aP = [x, y, z]^T we have

    p_1 = [ u_1 ]  =  (λ / (δ_z · ^aP + t_z)) [ δ_x · ^aP + t_x ]
          [ v_1 ]                             [ δ_y · ^aP + t_y ]       (33)

where δ_x, δ_y and δ_z denote the rows of ^{c1}R_a and ^{c1}t_a = [t_x, t_y, t_z]^T. Multiplying through by the denominator of the right-hand side, we have

    A_1(p_1) ^aP = b_1(p_1),                 (34)

where

    A_1(p_1) = [ λδ_x − u_1 δ_z ]   and   b_1(p_1) = [ t_z u_1 − λt_x ]
               [ λδ_y − v_1 δ_z ]                    [ t_z v_1 − λt_y ].
Given a second camera at location ^{c2}x_a, we can compute A_2(p_2) and b_2(p_2) analogously. Stacking these together results in the matrix equation

    [ A_1(p_1) ] ^aP = [ b_1(p_1) ]
    [ A_2(p_2) ]       [ b_2(p_2) ],

which is an overdetermined system that can be solved for ^aP.
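A sketch of this stereo reconstruction in code (ours; each camera is specified by the rotation and translation mapping base coordinates into that camera's frame):

```python
import numpy as np

def point_rows(R, t, p, lam=1.0):
    """The rows A_i(p_i) and b_i(p_i) of (34) for one camera with pose (R, t)."""
    u, v = p
    A = np.vstack([lam * R[0] - u * R[2],
                   lam * R[1] - v * R[2]])
    b = np.array([u * t[2] - lam * t[0],
                  v * t[2] - lam * t[1]])
    return A, b

def triangulate(cams, ps, lam=1.0):
    """Least-squares solution of the stacked system for the base-frame point ^aP."""
    rows = [point_rows(R, t, p, lam) for (R, t), p in zip(cams, ps)]
    A = np.vstack([r[0] for r in rows])
    b = np.concatenate([r[1] for r in rows])
    return np.linalg.lstsq(A, b, rcond=None)[0]
```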
Object Pose. Given a known object with three or more points in known locations with respect to an object coordinate system, it is relatively straightforward to solve the absolute orientation problem relating camera coordinates to object coordinates. The solution is based on noting that the centroid of a rigid set of points is invariant to coordinate transformations. Let ^tP_1, ^tP_2, ..., ^tP_n and ^cP̂_1, ^cP̂_2, ..., ^cP̂_n denote n reference points in object coordinates and their corresponding estimates in camera coordinates. Define ^tC and ^cC to be the centroids of these point sets, respectively, and define ^tP̄_i = ^tP_i − ^tC and ^cP̄_i = ^cP̂_i − ^cC. Then we have

    (^cx_t ∘ ^tP_i − ^cx_t ∘ ^tC) − (^cP̂_i − ^cC) = (^cR_t ^tP_i + ^ct_t − ^cR_t ^tC − ^ct_t) − (^cP̂_i − ^cC) = ^cR_t ^tP̄_i − ^cP̄_i.

Note that the final expression depends only on ^cR_t. The corresponding least-squares problem can either be solved explicitly for ^cR_t (see [56-58]), or solved incrementally using linearization. Given an estimate for ^cR_t, the computation of ^ct_t is a linear least-squares problem.
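One standard closed-form solution of this least-squares problem uses the SVD of the cross-covariance of the centered point sets; the following sketch (ours) returns (^cR_t, ^ct_t):

```python
import numpy as np

def absolute_orientation(tP, cP):
    """Least-squares (R, t) with cP_i ~ R tP_i + t, via the SVD of the
    cross-covariance of the centered point sets."""
    tP, cP = np.asarray(tP, dtype=float), np.asarray(cP, dtype=float)
    tC, cC = tP.mean(axis=0), cP.mean(axis=0)
    H = (tP - tC).T @ (cP - cC)                         # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])  # guard against a reflection
    R = Vt.T @ D @ U.T
    t = cC - R @ tC
    return R, t
```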
4.4 Discussion
The principal advantage of position-based control is that it is possible to describe tasks in terms of positioning in Cartesian coordinates. Its primary disadvantage is that it is often highly calibration dependent. The impact of calibration dependency often depends on the situation. In an environment where moderate positioning accuracy is required from firmly mounted cameras, extant calibration techniques probably provide a sufficiently accurate solution. However, if the cameras are moving and high accuracy is required, calibration sensitivity is an important issue.
Computation time for the relative orientation problem is often cited as a disadvantage of position-based methods. However, recent results show that solutions can be computed in only a few milliseconds, even using iteration [39] or Kalman filtering [37].
Endpoint closed-loop systems are demonstrably less sensitive to calibration. However, particularly in stereo systems, small rotational errors between the cameras can lead to reconstruction errors which do impact the positioning accuracy of the system. Thus, endpoint closed-loop systems will work well, for example, with a moving stereo head in which the cameras are fixed and rigid; such errors may still cause problems when both cameras are free to move relative to one another.
Feature-based approaches tend to be more appropriate to tasks where there is no prior model of the geometry of the task, for example in teleoperation applications [59]. Pose-based approaches inherently depend on an existing object model.
The pose estimation problems inherent in many position-based servoing problems require solution of a potentially difficult correspondence problem. However, if the features are being tracked (see Section 6), then this problem need only be solved once, at the beginning of the control process.
Many of these problems can be circumvented by sensing target pose directly using a 3D sensor. Active 3D sensors based on structured lighting are now compact and fast enough to use for visual servoing. If the sensor is small and mounted on the robot [60-62], the depth and orientation information can be used for position-based visual servoing.
5 Image-Based Control
As described in Section 3, in image-based visual servo control the error signal is defined directly in terms of image feature parameters (in contrast to position-based methods, which define the error signal in task space coordinates). Thus, we posit the following definition.

Definition 5.1 An image-based visual servoing task is represented by an image error function e : F → R^l, where l ≤ k and k is the dimension of the image feature parameter space.
As described in Section 2.5, the system may use either a fixed camera or an eye-in-hand configuration. In either case, motion of the manipulator causes changes to the image observed by the vision system. Thus, the specification of an image-based visual servo task involves determining an appropriate error function e, such that when the task is achieved, e = 0. This can be done by directly using the projection equations (15), or by using a "teaching by showing" approach, in which the robot is moved to a goal position and the corresponding image is used to compute a vector of desired feature parameters, f_d. If the task is defined with respect to a moving object, the error e will be a function not only of the pose of the end-effector, but also of the pose of the moving object.
Although the error e is defined on the image parameter space, the manipulator control input is typically defined either in joint coordinates or in task space coordinates. Therefore, it is necessary to relate changes in the image feature parameters to changes in the position of the robot. The image Jacobian, introduced in Section 5.1, captures these relationships. We present an example image Jacobian in Section 5.2. In Section 5.3, we describe methods that can be used to "invert" the image Jacobian, to derive the robot velocity that will produce a desired change in the image. Finally, in Sections 5.4 and 5.5 we describe how controllers can be designed for image-based systems.
    ḟ = J_v ṙ                               (35)

where J_v ∈ R^{k×m}, and

    J_v(r) = ∂F/∂r = [ ∂v_1(r)/∂r_1  ...  ∂v_1(r)/∂r_m ]
                     [      ⋮         ⋱        ⋮       ]
                     [ ∂v_k(r)/∂r_1  ...  ∂v_k(r)/∂r_m ].   (36)

Recall that m is the dimension of the task space, T. Thus, the number of columns in the image Jacobian will vary depending on the task.
The image Jacobian was first introduced by Weiss et al. [19], who referred to it as the feature sensitivity matrix. It is also referred to as the interaction matrix [10] and the B matrix [14, 15]. Other applications of the image Jacobian include [9, 12, 13, 22].
The relationship given by (35) describes how image feature parameters change with respect to changing manipulator pose. In visual servoing we are interested in determining the manipulator velocity, ṙ, required to achieve some desired value of ḟ. This requires solving the system given by (35). We will discuss this problem in Section 5.3, but first we present an example image Jacobian.
    ^cṗ = Ω × ^cp + T.                      (37)

To simplify notation, let ^cp = [x, y, z]^T. Expanding the cross product and substituting the perspective projection equations (15) where appropriate, we can write the derivatives of the coordinates of ^cp in terms of the image feature parameters u, v as

    ẋ = zω_y − (vz/λ)ω_z + T_x               (38)
    ẏ = (uz/λ)ω_z − zω_x + T_y               (39)
    ż = (z/λ)(vω_x − uω_y) + T_z.            (40)

Differentiating u = λx/z and v = λy/z and substituting (38)-(40) yields

    u̇ = (λ/z)T_x − (u/z)T_z − (uv/λ)ω_x + ((λ² + u²)/λ)ω_y − vω_z.   (43)

Similarly,

    v̇ = (λ/z)T_y − (v/z)T_z − ((λ² + v²)/λ)ω_x + (uv/λ)ω_y + uω_z.   (44)
Finally, we may rewrite these two equations in matrix form to obtain

$\begin{bmatrix} \dot{u} \\ \dot{v} \end{bmatrix} = \begin{bmatrix} \dfrac{\lambda}{z} & 0 & -\dfrac{u}{z} & -\dfrac{uv}{\lambda} & \dfrac{\lambda^2 + u^2}{\lambda} & -v \\ 0 & \dfrac{\lambda}{z} & -\dfrac{v}{z} & -\dfrac{\lambda^2 + v^2}{\lambda} & \dfrac{uv}{\lambda} & u \end{bmatrix} \begin{bmatrix} T_x \\ T_y \\ T_z \\ \omega_x \\ \omega_y \\ \omega_z \end{bmatrix}$  (45)
which is an important result relating image-plane velocity of a point to the relative velocity of
the point with respect to the camera. Alternative derivations for this example can be found in a
number of references including [63, 64].
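As a concrete illustration, the interaction matrix of (45) is straightforward to assemble numerically. The sketch below (in Python with NumPy; the function name and test values are ours, not from the text) builds the 2x6 Jacobian for a single image point and maps a camera-frame velocity screw to the resulting image-plane velocity.

```python
import numpy as np

def point_jacobian(u, v, z, lam):
    """2x6 image Jacobian of (45) for a single image point (u, v) at
    depth z, with focal length lam. Columns correspond to the velocity
    screw (Tx, Ty, Tz, wx, wy, wz)."""
    return np.array([
        [lam / z, 0.0,     -u / z, -u * v / lam,           (lam**2 + u**2) / lam, -v],
        [0.0,     lam / z, -v / z, -(lam**2 + v**2) / lam, u * v / lam,            u],
    ])

# Translation along the optical axis (Tz) moves an off-axis point's
# projection radially: udot = -u/z * Tz, vdot = -v/z * Tz.
J = point_jacobian(u=10.0, v=20.0, z=2.0, lam=1.0)
uv_dot = J @ np.array([0.0, 0.0, 0.5, 0.0, 0.0, 0.0])
```

Stacking such 2x6 blocks row-wise, one per tracked point, yields the composite Jacobian of (46).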
It is straightforward to extend this result to the general case of using k/2 image points for the
visual control by simply stacking the Jacobians for each pair of image point coordinates:
$\begin{bmatrix} \dot{u}_1 \\ \dot{v}_1 \\ \vdots \\ \dot{u}_{k/2} \\ \dot{v}_{k/2} \end{bmatrix} = \begin{bmatrix} \dfrac{\lambda}{z_1} & 0 & -\dfrac{u_1}{z_1} & -\dfrac{u_1 v_1}{\lambda} & \dfrac{\lambda^2 + u_1^2}{\lambda} & -v_1 \\ 0 & \dfrac{\lambda}{z_1} & -\dfrac{v_1}{z_1} & -\dfrac{\lambda^2 + v_1^2}{\lambda} & \dfrac{u_1 v_1}{\lambda} & u_1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ \dfrac{\lambda}{z_{k/2}} & 0 & -\dfrac{u_{k/2}}{z_{k/2}} & -\dfrac{u_{k/2} v_{k/2}}{\lambda} & \dfrac{\lambda^2 + u_{k/2}^2}{\lambda} & -v_{k/2} \\ 0 & \dfrac{\lambda}{z_{k/2}} & -\dfrac{v_{k/2}}{z_{k/2}} & -\dfrac{\lambda^2 + v_{k/2}^2}{\lambda} & \dfrac{u_{k/2} v_{k/2}}{\lambda} & u_{k/2} \end{bmatrix} \begin{bmatrix} T_x \\ T_y \\ T_z \\ \omega_x \\ \omega_y \\ \omega_z \end{bmatrix}$  (46)
Finally, note that the Jacobian matrices given in (45) and (46) are functions of the distance
from the camera focal center to the point being imaged (i.e., they are functions of $z_i$). For a
fixed camera system, when the target is the end-effector these z values can be computed using
the forward kinematics of the robot and the camera calibration information. For an eye-in-hand
system, determining z can be more difficult. This problem is discussed further in Section 7.1.
$\dot{r} = J_v^+ \dot{f}.$  (49)
When k < m, the system is underconstrained. In the visual servo application, this implies that
we are not observing enough features to uniquely determine the object motion $\dot{r}$, i.e., there are
certain components of the object motion that cannot be observed. In this case, the appropriate
pseudoinverse is given by

$J_v^+ = J_v^T (J_v J_v^T)^{-1}.$
In this case, motions in the null space of the image Jacobian produce no change in the observed
image features; for example, the rotation of a point on a projection ray about that projection ray
cannot be observed. Unfortunately, not all basis vectors for the null space have such an obvious
physical interpretation. The null space of the image Jacobian plays a significant role in hybrid
methods, in which some degrees of freedom are controlled using visual servo, while the remaining
degrees of freedom are controlled using some other modality [12].
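These pseudoinverse and null-space computations can be sketched with standard linear algebra routines. The example below is our own construction, using the single-point interaction matrix of (45): it solves (49) for a minimum-norm velocity screw and extracts a basis for the unobservable motions.

```python
import numpy as np

def point_jacobian(u, v, z, lam):
    # 2x6 interaction matrix of (45); columns = (Tx, Ty, Tz, wx, wy, wz).
    return np.array([
        [lam / z, 0.0,     -u / z, -u * v / lam,           (lam**2 + u**2) / lam, -v],
        [0.0,     lam / z, -v / z, -(lam**2 + v**2) / lam, u * v / lam,            u],
    ])

# A single point gives k = 2 < m = 6: the system is underconstrained.
J = point_jacobian(0.0, 0.0, 1.0, 1.0)

# Minimum-norm velocity screw achieving a desired feature velocity, (49).
f_dot = np.array([1.0, 0.0])
r_dot = np.linalg.pinv(J) @ f_dot

# Null space of J: velocity screws producing no image motion at all.
# For a point on the optical axis this includes translation along the
# axis and rotation about it.
_, _, Vt = np.linalg.svd(J)
null_basis = Vt[2:].T          # 6 x 4 basis of unobservable motions
```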
where K is a constant gain matrix of the appropriate dimension. For the case of a non-square
image Jacobian, the techniques described in Section 5.3 would be used to compute u. Similar
results have been presented in [12, 13].
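One cycle of such a proportional law, with the non-square Jacobian inverted as in Section 5.3, can be sketched as follows (a sketch with illustrative gain and Jacobian values, not the papers' implementations):

```python
import numpy as np

def control_step(J, f, f_d, K):
    """Commanded velocity screw u = -K J^+ (f - f_d) for one cycle
    of an image-based proportional control law."""
    return -K @ np.linalg.pinv(J) @ (f - f_d)

# Illustrative check: with J selecting the first two screw components
# and K = 0.5 I, the command is -0.5 * (f - f_d) in those components.
J = np.hstack([np.eye(2), np.zeros((2, 4))])
u = control_step(J, f=np.array([2.0, -4.0]), f_d=np.zeros(2), K=0.5 * np.eye(6))
```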
Point to Point Positioning Consider the task of bringing some point P on the manipulator
to a desired stationing point S. If two cameras are viewing the scene, a necessary and sufficient
condition for P and S to coincide in the workspace is that the projections of P and S coincide in
each image.
If we let $[u^l, v^l]^T$ and $[u^r, v^r]^T$ be the image coordinates for the projection of P in the left and
right images, respectively, then we may take $f = [u^l, v^l, u^r, v^r]^T$. If we let $T = \mathbb{R}^3$, then F is a
mapping from T to $\mathbb{R}^4$.
Let the projection of S have coordinates $[u_s^l, v_s^l]$ and $[u_s^r, v_s^r]$ in the left and right images. We
then define the desired feature vector to be $f_d = [u_s^l, v_s^l, u_s^r, v_s^r]^T$, yielding the error function
$e(f) = f - f_d$.
The image Jacobian for this problem can be constructed by using (45) for each camera (note
that a coordinate transformation must be used for either the left or right camera, to relate the
end-effector velocity screw to a common reference frame).
Point to Line Positioning Consider again the task in which some point P on the manipulator
end-effector is to be brought to the line joining two fixed points S1 and S2 in the world.
If two cameras are viewing the workspace, it can be shown that a necessary and sufficient
condition for P to be collinear with the line joining S1 and S2 is that the projection of P be
collinear with the projections of the points S1 and S2 in both images (for non-degenerate camera
configurations). The proof proceeds as follows. The origin of the coordinate frame for the left
camera, together with the projections of S1 and S2 onto the left image, forms a plane. Likewise,
the origin of the coordinate frame for the right camera, together with the projections of S1 and S2
onto the right image, forms a plane. The intersection of these two planes is exactly the line joining
S1 and S2 in the workspace. When P lies on this line, it must lie simultaneously in both of these
planes, and therefore must be collinear with the projections of the points S1 and S2 in both
images.
We now turn to conditions that determine when the projection of P is collinear with the
projections of the points S1 and S2. It is known that three vectors are coplanar if and only if their
scalar triple product is zero. For the left image, let the projection of S1 have image coordinates
$[u_1^l, v_1^l]$, the projection of S2 have image coordinates $[u_2^l, v_2^l]$, and the projection of P have image
coordinates $[u^l, v^l]$. If the three vectors from the origin of the left camera to these image points are
coplanar, then the three image points are collinear. Thus, we construct the scalar triple product
$e_{pl}^l([u^l, v^l]^T) = \left( \begin{bmatrix} u_1^l \\ v_1^l \\ \lambda \end{bmatrix} \times \begin{bmatrix} u_2^l \\ v_2^l \\ \lambda \end{bmatrix} \right) \cdot \begin{bmatrix} u^l \\ v^l \\ \lambda \end{bmatrix}.$  (55)
We may proceed in the same fashion to derive conditions for the right image:

$e_{pl}^r([u^r, v^r]^T) = \left( \begin{bmatrix} u_1^r \\ v_1^r \\ \lambda \end{bmatrix} \times \begin{bmatrix} u_2^r \\ v_2^r \\ \lambda \end{bmatrix} \right) \cdot \begin{bmatrix} u^r \\ v^r \\ \lambda \end{bmatrix}.$  (56)
Finally, we construct the error function

$e(f) = \begin{bmatrix} e_{pl}^l([u^l, v^l]^T) \\ e_{pl}^r([u^r, v^r]^T) \end{bmatrix}$  (57)
where $f = [u^l, v^l, u^r, v^r]^T$. Again, the image Jacobian for this problem can be constructed by using
(45) for each camera (note that a coordinate transformation must be used for either the left or
right camera, to relate the end-effector velocity screw to a common reference frame).
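The scalar triple products (55) and (56) are one-line computations. A sketch (our variable names; λ taken as 1) that evaluates collinearity of the projection of P with the projections of S1 and S2 in one image:

```python
import numpy as np

def e_pl(p, s1, s2, lam=1.0):
    """Scalar triple product of (55): zero exactly when the rays to
    the three image points are coplanar, i.e. the points are collinear."""
    ray = lambda q: np.array([q[0], q[1], lam])
    return float(np.cross(ray(s1), ray(s2)) @ ray(p))

# P's projection on the line through the projections of S1, S2: error 0.
err_on = e_pl((0.5, 0.5), (0.0, 0.0), (1.0, 1.0))
# Off the line: nonzero error.
err_off = e_pl((0.5, 0.9), (0.0, 0.0), (1.0, 1.0))
```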
Given a second point on the end-effector, a four degree of freedom positioning operation can
be defined by simply stacking the error terms. It is interesting to note that these solutions to the
point-to-line problem perform with an accuracy that is independent of calibration, whereas the
position-based versions do not [66].
5.6 Discussion
One of the chief advantages of image-based control over position-based control is that the
positioning accuracy of the system is less sensitive to camera calibration. This is particularly true
for ECL image-based systems. For example, it is interesting to note that the ECL image-based
solutions to the point-to-line positioning problem perform with an accuracy that is independent of
calibration, whereas the position-based versions do not [66].
It is important to note, however, that most of the image-based control methods appearing in
the literature still rely on an estimate of point position or target pose to parameterize the Jacobian.
In practice, the unknown parameter for Jacobian calculation is distance from the camera. Some
recent papers present adaptive approaches for estimating this depth value [14], or develop feedback
methods which do not use depth in the feedback formulation [67].
There are often computational advantages to image-based control, particularly in ECL
configurations. For example, a position-based relative pose solution for an ECL single-camera system
must perform two nonlinear least squares optimizations in order to compute the error function.
The comparable image-based system must only compute a simple image error function, an inverse
Jacobian solution, and possibly a single position or pose calculation to parameterize the Jacobian.
One disadvantage of image-based methods over position-based methods is the presence of
singularities in the feature mapping function, which reflect themselves as unstable points in the inverse
Jacobian control law. These instabilities are often less prevalent in the equivalent position-based
scheme. Returning again to the point-to-line example, the Jacobian calculation becomes singular
when the two stationing points are coplanar with the optical centers of both cameras. In this
configuration, rotations and translations of the setpoints in the plane are not observable. This singular
configuration does not exist for the position-based solution.
In the above discussion we have referred to fd as the desired feature parameter vector, and
implied that it is a constant. If it is a constant then the robot will move to the desired pose with
respect to the target. If the target is moving the system will endeavour to track the target and
maintain relative pose, but the tracking performance will be a function of the system dynamics as
discussed in Section 7.2.
Many tasks can be described in terms of the motion of image features, for instance aligning visual
cues in the scene. Jang et al. [68] describe a generalized approach to servoing on image features,
with trajectories specified in feature space, leading to trajectories (tasks) that are independent
of target geometry. Skaar et al. [16] describe the example of a 1 DOF robot catching a ball. By
observing visual cues such as the ball, the arm's pivot point, and another point on the arm, the
interception task can be specified, even if the relationship between camera and arm is not known
a priori. Feddema [9] uses a feature space trajectory generator to interpolate feature parameter
values due to the low update rate of the vision system used.
A window can be thought of as a two-dimensional array of pixels related to a larger image by an
invertible mapping from window coordinates to image coordinates. We consider rigid transformations
consisting of a translation vector $c = [x, y]^T$ and a rotation $\theta$. A pixel value at $x = [u, v]^T$ in
window coordinates is related to the larger image by

$R(x; c, \theta, t) = I(c + R(\theta)x, t)$  (58)

where $R(\theta)$ is a two-dimensional rotation matrix. We adopt the convention that x = 0 is the center
of the window. In the sequel, the set X represents the set of all values of x.
Window-based tracking algorithms typically operate in two stages. In the first stage, one or
more windows are acquired using a nominal set of window parameters. The pixel values for all $x \in X$
are copied into a two-dimensional array that is subsequently treated as a rectangular image. Such
acquisitions can be implemented extremely efficiently using line-drawing and region-fill algorithms
commonly developed for graphics applications [73]. In the second stage, the windows are processed
to locate features. Using feature measurements, a new set of window parameters are computed.
These parameters may be modified using external geometric constraints or temporal prediction,
and the cycle repeats.
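The acquisition stage of (58) amounts to sampling the image at translated, rotated coordinates. A minimal dense sketch (nearest-neighbour sampling, names ours; production implementations use the line-drawing and region-fill tricks cited above):

```python
import numpy as np

def acquire_window(image, c, theta, half):
    """Copy a (2*half+1) x (2*half+1) window centred at c with
    orientation theta into a rectangular array, following
    R(x; c, theta, t) = I(c + R(theta) x, t), with x = 0 at the centre."""
    ct, st = np.cos(theta), np.sin(theta)
    rot = np.array([[ct, -st], [st, ct]])
    out = np.empty((2 * half + 1, 2 * half + 1))
    for i, yw in enumerate(range(-half, half + 1)):
        for j, xw in enumerate(range(-half, half + 1)):
            px = c + rot @ np.array([xw, yw])      # image coordinates (u, v)
            out[i, j] = image[int(np.rint(px[1])), int(np.rint(px[0]))]
    return out

img = np.arange(100, dtype=float).reshape(10, 10)  # indexed img[v, u]
w = acquire_window(img, c=np.array([5.0, 5.0]), theta=0.0, half=1)
```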
We consider an edge segment to be characterized by three parameters in the image plane:
the u and v coordinates of the center of the segment, and the orientation of the segment relative
to the image plane coordinate system. These values correspond directly to the parameters of
the acquisition window used for edge detection. Let us first assume we have correct prior values
$c^- = (u^-, v^-)$ and $\theta^-$ for an edge segment. A window, $R^-(x) = R(x; c^-, \theta^-, t)$, extracted with
these parameters would then have a vertical edge segment within it.
Isolated step edges can be localized by determining the location of the maximum of the first
derivative of the signal [64, 72, 74]. However, since derivatives tend to increase the noise in an image,
most edge detection methods combine spatial derivatives with a smoothing operation to suppress
spurious maxima. Both derivatives and smoothing are linear operations that can be computed
using convolution operators. Recall that the two-dimensional convolution of the window $R^-(\cdot)$ by
a function G is given by

$(R^- * G)(x) = \dfrac{1}{|X|} \displaystyle\sum_{s \in X} R^-(x - s)\, G(s).$
The window parameters are then updated as

$\theta^+ = \theta^- + \delta\theta$

$u^+ = u^- - o \sin(\theta^+)$

$v^+ = v^- + o \cos(\theta^+)$

where $o$ is the detected offset of the edge from the window center and $\delta\theta$ the detected orientation
correction.
An implementation of this method has shown that localizing a 20 pixel edge using a
Prewitt-style mask 15 pixels wide, searching ±10 pixels and ±15 degrees, takes 1.5 ms on a Sun
Sparc II workstation. At this rate, 22 edge segments can be tracked simultaneously at 30 Hz, the
video frame rate used. Longer edges can be tracked at comparable speeds by subsampling along
the edge.
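The search over offsets can be illustrated in one dimension. The sketch below (our parameters, not the timed implementation above) correlates a scanline with a Prewitt-style step mask and reports the edge's offset from the window centre; the full method applies this within the oriented window, over a bracket of orientations.

```python
import numpy as np

def edge_offset(scanline, half_width=2):
    """Offset of a step edge from the scanline centre, found as the
    peak response to a derivative-of-box (Prewitt-style) mask."""
    mask = np.concatenate([-np.ones(half_width), np.ones(half_width)])
    resp = np.correlate(scanline, mask, mode="valid")
    peak = int(np.argmax(np.abs(resp)))
    # resp[i] covers scanline[i : i + 2*half_width]; the edge sits at
    # i + half_width. Report it relative to the centre of the scanline.
    return peak + half_width - len(scanline) // 2

centered = edge_offset(np.array([0.0] * 6 + [1.0] * 6))  # step at centre
shifted = edge_offset(np.array([0.0] * 8 + [1.0] * 4))   # step 2 px right
```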
Clearly, this edge-detection scheme is susceptible to mistracking caused by background or
foreground occluding edges. Large acquisition windows increase the range of motions that can be
tracked, but reduce the tracking speed and increase the likelihood that a distracting edge will
disrupt tracking. Likewise, large orientation brackets reduce the accuracy of the estimated orientation,
and make it more susceptible to edges that are not closely oriented to the underlying edge.
There are several ways of increasing the robustness of edge tracking. One is to include some
type of temporal component in the algorithm. For example, matching edges based on the sign or
absolute value of the edge response increases its ability to reject incorrect edges. For more complex
edge-based detection, collections of such oriented edge detectors can be combined to verify the
location and position of the entire feature. Some general ideas in this direction are discussed in
Section 6.3.
$R_y(x) = \left( R * \begin{bmatrix} -1 & 1 \\ -1 & 1 \end{bmatrix} \right)(x)$

$R_t(x) = \left( \left( R(\cdot\,; c, t + \tau) - R_s(\cdot\,; c, t) \right) * \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \right)(x)$
6.4 Discussion
Prior to executing or planning visually controlled motions, a specific set of visual features must be
chosen. Discussion of the issues related to feature selection for visual servo control applications can
be found in [18, 19]. The "right" image feature tracking method to use is extremely application
dependent. For example, if the goal is to track a single special pattern or surface marking that is
approximately planar and moving at slow to moderate speeds, then SSD tracking is appropriate.
It does not require special image structure (e.g., straight lines), it can accommodate a large set of
image distortions, and for small motions can be implemented to run at frame rates.
In comparison to the edge detection methods described above, SSD tracking is extremely
sensitive to background changes or occlusions. Thus, if a task requires tracking several occluding
contours of an object with a changing background, edge-based methods are clearly faster and more
robust.
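A brute-force version of SSD matching over integer translations illustrates the idea (a sketch, not a frame-rate implementation; real systems restrict the search region or use gradient-based updates as in [76]):

```python
import numpy as np

def ssd_track(image, template, search, start):
    """Locate `template` near `start` (top-left corner, (row, col)) by
    minimizing the sum of squared differences over +/-`search` pixels."""
    h, w = template.shape
    best, best_pos = np.inf, start
    r0, c0 = start
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            r, c = r0 + dr, c0 + dc
            if r < 0 or c < 0 or r + h > image.shape[0] or c + w > image.shape[1]:
                continue  # window would fall outside the image
            ssd = np.sum((image[r:r + h, c:c + w] - template) ** 2)
            if ssd < best:
                best, best_pos = ssd, (r, c)
    return best_pos

img = np.zeros((20, 20))
img[7:10, 11:14] = np.arange(9, dtype=float).reshape(3, 3)
tmpl = img[7:10, 11:14].copy()
pos = ssd_track(img, tmpl, search=4, start=(9, 13))
```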
In many realistic cases, neither of these approaches by themselves yields the robustness and
performance desired. For example, tracking occluding edges in an extremely cluttered environment
is sure to distract edge tracking as \better" edges invade the search window, while the changing
background would ruin the SSD match for the region. Such situations call for the use of more
global task constraints (e.g., the geometry of several edges), more global tracking (e.g., extended
contours or snakes [80]), or improved or specialized detection methods.
To illustrate these tradeoffs, suppose a visual servoing task relies on tracking the image of a
circular opening over time. In general, the opening will project to an ellipse in the camera. There
are several candidate algorithms for detecting this ellipse and recovering its parameters:
1. If the contrast between the interior of the opening and area around it is high, then binary
thresholding followed by a calculation of the first and second central moments can be used to
localize the feature [54].
2. If the ambient illumination changes greatly over time, but the brightness of the opening and
the brightness of the surrounding region are roughly constant, a circular template could be
localized using SSD methods augmented with brightness and contrast parameters. In this
case, (59) must also include parameters for scaling and aspect ratio [70].
3. The opening could be selected in an initial image, and subsequently located using SSD
methods. This differs from the previous method in that this calculation does not compute the
center of the opening, only its correlation with the starting image. Although useful for
servoing a camera to maintain the opening within the field of view, this approach is probably
not useful for manipulation tasks that need to attain a position relative to the center of the
opening.
4. If the contrast and background are changing, the opening could be tracked by performing edge
detection and fitting an ellipse to the edge locations. In particular, short edge segments could
be located using the techniques described in Section 6.1. Once the segments have been fit to
an ellipse, the orientation and location of the segments would be adjusted for the subsequent
tracking cycle using the geometry of the ellipse.
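Candidate 1 above is only a few lines of code. The sketch below (our threshold and test image, both hypothetical) localizes a feature by its first moments and also returns the second central moments, from which an ellipse's orientation and axis lengths can be recovered:

```python
import numpy as np

def moment_localize(image, thresh):
    """Centroid (first moments) and second central moments of the
    region of pixels brighter than thresh."""
    v, u = np.nonzero(image > thresh)   # v = row (image v), u = column (image u)
    uc, vc = u.mean(), v.mean()
    mu20 = ((u - uc) ** 2).mean()
    mu02 = ((v - vc) ** 2).mean()
    mu11 = ((u - uc) * (v - vc)).mean()
    return (uc, vc), (mu20, mu11, mu02)

img = np.zeros((12, 12))
img[2:5, 3:8] = 1.0                     # a bright 3 x 5 block
(uc, vc), (mu20, mu11, mu02) = moment_localize(img, 0.5)
```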
During task execution, other problems arise. The two most common problems are occlusion of
features and visual singularities. Solutions to the former include intelligent observers that note
the disappearance of features and continue to predict their locations based on dynamics and/or
feedforward information [54], or redundant feature specifications that can perform even with some
loss of information. Solutions to the latter require some combination of intelligent path planning
and/or intelligent acquisition and focus-of-attention to maintain the controllability of the system.
It is probably safe to say that image processing presents the greatest challenge to
general-purpose hand-eye coordination. As an effort to help overcome this obstacle, the methods described
above and other related methods have been incorporated into a publicly available "toolkit." The
interested reader is referred to [70] for details.
7 Related Issues
In this section, we briefly discuss a number of related issues that were not addressed in the tutorial.
then prediction (based upon some assumption of target motion) can be used to compensate for
the latency, but combined with a low sample rate this results in poor disturbance rejection and
long reaction time to target 'maneuvers'. Predictors based on autoregressive models, Kalman
filters, and $\alpha$-$\beta$ and $\alpha$-$\beta$-$\gamma$ tracking filters have been demonstrated for visual servoing. In order
for a visual servo system to provide good tracking performance for moving targets, considerable
attention must be paid to modelling the dynamics of the robot and vision system and designing
an appropriate control system. Other issues for consideration include whether or not the vision
system should 'close the loop' around robot axes which are position, velocity or torque controlled.
A detailed discussion of these dynamic issues in visual servo systems is given by Corke [27, 82].
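As one concrete instance of the predictors mentioned above, an α-β tracking filter maintains position and velocity estimates for a feature coordinate and predicts its location one sample ahead (gains and sample time below are illustrative, not from the text):

```python
class AlphaBetaFilter:
    """Constant-velocity alpha-beta tracker for one scalar feature
    coordinate; update() returns the one-step-ahead prediction."""
    def __init__(self, alpha, beta, dt):
        self.alpha, self.beta, self.dt = alpha, beta, dt
        self.x = 0.0    # position estimate
        self.v = 0.0    # velocity estimate

    def update(self, z):
        x_pred = self.x + self.v * self.dt            # predict
        r = z - x_pred                                # measurement residual
        self.x = x_pred + self.alpha * r              # correct position
        self.v = self.v + (self.beta / self.dt) * r   # correct velocity
        return self.x + self.v * self.dt              # predicted next position

# Tracking a target moving at constant velocity: the prediction error
# decays toward zero, compensating one sample of latency.
f = AlphaBetaFilter(alpha=0.5, beta=0.2, dt=1.0)
for k in range(1, 301):
    pred = f.update(float(k))
```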
mobile robotics, including nonholonomic systems [83], and feature selection [18, 78]. Many of these
are described in the proceedings of a recent workshop on visual servo control [89].
8 Conclusion
This paper has presented, for the first time, a tutorial introduction to robotic visual servo control.
Since the topic spans many disciplines, we have concentrated on certain fundamental aspects of the
topic. However, a large bibliography is provided to assist the reader who seeks greater detail than
can be provided here.
The tutorial covers, using consistent notation, the relevant fundamentals of coordinate
transformations, pose representation, and image formation. Since no standards yet exist for terminology
or symbols, we have attempted, in Section 2, to establish a consistent nomenclature. Where
necessary we relate this to the notation used in the source papers. The two major approaches to visual
servoing, position-based and image-based control, were discussed in detail in Sections 4 and 5.
The topics have been discussed formally, using the notation established earlier, and illustrated
with a number of realistic examples. An important part of any visual servo system is image feature
parameter extraction. Section 6 discussed two broad approaches to this problem with an emphasis
on methods that have been found to function well in practice and that can be implemented without
specialized image processing hardware. Section 7 presented a number of related issues that are
relevant to image-based or position-based visual servo systems. These included closed-loop dynamics,
relative pros and cons of the different approaches, open problems and the future.
References
[1] Y. Shirai and H. Inoue, "Guiding a robot by visual feedback in assembling tasks," Pattern
Recognition, vol. 5, pp. 99-108, 1973.
[2] J. Hill and W. T. Park, "Real time control of a robot with a mobile camera," in Proc. 9th
ISIR, (Washington, DC), pp. 233-246, Mar. 1979.
[3] P. Corke, "Visual control of robot manipulators -- a review," in Visual Servoing
(K. Hashimoto, ed.), vol. 7 of Robotics and Automated Systems, pp. 1-31, World Scientific,
1993.
[4] A. C. Sanderson and L. E. Weiss, "Image-based visual servo control using relational graph
error signals," Proc. IEEE, pp. 1074-1077, 1980.
[5] J. C. Latombe, Robot Motion Planning. Boston: Kluwer Academic Publishers, 1991.
[6] J. J. Craig, Introduction to Robotics. Menlo Park: Addison Wesley, second ed., 1986.
[7] B. K. P. Horn, Robot Vision. MIT Press, Cambridge, MA, 1986.
[8] W. Jang, K. Kim, M. Chung, and Z. Bien, "Concepts of augmented image space and
transformed feature space for efficient visual servoing of an 'eye-in-hand robot'," Robotica, vol. 9,
pp. 203-212, 1991.
[9] J. Feddema and O. Mitchell, "Vision-guided servoing with feature-based trajectory
generation," IEEE Trans. Robot. Autom., vol. 5, pp. 691-700, Oct. 1989.
[10] B. Espiau, F. Chaumette, and P. Rives, "A new approach to visual servoing in robotics,"
IEEE Transactions on Robotics and Automation, vol. 8, pp. 313-326, 1992.
[11] M. L. Cyros, "Datacube at the space shuttle's launch pad," Datacube World Review, vol. 2,
pp. 1-3, Sept. 1988. Datacube Inc., 4 Dearborn Road, Peabody, MA.
[12] A. Castano and S. A. Hutchinson, "Visual compliance: Task-directed visual servo control,"
IEEE Transactions on Robotics and Automation, vol. 10, pp. 334-342, June 1994.
[13] K. Hashimoto, T. Kimoto, T. Ebine, and H. Kimura, "Manipulator control with image-based
visual servo," in Proc. IEEE Int. Conf. Robotics and Automation, pp. 2267-2272, 1991.
[14] N. P. Papanikolopoulos and P. K. Khosla, "Adaptive robot visual tracking: Theory and
experiments," IEEE Transactions on Automatic Control, vol. 38, no. 3, pp. 429-445, 1993.
[15] N. P. Papanikolopoulos, P. K. Khosla, and T. Kanade, "Visual tracking of a moving target by
a camera mounted on a robot: A combination of vision and control," IEEE Transactions
on Robotics and Automation, vol. 9, no. 1, pp. 14-35, 1993.
[16] S. Skaar, W. Brockman, and R. Hanson, "Camera-space manipulation," Int. J. Robot. Res.,
vol. 6, no. 4, pp. 20-32, 1987.
[17] S. B. Skaar, W. H. Brockman, and W. S. Jang, "Three-dimensional camera space manipulation,"
International Journal of Robotics Research, vol. 9, no. 4, pp. 22-39, 1990.
[18] J. T. Feddema, C. S. G. Lee, and O. R. Mitchell, "Weighted selection of image features for
resolved rate visual feedback control," IEEE Trans. Robot. Autom., vol. 7, pp. 31-47, Feb.
1991.
[19] A. C. Sanderson, L. E. Weiss, and C. P. Neuman, "Dynamic sensor-based control of robots
with visual feedback," IEEE Trans. Robot. Autom., vol. RA-3, pp. 404-417, Oct. 1987.
[20] R. L. Andersson, A Robot Ping-Pong Player: Experiment in Real-Time Intelligent Control.
MIT Press, Cambridge, MA, 1988.
[21] M. Lei and B. K. Ghosh, "Visually-guided robotic motion tracking," in Proc. Thirtieth
Annual Allerton Conference on Communication, Control, and Computing, pp. 712-721, 1992.
[22] B. Yoshimi and P. K. Allen, "Active, uncalibrated visual servoing," in Proc. IEEE International
Conference on Robotics and Automation, (San Diego, CA), pp. 156-161, May 1994.
[23] B. Nelson and P. K. Khosla, "Integrating sensor placement and visual tracking strategies,"
in Proc. IEEE International Conference on Robotics and Automation, pp. 1351-1356, 1994.
[24] I. E. Sutherland, "Three-dimensional data input by tablet," Proc. IEEE, vol. 62, pp. 453-461,
Apr. 1974.
[25] R. Tsai and R. Lenz, "A new technique for fully autonomous and efficient 3D robotics hand/eye
calibration," IEEE Trans. Robot. Autom., vol. 5, pp. 345-358, June 1989.
[26] R. Tsai, "A versatile camera calibration technique for high accuracy 3-D machine vision
metrology using off-the-shelf TV cameras and lenses," IEEE Trans. Robot. Autom., vol. 3,
pp. 323-344, Aug. 1987.
[27] P. I. Corke, High-Performance Visual Closed-Loop Robot Control. PhD thesis, University of
Melbourne, Dept. Mechanical and Manufacturing Engineering, July 1994.
[28] D. E. Whitney, "The mathematics of coordinated control of prosthetic arms and manipulators,"
Journal of Dynamic Systems, Measurement and Control, vol. 122, pp. 303-309, Dec. 1972.
[29] S. Chiaverini, L. Sciavicco, and B. Siciliano, "Control of robotic systems through singularities,"
in Proc. Int. Workshop on Nonlinear and Adaptive Control: Issues in Robotics (C. C.
de Wit, ed.), Springer-Verlag, 1991.
[30] S. Wijesoma, D. Wolfe, and R. Richards, "Eye-to-hand coordination for vision-guided robot
control applications," International Journal of Robotics Research, vol. 12, no. 1, pp. 65-78,
1993.
[31] N. Hollinghurst and R. Cipolla, "Uncalibrated stereo hand eye coordination," Image and Vision
Computing, vol. 12, no. 3, pp. 187-192, 1994.
[32] G. D. Hager, W.-C. Chang, and A. S. Morse, "Robot hand-eye coordination based on stereo
vision," IEEE Control Systems Magazine, Feb. 1995.
[33] C. Samson, M. Le Borgne, and B. Espiau, Robot Control: The Task Function Approach.
Oxford, England: Clarendon Press, 1992.
[34] G. Franklin, J. Powell, and A. Emami-Naeini, Feedback Control of Dynamic Systems.
Addison-Wesley, 2nd ed., 1991.
[35] G. D. Hager, "Six DOF visual control of relative position," DCS RR-1038, Yale University,
New Haven, CT, June 1994.
[36] T. S. Huang and A. N. Netravali, "Motion and structure from feature correspondences: A
review," Proceedings of the IEEE, vol. 82, no. 2, pp. 252-268, 1994.
[37] W. Wilson, "Visual servo control of robots using Kalman filter estimates of robot pose relative
to work-pieces," in Visual Servoing (K. Hashimoto, ed.), pp. 71-104, World Scientific, 1994.
[38] C. Fagerer, D. Dickmanns, and E. Dickmanns, "Visual grasping with long delay time of a free
floating object in orbit," Autonomous Robots, vol. 1, no. 1, 1994.
[39] C. Lu, E. J. Mjolsness, and G. D. Hager, "Online computation of exterior orientation with
application to hand-eye calibration," DCS RR-1046, Yale University, New Haven, CT, Aug.
1994. To appear in Mathematical and Computer Modeling.
[40] M. A. Fischler and R. C. Bolles, "Random sample consensus: A paradigm for model fitting with
applications to image analysis and automated cartography," Communications of the ACM,
vol. 24, pp. 381-395, June 1981.
[41] R. M. Haralick, C. Lee, K. Ottenberg, and M. Nolle, "Analysis and solutions of the three
point perspective pose estimation problem," in Proc. IEEE Conf. Computer Vision Pat. Rec.,
pp. 592-598, 1991.
[42] D. DeMenthon and L. S. Davis, "Exact and approximate solutions of the perspective-three-point
problem," IEEE Trans. Pat. Anal. Machine Intell., no. 11, pp. 1100-1105, 1992.
[43] R. Horaud, B. Conio, and O. Leboulleux, "An analytic solution for the perspective 4-point
problem," Computer Vis. Graphics. Image Process, no. 1, pp. 33-44, 1989.
[44] M. Dhome, M. Richetin, J. Lapresté, and G. Rives, "Determination of the attitude of 3-D
objects from a single perspective view," IEEE Trans. Pat. Anal. Machine Intell., no. 12,
pp. 1265-1278, 1989.
[45] G. H. Rosenfield, "The problem of exterior orientation in photogrammetry," Photogrammetric
Engineering, pp. 536-553, 1959.
[46] D. G. Lowe, "Fitting parametrized three-dimensional models to images," IEEE Trans. Pat.
Anal. Machine Intell., no. 5, pp. 441-450, 1991.
[47] R. Goldberg, "Constrained pose refinement of parametric objects," Intl. J. Computer Vision,
no. 2, pp. 181-211, 1994.
[48] R. Kumar, "Robust methods for estimating pose and a sensitivity analysis," CVGIP: Image
Understanding, no. 3, pp. 313-342, 1994.
[49] S. Ganapathy, "Decomposition of transformation matrices for robot vision," Pattern Recognition
Letters, pp. 401-412, 1989.
[50] M. Fischler and R. C. Bolles, "Random sample consensus: A paradigm for model fitting and
automatic cartography," Commun. ACM, no. 6, pp. 381-395, 1981.
[51] Y. Liu, T. S. Huang, and O. D. Faugeras, "Determination of camera location from 2-D to 3-D
line and point correspondences," IEEE Trans. Pat. Anal. Machine Intell., no. 1, pp. 28-37,
1990.
[52] A. Gelb, ed., Applied Optimal Estimation. Cambridge, MA: MIT Press, 1974.
[53] P. K. Allen, A. Timcenko, B. Yoshimi, and P. Michelman, "Automated tracking and grasping
of a moving object with a robotic hand-eye system," IEEE Transactions on Robotics and
Automation, vol. 9, no. 2, pp. 152-165, 1993.
[54] A. Rizzi and D. Koditschek, "An active visual estimator for dexterous manipulation," in
Proceedings, IEEE International Conference on Robotics and Automation, 1994.
[55] J. Pretlove and G. Parker, "The development of a real-time stereo-vision system to aid
robot guidance in carrying out a typical manufacturing task," in Proc. 22nd ISRR, (Detroit),
pp. 21.1-21.23, 1991.
[56] B. K. P. Horn, H. M. Hilden, and S. Negahdaripour, "Closed-form solution of absolute
orientation using orthonormal matrices," J. Opt. Soc. Amer., vol. A-5, pp. 1127-1135, 1988.
[57] K. S. Arun, T. S. Huang, and S. D. Blostein, "Least-squares fitting of two 3-D point sets,"
IEEE Trans. Pat. Anal. Machine Intell., vol. 9, pp. 698-700, 1987.
[58] B. K. P. Horn, "Closed-form solution of absolute orientation using unit quaternions," J. Opt.
Soc. Amer., vol. A-4, pp. 629-642, 1987.
[59] G. D. Hager, G. Grunwald, and G. Hirzinger, "Feature-based visual servoing and its application
to telerobotics," DCS RR-1010, Yale University, New Haven, CT, Jan. 1994. To appear at the
1994 IROS Conference.
[60] G. Agin, "Calibration and use of a light stripe range sensor mounted on the hand of a robot,"
in Proc. IEEE Int. Conf. Robotics and Automation, pp. 680-685, 1985.
[61] S. Venkatesan and C. Archibald, "Realtime tracking in five degrees of freedom using two
wrist-mounted laser range finders," in Proc. IEEE Int. Conf. Robotics and Automation, pp. 2004-2010,
1990.
[62] J. Dietrich, G. Hirzinger, B. Gombert, and J. Schott, "On a unified concept for a new generation
of light-weight robots," in Experimental Robotics 1 (V. Hayward and O. Khatib, eds.), vol. 139
of Lecture Notes in Control and Information Sciences, pp. 287-295, Springer-Verlag, 1989.
[63] J. Aloimonos and D. P. Tsakiris, "On the mathematics of visual tracking," Image and Vision
Computing, vol. 9, pp. 235-251, Aug. 1991.
[64] R. M. Haralick and L. G. Shapiro, Computer and Robot Vision. Addison Wesley, 1993.
[65] F. W. Warner, Foundations of Differentiable Manifolds and Lie Groups. New York:
Springer-Verlag, 1983.
[66] G. D. Hager, "Calibration-free visual control using projective invariance," DCS RR-1046, Yale
University, New Haven, CT, Dec. 1994. To appear in Proc. ICCV '95.
[67] D. Kim, A. Rizzi, G. Hager, and D. Koditschek, "A 'robust' convergent visual servoing
system." Submitted to Intelligent Robots and Systems 1995, 1994.
[68] W. Jang and Z. Bien, "Feature-based visual servoing of an eye-in-hand robot with improved
tracking performance," in Proc. IEEE Int. Conf. Robotics and Automation, pp. 2254-2260,
1991.
[69] R. L. Anderson, "Dynamic sensing in a ping-pong playing robot," IEEE Transactions on
Robotics and Automation, vol. 5, no. 6, pp. 723-739, 1989.
[70] G. D. Hager, "The 'X-Vision' system: A general purpose substrate for real-time vision-based
robotics." Submitted to the 1995 Workshop on Vision for Robotics, Feb. 1995.
[71] E. Dickmanns and V. Graefe, "Dynamic monocular machine vision," Machine Vision and
Applications, vol. 1, pp. 223-240, 1988.
[72] O. Faugeras, Three-Dimensional Computer Vision. Cambridge, MA: MIT Press, 1993.
[73] J. Foley, A. van Dam, S. Feiner, and J. Hughes, Computer Graphics. Addison Wesley, 1993.
[74] D. Ballard and C. Brown, Computer Vision. Englewood Cliffs, NJ: Prentice-Hall, 1982.
[75] J. Canny, "A computational approach to edge detection," IEEE Trans. Pattern Anal. Mach.
Intell., pp. 679-698, Nov. 1986.
[76] B. D. Lucas and T. Kanade, "An iterative image registration technique with an application to
stereo vision," in Proc. International Joint Conference on Artificial Intelligence, pp. 674-679,
1981.
[77] P. Anandan, "A computational framework and an algorithm for the measurement of structure
from motion," International Journal of Computer Vision, vol. 2, pp. 283-310, 1989.
[78] J. Shi and C. Tomasi, "Good features to track," in Proc. IEEE Computer Society Conference
on Computer Vision and Pattern Recognition, pp. 593-600, IEEE Computer Society Press,
1994.
[79] J. Huang and G. D. Hager, "Tracking tools for vision-based navigation," DCS RR-1046, Yale
University, New Haven, CT, Dec. 1994. Submitted to IROS '95.
[80] M. Kass, A. Witkin, and D. Terzopoulos, "Snakes: Active contour models," International
Journal of Computer Vision, vol. 1, no. 1, pp. 321-331, 1987.
[81] B. Bishop, S. A. Hutchinson, and M. W. Spong, "Camera modelling for visual servo control applications," Mathematical and Computer Modelling: Special Issue on Modelling Issues in Visual Sensing.
[82] P. Corke and M. Good, "Dynamic effects in visual closed-loop systems." Submitted to IEEE Transactions on Robotics and Automation, 1995.
[83] S. B. Skaar, Y. Yalda-Mooshabad, and W. H. Brockman, "Nonholonomic camera-space manipulation," IEEE Transactions on Robotics and Automation, vol. 8, pp. 464–479, Aug. 1992.
[84] G. D. Hager, S. Puri, and K. Toyama, "A framework for real-time vision-based tracking using off-the-shelf hardware," DCS RR-988, Yale University, New Haven, CT, Sept. 1993.
[85] A. C. Sanderson and L. E. Weiss, "Adaptive visual servo control of robots," in Robot Vision (A. Pugh, ed.), pp. 107–116, IFS, 1983.
[86] N. Mahadevamurty, T.-C. Tsao, and S. Hutchinson, "Multi-rate analysis and design of visual feedback digital servo control systems," ASME Journal of Dynamic Systems, Measurement and Control, pp. 45–55, Mar. 1994.
[87] R. Sharma and S. A. Hutchinson, "On the observability of robot motion under active camera control," in Proc. IEEE International Conference on Robotics and Automation, pp. 162–167, May 1994.
[88] A. Fox and S. Hutchinson, "Exploiting visual constraints in the synthesis of uncertainty-tolerant motion plans," IEEE Transactions on Robotics and Automation, vol. 11, pp. 56–71, 1995.
[89] G. Hager and S. Hutchinson, eds., Proc. IEEE Workshop on Visual Servoing: Achievements, Applications and Open Problems. Inst. of Electrical and Electronics Eng., Inc., 1994.