Annals of Biomedical Engineering, Vol. 34, No. 6, June 2006
DOI: 10.1007/s10439-006-9122-8
A Markerless Motion Capture System to Study Musculoskeletal Biomechanics: Visual Hull and Simulated Annealing Approach

S. CORAZZA,1 L. MÜNDERMANN,1 A. M. CHAUDHARI,1 T. DEMATTIO,2 C. COBELLI,2 and T. P. ANDRIACCHI1,3,4

1Department of Mechanical Engineering, Stanford University, 496 Lomita Mall, Durand B. 201, Stanford, CA; 2Department of Information Engineering, University of Padova, Padova, Italy; 3Bone and Joint Center, Palo Alto VA, Palo Alto, CA; and 4Department of Orthopedic Surgery, Stanford University Medical Center, Stanford, CA
(Received 1 July 2005; accepted 29 March 2006; published online: 5 May 2006)
INTRODUCTION

Motion capture techniques are used over a very broad field of applications, ranging from digital animation for entertainment to biomechanical analysis for sport and clinical applications. Sport and clinical applications require excellent accuracy and robustness. Two other major requirements for clinical applications are to minimize the time for patient preparation and the inter-observer variability. At present, using reflective markers on the skin is the most common technique.5,12,13 Despite their precision and popularity, marker-based methods have several limitations: (i) markers attached to the subject can influence the subject's movement, (ii) a controlled environment is required to acquire high-quality data, (iii) the time required for marker placement can be excessive, and (iv) the markers on the skin can move relative to the underlying bone, leading to what is commonly called skin artifact.4,14,32 Several recent review articles have summarized the common shortfalls of skin-based marker techniques.6,8,21

Markerless motion capture offers an attractive solution to the problems associated with marker-based methods. However, the use of markerless methods to capture human movement for biomechanical or clinical applications has been limited by the complexity of acquiring accurate three-dimensional kinematics with a markerless approach. The general problem of estimating the free motion of the human body, or more generally of any object, without markers from multiple camera views is underconstrained without the spatial and temporal correspondence that tracked markers guarantee.

Model-based approaches provide methods to address some of the complexities associated with a markerless approach. An a priori model of the subject, for example, can be used to strongly reduce the total number of degrees of freedom of the problem. Another option is to increase the number of cameras so that more measured data are available to solve for a given number of degrees of freedom. Thus the robustness of a markerless approach can be increased by increasing the number of cameras and by limiting the search space of possible body configurations to anatomically appropriate ones. This last strategy can be pursued by using a human model to identify the motion of the subject.

Several model-based methods have been proposed in the past, modeling the human body or parts of it with rigid1,9,15,18,29,30 or non-rigid segments.19 However, these

Address correspondence to S. Corazza, Department of Mechanical Engineering, Stanford University, 496 Lomita Mall, Durand B. 201, Stanford, CA 94305-4038. Electronic mail: stefano.corazza@stanford.edu

© 2006 Biomedical Engineering Society
0090-6964/06/0600-1019/0
FIGURE 1. Visual hull reconstruction concept. The silhouettes of the subject from different camera planes are back projected
in space. Their intersection generates the visual hull, a locally convex over-approximation of the volume occupied by the
subject's body.
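The construction in Fig. 1 can be approximated discretely by voxel carving: a candidate voxel belongs to the visual hull only if its projection falls inside the silhouette in every camera. The sketch below is illustrative only, not the authors' implementation; the projection matrices, grid, and function names are assumptions.

```python
import numpy as np

def visual_hull(voxels, cameras, silhouettes):
    """Voxel-carving approximation of the visual hull (illustrative sketch).

    voxels:      (N, 3) array of candidate 3D points.
    cameras:     list of 3x4 projection matrices (assumed, not the paper's setup).
    silhouettes: list of binary images, one per camera (nonzero = foreground).
    A voxel is kept iff it projects inside the silhouette of EVERY camera.
    """
    keep = np.ones(len(voxels), dtype=bool)
    # Homogeneous coordinates so projection is a single matrix product
    X = np.hstack([voxels, np.ones((len(voxels), 1))])
    for P, sil in zip(cameras, silhouettes):
        x = X @ P.T                              # project all voxels at once
        u = np.round(x[:, 0] / x[:, 2]).astype(int)
        v = np.round(x[:, 1] / x[:, 2]).astype(int)
        h, w = sil.shape
        inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        # In-bounds voxels must also land on a foreground silhouette pixel
        inside[inside] &= sil[v[inside], u[inside]] > 0
        keep &= inside                           # intersection across cameras
    return voxels[keep]
```

Because carving only intersects silhouette cones, the result is an over-approximation of the true volume, consistent with the locally convex hull described above.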
$$
g_{ab}(\theta) = \left( \prod_{i=1}^{n} e^{\hat{\xi}_i \theta_i} \right) g_{ab}(0) \qquad (1)
$$

where n = 8. (θ1, θ2, θ3) represents the three rotational degrees of freedom of the thigh with respect to the pelvis, (θ4, θ5, θ6) represent the three rotational degrees of freedom of the shank with respect to the thigh, and (θ7, θ8) represent the two rotational degrees of freedom of the foot with respect to the shank. g_ab(0) represents the transformation matrix from the a to the b coordinate frame in the initial configuration, i.e., with all state variables equal to zero. The general twist matrices ξ̂_i are defined for each joint relative to the parent segment, following the formulation described in Ref. 27.
The final transformation from one body configuration to another is given by a matrix T defined as follows:

$$
T = \begin{bmatrix} g_1 & 0 & \cdots & 0 \\ 0 & g_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & g_N \end{bmatrix}_{4N \times 4N} \qquad (2)
$$
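As a concrete sketch of the product-of-exponentials map in Eq. (1), the code below restricts each twist to a pure rotation about a unit axis ω passing through a point q, for which the exponential has the closed form R = I + sin(θ) ω̂ + (1 − cos θ) ω̂² and p = (I − R)q. This is a simplified illustration under those assumptions, not the authors' implementation; names are hypothetical.

```python
import numpy as np

def skew(w):
    """3x3 skew-symmetric matrix of w, so that skew(w) @ x == cross(w, x)."""
    return np.array([[0, -w[2], w[1]],
                     [w[2], 0, -w[0]],
                     [-w[1], w[0], 0]])

def twist_exp(omega, q, theta):
    """Homogeneous transform e^{xi_hat * theta} for a revolute joint whose
    unit rotation axis `omega` passes through the point `q`."""
    W = skew(omega)
    # Rodrigues' formula for the rotation part
    R = np.eye(3) + np.sin(theta) * W + (1 - np.cos(theta)) * (W @ W)
    p = (np.eye(3) - R) @ q   # translation induced by rotating about an offset axis
    g = np.eye(4)
    g[:3, :3] = R
    g[:3, 3] = p
    return g

def forward_kinematics(twists, thetas, g0):
    """Product of exponentials, Eq. (1): g(theta) = (prod_i e^{xi_i theta_i}) g(0)."""
    g = np.eye(4)
    for (omega, q), th in zip(twists, thetas):
        g = g @ twist_exp(omega, q, th)
    return g @ g0
```

A point lying on a joint's axis is left fixed by that joint's exponential, which is the property that makes the twist parameterization convenient for chaining segments relative to their parents.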
FIGURE 2. (a) Poser model in the reference pose, (b) 33 DOF full body model. Points in highly deformable regions have been
removed, as most clearly seen at the hips.
FIGURE 3. Results of the matching algorithm (colored points) applied to the virtual environment sequence superimposed over the
virtual character (gray surface).
FIGURE 4. Results of the matching algorithm applied to the virtual environment sequence. Visual hull points (top, in blue) and matched model (bottom) for a gait cycle, in the sagittal plane.
$$
A(x, y, T) = \min\left\{ 1,\; \exp\!\left( \frac{f(x) - f(y)}{T} \right) \right\} \qquad (3)
$$
FIGURE 5. Comparison between ground truth provided by the virtual environment model and the results from the matching algorithm (gray shaded area indicates one standard deviation) for (a) knee flexion, (b) knee adduction, (c) hip flexion, and (d) hip adduction.
$$
y_{i+1} = x_i + k_{i+1}, \qquad
x_{i+1} = \begin{cases} y_{i+1} & \text{if } p \le A(x_i, y_{i+1}, T_i) \\ x_i & \text{otherwise} \end{cases} \qquad (4)
$$
Since the parameter T plays an important role in the acceptance function, several authors have proposed different formulations for its decreasing function (the cooling schedule) and the corresponding sampling distribution for k_{i+1}, in order to improve the performance of the algorithm, which normally has a high computational cost.16,17,24,31,34 The formulation used in this work is described in Ref. 34, which samples k_{i+1} from a Cauchy distribution. Sampling in this way allows the algorithm to visit each region with positive Lebesgue measure infinitely often when a cooling schedule proportional to T_0/i is adopted, where T_0 is a large enough constant and i is the number of iterations. To improve the algorithm's ability to escape local minima (as demonstrated in simulated trials10), in this work the parameter T is not decreased linearly with the number of iterations but depends also on the value of the cost function. An extensive and complete description of the general simulated annealing method can be found in Ref. 22.
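A minimal sketch of the annealing loop described above, with Cauchy-distributed perturbations k_{i+1} and a T_0/i cooling schedule. This illustrates the general scheme only, not the authors' adaptive variant in which T also depends on the cost value; all parameter names are assumptions.

```python
import numpy as np

def simulated_annealing(cost, x0, T0=10.0, scale=0.1, n_iter=5000, seed=0):
    """Simulated annealing sketch: Cauchy proposals, T0/i cooling (illustrative)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx = cost(x)
    best_x, best_f = x.copy(), fx
    for i in range(1, n_iter + 1):
        T = T0 / i                                   # cooling schedule ~ T0/i
        k = scale * rng.standard_cauchy(x.shape)     # heavy-tailed Cauchy step
        y = x + k
        fy = cost(y)
        # Metropolis rule: always accept improvements; accept worse states
        # with probability exp((f(x) - f(y)) / T), cf. Eqs. (3)-(4)
        if fy <= fx or rng.random() < np.exp((fx - fy) / T):
            x, fx = y, fy
            if fx < best_f:
                best_x, best_f = x.copy(), fx
    return best_x, best_f
```

The heavy tails of the Cauchy distribution occasionally propose large jumps, which is what lets the sampler keep revisiting every region of positive measure under the T_0/i schedule.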
$$
\mathrm{COST}(A, B) = \sum_{a \in A} \min_{b \in B} \| a - b \| \qquad (5)
$$
The cost function used here differs from the original formulation of the Hausdorff distance in that it sums every single contribution, instead of taking just the maximum of the minimal distances between pairs of points. This modification increases robustness to possible outliers. As is the case for the original Hausdorff distance, this cost function is not commutative. Intuitively, a low value of COST(A, B) guarantees that all the points of set A are not very far from their closest point of set B; however, it does not guarantee that all points of set B are not very far from their closest point of set A. In the first frame of the sequence, the visual hull points are set A while the model points are set B, since the two sets may not be close to each other (visual hull-to-model formulation). For subsequent frames the next visual hull frame is always very close to the previous matched model state, so the cost function is changed to the model-to-visual hull formulation, which guarantees better accuracy because it is less sensitive to phantom volumes in the visual hull. Phantom volumes are defined as large local deviations from the real subject's outer body surface resulting from the use of too few cameras. In our case phantom volumes generate points of set B (visual hull) far from their closest point of set A (model); these are neglected, since the cost function, based on COST(A, B), only accounts for the distance of the points of set A from their closest point in set B (model-to-visual hull).

FIGURE 6. Comparison between ground truth provided by the virtual environment model and the results from the matching algorithm (gray shaded area indicates one standard deviation) for (a) ankle dorsiflexion, (b) ankle inversion, (c) shoulder flexion, and (d) shoulder abduction.
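The modified Hausdorff cost of Eq. (5), and the asymmetry that motivates switching to the model-to-visual hull formulation, can be sketched as follows (a NumPy illustration; the brute-force pairwise distance matrix is an assumption, not the authors' implementation):

```python
import numpy as np

def cost(A, B):
    """Modified, non-commutative Hausdorff cost of Eq. (5):
    sum over every point a in A of the distance to its closest point in B.
    A, B are (n, 3) and (m, 3) point sets."""
    # Pairwise distances d[i, j] = ||A[i] - B[j]|| via broadcasting
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return d.min(axis=1).sum()
```

Because the sum runs over the first set only, a phantom point in B inflates COST(B, A) but leaves COST(A, B) untouched, which is exactly why the model-to-visual hull direction is insensitive to phantom volumes.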
Motion Data: Virtual Environment
In order to provide data with a real ground truth, a
virtual character was animated with known kinematics
RESULTS

Joint angle               Mean error (°)   Standard deviation (°)   RMS error (°)
Hip flexion/ext                2.0                 3.0                  3.6
Hip adduction/abd              1.1                 1.7                  2.0
Knee flexion/ext               1.5                 3.9                  4.2
Knee adduction/abd             2.0                 2.3                  3.1
Ankle plantar/dorsifl          3.5                 8.2                  9.0
Ankle inversion/ev             4.7                 2.8                  5.9
Shoulder flexion/ext           1.2                 4.2                  4.4
Shoulder adduction/abd         3.8                 1.2                  4.0

DISCUSSION
FIGURE 7. Matching of the first frame. The visual hull point cloud is shown in blue, while the different segments of the model
are shown in other colors. The algorithm does not require a good initialization of the model in order to achieve the first matching
(right).
FIGURE 8. Result of the matching algorithm (colored points) applied to an experimental data sequence (gray surface). On the bottom, the corresponding video images of the running sequence from one camera are shown.
subject.25 This increase would improve the overall tracking accuracy and reduce the problem of internal-external rotation of the hip and knee, because the existing axial asymmetries in the thigh and shank would be better represented. This is one of the major challenges to be addressed in the future, since assessing true motion in the transverse plane can be very relevant from a biomechanical point of view. The described algorithm, being based on the entire body shape instead of a small number of single points, shows good performance even with camera resolutions an order of magnitude lower than those of current stereophotogrammetric systems for marker-based motion analysis. This approach also offers great potential for the reduction of skin artifact error: instead of relying on just a few markers, it is based on the tracking of a few hundred points per segment, naturally averaging the skin artifact across the segment during the matching process.
Markerless motion capture also guarantees a great reduction in subject preparation time compared to marker-based methods, since no time for marker placement is required. Moreover, inter-operator variability is eliminated, since no trained operator is needed to accurately place markers. On the other hand, the processing time is longer, since the creation of the model is required. Future work must automate the set-up of the model and address how foreground/background separation and camera calibration affect the accuracy of the results; these factors were not directly addressed in the virtual environment validation. However, the experimental data presented already show good qualitative results, demonstrating the effectiveness and potential of the method for use in a clinical environment such as a gait laboratory. Moreover, tracking a running sequence, which is in general more challenging than walking due to the increased velocities of the subject, demonstrated the robustness of the method. Overall, in the authors' opinion, the system can provide sufficient accuracy for biomechanical research and has great potential for future improvements, both in the creation of the model and in the matching algorithm. It is our intention to provide in the future an experimental comparison/validation against currently available techniques such as marker-based methods.
Two further extremely valuable advantages of the presented algorithm are: (i) apart from a rough rigid body positioning, it does not need to be initialized, being able to go from a reference pose to the first-frame pose consistently, as shown in Fig. 7; (ii) it directly provides joint centers and segment volume information during motion that can be used for a more accurate calculation of the subject's kinetics.
ACKNOWLEDGMENTS

Funding provided by NSF#03225715 and VA#ADR0001129.
REFERENCES