Large-scale integrating project

Deliverable D1.1
Kinematic model of the human hand
Project acronym: DEXMART
Project full title: DEXterous and autonomous dual-arm/hand robotic manipulation with
sMART sensory-motor skills: A bridge from natural to artificial cognition
Grant agreement no: FP7 216239
Project web site: www.dexmart.eu
Due date: January 31, 2009 Submission date: February 2, 2009
Start date of project: February 1, 2008 Duration: 48 months
Lead beneficiary: OMG Revision: 1
Nature: R Dissemination level: PU
R = Report
P = Prototype
D = Demonstrator
O = Other
PU = Public
PP = Restricted to other programme participants (including the Commission Services)
RE = Restricted to a group specified by the consortium (including the Commission Services)
CO = Confidential, only for members of the consortium (including the Commission Services)
ICT FP7 216239 DEXMART Deliverable D1.1
TABLE OF CONTENTS
1 Introduction
2 Hand modelling: state of the art
3 Motion capture
4 Kinematic model calibration
4.1 Least square calibration
4.2 Error analysis
4.3 Marker motion model
5 Methods for experimental validation
5.1 Statistical validation
6 Results
6.1 Marker motion model
6.2 Kinematic model selection
6.3 Evaluation of finger joints interdependencies
6.4 Marker set selection: human hand wearing a data-glove
7 Conclusions
A Bayes factors for marker regression: algebraic form
Figure 1: Human hand skeleton and articulations.
1 Introduction
The articulations of the human hand are more complex than the comparable articulations of other animals.
In fact, the skeleton alone consists of 27 bones: 14 for the fingers, five metacarpals forming the palm and eight
carpal bones in the wrist (Fig. 1, footnote 1). Thanks to this complexity and to a highly sensitive tactile feedback,
humans can manipulate objects in the environment and execute complex tasks that are still out of reach for
state-of-the-art robotic systems. As many of the tools that we use in our everyday life were designed for us
humans, robots that can manipulate the same tools would gain enhanced human interaction capabilities.
Also, a natural ability to interact with the environment could ease the creation of richer training data-sets
and consequently enhance the robot's artificial intelligence.
To improve robotic capabilities a necessity arises for a model that can reproduce the vast majority of
the hand manipulation tasks, that can be measured with available motion capture technology and that can
serve as a path indicator for the evolution of robotic hands. Unfortunately, a model that emulates in toto
the human hand leads to two major problems. First, the subtle movements of the wrist (carpal) bones are
difficult to measure using non-invasive techniques. Second, a detailed robotic replica of the human hand
is complex to implement. The good news is that an approximated articulation model that is capable of
reproducing most of the common manipulation tasks can be implemented and, application-wise, it is often
sufficient.
This report describes the research on kinematic models of the human hand conducted within the DEXMART
project. In Sec. 2 we review existing state-of-the-art approaches adopted in a variety of research fields
and applications. In particular, we analyse the different choices in terms of number of bones, number of
joints and marker configurations. After a brief overview of the Vicon motion capture system (Section 3) we
describe our contribution toward a more accurate subject calibration procedure (Section 4). To this end,
we first present the standard procedure, then we analyse the dominant errors using Magnetic Resonance
Imaging (MRI) and finally we propose a novel solution that aims at reducing soft tissue artefacts by explicitly
modelling marker movements. Before presenting the experimental results we briefly discuss best practices for
kinematic model evaluation (Sec. 5). The experimental validation in Section 6 assesses the performance of
the proposed calibration procedure, compares different articulation models, and evaluates the level of
interdependencies between joint parameters. Also, preliminary work on best marker set selection for combined
optical and data glove based motion capture is presented. Finally, in Sec. 7 we draw our conclusions.

Footnote 1: The original image is kind courtesy of Mariana Ruiz Villarreal,
http://en.wikipedia.org/wiki/File:Scheme_human_hand_bones-en.svg.
2 Hand modelling: state of the art
Approximated kinematic models of the hand are a common research tool. The model accuracy mainly
depends on the application requirements and consequently on the research field. The computer-vision
community has long used kinematic models for research on hand tracking and gesture recognition [1,
2, 3, 4] (for complete overviews the reader should refer to [1, 3]). Due to the ambiguity of the measurements
(i.e., the images) simple models are usually preferred. In their seminal work Rehg and Kanade [4] extract
gradient-based image features and use a quadratic error minimisation procedure to track the hand motion.
The kinematics are summarised by a 21+6 Degrees of Freedom (DoF) model which uses five DoF for the
thumb and four for each of the other fingers; the remaining six DoF define the global position and rotation
of the wrist in 3D space. In the decade that followed this publication other researchers adopted the
same kinematic model [5, 6]. The agreement between independent publications suggests that a 21+6 DoF
model is sufficient for gesture recognition and that tracking more complex models may not be feasible with
state-of-the-art vision-based technologies. In fact, to the best of our knowledge none of the markerless
trackers attempts to measure subtle articulation movements like palm arching.
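The DoF count of the 21+6 model can be tallied in a few lines. This is only an illustrative sketch: the dictionary layout and the names `FINGER_DOF`, `WRIST_POSE_DOF` and `total_dof` are hypothetical, while the per-finger counts are those quoted above.

```python
# Hypothetical tally of the 21+6 DoF hand model described in the text.
FINGER_DOF = {"thumb": 5, "index": 4, "middle": 4, "ring": 4, "little": 4}
WRIST_POSE_DOF = 6  # global position (3) and rotation (3) of the wrist


def total_dof(finger_dof=FINGER_DOF, wrist_pose_dof=WRIST_POSE_DOF):
    """Return (articulation DoF, total DoF including the global wrist pose)."""
    articulation = sum(finger_dof.values())
    return articulation, articulation + wrist_pose_dof


articulation, total = total_dof()
print(articulation, total)  # 21 27
```

Richer models, such as the ones that add palm arching, would simply add entries to the dictionary.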
The biomechanical research community has widely studied the human hand using optical or magnetic
motion capture. As modern motion capture systems can provide a set of discrete positions with sub-millimetre
accuracy, measurement ambiguity is a smaller problem than in markerless computer vision applications.
Thanks to more accurate measurements, biomechanical researchers have been able to explore a
wide variety of articulation models looking for a faithful reproduction of the human hand kinematics. Baseline
models for research on biomechanics also assume a rigid palm where the carpal bones are approximated by a
single segment [7, 8] and the relative position of the four metacarpal bones (excluding the thumb) is fixed a
priori [9, 10]. However, this type of model may not be appropriate to model object manipulation and grasping
activities where the subtle deformations of the palm become relevant. A more accurate model is part of the
Santos virtual human, a complete human body model used to simulate biological activity and ergonomics
for military and industrial purposes [11]. The Santos hand has 25+6 DoF, with four DoF modelling flexion
and adduction of the ring finger and pinky Carpometacarpal (CMC) bones. Several research works analyse
in depth the motion of the thumb [12, 13, 14, 15]. Results show that the standard thumb model with
three segments and five DoF is quite a crude approximation [15]. Although the motion of the CMC joint is
dominated by the two DoF associated with flexion-extension (FE) and adduction-abduction (AA), significant
variations in the pronation-supination (PS) direction exist. Further complexity is introduced by non-orthogonal
and non-intersecting joint axes. To improve the model Hollister et al. [16] propose to replace the universal
CMC joint with a two DoF saddle joint. Recently Chang et al. [15] found that with a saddle joint the PS
residual is still significant and introduced a model with a third DoF.
The hand models proposed for computer graphics animations are also more complex than those used in
vision-based applications. To model palm arching Yasumuro et al. [17] add to the 21+6 DoF model one
DoF to each of the CMC finger joints. Albrecht et al. [18] use three DoF for the finger Metacarpophalangeal
(MCP) joints and the thumb CMC joint and control the articulation via pseudo muscles. Sueda et al. [19]
use tendons and muscles to introduce forces and pose constraints and to predict the 3D shape of the hand.
The study of the skin surface has also attracted the attention of the computer graphics and biomechanical
communities. In computer graphics the aim is to produce realistic deformations of character surface
meshes [20]. Usually, the position of the mesh vertices is transformed according to the character's pose.
Undoubtedly Single Weight Enveloping (SWE) (Softimage, 1992) or "skinning" (Alias|Wavefront, 1998) is
the most widely used and understood technique to model skin deformations. SWE computes the position
of each vertex as a linear combination of the joint poses. Typically, only the poses of the closest segments
are used and the animator manually adjusts the weights to improve the quality of the rendering. Although
this technique is simple, it is also prone to the classic "pinching" at the joints, caused by the sub-space of
the deformation function becoming degenerate and collapsing when joints are extremely flexed. To reduce
pinching artefacts and increase the level of realism Wang et al. [21] still use linear transformations but
add multiple weights per segment. Singh and Kokkevis [20] instead extend the work of Sederberg and
Parry [22] and use control points aligned with the character skin and 3D Bézier control volumes to deform
the skin according to the poses of the character skeleton. This method yields much more realistic results
than SWE due to the Bézier sub-space deformation. Lewis et al. [23] propose a different approach that
does not rely on initial manual tuning of the weights. Their method allows CG animators to bind the
skin to a character skeleton in a number of key poses; then the algorithm interpolates the intermediate
deformations using non-linear kernels. Unfortunately a large number of poses is required to generate the
mapping for complex limbs like the hand. Anguelov et al. [24] propose a data-driven approach to learn
secondary motion like muscle bulging. Their method applies a linear regression from the vertex positions
to the skeleton poses. The problem is formulated as a large optimisation procedure that learns the model
parameters and enforces mesh smoothness. Although the mesh deforms realistically, this approach is not
suited to biomechanical analysis as it requires several range scans of the subject. Park and Hodgins [25]
instead model the dynamic movements of the skin from a large set of markers.
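The SWE blend described above, and the collapse behind the "pinching" artefact, can be sketched as follows. This is a minimal illustration under stated assumptions: each joint pose is taken to be a 4x4 homogeneous transform from the rest pose to the current pose, the weights are assumed to sum to one, and the names `skin_vertex` and `rot_z` are hypothetical.

```python
import numpy as np


def skin_vertex(rest_vertex, joint_transforms, weights):
    """Blend one vertex as moved by each joint (single weight per joint).

    rest_vertex      : (3,) vertex position in the rest pose
    joint_transforms : (J, 4, 4) homogeneous transforms, rest pose -> current pose
    weights          : (J,) one weight per joint, assumed to sum to 1
    """
    v = np.append(np.asarray(rest_vertex, dtype=float), 1.0)  # homogeneous coordinates
    transforms = np.asarray(joint_transforms, dtype=float)
    moved = np.einsum("jab,b->ja", transforms, v)  # vertex as seen by every joint
    blended = np.asarray(weights, dtype=float) @ moved  # linear combination
    return blended[:3]


def rot_z(angle):
    """Homogeneous rotation about the z axis (illustrative joint pose)."""
    c, s = np.cos(angle), np.sin(angle)
    T = np.eye(4)
    T[:2, :2] = [[c, -s], [s, c]]
    return T


# Two joints with equal weights: one at rest, one rotated by half a turn.
# The blended vertex falls onto the rotation axis, which is the collapse
# that shows up as pinching on a strongly flexed joint.
vertex = skin_vertex([1.0, 0.0, 0.0], [np.eye(4), rot_z(np.pi)], [0.5, 0.5])
print(np.round(vertex, 6))
```

The multi-weight and Bézier-volume methods cited above replace this single linear blend with richer deformation functions.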
Unlike computer graphics animators, biomechanicists model skin motion to reduce capture artefacts and
consequently improve the accuracy of the joint angle measurements. Standard approaches to model fitting
and calibration model the motion as Gaussian noise added to rigid marker positions [26]. Then rigid markers
are fit to the deformed 3D reconstructions in a least squares sense. Andreacchi et al. [27] propose an
interesting solution to the soft tissue artefacts problem. Their design relies on distributing as many markers
as possible on each segment. Then a mass is assigned to each marker and the centre of mass and the inertia
tensor of the cluster are calculated on a frame-by-frame basis. By changing the masses the model adapts
the motion of the markers to the skin. Despite its elegance in design and implementation, this method has
been shown to be remarkably unstable by Cereatti et al. [28]. Cappello et al. [29] extend the popular CAST
technique (Calibrated Anatomical System Technique) from Cappozzo et al. [30] by calibrating two distinct
poses at the same time. The two sets of marker positions are interpolated using a linear function of the
joint angle. Although simple and ultimately extensible to many poses, this method is limited by the linear
transformation, which may not fit the skin motion.
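The two-pose interpolation can be sketched in a few lines. This is a minimal illustration of a linear function of the joint angle, not the published double-calibration procedure: the function name `interpolate_marker`, the single scalar joint angle and the example coordinates are all assumptions.

```python
import numpy as np


def interpolate_marker(angle, angle_a, marker_a, angle_b, marker_b):
    """Local marker position at `angle`, linearly interpolated between two
    calibrated poses (angle_a, marker_a) and (angle_b, marker_b)."""
    t = (angle - angle_a) / (angle_b - angle_a)
    marker_a = np.asarray(marker_a, dtype=float)
    marker_b = np.asarray(marker_b, dtype=float)
    return (1.0 - t) * marker_a + t * marker_b


# Hypothetical calibration poses: joint extended (0 deg) and flexed (90 deg).
extended = [10.0, 0.0, 5.0]  # marker position in the segment frame, mm
flexed = [14.0, 0.0, 3.0]
print(interpolate_marker(45.0, 0.0, extended, 90.0, flexed))
```

Outside the calibrated range the same expression extrapolates, which is where a linear model is least likely to follow the skin.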
The interaction between bones, tendons and skin constrains the possible poses that the human hand
can take. To reduce the tracking search space, and consequently the computational cost, Lin et al. [31] and
the Santos model [11] impose constraints on the joint angle ranges. Also, although the parametrisation of
the hand can have between 20 and 30 DoF, the physical action of the tendons imposes strong dependencies
between different joints. These dependencies can be used to reduce the number of free parameters [32]. A
simple rule of thumb for the Distal Inter-Phalanx (DIP) and Proximal Inter-Phalanx (PIP) joint angles
$\theta_{DIP}$ and $\theta_{PIP}$ is that $\theta_{DIP} = \frac{2}{3}\theta_{PIP}$. As long as the hand moves freely this rule holds. However, if the subject
grasps an object like a pen or a knife, $\theta_{DIP}$ can strongly deviate from the predicted angle $\frac{2}{3}\theta_{PIP}$. Lin et
al. [31] instead perform a Principal Component Analysis (PCA) of the finger motions. The analysis leads
to a principled model of the inter-parameter dependencies. The authors show that, for tracking purposes,
it is possible to represent the vast majority of the hand poses with just seven degrees of freedom. Also, the
kinematic poses of the hand are weakly correlated to the body poses. Jin et al. [33] exploit this dependency
to generate realistic hand animations.
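The two-thirds rule and its failure during grasping can be made concrete with a small sketch. The function names and the example angles are hypothetical; only the ratio comes from the text.

```python
import math


def predicted_dip(pip_angle):
    """DIP angle predicted by the rule of thumb: two thirds of the PIP angle."""
    return 2.0 / 3.0 * pip_angle


def dip_deviation(dip_angle, pip_angle):
    """Signed difference between a measured DIP angle and the prediction."""
    return dip_angle - predicted_dip(pip_angle)


# Free motion: the measured DIP angle is close to the prediction.
print(dip_deviation(41.0, 60.0))
# Hypothetical pen grasp: the DIP joint is far less flexed than predicted.
print(dip_deviation(5.0, 60.0))
assert math.isclose(predicted_dip(60.0), 40.0)
```

A tracker that hard-codes the rule removes one free parameter per finger, at the cost of a systematic error whenever the deviation is large.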
Biomechanical research has investigated joint dependencies in more depth [34, 35, 36]. It is well known
that when normal humans attempt to move just one finger, the other fingers
have to move as well. Also, the movements of the thumb, index finger, and little finger
typically are more independent than movements of the middle or ring fingers. Simultaneous motion of
non-instructed digits may result in part from passive mechanical connections between the digits, in part from
the organization of multitendoned finger muscles, and in part from distributed neural control of the hand.
Recent studies have demonstrated that mechanical coupling between the fingers, rather than limits in
neuromuscular control, appears to be a major factor limiting the complete independence of finger movements [37].
Finger independence was generally similar during passive and active movements, but showed a trend toward
less independence in the middle, ring, and little fingers during active, large-arc movements. Mechanical
coupling limited the independence of the index, middle, and ring fingers to the greatest degree, followed by
the little finger, and placed only negligible limitations on the independence of the thumb. Studies involving
simple grasping or skilled tasks have shown that a small number of combined joint motions (i.e., synergies)
can account for most of the variance in observed hand postures that is representative of most naturalistic
postures during object manipulation. These synergies are used broadly across a variety of tasks:
simple hand motions such as reach and grasp of objects that vary in width, curvature and angle, and skilled
motions such as precision pinch. These studies suggest that this small set of synergies represents basic building
blocks underlying natural human hand motions [38]. The degree of interdependence of the fingers depends
on the extension or flexion movement of the fingers and also on the frequency of rhythmic
movements. Angular motion tended to be greatest at the middle joint of each digit, with increased angular
motion at the proximal and distal joints during 3 Hz movements [35]. Nakamura et al. [34] discovered that
the correlation between distal and proximal joints may depend on the grasped object. Also, while
Hager-Ross and Schieber [35] simply quantify the dependency level between different fingers, Lee and Zhang [36]
propose a control model that uses finger interactions to simulate the natural motion.
Another critical element for capturing the motion of the human hand is the marker configuration.
Motion capture systems are a well established technology to measure the motion of the main human limbs.
However, only recent advances in terms of sensor resolution have allowed researchers to use small markers in
medium-size (3 m or more) capture volumes where natural movements of the hand are easier to reproduce.
Nevertheless, selecting the correct number of markers and their position is critical to minimise occlusions.
In [39] 13 colour coded markers are located on key hand locations: 5 on the finger tips, 4 on the PIP joints
(not on the thumb), 3 on the MCP joints of the thumb, pinkie and index, and one on the wrist. In this
case the large size of the markers heavily constrained their positioning and therefore subtle palm movements
remained unobserved. Zhang et al. [8] use 21 markers for their experiments: one per finger tip, one per joint
right above the joint, and one on the wrist. Then the knowledge of the relative position between markers
and joints is used to estimate the centres of rotation. A similar setup is also presented in [28] but without
markers on the finger tips. For more accurate experiments up to six markers are used to capture the complex
motion of the thumb CMC joint alone [15]. Also, a larger number of markers is positioned on the hand
by Cerveri et al. [9]. Although this setup requires a longer preparation, 42 markers can provide a certain
level of redundancy in case of occlusions. Recently Baker et al. [40] used 24 markers with 4 mm diameter to
capture the finger and wrist movements during computer keyboard usage. Finally, Cerveri et al. [9] showed
that 24 markers are sufficient to capture simple tasks in a constrained scenario with fixed wrist position.
3 Motion capture
This section gives a brief overview of the VICON motion capture procedures. For the life sciences market,
Vicon provides motion capture software called Nexus. This application allows a user to control the
hardware, to post-process the marker data and to compute joint angles. A typical optical motion capture
session includes the following steps:
1. Hardware setup.
2. Subject setup.
3. Motion data capture.
4. Subject calibration.
5. Kinematic fitting.
A brief summary of each operation follows:
Hardware setup: the first stage is to define the capture volume and position the VICON cameras so that
the markers will be inside the field of view of at least two cameras at any given time. Then Nexus
is used to calibrate the cameras and define the origin of the capture volume to produce accurate 3D
data. Camera calibration enables Nexus to work out the positions, orientations, and lens properties of
all the cameras. The cameras are calibrated by waving a special wand in front of them throughout the
area where you intend to capture 3D data. The same wand is then used to set the global coordinate
system in the capture volume, so that subjects are displayed the right way up in the Nexus workspace.
Fig. 2 (b) shows a visualisation of a calibrated camera setup for hand motion capture.
Subject setup: to capture motion information the user attaches a set of markers onto the subject's surface.
Then Nexus requires a description of the skeletal structure of the subject, and of the marker set. This
information is stored in a Vicon Skeleton Template (VST) file. In particular, for each subject the
user specifies the approximate bone lengths, which markers are attached to which segment and the
position of the markers in the segment coordinate system.
Motion data capture: the VICON cameras capture motion data by shining light from a strobe around the
camera lens onto retro-reflective markers attached to the subject. These markers show up as very
bright blobs in the camera, which then does circle fitting on the blobs and extracts the blob centres.
Every camera sends the positions of the extracted circles to the main processing unit in Nexus, which
uses the calibration data from the cameras to calculate the positions of the centroids in 3D space.
These are the so-called marker reconstructions (reckons). An example of reconstructed markers is in
Fig. 2 (c).
Subject calibration: it is not necessary that the information in a VST is 100% accurate. For example,
the VST file for a hand may say that the index finger is 80 mm long, while the index finger of the specific
subject is 90 mm long. As the name says, the VST is a template that should generalise across
similar subjects. To this end a calibration procedure scales the VST parameters describing segment
lengths, orientations and marker positions to the physical size of the subject. To calibrate the subject
usually a special trial is recorded. In this trial the subject is asked to perform a set of actions that
expose the Range Of Motion (ROM) of the articulations.
Kinematic fitting: for all the captured trials Nexus uses the calibrated subject to label the reckons and
compute the joint angles by fitting the kinematic model to the data. The procedure works correctly
as long as the markers are attached in the same configuration. Fig. 2 (c) shows a set of reconstructed
and labelled markers. To compute the joint angles Nexus calculates the best fit between the specified
marker positions on the subject and the reconstructed points from the captured data. These joint
angles are often considered the final output from a motion capture pipeline and are used to drive
3D models of animated characters in films and games, or analysed in life sciences and engineering
applications. Fig. 2 (e)-(f) shows examples of the hand subject kinematically fitted in different poses.
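The reconstruction of a marker from the centroids seen by two or more calibrated cameras, as outlined under "Motion data capture", can be illustrated with a textbook linear triangulation. This is a generic sketch and not a description of what Nexus computes internally: the 3x4 projection matrices, the pinhole model without lens distortion, and the names `project` and `triangulate` are all assumptions.

```python
import numpy as np


def project(P, X):
    """Project a 3D point X with a 3x4 camera matrix P to image coordinates."""
    x = np.asarray(P, dtype=float) @ np.append(np.asarray(X, dtype=float), 1.0)
    return x[:2] / x[2]


def triangulate(projections, centroids):
    """Linear (DLT) triangulation of one marker from two or more cameras.

    projections : list of 3x4 camera matrices (from camera calibration)
    centroids   : list of (u, v) blob centres, one per camera
    """
    rows = []
    for P, (u, v) in zip(projections, centroids):
        P = np.asarray(P, dtype=float)
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    # The homogeneous 3D point is the right singular vector associated
    # with the smallest singular value of the stacked constraints.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    X = vt[-1]
    return X[:3] / X[3]


# Hypothetical two-camera rig: identical cameras one unit apart along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
marker = np.array([0.2, 0.1, 2.0])
reckon = triangulate([P1, P2], [project(P1, marker), project(P2, marker)])
print(reckon)
```

The requirement that each marker be visible in at least two cameras corresponds to needing at least four rows in the stacked system.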
4 Kinematic model calibration
Figure 2: Vicon system (a) and motion capture intermediate results (b)-(f). (b): visualization of a calibrated
camera setup; (c): 3D reconstructions of markers on a hand; (d): Vicon Skeleton (subject) for a human
hand; (e)-(f): examples of the hand subject kinematically fitted in different poses.

As anticipated in the previous section the VST template contains information about the subject. In particular
it defines the skeleton topology (i.e., the number of segments/bones $n_b$ and their hierarchy), the mapping
between the $n_m$ markers and their parent segments and the marker positions $M = \{m_i\}_{i=1}^{n_m}$ in the
parent segment coordinate systems (footnote 2). For each marker $i$ we define $S_i(\theta, \phi)$, a $4 \times 4$ matrix that transforms
the local coordinates $m_i$ to the world coordinates. This transform depends on the joint angle state $\theta$ and
on the vector of subject parameters $\phi$ (i.e., bone lengths and orientations). Given the kinematic chain we
can decompose $S_i$ as the product of paired transformations where each pair is composed of a fixed segment
transformation $P(\phi)$ and a time-varying joint transformation $T(\theta)$. For example, if a marker $i$ is attached
to a segment $c$, and $c$ is the third segment of a chain $a$, $b$ and $c$ then

$$S_i(\theta, \phi) = P_a(\phi)\,T_a(\theta)\,P_b(\phi)\,T_b(\theta)\,P_c(\phi)\,T_c(\theta). \tag{1}$$

Subject calibration adapts the model to the physical dimensions of the subject and this is formulated as an
optimisation problem over the segment parameters $\phi = \{l_j\}_{j=1}^{n_b}$ as well as over the marker positions $M$.
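The chain product in Eq. (1) can be sketched for a planar three-segment finger. The parametrisation is an assumption made for illustration only: each fixed segment transformation is taken to be a translation along the local x axis by the bone length, each joint is a single hinge about the local z axis, and the function names are hypothetical.

```python
import numpy as np


def segment_transform(length):
    """Fixed transformation P: translation along local x by the bone length."""
    P = np.eye(4)
    P[0, 3] = length
    return P


def joint_transform(angle):
    """Time-varying transformation T: hinge rotation about local z."""
    c, s = np.cos(angle), np.sin(angle)
    T = np.eye(4)
    T[:2, :2] = [[c, -s], [s, c]]
    return T


def marker_to_world(lengths, angles):
    """Product P_a T_a P_b T_b P_c T_c for a chain of (P, T) pairs."""
    S = np.eye(4)
    for length, angle in zip(lengths, angles):
        S = S @ segment_transform(length) @ joint_transform(angle)
    return S


# Marker at the origin of the third segment, homogeneous coordinates.
m = np.array([0.0, 0.0, 0.0, 1.0])
S = marker_to_world(lengths=[1.0, 1.0, 1.0], angles=[np.pi / 2, 0.0, 0.0])
print(S @ m)
```

Calibration would adjust `lengths` and the local marker position, while the angles change from frame to frame.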
4.1 Least square calibration
VICON subject calibration is formulated as a non-linear least squares problem where each marker generates
one observation (i.e., a reckon) $r_{i,k} \in R$ per frame $k$ according to a Gaussian model, that is

$$r_{i,k} = S_i(\theta_k, \phi)\,[m_i + n_{i,k}], \tag{2}$$

where $n_{i,k}$ is a Gaussian distributed random vector with zero mean and covariance $\Sigma_i$.
Given a set $K$ of preselected key frames from the ROM trial, the objective function $f(\cdot)$ to be minimised
is the sum of squared differences between marker positions and the reckons, that is

$$f(\Theta, M, \phi) = \sum_{k \in K} \sum_{i=1}^{n_m} f_{i,k}(\theta_k, m_i, \phi)^2 \tag{3}$$

$$= \sum_{k \in K} \sum_{i=1}^{n_m} \left\| \Sigma_i \left( S_i(\theta_k, \phi)^{-1} r_{i,k} - m_i \right) \right\|^2, \tag{4}$$

where $\Theta = \{\theta_k\}_{k \in K}$ denotes the joint angles for all the key frames and $f_{i,k}(\cdot)$ outputs the per-frame and
per-marker residual. In a typical calibration scenario the number of free parameters may be high and add up
to several thousands. The solution of such a large problem is found via a conjugate gradient-based iterative
procedure. The quality of the final results depends above all on whether the model in Eq. (2) can predict
the reconstructed data. To point out the limitations of the calibration procedure in the next section we
analyse the residual errors.
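Evaluating the calibration cost for a given set of transforms can be sketched as below. The sketch makes simplifying assumptions that are not in the text: the per-marker weighting is taken to be the identity, the transforms are supplied directly rather than built from joint angles and subject parameters, and the function name `calibration_cost` is hypothetical. It only evaluates the cost; the minimisation itself is the conjugate gradient-based procedure mentioned above.

```python
import numpy as np


def calibration_cost(transforms, reckons, markers):
    """Sum of squared marker residuals over key frames.

    transforms : transforms[k][i] is the 4x4 local-to-world matrix of marker i
                 at key frame k
    reckons    : reckons[k][i] is the reconstructed marker, homogeneous 4-vector
    markers    : markers[i] is the local marker position, homogeneous 4-vector
    Assumes identity weighting for every marker (illustrative simplification).
    """
    total = 0.0
    for S_k, r_k in zip(transforms, reckons):
        for S_ik, r_ik, m_i in zip(S_k, r_k, markers):
            # Map the reckon back to the segment frame and compare it with
            # the local marker position.
            local = np.linalg.solve(np.asarray(S_ik, dtype=float),
                                    np.asarray(r_ik, dtype=float))
            residual = local - np.asarray(m_i, dtype=float)
            total += float(residual @ residual)
    return total


# One key frame, one marker: the segment frame sits one unit along x and the
# reckon is 0.5 units away from the modelled marker along z.
S = np.eye(4)
S[0, 3] = 1.0
cost = calibration_cost([[S]], [[[1.0, 0.0, 0.5, 1.0]]], [[0.0, 0.0, 0.0, 1.0]])
print(cost)  # 0.25
```

With several thousand free parameters, a function of this shape would be called once per iteration of the optimiser.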
4.2 Error analysis
The model in Eq. (2) treats the marker positional error as a residual covariance. This assumption simplifies the
mathematical formulation of the subject calibration problem; however, as shown in Figure 3, the distribution
of the marker residuals may significantly differ from a Gaussian function.
Model prediction errors have two major causes: (i) sensor noise that affects the reconstruction of the
3D marker positions; (ii) marker movements due to skin and in general soft tissue deformations caused by
pose changes and bulging muscles. Although we can reasonably assume Gaussianity of the sensor noise, the
residual marker movements due to skin or cloth sliding require further study. As we will see in the following
sections a better understanding of skin motion will lead us to formulate an improved calibration procedure
where marker movements with respect to the parent segments are explicitly accounted for.
Footnote 2: Note that the positional coordinates (local and global) are defined by homogeneous 4D vectors, e.g., $m = [m_x\ m_y\ m_z\ 1]$.
[Figure 3 plots: four 3D scatter plots of marker residuals, axes x, y and z.]
Figure 3: Example of 3D residuals on hand motion capture. The red dots show the difference vector between
predicted marker positions and reconstructed measurements. The standard VICON calibration assumes the
residuals to be Gaussian distributed. The plots show that a Gaussian does not approximate the data
distribution well.
Figure 4: Right hand - labeled marker.
At the Second University of Naples this point was investigated using hand capture data from a Magnetic
Resonance Imaging (MRI) device. The hand was captured with different marker setups. Also, we performed
static and sequential MRI acquisitions on different hand poses that simulate the tasks proposed in the
DEXMART testing scenario [41]. Then we extracted from the MRI data the displacements of markers placed on
the dorsum of the hand. In particular, in this study we analyse the marker movements caused by a set of predefined
flexions of the fingers. The displacement of the marker relative to the underlying bone is observed and
quantified.
First, we used the MRI equipment to capture a static hand in two different poses and we reconstructed the
three-dimensional models of the hand bones. Then, reflective markers were attached to the subject's hand
(see Fig. 4) and a sequential protocol was used to track their position in two different postures. To validate
the static data a dynamic MRI scan of the sensorised hand was also performed. No significant differences
were measured between the static and dynamic displacements. Some authors [42, 43], reporting on
kinematic studies based on MRI acquisition techniques, have stressed the importance of acquiring joint motion actively,
because statistically significant variations exist between active and passive acquisition. Unlike
other articulations like knee and hip [44, 45, 46], this does not apply to soft tissue artefact evaluation of
the back of the hand. In active acquisition, no abnormal tracking patterns due to the influence of hand muscles and
tendons were observed during the flexion and extension of the fingers.
Although MRI data usually shows relevant differences across subjects in soft tissue elasticity, and these
differences are dependent on the subject's weight, height and age, for the purpose of our experiment we
considered the variation of distance between marker and bone reference to be small and therefore subject
independent. Our experiments were executed on a healthy male subject. The hand is approximately
20.5 cm long. The subject consented to the use of his anatomical data for scientific purposes.
MRI acquisition
The MRI scanning was performed at the University of Naples Federico II with a 1.5 T station manufactured by
Philips Medical Systems. We captured the subject in supine position with the right arm on top of
the body. Two high-resolution MRI scans of the right hand containing thin axial slices were obtained. The
two series have a small Field Of View (FOV) as they measure one hand only. Two series of T1-weighted
spin echo images and two series of T1-weighted gradient echo images were acquired with one frame every
11.4 ms, 4.4 ms echo, and 250 mm FOV. The surface markers in the MR images look like small cylinders, as
highlighted by the arrows in Fig. 6, 7, and 8.

Figure 5: MRI processing pipeline.
For each hand pose one hundred images, each representing a slice of the hand, were generated. The spacing
between slices is 1.5 mm. Each image is 256 × 256 pixels in size, 8 bit per pixel, with each pixel
covering a physical rectangular area 0.98 mm wide. In the first posture the slicing plane is parallel to the
longitudinal direction of the fingers, while in the second pose the plane is perpendicular to the longitudinal
direction of the fingers.
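The physical extent of each image stack follows directly from the figures above. The short sketch below is only arithmetic on the quoted values; the function name `extent_mm` is hypothetical.

```python
def extent_mm(count, spacing_mm):
    """Physical extent covered by `count` samples spaced `spacing_mm` apart."""
    return count * spacing_mm


in_plane_mm = extent_mm(256, 0.98)  # side of one image, about 250.9 mm
stack_mm = extent_mm(100, 1.5)      # thickness covered by the 100 slices

print(round(in_plane_mm, 2), stack_mm)
```

The in-plane side is consistent with the 250 mm FOV reported for the acquisition, and the slice stack covers 150 mm.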
MRI processing
In the first stage of the processing a semiautomatic analysis was conducted. We used the software Vitrea
ver. 2.0 by Vital Images Inc. for 2D and 3D visualization and editing of the MR image data. The
software can convert the scans (DICOM format) into many different image formats, segment the region of
interest and generate iso-surfaces. Also, we performed a manual segmentation to distinguish bones and
surface markers from soft tissues. First, pixels were removed by tuning the threshold value to 60.
Then, a contour tracing method was used to identify the object edges. In the second stage of processing,
a more accurate measurement of sliding was carried out for the metacarpal markers without any manual
editing of the recorded images. For this purpose, we used a co-registration method for the MR images of the
two different poses of the hand (pose 1: open hand, pose 2: closed hand). The method was developed
at the Biostructure and Bioimaging Institute (IBB) of the Italian National Research Council (CNR). We used the
SPM software to implement the automatic coregistration of the MRI data; it is a suite of MATLAB
functions and subroutines, typically used for functional PET and MRI brain image analysis, that implements
"statistical parametric mapping". First, the MR image sequences (pose 1 and pose 2) are smoothed
and filtered to eliminate artefacts in the coregistration process due to the fat and the skin visible
in the recorded images. The algorithms work by minimizing the sum of squared differences between
the images which are to be coregistered. The first step of the process is to determine the optimum 12-parameter
affine transformation. Initially, the coregistration is performed by matching the whole of the two
hand poses. Following this, the registration proceeds by matching only the metacarpal bones together, by
appropriate weighting of the voxels. A Bayesian framework is used, such that the registration searches for
the solution that maximizes the a posteriori probability of it being correct, i.e., it maximizes the product of
the likelihood function (derived from the residual squared difference) and the prior function (which is based
on the probability of obtaining a particular set of zooms and shears). The affine registration is followed by
estimating nonlinear deformations, whereby the deformations are defined by a linear combination of
three-dimensional discrete cosine transform (DCT) basis functions. The parameters represent coefficients of the
deformations in three orthogonal directions. The matching involves simultaneously minimizing the bending
energies of the deformation fields and the residual squared difference between the images.

Table 1: Pair-wise marker distances with closed and open hand. Distances are in millimetres.

FIRST MARKER  SECOND MARKER  OPEN  CLOSED  DIFF.
RMM4          RH4            19.1  25.4    -6.3
RMM4          RH6            27.8  31.8    -4.0
RMM2          RH1            34.0  36.2    -2.2
RMM2          RH3            20.5  24.3    -3.8
RMF1          RH3            13.5  16.9    -3.4

Table 2: Distances between a marker and the relative bone head. The radius bone is used as a reference.
Distances are in millimetres.

MARKER ID  DIRECTION  BONE          OPEN  CLOSED  DIFF.
RH6        PROX.      CARPAL        15.3  13.8    +1.5
RH6        DISTAL     CARPAL        18.6  14.2    +4.4
RMM4       PROX.      4th METAC.    14.6  18.6    -4.0
RMM4       DISTAL     4th METAC.    43.5  40.5    -3.0
RMM2       PROX.      3rd METAC.    24.7  30.4    -5.7
RMM2       DISTAL     3rd METAC.    44.6  37.8    +6.8
RMF2       PROX.      2nd MID PHA.  6.3   5.8     +0.5
RMF2       DISTAL     2nd MID PHA.  21.2  23.8    -2.6
RH3        DISTAL     3rd METAC.    20.1  13.8    +6.3
Results
In the first stage of the analysis, after registration of the 3D hands, we performed two different measurements.
First we measured the distance of a marker from a reference bone in both poses (Fig. 6 (c)-(f)). Then
we measured the pair-wise distances between metacarpal markers and their variations due to pose changes
(Fig. 7 (e)-(f)). Tables 1 and 2 summarize the results.
The results in Tab. 1 show that when the hand is flexed the distance between markers increases, due to
skin stretch and muscle deformation. The markers slide over the bones while the hand moves, and this
Figure 6: MRI measurements for the marker RH6 on open (left column) and closed (right column) hand.
(a)-(b): marker spatial position (yellow arrow); (c)-(d): distance from radius proximal head; (e)-(f): distance
from metacarpal proximal head.
Figure 7: MRI measurements for the marker RMM4 on open (left column) and closed (right column) hand.
(a)-(b): marker spatial position (yellow arrow); (c)-(d): distance from metacarpal proximal head; (e)-(f):
distances from markers RH4 and RH6.
Figure 8: MRI measurements for the marker RMF2. The blue lines show the distances between the marker
and the 3rd middle phalanx bone heads.
causes large residual errors during calibration and fitting with the optical system. In absolute value, RMM4
moves by 6.3 mm and 4.0 mm with respect to RH4 and RH6. This indicates that the skin deformation follows
a non-linear law. In fact, the distance change between RH4 and RMM4 is about 33% of the initial distance,
while between RMM4 and RH6 the change is about 14%. These measures give us a good starting point
to correct passive optical data. The results for the RH1, RMM2 and RH3 chain are even clearer. The
slide between RMM2 and RH1 is 6%, while RH3 slides by 18%. We can see that RMF1 slides by 25% with
respect to RH3, and we still have to consider that RH3 is moving too. The results in Tab. 2 show that the RH6
sliding is apparently incongruent. Closing the hand does not stretch the skin, as we might expect, but contracts it.
This happens because the subject not only closes the hand, but also moves it. This causes a larger skin
slide of about 24% toward the metacarpal and 9% toward the radius. As a result, RH6 moves toward the
metacarpal. RMM4 instead shifts toward the radius because of the relative hand-radius movement, while
RMM2 moves toward the middle finger. Vice versa, RMF2 moves toward the metacarpal zone (Fig. 8).
Finally, the skin sliding causes RH3 to move toward the 3rd phalanx.
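The percentages quoted for Tab. 1 can be reproduced from the tabulated distances. The following is a minimal sketch, assuming that the relative slide is the absolute open/closed difference divided by the open-hand distance; the dictionary and function names are illustrative only.

```python
# Open and closed pair-wise marker distances in mm, copied from Table 1.
TABLE_1 = {
    ("RMM4", "RH4"): (19.1, 25.4),
    ("RMM4", "RH6"): (27.8, 31.8),
    ("RMM2", "RH1"): (34.0, 36.2),
    ("RMM2", "RH3"): (20.5, 24.3),
    ("RMF1", "RH3"): (13.5, 16.9),
}


def relative_change(open_mm, closed_mm):
    """Absolute distance change as a percentage of the open-hand distance."""
    return abs(closed_mm - open_mm) / open_mm * 100.0


changes = {pair: relative_change(*dist) for pair, dist in TABLE_1.items()}
for (first, second), pct in changes.items():
    print(f"{first}-{second}: {pct:.1f}%")
```

Under this assumption the five values agree with the percentages in the text to within one percentage point.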
In the second stage of our analysis, the output of the coregistration process provided a more reliable
estimate of the displacements of the metacarpal markers between the two hand poses (Fig. 9). The first step
of the coregistration process required a filtering operation on the two MRI series. The lack of the antenna
during the acquisition phase produces noise, which is reduced by dividing the images by a parabolic fit.
Another filtering process was necessary to reduce anisotropic noise while preserving the parts of the images
with higher gradients (edge preserving). Then, a level setting process was applied to eliminate the
non-relevant parts of the images (Fig. 10). Table 3 summarizes the results.
The results in Tab. 3 show that the metacarpal markers move significantly over the bones while the hand
moves from pose 1 to pose 2. The largest contribution is given by the displacement along
the axial direction of the hand (Y RANGE). The largest displacement happens on the third metacarpal bone
in the vicinity of the proximal phalanx (middle finger). RH3 slips more than 11 mm along the direction of
the 3rd metacarpal bone, while RH2 and RH4 cover a slightly shorter distance. The distances covered
by markers RMM2 and RMM4 are 68% and 45% of the maximum displacement, respectively. The marker
RH5 also slips approximately 68% of the maximum displacement. We can see that the sliding is maximum in
the middle of the back of the hand and decreases towards the wrist and the little finger more than it does
in the thumb direction. The significant variation along the other two directions must also be considered to
reduce the residual errors during subject calibration and fitting with an optical capture system.
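As a quick consistency check on Tab. 3, the sketch below recomputes the overall displacement from the per-axis ranges. It assumes that the "2-DISTANCE" column is the Euclidean norm of the X, Y and Z ranges, which the text does not state explicitly; the variable names are illustrative.

```python
import math

# Marker -> (reported 2-distance, x range, y range, z range), all in mm, from Table 3.
TABLE_3 = {
    "RH2": (10.94, 0.94, 10.85, 1.07),
    "RH3": (11.31, 2.01, 10.91, 2.23),
    "RH4": (10.08, 1.29, 9.15, 4.05),
    "RH5": (7.76, 0.01, 6.07, 4.83),
    "RMM2": (7.70, 0.24, 7.61, 1.13),
    "RMM4": (5.16, 0.54, 4.96, 1.32),
    "RH1": (5.52, 0.55, 5.24, 1.68),
    "RH6": (4.98, 1.01, 3.82, 3.03),
}

# Euclidean norm of the three per-axis ranges (assumed definition of the 2-distance).
norms = {m: math.sqrt(x * x + y * y + z * z) for m, (_, x, y, z) in TABLE_3.items()}
largest = max(norms, key=norms.get)

for marker, norm in norms.items():
    reported = TABLE_3[marker][0]
    share = 100.0 * norm / norms[largest]
    print(f"{marker}: computed {norm:.2f} mm, reported {reported:.2f} mm, "
          f"{share:.0f}% of the largest displacement ({largest})")
```

With this definition the recomputed norms match the reported column to within rounding, and the shares of RMM2, RMM4 and RH5 relative to RH3 agree with the percentages given in the text.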
We have also analysed the MRI scans of a gloved hand in the two hand poses. The considerable noise
due to the presence of the glove made it impossible to obtain good image sequences after the filtering
and segmentation phases of the coregistration process. As can be seen (Fig. 11 (c)-(d)), many markers
Figure 9: Coregistration of MR Images for the 2 hand poses. (a)-(b): the 2 poses labelled markers; (c)-(d):
the MR Images of the 2 poses, with metacarpal labelled markers.
Figure 10: Coregistration of MR Images for the 2 hand poses. (a)-(b): Filtered images (left column) and
MR Images after the coregistration (right column).
Table 3: Distances between metacarpal markers in the 2 hand poses after the coregistration process.
All distances are in millimetres.
MARKER ID   DISTANCES (mm)
            2-DISTANCE   X RANGE   Y RANGE   Z RANGE
RH2         10.94        0.94      10.85     1.07
RH3         11.31        2.01      10.91     2.23
RH4         10.08        1.29       9.15     4.05
RH5          7.76        0.01       6.07     4.83
RMM2         7.70        0.24       7.61     1.13
RMM4         5.16        0.54       4.96     1.32
RH1          5.52        0.55       5.24     1.68
RH6          4.98        1.01       3.82     3.03
are lost after the filtering process and the segmentation of the metacarpal bone images is not correct.
We believe that a different MRI acquisition protocol must be investigated to obtain a reliable coregistration
process for the gloved hand.
Residual correlation analysis
To better understand the soft tissue artefacts we analysed the calibration result from a right hand ROM
trial. A preliminary visual inspection of the results using NEXUS made it apparent that a dependency
between marker motion and joint angles indeed exists. The plots in Fig. 12 (a)-(c) show three typical
marker motion behaviours. Each graph displays one component of the unnormalised marker-to-reconstruction
residual
\[ d_{i,k} = S_i(\theta_k, \phi)^{-1} r_{i,k} - m_i, \quad \text{with } d_{i,k} \in D, \tag{5} \]
plotted against the maximally correlated joint angle. As a first approximation, a good portion of the residuals
present a simple linear relationship with the joint angle (Fig. 12 (a)). The approximation holds when the joint
angle range is small. However, extreme finger flexion may result in a non-linear relationship (Fig. 12 (b)).
Finally, as shown by the residual-parameter correlation matrix in Fig. 12 (d), marker motion can be highly
correlated with more than one joint parameter (i.e., matrix rows with more than one red cell). Consequently,
a model with a single input cannot provide good predictions (Fig. 12 (c)).
4.3 Marker motion model
The residual error analysis showed that it may be possible to improve the predictive power of the kinematic
model by explicitly accounting for marker movements. To this end we propose to model the marker
position as a parametric function $m_i(\theta, w_i)$ of the joint state $\theta$ with parameters $w_i \in W$.
In particular we chose a linear form, $m_i(\theta, w_i) = F(\theta) w_i$, where
\[ F(\theta) = \begin{bmatrix}
\psi_x(\theta)^T & 0 & 0 & 0 \\
0 & \psi_y(\theta)^T & 0 & 0 \\
0 & 0 & \psi_z(\theta)^T & 0 \\
0 & 0 & 0 & 1
\end{bmatrix} \]
Figure 11: Gloved hand for the coregistration process; open (left column) and closed (right column) hand.
(a)-(b): Labelled markers; (c)-(d): Output after the filtering and segmentation process.
Figure 12: Visual analysis of the relationship between joint parameters and unnormalised marker-to-reconstruction
residuals. (a)-(c): sample residual components $d$ (in mm) plotted against the maximally correlated joint
parameter (joint angle $\theta$, in radians). (d): correlation coefficient absolute values.
contains the regressor vectors $\psi(\theta)$ for each of the three positional coordinates. In our implementation
the regressors are simple polynomial components. The modified variable position marker model becomes
\[ r_{i,k} = S_i(\theta_k, \phi) \left[ F(\theta_k) w_i + n_{i,k} \right]. \tag{6} \]
Note that, if the polynomials are zero-order (i.e., $F(\theta) = I$), Eq. (6) reduces to Eq. (2) with $m_i = w_i$.
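To make the linear marker motion model concrete, the following sketch evaluates a marker position from polynomial regressors of the joint angles. It is an illustration under stated assumptions, not the project implementation: the function names, the per-coordinate dictionaries of polynomial orders and the numeric weights are all hypothetical.

```python
import numpy as np


def regressors(theta, orders):
    """Polynomial regressor vector [1, theta_j, theta_j**2, ...] for one coordinate.

    theta  : 1D array of joint angles (radians)
    orders : dict mapping a joint index j to its maximum polynomial order
    """
    terms = [1.0]
    for j in sorted(orders):
        terms.extend(theta[j] ** p for p in range(1, orders[j] + 1))
    return np.array(terms)


def marker_position(theta, orders_xyz, w_xyz):
    """Evaluate the marker position F(theta) w one coordinate at a time."""
    return np.array([regressors(theta, o) @ np.asarray(w, dtype=float)
                     for o, w in zip(orders_xyz, w_xyz)])


theta = np.array([0.5, -0.2])
# Hypothetical marker: x depends quadratically on joint 0, y and z are static.
moving = marker_position(theta, [{0: 2}, {}, {}], [[1.0, 2.0, 4.0], [3.0], [-1.0]])
# Zero-order regressors only: the marker position reduces to the constant weights.
static = marker_position(theta, [{}, {}, {}], [[1.0], [3.0], [-1.0]])
print(moving, static)
```

With zero-order regressors the output equals the weight vector, which mirrors the reduction of the moving marker model to the static one.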
To limit the number of additional parameters it is not desirable to fix the polynomial order for all the
residuals or to have polynomial components for all the joint parameters in $\theta$. As shown in Fig. 12, each
residual is usually correlated with a limited number of joint angles, and in some cases non-linear components
may not be necessary. At the same time, favouring simpler models with a low number of extra parameters
improves the computational efficiency of the calibration step and prevents model overfitting. A model
selection procedure is required to find the correct balance between model complexity (the number of extra
parameters) and the reduction of the residual. As in other model selection problems, our goal is to add only
those parameters which contribute to a significant reduction of the overall residual.
Given a measure of model quality, the optimal model selection scheme would require optimising the
parameters for all the combinations of regressors. Unfortunately, as the parameter optimisation is an
expensive procedure on its own, a suboptimal, although faster, methodology is necessary. To this end we
propose a procedure that requires the optimisation of two models only.
The outline of the procedure is in Algorithm 1. First we calibrate the standard model defined in Eq. (2).
Then we perform an analysis of the residual error to select the additional polynomial parameters. Finally,
we calibrate the new model where the markers move according to the polynomial functions.
To perform model selection we treat the residual components independently. Therefore, it is convenient
to define $d_k = [d_{1,k} \ldots d_{n_m,k}]$ as the concatenation of the unnormalised residuals at frame $k$ and an
index $u = 1, \ldots, U$ for the single residual components $d_k^{(u)}$. Also, we define the 1D polynomial function
$g^{(u)}(\theta, w^{(u)}) = \psi^{(u)}(\theta)^T w^{(u)}$ modelling the marker motion for the component $u$. For each
residual the model selection procedure is composed of two steps: (i) first we analyse the correlation between
$\theta_k$ and $d_k^{(u)}$; those parameters with correlation larger than a threshold $T$ are selected as active inputs
for the skin model function $g^{(u)}$ (Algorithm 2); (ii) then we initialise the set $\psi^{(u)} = \{1\}$ with the zero
order regressor only and for each input we add higher order regressors (i.e., $\theta$, $\theta^2$, $\theta^3$, etc.) in a
greedy fashion (Algorithm 3); the greedy procedure comes to a halt when none of the more complex models
under test improves the performance with respect to the current best model.
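The pre-selection step (i) can be sketched as follows. This is a minimal illustration, assuming that the Pearson correlation is used and that its absolute value is compared with the threshold, as suggested by the absolute values displayed in Fig. 12 (d); the function name and the synthetic data are hypothetical.

```python
import numpy as np


def select_active_inputs(theta, d_u, threshold):
    """Return the joint indices whose |correlation| with the residual exceeds threshold.

    theta : (n_frames, n_joints) array of joint angles
    d_u   : (n_frames,) array holding one residual component
    """
    theta_c = theta - theta.mean(axis=0)
    d_c = d_u - d_u.mean()
    denom = np.linalg.norm(theta_c, axis=0) * np.linalg.norm(d_c)
    # A joint that never moves has zero variance: treat its correlation as zero.
    corr = np.divide(theta_c.T @ d_c, denom, out=np.zeros_like(denom), where=denom > 0)
    active = [j for j, c in enumerate(corr) if abs(c) > threshold]
    return active, corr


# Synthetic example: the residual follows joint 1 only; joint 2 never moves.
t = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
theta = np.column_stack([np.sin(t), np.cos(t), np.zeros_like(t)])
d_u = 3.0 * np.cos(t) + 1.0
active, corr = select_active_inputs(theta, d_u, threshold=0.5)
print(active, np.round(corr, 3))
```

Only the selected indices would then be passed to the greedy polynomial selection of step (ii).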
The performance of two models $g_\alpha$ and $g_\beta$ on the data $d$ is compared by computing the Bayes factor
\[ \frac{p(d|g_\alpha)}{p(d|g_\beta)} =
   \frac{\int p(d|w_\alpha, g_\alpha)\, p(w_\alpha|g_\alpha)\, dw_\alpha}
        {\int p(d|w_\beta, g_\beta)\, p(w_\beta|g_\beta)\, dw_\beta}. \tag{7} \]
As in Eq. (6) we use a Gaussian noise model with independent marker residuals. Thus we can write the
likelihood as
\[ p(d|w, g) = \prod_k N(d_k \,|\, g(\theta_k, w), \sigma^2), \]
where $N$ is a Gaussian with mean $g(\theta_k, w)$ and variance $\sigma^2$ evaluated in $d_k$. The definition of the
prior $p(w|g)$ requires some preliminary considerations. The model selection step does not recalibrate the subject
for each marker motion model, but compares the models on a fixed residual obtained assuming static
markers. Therefore we can expect a bias between this approximated residual and the actual one. The bias
magnitude is unknown a priori and should not affect the model selection result. Consequently we use a
uniform prior for the zero order parameter $w_0 \in w$, while for all the other parameters we use a standard
Gaussian regulariser, that is
\[ p(w|g) \propto N(\bar{w} \,|\, 0, \Sigma_w), \tag{8} \]
Algorithm 1 Skin model calibration
1: INPUT: $\{r_{i,k}, M_{std}\}$
2: OUTPUT: $\{M_{skin}, \phi, W\}$
3: Calibrate the standard model $M_{std}$ (Eq. (2)) on the frame subset $K$.
4: Fix $\phi$ and $M$ (the bone and target parameters)
5: for all frames in the ROM do
6:     Compute the joint angles $\theta_k$ and the residuals $d_k$
7: end for
8: {Approximated model selection}
9: $M_{skin} = M_{std}$
10: for all the components $d_k^{(u)} \in D$ do
11:     Select the active input indexes $A_u$ via correlation analysis (Algorithm 2)
12:     Perform greedy model selection (Algorithm 3)
13:     Add polynomial regressors $\psi^{(u)}$ to $M_{skin}$
14: end for
15: Calibrate the new model $M_{skin}$ via Eq. (6).
Algorithm 2 Active inputs pre-selection
1: INPUT: $\{\theta, d^{(u)}\}$
2: OUTPUT: set of active inputs $A_u$
3: Compute the correlation $c_j^{(u)}$ between $d_k^{(u)}$ and $\theta_{j,k}$ (a row in Fig. 12 (d)).
4: for all $j$ do
5:     if $c_j^{(u)} > T$ then
6:         Add $j$ to the set of active inputs $A_u$.
7:     end if
8: end for
where $\bar{w}$ is the vector containing all the parameters but $w_0$ (i.e., $w = [w_0\ \bar{w}]$), and $\Sigma_w$ is a
diagonal prior covariance. Also, note that the regression steps 4 and 9 in Algorithm 3 use the same regulariser.
Given this choice of likelihood and prior, an algebraic solution of the integrals in Eq. (7) exists. The
derivation is presented in Appendix A.
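For illustration only, the sketch below computes a log Bayes factor between two polynomial models with the standard closed-form evidence of a linear model with Gaussian noise and a Gaussian prior. It is not the derivation of Appendix A: the uniform prior on the zero order parameter is replaced here by a very broad Gaussian (the hypothetical `w0_var`), and all names and numeric values are assumptions.

```python
import numpy as np


def log_evidence(Phi, d, sigma, prior_var, w0_var=1e6):
    """Log marginal likelihood of d = Phi w + noise, with w ~ N(0, diag(prior)).

    Column 0 of Phi is the zero-order regressor. Its uniform prior is
    approximated by a broad Gaussian with variance w0_var (an assumption).
    """
    n, m = Phi.shape
    prior = np.full(m, float(prior_var))
    prior[0] = w0_var
    # Integrating out w gives d ~ N(0, sigma^2 I + Phi diag(prior) Phi^T).
    C = sigma ** 2 * np.eye(n) + (Phi * prior) @ Phi.T
    _, logdet = np.linalg.slogdet(C)
    quad = d @ np.linalg.solve(C, d)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)


def log_bayes_factor(Phi_a, Phi_b, d, sigma, prior_var):
    """Positive values favour model a over model b."""
    return log_evidence(Phi_a, d, sigma, prior_var) - log_evidence(Phi_b, d, sigma, prior_var)


theta = np.linspace(-1.0, 1.0, 50)
constant = np.ones((theta.size, 1))             # zero-order model
linear = np.column_stack([np.ones_like(theta), theta])  # adds a first-order regressor

sloped = 0.5 + 2.0 * theta      # residual that really depends on the joint angle
flat = np.full(theta.size, 0.5)  # residual with a constant bias only

print(log_bayes_factor(linear, constant, sloped, sigma=0.1, prior_var=1.0))
print(log_bayes_factor(linear, constant, flat, sigma=0.1, prior_var=1.0))
```

The first factor is strongly positive, while the second is negative: when the extra regressor explains nothing, the evidence penalises the additional parameter, which is the behaviour the greedy selection relies on.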
To conclude this section we comment on the residual components independence assumption. In general
the marker noise components are not independent; for example the sensor noise depends on the camera
position and on the pose of the subject in the capture volume. However, the dominant component of the
residual may still be due to unmodelled soft tissue artefacts like those caused by abrupt limb accelerations.
Although the formulation in Appendix A could be easily extended to full covariance matrices, in practice,
before the subject calibration step, the user is often unable to provide a good estimate of the residual error
covariance, and isotropic models are usually the default.
5 Methods for experimental validation
Ideally, an objective comparison between different kinematic models would require the actual joint angle
values measured with a procedure that guarantees higher accuracy than motion capture measurements.
Unfortunately, acquiring accurate ground truth data almost always involves invasive procedures. Many have
used intra-cortical (bone) pin mounted markers or capitalised on external bone fixation already in place.
For example, Fuller et al. [47] use bone pins to assess the effect of skin movements on the joint estimation
Algorithm 3 Greedy selection of the polynomial components.
1: INPUT: $\{\theta, d^{(u)}, A_u\}$
2: OUTPUT: set of regressors $\psi^{(u)}$
3: Initialize zero order model: $\psi^{(u)} = \{1\}$
4: Regress model $g^{(u)}(\theta, w^{(u)})$ to the data $d^{(u)}$.
5: repeat
6:     $\psi^{(best)}(\theta) = \psi^{(u)}(\theta)$
7:     for all active inputs $v \in A_u$ do
8:         Add a component: $\psi^{(test)} = \psi^{(u)} \cup \theta_v^{o_v+1}$ with order $o_v + 1$
9:         Regress model $g^{(test)}(\theta, w^{(test)})$ to the data $d^{(u)}$.
10:        if $p(d^{(u)}|g^{(test)}) / p(d^{(u)}|g^{(best)}) > 1$ then
11:            $\psi^{(best)}(\theta) = \psi^{(test)}(\theta)$
12:        end if
13:    end for
14:    if $p(g^{(best)}|d^{(u)}) / p(g^{(u)}|d^{(u)}) > 1$ then
15:        $\psi^{(u)}(\theta) = \psi^{(best)}(\theta)$
16:        stop = false
17:    else
18:        stop = true
19:    end if
20: until stop
procedure. Percutaneous fixation of markers has also been used, but this is only marginally less invasive and
still requires ethical approval [48]. Less invasive X-ray studies have been performed, but these invariably
require the attachment of radiolucent markers to the bone [49]. Despite this less invasive approach, ionising
radiation still carries with it the need for ethical approval. Instead of comparing the joint angles, Veber and
Bajd [50] compare the subject calibration results of the phalanx segments with data from statistically-based
anthropometry (i.e., hand lengths and palm widths). However, in our opinion the accuracy of the
bone lengths may not be sufficient to differentiate similar models.
As direct measurements are impractical, researchers have evaluated other desirable model qualities [51, 10]
or have used synthetic or semi-synthetic data [28]. A well established procedure is to measure the repeatability
of the calibration results [51, 10]. In these tests the researchers capture 20 to 30 trials of the same subject
performing the same movement. Then a calibration step is run on each trial and the results in terms of
bone lengths and joint angles are analysed. Good models and good calibration procedures should produce
consistent results with small cross-trial variance (see ANOVA [52]). Cereatti et al. [28] generate 3D data
with a synthetic model of the knee. The authors animate the model with real gait data and then add real
and synthetic soft tissue artefacts to the predicted marker positions. Finally they compare the joint angles
and the calibration parameters with those of the synthetic subject.
Although repeatability analysis and ANOVA are well established techniques for model validation, they do
not tell us how well the model explains the data. For example, an over-simple model that cannot explain
some movements could still be highly consistent on a particular movement. The problem is similar to the skin
parameter selection of Sec. 4.3, as the goal is again to find a compromise between complexity, descriptiveness
and what we can effectively measure. Broadly speaking, complex models (with many segments and more
DoF per joint) have the potential to better explain the data; however they are less stable and slower to
optimise than simpler models. Also, given the data noise level, model selection should tell us if the
model is overfitting the data. In this regard, in the next section we report on our study of statistical model
selection methods to evaluate hand kinematic models.
5.1 Statistical validation
Given the reconstructed data $R$ from a trial, the selection of a kinematic model $M$ is a similar problem to
the marker model selection step described in Sec. 4.3. While the marker procedure optimises the model for
a single marker, we are now required to evaluate complete hand models. The parametrisation $\Theta$ of a hand
model is the union of all time-constant and time-variable parameters (i.e., $\Theta = [\theta\ \phi\ W]$). Unfortunately
the formulation of the marginal $p(R|M)$ is not as straightforward as in Eq. (7). Thus we have to resort to
approximate methods.
Popular approaches to model selection use a model quality score composed of two terms: one term
favours models with a good fit to the training data; the second term assigns a larger penalty to more
complex parametrisations. The Akaike information criterion (AIC) [53], which is based on the
Kullback-Leibler information number, uses as a distance score (i.e., the lower the better) the unbiased
estimator of the expected log-likelihood, that is
\[ \mathrm{AIC} = -2 \log p(R|\Theta, M) + 2 n_\Theta, \tag{9} \]
where $n_\Theta$ is the number of parameters in $\Theta$. A variant of AIC is the Consistent AIC (CAIC), which accounts
for the number of samples $n_R$. The CAIC formulation is
\[ \mathrm{CAIC} = -2 \log p(R|\Theta, M) + 2 n_\Theta (\log n_R + 1). \tag{10} \]
The Bayesian Information Criterion (BIC) [54] instead assigns a score to each model according to an
approximation of the marginal $p(R|M)$ under the assumption that the data distribution is in the exponential
family. This results in
\[ p(R|M) \approx p(R|\Theta, M)\, n_R^{-n_\Theta/2}. \tag{11} \]
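The three scores are cheap to evaluate once the log-likelihood of a calibrated model is available. The sketch below transcribes Eqs. (9)-(11), with Eq. (11) taken in log form; the function names and the two candidate models with their log-likelihoods and parameter counts are hypothetical values chosen for illustration.

```python
import math


def aic(log_lik, n_params):
    """Eq. (9): lower is better."""
    return -2.0 * log_lik + 2.0 * n_params


def caic(log_lik, n_params, n_samples):
    """Eq. (10), with the penalty written as in this document: lower is better."""
    return -2.0 * log_lik + 2.0 * n_params * (math.log(n_samples) + 1.0)


def log_bic(log_lik, n_params, n_samples):
    """Logarithm of Eq. (11): higher is better."""
    return log_lik - 0.5 * n_params * math.log(n_samples)


# Hypothetical candidates: (log-likelihood, number of parameters).
candidates = {"simple": (-1000.0, 20), "complex": (-995.0, 40)}
n_samples = 2000
for name, (log_lik, n_params) in candidates.items():
    print(name,
          round(aic(log_lik, n_params), 1),
          round(caic(log_lik, n_params, n_samples), 1),
          round(log_bic(log_lik, n_params, n_samples), 1))
```

In this made-up example the small likelihood gain of the complex model does not compensate for its 20 extra parameters under any of the three scores.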
The BIC approximation is quite crude, especially for the parameter prior $p(\Theta|M)$. A more elegant
solution is to bootstrap the data and estimate, under a Gaussian assumption, the prior covariance $V$ as
well [55]. Algorithm 4 outlines the procedure used to compute an approximate $p(R|M)$. First we calibrate
the model as explained in Sec. 4.1. Then we bootstrap the unnormalised residuals and create a set of
semi-synthetic trials. Each trial is again calibrated and the parameter covariance $V$ is computed. Finally we
use the covariance estimate to compute the approximate marginal (Algorithm 4, step 13).
Although the estimates produced by the bootstrap procedure are potentially more accurate than the BIC
ones, its application is limited to small subjects. In fact, bootstrapping typically requires at least 1000
samples, which in our case correspond to 1000 computationally expensive subject calibrations.
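The structure of the bootstrap approximation can be illustrated on a toy problem. In the sketch below the expensive subject calibration is replaced by a least-squares line fit, which is purely an assumption made to keep the example small; the function names, the noise level and the synthetic data are hypothetical.

```python
import numpy as np


def fit_line(x, y):
    """Least-squares 'calibration' of a toy model y = a + b * x."""
    A = np.column_stack([np.ones_like(x), x])
    params, *_ = np.linalg.lstsq(A, y, rcond=None)
    return params, A @ params


def bootstrap_parameters(x, y, n_samples, rng):
    """Residual bootstrap: resample residuals, rebuild the data, refit."""
    params_ml, prediction = fit_line(x, y)
    residuals = y - prediction
    samples = np.empty((n_samples, params_ml.size))
    for s in range(n_samples):
        drawn = rng.choice(residuals, size=residuals.size, replace=True)
        samples[s] = fit_line(x, prediction + drawn)[0]
    return params_ml, residuals, samples


def log_marginal(log_lik_ml, samples):
    """Log of (2*pi)^(n/2) * p(R | Theta_ML, M) * sqrt(|V|)."""
    n_params = samples.shape[1]
    V = np.atleast_2d(np.cov(samples, rowvar=False))
    _, logdet = np.linalg.slogdet(V)
    return 0.5 * n_params * np.log(2.0 * np.pi) + log_lik_ml + 0.5 * logdet


rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, size=x.size)
params_ml, residuals, samples = bootstrap_parameters(x, y, 1000, rng)
sigma = 0.1  # assumed known noise level for the toy Gaussian likelihood
log_lik = (-0.5 * x.size * np.log(2.0 * np.pi * sigma ** 2)
           - 0.5 * np.sum(residuals ** 2) / sigma ** 2)
print(log_marginal(log_lik, samples))
```

Each bootstrap sample costs one refit, which is why the same scheme becomes expensive when the refit is a full subject calibration.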
6 Results
6.1 Marker motion model
We evaluate the kinematic model with moving markers described in Sec. 4.3 on capture data acquired with
a rig of nine 4-megapixel Vicon MX cameras. As a proof of concept we limited the capture to two fingers:
the right thumb and index of one healthy subject. 31 markers with 3 mm diameter were glued to the latex
glove worn by the subject, as shown in Fig. 13. Also, to ensure the accuracy of the global position we glued
one larger (7 mm) marker over the wrist and limited the capture volume to about 1m. Finally, to reduce
the occurrence of marker occlusions under wrist rotations we pointed two of the nine cameras upwards. As
in the Santos model [11] we defined the index and thumb articulations with two DoF for the CMC and TMC
joints and one DoF for the thumb IP, PIP and DIP joints.
Algorithm 4 Bootstrap marginal approximation.
1: INPUT: $\{R, M\}$
2: OUTPUT: $p(R|M)$
3: Calibrate (Maximum Likelihood solution): $\Theta^{(ML)} = \arg\max_\Theta p(R|\Theta, M)$
4: Compute residuals $D^{(ML)}$ {Eq. (5)}
5: for $s = 1, \ldots, n_s$ {$n_s$ is the number of bootstrap samples} do
6:     Draw with replacement $D^{(s)}$ from $D^{(ML)}$
7:     for all $k \in K$ and $i = 1, \ldots, n_m$ do
8:         Create synthetic reconstructions: $r_{i,k}^{(s)} = S_i(\theta_k^{(ML)}, \phi^{(ML)})\, m_i^{(ML)} + d_{i,k}^{(s)}$
9:     end for
10:    Calibrate: $\Theta^{(s)} = \arg\max_\Theta p(R^{(s)}|\Theta, M)$
11: end for
12: Compute the covariance $V$ of $\Theta$ over the $n_s$ samples
13: Approximate marginal as: $p(R|M) \approx (2\pi)^{n_\Theta/2}\, p(R|\Theta^{(ML)}, M)\, \sqrt{|V|}$
Figure 13: High density markerset. The thumb and the index are sensorised with 32 markers. The markers
are glued to a latex glove.
We compared the results of the standard Static Marker (SM) model (Eq. (2)) and the enhanced model
with Moving Markers (MM) (Eq. (6)) on four capture trials. Trials 1, 2 and 3 are ROM trials. In Trial
4 the subject repeatedly picks a piece of plastic cutlery (a knife) from a small container that he holds with
the other hand. For our experiments we used 100 frames from Trial 1 to calibrate the two subjects (one per
model). Then, for the remaining frames in Trial 1 and for the other three trials, we computed the joint angles
and the Root Mean Square Error (RMSE) of the unnormalised marker residuals. Also, we set the maximal
degree of the polynomials in the marker motion model to three. Fig. 14 and Table 4 summarise the results.
For all four trials MM has a significantly lower RMSE than SM. The reduction was expected on Trial 1, as this
is the trial used to calibrate the subject and the proposed model has a larger number of free parameters than
the standard one. However, the improvement on the other three trials shows that MM generalises well to unseen
data. This result is consistent for motions similar to the training ones (i.e., Trial 2 and Trial 3) as well
as for fairly dissimilar movements as in Trial 4. Also, Fig. 14 shows that the performance improvement is
consistent over time. The marker motion model outperforms the standard model both on extreme poses,
when the fingers are fully flexed (see RMSE peaks in Fig. 14), and near the mean pose. Also, during the
experiments we noted that sometimes the marker motion shows a hysteretic behaviour that is caused by
the glove sliding over the skin and not returning to the original position. The problem affects both models,
Table 4: Performance comparison between the standard model with static markers and the proposed moving
marker model. The RMSE values are in millimetres.
           RMSE (mm)
           Static Markers   Moving Markers   Perc. difference
Trial 1    0.91             0.66             27.8%
Trial 2    1.02             0.79             23.0%
Trial 3    0.97             0.80             18.2%
Trial 4    0.92             0.74             19.7%
but we can speculate that a better motion prediction can be achieved with skin-attached markers.
To evaluate the repeatability and stability of the calibration procedure we produced a dataset of 200
semi-synthetic subjects. Each subject is a randomly rescaled version of the calibrated subject obtained with the
static marker model. First, we multiply all the bone lengths by a global rescaling factor $G$, where $G$ is
a Gaussian random variable with mean 1 and standard deviation 0.1; then we independently multiply each
segment $k$ by a local rescaling factor $L_k$, again Gaussian distributed with mean 1 and standard deviation
0.5. Fig. 15 shows the bone-length statistics (mean and standard deviation) for the two models. Although
both models predict similar bone lengths, the standard model is slightly more stable than the proposed
approach. This result is not unexpected: more complex models are usually more difficult to optimise
than simpler ones. However, Fig. 15, right, shows that the worst-case scenario for MM happens on the thumb
CMC joint, where the standard deviation is only 0.07 millimetres. That value is about 0.2% of the CMC
segment length.
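The generation of the semi-synthetic subjects can be sketched as follows. The standard deviations are the ones quoted in the text, while the function name, the seed and the six baseline bone lengths are hypothetical placeholders, not measured values.

```python
import numpy as np


def rescale_subjects(bone_lengths, n_subjects, global_std=0.1, local_std=0.5, seed=0):
    """Return an (n_subjects, n_bones) array of randomly rescaled bone lengths.

    Each subject shares one global factor G ~ N(1, global_std) across all bones
    and has an independent local factor L_k ~ N(1, local_std) per bone. With a
    large local_std a drawn factor can be negative; such draws are not filtered.
    """
    rng = np.random.default_rng(seed)
    base = np.asarray(bone_lengths, dtype=float)
    G = rng.normal(1.0, global_std, size=(n_subjects, 1))
    L = rng.normal(1.0, local_std, size=(n_subjects, base.size))
    return base * G * L


# Hypothetical baseline lengths in mm for six segments of the index and thumb.
baseline = [45.0, 25.0, 18.0, 35.0, 30.0, 22.0]
subjects = rescale_subjects(baseline, n_subjects=200)
print(subjects.shape, np.round(subjects.mean(axis=0), 1))
```

Each row would then be used as the ground truth of one semi-synthetic calibration run.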
6.2 Kinematic model selection
This section reports on experiments aimed at selecting the best kinematic model for the DEXMART project.
In these experiments we applied the statistical model selection techniques described in Sec. 5.1 and the
standard calibration technique with static markers. The AIC, CAIC and BIC model selection methods
require fixing the standard deviation $\sigma$ of the Gaussian likelihood term. The value of $\sigma$ should depend on
the level of marker motion we expect and on the desired accuracy of the model predictions. The results in
Tab. 2 show that the maximal marker movement with respect to the bone head is about 7 mm. Also, we
expect the latex glove to slightly reduce these movements. Therefore a reasonable choice is $\sigma = 1$ mm, so
that $6\sigma$ returns a value similar to the expected maximal movement. For the bootstrap method, instead, the
standard deviation is also estimated from the samples. The experiments are divided into two parts: first we
evaluate possible candidate models of the thumb and finger, then we extend the analysis to complete hand
data.
To evaluate different thumb and finger models we used the sensorised subject depicted in Fig. 13 (see
Sec. 6.1 for a detailed description of the camera and subject set-up). The choice of a two finger set-up,
instead of two separate single fingers, is motivated by the necessity of imposing two non-collinear
constraints on the wrist position. Given the seven segment kinematic chain of Fig. 13 (see footnote 3), we made the
simplifying assumption of independence between thumb and index movements. Then we tested all the
dependent combinations of four joint types using the scores defined by the statistical model selection
methods described in Sec. 5.1. The four joint types are: hinge joint (one DoF), Hardy-Spicer joint (two
DoF), ball joint (three DoF) and rigid/dummy joint (zero DoF). Note that although we did not test different
kinematic hierarchies, the joint results should give some hint on the quality of the selected seven segment
Footnote 3: Note that the two segments of the finger tips do not influence the calibration and serve for visualisation only.
Figure 14: RMSE comparison between the standard calibration model with static markers (SM) and the
proposed model with polynomial moving markers (MM). Panels: Trial 1 (ROM), Trial 2, Trial 3 and Trial 4;
each panel plots the RMSE of SM and MM against the frame number. For all test trials MM better predicts
the marker positions.
Figure 15: Reproducibility comparison between the standard calibration model with static markers (SM) and
the proposed model with polynomial moving markers (MM). Bone length statistics in mm for the MCP I, PIP I,
DIP I, CMC T, MCP T and IP T segments (left: mean $\mu$; right: standard deviation $\sigma$) over 100 synthetically
rescaled subjects. The lower the standard deviation the better.
Table 5: Best kinematic models of the thumb and index according to the four model selection strategies
(HS = Hardy-Spicer).
             Joint types
             MCP I   PIP I   DIP I   CMC T   MCP T   IP T
AIC          Ball    Ball    Hinge   Ball    Ball    Hinge
CAIC         HS      Hinge   Hinge   Ball    HS      Hinge
BIC          HS      Hinge   Hinge   Ball    HS      Hinge
Bootstrap    Ball    Hinge   Hinge   HS      Ball    Hinge
chain. In fact, high scores on models with rigid joints would indicate that a model with a lower number of
segments is more appropriate.
We calibrated each model using 100 frames from Trial 1 and then, to evaluate the scores, we computed
the joint angles and the residuals on the remaining frames. Table 5 shows the joint types for the best
models according to each of the four model selection methods. First we note that AIC seems to grossly
overestimate the model complexity. AIC assigns three DoF to the index PIP joint, which undoubtedly should
have a single free parameter. This result led us to exclude AIC from our pool of model selection methods. CAIC
and BIC instead agree on the best model. It is interesting to note that both methods select a three DoF
model for the CMC joint of the thumb. This indicates that a standard two DoF joint may not be sufficient to
predict complex thumb actions. Finally, the bootstrap method assigns three DoF to the MCP joint of the index
and, with respect to CAIC and BIC, inverts the DoF assignment of the thumb. As the number of models
is exponential in the number of segments, a complete test for the full hand is not feasible. Therefore,
we combined the results on thumb and index with the state-of-the-art research and implemented a set
of 24 plausible models. The models were generated by combining four thumb models, two finger models
and three palm arching models. The thumb models (TM) are: TM5, the Santos model [11] with five DoF
(Hardy-Spicer joints for the CMC and MCP joints and a hinge for the IP joint); TM6a, the thumb model selected
by CAIC and BIC with six DoF (ball CMC joint); TM6b, the model with six DoF selected via bootstrapping
using a ball TMC joint; and TM7, a model with ball CMC and TMC joints. From the literature only two
Table 6: Five best kinematic models of the hand according to the CAIC and BIC scores when training on a
ROM trial (CAIC score: the lower the better; BIC score: the higher the better).
Model type                CAIC score ($\times 10^3$)
Thumb   Index   Palm
TM5     FM2     PM4       29.6
TM6a    FM2     PM4       30.6
TM6b    FM2     PM4       30.7
TM5     FM2     PM6       31.2
TM7     FM2     PM4       31.7
Model type                BIC score ($\times 10^3$)
Thumb   Index   Palm
TM5     FM2     PM4       -15.0
TM6a    FM2     PM4       -15.5
TM6b    FM2     PM4       -15.5
TM5     FM2     PM6       -15.7
TM7     FM2     PM4       -16.0
plausible Finger Models (FM) exist. The most common has two DoF on the MCP joints; the other has three
DoF. We name these two models FM2 and FM3. Finally, the Palm Model (PM) can be rigid (PMR),
with two CMC Hardy-Spicer joints for ring and pinky (PM4), or with a third CMC Hardy-Spicer joint for
the index (PM6). To capture the motion we sensorised the hand with 22 markers positioned as shown in
Fig. 2 (c),(e): one marker per phalanx and the rest on the dorsum of the hand. For the evaluation we ran the model
selection methods on three capture trials. While the first trial is a classic ROM, in the other two trials the
subject was asked to perform two actions that are particularly relevant to the benchmarking scenario of the
DEXMART project [41]. In one trial the subject unscrews a jar lid; in the other the subject repeatedly
picks a piece of cutlery from a small box. Finally, because the bootstrap procedure is too computationally
demanding on full-hand subjects, we present results for the CAIC and BIC methods only.
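The size of the model set follows directly from the combination of sub-models. The Python sketch below is an illustration only: it enumerates the combinations and adds a simple DoF count. The per-finger counts are an assumption (the MCP DoF plus one hinge DoF each for the PIP and DIP joints, applied to four fingers); under that assumption the Santos combination TM5-FM2-PM4 comes out at 25 DoF. The function and variable names are hypothetical.

```python
from itertools import product

# DoF per sub-model. Thumb and palm counts follow the model names in the
# text; the per-finger counts are an assumption (MCP DoF plus one hinge DoF
# each for the PIP and DIP joints).
THUMB_DOF = {"TM5": 5, "TM6a": 6, "TM6b": 6, "TM7": 7}
FINGER_DOF = {"FM2": 4, "FM3": 5}
PALM_DOF = {"PMR": 0, "PM4": 4, "PM6": 6}
N_FINGERS = 4  # index, middle, ring, little


def enumerate_hand_models():
    """Return (name, total DoF) for every thumb/finger/palm combination."""
    models = []
    for tm, fm, pm in product(THUMB_DOF, FINGER_DOF, PALM_DOF):
        total = THUMB_DOF[tm] + N_FINGERS * FINGER_DOF[fm] + PALM_DOF[pm]
        models.append((f"{tm}-{fm}-{pm}", total))
    return models


models = enumerate_hand_models()
print(len(models))
```

Four thumb models, two finger models and three palm models give the 24 candidates mentioned above, which is small enough to score exhaustively.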
Tables 6, 7 and 8 show the five best models according to BIC and CAIC for the three trials respectively. The
first observation is that the BIC and CAIC outputs are fully coherent. Therefore, without loss of generality, we
can comment on the results of either one. The ROM trial results (Table 6) show which models produce a
good fit to generic hand movements. In this case the Santos model (TM5-FM2-PM4) achieves the best
score. The other highly ranked models are more complex than Santos and have extra DoF on the thumb
(TM6a, TM6b and TM7) or on the palm (PM6). None of the models uses three DoF for the finger MCP
joints. The Santos model is also the best scorer on the jar lid unscrewing movement (Tab. 7). However,
the other highly ranked models present a larger number of DoF than in the ROM trial case. This indicates
that, to achieve high accuracy on this specific movement, a more complex palm model like PM6 can be a
viable option. Finally, the results in Tab. 8 show that the cutlery picking task also triggers extra DoF on
the palm as well as on the thumb. In this case the best score is produced by the kinematics using the
most complex palm model.
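To make the ranking step concrete, the following Python sketch scores a few candidate models. It assumes the standard textbook forms of the two criteria (BIC as the log-likelihood minus half the parameter count times log n, CAIC as minus twice the log-likelihood plus the parameter count times log n plus one), which may differ in detail from the definitions used in this report. The log-likelihood values, parameter counts and function names are hypothetical and are not taken from the experiments.

```python
import math


def bic_score(log_likelihood, n_params, n_samples):
    """Schwarz criterion in its 'higher is better' form (assumed form)."""
    return log_likelihood - 0.5 * n_params * math.log(n_samples)


def caic_score(log_likelihood, n_params, n_samples):
    """Consistent AIC in its 'lower is better' form (assumed form)."""
    return -2.0 * log_likelihood + n_params * (math.log(n_samples) + 1.0)


def rank_models(candidates, n_samples):
    """Sort (name, log_likelihood, n_params) tuples from best to worst CAIC."""
    return sorted(candidates, key=lambda c: caic_score(c[1], c[2], n_samples))


# Hypothetical log-likelihoods, for illustration only (not values from the report).
candidates = [
    ("TM7-FM2-PM4", -14998.0, 27),
    ("TM5-FM2-PM4", -15000.0, 25),
    ("TM5-FM2-PM6", -14999.0, 27),
]
ranking = rank_models(candidates, n_samples=5000)
print([name for name, _, _ in ranking])
```

Under these assumed forms, CAIC equals minus twice the BIC plus the number of parameters, so the two criteria would produce nearly identical rankings.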
6.3 Evaluation of finger joint interdependencies
The experimental study in this section reports a quantitative analysis of motion coordination from thumb
to little finger, and examines the kinematic synergies during reaching and hand grasping activity. We
asked four male subjects, with height-to-weight ratio from the 40th to the 90th percentile, to perform two types of
cylinder grasping with their right hand that involved concurrent voluntary flexion of the fingers. Two other
task types, voluntary flexion and extension of each individual finger, the first with a support for the palm and
the second without, were recorded for each volunteer. The acquisition was performed with a Vicon motion
capture system (Oxford Metrics Ltd., UK) with five cameras. We measured the trajectories of 23 reflective
markers (3.0 mm) on the back of the right hand (Fig. 17) at a sampling frequency of 60 Hz, and then
output the time-varying marker coordinates in a three-dimensional laboratory coordinate system (xyz)
established through prior calibration.
Figure 16: Hand joint names.
Figure 17: Marker set.
Table 7: Five best kinematic models of the hand according to the CAIC and BIC scores. The training is done
on motion capture data of a hand opening a jar lid (CAIC score: the lower the better. BIC score: the higher
the better).
Thumb model   Index model   Palm model   CAIC score (×10^3)
TM5           FM2           PM4          30.3
TM5           FM2           PM6          30.7
TM6b          FM2           PM4          31.2
TM6b          FM2           PM6          31.3
TM7           FM2           PM6          32.4
Thumb model   Index model   Palm model   BIC score (×10^3)
TM5           FM2           PM4          -15.3
TM5           FM2           PM6          -15.5
TM6b          FM2           PM4          -15.8
TM6b          FM2           PM6          -15.8
TM7           FM2           PM6          -16.4
Table 8: Five best kinematic models of the hand according to the CAIC and BIC scores. The training is done
on motion capture data of a hand picking up pieces of cutlery from a small box (CAIC score: the lower
the better. BIC score: the higher the better).
Thumb model   Index model   Palm model   CAIC score (×10^3)
TM5           FM2           PM6          35.6
TM6a          FM2           PM6          36.1
TM6b          FM2           PM4          36.7
TM6a          FM2           PM4          37.2
TM6b          FM2           PM6          37.3
Thumb model   Index model   Palm model   BIC score (×10^3)
TM5           FM2           PM6          -17.9
TM6a          FM2           PM6          -18.2
TM6b          FM2           PM4          -18.5
TM7           FM2           PM4          -18.7
TM6b          FM2           PM6          -18.8
After labelling the markers and tracking them, we performed a dynamic subject
calibration and a fitting of the subject motion. The joint angle values calculated during the trials were
exported to Matlab in CSV format. The movements were acquired with the performers seated in an initial
pose with the torso approximately upright, the right upper arm vertical and the forearm horizontal. The fingers
are in natural full extension and the palm is supported by a desk. In the execution of the tasks, small forearm
pronation/supination and torso assistance were involved (see footnote 4).
In the first two tasks (Fig. 18), subjects reached forward over a distance of approximately 250 mm to
grasp two different vertical cylinders with diameters of 50 mm and 65 mm (once for each trial). The observation
is focused on the concurrent voluntary flexion of all digits during the whole grasp task. Before the subject returns to the
initial posture, the cylinder is placed at 150 mm from its initial position and a concurrent voluntary extension
of all fingers is observed.
In the third and fourth tasks, subjects maintained the same initial posture as in the first two tasks. Each
subject performed two consecutive repetitions of voluntary flexion of individual fingers,
one digit at a time. For the latter task, the palm of the performer rests on a special support without any
other constraints. In the latter two tasks, the subjects were instructed not to consciously control involuntary
joint flexion of the non-intended fingers; they completed 10 trials (five different movements, two repetitions)
for each task.
Footnote 4: The subjects moved each finger into flexion and extension while attempting to keep the other, non-instructed fingers still.
Figure 18: Scenario for tasks 1 and 2.
A local coordinate system (x'y'z') was established to facilitate kinematic descriptions and definitions.
The origin of this local coordinate system was the marker adhered to the dorsal landmark of the wrist. The
y'-axis lay in the plane, pointing radially while being perpendicular to the x'-axis. The z'-axis was therefore
normal to the plane, pointing dorsally. Coordinates of the markers measured in the global (laboratory)
coordinate system (xyz) were transformed and expressed in the local coordinate system (x'y'z').
From the local coordinates, the time-varying joint angles (Fig. 16) measuring all the involved flexion-extension
DoF were derived through a computational procedure in the Nexus software that determined
the finger segmental centres of rotation. The flexion portions of the angular profiles for the MP, proximal
interphalangeal (PIP) and DIP joints of digits 2-5, and for the CM, MP and IP joints of the thumb, were analysed in the
current study, for a total of 25 DoF. A semi-automatic procedure was established to identify the initiation
and termination times of the flexion and extension motions.
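The change of coordinates from the laboratory frame (xyz) to the local hand frame (x'y'z') can be sketched in Python as follows. This is a minimal illustration and not the Nexus procedure: the use of two extra reference markers (here called distal_ref and radial_ref) to define the plane, the marker positions, and all function names are assumptions made for the example.

```python
import math


def sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def dot(a, b):
    return sum(a[i] * b[i] for i in range(3))


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def unit(a):
    n = math.sqrt(dot(a, a))
    return [c / n for c in a]


def local_frame(origin, distal_ref, radial_ref):
    """Build orthonormal axes from three non-collinear markers.

    x' points from the origin towards distal_ref, z' is normal to the plane
    of the three markers, and y' lies in that plane, perpendicular to x'.
    The marker roles are hypothetical.
    """
    x_axis = unit(sub(distal_ref, origin))
    z_axis = unit(cross(x_axis, sub(radial_ref, origin)))
    y_axis = cross(z_axis, x_axis)
    return x_axis, y_axis, z_axis


def to_local(point, origin, axes):
    """Express a laboratory-frame point in the local frame."""
    rel = sub(point, origin)
    return [dot(rel, axis) for axis in axes]


# Made-up marker positions in millimetres.
origin = [100.0, 50.0, 20.0]
axes = local_frame(origin, [100.0, 150.0, 20.0], [50.0, 50.0, 20.0])
print(to_local([90.0, 60.0, 25.0], origin, axes))
```

Expressing every marker in the hand frame removes the gross motion of the forearm, so that the remaining marker motion reflects finger articulation only.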
The first type of analysis consists in the computation of the correlation coefficient matrix for all the
DOFs. This is aimed at quantifying the degree of correlation of each DOF with all the others.
Figures 19 and 20 summarise the results for the flexion-extension movement, while Figures 21 and 22 summarise
the results for the grasping movement of only one of the objects considered. The correspondence
between the number and name of each DOF is summarised in Table 9. Analogous results have been obtained
for the grasping of the other object and thus are not reported for brevity.
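The correlation coefficient matrix described above can be computed as in the following Python sketch. It assumes the Pearson coefficient between joint-angle time series, which is the usual choice but is not stated explicitly in the text. The three toy trajectories and the function names are made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(angles):
    """angles[k] is the time series of DOF k; returns a symmetric matrix."""
    n_dof = len(angles)
    return [[pearson(angles[i], angles[j]) for j in range(n_dof)]
            for i in range(n_dof)]


# Toy joint-angle trajectories in degrees (made up for illustration).
mcp_index = [0.0, 10.0, 20.0, 30.0, 40.0]
mcp_middle = [0.0, 8.0, 18.0, 26.0, 35.0]   # moves together with the index
ip_thumb = [5.0, 3.0, 6.0, 2.0, 4.0]        # largely independent

corr = correlation_matrix([mcp_index, mcp_middle, ip_thumb])
for row in corr:
    print([round(c, 2) for c in row])
```

Entry (i, j) of the result plays the role of the entries quoted in the observations below, with DOF numbers as in Table 9. A DOF that stays constant over a trial has zero variance and would need to be excluded before calling this function.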
In the analysis of the flexion-extension movement (see the correlation coefficient matrix in Fig. 19) we
can observe that:
• each metacarpal joint has a high correlation coefficient (0.65 - 0.85) with respect to the adjacent
metacarpal joints, both in the flexion DOF and in the abduction DOF (see the entries (1,5), (5,11),
(11,17));
• each proximal interphalangeal joint has a correlation coefficient around 0.6 with respect to the proximal
interphalangeal joint of the adjacent fingers (see the entries (3,7), (7,13), (13,19));
• the abduction of the ring finger metacarpal joint (labelled CMC_3_MCP_3) is strongly correlated
with the flexion of the little finger proximal interphalangeal joint (labelled MCP_4_PIP_4) (see the entry
(12,19));
Figure 19: Estimated correlation among hand DOFs in the voluntary flexion-extension trials (both axes: DOF number; colour scale: correlation coefficients, 0 to 1).
Figure 20: Estimated variance of the correlation indices in the voluntary flexion-extension trials (both axes: DOF number; colour scale: 0.05 to 0.55).
DOF number   DOF name       Type
1            WRIST_MCP_1    flexion
2            WRIST_MCP_1    abduction
3            MCP_1_DIP_1    flexion
4            PIP_1_DIP_1    flexion
5            WRIST_MCP_2    flexion
6            WRIST_MCP_2    abduction
7            MCP_2_PIP_2    flexion
8            PIP_2_DIP_2    flexion
9            WRIST_CMC_3    flexion
10           WRIST_CMC_3    abduction
11           CMC_3_MCP_3    flexion
12           CMC_3_MCP_3    abduction
13           MCP_3_DIP_3    flexion
14           PIP_3_DIP_3    flexion
15           WRIST_CMC_4    flexion
16           WRIST_CMC_4    abduction
17           CMC_4_MCP_4    flexion
18           CMC_4_MCP_4    abduction
19           MCP_4_PIP_4    flexion
20           PIP_4_DIP_4    flexion
21           WRIST_CMC_T    flexion
22           WRIST_CMC_T    abduction
23           CMC_T_MCP_T    flexion
24           CMC_T_MCP_T    abduction
25           MCP_T_IP_T     flexion
Table 9: DOF numbers and names
• the empirical model commonly used in the literature, which correlates the DIP and PIP angular positions
for all fingers except the thumb, is not valid in the flexion movement (see the entries (3,4), (7,8),
(13,14), (19,20));
• the two DOFs of the hand dorsum joint labelled WRIST_CMC_3 are strongly correlated with each other (see the entry
(9,10));
• the two DOFs of the hand dorsum joint labelled WRIST_CMC_4 are strongly correlated with each other
(see the entry (15,16));
• the flexion DOF of the joint labelled OCMC_T is strongly correlated with the abduction DOF of the
thumb metacarpal joint (labelled CMC_T_MCP_T) and with both the abduction and flexion
DOFs of the index metacarpal joint, labelled WRIST_MCP_1 (see the entries (21,1), (21,2),
(21,24), (2,24));
• the flexion DOF of the WRIST_MCP_1 joint (index metacarpal joint) has a significant correlation
(0.65) with the MCP_T_IP_T joint (thumb interphalangeal joint);
• the two DOFs of the CMC_T_MCP_T joint are both correlated with the MCP_T_IP_T joint; the
values in entries (23,26) and (24,26) show that a correlation of about 0.6 exists between the two
thumb joints in the flexion movement.
Figure 21: Estimated correlation among hand DOFs in the grasp trials (panel title: joint correlation coefficients for a grasp task; both axes: DOF number; colour scale: 0.1 to 1).
From the analysis of the variance matrix in Fig. 20 we can conclude that the above findings are quite
reliable, since the variance values are almost uniformly low.
Figure 21 shows that, during the grasp of a cylinder with a diameter of 60 mm, the angular joint positions
are highly correlated. The thumb joints are the least correlated with the others, and the associated correlation
coefficients vary significantly across capture sessions. This is due to the occlusion phenomenon, which
makes capturing and tracking the marker positions very difficult. The low quality of the measurements in this task
is confirmed by the analysis of the variance matrix in Fig. 22, where the last four rows, corresponding to
thumb DOFs, show a high variance among the different trials. The sensorised glove and the sensor fusion
algorithms to be developed within the project will be used to reduce this problem and to gain more insight
into the joint interdependencies in manipulation tasks that also involve the thumb. The obtained information
about hand joint correlation will be used by the Kalman-like sensor fusion algorithm cited above to improve
its tracking performance.
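The reliability check based on the variance of the correlation indices across trials (Figs. 20 and 22) can be sketched in Python as follows. This is an illustration under assumptions: the report does not state whether a population or a sample variance was used, so the population form is taken here, and the three small correlation matrices are invented for the example.

```python
def elementwise_variance(matrices):
    """Population variance of each entry across a list of equally sized matrices."""
    n = len(matrices)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    var = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            values = [m[i][j] for m in matrices]
            mean = sum(values) / n
            var[i][j] = sum((v - mean) ** 2 for v in values) / n
    return var


# Correlation matrices from three hypothetical trials
# (two finger DOFs followed by one thumb DOF).
trial_corr = [
    [[1.0, 0.80, 0.10], [0.80, 1.0, 0.60], [0.10, 0.60, 1.0]],
    [[1.0, 0.82, 0.70], [0.82, 1.0, 0.05], [0.70, 0.05, 1.0]],
    [[1.0, 0.78, 0.40], [0.78, 1.0, 0.95], [0.40, 0.95, 1.0]],
]
variance = elementwise_variance(trial_corr)
print(variance[0][1], variance[0][2])
```

A low entry indicates a correlation that is reproducible across trials, while a high entry, as in the invented thumb row here, flags a correlation estimate that should not be trusted.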
A second type of analysis has also been carried out. A Principal Component Analysis (PCA) was employed
to investigate the synergistic behaviour among finger joints. The aim of the PCA is
to establish the minimum number of signals necessary to approximately describe the motion during the two
considered tasks.
The PCA of the flexion-extension task shows that 90% of the variance is contained in the
first five principal components and 98% of the variance is contained in the first 12 principal components.
The values shown in Fig. 23 are obtained by dividing each singular value of the covariance matrix by the
sum of all the singular values; by combining 12 signals it is possible to represent all the hand joint
movements with good approximation.
For the grasp task, the values shown in Fig. 24 demonstrate that 90% of the variance is contained in the first three principal
components and 98% of the variance is contained in the first six principal components. This means that
fewer signals are needed to reconstruct the movements in a grasping task than in a flexion task.
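The normalisation of the singular values and the count of components needed to reach a given share of the variance can be sketched in Python as follows. The singular values of the covariance matrix are assumed to have been computed already (for example with a linear algebra library); the spectrum used here is hypothetical and does not reproduce the values plotted in Figs. 23 and 24. The function names are illustrative.

```python
def normalised_singular_values(singular_values):
    """Divide each singular value by the sum of all of them."""
    total = sum(singular_values)
    return [s / total for s in singular_values]


def components_needed(singular_values, fraction):
    """Smallest number of leading components whose normalised values reach `fraction`."""
    ordered = sorted(singular_values, reverse=True)
    cumulative = 0.0
    for count, value in enumerate(normalised_singular_values(ordered), start=1):
        cumulative += value
        # Small tolerance so that rounding error does not add a spurious component.
        if cumulative >= fraction - 1e-12:
            return count
    return len(ordered)


# Hypothetical spectrum, not the values plotted in Figs. 23 and 24.
spectrum = [60.0, 20.0, 10.0, 5.0, 3.0, 1.0, 0.5, 0.5]
print(components_needed(spectrum, 0.90), components_needed(spectrum, 0.98))
```

Because a covariance matrix is symmetric and positive semi-definite, its singular values coincide with its eigenvalues, so each normalised value can be read as the fraction of the total variance carried by one principal component.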
6.4 Marker set selection: human hand wearing a data-glove
The benefit of using a sensorised glove is the reduction of the problems that arise when a passive marker-based
optical motion capture system is used to acquire human hand motion. To obtain hand position measurements with
high accuracy using optical motion capture systems, many markers and many cameras are required due to
Figure 22: Estimated variance of the correlation indices in the grasp trials (both axes: DOF number; colour scale: 0 to 0.7).
Figure 23: Normalised singular values of PCA in the flexion-extension task (x-axis: principal component number; y-axis: singular values of the covariance matrix).
Figure 24: Normalised singular values of PCA in the grasp task (x-axis: principal component number; y-axis: singular values).
the high number of degrees of freedom of the hand. In particular, there are two very difficult problems to solve: the
first is reducing the number of markers to be placed on the small area of the back of the hand, and the second is
reducing the marker occlusion phenomenon caused by dexterous hand movements in the capture area
(the relevance of the ghost marker problem also increases with the number of markers in the small field of
view of each camera).
Most researchers in this area use fewer markers than the minimum
number required to reconstruct the motion of all the bones constituting the human hand. The reduction
is made possible by using a mathematical model of the hand or by using additional sensors, e.g. data
gloves. The second strategy will also be used in DEXMART: the planned activities already include the
integration of a data glove (under development in WP5) into the OMG optical motion capture system. The
main motivation is to minimise failures in the motion tracking of the hand bones
due to the marker occlusion problem, which is very frequent during manipulation tasks. Some preliminary
measurements have already been conducted, in which the motion of a single finger was captured using
the typical marker set used in the literature (one marker for each finger bone). Even in such a simple
case, occlusions proved to be very frequent, even with five cameras well distributed
around the hand workspace. To improve the quality of the acquired kinematic data and to reduce the minimal
number of markers, a sensor fusion algorithm for hand motion tracking will be realised. In particular, our
sensorised glove is equipped with only three markers and three low-cost angular sensors per finger. In detail,
three markers are used to define a reference system fixed to the hand wearing the glove, and three markers
placed on the index finger are used to estimate the joint angles between the phalanges. The three angular
sensors are then mounted on the same finger (see Fig. 25).
To perform the experiments we used a Vicon 460 motion capture system equipped with five high-resolution
M2 cameras. Figure 26 shows the marker trajectories for four consecutive flexion and extension movements
of the index. This experiment was executed in the two cases with the same constraint conditions: the palm
of the hand is held still in a fixed position and the index motion is performed without any other constraints
on the other fingers.
The results show high variability of the marker trajectories across consecutive movements. This is due
to the sliding of the glove with respect to the phalangeal bones. To evaluate and reduce this effect it will be
necessary to perform an analysis, in an MRI environment, of the capture error for a gloved hand.
Figure 25: Sensorized data glove.
Figure 26: Trajectories generated by three markers mounted on the data glove.
7 Conclusions
In this report we analysed in depth the kinematic model of the human hand. We reviewed the existing
state of the art in different research communities and, although a common ground on articulated models
exists, we pointed out that different models suit different applications. Similarly, our results suggest that the
kinematic model should depend not only on the application but also on the type of capture system. With
small camera counts and a relatively low number of large markers, it is not feasible to try to recover subtle
movements like palm arching. In this case a simple kinematic model with 20 DoF should be used. However,
results have shown that 20 DoF are not sufficient to accurately model complex grasp actions. When
high-resolution cameras and small markers are used, more complex modelling is a feasible option. In particular,
we have seen that the Santos model with 25 DoF is a good compromise between simplicity and predictive
power. This model can be enhanced with a more complex thumb articulation that uses three DoF on
either the CMC or the MCP thumb joint, or via the addition of extra DoF on the palm to better approximate the
subtle movements of the carpal bones.
To improve the capture accuracy, we presented a novel kinematic calibration procedure that accounts
for soft tissue artefacts by allowing the markers to move according to polynomial functions of the joint
angles. The extra parameters added to model marker motions are selected by an elegant automated model
selection procedure. The results on thumb and index capture show that the proposed model generalises well
to unseen data and produces significant improvements in terms of marker residual reduction.
The finger interdependencies analysis showed that on simple grasp tasks the first three principal components
account for about 90% of the variance and the first six for about 98% (Section 6.3). This result paves the way for a low-dimensional and compact
representation of the grasp movements that can simplify the design of the robotic hand control algorithms.
Finally, results have shown that optical motion capture can accurately track hand movements when
these movements are heavily constrained and the capture volume is small. However, natural object
manipulation in an unconstrained environment may produce long-term occlusions, especially on the fingertips. In
these situations an independent source of information is necessary. Research has started in T1.3 and will
also focus on the integration of data-glove data with optical motion capture. In this regard we presented
preliminary results on marker set selection for hybrid optical/data-glove based motion capture.
A Bayes factors for marker regression: algebraic form
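To compute the Bayes factor (Eq. (7)) between two polynomial models of the form
$$ d = g(\theta, w) = \phi(\theta)^T w + e $$
with $e$ a zero-mean Gaussian noise and the parameter prior defined in Eq. (8), we have to formulate the
marginal probability
$$ p(\mathbf{d} \mid g) = \int p(\mathbf{d} \mid w, g)\, p(w \mid g)\, dw. \qquad (12) $$
To this end, let us formulate the linear regression on the data vector $\mathbf{d}$ as
$$ \mathbf{d} = \Phi w + \mathbf{e}, $$
where $\mathbf{e}$ is the random vector generated from the realisations of $e$, and $\Phi$ is the design matrix
$$ \Phi = \begin{bmatrix} \phi(\theta_0)^T \\ \phi(\theta_1)^T \\ \vdots \\ \phi(\theta_{n_k})^T \end{bmatrix} = \begin{bmatrix} \mathbf{1} & A \end{bmatrix}. $$
$\mathbf{1}$ is an $n_k$ vector with all elements equal to one (i.e., the regressor for the zero-order component) and
$A$ is an $n_k \times (n_w - 1)$ matrix containing the regressor values for the components with order higher than
zero. Finally, we can rewrite the likelihood as
$$ p(\mathbf{d} \mid w, g) = N(\mathbf{d} \mid \Phi w, \sigma^2 I) = N(\mathbf{d} \mid A\tilde{w} + w_0 \mathbf{1}, \sigma^2 I). $$
By substituting the prior and the likelihood into Eq. (12) and applying Gaussian integration rules we obtain
$$ p(\mathbf{d} \mid g) = \int_{w_0} \int_{\tilde{w}} N\left(\mathbf{d} \mid A\tilde{w} + w_0 \mathbf{1}, \sigma^2 I\right) N\left(\tilde{w} \mid 0, \Sigma_w\right) d\tilde{w}\, dw_0 \qquad (13) $$
$$ = \int_{w_0} N\left(\mathbf{d} \mid w_0 \mathbf{1},\; B = \sigma^2 I + A \Sigma_w A^T\right) dw_0 \qquad (14) $$
$$ = c \exp\left\{ -\frac{1}{2} \left[ \mathbf{d}^T B^{-1} \mathbf{d} - \frac{1}{4} \frac{\left( \mathbf{1}^T B^{-1} \mathbf{d} + \mathbf{d}^T B^{-1} \mathbf{1} \right)^2}{\mathbf{1}^T B^{-1} \mathbf{1}} \right] \right\}, \qquad (15) $$
where the normalisation factor is $c = (2\pi)^{\frac{1 - n_k}{2}} \left( |B|\, \mathbf{1}^T B^{-1} \mathbf{1} \right)^{-\frac{1}{2}}$.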
References
[1] A. Erol, G. Bebis, M. Nicolescu, R.D. Boyle, and X. Twombly. Vision-based hand pose estimation: A review. Computer Vision and Image Understanding, 108(1-2):52-73, 2007.
[2] J. Davis and M. Shah. Recognizing hand gestures. In Proc. of European conference on Computer Vision, pages 331-340, Secaucus, NJ, USA, 1994. Springer-Verlag New York, Inc.
[3] K.G. Derpanis. A review of vision-based hand gestures. Technical report, York University, February 2004.
[4] J.M. Rehg and T. Kanade. Visual tracking of high dof articulated structures: an application to human hand tracking. In Proc. of European conference on Computer Vision, pages 35-46, Secaucus, NJ, USA, 1994. Springer-Verlag New York, Inc.
[5] B. Stenger, P. R. S. Mendonca, and R. Cipolla. Model-based 3d tracking of an articulated hand. In Proc. of IEEE Conference on Computer Vision and Pattern Recognition, volume 2, pages 310-315, 2001.
[6] Y. Wu, J. Lin, and T.S. Huang. Analyzing and capturing articulated hand motion in image sequences. IEEE Trans. Pattern Anal. Mach. Intell., 27(12):1910-1922, 2005.
[7] K.N. An, E.Y. Chao, W.P. Cooney III, and R.L. Linscheid. Normative model of human hand for biomechanical analysis. J. Biomechanics, 12(10):775-788, 1979.
[8] X. Zhang, S. W. Lee, and P. Braido. Determining finger segmental centers of rotation in flexion-extension based on surface marker measurement. J Biomech, 36(8):1097-1102, August 2003.
[9] P. Cerveri, N. Lopomo, A. Pedotti, and G. Ferrigno. Derivation of centers and axes of rotation for wrist and fingers in a hand kinematic model: Methods and reliability results. Annals of Biomedical Engineering, 33(3):402-412, March 2005.
[10] P. Cerveri, E. De Momi, N. Lopomo, Baud G. Bovy, R. Barros, and G. Ferrigno. Finger kinematic modeling and real-time hand motion estimation. Annals of Biomedical Engineering, 35(11):1989-2002, November 2007.
[11] E.P. Pena-Pitarch, J. Yang, and K. Abdel-Malek. SANTOS(TM) hand: A 25 degree-of-freedom model. In Proc. of SAE Digital Human Modeling for Design and Engineering, Iowa City, USA, June 2005.
[12] J. H. Coert, H. G. van Dijke, S. E. Hovius, C. J. Snijders, and M. F. Meek. Quantifying thumb rotation during circumduction utilizing a video technique. J Orthop Res, 21(6):1151-1155, November 2003.
[13] F. J. Valero-Cuevas, M. E. Johanson, and J. D. Towles. Towards a realistic biomechanical model of the thumb: the choice of kinematic description may be more critical than the solution method or the variability/uncertainty of musculoskeletal parameters. J Biomech, 36(7):1019-1030, July 2003.
[14] L. Kuo, W.P. Cooney, M. Oyama, K.R. Kaufman, F.C. Su, and K.N. An. Feasibility of using surface markers for assessing motion of the thumb trapeziometacarpal joint. Clinical Biomechanics, 18(6):558-563, July 2003.
[15] L.Y. Chang and N. Pollard. Method for determining kinematic parameters of the in vivo thumb carpometacarpal joint. IEEE Trans. Biomed. Eng., 55(1):1897-1906, July 2008.
[16] A. Hollister, D. J. Giurintano, W. L. Buford, L. M. Myers, and A. Novick. The axes of rotation of the thumb interphalangeal and metacarpophalangeal joints. Clinical Orthopaedics & Related Research, 320:188-193, November 1995.
[17] Y. Yasumuro. Three-dimensional modeling of the human hand with motion constraints. Image and Vision Computing, 17(2):149-156, February 1999.
[18] I. Albrecht, J. Haber, and H.P. Seidel. Construction and animation of anatomically based human hand models. In Proceedings of SIGGRAPH the conference on Computer graphics and interactive techniques, pages 98-109, San Diego, California, 2003.
[19] S. Sueda, A. Kaufman, and D.K. Pai. Musculotendon simulation for hand animation. ACM Trans. Graph. (SIGGRAPH), 27(3):1-8, 2008.
[20] K. Singh and E. Kokkevis. Skinning characters using surface oriented free-form deformations. In Proc. of the Conference on Graphics Interface, pages 35-42, Montreal, Canada, 2000.
[21] X.C. Wang and C. Phillips. Multi-weight enveloping: least-squares approximation techniques for skin animation. In SCA '02: Proceedings of the 2002 ACM SIGGRAPH/Eurographics symposium on Computer animation, pages 129-138, New York, NY, USA, 2002. ACM.
[22] T.W. Sederberg and S.R. Parry. Free-form deformation of solid geometric models. In Proceedings of SIGGRAPH the conference on Computer graphics and interactive techniques, pages 151-160, New York, NY, USA, 1986. ACM Press.
[23] J. P. Lewis, M. Cordner, and N. Fong. Pose space deformation: a unified approach to shape interpolation and skeleton-driven deformation. In Proceedings of SIGGRAPH the conference on Computer graphics and interactive techniques, pages 165-172, New York, NY, USA, 2000. ACM Press/Addison-Wesley Publishing Co.
[24] D. Anguelov, P. Srinivasan, D. Koller, S. Thrun, J. Rodgers, and J. Davis. Scape: shape completion and animation of people. ACM Trans. Graph., 24(3):408-416, 2005.
[25] S.I. Park and J.K. Hodgins. Capturing and animating skin deformation in human motion. ACM Trans. Graph., 25(3):881-889, 2006.
[26] I. Söderkvist and P.A. Wedin. Determining the movements of the skeleton using well-configured markers. Journal of Biomechanics, 26(12):1473-1477, 1993.
[27] T. Andriacchi, E. Alexander, M. Toney, C. Dyrby, and J. Sum. A point cluster method for in vivo motion analysis: Applied to a study of knee kinematics. Journal of Biomechanical Engineering, 120(12):743-749, 1998.
[28] A. Cereatti, U. Della Croce, and A. Cappozzo. Reconstruction of skeletal movement using skin markers: comparative assessment of bone pose estimators. Journal of NeuroEngineering and Rehabilitation, 3(7), 2006.
[29] A. Cappello, A. Cappozzo, P.F. La Palombara, L. Lucchetti, and A. Leardini. Multiple anatomical landmark calibration for optimal bone pose estimation. Human Movement Science, 16(2-3):259-274, 1997.
[30] A. Cappozzo, F. Catani, U. Della Croce, and A. Leardini. Position and orientation of bones during movement: anatomical frame definition and determination. Clinical Biomechanics, 10:171-178, 1995.
[31] J. Lin, Y. Wu, and T.S. Huang. Modeling the constraints of human hand motion. In Proc. of Workshop on Human Motion, page 121, Washington, DC, USA, 2000. IEEE Computer Society.
[32] C.S. Chua, H.Y. Guan, and Y.K. Ho. Model-based finger posture estimation. In Proc. of Asian Conference on Computer Vision, pages 43-48, January 2000.
[33] G. Jin and J.K. Hahn. Adding hand motion to the motion capture based character animation. In International Symposium on Advances in Visual Computing, pages 17-24, 2005.
[34] M. Nakamura, C. Miyawaki, N. Matsushita, R. Yagi, and Y. Handa. Finger kinematic modeling and real-time hand motion estimation. J Electromyography and Kinesiology, 8(5):295-303, 1998.
[35] C. Hager-Ross and M. H. Schieber. Quantifying the independence of human finger movements: comparisons of digits, hands, and movement frequencies. Journal of Neuroscience, 20(22):8542-8550, 2000.
[36] S.W. Lee and X. Zhang. Biodynamic modeling, system identification, and variability of multi-finger movements. J Biomech., 40(14):3215-3222, 2007.
[37] C.E. Lang and M.H. Schieber. Human finger independence: limitations due to passive mechanical coupling versus active neuromuscular control. Journal of Neurophysiology, 92:2802-2810, 2004.
[38] P.H. Thakur, A.J. Bastian, and S.S. Hsiao. Multidigit movement synergies of the human hand in an unconstrained haptic exploration task. Journal of Neuroscience, 28(6):1271-1281, 2008.
[39] E. Holden. Visual Recognition of Hand Motion. PhD thesis, University of Western Australia, 1997.
[40] N.A. Baker, R. Cham, and E.H. Cidboy. Kinematics of the fingers and hands during computer keyboard use. Clinical Biomechanics, 22(1):34-43, January 2007.
[41] DLR et al. Specification of benchmarks. Technical report, European research project DEXMART (FP7-216239), 2009.
[42] D. Witonski. Dynamic magnetic resonance imaging. Clinics in Sports Medicine, 21(3):403-415.
[43] H.H. Quick, M.E. Ladd, and M. Hoevel. Real-time mri of joint movement with true fisp. Magnetic Resonance Imaging, 15:710-715, 2002.
[44] J. Brossmann, C. Muhle, and C. Schroder. Patellar tracking patterns during active and passive knee extension: evaluation with motion-triggered cine mr imaging. Radiology, 187:205-212, 1993.
[45] B. Gilles, R. Perrin, N. Magnenat-Thalmann, and J. Vallee. Bone motion analysis from dynamic mri: Acquisition and tracking. Academic Radiology, 12(10):1285-1292.
[46] C. Muhle. Kinematic ct and mr imaging of the patellofemoral joint. Eur Radiol, 9(3):508-518, 1999.
[47] J. Fuller, L. Liu, M.C. Murphy, and R.W. Mann. A comparison of lower-extremity skeletal kinematics measured using skin- and pin-mounted markers. In 3-D Analysis of Human Movement, volume 16, pages 219-242, 1997.
[48] I.K. Sahni, J.A. Hipp, B.C. Kirking, J.W. Alexander, and S.I. Esses. Use of percutaneous transpedicular external fixation pins to measure intervertebral motion. Spine, 24(18):1890-1893, September 1999.
[49] D. Nunn, M.A. Freeman, P.F. Hill, and S.J. Evans. The measurement of migration of the acetabular component of hip prostheses. Journal of Bone and Joint Surgery - British Volume, 71-B:629-631, 1989.
[50] M. Veber and T. Bajd. Assessment of human hand kinematics. In Proc. of International Conference on Robotics and Automation, pages 2966-2971. IEEE, 2006.
[51] I.W. Charlton, P. Smyth, and L. Roren. Repeatability of an optimized lower body model. Gait and Posture, 20:213-221, 2004.
[52] H.R. Lindman. Analysis of variance in complex experimental designs. SIAM Review, 18(1):134-137, January 1976.
[53] H. Akaike. A new look at the statistical model identification. IEEE Trans. Autom. Control, 19(6):716-723, 1974.
[54] G. Schwarz. Estimating the dimension of a model. Annals of Statistics, 6(2):461-464, 1978.
[55] K. Bubna and C.V. Stewart. Model selection and surface merging in reconstruction algorithms. In Proc. of International Conference on Computer Vision, pages 895-902, 1998.