
Hand Pose Recognition Using Curvature Scale Space

Chin-Chen Chang*, I-Yen Chen and Yea-Shuan Huang


Advanced Technology Center
Computer & Communications Research Laboratories
Industrial Technology Research Institute, Chutung, Hsinchu, Taiwan 310, R.O.C.
*E-mail: chinchen@itri.org.tw

Abstract

In this paper, we present a novel feature extraction approach based on Curvature Scale Space (CSS) for translation, scale, and rotation invariant recognition of hand poses. First, CSS images are used to represent the shapes of the boundary contours of hand poses. Then, we extract multiple sets of CSS features to overcome the problem of deep concavities in the contours of hand poses. Finally, nearest neighbour techniques are used to perform CSS matching between the multiple sets of input CSS features and the stored CSS features for hand pose identification. Results show that the proposed approach can extract the multiple sets of CSS features from the input images and performs well in recognizing hand poses.

1. Introduction

In recent years, the use of gestures in human-computer interaction (HCI) has been extensively researched as a way for humans to interact with computers [1,3,4,7,10,11,12,14]. There are two main approaches for analyzing gestures, namely, 3D model-based approaches and 2D appearance-based approaches [4,10]. The 3D model-based approach is based on the 3D spatial description of the hand, and the 2D appearance-based approach is based on the appearance of hands in 2D visual images. Generally, gestures can be classified into two types, namely, hand poses and dynamic gestures. Hand poses are characterized by information about the hand shape, whereas dynamic gestures represent actions by hand movements.

Schlenzig et al. [11] proposed a 2D appearance-based hand gesture interpretation using recursive estimation based on the Zernike moments [5,6,13] of hand poses. Su et al. [12] presented a static hand gesture recognition system based on the flex angles of the ten fingers of hand poses using a composite neural network. Banarse and Duller [1] developed a three-stage self-organizing neural network architecture to perform static gesture recognition and used 2D planes cells as features of hand poses. Huang and Huang [3] presented a sign language system and used the Fourier descriptor [15] to characterize hand shapes. Lee and Chung [7] proposed an algorithm that extracts features to recognize sign language based on orientation histograms of hand postures. Wu and Huang [14] presented a view-independent hand posture recognition approach to achieve natural interaction in virtual environments and used the Fourier descriptor to represent hand poses.

The 2D appearance-based approach is the most natural approach for HCI. Characterizing hand poses is one of the challenging topics in gesture research. In this paper, we propose a 2D appearance-based approach for translation, scale, and rotation invariant recognition of hand poses based on Curvature Scale Space (CSS) [8,9]. First, we use CSS images [8,9] to represent the contours of hand poses. Then, multiple sets of CSS features are extracted to overcome the problem of deep concavities in the contours of hand poses. Finally, we apply nearest neighbour techniques [2] to perform CSS matching between the multiple sets of input CSS features and the stored CSS features for identification of hand poses.

The rest of this paper is organized as follows. In Section 2, we briefly introduce the CSS image. In Section 3, we extract the multiple sets of CSS features and use nearest neighbour techniques to perform CSS matching between the multiple sets of input CSS features and the stored CSS features for hand pose identification. Implementation and results are given in Section 4. Finally, the conclusion is presented in Section 5.

2. The Curvature Scale Space Image

Mokhtarian and Mackworth [8,9] first proposed the object contour-based shape descriptor based on the CSS image of the contour. The CSS descriptor provides translation, scale and rotation invariant features of curves.

The curvature κ of a planar curve is defined as the derivative of the tangent angle φ with respect to the arc length s, as shown in Figure 1. The curvature κ is written as follows [8,9]:

1051-4651/02 $17.00 (c) 2002 IEEE


\[ \kappa = \frac{d\phi}{ds}. \tag{1} \]

Figure 1: The curvature of a planar curve.

Let Γ be a planar curve defined by

\[ \Gamma = \{ (x(u), y(u)) \mid u \in [0,1] \}, \tag{2} \]

where u is the normalized arc length parameter. Then the curvature function κ(u) of Γ can be expressed as follows:

\[ \kappa(u) = \frac{\dot{x}(u)\,\ddot{y}(u) - \ddot{x}(u)\,\dot{y}(u)}{\left( \dot{x}(u)^2 + \dot{y}(u)^2 \right)^{3/2}}, \tag{3} \]

where

\[ \dot{x}(u) = \frac{dx}{du}, \quad \ddot{x}(u) = \frac{d^2 x}{du^2}, \quad \dot{y}(u) = \frac{dy}{du}, \quad \text{and} \quad \ddot{y}(u) = \frac{d^2 y}{du^2}. \]

An evolved version of the curve is defined by

\[ \Gamma_\sigma = \{ (X(u,\sigma), Y(u,\sigma)) \mid u \in [0,1] \}, \tag{4} \]

where X(u, σ) and Y(u, σ) are defined as

\[ X(u,\sigma) = x(u) * g(u,\sigma) \quad \text{and} \quad Y(u,\sigma) = y(u) * g(u,\sigma), \]

respectively, and g(u, σ) denotes a one-dimensional Gaussian kernel of width σ defined by

\[ g(u,\sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left( \frac{-u^2}{2\sigma^2} \right). \tag{5} \]

Functions X(u, σ) and Y(u, σ) are given explicitly by

\[ X(u,\sigma) = \int_{-\infty}^{\infty} x(v) \cdot \frac{1}{\sigma\sqrt{2\pi}} \cdot \exp\!\left( \frac{-(u-v)^2}{2\sigma^2} \right) dv \]

and

\[ Y(u,\sigma) = \int_{-\infty}^{\infty} y(v) \cdot \frac{1}{\sigma\sqrt{2\pi}} \cdot \exp\!\left( \frac{-(u-v)^2}{2\sigma^2} \right) dv, \]

respectively.

The curvature of Γ_σ can be computed as follows:

\[ \kappa(u,\sigma) = \frac{X_u(u,\sigma)\,Y_{uu}(u,\sigma) - X_{uu}(u,\sigma)\,Y_u(u,\sigma)}{\left( X_u(u,\sigma)^2 + Y_u(u,\sigma)^2 \right)^{3/2}}, \tag{6} \]

where

\[ X_u(u,\sigma) = x(u) * g_u(u,\sigma), \quad X_{uu}(u,\sigma) = x(u) * g_{uu}(u,\sigma), \]

\[ Y_u(u,\sigma) = y(u) * g_u(u,\sigma), \quad \text{and} \quad Y_{uu}(u,\sigma) = y(u) * g_{uu}(u,\sigma), \]

where * denotes convolution,

\[ g_u(u,\sigma) = \frac{\partial}{\partial u} g(u,\sigma) \quad \text{and} \quad g_{uu}(u,\sigma) = \frac{\partial^2}{\partial u^2} g(u,\sigma). \]

The function defined implicitly by

\[ \kappa(u,\sigma) = 0 \tag{7} \]

is the CSS image of Γ [8,9].

3. Feature Extraction and Hand Pose Recognition

The overall procedure of hand pose recognition is shown in Figure 2. First, images of hand poses are input. Second, the images are segmented into binary contour images using image processing techniques. Third, the CSS representations of the contours of the hand pose images are computed. Fourth, multiple sets of CSS features are extracted. Finally, nearest neighbour techniques are used to match the input feature vectors against the stored feature vectors for hand pose identification.

Figure 2: Procedure of hand pose recognition.

The first three steps are shown in Figures 3(a)-3(h). Figure 3(a) shows an input hand pose and Figure 3(b) is the contour of the hand pose. Figures 3(c)-3(g) show the contour of the hand pose after iterative low-pass filtering by convolution with the (0.25,0.5,0.25) kernel [16] for 201, 534, 640, 724 and



731 iterations, respectively. The red points in the figures mark the locations on the smoothed contours that correspond to peaks in the CSS image. Figure 3(h) shows the generated CSS image; the numbers indicate the number of smoothing iterations of the contour for the corresponding peaks.

Figure 3: (a) The input hand pose. (b) The contour of the hand pose. (c)-(g) The hand pose contour after iterative low-pass filtering by convolution with the (0.25,0.5,0.25) kernel for 201, 534, 640, 724 and 731 iterations, respectively. (h) The resulting CSS image.

Since the human hand is highly deformable, the location of the largest peak in the CSS image is unstable for the same hand pose. An example is shown in Figures 4(a)-4(d). Figures 4(a) and 4(c) show the same hand pose. Figures 4(b) and 4(d) are the CSS images of Figures 4(a) and 4(c), respectively, and show that the locations of the largest peaks in the CSS images are unstable.

As shown in Figure 3, the locations of the maximal peaks in the CSS image approximately correspond to the deep concavities in the original hand pose contour. Since a hand has at most five fingers, each hand posture has at most four deep concavities. To overcome the instability described above, we extract multiple sets of CSS features from the CSS image. Let

\[ \{ (u_i, \sigma_i) \}^{t\text{-Normalized}}_{i=1,\ldots,N} \]

be the set of peak coordinates of the maxima in the CSS image, normalized by the t-th largest peak, where N is the number of maxima in the CSS image determined by the user-defined threshold. Let

\[ F^I = \left\{ \{ (u_i^I, \sigma_i^I) \}^{t\text{-Normalized}}_{i=1,\ldots,N} \;\middle|\; t = 1, 2, \ldots, T \right\} \]

be the multiple sets of CSS features of the input image I, where 1 ≤ T ≤ MaxNum, and MaxNum is a parameter determined by the user-specified threshold. In our implementation, MaxNum is at most 4.

Nearest neighbour techniques are used to match the input feature vectors against the stored feature vectors for hand posture identification. Let

\[ F^S_{C_k} = \{ (u^S_{j,C_k}, \sigma^S_{j,C_k}) \}^{1\text{-Normalized}}_{j=1,\ldots,M} \]

be the feature vector of the CSS image of a stored image S of class C_k, normalized by the largest peak. The distance function is defined by

\[ \mathrm{dist}(F^I, F^S_{C_k}) = \min \{ \mathrm{MATCH}_{t1} \} \tag{8} \]

and

\[ \mathrm{MATCH}_{t1} = \sum_{\text{matched peaks}} \sqrt{ (u_i^I - u^S_{j,C_k})^2 + (\sigma_i^I - \sigma^S_{j,C_k})^2 } + \sum_{\text{unmatched peaks}} \sigma_i^I + \sum_{\text{unmatched peaks}} \sigma^S_{j,C_k}, \tag{9} \]

where i ∈ {1, 2, …, N} and j ∈ {1, 2, …, M}. The decision function is defined as

\[ D(F^I) = C_j \quad \text{if} \quad \mathrm{dist}(F^I, F^S_{C_j}) = \min_{1 \le k \le K} \{ \mathrm{dist}(F^I, F^S_{C_k}) \}, \tag{10} \]

where K is the number of classes.
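
The matching step of this section can be illustrated with a small sketch. It is not the authors' implementation, and several details are assumptions that the text does not specify: normalization by the t-th largest peak is taken to mean a circular shift of u that places that peak at u = 0, peaks are paired greedily within a tolerance `tol`, positions are compared with a circular difference, and matched peaks are compared by Euclidean distance. All function names and the parameter `tol` are hypothetical.

```python
import math

def circular_diff(a, b):
    """Distance between two positions on a closed contour, with u in [0, 1)."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)

def normalize(peaks, t):
    """Shift u so that the t-th largest peak (1-based, ranked by sigma) is at u = 0.

    Assumption: this is one plausible reading of "normalized by the t-th largest peak".
    """
    ref_u = sorted(peaks, key=lambda p: p[1], reverse=True)[t - 1][0]
    return [((u - ref_u) % 1.0, s) for u, s in peaks]

def match_cost(input_peaks, stored_peaks, tol=0.1):
    """Cost in the spirit of the MATCH term: distances of matched peaks plus
    the heights (sigma) of unmatched peaks on both sides.

    Assumption: greedy pairing, highest input peak first, within tolerance tol.
    """
    remaining = list(stored_peaks)
    cost = 0.0
    for u, s in sorted(input_peaks, key=lambda p: p[1], reverse=True):
        candidates = [
            (math.hypot(circular_diff(u, cu), s - cs), (cu, cs))
            for cu, cs in remaining
            if circular_diff(u, cu) <= tol
        ]
        if candidates:
            d, chosen = min(candidates)
            cost += d
            remaining.remove(chosen)
        else:
            cost += s  # unmatched input peak contributes its height
    return cost + sum(s for _, s in remaining)  # unmatched stored peaks

def dist(input_peaks, stored_peaks, max_num=4):
    """Minimum cost over the input sets normalized by the 1st..T-th largest peak."""
    if not input_peaks or not stored_peaks:
        return sum(s for _, s in input_peaks) + sum(s for _, s in stored_peaks)
    stored = normalize(stored_peaks, 1)
    T = min(max_num, len(input_peaks))
    return min(match_cost(normalize(input_peaks, t), stored) for t in range(1, T + 1))

def classify(input_peaks, database):
    """Nearest neighbour decision; database is a list of (label, peaks) pairs."""
    return min(database, key=lambda entry: dist(input_peaks, entry[1]))[0]

if __name__ == "__main__":
    stored_a = [(0.10, 5.0), (0.45, 4.8), (0.80, 2.0)]
    stored_b = [(0.20, 3.0), (0.70, 2.9)]
    # Same shape as stored_a, rotated, with the two highest peaks swapped in rank.
    query = [(0.30, 4.7), (0.65, 4.9), (0.00, 2.1)]
    print(classify(query, [("A", stored_a), ("B", stored_b)]))
```

In this toy data the two highest peaks of the query swap rank relative to the stored sample, so aligning on the largest peak alone gives a poor match, while the set aligned on the second largest peak matches closely. This is the situation the multiple feature sets are meant to handle.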

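The iterative contour smoothing described for Figure 3 can likewise be sketched in a few lines. This is a simplified illustration under stated assumptions rather than the method used in the paper: the contour is a closed list of (x, y) samples, the derivatives in the curvature formula are approximated by central differences, and the contour is not resampled by arc length between iterations. The function names are hypothetical.

```python
import math

def smooth_closed(points):
    """Apply one pass of the (0.25, 0.5, 0.25) kernel to a closed contour."""
    n = len(points)
    return [
        (0.25 * points[i - 1][0] + 0.5 * points[i][0] + 0.25 * points[(i + 1) % n][0],
         0.25 * points[i - 1][1] + 0.5 * points[i][1] + 0.25 * points[(i + 1) % n][1])
        for i in range(n)
    ]

def curvature(points):
    """Discrete curvature of a closed contour using periodic central differences."""
    n = len(points)
    kappa = []
    for i in range(n):
        xp, yp = points[i - 1]
        xc, yc = points[i]
        xn, yn = points[(i + 1) % n]
        xu, yu = (xn - xp) / 2.0, (yn - yp) / 2.0            # first derivatives
        xuu, yuu = xn - 2.0 * xc + xp, yn - 2.0 * yc + yp    # second derivatives
        denom = (xu * xu + yu * yu) ** 1.5
        kappa.append((xu * yuu - xuu * yu) / denom if denom > 0.0 else 0.0)
    return kappa

def zero_crossings(kappa):
    """Indices where the curvature changes sign between neighbouring samples."""
    return [i for i in range(len(kappa)) if kappa[i - 1] * kappa[i] < 0.0]

def css_image(points, max_iters=2000):
    """Rows of (iteration, zero-crossing indices) until the contour becomes convex."""
    rows = []
    current = list(points)
    for iteration in range(1, max_iters + 1):
        current = smooth_closed(current)
        crossings = zero_crossings(curvature(current))
        if not crossings:
            break
        rows.append((iteration, crossings))
    return rows

if __name__ == "__main__":
    n = 200
    angles = [2.0 * math.pi * i / n for i in range(n)]
    # Synthetic closed contour with four concavities (a stand-in for a hand contour).
    star = [((1.0 + 0.3 * math.cos(4.0 * a)) * math.cos(a),
             (1.0 + 0.3 * math.cos(4.0 * a)) * math.sin(a)) for a in angles]
    rows = css_image(star)
    print(len(rows), len(rows[0][1]))
```

Each row pairs an iteration count with the contour indices where the curvature changes sign. Plotting the rows approximates a CSS image, and a peak corresponds to the iteration at which a pair of zero crossings merges and disappears.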
Figure 4: (a) and (c) show the same hand pose. (b) and (d) are the CSS images of (a) and (c), respectively, and show that the locations of the largest peaks in the CSS images are unstable.

4. Implementation and Results

Our experimental platform is a PC with an Intel® Pentium® III 733MHz CPU running the Microsoft® Windows 2000 Professional OS.

The gesture images are captured against a controlled background and cropped at the wrist. The experimental images are 320x240 pixel color images of 6 different hand poses, namely, zero, one, two, three, four, and five, as shown in Figure 5. The database consists of 6 different hand poses formed by 10 users. Each user performs each hand pose 10 times at different scales, translations and rotations. Therefore, the total number of hand poses in our database is 600. We split the database into two sets: 1)



a stored set consisting of 300 samples and 2) a testing set containing 300 samples.

Figure 5: Six hand poses.

We compare our approach using multiple sets of input CSS features with the method using a single set of CSS features normalized by the largest peak and the method using Zernike and pseudo-Zernike moments (ZMs & PZMs). A total of 56 Zernike and pseudo-Zernike moment features are used. The results are listed in Table 1. The results show that our approach performs better than the other two methods, since the single set of CSS features is sensitive to contours with deep concavities and ZMs & PZMs are sensitive to noise [13].

Features of Methods             Tested Gestures   Correct Match   Recognition Rate
Multiple Sets of CSS Features   300               295             98.3%
Single Set of CSS Features      300               284             94.7%
ZMs & PZMs                      300               285             95.0%

Table 1: Comparison of our approach using multiple sets of CSS features with the method using the single set of CSS features normalized by the largest peak and the method using ZMs & PZMs.

5. Conclusion

We have proposed an appearance-based approach to hand pose recognition using CSS images. The proposed approach performs better than the other two methods and achieves a recognition rate of about 98.3% for 6 different hand poses.

Acknowledgements

This work is a partial result of project number 903XS1B11 conducted by ITRI under the sponsorship of the Ministry of Economic Affairs, R.O.C.

References

[1] D.S. Banarse and A.W.G. Duller, “Deformation Invariant Pattern Classification for Recognizing Hand Gestures,” in IEEE International Conference on Neural Networks, Vol. 3, pp. 1812-1817, 1996.
[2] E. Gose, R. Johnsonbaugh and S. Jost, Pattern Recognition and Image Analysis, Prentice Hall, 1996.
[3] C.L. Huang and W.Y. Huang, “Sign Language Recognition Using Model-based Tracking and a 3D Hopfield Neural Network,” Machine Vision and Applications, Vol. 10, pp. 292-307, 1998.
[4] T.S. Huang and V.I. Pavlovic, “Hand Gesture Modeling, Analysis and Synthesis,” in International Workshop on Automatic Face and Gesture Recognition, pp. 73-79, 1995.
[5] A. Khotanzad and Y.H. Hong, “Rotational Invariant Image Recognition Using Features Selected via a Systematic Method,” Pattern Recognition, Vol. 23, No. 10, pp. 1089-1101, 1990.
[6] H.K. Kim, J.D. Kim, D.G. Sim and D.I. Oh, “A Modified Zernike Moment Shape Descriptor Invariant to Translation, Rotation and Scale for Similarity-Based Image Retrieval,” in 2000 IEEE International Conference on Multimedia and Expo, Vol. 1, pp. 307-310, 2000.
[7] H.J. Lee and J.H. Chung, “Hand Gesture Recognition Using Orientation Histogram,” in Proceedings of the IEEE Region 10 Conference (TENCON 99), Vol. 2, pp. 1355-1358, 1999.
[8] F. Mokhtarian and A.K. Mackworth, “Scale-Based Description and Recognition of Planar Curves and Two-Dimensional Shapes,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-8, No. 1, pp. 34-43, 1986.
[9] F. Mokhtarian and A.K. Mackworth, “A Theory of Multiscale, Curvature-Based Shape Representation for Planar Curves,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 14, No. 8, pp. 789-805, 1992.
[10] V.I. Pavlovic, R. Sharma and T.S. Huang, “Visual Interpretation of Hand Gestures for Human-Computer Interaction: A Review,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 7, pp. 677-695, 1997.
[11] J. Schlenzig, E. Hunter and R. Jain, “Vision Based Hand Gesture Interpretation Using Recursive Estimation,” in 1994 Conference Record of the Twenty-Eighth Asilomar Conference on Signals, Systems and Computers, Vol. 2, pp. 1267-1271, 1994.
[12] M.C. Su, W.F. Jean and H.T. Chang, “A Static Hand Gesture Recognition System Using a Composite Neural Network,” in Proceedings of the Fifth IEEE International Conference on Fuzzy Systems, Vol. 2, pp. 786-792, 1996.
[13] C.H. Teh and R.T. Chin, “On Image Analysis by the Methods of Moments,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 10, No. 4, pp. 496-513, 1988.
[14] Y. Wu and T.S. Huang, “View-independent Recognition of Hand Postures,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR’2000), Vol. II, pp. 88-94, 2000.
[15] C.T. Zahn and R.Z. Roskies, “Fourier Descriptors for Plane Closed Curves,” IEEE Transactions on Computers, Vol. 21, No. 3, pp. 269-281, 1972.
[16] Overview of the MPEG-7 Standard (version 4.0). ISO/IEC JTC1/SC29/WG11 N3753, Oct. 2000.

