
Feature Extraction from 2D Gesture Trajectory in Dynamic Hand Gesture Recognition

M.K. Bhuyan, D. Ghosh and P.K. Bora


Department of Electronics and Communication Engineering, Indian Institute of Technology Guwahati, India 781039. Email: (manas kb, ghosh, prabin)@iitg.ernet.in
Abstract—Vision-based hand gesture recognition is a popular research topic for human-machine interaction (HMI). We have earlier developed a model-based method for tracking hand motion in complex scenes by using a Hausdorff tracker. In this paper, we now propose to extract certain features from the gesture trajectory so as to identify the form of the trajectory. Thus, these features can be efficiently used for trajectory guided recognition/classification of hand gestures. Our experimental results show 95% accuracy in identifying the forms of the gesture trajectories. This indicates that the trajectory features proposed in this paper are appropriate for defining a particular gesture trajectory.

Keywords—human machine interaction, video object plane, motion trajectory

I. INTRODUCTION

One very interesting field of research in Pattern Recognition that has gained much attention in recent times is Gesture Recognition. A gesture refers to a particular pose and/or movement of the body parts, such as the hand, head, face, etc., so as to convey some message. Accordingly, one important direction of research in gesture recognition is concerned with hand gestures formed by different hand shapes, positions, orientations and movements. While static hand gestures are modelled in terms of hand configuration, as defined by the flex angles of the fingers and the palm orientation, dynamic hand gestures include hand trajectories and orientation in addition to these. So, appropriate interpretation of dynamic gestures on the basis of hand movement, in addition to shape and position, is necessary for recognition. Another form of dynamic hand gesture in common use is one in which the 2D gesture trajectory alone builds up a particular message. Examples of such gestures are shown in Fig. 1. Recognition of these gestures, hence, will require trajectory estimation followed by extraction of features defining the estimated gesture trajectory. As mentioned above, the first task in dynamic hand gesture recognition is to track hand motion from the gesture video sequence and subsequently estimate the trajectory through which the hand moves during gesticulation. While trajectory estimation is quite simple and straightforward in glove-based hand gesture recognition systems [1], [2] that provide spatial information directly, trajectory estimation in vision-based systems may require applying complex algorithms to track the hand

and fingers using silhouettes and edges. One such method is given in [3], which uses skin color to segment out hand regions and subsequently determines 2D motion trajectories. However, many of these techniques are plagued by difficulties such as large variation in skin tone, unknown lighting conditions and dynamic scenes. In an attempt to overcome these difficulties, a model-based method for tracking hand motion in complex scenes was proposed in [4]. A modified version of this algorithm is described in Section II. The next important step in gesture recognition is the selection of suitable features. Selecting good features is crucial to gesture recognition, since hand gestures are very rich in shape variation, motion and texture. In view of the present problem, in this paper we propose to use some basic trajectory features, namely key trajectory points, trajectory length, trajectory shape, the location feature, the orientation feature, the velocity feature and the acceleration feature, so as to accomplish trajectory guided recognition successfully. Section III describes how these features are extracted from the estimated gesture trajectory.

Fig. 1. Hand gestures for representing a circle, two, square and a wavy hand.

II. TRACKING ALGORITHM

A. Motion vector estimation

The tracking algorithm given in [4] is based on the Hausdorff object tracker developed in [5]. The core of this tracking algorithm is an object tracker that matches a two-dimensional binary model of the object against subsequent frames using the Hausdorff distance measure. The best match found indicates the translation the object has undergone, and the model is updated with every frame to accommodate rotation and changes in shape. In our proposed technique, we segment the frames in the gesture sequence so as to form video object planes (VOPs), where the hand is considered as the video object (VO) [6]. Next,

1424400236/06/$20.00 © 2006 IEEE

CIS 2006

Authorized licensed use limited to: Women's College of Engineering. Downloaded on August 05,2010 at 10:37:12 UTC from IEEE Xplore. Restrictions apply.

the VOPs are binarized to form a binary alpha plane: the hand pixels are assigned 1 and the background pixels are assigned 0. Thus, a binary model for the moving hand is derived and is used for tracking. The Hausdorff object tracker finds the position where the input hand model best matches the next edge image and returns the motion vector MVi that represents the best translation.

B. Trajectory estimation

Determination of centroid of hand image: After determining the binary alpha plane corresponding to each VOP, moments are used to find the center of the hand. The 0th- and 1st-order moments are defined as:

M00 = Σ_x Σ_y I(x, y)        (1)
M10 = Σ_x Σ_y x I(x, y)      (2)
M01 = Σ_x Σ_y y I(x, y)      (3)

Subsequently, the centroid is calculated as

xc = M10/M00 and yc = M01/M00    (4)

In the above equations, I(x, y) is the pixel value at position (x, y) of the image. Since the background pixels are assigned 0, the centroid of the hand in a frame is also the centroid of the total frame. Therefore, in moment calculation, we may take the summation either over all pixels in the frame or over only the hand pixels.

Estimation of centroids of motion trajectory: As shown in Figs. 2 and 3, the centroid is calculated both by the moment equations (1) to (4) and by the motion vector, where centroid Ci is obtained by translating Ci−1 by the respective motion vector. The final centroid is taken as the average of these two. This nullifies the effect of slight shape changes in successive VOPs.

Fig. 2. Estimation of centroids in VOPs.

Fig. 3. Estimated centroids, calculated by motion vector and moment equations, in the VOPs.

C. Trajectory formation and smoothing of final trajectory

The final trajectory is formed by joining all the calculated centroids in sequential order. However, this trajectory may be noisy due to the following reasons:
- points too close together;
- isolated points far away from the correct trajectory due to change in shape of the hand model;
- unclosed end points;
- hand trembling;
- unintentional movements.

Therefore, in order to obtain a smooth trajectory, we apply piecewise approximation of the spatial positions of the centroids in successive VOPs, following the MPEG-7 motion trajectory approximation and representation technique [7]. This gives the final trajectory model.

Trajectory approximation:

First-order approximation:
x(t) = xi + vi (t − ti), where vi = (xi+1 − xi)/(ti+1 − ti)    (5)

Second-order approximation:
x(t) = xi + vi (t − ti) + (1/2) ai (t − ti)²,
where vi = (xi+1 − xi)/(ti+1 − ti) − (1/2) ai (ti+1 − ti)    (6)

and similarly for the other dimension y. Here vi and ai represent hand velocity and acceleration, respectively, considered constant on [ti, ti+1]; (xi, yi) and (xi+1, yi+1) are the hand positions at times ti and ti+1. So, a dynamic hand gesture can be interpreted as a set of points in a spatio-temporal space:

DG = {(x1, y1), (x2, y2), ..., (xt, yt)}    (7)

D. Key frame based trajectory estimation

The computational burden of estimating the hand trajectory can be significantly reduced by selecting key video frames in the gesture video sequence. The key VOPs are selected on the basis of the Hausdorff distance measure, thereby transforming an entire video clip into a small set of representative frames that are sufficient to represent a particular gesture sequence [8], [9]. After obtaining the key frames, hand trajectories are estimated by the same procedure as before, but considering only the key VOPs in the sequence.
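The moment-based centroid of Eqs. (1)-(4) can be sketched as follows. This is an illustrative sketch on a toy binary alpha plane, not the authors' implementation:

```python
# Sketch (assumption, not the authors' code): centroid of a binary alpha
# plane via image moments, following Eqs. (1)-(4). `alpha` is a 2D list
# where hand pixels are 1 and background pixels are 0.

def centroid(alpha):
    """Return (xc, yc) of a binary alpha plane using 0th/1st moments."""
    m00 = m10 = m01 = 0.0
    for y, row in enumerate(alpha):
        for x, v in enumerate(row):
            m00 += v          # Eq. (1)
            m10 += x * v      # Eq. (2)
            m01 += y * v      # Eq. (3)
    return m10 / m00, m01 / m00   # Eq. (4)

# A 2x2 hand blob centered in a 4x4 frame.
plane = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
print(centroid(plane))  # (1.5, 1.5)
```

Because background pixels contribute zero to every sum, iterating over the whole frame gives the same centroid as iterating over only the hand pixels, as the text notes.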


III. FEATURE EXTRACTION FROM ESTIMATED TRAJECTORY

For trajectory matching we consider both static and dynamic features. The static features considered in this paper correspond to the shape of the hand trajectory, the total length of the extracted trajectory, the location of key trajectory points and the orientation of the hand in the gesture trajectory. The dynamic features proposed for trajectory classification are the velocity and acceleration features. We describe the static features as low level features, while the dynamic features are high level features. The basic idea is that even if the static features correctly match during classification, i.e., even if the shape or length matching criteria are fulfilled, the match may not represent the actual hand trajectory unless the motion patterns are also compared. Therefore, for actual recognition, both low level and high level features have to be considered for trajectory matching.

A. Static features

1) Key trajectory point selection: The basic principle of key point selection is merging of adjacent approximation intervals of the estimated trajectory until the interpolation error exceeds a predefined threshold. Key points are defined by their coordinates in 2D space and time, i.e., (x, y, t). These key points best represent the prominent locations of the hand in the gesture trajectory. The total number of key points can be chosen by the user so that the global precision and compactness required by the application are met, and variable time intervals between key points can be chosen to match the local trajectory's smoothness.
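The interval-merging idea above can be sketched as follows. The exact error measure and merging order are our assumptions; the paper only requires that adjacent approximation intervals be merged until the interpolation error exceeds a threshold:

```python
# A minimal sketch (assumptions: perpendicular-distance error, greedy
# smallest-error-first merging) of key-point selection by merging
# adjacent approximation intervals.

def point_line_dist(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == dy == 0:
        return ((px - ax) ** 2 + (py - ay) ** 2) ** 0.5
    # Cross-product magnitude over segment length gives the distance.
    return abs(dy * (px - ax) - dx * (py - ay)) / (dx * dx + dy * dy) ** 0.5

def select_key_points(points, threshold):
    """Drop interior points while linear interpolation stays within threshold."""
    pts = list(points)
    while len(pts) > 2:
        # Interpolation error introduced by removing each interior point.
        errs = [(point_line_dist(pts[i], pts[i - 1], pts[i + 1]), i)
                for i in range(1, len(pts) - 1)]
        err, i = min(errs)
        if err > threshold:   # merging further would exceed the threshold
            break
        del pts[i]
    return pts

# Collinear points collapse to the two endpoints; a genuine corner survives.
print(select_key_points([(0, 0), (1, 0), (2, 0), (3, 0)], 0.1))  # [(0, 0), (3, 0)]
print(select_key_points([(0, 0), (1, 0), (1, 1)], 0.1))
```

Lowering the threshold retains more key points (higher precision); raising it yields a more compact representation, matching the precision/compactness trade-off described above.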
Fig. 4. Ideal key points in the circle and square representing gestures.

2) Trajectory length calculation: To determine the total length traversed by the hand during a gesture (i.e., the length of the gesture trajectory), the sum of the Euclidean distances between all consecutive trajectory points is calculated:

D = Σ_{i=0}^{N−1} [(xi − xi+1)² + (yi − yi+1)²]^{1/2}    (8)

3) Location feature extraction: The location feature is the measure of the distance between the center of gravity and the selected key points in a gesture trajectory, as depicted in Fig. 5. The location of a gesture trajectory cannot be determined solely from the origin/start point, since for a particular gesture there is always spatial variation in the gesture start point. This is solved by using the gesture center of gravity for calculation of the locations of the selected key points on the gesture trajectory.

Fig. 5. Extraction of location feature.

The center of gravity of a gesture trajectory is calculated as

x̄ = (1/N) Σ_{i=0}^{N} xi and ȳ = (1/N) Σ_{i=0}^{N} yi    (9)

and the location of the different key points relative to the center of gravity is calculated as

Li = [(xi − x̄)² + (yi − ȳ)²]^{1/2}    (10)

where (xi, yi) is the ith key point. The average positional distance of all the key points from the center of gravity is given as

Lavg = (1/(N+1)) Σ_{i=0}^{N} Li    (11)

Lavg is our proposed location feature, which is related to the overall gesture size.

4) Orientation feature extraction: The orientation feature gives the direction along which the hand traverses in space while making a gesture. To extract this feature, first the hand displacement vector at every point on the trajectory is calculated as

di = [xi − xi−1, yi − yi−1], i = 1, 2, ..., N    (12)

θi = tan⁻¹((yi − yi−1)/(xi − xi−1)), i = 1, 2, ..., N    (13)

where di and θi give the magnitude and direction, respectively, of the hand displacement at the ith trajectory point. The feature values we propose to derive from these measured displacement vectors are as follows.

1) Directions of hand movement at the starting and ending points, i.e., θs = θ1 and θe = θN.
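The location feature of Eqs. (9)-(11) and the per-point directions of Eqs. (12)-(13) can be sketched together as follows. As simplifying assumptions, the divisor is the plain point count in both averaging steps, and angle wraparound at ±180° is not handled:

```python
import math

# Sketch (assumptions noted above) of the location feature, Eqs. (9)-(11),
# and the orientation feature, Eqs. (12)-(13).

def location_feature(key_points):
    """Average distance L_avg of key points from the center of gravity."""
    n = len(key_points)
    cx = sum(x for x, _ in key_points) / n                 # Eq. (9)
    cy = sum(y for _, y in key_points) / n
    dists = [math.hypot(x - cx, y - cy) for x, y in key_points]  # Eq. (10)
    return sum(dists) / n                                  # Eq. (11)

def orientation_features(points, threshold_deg=45.0):
    """Start/end directions and the number of significant corners."""
    thetas = [math.degrees(math.atan2(y1 - y0, x1 - x0))   # Eq. (13)
              for (x0, y0), (x1, y1) in zip(points, points[1:])]
    corners = sum(1 for a, b in zip(thetas, thetas[1:])
                  if abs(b - a) >= threshold_deg)          # significant corners
    return thetas[0], thetas[-1], corners

# The corners of a square are equidistant from the center of gravity ...
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(location_feature(square))        # sqrt(2), about 1.41421

# ... and tracing three of its sides gives two 90-degree direction changes.
print(orientation_features(square))    # directions 0 and 180 degrees, 2 corners
```

For a circle-representing trajectory the key-point distances Li are nearly equal, whereas for a square they vary between edge midpoints and corners, which is exactly the distinction Table 1 exploits later.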


Fig. 6. Ideal velocity plots of (a) circle and (b) square representing gestures.

Fig. 7. Ideal acceleration plot for gestures connected sequentially.

2) Number of points N at which the change in direction of hand movement exceeds some predefined threshold T, i.e., |θi+1 − θi| ≥ T. This feature corresponds to the number of significant corners in the gesture trajectory. From a large number of visualization experiments, it is observed that a human can generally perceive a change in direction of hand movement only when the angular displacement is approximately 45° or more. Accordingly, we select the threshold T equal to 45°.

B. Dynamic features

Motion features are computed from the spatial positions of the hand in the gesture trajectory and the time interval between two prominent hand positions, as follows.

1) Velocity feature: In some critical situations, the velocity feature plays a decisive role during the gesture recognition phase. It is based on the observation that different gestures are made at different speeds, as depicted in Fig. 6. For a circle-representing gesture, hand velocity is more or less constant throughout the gesture, whereas in the square gesture, the velocity of the hand decreases at the corner points of the square, and if the hand pauses at the corner points the velocity drops to zero.

2) Acceleration feature: As mentioned earlier, continuous gestures are composed of a series of gestures that as a whole bears some meaning. As a first step towards recognition, a continuous gesture sequence needs to be segmented into its component gestures. However, the process is complicated by the phenomenon of co-articulation, in which one gesture influences the next in the temporal sequence [10]. This happens due to hand movement during the transition from one gesture to the next. The problem is very significant in the case of fluent sign language. Recognition of co-articulated gestures is one of the hardest parts of gesture recognition.
In view of this, we propose the acceleration feature, which may distinguish the co-articulation phase from the meaningful dynamic gesture sequence: during co-articulation the hand moves very quickly, i.e., with high acceleration, just after the completion of one gesture towards the next gesture's start position, as shown in Fig. 7. The acceleration feature, in combination with other features and representations, viz., the FSM representation [8], can efficiently isolate the co-articulation phase from the rest of the gesture sequence.

Computation of motion features: Velocity features are computed from the distance between two consecutive key points and the corresponding time interval, as shown in equation (14). The first derivative of the velocity of the hand motion in successive video frames determines the hand acceleration while performing gestures. The velocity of the hand for a trajectory T(xi, yi, ti), i = 0, 1, 2, ..., N, is calculated as

vi = ((xi+1 − xi)/(ti+1 − ti), (yi+1 − yi)/(ti+1 − ti)), i = 0, 1, 2, ..., N−1    (14)
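Eq. (14) and the derivative-based acceleration can be sketched as follows. Treating acceleration as the first difference of per-interval speed is our reading of the text, not a formula stated in the paper:

```python
# Sketch of Eq. (14): velocity between consecutive key points of a
# trajectory T(x_i, y_i, t_i), plus acceleration as the first difference
# of speed (our assumption about the discretization).

def velocities(traj):
    """traj: list of (x, y, t). Returns [(vx_i, vy_i)] for i = 0..N-1, Eq. (14)."""
    return [((x1 - x0) / (t1 - t0), (y1 - y0) / (t1 - t0))
            for (x0, y0, t0), (x1, y1, t1) in zip(traj, traj[1:])]

def speeds(traj):
    """Speed magnitude on each interval."""
    return [(vx * vx + vy * vy) ** 0.5 for vx, vy in velocities(traj)]

def accelerations(traj):
    """First difference of speed over the corresponding time steps."""
    s = speeds(traj)
    ts = [t for _, _, t in traj]
    return [(s1 - s0) / (t1 - t0)
            for s0, s1, t0, t1 in zip(s, s[1:], ts, ts[1:])]

# The hand covers 1 unit, then 2 units, in equal time steps: it accelerates.
traj = [(0, 0, 0), (1, 0, 1), (3, 0, 2)]
print(velocities(traj))     # [(1.0, 0.0), (2.0, 0.0)]
print(accelerations(traj))  # [1.0]
```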

The following velocity-based features are extracted for a given trajectory:
- average velocity over the whole trajectory length, vavg;
- maximum trajectory velocity, vmax;
- minimum trajectory velocity, vmin;
- number of maxima (Nv,max) in the velocity profile, which corresponds to the number of smooth trajectory segments;
- number of minima (Nv,min) in the velocity profile, which corresponds to the number of corners in the trajectory.

Instead of computing the acceleration feature from each and every frame of the gesture video sequence, we can compute it effectively using either the key points of the extracted trajectory or the key frames of the sequence. This greatly reduces the computational burden.

Normalization of features: For a particular gesture feature, viz., location, velocity or acceleration, normalization is done as follows:

Normmax = max_{i=1,...,M} (Normi)    (15)

vnormi = Normi / Normmax    (16)

where Normi is the ith feature value to be normalized and Normmax is its maximum value, determined from all the M key


points in the gesture trajectory. Finally, from equation (16), the normalized value of the feature vnormi is calculated, which lies between 0.0 and 1.0, and is converted into feature codes by partitioning the range from 0.1 to 1.0.

C. Forming prototype feature vectors and knowledge-base for gesture matching

The set of feature values extracted from the template trajectory for a particular gesture, as described above, forms the prototype feature vector that gives the mathematical description for that class of gesture. The dimensionality of this feature vector is 10, where the ten feature values are:

- trajectory length l,
- location feature Lavg,
- starting hand orientation θs,
- ending hand orientation θe,
- number of significant changes in hand orientation N,
- average velocity vavg,
- maximum velocity vmax,
- minimum velocity vmin,
- number of maxima in the velocity profile Nv,max, and
- number of minima in the velocity profile Nv,min.
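The normalization of Eqs. (15)-(16), applied per feature before the values are assembled into the prototype vector, can be sketched as:

```python
# Sketch of Eqs. (15)-(16): normalize a feature by its maximum over the
# M key points, so all values fall in [0.0, 1.0].

def normalize(values):
    """values: one feature measured at the M key points."""
    vmax = max(values)                   # Eq. (15)
    return [v / vmax for v in values]    # Eq. (16)

print(normalize([0.5, 1.0, 2.0]))  # [0.25, 0.5, 1.0]
```

The normalized values are then quantized into feature codes, as described above, so that trajectories of different absolute sizes and speeds become directly comparable.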

Finally, all prototype feature vectors, one per gesture class, together form the knowledge-base which is used for gesture matching during classification.

IV. EXPERIMENTAL RESULTS

We have tested altogether ten different hand trajectories in view of special applications like robot control and gesture-based window menu activation on a human-computer interaction (HCI) platform. They are shown in Fig. 8. The extracted trajectories for four of the gesture sequences are shown in Fig. 9; the first row shows the extracted trajectories and the second row shows the smoothed trajectories. Trajectories extracted from the key VOPs of the gesture video sequence are shown in Fig. 10. Our proposed trajectory estimator gives about 99% accuracy in finding the actual trajectory. The accuracy criterion is fixed in terms of shape similarity between the extracted gesture trajectory and the corresponding template/prototype trajectory. In the next step, all the derived trajectories are normalized and aligned in time using the Dynamic Time Warping (DTW) technique, as described in [11]. Then, from these normalized trajectories, we extract the trajectory features following the method described in this paper. Subsequently, we use these features for trajectory recognition. In our experiment, we achieved 95% accuracy. The high recognition rate in identifying the different forms of the trajectories may be attributed to the proposed trajectory feature extraction process. The trajectory features used here give distinct measures from one trajectory to another, making them easily distinguishable. As an example, consider the normalized location feature for the circle and square gestures given in Table 1, where the number of key points for both trajectories is predefined. For the circular trajectory, the locations from the center of gravity to the predefined key points are almost the same for all the trajectory key points, whereas the location feature is not constant along the whole periphery for the square-representing trajectory. As a second example, Table 2 illustrates the variation in the acceleration of the hand during the transition from one gesture to another. The connected gesture sequences in our experiment are the transitions from the '1' to the '2' and from the '5' to the '1' representing hand gesture trajectories. It is seen that during coarticulation, hand acceleration is significant as compared to the acceleration during gesticulation.
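The time alignment step mentioned above can be sketched with a generic DTW recurrence. This is a textbook formulation under our own assumptions, not the specific variant of [11]:

```python
# A generic DTW sketch: align two 1D feature sequences and return the
# cumulative alignment cost (0.0 for sequences that warp onto each other).

def dtw(a, b, dist=lambda p, q: abs(p - q)):
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

# A repeated sample is absorbed by the warping, so the cost is zero.
print(dtw([1, 2, 3], [1, 2, 2, 3]))  # 0.0
```

Aligning trajectories in time this way removes speed differences between performances of the same gesture before the features are compared.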

Fig. 9. Estimated and smoothed trajectories.



Fig. 10. Key VOP based trajectory estimation.

V. CONCLUSIONS AND DISCUSSION

The advantage of the VOP based method for segmentation of the hand image is that no extra computation for rotation and scaling of the object is required, since the shape change is represented explicitly by a sequence of two-dimensional models, one corresponding to each image frame. Moreover, the trajectory estimated from the corresponding VOPs bears the spatial information of the dynamic gesture, which is required in the gesture classification stage. During trajectory guided recognition,

Fig. 8. Hand gestures showing different motion trajectories.


the extracted features serve the purpose of efficient classification of gestures. Also, the proposed acceleration feature works well only when the spatial end position of the preceding gesture is different from the start position of the next gesture in the connected gesture sequence.

Table 1: Normalized Location Feature*

Gesture | Location feature at the key points
--------|-----------------------------------------------
Circle  | 0.320 0.315 0.300 0.330 0.322 0.321 0.321 0.301
Square  | 0.521 0.322 0.541 0.300 0.550 0.310 0.540 0.330

* An equal number of key points is taken for both gestures.

Table 2: Acceleration feature for connected gesture sequences*

Gesture sequence | 1st gesture phase | Coarticulation phase | 2nd gesture phase
-----------------|-------------------|----------------------|------------------
1 -> 2           | 0.342             | 0.783                | 0.338
5 -> 1           | 0.257             | 0.845                | 0.276

* The average value is taken for a particular gesture phase.

REFERENCES

[1] D.L. Quam, "Gesture recognition with a data glove," Proc. IEEE Conf. National Aerospace and Electronics, vol. 2, pp. 755-760, 1990.
[2] D.J. Sturman and D. Zeltzer, "A survey of glove-based input," IEEE Computer Graphics and Applications, vol. 14, pp. 30-39, 1994.
[3] M.H. Yang, N. Ahuja, and M. Tabb, "Extraction of 2D motion trajectories and its application to hand gesture recognition," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 8, pp. 1061-1074, 2002.
[4] M.K. Bhuyan, D. Ghosh, and P.K. Bora, "Estimation of 2D motion trajectories from video object planes and its application in hand gesture recognition," Lecture Notes in Computer Science (PReMI '05), Springer-Verlag, LNCS 3776, pp. 509-514, 2005.
[5] D.P. Huttenlocher, J.J. Noh, and W.J. Rucklidge, "Tracking non-rigid objects in complex scenes," Proc. Fourth International Conference on Computer Vision, pp. 93-101, 1993.
[6] M.K. Bhuyan, D. Ghosh, and P.K. Bora, "Automatic video object plane generation for recognition of hand gestures," Proc. International Conf. Systemics, Cybernetics and Informatics (ICSCI), pp. 147-152, 2005.
[7] B.S. Manjunath, P. Salembier, and T. Sikora, eds., Introduction to MPEG-7: Multimedia Content Description Interface, John Wiley and Sons, Ltd., 2002, pp. 273-276.
[8] M.K. Bhuyan, D. Ghosh, and P.K. Bora, "Finite state representation of hand gestures using key video object plane," Proc. IEEE Region 10 Asia-Pacific Conf. (TENCON), pp. 21-24, 2004.
[9] M.K. Bhuyan, D. Ghosh, and P.K. Bora, "Key video object plane selection by MPEG-7 visual shape descriptor for summarization and recognition of hand gestures," Proc. Fourth Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP), pp. 638-643, 2004.
[10] A. Shamaie, W. Hai, and A. Sutherland, "Hand gesture recognition for HCI," ERCIM News (online edition), http://www.ercim.org/publication/Ercim News, no. 46, 2001.
[11] M.K. Bhuyan, D. Ghosh, and P.K. Bora, "Trajectory guided recognition of hand gestures for human computer interface," Proc. 2nd Indian International Conf. on Artificial Intelligence (IICAI), pp. 312-327, 2005.
