Pi19404
December 23, 2012
Contents
Gesture Recognition using HMM
0.1 Gesture
0.2 Hidden Markov Model
0.3 Gestures and HMM
0.3.1 Gestures Representation
0.3.2 Normalizing Gestures
0.3.3 Discretization of Gestures
0.4 References
0.1 Gesture
A gesture is represented as a spatio-temporal sequence of feature vectors that describe the direction of hand movement. These continuous-valued feature vectors are converted to codewords from a predefined set using a vector quantizer. We construct a codebook for each gesture and build an HMM that uses the spatio-temporal information encoded by the codewords representing the gesture. A unique initial state and a unique final state are defined in the model. The number of states in the model is determined by the complexity of the gesture. A larger number of states represents the gesture better, but at the cost of performance.
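The conversion of movement directions to codewords can be sketched as follows. This is only an illustration under stated assumptions: the codebook of 8 evenly spaced unit direction vectors and the names make_direction_codebook and quantize_directions are hypothetical and do not come from the original work, which builds its codebook from data.

```python
import math

def make_direction_codebook(n_codewords=8):
    """Hypothetical codebook: unit vectors evenly spaced around the circle."""
    return [(math.cos(2 * math.pi * k / n_codewords),
             math.sin(2 * math.pi * k / n_codewords))
            for k in range(n_codewords)]

def quantize_directions(points, codebook):
    """Map each movement between consecutive points to its nearest codeword index."""
    symbols = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        norm = math.hypot(dx, dy)
        if norm == 0:
            continue  # no movement, so there is no direction to encode
        ux, uy = dx / norm, dy / norm
        # For unit vectors the nearest codeword has the largest dot product.
        best = max(range(len(codebook)),
                   key=lambda k: ux * codebook[k][0] + uy * codebook[k][1])
        symbols.append(best)
    return symbols

path = [(0, 0), (1, 0), (2, 1), (2, 2), (1, 2)]
print(quantize_directions(path, make_direction_codebook()))  # [0, 1, 2, 4]
```

With 8 codewords, index 0 corresponds to movement along the positive x axis and the indices increase counter-clockwise in steps of 45 degrees.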
0.2 Hidden Markov Model
A hidden Markov model (HMM) is a collection of finite states connected by transitions. Each state is characterized by two sets of probabilities: a transition probability and an output probability distribution, which is discrete in the model used here. A hidden Markov model is defined using:
Q - sequence of hidden states
O - sequence of observed states
S - input symbol set
V - observation symbol set
Q0 - initial state of the system
A - transition matrix
B - emission matrix
For each gesture class we create a separate hidden Markov model.
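To make the roles of A and B concrete, the sketch below evaluates the probability of an observation sequence under a discrete HMM with the standard forward algorithm. The function name forward_likelihood, the vector pi of initial state probabilities and the small two-state example are assumptions made for illustration; the original text does not say how the likelihood was computed.

```python
def forward_likelihood(pi, A, B, observations):
    """P(observations | model) computed with the forward algorithm.

    pi[i]   : probability of starting in hidden state i
    A[i][j] : probability of a transition from state i to state j
    B[i][k] : probability that state i emits observation symbol k
    """
    n = len(pi)
    # Initialisation with the first observation.
    alpha = [pi[i] * B[i][observations[0]] for i in range(n)]
    # Induction over the remaining observations.
    for o in observations[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    # Termination: sum over all possible final states.
    return sum(alpha)

# Hypothetical two-state model with a unique initial state.
pi = [1.0, 0.0]
A = [[0.5, 0.5],
     [0.0, 1.0]]
B = [[0.9, 0.1],
     [0.2, 0.8]]
likelihood = forward_likelihood(pi, A, B, [0, 1])
print(likelihood)  # approximately 0.405
```

For long observation sequences the product of probabilities underflows, so a practical implementation would normally work with scaled values or in the log domain.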
0.3 Gestures and HMM
0.3.1 Gestures Representation
The task of gesture recognition is to decide, given a gesture, whether it belongs to a known class of gestures. This requires a mathematical representation of gestures and a means of comparing two gestures. Consider the case of a 2D gesture. A point of a 2D gesture is represented by its coordinates (x, y), so the gesture can be represented as a spatio-temporal sequence of 2D points. The first step is to record the gesture. We therefore record some examples of the following two types of gesture.
0.3.2 Normalizing Gestures
The gesture representation must be independent of the scale and position at which the gesture is performed in 2D space, so the gestures need to be normalized. All gestures are scaled to the same size, aligned with each other and represented using the same number of points. Scaling is performed first, followed by translation and then re-sampling. To scale the points, calculate the minimum and maximum values in each dimension and apply a linear scaling to the points. To center the gesture about the origin, calculate the centroid of the points and translate all points with respect to the centroid.
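The scaling and translation steps described above can be sketched as follows. The function name normalize_gesture and the choice of the unit interval as the target range are assumptions for illustration. Note that scaling each dimension independently, as described, does not preserve the aspect ratio of the gesture.

```python
def normalize_gesture(points):
    """Scale each dimension linearly to [0, 1], then center on the centroid."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, min_y = min(xs), min(ys)
    # Guard against a zero range (e.g. a perfectly horizontal stroke).
    span_x = (max(xs) - min_x) or 1.0
    span_y = (max(ys) - min_y) or 1.0
    scaled = [((x - min_x) / span_x, (y - min_y) / span_y) for x, y in points]
    # Translate so that the centroid lies at the origin.
    cx = sum(x for x, _ in scaled) / len(scaled)
    cy = sum(y for _, y in scaled) / len(scaled)
    return [(x - cx, y - cy) for x, y in scaled]

square = [(10, 20), (30, 20), (30, 60), (10, 60)]
print(normalize_gesture(square))
# [(-0.5, -0.5), (0.5, -0.5), (0.5, 0.5), (-0.5, 0.5)]
```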
The plots below show the gestures after the scaling and translation steps.
(a) gesture 1
(b) gesture 2
Figure 1: Normalized gesture plots
The next step is to re-sample the gestures so that all of them are represented by the same number of points. The simplest strategy is uniform re-sampling. Below are example gestures after re-sampling.
(a) gesture 1
(b) gesture 2
Figure 2: Normalized gesture plots
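A minimal sketch of uniform re-sampling is given below. It assumes that "uniform" means equal spacing along the length of the drawn path, with linear interpolation between recorded points; the original text does not specify this, and the name resample_uniform is hypothetical.

```python
import math

def resample_uniform(points, n):
    """Return n points spaced evenly along the polyline through `points`."""
    if n < 2 or len(points) < 2:
        raise ValueError("need at least two input points and n >= 2")
    # Cumulative path length at every recorded point.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    resampled = []
    seg = 0
    for k in range(n):
        target = total * k / (n - 1)
        # Advance to the segment that contains the target path length.
        while seg < len(points) - 2 and cum[seg + 1] < target:
            seg += 1
        seg_len = cum[seg + 1] - cum[seg]
        t = 0.0 if seg_len == 0 else (target - cum[seg]) / seg_len
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        resampled.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return resampled

print(resample_uniform([(0, 0), (4, 0)], 5))
# [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
```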
0.3.3 Discretization of Gestures
We require a mathematical representation of the gesture. Consider the ideal gesture that is to be identified. Each discrete pixel position can be represented as a symbol, so the gesture can be represented as a sequence of symbols. However, with one symbol per pixel the number of symbols in this case would be 1600. Since a gesture does not occupy all the pixels, such a representation is typically sparse and highly redundant. Computations on such a large representation are expensive and not suitable for real-time applications. Since the gesture is represented by 30 points, we require only 30 symbols to represent a particular gesture. It is therefore desirable to represent the gesture with a reduced set of symbols. Even 30 symbols may be too many to achieve real-time performance, so the symbol set is reduced further by clustering the gesture points (Figure 3).
(a) gesture 1
(b) gesture 2
Figure 3: Result after Clustering
Each centroid can be thought of as an observed state, and each data point is represented by its centroid (symbol). We have thus discretized the continuous data into discrete symbols. A gesture is defined as a sequence of the observed states estimated from the training data: each input data point is associated with an observed state by the discretization process, and the gesture is represented as the resulting sequence of observed states. During training, the model parameters that maximize the likelihood of the observed sequences are determined. The parameters to be chosen are the number of input states and the number of output states. Based on the number of output states, the gestures are discretized as sequences of observed states.
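The discretization step can be sketched with a small k-means clustering routine. The original text mentions clustering and centroids but does not name the algorithm, so k-means, the deterministic initialisation and the names kmeans, nearest and discretize are all assumptions made for illustration.

```python
def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: (point[0] - centroids[i][0]) ** 2
                             + (point[1] - centroids[i][1]) ** 2)

def kmeans(points, k, iterations=20):
    """Plain k-means; initial centroids are k points spread along the sequence."""
    step = len(points) / k
    centroids = [points[int(i * step)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Recompute each centroid; keep the old one if its cluster is empty.
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids

def discretize(points, centroids):
    """Represent a gesture as the sequence of its nearest-centroid symbols."""
    return [nearest(p, centroids) for p in points]

gesture = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
codebook = kmeans(gesture, k=2)
print(discretize(gesture, codebook))  # [0, 0, 0, 1, 1, 1]
```

The number of clusters k plays the role of the number of output states: a smaller k gives a more compact symbol set at the price of a coarser description of the gesture.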
P1 = P2 = [1.0000 0.0000 0.0000 0.0000]
We compute the likelihood of the validation/training data and set the threshold for each gesture to twice the average likelihood. For the present data we obtain TA = 79.4310 and TB = 60.5704. Given the test gestures, their likelihoods are then computed against each HMM. For the first model, the likelihoods of the 4 test gestures are 42.296347, 142.296347, 30.364364 and 70.747603. The third gesture is incorrectly identified as being generated by the model.
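The threshold rule can be sketched as below. Two things are assumptions rather than statements from the text: the training scores used to illustrate the threshold computation are hypothetical, and the reported values are treated as costs (for example negative log-likelihoods), so that a gesture is accepted when its value falls below the threshold. The latter reading is consistent with the third test gesture being accepted by the first model.

```python
def rejection_threshold(training_scores, factor=2.0):
    """Threshold defined as `factor` times the average training score."""
    return factor * sum(training_scores) / len(training_scores)

def accepts(score, threshold):
    """Assumption: the score is a cost, so smaller values mean a better match."""
    return score < threshold

# Hypothetical training scores, used only to show the threshold computation.
print(rejection_threshold([38.0, 41.0]))  # 79.0

# Threshold and test values reported in the text for the first model.
T_A = 79.4310
test_scores = [42.296347, 142.296347, 30.364364, 70.747603]
decisions = [accepts(s, T_A) for s in test_scores]
print(decisions)  # [True, False, True, True]
```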
(a) gesture 1
(b) gesture 1
(c) gesture 1
(d) gesture 1
Figure 4: Test Data
Different numbers of samples, input states and observed symbols were tried in order to find reasonable model parameters. In the final configuration, 120 samples were chosen to represent each gesture, and the numbers of input and output symbols were both chosen to be 20. The thresholds in this case were 405.235169 and 389.913413.
The values of the 4 gestures for model 1 were 212.277444, Inf, 373.451343 and 325.134281.
The values of the 4 gestures for model 2 were 560.252356, 210.214212, 717.249978 and 439.386705.
Even in this simple model, although the respective gestures are identified correctly, some incorrect gestures keep occurring in spite of the threshold-based approach used to eliminate gestures with lower likelihood. The gestures that were incorrectly identified form part of the gesture of model 1. A template-based approach gives better performance for such gestures. It remains to be checked how the system can be improved in this respect. This aspect needs to be improved if HMMs are to be used in a real-time system, where many invalid gestures would occur and the model must be capable of rejecting them.
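The decision across both models can be sketched using the values reported above. As before, the values are assumed to behave like costs (smaller is better, Inf means the model cannot generate the sequence), and the rule of picking the accepted model with the smallest value is an illustrative assumption, not necessarily the rule used in the original experiments.

```python
import math

# Thresholds and per-gesture values as reported in the text.
THRESHOLDS = {"model 1": 405.235169, "model 2": 389.913413}
SCORES = {
    "model 1": [212.277444, math.inf, 373.451343, 325.134281],
    "model 2": [560.252356, 210.214212, 717.249978, 439.386705],
}

def classify(scores_for_gesture, thresholds):
    """Return the accepted model with the smallest score, or None if all reject."""
    accepted = {model: score
                for model, score in scores_for_gesture.items()
                if score < thresholds[model]}
    if not accepted:
        return None
    return min(accepted, key=accepted.get)

results = [classify({m: SCORES[m][i] for m in SCORES}, THRESHOLDS)
           for i in range(4)]
print(results)  # ['model 1', 'model 2', 'model 1', 'model 1']
```

A gesture whose value exceeds the threshold of every model is returned as None, which is the rejection behaviour a real-time system would need for invalid gestures.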
0.4 References
1. http://cs229.stanford.edu/section/cs229-hmm.pdf
2. http://www.creativedistraction.com/demos/gesture-recognition-kinect-with-hidden-marko