
Hand Gesture based Interface for Aiding Visually Impaired

Meenakshi Panwar, Centre for Development of Advanced Computing, Noida, Uttar Pradesh, India
Abstract—Hand gesture recognition is a growing and very broad field of research. Much work has been done, and much remains to be done, toward providing an intuitive, innovative and natural way of non-verbal communication that is familiar to human beings. Gesture recognition is widely used in sign language, alternative computer interfaces, immersive game technology, etc. The aim of this paper is to present a hand gesture recognition system that provides an interface for aiding visually impaired users, based on the detection of useful shape-based features such as orientation, area, centroid, extrema, location, and the presence of fingers and thumb in the image. The approach discussed in this paper depends solely on the shape of the hand gesture. It does not use the color or texture of the image, which vary under different lighting conditions and other influences. The approach uses pre-processing steps to remove background noise and employs K-means clustering to segment the hand object, so that only the segmented hand cluster is processed for shape-based features. It can recognize around 36 different gestures on the basis of a 7-bit binary sequence generated as the output of the algorithm. The implemented approach has been tested on 360 images and gives an approximate recognition rate of 94%. One of the great benefits of this algorithm is that it takes only a fraction of a second to recognize a hand gesture, which makes it computationally efficient compared to other existing approaches. The proposed algorithm is simple, independent of user characteristics, and does not require any training of data as in HMM or neural network approaches.

Keywords—Image processing, hand gesture recognition, human-computer interaction, K-means clustering, shape-based feature detection.

I. INTRODUCTION

A. Motivation
Gestures and gesture recognition systems provide a large scope for innovation. Gestures are basically physical actions formed by a person in order to convey some meaningful information. A gesture recognition system is created to give these gestures a unique interpretation after recognition and classification, forming an intuitive and more convenient way of interaction. There is great emphasis on using the hand gesture as a new input modality in various computer applications through the use of computer vision. The hand gesture, which represents ideas and actions using different hand shapes, orientations or finger patterns, when interpreted by a gesture recognition system that generates a corresponding event, has the potential to provide a unique interface to the computer system. This type of interaction is at the heart of immersive virtual environments. With the development of virtual environments, current user interaction approaches based on the mouse, keyboard and pen are no longer sufficient, so gesture recognition has become a very popular research direction in human-computer interaction, sign language and computer vision. If for a moment we consider interaction among human beings and remove ourselves from the world of computers, we quickly realize that we use a broad range of gestures in our daily life. Gestures vary greatly among cultures and contexts, but are still intimately used in communication. In fact, it has been shown that people gesticulate as much when they talk on the telephone, unable to see each other, as they do in face-to-face communication. This significant use of gestures as a mode of interaction in daily life motivates the use of gestural interfaces in the modern era. Available advanced technology should be able to recognize, classify and interpret different hand gestures and use them in a wide range of applications through computer vision.

B. Related Work
Gesture recognition has become an influential term over the past decades. Many gesture recognition techniques have been developed for tracking and recognizing various hand gestures, each with its own advantages and drawbacks. First is wired technology, in which users need to tether themselves in order to interface with the computer system. With wired technology the user cannot move freely around the room, being limited by the length of the wires connecting to the computer system. One instance of wired technology is instrumented gloves, also called electronic gloves or data gloves. An instrumented glove contains sensors which provide information about hand location, orientation, etc. Data gloves give highly accurate results but are too expensive for a broad range of applications. Data gloves were then replaced by optical markers. These markers project infra-red light and reflect it onto a screen to provide information about the exact location of the hand or the tips and knuckles of the fingers wherever the markers are worn. These systems also give good results but require a very complex configuration. Then more advanced, image-based techniques were introduced, which

978-1-4673-0255-5/12/$31.00 © 2012 IEEE


require processing of image features like color, texture, etc. If we work with the color and texture features of the image for hand gesture recognition, the results may vary, since skin tones change from person to person and from one continent to another. Also, under different illumination conditions, color and texture get modified, leading to changes in the observed results. As an alternative for the same purpose, we employ different shape-based features for hand gesture recognition. Under normal conditions every person has almost the same hand shape, with one thumb and four fingers. The approach discussed in paper [1] for hand gesture recognition based on shape features depends heavily on constraints such as the location of the hand in the image for orientation detection, a proper gap between the fingers, etc.; if a user fails to follow these constraints, the results may degrade. In paper [2], the approach is based on the calculation of three combined features of the hand shape: compactness, area and radial distance. Compactness is the ratio of squared perimeter to area of the shape. If the compactness of two hand shapes is equal, they are classified as the same; in this way the approach limits the number of gesture patterns that can be classified using these three shape descriptors, and only 10 different patterns have been recognized [2]. The algorithm discussed and implemented in this paper is broadly divided into four steps. The first is image pre-processing and segmentation of the hand object in the image using K-means clustering. The second step is orientation detection, performed in order to categorize images into vertical and horizontal classes. The third step calculates the essential features required for hand pattern detection and generates the 7-bit sequence for 36 different hand shapes. Finally, the generated bits are used for assigning different actions to the various hand gestures. The proposed approach is designed and implemented to work on a single hand gesture with a uniform background.

II. THE IMPLEMENTED ALGORITHM

The flowchart of the algorithm is shown in Fig. 1 and its main steps are discussed in the following sections.

Figure 1: The flowchart of the implemented algorithm (Input image → Preprocessing and segmentation → Orientation detection → Centroid, thumb and finger region detection → Features extraction → Classification and bits generation → Hand gesture recognition)

A. Image Segmentation
Image pre-processing is necessary for getting good results. In this algorithm we take an RGB image as the input. Image segmentation is performed to locate the hand object and its boundaries in the image. It assigns a label to every pixel such that pixels which share certain visual characteristics have the same label. To segment the hand object from the rest of the image we employ the K-means algorithm. K-means clustering is an iterative technique used to partition the image into K clusters. It finds a partition in which objects within the same cluster are as close to each other as possible and as far as possible from objects in other clusters. K-means computes the centroid of each cluster and iteratively minimizes the sum of distances from each object to its cluster centroid, moving objects among clusters until the sum cannot be decreased further. The result is a set of clusters that are compact in themselves and well separated from each other. Since our images have a uniform plain background and only one hand object, we require only two clusters: one representing the hand object and the other the background. Cluster 1, which represents the hand, has all pixel values set to 1, and the background cluster has 0-intensity pixels. To reduce background noise, we remove all small insignificant smudges or connected components from the image and fill the holes that cannot be reached by filling in the background from the edge of the image. After hand segmentation we need to find the boundary contours to locate the hand in the image. This is done by scanning the image from top to bottom and left to right; the first white pixel encountered is set as the leftmost point of the hand. We then scan from right to left, top to bottom, and the first white pixel found is set as the right side of the hand. This gives the left and right bounds of the hand in the image. Within these bounds we perform a horizontal scan from left to right and top to bottom; the first white pixel encountered is fixed as the topmost point of the hand. The hand extends from the bottommost part of the image, so no scanning is needed to locate its end.
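The segmentation and localization steps above can be sketched as follows. This is a minimal illustration, not the authors' MATLAB code: a hand-rolled two-cluster K-means on pixel intensities (the paper clusters a plain-background image into hand and background), followed by the scans that give the left, right and top bounds of the hand. The function names and the intensity-based clustering criterion are assumptions of the sketch.

```python
import numpy as np

def segment_hand(gray, iters=10):
    """Two-cluster K-means on pixel intensities: hand vs. background.

    `gray` is a 2-D float array; returns a binary mask in which the
    brighter cluster (assumed to be the hand) is 1.
    """
    px = gray.reshape(-1, 1).astype(float)
    centers = np.array([px.min(), px.max()])        # initial centroids
    for _ in range(iters):
        labels = np.argmin(np.abs(px - centers), axis=1)
        for k in (0, 1):
            if np.any(labels == k):
                centers[k] = px[labels == k].mean()
    mask = labels.reshape(gray.shape)
    if centers[0] > centers[1]:                     # make cluster 1 the hand
        mask = 1 - mask
    return mask

def bounding_box(mask):
    """Left, right and top bounds of the hand; the hand is assumed to
    extend from the bottom row, so no bottom scan is needed."""
    cols = np.where(mask.any(axis=0))[0]
    rows = np.where(mask.any(axis=1))[0]
    return cols[0], cols[-1], rows[0]               # left, right, top
```

In practice the mask would also be cleaned of small connected components and filled holes, as the text describes, before the bounds are computed.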


B. Orientation Detection
After the segmentation of the hand in the image, we proceed to the second step, orientation detection. We process two types of images: horizontal and vertical. So in this step,

Figure 2: Input image, Cluster image, Enhanced image and localized hand object

2012 International Conference on Recent Advances in Computing and Software Systems


mainly we identify whether the hand is vertical or horizontal. We compute the length and width of the bounding box under the assumption that if the hand is vertical the length of the bounding box is greater than its width, and if the width is greater than the length the image contains a horizontal hand. We compute the ratio of length to width of the bounding box: if it is greater than 0.9 the hand is classified as vertical, otherwise as horizontal. In this way we separate the hand patterns into two categories, horizontal and vertical. Our previous approach for orientation detection, described in paper [1], was limited by the constraint that the hand object should not touch the corner of the image but should lie somewhere near the middle. That approach depends on tracing the boundary matrices, or edges, of the hand in the binary image. For a horizontal hand, whenever the x-boundary equals 1 while the y-boundary increases for some span, the hand is classified as horizontal; and when the y-boundary equals the maximum image size while the x-boundary increases, the hand is set as vertical. If a vertical hand touches the left corner of the image it shows x = 1 and y = max, which in turn gives an error or unexpected result. To overcome this limitation we introduced the current approach for orientation detection. For better results, the wrist should cover some portion of the bounding box.
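The bounding-box rule above can be sketched directly (function name assumed; the 0.9 threshold is the paper's value). Since the paper later rotates horizontal images by 90 degrees so that all further processing sees one orientation, the sketch does the same:

```python
import numpy as np

def normalize_orientation(mask, thresh=0.9):
    """Classify the hand as vertical or horizontal from the
    bounding-box length-to-width ratio (threshold 0.9 as in the
    text) and rotate horizontal hands so that later steps only
    ever see vertical hands."""
    rows = np.where(mask.any(axis=1))[0]
    cols = np.where(mask.any(axis=0))[0]
    length = rows[-1] - rows[0] + 1      # vertical extent of the box
    width = cols[-1] - cols[0] + 1       # horizontal extent
    if length / width > thresh:
        return "vertical", mask
    return "horizontal", np.rot90(mask)
```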


Figure 3: Horizontal image and Vertical image

C. Features Extraction
1) Centroid: In this step we calculate the centroid for differentiating hand patterns based on the index finger and little finger. By assuming that the index finger and little finger positions are distinct from the centroid location in the hand image, we can differentiate many hand gestures. The partition of the hand is very important: hand gesture patterns which contain the index finger fall on the left-hand side of the centroid, and patterns which contain the little finger are found on the right side of the centroid location in the image. Patterns which contain both the index and little finger need to satisfy both centroid-location conditions to generate a unique 7-bit sequence for recognition. The centroid is always computed at the geometric center of the image, and it is also called the center of mass if the image is uniformly distributed. It is calculated from the image moments, which are weighted averages of the pixel intensities of the image [1]:

Mij = Σx Σy x^i y^j I(x, y)    (1)

where Mij is the image moment and I(x, y) is the intensity at coordinate (x, y).

{x̄, ȳ} = {M10/M00, M01/M00}    (2)

Using equation (2) we compute the coordinates of the centroid, where x̄ and ȳ are the centroid coordinates and M00 is the area of the binary image.

Figure 4: Centroid of image
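The moment computation of equations (1) and (2) reads directly as code (function names assumed; x is the column index and y the row index):

```python
import numpy as np

def image_moment(mask, i, j):
    """Raw moment M_ij = sum_x sum_y x^i y^j I(x, y) of a binary image."""
    ys, xs = np.nonzero(mask)            # row (y) and column (x) indices
    return int(np.sum((xs ** i) * (ys ** j)))

def centroid(mask):
    """Centroid (x, y) = (M10/M00, M01/M00); M00 is the hand area."""
    m00 = image_moment(mask, 0, 0)
    return image_moment(mask, 1, 0) / m00, image_moment(mask, 0, 1) / m00
```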

2) Thumb detection: The thumb detection step detects the presence or absence of the thumb in the hand gesture. In general the thumb can reside either at the left side or at the right side of the hand, so we consider the thumb a significant shape feature for classifying hand gestures. To detect it, we process the previously calculated bounding box and divide it into a left side and a right side: taking 30 pixels as the width from each side of the bounding box, we crop two regions, the left box represented by a green boundary and the right box represented by a blue boundary in the image shown below. After getting these two boxes, we count the total number of white pixels in the binary image, which represent the hand object, and then count the white pixels in each box. If less than 6.9% of the total white pixels lie in either the left box or the right box, we consider the thumb to be present in that box. If both boxes contain less than 6.9% of the total white pixels, the thumb is not present in either box, because there is only one thumb and it cannot be found on both sides of the same hand pattern; and if both boxes contain more than 6.9% of the total white pixels, the thumb is likewise absent. The value of 6.9% was chosen experimentally after testing more than 400 images. The method is applicable to both hand categories, horizontal as well as vertical. Proper hand orientation is needed for correct results, since the results are highly influenced by variations in orientation. Fig. 5 shows the partition of the bounding box into the green and blue boxes. In Fig. 5(b) the thumb is detected at the left-hand side, in the green box, where the counted white pixels are less than 6.9% of the total.
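A sketch of the side-strip test described above (names and the exact cropping are assumptions; the 30-pixel strip width and 6.9% threshold are the paper's values):

```python
import numpy as np

def detect_thumb(mask, strip=30, frac=0.069):
    """Report 'left', 'right', or None for thumb presence.

    A strip of `strip` pixels is taken from each side of the bounding
    box; a side whose share of the total white pixels is below `frac`
    is considered to hold the thumb.  If both sides or neither side
    fall below the threshold, no thumb is reported.
    """
    cols = np.where(mask.any(axis=0))[0]
    box = mask[:, cols[0]:cols[-1] + 1]          # crop to hand width
    total = box.sum()
    left_sparse = box[:, :strip].sum() / total < frac
    right_sparse = box[:, -strip:].sum() / total < frac
    if left_sparse != right_sparse:              # exactly one sparse side
        return "left" if left_sparse else "right"
    return None
```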



(a) (b) Figure 5: Thumb detection

3) Finger region detection: To obtain the number of fingers raised in a hand pattern, we need to process only the upper portion of the hand, which includes all raised fingers. For this task we consider only the upper 25% of the bounding box for one set of gestures and 18% for another set. We compute the extreme points which fall within this area of the bounding box to obtain the finger count. The extrema form an 8-by-2 matrix that specifies the extreme points in the selected region; each row contains the x- and y-coordinates of one extreme point, in the format [top-left, top-right, right-top, right-bottom, bottom-right, bottom-left, left-bottom, left-top]. Extrema found in this area represent the knuckles of the fingers. It is possible that some insignificant extrema found in this region do not represent a raised finger but rather part of a folded finger. To remove these unwanted extrema, we put a threshold on the area below which a region is ignored during finger counting. This threshold may vary from 10 px to 30 px depending on the type of gesture. After obtaining the number of fingers, i.e. extrema, in the selected region, we apply conditions on these extreme points to determine the actual hand gesture pattern. Two or more patterns having the same number of extrema, i.e. the same finger count, are differentiated using the location of the centroid relative to the coordinates of the extrema. In this way the resulting 7-bit sequence differs for hand gestures with the same number of fingers. Fig. 6 shows the target region processed to calculate the number of fingers in the hand gesture. As shown in Fig. 6, three extrema are found in this hand pattern, indicating three raised fingers; we use only three of the eight extreme points mentioned above for generating the bit sequence.
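The finger-count step can be illustrated with a simplified stand-in for the extrema test: examine only the top fraction of the bounding box and count sufficiently wide runs of occupied columns. The `min_width` filter plays the role of the paper's 10-30 px area threshold on insignificant extrema; everything here is an assumption of the sketch, not the authors' code.

```python
import numpy as np

def count_fingers(mask, top_frac=0.25, min_width=3):
    """Count raised fingers in the top `top_frac` of the bounding box
    (25% or 18% in the paper, depending on the gesture set)."""
    rows = np.where(mask.any(axis=1))[0]
    top, bottom = rows[0], rows[-1]
    depth = top + int(np.ceil((bottom - top + 1) * top_frac))
    occupied = mask[top:depth].any(axis=0)   # columns touched by a finger
    fingers, run = 0, 0
    for col in np.append(occupied, False):   # trailing False flushes a run
        if col:
            run += 1
        else:
            if run >= min_width:             # ignore insignificant blobs
                fingers += 1
            run = 0
    return fingers
```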

D. Classification and Bits Generation
The extreme points and the centroid position relative to each other are processed in order to generate a unique 7-bit binary sequence for the 36 different patterns. The first bit of the sequence represents the orientation: 1 if the hand is vertical, otherwise 0. The second bit is allotted for thumb presence: 1 if the thumb is present in the hand gesture, otherwise 0. The next three bits encode the total number of raised fingers in binary, so that one raised finger is coded as 001 and two fingers as 010. Since a hand gesture can have at most four raised fingers, these three bits reach at most 100. The last two bits of the sequence are very important because they differentiate among hand patterns with the same number of fingers. They are set to 1 or 0 by tracing the locations of the extreme points of each extremum and the centroid. For example, suppose we have three different hand gestures with two raised fingers and the same orientation and thumb state. In this case only the last two bits differentiate these gestures. Consider Fig. 7(a): in this pattern we get two extrema (fingers), one for the index finger and one for the middle finger. We compare the location of the first extreme point [top-left] of the first extremum with the locations of the centroid and the other extremum. If the x-coordinate of the top-left extreme point of the first extremum is less than the x-coordinate of the centroid, and the y-coordinate of the top-right extreme point of the second extremum is less than the y-coordinate of the top-right extreme point of the first extremum, the last two bits are set to 00. In Fig. 7(b) we trace the locations of both extrema with respect to the centroid: if the first extremum falls on the left side and the second on the right side of the centroid, the last two bits are set to 01. In Fig. 7(c), if the x-coordinates of the top-right extreme points of both extrema are greater than the x-coordinate of the centroid, the last two bits are set to 10. As shown in Fig. 7, these three patterns get different bit sequences solely because of the last two bits. Similarly, we generate bit codes for several hand gesture patterns with the same number of fingers but different shapes. The seven-bit code thus generated is used for classifying gestures and assigning the various actions that support human-computer interaction.

Figure 6: Extreme points and target region

(a) 1001000

(b) 1001001

(c) 1001010

Figure 7: Hand gesture with bits code
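The 7-bit assembly rule reads directly as code (a sketch; the two shape bits are assumed to be computed beforehand by the extremum/centroid comparisons described above):

```python
def gesture_code(vertical, thumb, fingers, shape_bits):
    """Build the 7-bit gesture string: 1 orientation bit, 1 thumb bit,
    3 bits for the finger count (0-4), 2 shape-disambiguation bits."""
    assert 0 <= fingers <= 4 and shape_bits in {"00", "01", "10", "11"}
    return f"{int(vertical)}{int(thumb)}{fingers:03b}{shape_bits}"
```

For the vertical two-finger gestures of Fig. 7 (thumb bit 0), shape bits 00, 01 and 10 yield the codes 1001000, 1001001 and 1001010 respectively.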





III. EXPERIMENTAL RESULTS
We have applied the algorithm discussed above and tested it on 360 images with 36 different patterns. Using effective shape-based features and orientation, we can recognize and classify 36 different hand gesture patterns, and on the basis of the generated bit sequences we can assign different tasks to support human-computer interaction or sign language. Table I shows the experimental results: the input gestures along with extracted features such as the centroid, finger region and finger count, and the corresponding generated bits. The generated bit sequences are always unique, based on the orientation, the presence of the thumb and the hand gesture pattern. After the orientation of the hand gesture is detected, the algorithm rotates the image by 90 degrees if it is found to be horizontal, so that all further processing steps for bit generation and recognition are applied to a single orientation (vertical); this is done to minimize processing time. Table II shows the results for the 360 images. Of the 360 images tested, the algorithm correctly recognized 339 and misidentified the remaining 21 cases. On average it gives a success rate of approximately 94%, with an average computation time of 0.60 second. The algorithm is based on simple shape-based feature calculation, which makes it easy to implement. The algorithm is implemented in MATLAB.

In Table I, gestures and their corresponding coded bits are listed using the centroid as the reference point for pattern recognition. If we divide the hand into two regions using the centroid, it partitions the hand into a finger region and a non-finger region; to separate the significant finger region from the non-significant region we took only the upper 18% to 25% of the segmented hand region, depending on the type of gesture pattern. In Table I the yellow box represents the significant finger region, which contains the extreme points of the fingers used for detecting the hand gesture pattern. Fig. 8 shows all the sample hand gestures.


TABLE I: Features extraction (centroid, finger region, finger count) and coded bits for sample gestures 1-6 (gesture images and table entries not reproduced here)
Figure 8: Sample hand gestures



TABLE II Hand Gesture Recognition Result

Gesture   Input Images   Successful Cases   Recognition Rate   Elapsed Time
1         10             10                 100                0.72s
2         10             10                 100                0.56s
3         10             10                 100                0.74s
4         10             09                 90                 0.90s
5         10             10                 100                0.57s
6         10             10                 100                0.41s
7         10             10                 100                0.45s
8         10             10                 100                0.43s
9         10             10                 100                0.65s
10        10             09                 90                 0.76s
11        10             10                 100                0.56s
12        10             10                 100                0.84s
13        10             10                 100                0.41s
14        10             10                 100                0.46s
15        10             09                 90                 0.50s
16        10             09                 90                 0.87s
17        10             10                 100                0.72s
18        10             09                 90                 0.45s
19        10             10                 100                0.74s
20        10             10                 100                0.67s
21        10             09                 90                 0.53s
22        10             09                 90                 0.64s
23        10             10                 100                0.82s
24        10             10                 100                0.54s
25        10             10                 100                0.70s
26        10             10                 100                0.43s
27        10             10                 100                0.63s
28        10             09                 90                 0.66s
29        10             10                 100                0.79s
30        10             09                 90                 0.51s
31        10             10                 100                0.46s
32        10             10                 100                0.52s
33        10             10                 100                0.64s
34        10             09                 90                 0.52s
35        10             10                 100                0.54s
36        10             09                 90                 0.61s
All       360            339                94.16%             0.609s

IV. APPLICATION
This section introduces the application of this hand gesture recognition system. As shown in Table III, each sample hand gesture is converted into the corresponding key press event we have assigned. This is done using the Abstract Window Toolkit.

TABLE III Comparison table for gestures and their corresponding key press events

V. CONCLUSION
We proposed a simple yet powerful shape-based approach for hand gesture recognition. Visually impaired people can use hand gestures for writing text in electronic documents such as MS Office and Notepad. Moreover, almost all deaf and mute people communicate with each other by forming hand shapes. Similarly, a visually impaired person would be able to work on a computer through computer vision. The strength of this approach lies in its ease of implementation: it does not require any significant amount of training or post-processing, and it provides a high recognition rate with minimal computation time. Its weakness is that certain parameters and threshold values are defined experimentally; the approach does not follow a systematic procedure for choosing them, and most parameters are based on assumptions made after testing a number of images. Compared with our previous approach described in paper [1], the success rate has improved from 92.3% to 94%, the computation time has decreased to a fraction of a second, and we have removed some of the constraints that had to be followed previously, which makes the method simpler. In future the focus will be on improving the system by including different backgrounds while enlarging the data set.

REFERENCES
[1] Meenakshi Panwar and Pawan Singh Mehra, "Hand Gesture Recognition for Human Computer Interaction", in Proceedings of the IEEE International Conference on Image Information Processing (ICIIP 2011), Waknaghat, India, November 2011.
[2] Amornched Jinda-apiraksa, Warong Pongstiensak, and Toshiaki Kondo, "A Simple Shape-Based Approach to Hand Gesture Recognition", in Proceedings of the IEEE International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), Pathumthani, Thailand, pages 851-855, May 2010.
[3] A. Jinda-Apiraksa, W. Pongstiensak, and T. Kondo, "Shape-Based Finger Pattern Recognition using Compactness and Radial Distance", in Proceedings of the 3rd International Conference on Embedded Systems and Intelligent Technology (ICESIT 2010), Chiang Mai, Thailand, February 2010.
[4] Rajeshree Rokade, Dharmpal Doye, and Manesh Kokare, "Hand Gesture Recognition by Thinning Method", in Proceedings of the IEEE International Conference on Digital Image Processing (ICDIP), Nanded, India, pages 284-287, March 2009.
