


Assoc Prof. Department of ECE, DBIT, Mysore Road, Bangalore-560074, India

ABSTRACT: Video surveillance systems are mainly used to detect and respond to crime and
suspicious activities and to ensure safety. Automated human behavior recognition is a very
important step in a video surveillance system. It involves detecting humans, tracking them, and
identifying and recognizing their activities in real time. It is one of the most interesting research
topics in computer vision. This paper reviews current state-of-the-art image processing
methods for automated human behavior recognition in video surveillance systems. The various
databases used for behavior detection are also described.

KEYWORDS: Video surveillance, Suspicious Activities, Behavior Recognition


Video surveillance is used for the remote observation of locations using video cameras. The video
camera acquires the scene data, which is transmitted to another location to be monitored by a human,
analyzed by a computer or stored for later observation or analysis.

In manual surveillance systems a person continuously monitors the activities in the area
under surveillance. This method is not effective, as a person cannot concentrate on the
scene continuously for long hours. Therefore there is a need for automated surveillance systems.

Automated surveillance systems acquire data from the scene and automatically analyze
it for abnormal activities or suspicious events. The aim of an automated video
surveillance system is to use cameras in place of human eyes, and to accomplish the entire
surveillance task as automatically as possible.

The steps of an automated video surveillance system are moving object detection,
object classification, human identification and recognition, tracking, behavior analysis and event

Human behavior detection in video surveillance systems is gaining importance as it is helpful in
many applications such as medicine, safety and security, and monitoring elderly people. Current video
surveillance systems can detect and track a moving object in real time, but automatic analysis of
the scene is yet to come.

The block diagram of human behavior recognition in video surveillance systems is shown in Fig 1.

[Block diagram with stages: Moving object, Classification, Human identification and recognition, Tracking, Behavior analysis]

Fig 1: Block Diagram of Human Behavior Recognition System

Moving object detection and classification of moving objects are the core technologies that give
input to higher-level analysis; together they are termed low-level processing. Human tracking is
mid-level processing, and action recognition is high-level processing.

This paper provides a comprehensive survey of recent developments in the field of human
behavior recognition. Section II reviews object detection. Section III covers the methods
available in the literature for object classification. Section IV reviews object tracking
methods. Section V discusses the different types of behaviors, the challenges in behavior
understanding and the available methods. Section VI reviews person identification
methods. Section VII compares the databases available for performance evaluation.
The conclusion and future work are discussed in Section VIII.


Moving object detection deals with identifying the moving objects in the scene and segmenting
them from the rest of the image. A brief survey of different object detection methods is shown
in Table 1.
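
As a rough illustration of the segmentation step, the sketch below applies temporal differencing, one of the approaches listed in Table 1, to two tiny grayscale frames stored as nested lists. The function names, the threshold of 25 and the toy frames are assumptions made for this example and are not taken from any of the surveyed papers.

```python
def temporal_difference(prev_frame, curr_frame, threshold=25):
    """Return a binary mask: 1 where the absolute intensity change exceeds threshold."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


def bounding_box(mask):
    """Return (top, left, bottom, right) of the foreground pixels, or None if empty."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)


# Two 4x5 grayscale frames: a bright 2x2 object appears in the second frame.
prev_frame = [[10] * 5 for _ in range(4)]
curr_frame = [row[:] for row in prev_frame]
for r in (1, 2):
    for c in (2, 3):
        curr_frame[r][c] = 200

mask = temporal_difference(prev_frame, curr_frame)
print(bounding_box(mask))  # prints (1, 2, 2, 3)
```

A fixed threshold is the simplest possible choice; methods that adapt the threshold or model the background are better suited to changing illumination.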


After finding moving regions or objects in an image, the next step in the behavior-recognition
process is object classification. Object classification distinguishes interesting motion from
motion caused by moving clouds, specular reflections, swaying trees, or other dynamic occurrences
common in transit videos. It is important to note that there are multiple possible
representations of objects before and after classification. Common geometric or topological
properties used include height/width ratio, fill ratio, perimeter, area, compactness, convex hull, and
histogram projection. In general, object classification methods for surveillance video are
shape-based, motion-based, or feature-based.
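
To make some of these geometric properties concrete, the following sketch computes the height/width ratio, fill ratio, area, perimeter and compactness of a binary blob. It is only an illustration: the function name is hypothetical, the perimeter is counted as exposed pixel edges, and compactness is taken as 4*pi*area/perimeter^2, which is one common definition and an assumption here.

```python
import math


def shape_features(mask):
    """Compute simple geometric descriptors of the foreground (non-zero) pixels."""
    pixels = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not pixels:
        raise ValueError("mask contains no foreground pixels")
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    area = len(pixels)
    foreground = set(pixels)
    # Perimeter: count pixel edges that face background (4-connectivity).
    perimeter = sum(
        1
        for r, c in pixels
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
        if (r + dr, c + dc) not in foreground
    )
    return {
        "aspect_ratio": height / width,            # height/width ratio
        "fill_ratio": area / (height * width),     # blob area over bounding-box area
        "area": area,
        "perimeter": perimeter,
        "compactness": 4 * math.pi * area / perimeter ** 2,  # assumed definition
    }


# A 6x2 upright blob, roughly the proportions of a standing person.
mask = [[0, 0, 0, 0] for _ in range(8)]
for r in range(1, 7):
    mask[r][1] = mask[r][2] = 1

features = shape_features(mask)
print(features["aspect_ratio"], features["fill_ratio"])  # prints 3.0 1.0
```

A shape-based classifier would then compare such descriptors against ranges learned for each object class.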

Shape-Based Classification: The geometry of the extracted regions (boxes, silhouettes, blobs)
containing motion is often used to classify objects in video surveillance. Some common
classes in transit system surveillance are humans, crowds, vehicles, and clutter (R. T.
Collins, A. J. Lipton, T. Kanade, 2000).

Motion-Based Classification: This classification method is based on the idea that object motion
characteristics and patterns are unique enough to distinguish between objects. Humans have been
shown to have distinct types of motion. Motion can be used to recognize types of human
movements such as walking, running, or skipping, as well as for human identification.

Table 1: Object detection methods

| Sl. No. | Title | Year | Authors | Method | Remarks |
|---|---|---|---|---|---|
| 1 | Object Detection Using a Moving Camera under Sudden Illumination Change | 2013 | SHAKERI Moein, ZHANG Hong | Robust background subtraction method | Indoor and outdoor video sequences from different datasets |
| 2 | An Adaptive Threshold Method of Temporal Difference for Detection of Moving Object and Recognition using Hidden Markov Model | 2014 | Saurabh Dumbre, Mrudul Arkadi | Temporal difference method, HMM, DWT | Complex images, changing environments |
| 3 | A Wavelet Based Feature Extraction and Detection of Abandoned Objects | 2014 | Akshatha N., Dr. Vivek M. | Features are extracted using wavelets, machine learning algorithm | Coding is done in VHDL and synthesis is done in Xilinx ISE series |
| 4 | Correlation Filters with Limited Boundaries | 2014 | Hamed Kiani Galoogahi, Terence Sim, and Simon Lucey | Correlation filters, Fourier domain | Accuracy and computational efficiency |
| 5 | Object Detection using Wavelet Transform Method for Three-way Decomposed Images | 2014 | Aju John, Dr. V. R. Vijaykumar | Wavelet transform, image fusion | Turbulent medium |
| 6 | Moving Object Detection and Shadow Removing under Changing Illumination Condition | 2014 | Jinhai Xiang, Heng Fan, Honghong Liao, Jun Xu, Weiping Sun, and Shengsheng Yu | Local intensity ratio model, Gaussian mixture model | Moving objects without cast shadow; deals with various illumination conditions |

Starting with the Human ID Gait Challenge (S. Sarkar, P. J. Phillips, 2005), image processing
researchers have actively proposed gait-based methods (Y.-B. Li, T.-X. Jiang, Z. H. Qiao, and H. J. Qian,
2007) for human identification at a distance.

Other Classification Methods: Skin color (S. H. Kim and H. G. Kim, 2006) has proved to be an
important feature for the classification of humans in video, as it is relatively
robust to changes in illumination, viewpoint, scale, shading, and occlusion. Skin color has also
been combined successfully with other descriptors (S. Harasse, L. Bonnaud, 2006) for classification.

Object tracking is a method of following an object through successive frames to determine how it
is moving relative to other objects. It estimates the trajectory of an object in the
image plane as it moves around the scene. Target tracking is a key component of video
surveillance and monitoring systems. It provides input to high-level processing such as analysis
and classification of human activities. The different types of object tracking are:

Point Tracking
Point tracking is a robust, reliable and accurate method generally used to track vehicles. The
approach requires a good fit of the detected object, and the points are associated using
deterministic or probabilistic methods (In Su Kim, Hong Seok Choi, Kwang Moo Yi, Jin Young Choi, and Seong
G. Kong, 2010). The detected object is represented by points in
consecutive frames, and the association of the points is based on the previous object state, which can
include object position and motion. This approach requires an external mechanism to detect the
objects in every frame.
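
The association step can be sketched as a deterministic nearest-neighbor match between predicted track positions and the centroids returned by the external detector. The constant-velocity prediction, the greedy matching, the 20-pixel gate and all names below are assumptions chosen for illustration, not a method from the cited survey.

```python
import math


def associate(tracks, detections, max_distance=20.0):
    """Greedily match each track's predicted position to its nearest detection.

    tracks: dict id -> {"pos": (x, y), "vel": (vx, vy)}  (previous object state)
    detections: list of (x, y) centroids supplied by an external detector
    Returns dict track id -> detection index.
    """
    candidates = []
    for track_id, state in tracks.items():
        # Constant-velocity prediction from the previous state.
        px = state["pos"][0] + state["vel"][0]
        py = state["pos"][1] + state["vel"][1]
        for j, (dx, dy) in enumerate(detections):
            dist = math.hypot(dx - px, dy - py)
            if dist <= max_distance:
                candidates.append((dist, track_id, j))
    matches, used = {}, set()
    for dist, track_id, j in sorted(candidates):
        if track_id not in matches and j not in used:
            matches[track_id] = j
            used.add(j)
    return matches


def update(tracks, detections, matches):
    """Overwrite matched tracks with the new position and the implied velocity."""
    for track_id, j in matches.items():
        old_x, old_y = tracks[track_id]["pos"]
        new_x, new_y = detections[j]
        tracks[track_id] = {"pos": (new_x, new_y), "vel": (new_x - old_x, new_y - old_y)}


tracks = {
    "A": {"pos": (10.0, 10.0), "vel": (5.0, 0.0)},
    "B": {"pos": (50.0, 40.0), "vel": (0.0, -4.0)},
}
detections = [(50.0, 35.0), (16.0, 10.0)]
matches = associate(tracks, detections)
update(tracks, detections, matches)
print(matches)  # prints {'A': 1, 'B': 0}
```

Probabilistic variants replace the distance gate with a likelihood under a motion model, but the structure of the association problem stays the same.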

Kernel Tracking
In this approach the kernel refers to the shape and appearance of the object (Yilmaz, A., Javed, O., and
Shah, M. 2006). Any feature of the object can serve as the kernel, for example a
rectangular template or an elliptical shape with an associated histogram. The object is tracked by
computing the motion of the kernel between consecutive frames. Mean-shift tracking (Elgammal, A.,
Duraiswami, R., Harwood, D., and Davis, L. 2002) is based on the kernel tracking method.
It uses an E-kernel and represents the object by a histogram feature obtained by spatial masking
with an isotropic kernel.
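
A minimal sketch of the mean-shift idea follows: a window is moved repeatedly to the centroid of the weights it covers until it stops moving. The weight map stands in for a per-pixel target likelihood (for example, one derived from the object's histogram). The square window with uniform weighting, the function name and the toy data are simplifying assumptions, not the formulation used in the cited work.

```python
def mean_shift(weights, center, half_size, max_iter=20):
    """Shift a square window toward the local centroid of a 2D weight map.

    weights: 2D list, e.g. per-pixel likelihood of belonging to the target
    center: (row, col) start position; half_size: window half-width in pixels
    """
    rows, cols = len(weights), len(weights[0])
    r0, c0 = center
    for _ in range(max_iter):
        total = sum_r = sum_c = 0.0
        for r in range(max(0, r0 - half_size), min(rows, r0 + half_size + 1)):
            for c in range(max(0, c0 - half_size), min(cols, c0 + half_size + 1)):
                w = weights[r][c]
                total += w
                sum_r += w * r
                sum_c += w * c
        if total == 0:
            break  # no target evidence inside the window
        # Round the weighted centroid to the nearest pixel (coordinates are >= 0).
        new_r, new_c = int(sum_r / total + 0.5), int(sum_c / total + 0.5)
        if (new_r, new_c) == (r0, c0):
            break  # converged
        r0, c0 = new_r, new_c
    return r0, c0


# Weight map with a 3x3 target blob centred at (6, 7) on a 10x10 grid.
weights = [[0.0] * 10 for _ in range(10)]
for r in range(5, 8):
    for c in range(6, 9):
        weights[r][c] = 1.0

print(mean_shift(weights, center=(4, 5), half_size=2))  # prints (6, 7)
```

In a tracker, the start position would be the object's location in the previous frame and the weight map would be recomputed for every new frame.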

Silhouette Tracking
In this approach the silhouette is extracted from the detected object. Silhouettes are tracked
either by shape matching or by contour evolution, in both cases by estimating the object region in
each consecutive frame. Silhouette tracking methods make use of the information stored inside the
object region (C. Stauffer and E. Grimson, 2000). This information can take the form of appearance
density and shape models. Given the object models, tracking relies on features, and
selecting the right features plays a critical role in tracking. In general, the features used for
tracking must be unique so that the objects can be easily distinguished in the feature space.
The following features are used for object tracking.

The apparent color of an object is influenced primarily by two physical factors: the spectral
power distribution of the illuminant and the surface reflectance properties of the object
(Christopher R. Wren, Ali J. Azarbayejani, Trevor Darrell, and Alex P. Pentland, 1997). In image
processing, the RGB (red, green, blue) color space is usually used to represent color.

Object boundaries usually generate strong changes in image intensities. Edge detection is used to
identify these changes. An important property of edges is that they are less sensitive to
illumination changes compared to color features.
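
The following sketch illustrates this property on a single image row: after a uniform brightening, the raw intensities all change, but the thresholded gradient reports the same edge locations. The central-difference gradient, the threshold of 30 and the function names are assumptions for the example. The invariance shown holds for an additive brightness offset; a multiplicative change would scale the gradient as well.

```python
def gradient_magnitude(row):
    """Central-difference gradient magnitude for the interior pixels of one row."""
    return [abs(row[i + 1] - row[i - 1]) / 2 for i in range(1, len(row) - 1)]


def edge_positions(row, threshold=30):
    """Indices (in the original row) whose gradient magnitude exceeds threshold."""
    grads = gradient_magnitude(row)
    return [i + 1 for i, g in enumerate(grads) if g > threshold]


# A dark region followed by a bright object, then the same row 50 levels brighter.
row = [20, 20, 20, 20, 180, 180, 180, 180]
brighter = [v + 50 for v in row]

print(edge_positions(row))       # prints [3, 4]
print(edge_positions(brighter))  # prints [3, 4]
```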

The center of mass (centroid) is a 1-by-n vector that specifies the center
point of a region. The first element of the centroid is the
horizontal coordinate (x-coordinate) of the center of mass, and the second element is the vertical
coordinate (y-coordinate) (Elgammal, A., Duraiswami, R., Harwood, D., and Davis, L. 2002).
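
For a two-dimensional binary region, the centroid can be computed as the mean column and mean row of the foreground pixels, as in this short sketch. The function name and the toy mask are assumptions made for illustration.

```python
def centroid(mask):
    """Return (x, y): mean column and mean row of the non-zero pixels."""
    pixels = [(c, r) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not pixels:
        raise ValueError("region is empty")
    n = len(pixels)
    x = sum(c for c, _ in pixels) / n  # horizontal coordinate
    y = sum(r for _, r in pixels) / n  # vertical coordinate
    return x, y


# Region occupying rows 1-2 and columns 2-4 of a 4x6 mask.
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
print(centroid(mask))  # prints (3.0, 1.5)
```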

Texture is used for classification as well as for tracking. This feature is used to identify the region
or object of interest. It is a measure of the intensity variation of a surface
that quantifies properties such as smoothness and regularity. Compared to color, texture
requires a processing step to generate the descriptors. Among all features, color and texture
are the most widely used to track objects. Color bands are sensitive to illumination variation.


Timothy John A. Chua and Andrew Jonathan W. Co (2007) discuss a Real-Time Event
Detection System for Intelligent Video Surveillance. The monitoring of CCTV cameras is heavily
dependent on the efficiency of security personnel, which leaves a lot to be desired when 50-100 live
feeds are to be monitored simultaneously for extended periods of time. The more expensive
solution is the addition of alarms to every location under surveillance. A more prudent approach,
which the authors propose, is to extend the functionality of the already available CCTV camera by
allowing a computer to analyze the live feed using digital signal processing techniques. A
proof-of-concept Real-time Event Detection Automated Surveillance System is presented. The
system takes a live feed from an analog camera via an encoder connected to the USB port of a
computer. Various image processing techniques are then applied to the frames in order to separate
the foreground from the background. Useful information is subsequently extracted from the
foreground and is used in identifying objects of interest, establishing object correspondence across
consecutive frames and analyzing the behavior of the object itself. Detected alarm events are
recorded in a log, which is visible through the user interface of the system.

A Review on Vision Techniques applied to Human Behavior Analysis for Ambient-Assisted
Living is presented in (Alexandros André Chaaraoui, Pau Climent-Pérez, Francisco
Flórez-Revuelta, 2012). Human Behavior Analysis (HBA) is of growing interest to computer
vision and artificial intelligence researchers. Its main application areas, such as video surveillance
and Ambient-Assisted Living (AAL), have been in great demand in recent years. The paper
provides a review of HBA for AAL and ageing-in-place purposes, focusing especially on vision
techniques. First, a clearly defined taxonomy is presented in order to classify the reviewed works,
which are then presented in bottom-up order of abstraction and complexity. At the
motion level, pose and gaze estimation as well as basic human movement recognition are covered.
Next, the main action and activity recognition approaches are presented with examples of
recent research works. Finally, as the degree of semantics and the time interval involved in the
HBA increase, the behavior level is reached. Furthermore, useful tools and datasets are analyzed in
order to help in initiating projects.

Alexandros André Chaaraoui, Pau Climent-Pérez and Francisco Flórez-Revuelta (2013) discuss
Silhouette-based Human Action Recognition using Sequences of Key Poses. In that paper, a human
action recognition method is presented in which pose representation is based on the contour points of the human
silhouette and actions are learned by making use of sequences of multi-view key poses. The
contribution is two-fold. First, the approach achieves state-of-the-art success rates without
compromising the speed of the recognition process, and therefore shows suitability for online
recognition and real-time scenarios. Secondly, dissimilarities among different actors performing
the same action are handled by taking into account variations in shape (shifting the test data to the
known domain of key poses) and speed (considering inconsistent time scales in the classification).

Gesture Recognition of Human Behavior using Multimodal Approach is presented in [6]. A gesture
is a form of non-verbal or non-vocal communication in which visible bodily actions communicate
particular messages. Gestures include movement of the face, hands or other parts of the body.
Gestures allow people to express a variety of feelings and thoughts, from condescension and
hostility to approval and affection, often together with body language in addition to words when
they speak. Emotion plays a crucial role in person-to-person interaction. In recent years, there
has been a growing interest in improving all aspects of interaction between humans and computers.
Human expression is used to interact with computers. The paper explores ways of
human-computer interaction that enable the computer to be more aware of the user's emotional


In most of the video surveillance systems, person identification is achieved by motion analysis and
matching, such as gait, gesture, posture analysis and comparison.

Biometrics are the unique features of a person. Biometric recognition identifies an
individual based on feature vectors derived from their physiological and/or behavioral
characteristics. Biometric features are of two types: physiological and behavioral. Physiological
characteristics include face, fingerprints, iris, palm print and DNA, and behavioral characteristics
include voice and gait. The physiological characteristics do not provide good results at low resolution and
need user cooperation; therefore recognition using gait is more attractive. Recognition using gait
means identifying a person by the way he or she walks or moves. Depending on the feature extraction, gait
recognition methods are classified as appearance-based approaches and model-based approaches.
The appearance-based approaches suffer from changes in appearance owing to changes in
viewing or walking direction. Model-based approaches extract the motion of the human body by
fitting their models to the input images. Model-based methods are view and scale
invariant. The basic gait recognition system is shown in Fig 2.

[Block diagram with stages: Camera, Preprocessing, Feature extraction, Recognition, Database of face/gait]

Fig 2: Gait Based recognition system


The datasets normally used for behavior recognition are KTH, Weizmann, HOHAI, PETS, etc.
In the KTH dataset the number of event types is 6, the average number of samples per class is 100,
the maximum resolution is 160 x 120, the height of a human in pixels is 80-100 and there is no camera

In the Weizmann dataset the number of event types is 10, the average number of samples per class is 9,
the maximum resolution is 180 x 144, the height of a human in pixels is 60-70 and there is no camera

In the HOHAI dataset the number of event types is 8, the average number of samples per class is 85,
the maximum resolution is 540 x 240 and the height of a human in pixels is 100-1200, with varying
camera position.

In the PETS dataset the number of event types is 3, the maximum resolution is 768 x 576 and the height of
a human in pixels is 20, with varying camera position.


This paper deals with the general processing framework of human activity recognition systems and
discusses the recent developments explored for the various stages of the system. The state of the art of
existing methods for each key issue is described, with the focus on three major tasks: detection,
tracking, and activity recognition or behavior understanding. We have also discussed the publicly
available datasets built for the uniform testing of the methodologies proposed by authors, and have
provided a brief description of the datasets and a comparison of their characteristics.


SHAKERI Moein, ZHANG Hong (July 26-27, 2013). Object Detection Using a Moving Camera
under Sudden Illumination Change. Proceedings of the 32nd Chinese Control Conference.
Saurabh Dumbre, Mrudul Arkadi (2014). An Adaptive Threshold Method of Temporal Difference
for Detection of Moving Object and Recognition using Hidden Markov Model. International
Journal of Latest Trends in Engineering and Technology (IJLTET).
Akshatha N., Dr. Vivek M. (April 2014). A Wavelet Based Feature Extraction and Detection of
Abandoned Objects. International Journal of Ethics in Engineering & Management
Education, ISSN: 2348-4748.
Hamed Kiani Galoogahi, Terence Sim, and Simon Lucey (2014). Correlation Filters with
Limited Boundaries. arXiv:1403.7876v1 [cs.CV].
Aju John, Dr. V. R. Vijaykumar. Object Detection using Wavelet Transform Method for
Three-way Decomposed Images.
Jinhai Xiang, Heng Fan, Honghong Liao, Jun Xu, Weiping Sun, and Shengsheng Yu (2014).
Moving Object Detection and Shadow Removing under Changing Illumination Condition.
Hindawi Publishing Corporation, Mathematical Problems in Engineering.
R. T. Collins, A. J. Lipton, T. Kanade (2000). A System for Video Surveillance and
Monitoring. Carnegie Mellon Univ., Pittsburgh, PA, Tech. Rep. CMU-RI-TR-00-12.

S. Sarkar, P. J. Phillips (Feb 2005). The humanID gait challenge problem: Data sets, performances
and analysis. IEEE Trans. Pattern Anal. Mach. Intell., Vol. 27, No. 2, pp. 162-177.
Y.-B. Li, T.-X. Jiang, Z. H. Qiao, and H. J. Qian (2007). General methods and development actuality of
gait recognition. In Proc. IEEE Int. Conf. Wavelet and Pattern Recog.
S. H. Kim and H. G. Kim (2006). Face Detection using multimodal. Proc. Int. Conf.
Pattern Recog., Vol. 1, pp. 929-932.
S. Harasse, L. Bonnaud (2006). Human model for people detection in dynamic scenes. In Proc. Int.
Conf. Pattern Recognition, Vol. 1, pp. 335-354.
In Su Kim, Hong Seok Choi, Kwang Moo Yi, Jin Young Choi, and Seong G. Kong (2010).
Intelligent Visual Surveillance - A Survey. International Journal of Control, Automation, and
Systems 8(5):926-939.
Yilmaz, A., Javed, O., and Shah, M. (2006). Object tracking: A survey. ACM Comput. Surv. 38,
4, Article 13.
Elgammal, A., Duraiswami, R., Harwood, D., and Davis, L. (2002). Background and foreground
modeling using nonparametric kernel density estimation for visual surveillance. Proceedings of
the IEEE 90, 7, 1151-1163.
C. Stauffer and E. Grimson (2000). Learning patterns of activity using real time tracking. IEEE
Trans. on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 747-757.
Christopher R. Wren, Ali J. Azarbayejani, Trevor Darrell, and Alex P. Pentland (1997). Pfinder:
Real-Time Tracking of the Human Body. IEEE Transactions on Pattern Analysis and
Machine Intelligence, pp. 780-785.
Wu Z. and Leahy R. (1993). An optimal graph theoretic approach to data clustering: Theory
and its applications to image segmentation. IEEE Trans. Patt. Analy. Mach. Intell.
Timothy John A. Chua, Andrew Jonathan W. Co (2007). Real-Time Event Detection System for
Intelligent Video Surveillance. DLSU Engineering e-Journal, Vol. 1, No. 2, pp. 31-39.
Alexandros André Chaaraoui, Pau Climent-Pérez, Francisco Flórez-Revuelta (2012). A Review
on Vision Techniques applied to Human Behaviour Analysis for Ambient-Assisted
Living. Expert Systems with Applications, March 7, 2012.
Alexandros André Chaaraoui, Pau Climent-Pérez, Francisco Flórez-Revuelta (2013). Silhouette-based Human
Action Recognition using Sequences of Key Poses. Pattern Recognition Letters.
M. Vrigkas, C. Nikou and I. A. Kakadiaris (2014). Classifying Behavioral Attributes Using Conditional
Random Fields. 8th Hellenic Conference on Artificial Intelligence (SETN'14), Lecture Notes in
Computer Science, Vol. 8445, pp. 95-104, Ioannina, Greece, May 15-17, 2014.