(12) United States Patent — Seow et al.
(10) Patent No.: US 9,317,908 B2
(45) Date of Patent: Apr. 19, 2016

(54) AUTOMATIC GAIN CONTROL FILTER IN A VIDEO ANALYSIS SYSTEM
(71) Applicant: BEHAVIORAL RECOGNITION SYSTEMS, Inc., Houston, TX (US)
(72) Inventors: Ming-Jung Seow, Houston, TX (US); Tao Yang, Katy, TX (US); Wesley Kenneth Cobb, The Woodlands, TX (US)
(73) Assignee: Behavioral Recognition Systems, Inc., Houston, TX (US)
(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 85 days.
(21) Appl. No.: 13/930,222
(22) Filed: Jun. 28, 2013
(65) Prior Publication Data: US 2014/0003713 A1, Jan. 2, 2014

Related U.S. Application Data
(60) Provisional application No. 61/666,426, filed on Jun. 20, 2012.

(51) Int. Cl.: G06T 5/00 (2006.01); G06T 7/20 (2006.01)
(52) U.S. Cl.: CPC G06T 5/009 (2013.01); G06T 5/008 (2013.01); G06T 7/2046 (2013.01); G06T 2207/10016 (2013.01); G06T 2207/10024 (2013.01); G06T 2207/20081 (2013.01); G06T 2207/20144 (2013.01)
(58) Field of Classification Search: None. See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS
7/1987 Yuasa et al.
5/1992 Jacob
5/1998 Tsuchikawa et al.
5/1998 Chen et al.
Courtney
6/2001 Martens et al.
7/2001 Crabtree et al.
(Continued)

FOREIGN PATENT DOCUMENTS
WO 2009/049314, 4/2009

OTHER PUBLICATIONS
J. Connell et al., "Detection and Tracking in the IBM PeopleVision System," IEEE ICME, Jun. 2004.

[Drawing-sheet text omitted (FIG. 1 component labels; FIG. 3, Sheet 3 of 4).]

[...] For example, amplification of pixel intensity and/or color drifts may affect the video analysis system's ability to correctly distinguish between pixels of an image associated with foreground objects and background pixels of the image. In some conventional video analysis systems, the autogain feature is simply turned off. Other video analysis systems attempt to correct for autogain by, for example, maintaining color constancy or modeling a specific camera's response during autogain and compensating for that response. These solutions tend to work only for specific cameras (or camera types) and scenes.

SUMMARY OF THE INVENTION

One embodiment of the invention provides a method for analyzing a scene captured by a video camera or other recorded video. The method includes extracting foreground patches from a video frame using a background model image, the foreground patches each including respective foreground pixels. The method also includes, for each foreground pixel: (1) determining a texture of a first area including the foreground pixel and pixels surrounding the foreground pixel and a texture of a second area including pixels of the background model image corresponding to the pixels of the foreground area; and (2) determining a correlation score based on the texture of the first area and the texture of the second area. In addition, the method includes, for foreground pixels whose correlation scores exceed a threshold, removing the foreground pixels from the foreground patches in which the foreground pixels lie.

Other embodiments include a computer-readable medium that includes instructions that enable a processing unit to implement one or more embodiments of the disclosed method, as well as a system configured to implement one or more embodiments of the disclosed method.
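As a rough, purely illustrative sketch (not part of the patent disclosure), the following Python code shows one way the patch-extraction step summarized above might be realized: differencing a grayscale frame against a background model image and grouping the resulting foreground pixels into patches with bounding boxes. The function name, the thresholds, and the use of OpenCV/NumPy are assumptions made for this sketch only.

```python
# Minimal sketch (assumed APIs: OpenCV, NumPy): extract foreground patches and
# their bounding boxes by differencing a frame against a background model image.
import cv2
import numpy as np

def extract_foreground_patches(frame_gray, background_gray, diff_thresh=30, min_area=50):
    """Return (mask, boxes): a binary foreground mask and (x, y, w, h) boxes.

    Both inputs are assumed to be uint8 grayscale images of the same size.
    """
    # Per-pixel absolute difference against the background model image.
    diff = cv2.absdiff(frame_gray, background_gray)
    _, mask = cv2.threshold(diff, diff_thresh, 255, cv2.THRESH_BINARY)

    # Group foreground pixels into patches via connected components.
    num_labels, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    boxes = []
    for label in range(1, num_labels):       # label 0 is the background
        x, y, w, h, area = stats[label]
        if area >= min_area:                 # ignore tiny noise patches
            boxes.append((int(x), int(y), int(w), int(h)))
    return mask, boxes
```

The patches and boxes produced by a step like this are the inputs that the autogain filter described below would then examine pixel by pixel.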
BRIEF DESCRIPTION OF THE DRAWINGS

So that the manner in which the above-recited features, advantages and objects of the present invention are attained and can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to the embodiments illustrated in the appended drawings.

It is to be noted, however, that the appended drawings illustrate only typical embodiments of this invention and are therefore not to be considered limiting of its scope, for the invention may admit to other equally effective embodiments.

FIG. 1 illustrates components of a video analysis system, according to one embodiment of the invention.

FIG. 2 further illustrates components of the video analysis system shown in FIG. 1, according to one embodiment of the invention.

FIG. 3 illustrates an example video frame and background model image and corresponding gradient images, according to one embodiment of the invention.

FIG. 4 illustrates a method for filtering out false-positive foreground pixels resulting from camera autogain, according to one embodiment of the invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Embodiments of the present invention provide a method and system for analyzing and learning behavior based on an acquired stream of video frames. A machine-learning video analytics system may be configured to use a computer vision engine to observe a scene, generate information streams of observed activity, and pass the streams to a machine learning engine. In turn, the machine learning engine may engage in an undirected and unsupervised learning approach to learn patterns regarding the object behaviors in that scene. Thereafter, when unexpected (i.e., abnormal or unusual) behavior is observed, alerts may be generated.

In addition, the computer vision engine may include an autogain filter module configured to filter out (or otherwise adapt to) camera autogain effects that may affect the learning process and other processes. The autogain filter module may be a part of, or distinct from, the BG/FG component discussed above. In one embodiment, the autogain filter module may receive tracked foreground patches and bounding boxes for those patches. For each foreground pixel in the bounding box area and surrounding pixels of the video frame, and for each corresponding pixel in the background model image, the autogain filter module may determine a texture. As used herein, "texture" refers to local variability of intensity values of pixels. In one embodiment, gradient may be used to compute texture. For example, the autogain filter may apply the Sobel operator, which is commonly used in image processing and edge detection algorithms, to determine gradient values. The Sobel operator provides a discrete differentiation operator used to compute an approximation of the change of an image intensity function.

In one embodiment, the autogain filter module may determine, for each foreground pixel in the bounding box area, a correlation score based on the texture of the foreground pixel and a set of surrounding pixels in the video frame and the texture of corresponding pixels in the background model image. The autogain filter module may remove pixels from the foreground patch which have a correlation score which exceeds a threshold. In addition, the autogain filter module may reduce the size of the bounding boxes to fit the modified foreground patch(es).
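As a concrete but purely illustrative sketch of the gradient-based texture comparison described above, the following Python code applies the Sobel operator to build gradient images for the video frame and the background model image, and then scores, for each foreground pixel, how closely the local gradient neighborhood in the frame matches the corresponding neighborhood in the background model. The use of OpenCV/NumPy, the function names, the window size, and the normalized-correlation formulation are assumptions for this sketch, not details taken from the patent.

```python
# Illustrative sketch (assumed APIs: OpenCV, NumPy), not the patented implementation.
import cv2
import numpy as np

def gradient_magnitude(gray):
    """Sobel gradient magnitude of a grayscale image."""
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    return np.hypot(gx, gy)

def texture_correlation(frame_gray, background_gray, fg_pixels, half_win=2):
    """Return {(row, col): score} for each foreground pixel.

    The score is the normalized correlation of the local gradient windows
    around the pixel in the frame and in the background model (1.0 means
    identical texture; flat windows with no gradient score 0.0 here).
    """
    frame_grad = gradient_magnitude(frame_gray)
    bg_grad = gradient_magnitude(background_gray)
    rows, cols = frame_gray.shape
    scores = {}
    for r, c in fg_pixels:
        r0, r1 = max(r - half_win, 0), min(r + half_win + 1, rows)
        c0, c1 = max(c - half_win, 0), min(c + half_win + 1, cols)
        a = frame_grad[r0:r1, c0:c1].ravel()
        b = bg_grad[r0:r1, c0:c1].ravel()
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        scores[(r, c)] = float(a @ b / denom) if denom > 0 else 0.0
    return scores
```

In a sketch like this, a score near 1.0 indicates that the local texture of the frame matches the background model, suggesting the pixel is a false-positive foreground pixel introduced by autogain rather than part of a genuine foreground object; pixels whose scores exceed a chosen threshold would then be removed from the patch.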
In the following, reference is made to embodiments of the invention. However, it should be understood that the invention is not limited to any specifically described embodiment. Instead, any combination of the following features and elements, whether related to different embodiments or not, is contemplated to implement and practice the invention. Furthermore, in various embodiments the invention provides numerous advantages over the prior art. However, although embodiments of the invention may achieve advantages over other possible solutions and/or over the prior art, whether or not a particular advantage is achieved by a given embodiment is not limiting of the invention. Thus, the following aspects, features, embodiments and advantages are merely illustrative and are not considered elements or limitations of the appended claims except where explicitly recited in a claim(s). Likewise, reference to "the invention" shall not be construed as a generalization of any inventive subject matter disclosed herein and shall not be considered to be an element or limitation of the appended claims except where explicitly recited in a claim(s).

One embodiment of the invention is implemented as a program product for use with a computer system. The program(s) of the program product defines functions of the embodiments (including the methods described herein) and can be contained on a variety of computer-readable storage media. Examples of computer-readable storage media include (i) non-writable storage media (e.g., read-only memory devices within a computer such as CD-ROM or DVD-ROM disks readable by an optical media drive) on which information is permanently stored; and (ii) writable storage media (e.g., floppy disks within a diskette drive or hard-disk drive) on which alterable information is stored. Such computer-readable storage media, when carrying computer-readable instructions that direct the functions of the present invention, are embodiments of the present invention. Other examples of media include communications media through which information is conveyed to a computer, such as through a computer or telephone network, including wireless communications networks.

In general, the routines executed to implement the embodiments of the invention may be part of an operating system or a specific application, component, program, module, object, or sequence of instructions. The computer program of the present invention is typically comprised of a multitude of instructions that will be translated by the native computer into a machine-readable format and hence executable instructions. Also, programs are comprised of variables and data structures that either reside locally to the program or are found in memory or on storage devices. In addition, various programs described herein may be identified based upon the application for which they are implemented in a specific embodiment of the invention. However, it should be appreciated that any particular program nomenclature that follows is used merely for convenience, and thus the invention should not be limited to use solely in any specific application identified and/or implied by such nomenclature.

FIG. 1 illustrates components of a video analysis and behavior-recognition system 100, according to one embodiment of the present invention. As shown, the behavior-recognition system 100 includes a video input source 105, a network 110, a computer system 115, and input and output devices 118 (e.g., a monitor, a keyboard, a mouse, a printer, and the like).
The network 110 may transmit video data recorded by the video input 105 to the computer system 115. Illustratively, the computer system 115 includes a CPU 120, storage 125 (e.g., a disk drive, optical disk drive, floppy disk drive, and the like), and a memory 130 which includes both a computer vision engine 135 and a machine-learning engine 140. As described in greater detail below, the computer vision engine 135 and the machine-learning engine 140 may provide software applications configured to analyze a sequence of video frames provided by the video input 105.

Network 110 receives video data (e.g., video stream(s), video images, or the like) from the video input source 105. The video input source 105 may be a video camera, a VCR, DVR, DVD, computer, web-cam device, or the like. For example, the video input source 105 may be a stationary video camera aimed at a certain area (e.g., a subway station, a parking lot, a building entry/exit, etc.), which records the events taking place therein. Generally, the area visible to the camera is referred to as the "scene." The video input source 105 may be configured to record the scene as a sequence of individual video frames at a specified frame-rate (e.g., 24 frames per second), where each frame includes a fixed number of pixels (e.g., 320x240). Each pixel of each frame may specify a color value (an RGB value) or grayscale value (e.g., a radiance value between 0-255). Further, the video stream may be formatted using known formats including MPEG2, MJPEG, MPEG4, H.263, H.264, and the like.

As noted above, the computer vision engine 135 may be configured to analyze this raw information to identify active objects in the video stream, identify a variety of appearance and kinematic features used by a machine learning engine 140 to derive object classifications, derive a variety of metadata regarding the actions and interactions of such objects, and supply this information to the machine-learning engine 140. And in turn, the machine-learning engine 140 may be configured to evaluate, observe, learn and remember details regarding events (and types of events) that transpire within the scene over time.

In one embodiment, the machine-learning engine 140 receives the video frames and the data generated by the computer vision engine 135. The machine-learning engine 140 may be configured to analyze the received data, cluster objects having similar visual and/or kinematic features, and build semantic representations of events depicted in the video frames. Over time, the machine learning engine 140 learns expected patterns of behavior for objects that map to a given cluster. Thus, over time, the machine learning engine learns from these observed patterns to identify normal and/or abnormal events. That is, rather than having patterns, objects, object types, or activities defined in advance, the machine learning engine 140 builds its own model of what different object types have been observed (e.g., based on clusters of kinematic and/or appearance features) as well as a model of expected behavior for a given object type. In particular, the machine learning engine may model the kinematic properties of one or more types of objects.

In general, the computer vision engine 135 and the machine-learning engine 140 both process video data in real-time. However, the time scales for processing information by the computer vision engine 135 and the machine-learning engine 140 may differ. For example, in one embodiment, the computer vision engine 135 processes the received video data frame-by-frame, while the machine-learning engine 140 processes data every N frames. In other words, while the computer vision engine 135 may analyze each frame in real-time to derive a set of appearance and kinematic data related to objects observed in the frame, the machine-learning engine 140 is not constrained by the real-time frame rate of the video input.
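To make the two time scales concrete, the following sketch (hypothetical class and method names such as `vision_engine.analyze` and `learning_engine.observe`; not part of the patent disclosure) shows a frame loop in which a vision engine is invoked on every frame, while a learning engine consumes the accumulated per-frame results only once every N frames.

```python
# Rough sketch (hypothetical engine interfaces): the computer vision engine
# analyzes every frame in real time, while the machine-learning engine only
# consumes the accumulated per-frame results once every N frames.
N_FRAMES = 30   # learning-engine cadence; illustrative value only

def process_stream(frames, vision_engine, learning_engine, n=N_FRAMES):
    pending = []                        # per-frame results awaiting the learner
    for i, frame in enumerate(frames):
        # Frame-by-frame work: foreground/background separation, tracking,
        # appearance and kinematic features for each observed object.
        pending.append(vision_engine.analyze(frame))

        # The learner is not tied to the camera frame rate; it runs in batches.
        if (i + 1) % n == 0:
            learning_engine.observe(pending)
            pending.clear()
```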
Note, however, that FIG. 1 illustrates merely one possible arrangement of the behavior-recognition system 100. For example, although the video input source 105 is shown connected to the computer system 115 via the network 110, the network 110 is not always present or needed (e.g., the video input source 105 may be directly connected to the computer system 115). Further, various components and modules of the behavior-recognition system 100 may be implemented in other systems. For example, in one embodiment, the computer vision engine 135 may be implemented as a part of a video input device (e.g., as a firmware component wired [...]

[...] micro-feature vectors output from the computer vision engine 135.

Generally, the workspace 240 provides a computational engine for the machine-learning engine 140. For example, the workspace 240 may be configured to copy information from the perceptual memory 230, retrieve relevant memories from the episodic memory 235 and the long-term memory 225, and select which codelets 245 to execute. Each codelet 245 may be a software program configured to evaluate different sequences of events and to determine how one sequence may follow (or otherwise relate to) another (e.g., a finite state machine). More generally, each codelet may provide a software module configured to detect interesting patterns from the streams of data fed to the machine-learning engine. In turn, the codelet 245 may create, retrieve, reinforce, or modify memories in the episodic memory 235 and the long-term memory 225. By repeatedly scheduling codelets 245 for execution, and copying memories and percepts to and from the workspace 240, the machine-learning engine 140 performs a cognitive cycle used to observe, and learn, about patterns of behavior that occur within the scene.

In one embodiment, the perceptual memory 230, the episodic memory 235, and the long-term memory 225 are used to identify patterns of behavior, evaluate events that transpire in the scene, and encode and store observations. Generally, the perceptual memory 230 receives the output of the computer vision engine 135 (e.g., the context event stream). The episodic memory 235 stores data representing observed events with details related to a particular episode, e.g., information describing time and space details related to an event. That is, the episodic memory 235 may encode specific details of a particular event, i.e., "what and where" something occurred within a scene, such as a particular vehicle (car A) moving to a location believed to be a parking space (parking space 5) at a particular time. In contrast, the long-term memory 225 may store data generalizing events observed in the scene.
To continue with the example of a vehicle parking, the long-term memory 225 may encode information capturing observations and generalizations learned by an analysis of the behavior of objects in the scene, such as "vehicles in certain areas of the scene tend to be in motion," "vehicles tend to stop in certain areas of the scene," etc. Thus, the long-term memory 225 stores observations about what happens within a scene with much of the particular episodic details stripped away. In this way, when a new event occurs, memories from the episodic memory 235 and the long-term memory 225 may be used to relate and understand a current event, i.e., the new event may be compared with past experience, leading to both reinforcement, decay, and adjustments to the information stored in the long-term memory 225, over time. In a particular embodiment, the long-term memory 225 may be implemented as an ART network and a sparse-distributed memory data structure.

The micro-feature classifier 255 may schedule a codelet 245 to evaluate the micro-feature vectors output by the computer vision engine 135. As noted, the computer vision engine 135 may track objects frame-to-frame and generate micro-feature vectors for each foreground object at a rate of, e.g., 5 Hz. In one embodiment, the micro-feature classifier 255 may be configured to create clusters from this stream of micro-feature vectors. For example, each micro-feature vector may be supplied to an input layer of the ART network (or a combination of a self-organizing map (SOM) and ART network used to cluster nodes in the SOM). In response, the ART network maps the micro-feature vector to a cluster in the ART network and updates that cluster (or creates a new cluster if the input micro-feature vector is sufficiently dissimilar to the existing clusters). Each cluster is presumed to represent a [...]

[...] corresponding pixels in the background model image. That is, the texture match score may indicate a degree of identity (e.g., a percentage match) between the texture (in terms of gradient) of the area including the foreground pixel and its surrounding pixels and the corresponding pixels of the background model image.

At step 460, the autogain filter module resizes the bounding box based on the correlation results of step 450, if necessary. As discussed, the correlation function may be configured to return a score indicating a degree of match between the texture around a pixel in the background model gradient image and in the video frame gradient image. If the correlation score is high (or low, depending on the implementation), it may indicate that a foreground patch pixel is actually part of the background (i.e., that the pixel is a false-positive foreground pixel). As a result, the autogain filter module may be configured to, for example, remove pixels from the foreground patch where the correlation scores for those pixels exceed (or are less than) a threshold. The bounding box for the foreground patch may then be adjusted accordingly to have width and height equal to the maximum width and maximum height, respectively, of the modified foreground patch.

In one embodiment, the threshold correlation score and the number of surrounding pixels may be adjusted based on the size of the bounding box, with a higher-identity match (e.g., a higher threshold score) required and more surrounding pixels used for larger bounding boxes, and vice versa. For example, the threshold correlation score may be made to require a closer (or less close) identity, and the size of the square(s) including surrounding pixels may be made larger, based on the size of the bounding box according to discrete step functions. In general, larger bounding boxes may indicate global autogain effects. By contrast, smaller bounding boxes are less likely to result from autogain effects. The threshold correlation score and number of surrounding pixels may be adjusted to account for these tendencies because, for example, using a higher identity match and a larger number of surrounding pixels may cause the mischaracterization of actual foreground pixels in small bounding boxes as background pixels.
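The following sketch illustrates the kind of size-dependent parameter selection and bounding-box adjustment described above. The specific area cut-offs, threshold values, and window sizes are invented for illustration; the description above only requires that larger boxes use a higher threshold and more surrounding pixels, selected by discrete step functions.

```python
# Sketch under stated assumptions (step boundaries and values below are
# illustrative only): pick the correlation threshold and window size from the
# bounding-box size, then refit the box after false-positive pixels are removed.
import numpy as np

def filter_params(box_w, box_h):
    """Discrete step function: larger boxes demand a closer texture match and
    a larger neighborhood; smaller boxes use looser settings."""
    area = box_w * box_h
    if area > 10000:
        return 0.95, 3          # (correlation threshold, half window size)
    elif area > 2500:
        return 0.90, 2
    return 0.85, 1

def shrink_box(fg_mask, x, y, w, h):
    """Refit the bounding box to the surviving foreground pixels, if any."""
    ys, xs = np.nonzero(fg_mask[y:y + h, x:x + w])
    if len(xs) == 0:
        return None                                  # patch removed entirely
    return (int(x + xs.min()), int(y + ys.min()),
            int(xs.max() - xs.min() + 1), int(ys.max() - ys.min() + 1))
```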
At step 470, the autogain filter module determines whether there are more bounding box areas, and returns to step 430 if there are more bounding box areas. If there are no additional bounding box areas, the method 400 ends.

Although discussed above with respect to autogain, techniques similar to those discussed may also be applied generally to help distinguish foreground from background. For example, techniques similar to those discussed may be used to filter out effects of auto white balancing; signal noise; and instances where the camera does not over-filter, a common problem with shadow, noise, etc., algorithms. Although discussed above with respect to reducing the size of a foreground object bounding box, techniques similar to those discussed may also be used to, for example, prevent background model corruption due to incorrect pixels being identified as background pixels.

While the foregoing is directed to embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof.

What is claimed is:

1. A computer-implemented method for filtering out false-positive foreground pixels, the method comprising:
extracting foreground patches from a video frame using a background model image, the foreground patches each including respective foreground pixels;
determining a bounding box for each foreground patch;
for each bounding box:
(1) for each foreground pixel in the foreground patch in the bounding box:
(A) determining a texture of a first area including the foreground pixel and pixels surrounding the foreground pixel and a texture of a second area including pixels of the background model image corresponding to the first area pixels, wherein the textures of the first area and the second area each represent a local variability of intensity values of respective pixels, and
(B) determining a correlation score based on the texture of the first area and the texture of the second area, and
(2) for foreground pixels having correlation scores which exceed a threshold value, removing the foreground pixels from the foreground patch in the bounding box, wherein the threshold value is proportional to a size of the bounding box; and
for each bounding box having foreground pixels removed, reducing a size of the bounding box based on the removal of the foreground pixels.

2. The method of claim 1, wherein determining the texture of the first area includes determining gradient values for the foreground pixel and the pixels surrounding the foreground pixel, and wherein determining the texture of the second area includes determining gradient values for the pixels of the background model image corresponding to the first area pixels.

3. The method of claim 2, wherein the gradient values are determined using an edge detection technique which is resilient to noise and invariant to illumination and/or color hue changes.
4. The method of claim 2, wherein the gradient values are determined by applying one of a Sobel operator and a Haar wavelet transform.

5. The method of claim 1, wherein the number of pixels surrounding the foreground pixel used in the first area is proportional to the size of the bounding box.

6. The method of claim 1, further comprising, prior to determining the textures, converting the foreground and background pixels to grayscale.

7. A non-transitory computer-readable storage medium storing instructions, which when executed by a computer system, perform operations for filtering out false-positive foreground pixels, the operations comprising:
extracting foreground patches from a video frame using a background model image, the foreground patches each including respective foreground pixels;
determining a bounding box for each foreground patch;
for each bounding box:
(1) for each foreground pixel in the foreground patch in the bounding box:
(A) determining a texture of a first area including the foreground pixel and pixels surrounding the foreground pixel and a texture of a second area including pixels of the background model image corresponding to the first area pixels, wherein the textures of the first area and the second area each represent a local variability of intensity values of respective pixels, and
(B) determining a correlation score based on the texture of the first area and the texture of the second area, and
(2) for foreground pixels having correlation scores which exceed a threshold value, removing the foreground pixels from the foreground patch in the bounding box, wherein the threshold value is proportional to a size of the bounding box; and
for each bounding box having foreground pixels removed, reducing a size of the bounding box based on the removal of the foreground pixels.

8. The computer-readable storage medium of claim 7, wherein determining the texture of the first area includes determining gradient values for the foreground pixel and the pixels surrounding the foreground pixel, and wherein determining the texture of the second area includes determining gradient values for the pixels of the background model image corresponding to the first area pixels.

9. The computer-readable storage medium of claim 8, wherein the gradient values are determined using an edge detection technique which is resilient to noise and invariant to illumination and/or color hue changes.

10. The computer-readable storage medium of claim 8, wherein the gradient values are determined by applying one of a Sobel operator and a Haar wavelet transform.

11. The computer-readable storage medium of claim 7, wherein the number of pixels surrounding the foreground pixel used in the first area is proportional to the size of the bounding box.

12. The computer-readable storage medium of claim 7, the operations further comprising, prior to determining the textures, converting the foreground and background pixels to grayscale.
13. A system, comprising:
a processor; and
a memory, wherein the memory includes an application program configured to perform operations for filtering out false-positive foreground pixels, the operations comprising:
extracting foreground patches from a video frame using a background model image, the foreground patches each including respective foreground pixels,
determining a bounding box for each foreground patch,
for each bounding box:
(1) for each foreground pixel in the foreground patch in the bounding box:
(A) determining a texture of a first area including the foreground pixel and pixels surrounding the foreground pixel and a texture of a second area including pixels of the background model image corresponding to the first area pixels, wherein the textures of the first area and the second area each represent a local variability of intensity values of respective pixels, and
(B) determining a correlation score based on the texture of the first area and the texture of the second area, and
(2) for foreground pixels having correlation scores which exceed a threshold value, removing the foreground pixels from the foreground patch in the bounding box, wherein the threshold value is proportional to a size of the bounding box, and
for each bounding box having foreground pixels removed, reducing a size of the bounding box based on the removal of the foreground pixels.

14. The system of claim 13, wherein the number of pixels surrounding the foreground pixel used in the first area is proportional to the size of the bounding box.
