
Randomized Decision Forests

Very fast
  for classification, for clustering

(detailed references at end of slides)

Generalization through random training

Inherently multi-class
  automatic feature sharing [Torralba et al. 07]

Simple training / testing algorithms

Randomized Decision Forests = Randomized Forests = Random Forests™

Randomized Forests in Vision

[Amit & Geman, 97] digit recognition

[Lepetit et al., 06] keypoint recognition

[Moosmann et al., 06] visual word clustering

[Shotton et al., 08] object segmentation
(figure: segmentation example with regions labeled water, boat, tree, chair, road)

(Among others...)

Live Demo

[Shotton et al. 08]

Real-time object segmentation using randomized decision forests


trained on MSRC 21-category database: airplane bicycle bird boat body book building car cat chair cow dog face flower grass road sheep sign sky tree water

Segment image and label segments:


Winner CVPR 2008 Best Demo Award!

Outline

Tutorial on Randomized Decision Forests
Applications to Vision

  keypoint recognition [Lepetit et al. 06]
  object segmentation [Shotton et al. 08]

please ask questions as we go!

The Basics: Is The Grass Wet?


world state: is it raining?
  yes -> P(wet) = 0.95
  no  -> is the sprinkler on?
           yes -> P(wet) = 0.9
           no  -> P(wet) = 0.1

The Basics: Binary Decision Trees


(figure: binary decision tree)
  feature vector v, split functions fn(v), thresholds tn, classifications Pn(c)
  split nodes test fn(v) < tn to route v left or right
  leaf nodes store a distribution Pn(c) over categories c

Decision Tree Pseudo-Code

double[] ClassifyDT(node, v)
  if node.IsSplitNode then
    if node.f(v) >= node.t then
      return ClassifyDT(node.right, v)
    else
      return ClassifyDT(node.left, v)
    end
  else
    return node.P
  end
end
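For concreteness, here is a minimal runnable Python sketch of the same recursion; the Node layout (f, t, left, right, P) mirrors the pseudocode but is an assumption, not the tutorial's actual code:

  from dataclasses import dataclass
  from typing import Callable, List, Optional, Sequence

  @dataclass
  class Node:
      # split-node fields (unused for leaves)
      f: Optional[Callable[[Sequence[float]], float]] = None  # split function fn
      t: float = 0.0                                          # threshold tn
      left: Optional["Node"] = None
      right: Optional["Node"] = None
      # leaf-node field: class distribution Pn(c)
      P: Optional[List[float]] = None

  def classify_dt(node: Node, v: Sequence[float]) -> List[float]:
      # recurse until a leaf: go right if f(v) >= t, else left
      if node.P is not None:
          return node.P
      child = node.right if node.f(v) >= node.t else node.left
      return classify_dt(child, v)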

Toy Learning Example
  Try several lines, chosen at random
  Keep line that best separates data (information gain)
  Recurse

feature vectors are x, y coordinates: v = [x, y]^T
split functions are lines with parameters a, b: fn(v) = ax + by
threshold determines intercepts: tn
four classes: purple, blue, red, green


Randomized Learning

Recursive algorithm
  set In of training examples that reach node n is split:
    left split:  Il = { i ∈ In : fn(vi) < tn }
    right split: Ir = In \ Il
  (vi is the feature vector of example i, tn the threshold)

Features f and thresholds t chosen at random
At leaf node n, Pn(c) is the histogram of examples In

More Randomized Learning

Features f(v) chosen from a feature pool f ∈ F
Thresholds t chosen in the range of observed responses f(v)
Choose f and t to maximize the gain in information
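Written out (a standard formulation; the slides do not spell it out), the gain for a candidate split of In into Il and Ir is the reduction in Shannon entropy of the class histograms:

  \Delta E = E(I_n) - \frac{|I_l|}{|I_n|} E(I_l) - \frac{|I_r|}{|I_n|} E(I_r),
  \qquad E(I) = -\sum_c p(c) \log p(c)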

Implementation Details

How many features and thresholds to try?
  just one = extremely randomized [Geurts et al. 06]
  few  -> fast training, may under-fit
  many -> slower training, may over-fit

When to stop?
  maximum depth
  minimum entropy gain
  delta class distribution
  pruning?

Unsupervised training
  information gain -> most balanced split

Randomized Learning Pseudo Code

TreeNode LearnDT(I)
  repeat featureTests times
    let f = RndFeature()
    repeat threshTests times
      let t = RndThreshold(I, f)
      let (I_l, I_r) = Split(I, f, t)
      let gain = InfoGain(I_l, I_r)
      if gain is best then remember f, t, I_l, I_r end
    end
  end
  if best gain is sufficient then
    return SplitNode(f, t, LearnDT(I_l), LearnDT(I_r))
  else
    return LeafNode(HistogramExamples(I))
  end
end
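A minimal Python sketch of the inner split search, using random axis-aligned features (v[dim]) as stand-ins for RndFeature/RndThreshold; an illustration under those assumptions, not the tutorial's code:

  from collections import Counter
  import math, random

  def entropy(labels):
      # Shannon entropy of a class histogram
      n = len(labels)
      return -sum((k / n) * math.log(k / n) for k in Counter(labels).values())

  def info_gain(parent, left, right):
      n = len(parent)
      return entropy(parent) - (len(left) / n) * entropy(left) \
                             - (len(right) / n) * entropy(right)

  def learn_split(examples, feature_tests=10, thresh_tests=5):
      # examples: list of (v, c) pairs; returns (gain, dim, t, I_l, I_r)
      # for the best of feature_tests x thresh_tests random splits
      labels = [c for _, c in examples]
      best = None
      for _ in range(feature_tests):
          dim = random.randrange(len(examples[0][0]))   # random feature
          values = [v[dim] for v, _ in examples]
          for _ in range(thresh_tests):
              t = random.uniform(min(values), max(values))  # random threshold
              I_l = [(v, c) for v, c in examples if v[dim] < t]
              I_r = [(v, c) for v, c in examples if v[dim] >= t]
              if not I_l or not I_r:
                  continue
              gain = info_gain(labels, [c for _, c in I_l], [c for _, c in I_r])
              if best is None or gain > best[0]:
                  best = (gain, dim, t, I_l, I_r)
      return best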

A Forest of Trees

Forest is an ensemble of several decision trees
(figure: trees t1 ... tT, each with split nodes and leaf nodes, each leaf storing a distribution over category c)

classification averages the tree distributions:
  P(c|v) = (1/T) Σt Pt(c|v)

[Amit & Geman 97] [Breiman 01] [Lepetit et al. 06]

Decision Forests Pseudo-Code


double[] ClassifyDF(forest, v)
  // allocate memory
  let P = double[forest.CountClasses]
  // loop over trees in forest
  for t = 1 to forest.CountTrees
    let P_t = ClassifyDT(forest.Tree[t], v)
    P = P + P_t // sum distributions
  end
  // normalise
  P = P / forest.CountTrees
  return P
end

Learning a Forest

Divide training examples into T subsets It ⊆ I
  improves generalization
  reduces memory requirements & training time

Train each decision tree t on subset It
  same decision tree learning as before
  multi-core friendly

Subsets can be chosen at random or hand-picked
Subsets can have overlap (and usually do)
Could also divide the feature pool into subsets

Learning a Forest Pseudo Code


Forest LearnDF(countTrees, I)
  // allocate memory
  let forest = Forest(countTrees)
  // loop over trees in forest
  for t = 1 to countTrees
    let I_t = RandomSplit(I)
    forest[t] = LearnDT(I_t)
  end
  // return forest object
  return forest
end
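A bagging-style Python sketch of LearnDF/ClassifyDF; learn_dt stands for any single-tree learner (for instance, one built recursively around learn_split above), and subset_frac is an illustrative choice, not a value from the slides:

  import random

  def learn_df(count_trees, examples, learn_dt, subset_frac=0.5):
      # train each tree on a (possibly overlapping) random subset I_t of I
      size = max(1, int(subset_frac * len(examples)))
      return [learn_dt(random.sample(examples, size)) for _ in range(count_trees)]

  def classify_df(forest, v, classify_dt, count_classes):
      # average the per-tree class distributions
      P = [0.0] * count_classes
      for tree in forest:
          for c, p in enumerate(classify_dt(tree, v)):
              P[c] += p
      return [p / len(forest) for p in P]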

Toy Forest Classification Demo

Randomized Forests for Clustering  [Moosmann et al. 06]

Visual words [Sivic et al. 03] [Csurka et al. 04]
  good for e.g. matching, recognition
  but k-means clustering is very slow

Randomized forests for clustering descriptors
  e.g. SIFT, texton filter-banks, etc.

Leaf nodes in forest are clusters
  concatenate histograms from trees in forest
(figure: trees t1 ... tT with numbered leaf nodes)

Randomized Forests for Clustering  [Moosmann et al. 06]

(figure: leaf-node indices from trees t1 ... tT concatenated into a bag-of-words histogram; axes: node index vs. frequency)

we'll see later how to use the whole tree hierarchy!
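In code, using the forest as a clusterer is just a matter of reading off leaf indices; a sketch, where leaf_index(tree, v) is an assumed helper returning the index of the leaf that descriptor v reaches:

  def bag_of_words(forest, descriptors, leaf_index, leaves_per_tree):
      # one histogram per tree, concatenated: each descriptor votes
      # into the leaf it reaches in every tree
      hist = [0] * (len(forest) * leaves_per_tree)
      for v in descriptors:
          for t, tree in enumerate(forest):
              hist[t * leaves_per_tree + leaf_index(tree, v)] += 1
      return hist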

Relation to Cascades

Cascades [Viola & Jones 04]
  very unbalanced tree
  good for unbalanced binary problems
  e.g. sliding window object detection

Randomized forests
  less deep, fairly balanced
  ensemble of trees gives robustness
  good for multi-class problems

Random Ferns  [Özuysal et al. 07] [Bosch et al. 07]

Naïve Bayes classifier over random sets of features
  Bayes rule: P(C | f1, ..., fN) ∝ P(f1, ..., fN | C) P(C)
  naïve Bayes (individual features): P(f1, ..., fN | C) = ∏i P(fi | C)
  random ferns (sets of features Fk): P(f1, ..., fN | C) = ∏k P(Fk | C)

Can be a good alternative to randomized forests

Short Pause

Outline

Tutorial on Randomized Decision Forests
Applications to Vision Problems

  keypoint recognition [Lepetit et al. 06]
  object segmentation [Shotton et al. 08]

Fast Keypoint Recognition  [Lepetit et al. 06]

Wide-baseline matching as a classification problem

Extract prominent keypoints in training images

Forest classifies patches -> keypoints

Features
  pixel comparisons (see the sketch below)

Augmented training set
  gives robustness to patch scaling, translation, rotation
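The pixel-comparison features are binary tests on patch intensities; a hedged sketch (the test in [Lepetit et al. 06] compares two pixel intensities, but the positions here are illustrative):

  def pixel_comparison(patch, p1, p2):
      # patch: 2D grayscale array; p1, p2: (row, col) positions chosen at
      # random when the node was created and then fixed
      return patch[p1[0]][p1[1]] > patch[p2[0]][p2[1]]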

Fast Keypoint Recognition


Example videos
from http://cvlab.epfl.ch/research/augm/detect.php

[Lepetit et al. 06]

Real-Time Object Segmentation

[Shotton et al. 2008]

Aim: a better visual vocabulary
  image categorization: does this image contain cows, trees, etc.?
  object segmentation: draw and label the outlines of the cow, grass, etc.

Design goals
  fast and accurate
  use learned semantic information

Object Recognition Pipeline

extract features (hand-crafted)
  SIFT, filter bank
clustering (unsupervised)
  k-means
assignment
  nearest neighbour
classification algorithm (supervised)
  SVM, decision forest, boosting

Object Recognition Pipeline


Semantic Texton Forest (STF)
  decision forest for both clustering & classification
  tree nodes have learned object category associations
  outputs: semantic textons (clustering) + local classification
classification algorithm (supervised)
  SVM, decision forest, boosting

Object Recognition Pipeline

test image
  -> Semantic Texton Forest (STF)
       decision forest for both clustering & classification
       tree nodes have learned object category associations
       outputs: semantic textons (clustering) + local classification
  -> Support Vector Machine (SVM)
       pyramid match kernel in learned tree hierarchies
       -> image categorization (building, dog, road, ...)
  -> Segmentation Forest (SF)
       second decision forest; features use layout & context, semantic context
       -> object segmentation


Textons & Visual Words

Textons [Julesz 81]
  computed densely
  clustered filter-bank responses
  used for object recognition, e.g. [Malik 01] [Varma 05] [Winn 05] [Shotton 07]

Visual words
  usually computed sparsely
  clustered descriptors, e.g. [Lowe 04]
  used for object recognition, e.g. [Mikolajczyk 04] [Lowe 04] [Sivic 03] [Csurka 04]

Pipeline: extract features (filter bank / local descriptors) -> clustering (k-means) -> assignment (nearest neighbour)

Semantic Texton Forests (STF)

An STF is
  a decision forest applied at each image pixel
  simple pixel-based features
  no descriptors or filter-banks

How is this new?
  decision forest gives fast clustering & assignment
  local classification
  learned semantic information

Image Patch Features

Pixel i gives patch p (21x21 pixels in experiments)

tree split function: f(p) > t ?  (t is a learned threshold)

Example Semantic Texton Forest

(figure: input image, ground truth, and one learned tree; example split tests:)
  A[g] - B[b] > 28
  |A[b] - B[g]| > 37
  |A[r] - B[b]| > 21
  A[b] > 98
  A[r] + B[r] > 363
  A[b] + B[b] > 284
  A[g] - B[b] > 13

Example Patches
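A sketch of the split tests seen in the example tree above: A and B are two pixel positions inside the 21x21 patch, each indexed by a color channel, combined by sum, difference, or absolute difference, or used alone; the positions, channels, operator, and threshold are what training selects. The function name and parameterization here are illustrative:

  def stf_feature(patch, a_pos, a_ch, b_pos, b_ch, op):
      # patch: HxWx3 array of r/g/b values; returns the scalar response
      # that the node compares against its learned threshold
      A = patch[a_pos[0]][a_pos[1]][a_ch]   # e.g. A[g]
      B = patch[b_pos[0]][b_pos[1]][b_ch]   # e.g. B[b]
      if op == "add":
          return A + B                      # e.g. A[r] + B[r] > 363
      if op == "sub":
          return A - B                      # e.g. A[g] - B[b] > 28
      if op == "abs":
          return abs(A - B)                 # e.g. |A[b] - B[g]| > 37
      return A                              # unary test, e.g. A[b] > 98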

Leaf Node Visualization


Average of all training patches at each leaf node
(figure: example leaf averages from trees 1-5)

STF Training Examples


Supervised training
  regular grid of training pixels
  (ground-truth colors map to categories)
  random transformations: learn invariances [Lepetit et al. 06]

Different Levels of Supervision


STF can be trained with:
  no supervision (just the images)
    clustering only, no local classification
  weak supervision (image labels, e.g. tree, bench, grass)
    trained as if all image labels applied at every pixel
  full supervision (pixel labels)

Balancing the Training Set

Datasets are often unbalanced
  -> poor average class accuracy
(figure: proportion of pixels by class, MSRC dataset)

Weight training examples by inverse class frequency (see the sketch below)
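A minimal sketch of the re-weighting, assuming weights proportional to inverse class frequency (the exact normalisation is an illustrative choice):

  from collections import Counter

  def inverse_class_frequency_weights(pixel_labels):
      # rare classes get large weights, dominant classes small ones
      counts = Counter(pixel_labels)
      total = sum(counts.values())
      return {c: total / n for c, n in counts.items()}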

Semantic Textons & Local Classification

(figure: test image; semantic textons, color = leaf node index; local classification, color = most likely category; ground truth shown for reference — the local classification is already roughly comparable to it)

MSRC Naïve Segmentation Baseline

Use only the local classification P(c|l) from the STF

  STF                 global accuracy   average accuracy
  supervised          49.7%             34.5%
  weakly supervised   14.8%             24.1%

Bags of Semantic Textons (BoSTs)

(figure: image with semantic textons, colors = leaf node indices; local classification, colors = categories; a region r)

For a region r, computed per tree t1 ... tT and over all trees:
  semantic texton histogram: frequency of each node index (leaf nodes, plus split nodes down the hierarchy, e.g. at depth 2)
  region prior: probability of each object category within r
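A rough Python sketch of computing a BoST for a region; path_nodes and leaf_distribution are assumed helpers (the root-to-leaf node indices and the leaf class distribution for a pixel), not names from the paper:

  def bost(forest, region_pixels, path_nodes, leaf_distribution,
           nodes_per_tree, count_classes):
      # semantic texton histogram: counts over every node (split + leaf)
      # visited by each pixel in each tree
      hist = [0] * (len(forest) * nodes_per_tree)
      # region prior: average of the leaf class distributions
      prior = [0.0] * count_classes
      for v in region_pixels:
          for t, tree in enumerate(forest):
              for n in path_nodes(tree, v):
                  hist[t * nodes_per_tree + n] += 1
              for c, p in enumerate(leaf_distribution(tree, v)):
                  prior[c] += p
      norm = len(region_pixels) * len(forest)
      return hist, [p / norm for p in prior]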

Choice of Regions for BoSTs

Image categorization
  region r = whole image

Object segmentation
  many image regions r (r1, r2, r3, ...) around each pixel i

Other Clustering Methods

Efficient codebooks [Jurie et al. 05]
Hyper-grid clustering [Tuytelaars et al. 07]
Hierarchical k-means [Nister & Stewenius 06]
Discriminant embedding [Hua et al. 07]

Randomized clustering forests [Moosmann et al. 06]
  tree hierarchy not used
  ignores classification of forest
  uses expensive local descriptors

Object Recognition Pipeline
(recap: test image -> STF (semantic textons + local classification) -> SVM with pyramid match kernel -> image categorization -> Segmentation Forest -> object segmentation; next: the SVM step)

Image Categorization

SVM with learned Pyramid Match Kernel (PMK)
  descriptor space [Grauman et al. 05]; image location space [Lazebnik et al. 06]

New PMK acts on semantic texton histograms: matches P and Q in the learned hierarchical histogram space

deeper node matches are more important:

  K(P, Q) = (1/Z) Σd wd Δd

  Δd = increased similarity at depth d
  wd = depth weight (larger for deeper d)
  Z  = normalisation

Categorization Experiments on MSRC

Learned PMK vs. radial basis function (RBF) kernel:

  kernel     RBF    learned PMK
  mean AP    49.9   76.3

NB: mean average precision is a tougher metric than EER or AuC
(figure: mean average precision vs. number of trees T)

Object Recognition Pipeline
(recap as above; next: the Segmentation Forest step)

Segmentation Forest

Object segmentation (e.g. building, bicycle, road)

Adapt TextonBoost [Shotton et al. 06]
  boosted classifier -> randomized decision forest
  textons -> semantic textons + region priors
  no conditional random field

Features in Segmentation Forest

tree split function: take the bin count of either a semantic texton bin (node index, frequency) or a region prior bin (object category, probability) within an offset rectangle r, and test: bin count > learned threshold?

How the Features Work

Rectangles paired with semantic textons can capture appearance, layout, and textural context [Shotton et al. 07]

(figure: input image and semantic texton map; feature1 = (r1, t1) and feature2 = (r2, t2), each an offset rectangle paired with a semantic texton, with their responses at pixels i1 ... i4)

Features in Segmentation Forest

Learning the randomized forest
  regular grid (10x10 pixels)
  discriminative pairs of region r and BoST bin

Region prior allows semantic context
  e.g. sheep tend to stand on grass

Efficient calculation (see the integral-image sketch below)
  compute bins only as required
  use integral images [Viola & Jones 04]
  sub-sample integral images
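The integral-image trick [Viola & Jones 04] makes each rectangle-count feature O(1) at test time; a self-contained sketch (in practice, one integral image per histogram bin):

  def integral_image(grid):
      # ii[y][x] = sum of grid over rows < y and cols < x (zero-padded)
      h, w = len(grid), len(grid[0])
      ii = [[0] * (w + 1) for _ in range(h + 1)]
      for y in range(h):
          row_sum = 0
          for x in range(w):
              row_sum += grid[y][x]
              ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
      return ii

  def rect_sum(ii, x0, y0, x1, y1):
      # sum of grid over the rectangle [x0, x1) x [y0, y1): four lookups
      return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]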

Image-Level Prior (ILP)

Combine
  image categorization (SVM with learned PMK) -> posterior used as a prior for segmentation
  object segmentation (decision forest)

  P(c | i) ∝ P_SF(c | i) · P_ILP(c)^α    (α = weighting)

See also [Verbeek et al. 07]

MSRC Segmentation Results

(figure: test image, SF, and SF + ILP segmentations; category legend: building, grass, tree, cow, sheep, sky, airplane, water, face, car, bicycle, flower, sign, bird, book, chair, road, cat, dog, body, boat)

More MSRC Results

(figure: further segmentation examples; same category legend as above)

MSRC Quantitative Comparison

Pixel-wise segmentation accuracy (%), classes in MSRC order
(building, grass, tree, cow, sheep, sky, aeroplane, water, face, car, bicycle, flower, sign, bird, book, chair, road, cat, dog, body, boat):

  [Shotton 06]  62 98 86 58 50 83 60 53 74 63 75 63 35 19 92 15 86 54 19 62  7   Average 58  Global 71
  [Verbeek 07]  52 87 68 73 84 94 88 73 70 68 74 89 33 19 78 34 89 46 49 54 31   Average 64  Global -
  SF            41 84 75 89 93 79 86 47 87 65 72 61 36 26 91 50 70 72 31 61 14   Average 63  Global 68
  SF + ILP      49 88 79 97 97 78 82 54 87 74 72 74 36 24 93 51 78 75 35 66 18   Average 67  Global 72

(best per-class numbers were highlighted in red on the slide)

Computation time:

  method        Training Time   Test Time
  [Shotton 06]  2 days          30 sec / image
  [Verbeek 07]  1 hr            2 sec / image
  SF            2 hrs           < 0.125 sec / image

MSRC Quantitative Comparison

Pixel-wise segmentation accuracy (%), same class order as before:

  [Shotton 06]  62 98 86 58 50 83 60 53 74 63 75 63 35 19 92 15 86 54 19 62  7   Average 58  Global 71
  [Verbeek 07]  52 87 68 73 84 94 88 73 70 68 74 89 33 19 78 34 89 46 49 54 31   Average 64  Global -
  SF            41 84 75 89 93 79 86 47 87 65 72 61 36 26 91 50 70 72 31 61 14   Average 63  Global 68
  SF + ILP      49 88 79 97 97 78 82 54 87 74 72 74 36 24 93 51 78 75 35 66 18   Average 67  Global 72
  [Tu 08]       69 96 87 78 80 95 83 67 84 70 79 47 61 30 80 45 78 68 52 67 27   Average 69  Global 78

(best per-class numbers were highlighted in red on the slide)

Computation time:

  method        Training Time   Test Time
  [Shotton 06]  2 days          30 sec / image
  [Verbeek 07]  1 hr            2 sec / image
  SF            2 hrs           < 0.125 sec / image
  [Tu 08]       a few days      30-70 sec / image

MSRC Influence of Design Decisions

Category average accuracy on the MSRC dataset:

  only leaf node bins      64.1%
  all tree node bins       65.5%
  only region prior bins   66.1%
  full model               66.9%
  no transformations       64.4%
  unsupervised STF         64.2%
  weakly supervised STF    64.6%

VOC 2007 Segmentation Results

(example categories: person, table, chair, dog)

  method                       Average Accuracy
  [Brookes]                     9
  SF                           20
  SF + Image-Level Prior       24
  [TKK]                        30
  SF + Detection-Level Prior   42

Detection-Level Prior
  [TKK] detection bounding boxes used as a segmentation prior

Driving Video Database

(figure: test image, ground truth(!), STF + SF result)

[Brostow, Shotton, Fauqueur, Cipolla, ECCV 2008]
new structure-from-motion cues can improve object segmentation

Semantic Texton Forests Summary

Semantic texton forests
  effective alternative to textons for recognition

Image categorization improves segmentation
  image-level prior
  can use identical image features

Memory
  high memory requirements for training

Efficiency
  very fast on CPU

References (red = most relevant)

Amit & Geman. Shape Quantization and Recognition with Randomized Trees. Neural Computation 1997.
Bosch et al. Image Classification using Random Forests and Ferns. ICCV 2007.
Breiman. Random Forests. Machine Learning Journal 2001.
Brostow et al. To appear ECCV 2008.
Csurka et al. Visual Categorization with Bags of Keypoints. ECCV Workshop on Statistical Learning in Computer Vision, 2004.
Geurts et al. Extremely Randomized Trees. Machine Learning 2006.
Grauman & Darrell. The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features. ICCV 2005.
Hua et al. Discriminant Embedding for Local Image Descriptors. ICCV 2007.
Jurie & Triggs. Creating Efficient Codebooks for Visual Recognition. ICCV 2005.
Lazebnik et al. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. CVPR 2006.
Lepetit et al. Keypoint Recognition using Randomized Trees. PAMI 2006.
Lowe. Distinctive image features from scale-invariant keypoints. IJCV 2004.
Malik et al. Contour and Texture Analysis for Image Segmentation. IJCV 2001.
Mikolajczyk & Schmid. Scale and Affine invariant interest point detectors. IJCV 2004.
Moosmann et al. Fast Discriminative Visual Codebooks using Randomized Clustering Forests. NIPS 2006.
Nister & Stewenius. Scalable Recognition with a Vocabulary Tree. CVPR 2006.
Özuysal et al. Fast Keypoint Recognition in Ten Lines of Code. CVPR 2007.
Sharp. To appear ECCV 2008.
Shotton et al. Semantic Texton Forests for Image Categorization and Segmentation. CVPR 2008.
Shotton et al. TextonBoost for Image Understanding: Multi-Class Object Recognition and Segmentation by Jointly Modeling Texture, Layout, and Context. IJCV 2007.
Sivic & Zisserman. Video Google: A Text Retrieval Approach to Object Matching in Videos. ICCV 2003.
Torralba et al. Sharing visual features for multiclass and multiview object detection. PAMI 2007.
Tu. Auto-context and Its Application to High-level Vision Tasks. CVPR 2008.
Tuytelaars & Schmid. Vector Quantizing Feature Space with a Regular Lattice. ICCV 2007.
Varma & Zisserman. A statistical approach to texture classification from single images. IJCV 2005.
Verbeek & Triggs. Region Classification with Markov Field Aspect Models. CVPR 2007.
Viola & Jones. Robust Real-time Object Detection. IJCV 2004.
Winn et al. Object Categorization by Learned Universal Visual Dictionary. ICCV 2005.

Take Home Message

Randomized decision forests are
  very fast (GPU friendly [Sharp, ECCV 08])
  simple to implement
  flexible tools for computer vision

Ideas for more research
  biasing the randomness
  optimal fusion of trees from different modalities
    e.g. appearance, SfM, optical flow

http://jamie.shotton.org/work/presentations/ICVSS2008.zip

Thank You
jamie@shotton.org

Internships at MSRC available for next year. Talk to me or see:


http://research.microsoft.com/aboutmsr/jobs/internships/about_uk.aspx

Example Tree in Segmentation Forest

(figure: an example learned tree; maximum depth 14)

More Results

(figure: test images with segmentations, without and with ILP; object classes: person, water, road, sand, sky, sidewalk, plant, mountain, building, tree, rock, sign, grass, snow, car)

Effect of Color Space

MSRC Categorization Results

(figure: per-category precision-recall curves)
  Categories 1-5: building, grass, tree, cow, sheep
  Categories 6-10: sky, aeroplane, water, face, car

MSRC Categorization Results

(figure: per-category precision-recall curves)
  Categories 11-15: bicycle, flower, sign, bird, book
  Categories 16-21: chair, road, cat, dog, body, boat