
# Randomized Decision Forests

- Very fast, for classification and for clustering
- Generalization through random training
- Inherently multi-class, with automatic feature sharing [Torralba et al. 07]
- Simple training / testing algorithms

Randomized Decision Forests = Randomized Forests = Random Forests™

## [Shotton et al. 08] Object Segmentation

Live demo (among others...): segmenting e.g. water, boat, tree, chair.

## Real-time object segmentation using randomized decision forests

trained on MSRC 21-category database: airplane bicycle bird boat body book building car cat chair cow dog face flower grass road sheep sign sky tree water

## Segment image and label segments:

Winner CVPR 2008 Best Demo Award!

Outline

## Tutorial on Randomized Decision Forests; Applications to Vision

- keypoint recognition [Lepetit et al. 06]
- object segmentation [Shotton et al. 08]

## The Basics: Is The Grass Wet?

A toy decision tree over a world state: split nodes ask questions (e.g. "is it raining?") with no / yes branches, and the leaf nodes store conditional probabilities, e.g. P(wet) = 0.95, P(wet) = 0.1, P(wet) = 0.9.

## The Basics: Binary Decision Trees

A binary decision tree routes a feature vector v from the root through split nodes to a leaf node. Each split node n applies a split function fn(v) against a threshold tn, branching left if fn(v) < tn and right otherwise. Each leaf node n stores a classification: a distribution Pn(c) over categories c.

## Decision Tree Pseudo-Code

```
double[] ClassifyDT(node, v)
  if node.IsSplitNode then
    if node.f(v) >= node.t then
      return ClassifyDT(node.right, v)
    else
      return ClassifyDT(node.left, v)
    end
  else
    return node.P
  end
end
```
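The same recursion in runnable Python (a minimal sketch; the dict-based node layout and the example tree are illustrative, not from the slides):

```python
# Minimal decision-tree classification, mirroring the ClassifyDT pseudocode.
def classify_dt(node, v):
    """Walk from the root to a leaf; return the leaf's class distribution."""
    if "split" in node:                       # split node
        f, t = node["split"]                  # feature index and threshold
        branch = "right" if v[f] >= t else "left"
        return classify_dt(node[branch], v)
    return node["P"]                          # leaf node: distribution over classes

# A tiny hand-built tree over 2-D feature vectors and 2 classes.
tree = {
    "split": (0, 0.5),
    "left":  {"P": [0.9, 0.1]},
    "right": {"split": (1, 0.5),
              "left":  {"P": [0.2, 0.8]},
              "right": {"P": [0.5, 0.5]}},
}

print(classify_dt(tree, [0.2, 0.9]))  # left branch -> [0.9, 0.1]
print(classify_dt(tree, [0.8, 0.9]))  # right, then right -> [0.5, 0.5]
```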

Toy Learning Example
- Try several lines, chosen at random; keep the line that best separates the data (maximal information gain); recurse.
- feature vectors are x, y coordinates: v = [x, y]^T
- split functions are lines with parameters a, b: fn(v) = ax + by
- threshold tn determines the intercept
- four classes: purple, blue, red, green


Randomized Learning
Recursive algorithm: the set In of training examples that reach node n is split into a left split and a right split by thresholding a function of each example's feature vector.

Features f and thresholds t are chosen at random. At leaf node n, Pn(c) is the histogram of the examples In that reach it.

## More Randomized Learning

Features f(v) are chosen from a feature pool f ∈ F, and thresholds t are chosen within the range of f on the examples. Choose the f and t that maximize the gain in information between the left and right splits.

Implementation Details
How many features and thresholds to try?
- just one: "extremely randomized" [Geurts et al. 06]
- few -> fast training, may under-fit
- many -> slower training, may over-fit

When to stop?
- maximum depth
- minimum entropy gain
- delta class distribution
- pruning?

Unsupervised training: replace information gain with the most balanced split.
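The split criterion can be made concrete with Shannon entropy, a standard choice (the slides just say "information gain"; the helper names here are illustrative):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(parent, left, right):
    """Reduction in entropy achieved by splitting parent into left/right."""
    n = len(parent)
    child = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - child

# A split that perfectly separates two equally frequent classes gains 1 bit.
print(info_gain(["a", "a", "b", "b"], ["a", "a"], ["b", "b"]))  # 1.0
```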

## Randomized Learning Pseudo Code

```
TreeNode LearnDT(I)
  repeat featureTests times
    let f = RndFeature()
    repeat threshTests times
      let t = RndThreshold(I, f)
      let (I_l, I_r) = Split(I, f, t)
      let gain = InfoGain(I_l, I_r)
      if gain is best then remember f, t, I_l, I_r end
    end
  end
  if best gain is sufficient then
    return SplitNode(f, t, LearnDT(I_l), LearnDT(I_r))
  else
    return LeafNode(HistogramExamples(I))
  end
end
```
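A runnable Python sketch of the same recursion (feature/threshold counts, stopping rules, and the dict-based node layout are illustrative choices, not the slides' exact settings):

```python
import random
from collections import Counter
from math import log2

def entropy(ys):
    n = len(ys)
    return -sum((c / n) * log2(c / n) for c in Counter(ys).values())

def learn_dt(X, y, n_features=10, n_thresh=5, min_gain=1e-6, depth=0, max_depth=8):
    """Recursively learn one randomized decision tree from examples (X, y)."""
    best = None
    for _ in range(n_features):
        f = random.randrange(len(X[0]))              # random feature index
        vals = [v[f] for v in X]
        for _ in range(n_thresh):
            t = random.uniform(min(vals), max(vals))  # random threshold
            l = [i for i in range(len(X)) if X[i][f] < t]
            r = [i for i in range(len(X)) if X[i][f] >= t]
            if not l or not r:
                continue
            gain = entropy(y) - (len(l) * entropy([y[i] for i in l])
                                 + len(r) * entropy([y[i] for i in r])) / len(y)
            if best is None or gain > best[0]:
                best = (gain, f, t, l, r)
    if best is None or best[0] < min_gain or depth >= max_depth:
        n = len(y)
        return {"P": {c: y.count(c) / n for c in sorted(set(y))}}  # leaf histogram
    gain, f, t, l, r = best
    return {"split": (f, t),
            "left":  learn_dt([X[i] for i in l], [y[i] for i in l],
                              n_features, n_thresh, min_gain, depth + 1, max_depth),
            "right": learn_dt([X[i] for i in r], [y[i] for i in r],
                              n_features, n_thresh, min_gain, depth + 1, max_depth)}

random.seed(0)
X = [[0.0], [0.1], [0.9], [1.0]]
y = ["a", "a", "b", "b"]
tree = learn_dt(X, y)
```

On this toy data the root split separates the two classes, so both children become pure leaves.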

A Forest of Trees
A forest is an ensemble of several decision trees t1 ... tT (each with its own split nodes and leaf nodes). Each tree independently routes the feature vector v from root to leaf, giving a distribution over category c; the forest classification combines the per-tree distributions.

## Decision Forests Pseudo-Code

```
double[] ClassifyDF(forest, v)
  // allocate memory
  let P = double[forest.CountClasses]
  // loop over trees in forest, summing distributions
  for t = 1 to forest.CountTrees
    let P_t = ClassifyDT(forest.Tree[t], v)
    P = P + P_t
  end
  // normalise
  P = P / forest.CountTrees
  return P
end
```

Learning a Forest
Divide the training examples into T subsets It ⊂ I:
- improves generalization
- reduces memory requirements & training time

## Train each decision tree t on subset It

Same decision tree learning as before; this is also multi-core friendly.

Subsets can be chosen at random or hand-picked, and can overlap (they usually do). One could also divide the feature pool into subsets.

## Learning a Forest Pseudo Code

```
Forest LearnDF(countTrees, I)
  // allocate memory
  let forest = Forest(countTrees)
  // loop over trees in forest
  for t = 1 to countTrees
    let I_t = RandomSplit(I)
    forest[t] = LearnDT(I_t)
  end
  return forest
end
```

## Randomized Forests for Clustering [Moosmann et al. 06]

Visual words [Sivic et al. 03] [Csurka et al. 04] are good for e.g. matching and recognition, but k-means clustering is very slow. Instead, use randomized forests to cluster descriptors (e.g. SIFT, texton filter-bank responses).

The leaf nodes of the forest are the clusters; concatenate the histograms from the trees in the forest.

Each descriptor lands in one leaf per tree t1 ... tT; per tree, the histogram of leaf node index vs. frequency forms a bag of words, and the per-tree histograms are concatenated. (We'll see later how to use the whole tree hierarchy!)
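A sketch of the concatenated-histogram construction in Python (tree layout and leaf numbering are illustrative conventions, not from the paper):

```python
def leaves(node):
    """List a tree's leaf nodes in a left-to-right walk."""
    if "split" in node:
        return leaves(node["left"]) + leaves(node["right"])
    return [id(node)]

def leaf_of(node, v):
    """Identity of the leaf that descriptor v reaches."""
    while "split" in node:
        f, t = node["split"]
        node = node["right"] if v[f] >= t else node["left"]
    return id(node)

def bag_of_leaves(forest, descriptors):
    """Concatenate per-tree leaf histograms into one bag-of-words vector."""
    hist = []
    for tree in forest:
        index = {leaf: i for i, leaf in enumerate(leaves(tree))}
        counts = [0] * len(index)
        for v in descriptors:
            counts[index[leaf_of(tree, v)]] += 1
        hist.extend(counts)
    return hist

# Two stump trees with different thresholds cluster 1-D descriptors.
t1 = {"split": (0, 0.5), "left": {"P": None}, "right": {"P": None}}
t2 = {"split": (0, 0.2), "left": {"P": None}, "right": {"P": None}}
descs = [[0.1], [0.3], [0.9]]
print(bag_of_leaves([t1, t2], descs))  # [2, 1, 1, 2]
```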

Very unbalanced trees (cascades) [Viola & Jones 04]
- good for unbalanced binary problems, e.g. sliding-window object detection

Randomized forests
- less deep and fairly balanced; the ensemble of trees gives robustness
- good for multi-class problems

Random Ferns
A naïve Bayes classifier over random sets of features: apply Bayes' rule; where naïve Bayes treats individual features as independent, random ferns model small sets of features jointly.
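Written out (following the fern formulation of Özuysal et al.; the grouping notation here is assumed): each fern $F_k$ is a small group of $S$ binary features whose joint distribution given the class is modelled, a compromise between full naïve Bayes ($S = 1$) and one joint model over all features.

```latex
\hat{c} \;=\; \arg\max_{c}\; P(c) \prod_{k=1}^{M} P\bigl(F_k \mid c\bigr),
\qquad
F_k = \bigl( f_{\sigma(k,1)}, \dots, f_{\sigma(k,S)} \bigr)
```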

Short Pause

Outline

## Tutorial on Randomized Decision Forests; Applications to Vision Problems

- keypoint recognition [Lepetit et al. 06]
- object segmentation [Shotton et al. 08]

## Fast Keypoint Recognition

Wide-baseline matching as a classification problem

## Extract prominent key-points in training images

The forest classifies: patches -> keypoints

Features
pixel comparisons

## Augmented training set

gives robustness to patch scaling, translation, rotation

## Fast Keypoint Recognition

Example videos
from http://cvlab.epfl.ch/research/augm/detect.php

## Aim: a better visual vocabulary

- image categorization: does this image contain cows, trees, etc.?
- object segmentation: draw and label the outlines of the cow, grass, etc.

Design goals: fast and accurate; use learned semantic information.

## Object Recognition Pipeline

The standard pipeline: extract features (SIFT, filter bank; hand-crafted) -> clustering (k-means; unsupervised) -> assignment (nearest neighbour) -> classification algorithm (SVM, decision forest, boosting; supervised).

## Object Recognition Pipeline

Semantic Texton Forest (STF): a decision forest for both clustering & classification; tree nodes have learned object category associations.

The full pipeline: test image -> STF (semantic textons: clustering + local classification) -> (i) Support Vector Machine (SVM) with a pyramid match kernel in the learned tree hierarchies -> image categorization; and (ii) Segmentation Forest (SF), a second decision forest whose features use layout & semantic context -> object segmentation (building, dog, road, ...).


## Textons & Visual Words

Textons [Julesz 81]
- computed densely
- clustered filter-bank responses
- used for object recognition, e.g. [Malik 01] [Varma 05] [Winn 05] [Shotton 07]

Visual words
- usually computed sparsely
- clustered descriptors, e.g. [Mikolajczyk 04] [Lowe 04] [Sivic 03] [Csurka 04]
- used for object recognition

Both follow the same pipeline: extract features (filter bank / local descriptors [Lowe 04]) -> clustering (k-means) -> assignment (nearest neighbour).

## Semantic Texton Forests (STF)

An STF is a decision forest applied at each image pixel, with simple pixel-based features.

## How is this new?

- no descriptors or filter-banks
- the decision forest gives fast clustering & assignment
- local classification with learned semantic information

## Pixel i gives patch p

Each pixel i gives a surrounding patch p (21x21 pixels in the experiments); split functions test f(p) > t against a learned threshold t.

Input image, ground truth, and example patches for an example split test A[b] > 98 (the value of pixel A's channel b).
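The split-function pool can be sketched in Python. In the STF paper the features are simple functions of one or two pixels in the patch (a raw channel value, or the sum / difference / absolute difference of two channel values); the sampler below is an illustrative reading of that pool, not the exact implementation:

```python
import random

def make_feature(patch_size=21, channels=3):
    """Sample a random split function f(p) over a patch.

    Picks two pixel/channel locations and one of four simple operations.
    (This feature pool is an assumption based on the STF paper.)"""
    p1 = (random.randrange(patch_size), random.randrange(patch_size),
          random.randrange(channels))
    p2 = (random.randrange(patch_size), random.randrange(patch_size),
          random.randrange(channels))
    op = random.choice(["value", "sum", "diff", "absdiff"])

    def f(patch):
        a = patch[p1[0]][p1[1]][p1[2]]
        b = patch[p2[0]][p2[1]][p2[2]]
        return {"value": a, "sum": a + b,
                "diff": a - b, "absdiff": abs(a - b)}[op]
    return f

random.seed(1)
# A constant test patch: every pixel has channel values (10, 20, 30).
patch = [[[10, 20, 30] for _ in range(21)] for _ in range(21)]
f = make_feature()
print(f(patch))
```

At training time, many such f are sampled per node and the one maximizing information gain (with its threshold) is kept.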

## Leaf Node Visualization

Average of all training patches at each leaf node

## STF Training Examples

Supervised training uses labels on a regular grid (ground-truth colors denote categories), with random transformations of the training images to learn invariances [Lepetit et al. 06].

## Different Levels of Supervision

The STF can also be trained with no supervision (just the images): this yields clustering only, with no local classification.

## Balancing the Training Set

Datasets are often unbalanced: in MSRC, some categories (e.g. grass) are far more frequent than others (e.g. boat, airplane), which leads to poor average class accuracy if left uncorrected.

## Semantic Textons & Local Classification

For a test image, the STF gives the semantic textons (color = leaf node index) and the local classification (color = most likely category); the latter is already comparable to the ground truth (shown for reference).

## MSRC Naïve Segmentation Baseline

Use only the local classification P(c|l) from the STF.

## Bags of Semantic Textons (BoSTs)

For an image region r, the semantic textons (colors = leaf node indices) and the local classification (colors = categories) are summarized in two parts:
- a histogram, over the nodes of each tree (leaf nodes and split nodes, i.e. all depths) and concatenated across all trees, of node index vs. frequency;
- a region prior: the averaged probability over object categories.

## Choice of Regions for BoSTs

- Image categorization: region r = the whole image.
- Object segmentation: many image regions r1, r2, r3, ...

## Other Clustering Methods

Efficient codebooks [Jurie et al. 05]; hyper-grid clustering [Tuytelaars et al. 07]; hierarchical k-means [Nister & Stewenius 06]; discriminant embedding [Hua et al. 07].

Randomized clustering forests [Moosmann et al. 06]:
- tree hierarchy not used
- ignores the classification ability of the forest
- uses expensive local descriptors


Image Categorization
SVM with a learned Pyramid Match Kernel (PMK). Previous pyramid match kernels act in descriptor space [Grauman et al. 05] or image location space [Lazebnik et al. 06]; the new PMK acts on semantic texton histograms, matching P and Q in the learned hierarchical histogram space. Deeper node matches are more important: the increased similarity at each depth d receives a depth weight, and the kernel is normalized.
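A hedged sketch of the kernel, following the pyramid match of [Grauman et al. 05] applied to the learned tree hierarchy (the paper's exact normalization and weights may differ): with $I_d$ the histogram intersection over tree nodes at depth $d$,

```latex
I_d = \sum_{n \,\in\, \text{nodes at depth } d} \min\!\bigl(P[n],\, Q[n]\bigr),
\qquad
K(P, Q) \;=\; \frac{1}{Z} \sum_{d=1}^{D} w_d \bigl( I_d - I_{d+1} \bigr),
\quad I_{D+1} \equiv 0,
```

where $I_d - I_{d+1}$ counts the matches whose deepest agreement is at depth $d$, the depth weights $w_d$ grow with $d$ so that deeper node matches contribute more, and $Z$ normalizes so that $K(P, P) = 1$.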

## Categorization Experiments on MSRC

Learned PMK vs. a radial basis function (RBF) kernel:

| Kernel | Mean AP |
|---|---|
| RBF | 49.9 |
| learned PMK | 76.3 |

Mean average precision was also measured as a function of the number of trees T.


Segmentation Forest
Object segmentation (e.g. building, bicycle).

## Adapt TextonBoost [Shotton et al. 06]

Replace the boosted classifier with a randomized decision forest, and replace textons with semantic textons + region priors.

Features test either:
- the bin count of a semantic texton bin (node index vs. frequency) within an offset rectangle r, against a learned threshold; or
- the region-prior probability of an object category within r, against a learned threshold.

## How the Features Work

Pairs of rectangles and semantic textons can capture appearance, layout, and textural context [Shotton et al. 07]. For example, feature1 = (r1, t1) and feature2 = (r2, t2) each produce a response map over the input image.

## Features in Segmentation Forest

Learning the randomized forest:
- training examples on a regular grid (every 10x10 pixels)
- discriminative pairs of region r and BoST bin

The region prior allows semantic context: sheep tend to stand on grass.

Efficient calculation:
- compute bins only as required
- use integral images [Viola & Jones 04]
- sub-sample the integral images
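The integral-image trick makes each rectangle-count feature O(1): keep one summed-area table per semantic-texton bin, and read any rectangle sum with four lookups. A self-contained sketch (function names are illustrative):

```python
def integral_image(grid):
    """2-D summed-area table: ii[y][x] = sum of grid over rows < y, cols < x."""
    h, w = len(grid), len(grid[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = (grid[y][x] + ii[y][x + 1]
                                + ii[y + 1][x] - ii[y][x])
    return ii

def rect_sum(ii, x0, y0, x1, y1):
    """Sum over the half-open rectangle [x0, x1) x [y0, y1) in four lookups."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]

# One integral image per semantic-texton bin lets the forest evaluate a
# "count of texton t inside offset rectangle r" feature in constant time.
grid = [[1, 2],
        [3, 4]]
ii = integral_image(grid)
print(rect_sum(ii, 0, 0, 2, 2))  # 10 (whole grid)
print(rect_sum(ii, 1, 0, 2, 2))  # 6  (right column: 2 + 4)
```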

## Image-Level Prior (ILP)

Combine image categorization (the SVM with learned PMK) with object segmentation (the decision forest), applying the ILP with a weighting.
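One natural way to write the combination is a power-weighted product, with the exponent $\alpha$ playing the role of the weighting (a sketch, not necessarily the paper's exact form):

```latex
P(c \mid i) \;\propto\; P_{\text{seg}}(c \mid i)\,\bigl[\,P_{\text{ILP}}(c)\,\bigr]^{\alpha}
```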


## MSRC Quantitative Comparison

Pixel-wise segmentation accuracy (%) over the 21 MSRC categories:

| | [Shotton 06] | [Verbeek 07] | SF | SF + ILP |
|---|---|---|---|---|
| Average | 58 | 64 | 63 | 67 |
| Global | 71 | – | 68 | 72 |

Computation time:

| | [Shotton 06] | [Verbeek 07] | SF |
|---|---|---|---|
| Training Time | 2 days | 1 hr | 2 hrs |
| Test Time | 30 sec / image | 2 sec / image | < 0.125 sec / image |

## MSRC Quantitative Comparison

With [Tu 08] added, pixel-wise segmentation accuracy (%):

| | [Shotton 06] | [Verbeek 07] | SF | SF + ILP | [Tu 08] |
|---|---|---|---|---|---|
| Average | 58 | 64 | 63 | 67 | 69 |
| Global | 71 | – | 68 | 72 | 78 |

Computation time:

| | [Shotton 06] | [Verbeek 07] | SF | [Tu 08] |
|---|---|---|---|---|
| Training Time | 2 days | 1 hr | 2 hrs | a few days |
| Test Time | 30 sec / image | 2 sec / image | < 0.125 sec / image | 30-70 sec / image |

## MSRC Influence of Design Decisions

Category average accuracy on the MSRC dataset:

| Variant | Average accuracy |
|---|---|
| only leaf node bins | 64.1% |
| all tree node bins | 65.5% |
| only region prior bins | 66.1% |
| full model | 66.9% |
| no transformations | 64.4% |
| unsupervised STF | 64.2% |
| weakly supervised STF | 64.6% |

## VOC 2007 Segmentation Results

Example classes: person, table, chair, dog.

| Method | Average Accuracy |
|---|---|
| [Brookes] | 9 |
| SF | 20 |
| SF + Image-Level Prior | 24 |
| [TKK] | 30 |
| SF + Detection-Level Prior | 42 |

Detection-Level Prior: use the [TKK] detection bounding boxes as a segmentation prior.

## Driving Video Database

Test image, ground truth(!), and the STF + SF result. [Brostow, Shotton, Fauqueur, Cipolla, ECCV 2008]: new structure-from-motion cues can improve object segmentation.

## Semantic Texton Forests Summary

- Semantic texton forests: an effective alternative to textons for recognition.
- Image categorization improves segmentation: the image-level prior can use identical image features.
- Memory: high memory requirements for training.
- Efficiency: very fast on the CPU.

## References

- Amit & Geman. Shape Quantization and Recognition with Randomized Trees. Neural Computation 1997.
- Bosch et al. Image Classification using Random Forests and Ferns. ICCV 2007.
- Breiman. Random Forests. Machine Learning Journal 2001.
- Brostow et al. To appear ECCV 2008.
- Csurka et al. Visual Categorization with Bags of Keypoints. ECCV Workshop on Statistical Learning in Computer Vision, 2004.
- Geurts et al. Extremely Randomized Trees. Machine Learning 2006.
- Grauman & Darrell. The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features. ICCV 2005.
- Hua et al. Discriminant Embedding for Local Image Descriptors. ICCV 2007.
- Jurie & Triggs. Creating Efficient Codebooks for Visual Recognition. ICCV 2005.
- Lazebnik et al. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. CVPR 2006.
- Lepetit et al. Keypoint Recognition using Randomized Trees. PAMI 2006.
- Lowe. Distinctive image features from scale-invariant keypoints. IJCV 2004.
- Malik et al. Contour and Texture Analysis for Image Segmentation. IJCV 2001.
- Mikolajczyk & Schmid. Scale and Affine invariant interest point detectors. IJCV 2004.
- Moosmann et al. Fast Discriminative Visual Codebooks using Randomized Clustering Forests. NIPS 2006.
- Nister & Stewenius. Scalable Recognition with a Vocabulary Tree. CVPR 2006.
- Özuysal et al. Fast Keypoint Recognition in Ten Lines of Code. CVPR 2007.
- Sharp. To appear ECCV 2008.
- Shotton et al. Semantic Texton Forests for Image Categorization and Segmentation. CVPR 2008.
- Shotton et al. TextonBoost for Image Understanding: Multi-Class Object Recognition and Segmentation by Jointly Modeling Texture, Layout, and Context. IJCV 2007.
- Sivic & Zisserman. Video Google: A Text Retrieval Approach to Object Matching in Videos. ICCV 2003.
- Torralba et al. Sharing visual features for multiclass and multiview object detection. PAMI 2007.
- Tu. Auto-context and Its Application to High-level Vision Tasks. CVPR 2008.
- Tuytelaars & Schmid. Vector Quantizing Feature Space with a Regular Lattice. ICCV 2007.
- Varma & Zisserman. A statistical approach to texture classification from single images. IJCV 2005.
- Verbeek & Triggs. Region Classification with Markov Field Aspect Models. CVPR 2007.
- Viola & Jones. Robust Real-time Object Detection. IJCV 2004.
- Winn et al. Object Categorization by Learned Universal Visual Dictionary. ICCV 2005.

## Conclusions

Randomized decision forests are very fast (and GPU friendly [Sharp, ECCV 08]).

Ideas for more research:
- biasing the randomness
- optimal fusion of trees from different modalities, e.g. appearance, SfM, optical flow

http://jamie.shotton.org/work/presentations/ICVSS2008.zip

Thank You
jamie@shotton.org

Maximum depth: 14

## More Results

Test images shown without and with the ILP, over object classes including person, sky, sidewalk, plant, mountain, building, grass, snow, and car.

[Per-category precision vs. recall curves, in four panels (categories 1-5, 6-10, 11-15, 16-21): grass, cow, sheep, sky, water, face, car, building, tree, aeroplane, bicycle, flower, sign, bird, book, chair, cat, dog, body, boat.]