
Introduction to Pattern Recognition

Vojtěch Franc
xfrancv@cmp.felk.cvut.cz

Center for Machine Perception
Czech Technical University in Prague
What is pattern recognition?

“The assignment of a physical object or event to one of
several prespecified categories” -- Duda & Hart

• A pattern is an object, process or event that can be given a name.
• A pattern class (or category) is a set of patterns sharing common attributes and usually originating from the same source.
• During recognition (or classification), given objects are assigned to prescribed classes.
• A classifier is a machine which performs classification.
Examples of applications
• Optical Character Recognition (OCR)
  • Handwritten: sorting letters by postal code, input devices for PDAs.
  • Printed texts: reading machines for blind people, digitization of text documents.
• Biometrics
  • Face recognition, verification, retrieval.
  • Fingerprint recognition.
  • Speech recognition.
• Diagnostic systems
  • Medical diagnosis: X-ray, EKG analysis.
  • Machine diagnostics, waste detection.
• Military applications
  • Automated Target Recognition (ATR).
  • Image segmentation and analysis (recognition from aerial or satellite photographs).
Approaches

• Statistical PR: based on an underlying statistical model of patterns and pattern classes.
• Structural (or syntactic) PR: pattern classes are represented by means of formal structures such as grammars, automata, strings, etc.
• Neural networks: the classifier is represented as a network of cells modeling the neurons of the human brain (connectionist approach).
Basic concepts

• Feature vector x ∈ X
  - A vector of observations (measurements), x = (x_1, x_2, ..., x_n)^T.
  - x is a point in the feature space X.
• Hidden state y ∈ Y
  - Cannot be directly measured.
  - Patterns with the same hidden state belong to the same class.

Task: to design a classifier (decision rule) q: X → Y which decides about the hidden state based on an observation.
Example
Task: jockey-hoopster recognition.

• The set of hidden states is Y = {H, J} (hoopster, jockey).
• The feature space is X = R^2, with feature vector x = (x_1, x_2)^T given by height and weight.
• Training examples: {(x_1, y_1), ..., (x_ℓ, y_ℓ)}.

Linear classifier:

    q(x) = H  if (w·x) + b ≥ 0
           J  if (w·x) + b < 0

The decision boundary is the line (w·x) + b = 0.

[Figure: training points in the (x_1, x_2) plane separated by the line (w·x) + b = 0; points with y = H lie on one side, points with y = J on the other.]
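To make the rule concrete, here is a minimal sketch of such a linear classifier in Python; the weight vector w and bias b below are illustrative values, not parameters from the lecture:

```python
import numpy as np

def linear_classifier(x, w, b):
    """Return 'H' if (w . x) + b >= 0, otherwise 'J'."""
    return 'H' if np.dot(w, x) + b >= 0 else 'J'

# Illustrative parameters only: features are (height in m, weight in kg).
w = np.array([4.0, 0.05])
b = -10.0

print(linear_classifier(np.array([2.01, 95.0]), w, b))  # tall, heavy  -> 'H'
print(linear_classifier(np.array([1.55, 50.0]), w, b))  # short, light -> 'J'
```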
Components of PR system

    Pattern → Sensors and preprocessing → Feature extraction → Classifier → Class assignment
                                                                    ↑
                                              Teacher → Learning algorithm

• Sensors and preprocessing.
• Feature extraction aims to create discriminative features good for classification.
• A classifier.
• A teacher provides information about the hidden state -- supervised learning.
• A learning algorithm sets the parameters of the classifier from training examples.
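As a rough sketch of how these components compose, the snippet below chains placeholder functions for preprocessing, feature extraction and classification; the function names and the toy linear rule are assumptions for illustration, not components defined in the lecture:

```python
import numpy as np

def preprocess(raw):
    """Sensors and preprocessing: normalize a raw measurement vector."""
    raw = np.asarray(raw, dtype=float)
    return (raw - raw.mean()) / (raw.std() + 1e-12)

def extract_features(signal):
    """Feature extraction: map the preprocessed signal to a small feature vector."""
    return np.array([signal.min(), signal.max(), signal.mean()])

def classify(x, w, b):
    """Classifier: a simple linear decision rule on the feature vector."""
    return 1 if np.dot(w, x) + b >= 0 else 0

raw_pattern = [0.2, 0.7, 0.4, 0.9]                      # output of a sensor
x = extract_features(preprocess(raw_pattern))           # feature vector
print(classify(x, w=np.array([1.0, 1.0, 1.0]), b=0.0))  # class assignment
```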
Feature extraction
Task: to extract features which are good for classification.

Good features:
• Objects from the same class have similar feature values.
• Objects from different classes have different feature values.

[Figure: examples of “good” features versus “bad” features.]

Feature extraction methods:

• Feature extraction: a mapping φ transforms the measurements m = (m_1, ..., m_k)^T into a feature vector x = (x_1, ..., x_n)^T, i.e., x_i = φ_i(m_1, ..., m_k).
• Feature selection: a subset of the measurements m = (m_1, ..., m_k)^T is selected as the feature vector x = (x_1, ..., x_n)^T, n ≤ k.

The problem can be expressed as optimization of the parameters θ of the feature extractor φ(θ).

• Supervised methods: the objective function is a criterion of separability (discriminability) of labeled examples, e.g., linear discriminant analysis (LDA).
• Unsupervised methods: a lower-dimensional representation which preserves important characteristics of the input data is sought, e.g., principal component analysis (PCA); see the sketch below.
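A minimal sketch of unsupervised feature extraction by PCA, using only numpy; the synthetic data matrix and the choice of two components are illustrative assumptions:

```python
import numpy as np

def pca(X, n_components):
    """Project the rows of X onto the directions of largest variance (principal components)."""
    X_centered = X - X.mean(axis=0)
    # Right singular vectors of the centered data are the eigenvectors of its covariance matrix.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T

rng = np.random.default_rng(0)
measurements = rng.normal(size=(100, 5))      # 100 patterns, 5 raw measurements
features = pca(measurements, n_components=2)  # lower-dimensional feature vectors
print(features.shape)                         # (100, 2)
```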
Classifier
A classifier partitions the feature space X into class-labeled regions such that

    X = X_1 ∪ X_2 ∪ ... ∪ X_|Y|   and   X_1 ∩ X_2 ∩ ... ∩ X_|Y| = ∅

[Figure: two example partitions of the feature space into regions X_1, X_2, X_3.]

Classification consists of determining to which region a feature vector x belongs.
The borders between the decision regions are called decision boundaries.
Representation of classifier
A classifier is typically represented as a set of discriminant functions

    f_i(x): X → R,   i = 1, ..., |Y|

The classifier assigns a feature vector x to the i-th class if

    f_i(x) > f_j(x)   for all j ≠ i

[Figure: the feature vector x is fed into the discriminant functions f_1(x), ..., f_|Y|(x); a max selector outputs the class identifier y.]
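A short sketch of this max-over-discriminant-functions rule; the three linear discriminants below are illustrative placeholders, not functions from the lecture:

```python
import numpy as np

def classify(x, discriminants):
    """Assign x to the class whose discriminant function attains the largest value."""
    scores = [f(x) for f in discriminants]
    return int(np.argmax(scores))

# Illustrative linear discriminants f_i(x) = w_i . x + b_i for three classes.
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.array([0.0, 0.1, 0.2])
discriminants = [lambda x, i=i: W[i] @ x + b[i] for i in range(3)]

print(classify(np.array([2.0, -0.5]), discriminants))  # -> 0
```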
Bayesian decision making

• Bayesian decision making is a fundamental statistical approach which allows one to design the optimal classifier when the complete statistical model is known.

Definition:
• Observations X, hidden states Y, decisions D.
• A loss function W: D × Y → R.
• A decision rule q: X → D.
• A joint probability p(x, y).

Task: to design a decision rule q which minimizes the Bayesian risk

    R(q) = Σ_{y∈Y} Σ_{x∈X} p(x, y) W(q(x), y)
Example of Bayesian task

Task: minimization of the classification error.

• The set of decisions D is the same as the set of hidden states Y.
• The 0/1 loss function is used:

    W(q(x), y) = 0  if q(x) = y
                 1  if q(x) ≠ y

• The Bayesian risk R(q) corresponds to the probability of misclassification.

The solution of the Bayesian task is

    q* = argmin_q R(q),   i.e.,   q*(x) = argmax_y p(y | x) = argmax_y p(x | y) p(y) / p(x)
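A minimal sketch of this rule as a plug-in classifier, assuming the class-conditional densities and priors are known; the two 1-D Gaussian classes below are illustrative:

```python
from scipy.stats import norm

# Illustrative model: two classes with known priors and Gaussian class-conditionals p(x | y).
priors = {'A': 0.6, 'B': 0.4}
cond = {'A': norm(loc=0.0, scale=1.0), 'B': norm(loc=3.0, scale=1.0)}

def bayes_classifier(x):
    """y* = argmax_y p(x | y) p(y); dividing by p(x) does not change the argmax."""
    return max(priors, key=lambda y: cond[y].pdf(x) * priors[y])

print(bayes_classifier(0.5))   # 'A'
print(bayes_classifier(2.8))   # 'B'
```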
Limitations of the Bayesian approach

• The statistical model p(x, y) is mostly not known, therefore learning must be employed to estimate p(x, y) from training examples {(x_1, y_1), ..., (x_ℓ, y_ℓ)} -- plug-in Bayes.
• Non-Bayesian methods offer further task formulations:
  • Only a partial statistical model is available:
    • p(y) is not known or does not exist.
    • p(x | y) is influenced by a non-random intervention.
  • The loss function is not defined.
  • Examples: Neyman-Pearson's task, Minimax task, etc.
Discriminative approaches

• Given a class of classification rules q(x; θ) parametrized by θ, the task is to find the “best” parameter θ* based on a set of training examples {(x_1, y_1), ..., (x_ℓ, y_ℓ)} -- supervised learning.

• The task of learning is to recognize which classification rule should be used.

• How the learning is performed is determined by the selected inductive principle.
Empirical risk minimization principle

The true expected risk R(q) is approximated by the empirical risk

    R_emp(q(x; θ)) = (1/ℓ) Σ_{i=1}^{ℓ} W(q(x_i; θ), y_i)

with respect to a given labeled training set {(x_1, y_1), ..., (x_ℓ, y_ℓ)}.

Learning based on the empirical risk minimization principle is defined as

    θ* = argmin_θ R_emp(q(x; θ))

Examples of algorithms: Perceptron, back-propagation, etc.
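As a sketch of the ERM principle in action, here is a Perceptron trained on a small linearly separable toy set; the data and the stopping rule are illustrative assumptions:

```python
import numpy as np

def perceptron(X, y, epochs=100):
    """Drive the empirical 0/1 risk to zero on linearly separable data with labels in {-1, +1}."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:   # misclassified example -> update the rule
                w, b, errors = w + yi * xi, b + yi, errors + 1
        if errors == 0:                         # empirical risk is zero -> stop
            break
    return w, b

X = np.array([[2.0, 2.0], [1.5, 2.5], [-1.0, -1.5], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, b = perceptron(X, y)
print(np.sign(X @ w + b))   # matches the training labels
```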
Overfitting and underfitting

Problem: how rich a class of classifiers q(x; θ) should be used?

[Figure: the same data fitted by three models of increasing complexity -- underfitting, good fit, overfitting.]

Problem of generalization: a small empirical risk R_emp does not imply a small true expected risk R.
Structural risk minimization principle

Statistical learning theory -- Vapnik & Chervonenkis.

An upper bound on the expected risk of a classification rule q ∈ Q:

    R(q) ≤ R_emp(q) + R_str(ℓ, h, log(1/δ))

where ℓ is the number of training examples, h is the VC-dimension of the class of functions Q, and 1 − δ is the confidence of the upper bound.

SRM principle: from a given nested sequence of function classes Q_1, Q_2, ..., Q_m such that

    h_1 ≤ h_2 ≤ ... ≤ h_m

select the rule q* which minimizes the upper bound on the expected risk.
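A sketch of the SRM selection step. The concrete form of the confidence term below is one common textbook VC bound, used here only as an assumption; the empirical risks and VC-dimensions are made-up numbers:

```python
import numpy as np

def vc_bound(emp_risk, h, ell, delta=0.05):
    """Empirical risk plus a VC-type confidence term (one common textbook form of R_str)."""
    return emp_risk + np.sqrt((h * (np.log(2 * ell / h) + 1) + np.log(4 / delta)) / ell)

# Illustrative nested classes Q_1, Q_2, ...: richer classes fit the training data better
# (smaller empirical risk) but have a larger VC-dimension h.
ell = 200
emp_risks = [0.30, 0.10, 0.05, 0.04, 0.04]
vc_dims = [2, 5, 15, 60, 150]

bounds = [vc_bound(r, h, ell) for r, h in zip(emp_risks, vc_dims)]
print(int(np.argmin(bounds)))   # SRM selects the class with the smallest bound
```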
Unsupervised learning
Input: training examples {x_1, ..., x_ℓ} without information about the hidden state.

Clustering: the goal is to find clusters of data sharing similar properties.

A broad class of unsupervised learning algorithms alternates two components:

• A classifier q: X × Θ → Y labels the examples {x_1, ..., x_ℓ}, producing {y_1, ..., y_ℓ}.
• A (supervised) learning algorithm L: (X × Y)^ℓ → Θ estimates the parameter θ from the labeled pairs; θ is then fed back into the classifier.
Example of unsupervised learning algorithm

k-means clustering:

Goal: minimize

    Σ_{i=1}^{ℓ} ||x_i − m_{q(x_i)}||^2

over the parameters θ = {m_1, ..., m_k}.

Classifier (assignment step):

    y = q(x) = argmin_{i=1,...,k} ||x − m_i||

Learning algorithm (update step):

    m_i = (1/|I_i|) Σ_{j∈I_i} x_j,   where I_i = {j : q(x_j) = i}

[Figure: data points {x_1, ..., x_ℓ} grouped into three clusters around the means m_1, m_2, m_3.]
References
Books
Duda, Hart: Pattern Classification and Scene Analysis. J. Wiley & Sons, New York, 1982 (2nd edition 2000).
Fukunaga: Introduction to Statistical Pattern Recognition. Academic Press, 1990.
Bishop: Neural Networks for Pattern Recognition. Clarendon Press, Oxford, 1997.
Schlesinger, Hlaváč: Ten Lectures on Statistical and Structural Pattern Recognition. Kluwer Academic Publishers, 2002.

Journals
Journal of the Pattern Recognition Society.
IEEE Transactions on Neural Networks.
Pattern Recognition and Machine Learning.
