System Overview
MFCC & VQ
Experimental Results
Live Demo
Speaker Verification: Text Dependent, Text Independent
Speaker Identification: Text Dependent, Text Independent
System Overview
Training mode
Feature Matching
Feature Extraction
Feature extraction is a special form of dimensionality reduction. The aim is to extract the formants.
Feature Extraction
The extracted features must have specific characteristics:
Easily measurable, occur naturally and frequently in speech.
Do not change over time.
Vary as much as possible among speakers, while staying consistent for each speaker.
Are not affected by speaker health or background noise.
Many algorithms can extract them: LPC, LPCC, HFCC, MFCC. We used the Mel Frequency Cepstral Coefficients (MFCC) algorithm.
FFT → Spectrum (glottal pulse, vocal tract)
Spectrum → Mel spectrum: mel(f) = 2595 * log10(1 + f/700)
Cepstrum: Mel spectrum → MFCC coefficients
By taking the DCT of the logarithm of the magnitude spectrum, the glottal pulse and the vocal tract impulse response can be separated.
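As an illustration of the mel mapping and the DCT step, here is a minimal sketch in plain Python. The function names are hypothetical, the DCT is an unnormalized DCT-II, and the 0 to 8000 Hz band in the usage line is an assumption; only the mel formula and the count of 29 filter banks come from the slides. It is not the implementation used in the system.

```python
import math

def hz_to_mel(f_hz):
    """Map a frequency in Hz to mels: mel(f) = 2595 * log10(1 + f/700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_center_frequencies(n_filters, f_low_hz, f_high_hz):
    """Center frequencies (Hz) of n_filters bands spaced evenly on the mel scale."""
    m_low, m_high = hz_to_mel(f_low_hz), hz_to_mel(f_high_hz)
    step = (m_high - m_low) / (n_filters + 1)
    return [mel_to_hz(m_low + step * (k + 1)) for k in range(n_filters)]

def cepstral_coefficients(mel_energies, n_coeffs):
    """Unnormalized DCT-II of the log mel energies (energies must be positive)."""
    n = len(mel_energies)
    log_e = [math.log(e) for e in mel_energies]
    return [
        sum(log_e[j] * math.cos(math.pi * k * (j + 0.5) / n) for j in range(n))
        for k in range(n_coeffs)
    ]

# Assumed analysis band of 0 to 8000 Hz with 29 filter banks.
centers = mel_center_frequencies(29, 0.0, 8000.0)
```

The centers are evenly spaced in mels, so they are dense at low frequencies and sparse at high frequencies when expressed in Hz.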
Classification
Classification means building a unique model for each
speaker in the database.
We used the VQ algorithm.
VQ Algorithm
The VQ technique consists of extracting a small number of
representative feature vectors.
K-means Clustering
[Flowchart: start → number of clusters k → centroids → distance of objects to centroids → no change? → yes: end; no: repeat]
VQ Example
Given data points, split into 4 codebook vectors with initial values at
(2,2),(4,6),(6,5),(8,8).
VQ Example
Once there is no more change, the feature space is partitioned into
4 regions. Any input feature can be classified as belonging to one of the 4 regions. The entire codebook can be specified by the 4 centroid points.
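The example can be sketched as a small k-means loop in plain Python. The initial centroids are the ones given in the example; the data points and the function names are hypothetical, chosen only for illustration, and this is not the code used in the system.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))

def kmeans(points, centroids, max_iter=100):
    """Refine centroids until no point changes cluster; returns (centroids, labels)."""
    centroids = [tuple(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # no change: the partition is stable
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels

# Hypothetical data points; initial centroids taken from the example.
points = [(1, 1), (2, 3), (3, 2), (4, 7), (5, 6), (6, 4), (7, 5), (8, 9), (9, 8)]
initial = [(2, 2), (4, 6), (6, 5), (8, 8)]
codebook, labels = kmeans(points, initial)
```

Here `codebook` holds the 4 final centroids and `labels` gives the region index of each point.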
K-means Clustering
If we set the codebook size to 8 then the output of the
clustering will be:
[Figure: VQ clustering output for a codebook of size 8.]
Feature Matching
A distortion measure is computed for each codebook, and the speaker with the lowest distortion is chosen. The distortion measure is defined as the Euclidean distance:
$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
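The matching rule could be sketched as follows. Averaging the nearest-codeword distance over all test vectors is an assumption (the slides only say that a distortion is computed per codebook), and the speaker names, codebooks and feature vectors are hypothetical.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def average_distortion(features, codebook):
    """Mean distance from each feature vector to its nearest codeword (assumed measure)."""
    return sum(min(euclidean(f, c) for c in codebook) for f in features) / len(features)

def identify(features, codebooks):
    """Return the speaker whose codebook gives the lowest average distortion."""
    return min(codebooks, key=lambda name: average_distortion(features, codebooks[name]))

# Hypothetical two-speaker database with 2-dimensional codewords.
codebooks = {
    "alice": [(0.0, 0.0), (1.0, 1.0)],
    "bob": [(5.0, 5.0), (6.0, 6.0)],
}
test_features = [(0.9, 1.1), (0.1, -0.1)]
best = identify(test_features, codebooks)
```

In the real system the feature vectors would be MFCC vectors and each codebook would come from the VQ training step.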
Online
[Block diagram: Microphone, Monitoring, MFCC Feature Extraction, Calculate VQ, Make Decision & Display]
Applications
Speaker Recognition for Authentication.
Banking application.
Forensic Speaker Recognition
Proving the identity of a recorded voice can help to convict a criminal or clear an innocent person in court.
Results
Settings: 12 MFCCs, 29 filter banks, codebook size 64, ELSDSR database.
The aim is to show how the system identifies the speaker according to the Euclidean distance calculation.
Results
Number of MFCCs vs. ID rate.
No. of MFCC 5 12 20
ID Rate 76 % 91 % 91 %
Above 30 ms
Bad
Results
The effect of the codebook size on the ID rate & VQ distortion.
[Figure: ID rate (%) and matching score, each plotted against codebook size.]
Results
Number of filter banks vs. ID rate & VQ distortion.
[Figure: ID rate (%) and matching score, each plotted against the number of filter banks (0 to 50).]
Results
The performance of the system on different test shot
lengths.
Test speech length 0.2 sec 2 sec 6 sec 10 sec
ID Rate 60 % 85 % 90 % 95 %
[Figure: ID rate (%) plotted against test speech length.]
Summary
We examined the effect of changing some parameters of the MFCC and VQ algorithms.
Our system identifies the speaker regardless of the language and the text.
Satisfactory results require the same training and testing environment.
The test data needs to be several tens of seconds long.