
Alaa Spaih

Abeer Abu-Hantash

Directed by Dr. Allam Mousa

Outline for Today


1. Speaker Recognition Field
2. System Overview
3. MFCC & VQ
4. Experimental Results
5. Live Demo

Speaker Recognition Field


Speaker recognition divides into speaker verification and speaker identification. Each of these can be text dependent or text independent.

System Overview

Training mode: Speech input -> Feature extraction -> Speaker modeling -> Speaker model database

Testing mode: Speech input -> Feature extraction -> Feature matching (against the speaker model database) -> Decision logic -> Speaker ID

Feature Extraction
Feature extraction is a special form of dimensionality reduction. The aim is to extract the characteristics of the speaker's vocal tract, such as the formants.

Feature Extraction
The extracted features must have specific characteristics:
- Easily measurable, occurring naturally and frequently in speech.
- Stable over time.
- Varying as much as possible between speakers, while remaining consistent for each speaker.
- Not affected by the speaker's health or by background noise.

Many algorithms can extract these features: LPC, LPCC, HFCC and MFCC. We used the Mel Frequency Cepstral Coefficients (MFCC) algorithm.

Feature Extraction Using MFCC


Input speech -> Framing and windowing -> Fast Fourier transform -> Absolute value -> Mel-scaled filter bank -> Log -> Discrete cosine transform -> Feature vectors
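The chain of blocks above can be sketched end to end in NumPy. This is a minimal illustration, not the authors' implementation. The 29 filters and 12 coefficients come from the Results section. The following are all assumptions: a 16 kHz sampling rate, 25 ms frames with a 10 ms hop (inside the 10-30 ms range reported as good), a Hamming window, a 512-point FFT, and keeping c0 among the 12 coefficients. The function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    # mel(f) = 2595 * log10(1 + f/700)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale, shape (n_filters, n_fft//2 + 1)."""
    mel_points = np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        for k in range(left, center):    # rising edge
            fbank[j - 1, k] = (k - left) / (center - left)
        for k in range(center, right):   # falling edge, peak of 1 at the centre bin
            fbank[j - 1, k] = (right - k) / (right - center)
    return fbank

def dct_ii(x, n_coeffs):
    """Unnormalised DCT-II along the last axis, keeping the first n_coeffs outputs."""
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return x @ basis.T

def mfcc(signal, sample_rate, frame_ms=25, hop_ms=10, n_fft=512, n_filters=29, n_ceps=12):
    """MFCC feature vectors, one row per frame. Assumes len(signal) >= one frame."""
    signal = np.asarray(signal, dtype=float)
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Framing and windowing (Hamming window)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # FFT and absolute value
    mag = np.abs(np.fft.rfft(frames, n=n_fft))
    # Mel-scaled filter bank
    energies = mag @ mel_filter_bank(n_filters, n_fft, sample_rate).T
    # Log (a small offset avoids log(0) for an empty filter)
    log_energies = np.log(energies + 1e-10)
    # Discrete cosine transform -> feature vectors
    return dct_ii(log_energies, n_ceps)

rng = np.random.default_rng(0)
x = rng.standard_normal(16000)   # 1 s of synthetic noise at the assumed 16 kHz rate
features = mfcc(x, 16000)
print(features.shape)            # (98, 12)
```

Each row of `features` is one 12-dimensional feature vector. A speaker's training speech therefore becomes a matrix with one row per frame, which is what the VQ stage later clusters.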

Framing And Windowing

FFT

[Figure: the spectrum, with its vocal tract and glottal pulse components]

Mel-Scaled Filter Bank

[Figure: spectrum mapped to the mel spectrum]

$$\text{mel}(f) = 2595 \log_{10}\left(1 + \frac{f}{700}\right)$$

where f is the frequency in Hz.
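To see what the mel formula does numerically, the sketch below converts between Hz and mel and places filter centres at equal mel steps. It uses 29 filters, as in the Results section. The 8 kHz upper edge is an assumption (half of an assumed 16 kHz sampling rate), and the function names are hypothetical.

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """mel(f) = 2595 * log10(1 + f/700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def filter_centers_hz(n_filters: int, f_max_hz: float) -> list[float]:
    """Centre frequencies of n_filters filters spaced evenly in mel between 0 Hz and f_max_hz."""
    step = hz_to_mel(f_max_hz) / (n_filters + 1)
    return [mel_to_hz(step * i) for i in range(1, n_filters + 1)]

centers = filter_centers_hz(29, 8000.0)
gaps = [b - a for a, b in zip(centers, centers[1:])]
print(round(hz_to_mel(1000.0)))   # 1000
print(gaps[0] < gaps[-1])         # True: the filters get wider at high frequencies
```

Because the mapping is logarithmic above about 1 kHz, equal steps in mel become ever wider steps in Hz. This is why the filter bank resolves low frequencies finely and high frequencies coarsely.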

Cepstrum

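[Figure: mel spectrum converted to MFCC coefficients]

Taking the DCT of the logarithm of the magnitude spectrum lets the glottal pulse and the vocal tract impulse response be separated. The spectrum is the product of the two, and the logarithm turns that product into a sum.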
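The separation argument rests on two facts: the log turns a product into a sum, and the DCT is linear. The sketch below checks this numerically. The two magnitude spectra are hypothetical random arrays and `dct_ii` is an illustrative helper, so the code demonstrates the algebra rather than real speech. In practice the smooth vocal tract envelope ends up mostly in the low-order coefficients, while the fine glottal harmonic structure ends up in higher-order ones.

```python
import numpy as np

def dct_ii(x):
    """Unnormalised DCT-II of a 1-D array."""
    n = len(x)
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * k * (2 * m + 1) / (2 * n)) @ x

rng = np.random.default_rng(1)
glottal = rng.uniform(0.5, 2.0, size=32)      # hypothetical glottal pulse magnitude spectrum
vocal_tract = rng.uniform(0.5, 2.0, size=32)  # hypothetical vocal tract magnitude response
speech = glottal * vocal_tract                # source-filter model: the spectra multiply

c_speech = dct_ii(np.log(speech))
c_sum = dct_ii(np.log(glottal)) + dct_ii(np.log(vocal_tract))
print(np.allclose(c_speech, c_sum))           # True: the cepstra simply add
```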

Classification
Classification means building a unique model for each speaker in the database.

There are two major types of models for classification:

Stochastic models: GMM, HMM, ANN

Template models: VQ, DTW

We used the VQ algorithm.

VQ Algorithm
The VQ technique consists of extracting a small number of representative feature vectors.

The first step is to build a speaker database consisting of N codebooks, one for each speaker in the database. Each speaker's feature vectors are clustered into codewords, which together form that speaker's model (codebook).

K-means Clustering
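1. Choose the number of clusters k.
2. Initialize the k centroids.
3. Compute the distance of every object to each centroid.
4. Group the objects based on minimum distance.
5. If no object changed group, end. Otherwise, recompute the centroids and return to step 3.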
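The loop above can be sketched as plain k-means in NumPy. The initial centroids are passed in explicitly to match the flowchart's "choose k and initialize centroids" steps. The function name `kmeans` and the toy data points are hypothetical. Keeping the previous centroid when a cluster becomes empty is a design choice of this sketch and is not taken from the slides.

```python
import numpy as np

def kmeans(data, init_centroids, max_iter=100):
    """Plain k-means: group by nearest centroid, move centroids to group means, stop on no change."""
    data = np.asarray(data, dtype=float)
    centroids = np.array(init_centroids, dtype=float)
    labels = None
    for _ in range(max_iter):
        # Distance of every object to every centroid, shape (n_points, k)
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)            # grouping based on minimum distance
        if labels is not None and np.array_equal(new_labels, labels):
            break                                    # no change -> end
        labels = new_labels
        for j in range(len(centroids)):
            members = data[labels == j]
            if len(members):                         # an empty cluster keeps its old centroid
                centroids[j] = members.mean(axis=0)
    return centroids, labels

points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]   # hypothetical data
centroids, labels = kmeans(points, init_centroids=[[0, 0], [10, 10]])
print(labels.tolist())   # [0, 0, 0, 1, 1, 1]
```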

VQ Example
Given a set of data points, split them into 4 codebook vectors with initial values at (2,2), (4,6), (6,5) and (8,8).

VQ Example
Once there is no more change, the feature space is partitioned into 4 regions. Any input feature can be classified as belonging to one of the 4 regions, so the entire codebook can be specified by the 4 centroid points.
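Classifying an input feature into one of the 4 regions is a nearest-codeword search. The slides do not list the converged centroids, so this sketch uses the four initial values as a stand-in codebook, purely for illustration. `quantize` and the two query points are hypothetical.

```python
import math

# Stand-in codebook: the example's initial values (the converged centroids are not given).
CODEBOOK = [(2.0, 2.0), (4.0, 6.0), (6.0, 5.0), (8.0, 8.0)]

def quantize(x, codebook=CODEBOOK):
    """Index of the codeword nearest to x in Euclidean distance, i.e. the region x falls in."""
    return min(range(len(codebook)), key=lambda j: math.dist(x, codebook[j]))

print(quantize((3.0, 2.5)))   # 0
print(quantize((7.5, 7.0)))   # 3
```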

K-means Clustering
If we set the codebook size to 8, the output of the clustering is:

[Figure: clustering output with a codebook size of 8]

VQ

[Figure: the MFCCs of a speaker (1000 x 12) are reduced to the speaker codebook (8 x 12)]

Feature Matching
For each codebook, a distortion measure is computed, and the speaker with the lowest distortion is chosen. The distortion measure is defined using the Euclidean distance:

$$d(x, y) = \sqrt{\sum_{i=1}^{D} (x_i - y_i)^2}$$

where D is the number of coefficients in each feature vector.
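A minimal sketch of the matching step follows. It assumes that the distortion of a codebook is the average, over all test vectors, of the Euclidean distance to the nearest codeword; the slides do not say how per-vector distances are combined. `vq_distortion` and `identify` are hypothetical names. The two-speaker database and the test vectors are synthetic, using 8 x 12 codebooks as in the VQ figure.

```python
import numpy as np

def vq_distortion(features, codebook):
    """Average Euclidean distance from each test vector to its nearest codeword."""
    d = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)   # shape (T, K)
    return d.min(axis=1).mean()

def identify(features, database):
    """Return the speaker whose codebook gives the lowest distortion, plus all scores."""
    scores = {spk: vq_distortion(features, cb) for spk, cb in database.items()}
    return min(scores, key=scores.get), scores

rng = np.random.default_rng(0)
database = {                                   # synthetic 8 x 12 codebooks
    "A": rng.normal(0.0, 1.0, size=(8, 12)),
    "B": rng.normal(5.0, 1.0, size=(8, 12)),
}
test = rng.normal(5.0, 1.0, size=(200, 12))    # synthetic test frames near speaker B
speaker, scores = identify(test, database)
print(speaker)   # B
```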

System Operates In Two Modes


Offline and online.

Processing: MFCC feature extraction -> Calculate VQ -> Make decision & display

Online mode also uses: Microphone, Monitoring

Applications
- Speaker recognition for authentication: banking applications.
- Forensic speaker recognition: proving the identity of a recorded voice can help convict a criminal or clear an innocent person in court.
- Speaker recognition for surveillance: electronic eavesdropping on telephone and radio conversations.

Results
Settings: 12 MFCCs, 29 filter banks, codebook size 64, ELSDSR database. The table shows how the system identifies the speaker from the Euclidean distance calculation. The lowest distance in each row lies on the diagonal, so each speaker is identified correctly.

       Sp 1     Sp 2     Sp 3     Sp 4     Sp 5
Sp 1   10.7492  13.2364  17.5438  16.1360  14.9324
Sp 2   13.2712  10.2740  16.1177  13.7095  15.7028
Sp 3   17.8646  13.2884  11.9029  15.5633  17.2842
Sp 4   14.7885  11.7941  16.2916  11.7528  17.8917
Sp 5   13.2859  14.0461  17.7199  16.7327  12.3504
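The reported distances can be checked mechanically. The short script below picks the speaker with the lowest distance in each row and measures how far ahead the winner is of the runner-up. The values are copied from the table; only the variable names are new.

```python
# Euclidean distances from the results table (rows and columns both indexed Sp 1..Sp 5).
DIST = [
    [10.7492, 13.2364, 17.5438, 16.1360, 14.9324],
    [13.2712, 10.2740, 16.1177, 13.7095, 15.7028],
    [17.8646, 13.2884, 11.9029, 15.5633, 17.2842],
    [14.7885, 11.7941, 16.2916, 11.7528, 17.8917],
    [13.2859, 14.0461, 17.7199, 16.7327, 12.3504],
]

decisions = [min(range(len(row)), key=row.__getitem__) + 1 for row in DIST]
margins = [sorted(row)[1] - sorted(row)[0] for row in DIST]   # runner-up minus winner
print(decisions)   # [1, 2, 3, 4, 5]
```

Every row is decided correctly. Sp 4 is the closest call: 11.7528 against 11.7941 for Sp 2, a margin of about 0.04.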

Results
Number of MFCCs vs. ID rate:

No. of MFCCs   5      12     20
ID rate        76 %   91 %   91 %

Frame size vs. ID rate:

Frame size     10-30 ms   Above 30 ms
ID rate        Good       Bad

Results
The effect of the codebook size on the ID rate and the VQ distortion.

[Figure: ID rate (%) vs. codebook size, and matching score vs. codebook size, for codebook sizes from 0 to 300]

Results
Number of filter banks vs. ID rate and VQ distortion.

[Figure: ID rate (%) and matching score vs. number of filters in the filter bank, from 0 to 50 filters]

Results
The performance of the system for different test speech lengths.

Test speech length   0.2 s   2 s    6 s    10 s
ID rate              60 %    85 %   90 %   95 %

[Figure: ID rate (%) vs. test speech length (s)]

Summary
- We studied the effect of changing some parameters of the MFCC and VQ algorithms.
- Our system identifies the speaker regardless of the language and the text.
- Satisfactory results require the same training and testing environment.
- Test data needs to be several tens of seconds long.
