SI Final PPT PDF

Alaa Spaih
Abeer Abu-Hantash
Directed by Dr. Allam Mousa
Outline for Today

1. Speaker Recognition Field
2. System Overview
3. MFCC & VQ
4. Experimental Results
5. Live Demo
Speaker Recognition Field

Speaker Recognition
  Speaker Verification
    Text Dependent
    Text Independent
  Speaker Identification
    Text Dependent
    Text Independent
System Overview
Training mode: Speech input -> Feature extraction -> Speaker modeling -> Speaker model database
Testing mode: Speech input -> Feature extraction -> Feature matching (against the speaker model database) -> Decision logic -> Speaker ID
Feature Extraction
Feature extraction is a special form of dimensionality reduction. The aim is to extract the formants.
Feature Extraction
The extracted features must have specific characteristics:
Easily measurable, and occur naturally and frequently in speech.
Do not change over time.
Vary as much as possible among speakers, but stay consistent for each speaker.
Not affected by speaker health or background noise.
Many algorithms can extract them: LPC, LPCC, HFCC, MFCC. We used the Mel Frequency Cepstral Coefficients (MFCC) algorithm.
Feature Extraction Using MFCC

Input speech -> Framing and windowing -> Fast Fourier transform -> Absolute value -> Mel-scaled filter bank -> Log -> Discrete cosine transform -> Feature vectors
Framing And Windowing
FFT
[Figure: spectrum of a speech frame, with the vocal tract and glottal pulse components labeled]
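The framing, windowing, and FFT steps can be sketched roughly as follows. This is a minimal illustration, not the presenters' implementation: the Hamming window, the 20 ms frame with 50% overlap, the 8 kHz sampling rate, and the function names `frame_signal`, `hamming`, and `magnitude_spectrum` are all assumptions made for the example. A naive DFT is used only to keep the sketch free of external libraries.

```python
import cmath
import math

def frame_signal(signal, frame_len, hop):
    """Split a sequence into overlapping frames of frame_len samples."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]

def hamming(n):
    """Hamming window of length n (an assumed window choice)."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def magnitude_spectrum(frame):
    """Absolute value of the DFT for bins 0..N/2 (naive O(N^2) transform)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

# Hypothetical input: a 1 kHz tone sampled at 8 kHz, 0.1 s long.
fs = 8000
signal = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(800)]

frames = frame_signal(signal, frame_len=160, hop=80)   # 20 ms frames, 10 ms hop
window = hamming(160)
windowed = [[s * w for s, w in zip(frame, window)] for frame in frames]

spectrum = magnitude_spectrum(windowed[0])
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
print(len(frames), "frames; peak at", peak_bin * fs / 160, "Hz")
```

In practice an FFT routine from a numerical library would replace the naive transform; the structure of the steps stays the same.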
Mel Scaled-Filter Bank
[Figure: spectrum mapped to the mel spectrum]
mel(f) = 2595 * log10(1 + f/700)
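As a small illustration of the mel formula, the sketch below converts between Hz and mel and spaces filter center frequencies evenly on the mel scale. The inverse function, the helper name `filter_center_frequencies`, and the 0 to 4000 Hz range are assumptions for the example; the count of 29 filters is taken from the results slide.

```python
import math

def hz_to_mel(f):
    """mel(f) = 2595 * log10(1 + f/700), as given on the slide."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def filter_center_frequencies(n_filters, f_low, f_high):
    """Center frequencies in Hz, equally spaced on the mel scale."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]

# Assumed band: 0 to 4000 Hz, with 29 filters.
centers = filter_center_frequencies(29, 0.0, 4000.0)
print([round(c) for c in centers[:3]], "...", [round(c) for c in centers[-3:]])
```

The centers are close together at low frequencies and spread out at high frequencies, which is the point of the mel scaling.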
Cepstrum
[Figure: mel spectrum converted to MFCC coefficients]
By taking the DCT of the logarithm of the magnitude spectrum, the glottal pulse and the vocal tract impulse response can be separated.
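The log and DCT steps can be sketched as below, assuming the mel filter-bank energies of one frame are already available. The function names, the small floor used to avoid log(0), and the choice to drop the zeroth coefficient and keep the next 12 are assumptions for illustration; the slides only state that 12 MFCCs were used.

```python
import math

def dct_ii(values):
    """Unnormalized DCT-II of a sequence."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i, v in enumerate(values))
            for k in range(n)]

def mfcc_from_mel_energies(mel_energies, n_coeffs=12, floor=1e-10):
    """Log-compress mel filter-bank energies, then apply the DCT."""
    log_energies = [math.log(max(e, floor)) for e in mel_energies]
    cepstrum = dct_ii(log_energies)
    # Assumption: skip coefficient 0 (overall level) and keep the next n_coeffs.
    return cepstrum[1:n_coeffs + 1]

# Hypothetical frame: 29 filter outputs with a smooth spectral tilt.
mel_energies = [1.0 / (1 + i) for i in range(29)]
coeffs = mfcc_from_mel_energies(mel_energies)
print(len(coeffs), [round(c, 3) for c in coeffs[:4]])
```

A perfectly flat mel spectrum gives coefficients that are all zero after the zeroth one, since the DCT then sees a constant input.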
Classification
Classification builds a unique model for each speaker in the database.
There are two major types of models for classification:
Stochastic models: GMM, HMM, ANN
Template models: VQ, DTW
We used the VQ algorithm.
VQ Algorithm
The VQ technique consists of extracting a small number of representative feature vectors.
The first step is to build a speaker database consisting of N codebooks, one for each speaker in the database.
Speaker feature vectors -> clustered into codewords -> speaker model (codebook)
K-means Clustering
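1. Start: choose the number of clusters k.
2. Compute the centroids.
3. Compute the distance from each object to the centroids.
4. Group the objects based on minimum distance.
5. If no object changed group, end; otherwise go back to step 2.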
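The flowchart can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the initial centroids are passed in by the caller, the data points are hypothetical, and the names `nearest` and `kmeans` are made up for the example.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: math.dist(point, centroids[j]))

def kmeans(points, initial_centroids, max_iter=100):
    """Alternate grouping and centroid updates until no object changes group."""
    centroids = [tuple(c) for c in initial_centroids]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [nearest(p, centroids) for p in points]
        if new_assignment == assignment:      # no change: stop
            break
        assignment = new_assignment
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:                       # keep old centroid if group is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignment

# Hypothetical 2-D data with two obvious groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, assignment = kmeans(points, [(0, 0), (10, 10)])
print(centroids, assignment)
```

In the speaker identification setting the points would be the MFCC vectors of one speaker and the resulting centroids would form that speaker's codebook.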
VQ Example
Given data points, split them into 4 codebook vectors with initial values at (2,2), (4,6), (6,5), (8,8).
VQ Example
Once there is no more change, the feature space is partitioned into 4 regions. Any input feature can be classified as belonging to one of the 4 regions. The entire codebook can be specified by the 4 centroid points.
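To illustrate how an input feature is assigned to a region, the sketch below picks the nearest of 4 centroid points. For simplicity it reuses the initial values from the example as if they were the final centroids, which is an assumption; the converged centroids depend on the data points, which are not listed here. The name `quantize` and the test inputs are hypothetical.

```python
import math

# Assumed codebook: the example's initial values, used here as centroids.
codebook = [(2, 2), (4, 6), (6, 5), (8, 8)]

def quantize(feature, codebook):
    """Return the index of the region (nearest centroid) for a feature."""
    return min(range(len(codebook)),
               key=lambda j: math.dist(feature, codebook[j]))

for feature in [(1, 1), (4, 7), (5, 5), (9, 9)]:
    region = quantize(feature, codebook)
    print(feature, "-> region", region, "centroid", codebook[region])
```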
K-means Clustering
If we set the codebook size to 8 then the output of the
clustering will be:
[Figure: VQ clustering of the MFCCs of a speaker (1000x12) into a speaker codebook (8x12)]
Feature Matching
For each codebook a distortion measure is computed. The speaker with the lowest distortion is chosen. The distortion measure is defined as the Euclidean distance:
$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
where n is the number of components in each feature vector.
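The matching step can be sketched as follows. The slides do not say how per-vector distances are combined into one distortion per codebook, so averaging the distance from each test vector to its nearest codeword is an assumption here, as are the function names and the toy 2-D codebooks.

```python
import math

def distortion(features, codebook):
    """Average Euclidean distance from each feature vector to its nearest codeword."""
    return sum(min(math.dist(x, c) for c in codebook)
               for x in features) / len(features)

def identify(features, codebooks):
    """Return the speaker whose codebook gives the lowest distortion, plus all scores."""
    scores = {name: distortion(features, cb) for name, cb in codebooks.items()}
    return min(scores, key=scores.get), scores

# Hypothetical codebooks (real ones would hold 12-dimensional MFCC codewords).
codebooks = {
    "speaker_a": [(0.0, 0.0), (1.0, 1.0)],
    "speaker_b": [(5.0, 5.0), (6.0, 6.0)],
}
test_features = [(0.1, 0.0), (0.9, 1.2)]

best, scores = identify(test_features, codebooks)
print(best, scores)
```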
System Operates In Two Modes

Offline
Online
Processing blocks: Microphone, Monitoring, MFCC Feature Extraction, Calculate VQ, Make Decision & Display
Applications
Speaker recognition for authentication:
Banking applications.
Forensic speaker recognition:
Proving the identity of a recorded voice can help to convict a criminal or clear an innocent person in court.
Speaker recognition for surveillance:
Electronic eavesdropping of telephone and radio conversations.
Results
Settings: 12 MFCCs, 29 filter banks, codebook size 64, ELSDSR database.
The table shows how the system identifies the speaker according to the Euclidean distance calculation.

        Sp 1     Sp 2     Sp 3     Sp 4     Sp 5
Sp 1  10.7492  13.2364  17.5438  16.1360  14.9324
Sp 2  13.2712  10.2740  16.1177  13.7095  15.7028
Sp 3  17.8646  13.2884  11.9029  15.5633  17.2842
Sp 4  14.7885  11.7941  16.2916  11.7528  17.8917
Sp 5  13.2859  14.0461  17.7199  16.7327  12.3504
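The decision rule applied to the distance table above can be checked with a short script. Which axis holds the test utterances and which holds the codebooks is not stated, so treating each row as a test utterance and each column as a speaker codebook is an assumption; with these values the smallest entry lies on the diagonal either way.

```python
speakers = ["Sp 1", "Sp 2", "Sp 3", "Sp 4", "Sp 5"]

# Distance values copied from the results table.
distances = [
    [10.7492, 13.2364, 17.5438, 16.1360, 14.9324],
    [13.2712, 10.2740, 16.1177, 13.7095, 15.7028],
    [17.8646, 13.2884, 11.9029, 15.5633, 17.2842],
    [14.7885, 11.7941, 16.2916, 11.7528, 17.8917],
    [13.2859, 14.0461, 17.7199, 16.7327, 12.3504],
]

# Assumption: row = test utterance, column = codebook. Pick the lowest distance.
decisions = [speakers[row.index(min(row))] for row in distances]

for true_speaker, decided in zip(speakers, decisions):
    print(true_speaker, "->", decided)
```

Note that the margin is sometimes small: for Sp 4 the two lowest distances are 11.7528 and 11.7941.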
Results
Number of MFCCs vs. ID rate:

No. of MFCC   5      12     20
ID rate       76 %   91 %   91 %

Frame size vs. ID rate:

Frame size 10-30 ms   Good
Above 30 ms           Bad
Results
The effect of the codebook size on the ID rate and VQ distortion.
[Figure: two plots against codebook size (0 to 300): ID rate (%) and matching score]
Results
Number of filter banks vs. ID rate and VQ distortion.
[Figure: two plots against the number of filters in the filter bank (0 to 50): ID rate (%) and matching score]
Results
The performance of the system for different test speech lengths.

Test speech length   0.2 sec   2 sec   6 sec   10 sec
ID rate              60 %      85 %    90 %    95 %

[Figure: ID rate (%) vs. test speech length (sec)]
Summary
Effect of changing some parameters on the MFCC algorithm and the VQ algorithm.
Our system identifies the speaker regardless of the language and the text.
Satisfactory results require the same training and testing environment, and test data several tens of seconds long.
