
Automatic Speaker Verification

Zouhir Wakaf, PhD

Outline

- Introduction
- Speaker identification vs verification
- Speaker verification overview
- The parts of a speaker verification system
- Evaluation of speaker verification performance
- Applications
- Future directions

Introduction
Extracting Information from Speech
Goal: automatically extract the information transmitted in a speech signal.

Speech signal -> Speech recognition -> Words ("How are you?")
Speech signal -> Speaker recognition -> Speaker identity ("Dr. Ahmad")

Speaker identification
Determine the speaker's identity
- Selection among a set of known voices
- The user does not claim an identity
- Closed-set identification: all speakers are assumed to be known to the system
- Open-set identification: the speaker may not be among the speakers known to the system

Whose voice is this?

Speaker Verification
Synonyms: authentication, detection
- The user claims an identity
- System task: accept or reject the identity claim
- The voice can come from outside the set of known speakers (if all speakers are known, the task is closed-set)
- Impostor: any voice other than that of the true identity
Is this Ahmad's voice?

Identification vs verification
Identification: Speech -> Feature extraction -> scored against the models of Speaker 1, Speaker 2, ..., Speaker N -> Decision -> Speaker ID

Verification: Speech -> Feature extraction -> speaker model score (+) minus impostor model score (-) -> Decision: above threshold, accept; below threshold, reject
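The difference between the two tasks can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the score values, and the thresholds are assumptions made for the example, and the scores stand in for whatever log-likelihoods a real system would compute.

```python
def identify(scores, open_set_threshold=None):
    """Pick the best-scoring known speaker.

    In open-set mode (a threshold is given), return None when even the
    best score is too low, i.e. the voice is probably not a known speaker.
    """
    best = max(scores, key=scores.get)
    if open_set_threshold is not None and scores[best] < open_set_threshold:
        return None
    return best


def verify(speaker_score, impostor_score, threshold=0.0):
    """Accept the identity claim when the score difference exceeds the threshold."""
    return (speaker_score - impostor_score) > threshold


# Made-up log-likelihood scores of one utterance against three enrolled models.
scores = {"Ahmad": -41.2, "Salma": -38.7, "Omar": -45.0}

print(identify(scores))                              # Salma (closed set)
print(identify(scores, open_set_threshold=-30.0))    # None (open set, no model fits well)
print(verify(scores["Salma"], impostor_score=-40.1)) # True (claim "Salma" accepted)
```

Identification compares the utterance against every known model, while verification needs only the claimed speaker's model and an impostor model.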

Speech Modalities
The application dictates the speech modality.

Text-dependent recognition
- The recognition system knows the text spoken by the person
- Examples: fixed phrase, prompted phrase
- Used for applications with strong control over user input
- Knowledge of the spoken text can improve system performance
- Prompting may reduce the risk of impostors using voice recordings

Text-independent recognition
- The recognition system does not know the text spoken by the person
- Examples: user-selected phrase, conversational speech
- Used for applications with less control over user input
- More flexible system, but also a more difficult problem
- Speech recognition can provide knowledge of the spoken text

Speech for Identification


- Speech is easily produced
- It does not require advanced input devices
- Can be applied using telephones and PCs
- Can be supplemented, to improve security, with:
  - a password phrase
  - personal knowledge

Speaker verification

- Which features to use?
- How to model the speaker?
- How to model the impostors?
- How to make the decision so as to minimize the probability of error?

Phases of Speaker Verification System


Every speaker verification system has two distinct phases.

Enrolment phase: Enrolment speech for each speaker (Ahmad, Salma) -> Feature extraction -> Model training -> Voiceprints (models) for each speaker (Ahmad, Salma)

Verification phase: Speech with claimed identity (Salma) -> Feature extraction -> Verification decision -> "Accepted!"
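The two phases can be sketched as a small class with one method per phase. This is a toy under stated assumptions: the class name `ToyVerifier`, the use of a mean feature vector as the "voiceprint", the Euclidean distance test, and all numbers are invented for illustration and are far simpler than the statistical models discussed later in the deck.

```python
import math


class ToyVerifier:
    """Toy two-phase verifier: enrol() builds voiceprints, verify() checks a claim."""

    def __init__(self, threshold):
        self.threshold = threshold   # maximum allowed distance to accept a claim
        self.voiceprints = {}        # speaker name -> mean feature vector

    @staticmethod
    def _mean_vector(feature_vectors):
        n = len(feature_vectors)
        dims = len(feature_vectors[0])
        return [sum(v[d] for v in feature_vectors) / n for d in range(dims)]

    def enrol(self, name, feature_vectors):
        """Enrolment phase: train and store a model for one speaker."""
        self.voiceprints[name] = self._mean_vector(feature_vectors)

    def verify(self, claimed_name, feature_vectors):
        """Verification phase: compare test features with the claimed speaker's model."""
        model = self.voiceprints[claimed_name]
        test_mean = self._mean_vector(feature_vectors)
        return math.dist(test_mean, model) < self.threshold


verifier = ToyVerifier(threshold=1.0)
verifier.enrol("Ahmad", [[1.0, 2.0], [1.2, 1.8]])
verifier.enrol("Salma", [[4.0, 0.5], [3.8, 0.7]])

test_features = [[3.9, 0.6], [4.1, 0.4]]        # features of the test utterance
print(verifier.verify("Salma", test_features))  # claim accepted
print(verifier.verify("Ahmad", test_features))  # claim rejected
```

The point of the sketch is the separation of concerns: enrolment runs once per speaker and stores a model, while verification runs on every access attempt and only consults the model of the claimed identity.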

Features for Speaker Recognition


- Humans use several levels of perceptual cues for speaker recognition
- There are no exclusive speaker-identity cues
- Low-level acoustic cues (physical traits) are the most applicable for automatic systems

Desirable attributes of features for an automatic system:
- Practical: occur naturally and frequently in speech; easily measurable
- Robust: do not change over time and are not affected by the speaker's health; are not affected by reasonable background noise and do not depend on specific transmission characteristics
- Secure: not subject to mimicry

Features for Speaker Recognition

- No feature has all of these attributes
- Features derived from the speech spectrum have proven to be the most effective in automatic systems
- Typical choice: mel-frequency cepstral coefficients (MFCCs)
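To make the MFCC idea concrete, here is a minimal pure-Python sketch for a single speech frame: window, power spectrum, triangular mel filterbank, logarithm, then a discrete cosine transform. The slides do not specify any of these details, so the Hamming window, the mel formula (one common convention), the filter count, the number of coefficients, and dropping the zeroth coefficient are all assumptions. A practical system would use an FFT library rather than this slow direct DFT.

```python
import cmath
import math


def hz_to_mel(f):
    """One common mel-scale convention (assumed here)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Hamming-window the frame and return its power spectrum via a direct DFT."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist."""
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    edges_hz = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    edges = [hz * (n_bins - 1) / nyquist for hz in edges_hz]  # fractional bin positions
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        bank.append([max(0.0, min((b - lo) / (mid - lo), (hi - b) / (hi - mid)))
                     for b in range(n_bins)])
    return bank


def mfcc(frame, sample_rate, n_filters=20, n_ceps=12):
    """Return n_ceps cepstral coefficients (c1..c_n, skipping c0) for one frame."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(spec), sample_rate)
    # Small constant avoids log(0) for filters that capture no energy.
    log_energies = [math.log(sum(w * p for w, p in zip(f, spec)) + 1e-10)
                    for f in bank]
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_energies))
            for c in range(1, n_ceps + 1)]


# A 256-sample frame of a synthetic 440 Hz tone sampled at 8 kHz.
frame = [math.sin(2 * math.pi * 440 * i / 8000) for i in range(256)]
coeffs = mfcc(frame, sample_rate=8000)
print(len(coeffs))
```

In a full front end this computation is repeated on short overlapping frames, giving a sequence of feature vectors per utterance.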

Speaker Models

- Speaker models (voiceprints) represent the voice biometric in a compact and generalizable form
- Modern speaker verification systems use hidden Markov models (HMMs)
- HMMs are statistical models of how a speaker produces sounds
- HMMs represent the underlying statistical variation within each speech state (e.g., a phoneme) and the temporal changes of speech between states (example state sequence: h-a-d)
- Fast training algorithms (EM) with guaranteed convergence properties exist for HMMs

Speaker Models
The form of the HMM depends on the application:
- Fixed phrase: word/phrase models (e.g., "Open Sesame")
- Prompted phrases/passwords: phoneme models (e.g., /s/ /i/ /x/)
- Text-independent (general speech): single-state HMM

Text-independent speaker verification


- The impostor model is built using speech from all speakers: a Gaussian mixture model (GMM) with a large number of mixture components
- The speaker model is built by speaker adaptation, which requires only a relatively small amount of speech
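The slide does not say which adaptation method is meant. One widely used option is MAP adaptation of the background GMM's means, sketched below as an assumption. The function names, the relevance factor of 16, and the toy one-dimensional data are all illustrative; only the means are adapted, while the weights and variances are kept from the background model.

```python
import math


def component_posteriors(x, weights, means, variances):
    """Posterior probability of each diagonal-Gaussian component given frame x."""
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        ll = sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
                 for xi, m, v in zip(x, mu, var))
        log_terms.append(math.log(w) + ll)
    peak = max(log_terms)                       # subtract the max for numerical stability
    unnorm = [math.exp(t - peak) for t in log_terms]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Shift each background mean toward the speaker's data.

    Components that see many speaker frames move a lot; components that
    see almost none stay at the background mean.
    """
    k, dims = len(means), len(means[0])
    counts = [0.0] * k                          # soft frame count per component
    sums = [[0.0] * dims for _ in range(k)]     # posterior-weighted feature sums
    for x in frames:
        for j, g in enumerate(component_posteriors(x, weights, means, variances)):
            counts[j] += g
            for d in range(dims):
                sums[j][d] += g * x[d]
    # Equivalent to alpha * data_mean + (1 - alpha) * background_mean,
    # with alpha = count / (count + relevance).
    return [[(sums[j][d] + relevance * means[j][d]) / (counts[j] + relevance)
             for d in range(dims)]
            for j in range(k)]


# Toy background model: two one-dimensional components.
ubm_weights = [0.5, 0.5]
ubm_means = [[0.0], [5.0]]
ubm_vars = [[1.0], [1.0]]

speaker_frames = [[6.0]] * 16                   # small amount of enrolment speech
adapted = map_adapt_means(speaker_frames, ubm_weights, ubm_means, ubm_vars)
print(adapted)
```

In this toy case the speaker's frames fall almost entirely on the second component, so its mean moves about halfway from 5.0 toward 6.0, while the first component is left essentially unchanged. This is why adaptation works with little speech: unseen parts of the model fall back on the background model.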

Verification Decision
The decision is a two-class hypothesis test:
- H0: the speaker is an impostor
- H1: the speaker is indeed the claimed speaker

The statistic computed on the test utterance S is the log-likelihood ratio:

$$\Lambda(S) = \log \frac{\text{likelihood that } S \text{ came from the speaker HMM}}{\text{likelihood that } S \text{ did not come from the speaker HMM}}$$

Speech -> Feature extraction -> speaker model score (+) minus impostor model score (-) -> Decision: ratio above threshold, accept; below threshold, reject
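The likelihood-ratio test can be sketched as follows. To keep the example short, a single diagonal Gaussian stands in for both the speaker HMM and the impostor model; that simplification, the function names, the per-frame averaging, and all parameter values are assumptions for illustration.

```python
import math


def gaussian_log_likelihood(x, mean, var):
    """Log-density of frame x under a diagonal-covariance Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def log_likelihood_ratio(frames, speaker, impostor):
    """Average per-frame log-likelihood ratio: log p(S | speaker) - log p(S | impostor)."""
    total = 0.0
    for x in frames:
        total += gaussian_log_likelihood(x, *speaker) - gaussian_log_likelihood(x, *impostor)
    return total / len(frames)


def decide(frames, speaker, impostor, threshold=0.0):
    """Accept the claim when the ratio exceeds the threshold, otherwise reject."""
    return "accept" if log_likelihood_ratio(frames, speaker, impostor) > threshold else "reject"


# Each model is a (mean vector, variance vector) pair with made-up values.
speaker_model = ([1.0, 2.0], [0.5, 0.5])
impostor_model = ([0.0, 0.0], [2.0, 2.0])

genuine_frames = [[1.1, 1.9], [0.9, 2.2]]       # close to the speaker model
impostor_frames = [[0.1, -0.2], [-0.3, 0.1]]    # close to the impostor model

print(decide(genuine_frames, speaker_model, impostor_model))   # accept
print(decide(impostor_frames, speaker_model, impostor_model))  # reject
```

Raising the threshold makes the system stricter (fewer impostors accepted, more true speakers rejected), and lowering it does the opposite.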

Verification Performance

Evaluating Speaker Verification Systems

There are many factors to consider in evaluating speaker verification systems:
- Speech quality: channel and microphone characteristics; noise level and type; variability between enrolment and verification speech
- Speech modality: fixed, prompted, or user-selected phrases; free text
- Speech duration: duration and number of sessions of enrolment and verification speech
- Speaker population: size and composition

DET-curve

The importance of each error type depends on the application!
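The two error types traded off on a DET curve are the miss (a true speaker is rejected) and the false alarm (an impostor is accepted). The sketch below computes both rates at every candidate threshold and picks the point where they are closest, an approximation of the equal error rate. The scores are invented, the function names are illustrative, and a real DET plot additionally warps both axes to a normal-deviate scale, which is omitted here.

```python
def error_rates(target_scores, impostor_scores, threshold):
    """Return (miss rate, false-alarm rate) when accepting scores above the threshold."""
    miss = sum(s <= threshold for s in target_scores) / len(target_scores)
    false_alarm = sum(s > threshold for s in impostor_scores) / len(impostor_scores)
    return miss, false_alarm


def det_points(target_scores, impostor_scores):
    """One (threshold, miss rate, false-alarm rate) point per observed score."""
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    return [(t, *error_rates(target_scores, impostor_scores, t)) for t in thresholds]


def approximate_eer(target_scores, impostor_scores):
    """Threshold where the two error rates are closest, and their average there."""
    t, miss, fa = min(det_points(target_scores, impostor_scores),
                      key=lambda p: abs(p[1] - p[2]))
    return t, (miss + fa) / 2


# Made-up verification scores for true-speaker and impostor trials.
target_scores = [2.1, 1.5, 0.3, 1.8, -0.2]
impostor_scores = [-1.0, 0.5, -0.7, -1.5, 0.1]

for t, miss, fa in det_points(target_scores, impostor_scores):
    print(f"threshold={t:5.1f}  miss={miss:.1f}  false_alarm={fa:.1f}")

eer_threshold, eer = approximate_eer(target_scores, impostor_scores)
print(eer_threshold, eer)
```

An application that fears fraud would operate at a higher threshold than the equal-error point, accepting more misses in exchange for fewer false alarms; a convenience-oriented application would do the reverse.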

Applications
Transaction authentication
- Toll fraud prevention
- Telephone credit card purchases
- Telephone brokerage (e.g., stock trading)

Access control
- Physical facilities
- Computers and data networks

Monitoring
- Remote time and attendance logging
- Home parole verification
- Prison telephone usage

Information retrieval
- Customer information for call centers
- Audio indexing (speech skimming device)

Forensics
- Voice sample matching (e.g., a recorded threat compared with a suspect's voice)

Future Directions
Research will focus on using speaker recognition in more unconstrained, uncontrolled situations:
- Audio search and retrieval
- Increasing robustness to channel variability
- Incorporating higher levels of knowledge into decisions

Speaker recognition technology will become an integral part of speech interfaces:
- Personalization of services and devices
- Unobtrusive protection of transactions and information
