
GENDER VOICE RECOGNITION THROUGH SPEECH ANALYSIS WITH HIGHER ACCURACY

Neha Jain (neha13158@iiitd.ac.in)
Divya Kaushik (divya13153@iiitd.ac.in)
IIIT Delhi, Govindpuri, Okhla Phase-3, New Delhi-110058

Abstract: Gender voice recognition is an important branch of research in the field of acoustics and speech processing. The human voice exhibits very interesting aspects, which are analyzed and discussed in this paper. This paper presents an investigation of speech signals to devise a gender classifier. Gender classification by speech analysis aims to predict the gender of the speaker by analyzing different parameters of a voice sample. A database of 200 voice samples of celebrities, both male and female, was created using an open-source tool. The short-time autocorrelation and the average magnitude difference function (AMDF) are used collectively by assigning weightage factors, and a threshold is set on the basis of the fact that males have a lower fundamental frequency (nearly 120 Hz) than females (nearly 200 Hz). The analysis is text and language independent. Moreover, the Audacity tool is used for noise suppression.

Keywords: Audacity, short-time autocorrelation, center clipping, average magnitude difference function (AMDF), cepstrum, harmonic product spectrum, normalized cross-correlation function (NCCF), linear predictive coefficients, etc.

INTRODUCTION
In speech communication, the listener does not only decode the linguistic message from the speech signal; at the same time he or she also infers paralinguistic information such as the age, gender and other properties of the speaker. This type of information is termed voice information. Gender detection has applications in many fields, such as:
(a) Sorting telephone calls by gender for gender-sensitive surveys.
(b) Identifying the gender and removing the gender-specific components, which gives a higher compression rate and enhanced bandwidth.
In terms of acoustic properties, voice information has been described by several acoustic parameters. Two of these parameters are of perceptual relevance: the fundamental frequency (f0) and the spectral formant frequencies.
Males have both of these frequencies lower than females. However, the formant frequencies are related to the vowels and hence are text dependent; as our project is text independent, gender classification here is done using the pitch/fundamental frequency, extracted by different methods. Pitch is a very important feature which can be obtained by methods working in the time domain, in the frequency domain, or in a combination of both. Time-domain methods are those in which we work directly on the speech samples: the speech waveform is analyzed directly using methods such as short-time autocorrelation, modified autocorrelation with a clipping technique, the normalized cross-correlation function, the average magnitude difference function, the squared difference function, etc. Similarly, in frequency-domain methods the frequency content of the signal is first computed and the information is then extracted from the spectrum. These methods include harmonic product spectrum analysis, the harmonic-to-subharmonic ratio, etc. There are also some methods which come under neither the time domain nor the frequency domain, such as wavelets, LPC analysis, etc.
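To make the frequency-domain idea concrete, the harmonic product spectrum mentioned above can be previewed with a minimal sketch. This is an illustrative Python sketch, not part of the paper's method (the paper's own code, in MATLAB, appears in the appendix); the naive DFT, the three-harmonic product, and the synthetic test tone are all assumptions made for the example.

```python
import cmath
import math

def hps_pitch(x, fs, harmonics=3):
    """Estimate f0 via a harmonic product spectrum: downsampled copies of
    the magnitude spectrum are multiplied, so only the fundamental, whose
    harmonics all line up, survives as a strong peak."""
    n = len(x)
    # naive DFT magnitude spectrum (first half only)
    mags = [abs(sum(x[i] * cmath.exp(-2j * math.pi * k * i / n)
                    for i in range(n)))
            for k in range(n // 2)]
    limit = len(mags) // harmonics
    best_k, best_p = 1, 0.0
    for k in range(1, limit):
        p = 1.0
        for h in range(1, harmonics + 1):
            p *= mags[k * h]          # product over the first few harmonics
        if p > best_p:
            best_p, best_k = p, k
    return best_k * fs / n            # convert the winning bin to Hz

# synthetic harmonic-rich tone at 250 Hz (chosen to fall exactly on a DFT bin)
fs, f0 = 8000, 250.0
w = 2 * math.pi * f0 / fs
x = [math.sin(w * i) + 0.5 * math.sin(2 * w * i) + 0.25 * math.sin(3 * w * i)
     for i in range(512)]
print(hps_pitch(x, fs))               # prints 250.0
```

The product step is what distinguishes HPS from simple peak picking: a spurious spectral peak without harmonics at 2x and 3x its frequency is suppressed by the multiplication.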
In this paper, Section I discusses pitch as a feature and how it is used for gender voice identification. Section II discusses the modified autocorrelation with clipping technique used for pitch estimation, and Section III discusses another time-domain method, the average magnitude difference function. In Section IV we show how the two methods are combined to reach 99% accuracy, and we present our simulated results in the form of tables containing the database of 200 voice samples, 100 each of male and female, with their respective fundamental frequencies. Finally, an appendix contains the flow charts and MATLAB code for the proposed project.

I. PITCH: AN IMPORTANT GENDER CLASSIFIER
The pitch period is defined as the time interval between two consecutive voiced excitation cycles, i.e. the distance in time from one peak to the next; its reciprocal is the fundamental frequency of the excitation source. Hence an accurate pitch estimator should be used in an algorithm for gender classification. Fundamental frequency (f0) estimation is referred to as pitch detection. The main intuition for using the pitch period comes from the fact that the average fundamental frequency (the reciprocal of the pitch period) for men is typically in the range 85-185 Hz, whereas for women it is 165-200 Hz. However, there are several challenges in using the pitch period as the feature for gender identification. First, a good estimate of the pitch period can only be obtained from voiced portions of a clean, non-noisy signal. Second, an overlap between male and female pitch values naturally exists, making this a non-trivial problem to solve. Nearly all information in speech lies in the range 200 Hz to 8 kHz. Humans discriminate between male and female voices according to frequency: females speak with higher fundamental frequencies than males. This is the main reason why we have focused on pitch detection algorithms for gender voice detection in this paper.
As mentioned before, to perform gender voice detection we have essentially only two features: the pitch or fundamental frequency (f0), and the spectral formant frequencies. Pitch corresponds to the relative highness or lowness of a tone as perceived by the ear. It depends on the number of vibrations per second produced by the vocal cords. Fundamental frequency (F0) is an objective estimate of pitch. The following section discusses some of the methods by which we can detect the pitch and capture some of the hidden information about the gender of the speaker from the pitch itself.
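The overlap between the male and female pitch ranges quoted above can be made concrete with a small sketch. This is an illustrative Python sketch (the paper's own implementation is in MATLAB); the 180 Hz cut-off and the helper names are assumptions for the example, not values from the paper, which uses a 200 Hz threshold.

```python
# Ranges quoted in the text: male f0 is typically 85-185 Hz and female f0
# 165-200 Hz, so estimates in the 165-185 Hz overlap band are ambiguous.
MALE_RANGE = (85.0, 185.0)
FEMALE_RANGE = (165.0, 200.0)

def classify_by_f0(f0, threshold=180.0):
    """Label a speaker from a single f0 estimate (Hz).

    Returns a 'male'/'female' label plus a flag marking estimates that
    fall inside the male/female overlap band, where a hard threshold
    alone is unreliable.
    """
    ambiguous = FEMALE_RANGE[0] <= f0 <= MALE_RANGE[1]
    label = "male" if f0 < threshold else "female"
    return label, ambiguous

print(classify_by_f0(120.0))   # well inside the male band
print(classify_by_f0(170.0))   # inside the overlap: flagged as ambiguous
print(classify_by_f0(210.0))   # above both ranges
```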
I.a PITCH DETECTION METHODS
A pitch detection algorithm (PDA) is an algorithm designed to estimate the pitch or fundamental frequency of a quasi-periodic or virtually periodic signal, usually a digital recording of speech or a musical note or tone. This can be done in the time domain, in the frequency domain, or in both. As the name suggests, time-domain methods work directly on the speech waveform and exploit its periodicity directly; for this reason they give more accurate results. Frequency-domain methods, on the other hand, first transform the signal into the frequency domain and then apply suitable methods to find the pitch period.
Estimating the pitch is not a single-step problem; rather, it requires deep investigation of the waveform and the application of several operations on it, whether in the time domain or the frequency domain. There are several methods for pitch tracking, some of which are listed below:
Time domain:
(a) Modified autocorrelation method using clipping
(b) Average magnitude difference function
(c) Zero crossing rate
(d) Data reduction method (DARD)
(e) Parallel processing method (PPROC)
Frequency domain:
(a) Harmonic product spectrum
(b) Harmonic to sub-harmonic ratio
(c) Spectral autocorrelation
(d) Cepstrum
Hybrid methods:
(a) Wavelets
(b) LPC analysis
Since only the time-domain methods are used in this paper, the next sections discuss the average magnitude difference function (AMDF) and the modified autocorrelation method using clipping for pitch tracking.
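Before the detailed treatment, the core time-domain idea behind method (a) can be previewed with a minimal sketch: the autocorrelation of a (quasi-)periodic signal peaks at lags equal to the period. This is an illustrative Python sketch under assumed parameters (8 kHz sampling, a synthetic 120 Hz tone, a 60-320 Hz search range); the paper's actual MATLAB implementation, with center clipping, is given in the appendix.

```python
import math

def autocorr_pitch(x, fs, f_lo=60.0, f_hi=320.0):
    """Estimate f0 by locating the autocorrelation peak whose lag lies
    in the plausible pitch range."""
    n = len(x)
    lo = int(fs / f_hi)                  # smallest lag to search
    hi = min(int(fs / f_lo), n - 1)      # largest lag to search
    best_lag, best_r = lo, float("-inf")
    for lag in range(lo, hi + 1):
        # correlation of the signal with a copy of itself delayed by `lag`
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_r:
            best_r, best_lag = r, lag
    return fs / best_lag                 # peak lag -> fundamental frequency

fs = 8000
f0_true = 120.0                          # a typical male pitch
x = [math.sin(2 * math.pi * f0_true * i / fs) for i in range(1600)]
print(round(autocorr_pitch(x, fs), 1))   # close to the true 120 Hz
```

The lag resolution is one sample, so the estimate is quantized to fs/lag values; real trackers interpolate around the peak for finer resolution.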

II. Modified Autocorrelation Method using Clipping (ACF)

Another commonly used method to estimate pitch (fundamental frequency) is based on detecting the highest value of the autocorrelation function. For a given discrete-time signal x(n), defined for all n, the short-time autocorrelation function is generally defined as:

R(t) = Σ x(n) x(n+t), the sum running over n = 0, ..., N-1-t,

where t is the lag and N is the total number of samples in a short frame. We know that the autocorrelation of a periodic signal is also periodic, but the speech signal is not periodic over its entire duration (speech is quasi-periodic). Therefore, we divide the speech signal into frames, apply the short-time autocorrelation function to each frame, calculate a pitch for every frame, and then take the average pitch over all the frames.
The autocorrelation function gives a measure of the correlation of a signal with a delayed copy of itself. In the case of voiced speech, the main peak in the short-time autocorrelation function normally occurs at a lag equal to the pitch period. This peak is therefore detected, and its time position gives the pitch period of the input speech. However, because of the computational intensity of the many multiplications required to compute the autocorrelation function, a centre-clipping technique is applied to reduce the multiplications in the autocorrelation-based algorithm. This involves suppressing values of the signal between two adjustable clipping thresholds. The detailed algorithm and MATLAB code, along with the flowchart, are given in the appendix.

III. Average Magnitude Difference Function (AMDF)

AMDF is one of the conventionally used algorithms and is a variation on the autocorrelation analysis. The AMDF computed within each window is given by:

D(t) = (1/N) Σ |s(i) - s(i+t)|, the sum running over i = 1, ..., N-t,

where s(i) denotes the samples of the input speech, t is the lag, and N is the number of samples in the frame. The raw pitch period is then estimated from each voiced region as:

T = arg min D(t), for t_min <= t <= t_max,

where t_max and t_min are respectively the possible maximum and minimum lag values. The motivation here is that no multiplications are used, so this measure of the degree to which the data is periodic is well suited to special-purpose hardware.
From the expression for the AMDF we can easily see that, just as the short-time autocorrelation gives maxima/peaks at time shifts equal to one period (and the inverse of one pitch period gives the pitch/fundamental frequency F0), the AMDF gives minima/nulls at time shifts equal to one period, and that is where we can find the pitch period and hence the pitch frequency. That is why, in this method, we search for the minimum value of D(t) over the lags t within a frame of length L. The detailed algorithm and the flowchart are described in Appendix A.
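The AMDF null picking described above can be sketched as follows. This is an illustrative Python sketch under assumed parameters (8 kHz sampling, a synthetic 160 Hz tone, a 60-320 Hz search range); the "first deep null" heuristic is an assumption added here to avoid sub-harmonic (octave) errors, since the null repeats at every multiple of the period. The paper's MATLAB version is in the appendix.

```python
import math

def amdf_pitch(x, fs, f_lo=60.0, f_hi=320.0):
    """Estimate f0 from the AMDF null: at a lag equal to the pitch period,
    the average |x(i) - x(i+lag)| dips towards zero."""
    n = len(x)
    lo = int(fs / f_hi)
    hi = min(int(fs / f_lo), n - 1)
    d = [sum(abs(x[i] - x[i + lag]) for i in range(n - lag)) / (n - lag)
         for lag in range(lo, hi + 1)]
    d_min, d_max = min(d), max(d)
    tol = d_min + 0.1 * (d_max - d_min)    # "deep null" cutoff (heuristic)
    cand = [k for k, val in enumerate(d) if val <= tol]
    # keep only the first run of consecutive deep lags (the first null),
    # so multiples of the period cannot cause an octave error
    first = [cand[0]]
    for k in cand[1:]:
        if k == first[-1] + 1:
            first.append(k)
        else:
            break
    best = min(first, key=lambda k: d[k])  # sharpest point of that null
    return fs / (lo + best)

fs, f0_true = 8000, 160.0                  # 50-sample period, in the male band
x = [math.sin(2 * math.pi * f0_true * i / fs) for i in range(1600)]
print(amdf_pitch(x, fs))                   # prints 160.0
```

Note that no multiplications of signal samples occur in the inner sum, which is exactly the hardware-friendliness argument made in the text.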

IV. Simulation and Results

We took the training sequences of 50 users (25 male and 25 female), and on the basis of observation it was noticed that the accuracy of the modified autocorrelation method (ACF) is 95%, whereas that of the AMDF is 85%. The proposed method, a combination of the two called the combo-classifier, has an accuracy of 99%. The combined classifier is built in such a fashion that it suppresses noise and increases accuracy: a weightage of 0.9 is assigned to the ACF, as it is highly accurate, and a weightage of 0.1 to the AMDF, which results in 99% accuracy. Moreover, the Audacity tool is also used for noise suppression. A flow chart of the overall process is given in the appendix. Below are the output results in tabular form, consisting of the pitch of 100 females and 100 males; from the tables we can see that nearly all females have a pitch greater than or equal to 200 Hz, while males have a pitch below 200 Hz.
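The combination rule just described can be sketched in a few lines. This is an illustrative Python sketch mirroring the 0.9/0.1 weighting and the 200 Hz threshold from the text (the paper's MATLAB version is gender_det in Appendix B.III); the input estimates below are hypothetical numbers, not values from the tables.

```python
def combo_classify(f0_acf, f0_amdf, threshold=200.0):
    """Fuse the ACF and AMDF pitch estimates with 0.9/0.1 weights, then
    label the speaker with the 200 Hz threshold used in the paper."""
    f0 = 0.9 * f0_acf + 0.1 * f0_amdf
    label = "female" if f0 >= threshold else "male"
    return label, f0

# e.g. two hypothetical estimates for the same clip
print(combo_classify(118.0, 131.0))   # fused pitch below 200 Hz -> male
print(combo_classify(215.0, 230.0))   # fused pitch above 200 Hz -> female
```

The heavy ACF weight means the AMDF acts only as a mild correction, which matches the reported individual accuracies (95% vs 85%).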

V. Future Work
Formant frequency could also be used to gain control over variation in vowels. Moreover, there are some ambiguous users whose pitch lies in the overlapping region of the male and female frequency ranges. Using a GMM (Gaussian Mixture Model) for such users would give the best model to determine the gender of all types of users.
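The idea behind the GMM suggestion can be previewed with a simplified sketch: instead of a hard 200 Hz cut, each gender's pitch is modelled by a distribution and the class with the higher likelihood wins. This Python sketch uses a single Gaussian per gender (a full GMM would use a weighted mixture of several Gaussians per class), and the means and standard deviations below are illustrative assumptions, not fitted values.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a univariate Gaussian N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def gmm_classify(f0, male=(140.0, 30.0), female=(240.0, 30.0)):
    """Pick the gender whose (assumed) pitch distribution gives the
    observed f0 the higher likelihood."""
    pm = gaussian_pdf(f0, *male)
    pf = gaussian_pdf(f0, *female)
    return "male" if pm >= pf else "female"

print(gmm_classify(120.0))   # far into the male band
print(gmm_classify(185.0))   # overlap region: decided by likelihood, not a cut
print(gmm_classify(250.0))   # far into the female band
```

Unlike a fixed threshold, the decision boundary here follows from the fitted distributions, which is what makes the approach attractive for speakers in the overlap region.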
VI. References
[1] L. R. Rabiner, M. J. Cheng, A. E. Rosenberg, and C. A. McGonegal, "A Comparative Performance Study of Several Pitch Detection Algorithms," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-24, no. 5, October 1976.
[2] P. Kumar, N. Jakhanwal, A. Bhowmick, and M. Chandra, "Gender Classification Using Pitch and Formants," Birla Institute of Technology, Mesra, Ranchi, Jharkhand, India.
[3] Y. Hu, D. Wu, and A. Nucci, "Pitch-based Gender Identification with Two-stage Classification," Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL 32611 (A. Nucci is with Narus, Inc., Sunnyvale, CA 94085).
[4] P. McLeod, "Fast, Accurate Pitch Detection Tools for Music Analysis," PhD thesis, University of Otago, Dunedin, New Zealand, 30 May 2008.

Table for output results of male pitch

S.No. | Name of male celebrity | Pitch (Hz)
1 | Tom Cruise | 108.5767
2 | Tom Hanks | 63.2045
3 | Jude Law | 119.5053
4 | Bradely Coopers | 138.3667
5 | Johny Depp | 79.3707
6 | Robert Pattinson | 116.6351
7 | Anil Kapoor | 130.7395
8 | John Abrahim | 137.2650
9 | Amitabh Bachan | 97.6146
10 | Sharukh khan | 148.6133
11 | Tushar Kapoor | 194.8936
12 | Manoj bajpal | 99.2207
13 | Sonu sood | 158.2973
14 | Gulshan grover | 181.6534
15 | Amrish puri | 141.7098
16 | Manoj kumar | 108.8971
17 | Dev anand | 178.3067
18 | Amzad khan | 165.1176
19 | Prem chopra | 178.3925
20 | Raj Kapoor | 158.1798
21 | Varun dhawan | 138.0241
22 | Ashutosh | 129.0241
23 | Sidharth Malhotra | 167.0254
24 | Rishi kapoor | 135.2410
25 | Hritik Roshan | 140.2755
26 | Gurudat | 149.7733
27 | Prithvi Raj Kapoor | 141.2599
28 | Ashok Kumar | 197.4741
29 | Rajesh khanna | 137.2349
30 | Shami kapoor | 156.7412
31 | Nashrudin shah | 159.6098
32 | Nana patekar | 124.3021
33 | Raj Kumar | 112.3169
34 | Ajit | 123.3782
35 | Mithun Chakrwati | 134.5500
36 | Dhermender | 139.5424
37 | Vinod Kumar | 168.2245
38 | Shatrughan Sinha | 134.9560
39 | Sunny Deol | 128.8951
40 | Sanjay Dutt | 132.9483
41 | Ajay Devgan | 144.9683
42 | Anupam Kher | 149.3025
43 | Amir Khan | 111.0893
44 | Jacky Schroff | 198.0241
45 | Praan | 125.2454
46 | Paresh | 134.2558
47 | Sunil Shetty | 157.7082
48 | Akshay Kumar | 92.4166
49 | Abhishek Bachan | 166.0036
50 | Akshay Kumar | 167.0349
51 | Salman Khan | 143.2753
52 | Saif ali khan | 195.4256
53 | Imran Hashmi | 187.2138
54 | Jitnder | 178.2451
55 | Rajnder Nath | 111.4580
56 | Rakesh Roshan | 154.2408
57 | Om puri | 102.452
58 | Om prakash | 130.544
59 | Sharman Joshi | 198.4496
60 | Randhir Kapoor | 102.993
61 | Jony leiver | 173.4899
62 | Arshad varsi | 145.000
63 | Dara singh | 110.2458
64 | Chunkey pandey | 189.1541
65 | Raj babbar | 124.4880
66 | Mukesh khanna | 135.1444
67 | Utpal dutt | 137.0012
68 | Sekhar suman | 125.7775
69 | Sanjeev kumar | 167.0249
70 | Ranbeer kapoor | 125.780
71 | Ranveer singh | 145.9820
72 | Rajnikanth | 110.2540
73 | Ronit roy | 182.004
74 | Suriya | 120.0701
75 | Jimmy shergil | 124.3650
76 | Bobby Deol | 156.7514
77 | Kumar Gurav | 111.5411
78 | Rajnder Kumar | 145.1478
79 | Pradeep Kumar | 179.2451
80 | Govinda | 125.4789
81 | Asrani | 124.7987
82 | Mehmood | 198.4574
83 | Kishor Kumar | 112.4571
84 | Balraaj Saini | 164.1240
85 | Boman Irani | 125.7850
86 | Mamooty | 197.4450
87 | Faran Akhtar | 125.5121
88 | Arjun Rampal | 178.3582
89 | Shahid Kapoor | 159.2540
90 | Randeep Hoda | 123.4780
91 | Pankaj Kapoor | 186.7716
92 | Vikram | 156.4455
93 | Utpal Dutt | 129.6633
94 | Mohnish Bhel | 189.154
95 | Kulbhushan | 132.2451
96 | Vivek oberoi | 102.4511
97 | Rhul bose | 130.1254
98 | Prakash raj | 183.2412
99 | Ritesh deshmuh | 100.2455
100 | Danny | 126.7892

Table for output results of female pitch

S.No. | Name of female celebrity | Pitch (Hz)
1 | Sushmita sen | 240.7523
2 | Parineeti Chopra | 216.6328
3 | Hema malini | 268.5971
4 | Juhi Chawla | 259.4355
5 | Kareena Kapoor | 211.4817
6 | Smita Jaykar | 226.5636
7 | Madhuri Dixit | 241.5194
8 | Mahima chaudhary | 249.0680
10 | Sonali bender | 226.8533
11 | Rima sen | 234.2131
12 | Aishwarya rai | 253.2921
13 | Amisha patel | 217.5891
15 | Kate winslet | 236.4374
16 | Angelina Jolie | 204.5421
17 | Cameroon Diaz | 187.0417
18 | Emma stone | 258.6801
19 | Jennifer Lopez | 258.6873
20 | Miley cyrus | 244.2267
21 | Lara dutta | 256.9634
22 | Celina jaiteley | 227.8148
23 | Jennifer anniston | 235.7534
24 | Jaya bhaduri | 252.7637
25 | Asin | 267.2349
26 | Sonkashi sinha | 240.7200
27 | Amrita rao | 238.1478
28 | Rekha | 253.7141
29 | Alia bhatt | 201.1405
30 | Kajol | 221.3216
31 | Shilpa shetty | 241.6334
32 | Karisma kapoor | 215.8584
33 | Dimple kapadia | 264.7761
34 | Simi garewal | 248.6547
35 | Pooja bhatt | 270.6949
36 | Diya mirza | 264.5704
37 | Manisha koirala | 267.8155
38 | Sonam kapoor | 238.9886
39 | Aruna irani | 200.0912
40 | Asha parekh | 213.5467
41 | Esha deol | 209.0812
42 | Twinkle khanna | 200.0145
43 | Rehana sultana | 289.1491
44 | Ranjita sehgal | 280.0145
45 | Bindiya Goswami | 278.8013
46 | Shweta talwar | 256.8715
47 | Gauhar Khan | 200.0156
48 | Tanisha | 276.8974
49 | Farida jalal | 278.1783
50 | Sharmila tagore | 200.0137
50 | Deepika padukone | 200.2193
51 | Shamita shetty | 210.9627
52 | Ila arun | 298.6543
53 | Jaya pradha | 213.7423
54 | Smita pathak | 245.9821
55 | Rita bhaduri | 230.0071
56 | Divya dutta | 210.9312
57 | Bindu | 208.9712
58 | Malaika arora khan | 200.0128
59 | Johra sehgal | 213.1897
60 | Udita goswami | 245.8124
61 | Mala sinha | 214.6621
62 | Esha deol | 210.6712
63 | Priyanka chopra | 236.7761
64 | Raveena tandon | 218.8812
65 | Priety zinta | 234.5526
66 | Shabana azmi | 260.0012
67 | Asha Parekh | 217.7714
68 | Kamini Kaushal | 245.8901
69 | Divya Bharti | 245.7761
70 | Anjana Om Kashyap | 267.1230
71 | Nalini Jaywant | 200.0012
72 | Sulakshana Pandit | 230.0124
73 | Jia Khan | 215.9871
74 | Anie Hathway | 250.8971
75 | Rani Mukherjee | 275.1200
76 | Lindsey lohan | 200.0178
77 | Mughda Godse | 210.9124
78 | Bhoomika Chawla | 256.1256
79 | Dina Pathak | 230.0901
80 | Samira reddy | 245.6712
81 | Bipasha Basu | 209.8712
82 | Mamta Kulkarni | 240.0900
83 | Mandira Bedi | 265.9815
84 | Nargis | 289.0156
85 | Lalita Pawar | 290.0012
86 | Rambha | 210.0123
87 | Neha dhupia | 290.1267
88 | Nanda | 200.0005
89 | Shradha kapoor | 219.0234
90 | Ritu singh | 210.1450
91 | Sharone stone | 234.8761
92 | Shweta singh | 230.0912
93 | Smriti irani | 267.9813
94 | Kadra | 265.0913
95 | Rajeshwari | 278.8912
96 | Amrita singh | 290.0135
97 | Shweta talwar | 288.8129
98 | Mandakini | 210.0135
99 | Illiana Decruise | 280.0129
100 | Genelia dsouza | 230.0913

Appendix
A. Flow charts of the algorithms used.
I. Modified autocorrelation method using clipping

II. Average Magnitude Difference Function

III. Flow chart of complete procedure

B. Matlab code
I. Modified autocorrelation method
I (a). pitch function

% Input:
%   x  : original speech
%   fs : sampling rate
% Output:
%   avgF0 : average (final) fundamental frequency
% The original speech is segmented into frames and, for each frame,
% pitchacorr is called, which gives the pitch (fundamental frequency) of
% that frame (obtained from the peak of the autocorrelation). The average
% fundamental frequency is then computed over all the frames.

function [avgF0] = pitch(x, fs)
x = x(:);                          % vectorize the speech signal
ns = length(x);                    % number of samples
x = x - mean(x);                   % remove the DC bias
% use a 120 ms segment, advance by 110 ms (10 ms overlap between segments)
fRate   = floor(120*fs/1000);
updRate = floor(110*fs/1000);
nFrames = floor(ns/updRate) - 1;
% the pitch contour is a 1 x nFrames vector
f0  = zeros(1, nFrames);
f01 = zeros(1, nFrames);
% get the pitch from each segmented frame
k = 1;
avgF0 = 0;
m = 1;
for i = 1:nFrames
    xseg = x(k:k+fRate-1);
    f01(i) = pitchacorr(fRate, fs, xseg);  % pitch of the current frame
    % median-filter over every 3 frames so the contour is less affected
    % by noise; if there are 3 frames or fewer, skip the median filtering
    if i > 2 && nFrames > 3
        md = median(f01(i-2:i));           % median filtering when nFrames > 3
        f0(i-2) = md;
        if md > 0
            avgF0 = avgF0 + md;
            m = m + 1;
        end
    elseif nFrames <= 3                    % no median filtering needed
        f0(i) = f01(i);
        avgF0 = avgF0 + f01(i);
        m = m + 1;
    end
    k = k + updRate;
end
if m == 1
    avgF0 = 0;
else
    avgF0 = avgF0/(m-1);                   % average f0 over voiced frames
end

I (b). pitchacorr function (called in the code above)

% Pitch estimation using the autocorrelation method

function [f0] = pitchacorr(len, fs, xseg)
% low-pass filter at 900 Hz
[bf0, af0] = butter(4, 900/(fs/2));
xseg = filter(bf0, af0, xseg);
% find the clipping level CL: 68% of the smaller of the peak levels in
% the first and last thirds of the segment
i13 = floor(len/3);
maxi1 = max(abs(xseg(1:i13)));
i23 = floor(2*len/3);
maxi2 = max(abs(xseg(i23:len)));
CL = 0.68*min(maxi1, maxi2);
% center-clip the waveform, then compute its autocorrelation
clip = zeros(len, 1);
ind1 = find(xseg >= CL);
clip(ind1) = xseg(ind1) - CL;
ind2 = find(xseg <= -CL);
clip(ind2) = xseg(ind2) + CL;
engy = norm(clip, 2)^2;   % energy of the clipped waveform
RR = xcorr(clip);         % autocorrelation of the clipped waveform
m = len;                  % index of the zero-lag term in RR
% find the autocorrelation maximum in the range 60 <= f0 <= 320 Hz
LF = floor(fs/320);
HF = floor(fs/60);
Rxx = abs(RR(m+LF:m+HF));
[rmax, imax] = max(Rxx);
imax = imax + LF - 1;     % convert the local index back to a lag
f0 = fs/imax;
% check the maximum against the voiced/unvoiced threshold
silence = 0.4*engy;
if (rmax > silence) && (f0 > 60) && (f0 <= 320)
    f0 = fs/imax;
else
    f0 = 0;               % unvoiced segment
end

II. Average Magnitude Difference Function

function [f] = PitchTimeAmdf(x, fs)
% block-wise AMDF pitch tracker
Hop_length   = 2048;
Block_length = 4096;
Blocks = ceil(length(x)/Hop_length);
% time stamps of the block centres (not used further here)
t = ((0:Blocks-1)*Hop_length + (Block_length/2))/fs;
% allocate memory
f = zeros(1, Blocks);
% initialization: search the lags corresponding to 50-2000 Hz
f_max   = 2000;
f_min   = 50;
eta_min = round(fs/f_max);
eta_max = round(fs/f_min);
for n = 1:Blocks
    i_start = (n-1)*Hop_length + 1;
    i_stop  = min(length(x), i_start + Block_length - 1);
    % locate the AMDF minimum within the admissible lag range
    afAMDF = amdf(x(i_start:i_stop), eta_max);
    [fDummy, f(n)] = min(afAMDF(1+eta_min:end));
end
% convert the lag indices to Hz
f = fs ./ (f + eta_min);
end

function [AMDF] = amdf(x, eta_max)
K = length(x);
AMDF = ones(1, K);
for eta = 0:min(K-1, eta_max)
    AMDF(eta+1) = sum(abs(x(1:K-eta) - x(eta+1:end)))/K;
end
end

III. Combination of AMDF and ACF with weighting factor

function [f] = gender_det(x, fs)
f1 = pitch(x, fs);             % ACF pitch estimate
f2 = PitchTimeAmdf(x, fs);     % AMDF pitch estimate
f  = (0.9*f1) + (0.1*f2);      % weighted combination
end