Abstract—In this work, we are interested in developing a speech The noise level estimation is based on spectral entropy. In
denoising tool by using a discrete wavelet packet transform (DWPT). addition, the classical estimation technique based on MAD
This speech denoising tool will be employed for applications of supposes that the noise distribution is Gaussian. However, in
recognition, coding and synthesis. For noise reduction, instead of
practice this supposition is not forever valid. That’s way
applying the classical thresholding technique, some wavelet packet
nodes are set to zero and the others are thresholded. To estimate the non Sungwook has proposed a noise estimation technique based on
stationary noise level, we employ the spectral entropy. A comparison of spectral entropy employing intensity histogram of wavelet
our proposed technique to classical denoising methods based on packet coefficients for each node on adapted wavelet packet
thresholding and spectral subtraction is made in order to evaluate our tree. In this work, we try to employ their technique for
approach. The experimental implementation uses speech signals thresholding a part of wavelet packet nodes and the others are
corrupted by two sorts of noise, white and Volvo noises. The obtained
set to zero. In this paper, we compare our proposed technique to
results from listening tests show that our proposed technique is better
than spectral subtraction. The obtained results from SNR computation the wavelet packet thresholding method using a modified hard
show the superiority of our technique when compared to the classical thresholding function based on P law algorithm, and
thresholding method using the modified hard thresholding function spectral subtraction based denoising method. The comparison is
based on P law algorithm. based on two criteria, one is objective by computing SNR and
the other is subjective by making listening tests. The speech
Keywords—Enhancement, spectral subtraction, SNR, discrete signals are corrupted by two types of noises: a white noise and a
wavelet packet transform, spectral entropy Histogram car noise.
I. INTRODUCTION
II.WAVELET TECHNIQUES
271
World Academy of Science, Engineering and Technology 33 2007
sign( x)( x O ) if x !O Equation (3) represents the expression of the modified hard
THRSoft ® (3) thresholding with a parameter P and the employed threshold O .
¯0 if x dO P
In the reference [5], is chosen to be equal to 255. There are
different types of threshold, the constant or the variable ones. A
Equations (1) and (2) designate successively the expressions
global threshold was proposed in [7, 12]. It was defined as
of hard and soft thresholding functions, where O is the follow:
employed threshold.
O Vˆ 2 log(N ) (5 )
Oj Vˆ j 2 log N j (7)
^Mj1(t 2j1n)` ^ `
linear manner, the coefficients which are less than the threshold
j1
value in order to eliminate abrupt changes. Other thresholding of V and \j1(t 2 n) of W . Where
j 1 j1
functions could be selected such as
P law function [5]:
nZ nZ
«¬ »¼ ¸
°O¨ signx¸ si x O packets, this decomposition is applied to the detail
° ¨ P ¸ subspace W j . In this new representation, each node corresponds
° ¨ ¸
¯ © ¹ to a subspace W j
p
where j the depth is, and p is the number
272
World Academy of Science, Engineering and Technology 33 2007
of the nodes to the left of this particular node at the same depth bands where the voiced or unvoiced sound is dominant. It is well
and we have: known that the power of the speech signal is contained around
W jp W j2p1 W j2p1 1 (7) Where is the direct sum of the two the frequency of the first formant. For many vowels of male and
female voices, the statistic results indicate approximately that
detail subspaces. The relations of splitting [15] define the
2p
the frequency of the first formant doesn’t exceed 1 kHz and is
wavelet packet orthogonal bases of the subspaces W j 1 superior to 100Hz. In addition, the fundamental frequency of a
2 p 1 normal voice is localized between 80 and 500Hz.
and W j 1 . These relations are defined as follow:
In this paper, the used speech signals are sampled at 16 kHz
and the employed wavelet packet tree is shown by the Fig.3.
\ 2p
j 1 (t ) ¦ h ( n )\
n
p
j (t 2 j n ) (8)
0,0
\ 2 p 1
j 1 (t ) ¦ g ( n )\
n
p
j (t 2 j n ) (9)
V0 1,0 1,1
2,0 2,1 2,2 2,3
V1 W1
273
World Academy of Science, Engineering and Technology 33 2007
Step 1. Estimate spectral pdf through histogram of wavelet a. SNR after enhancement
packet coefficients for each node. The Histogram is composed Tables 1to 4 report the obtained values of the SNR after
of B bins. enhancement. The obtained results are given in these tables.
Step 2. Compute the normalized spectral entropy: TABLE III MALE VOICE CORRUPTED BY WHITE NOISE.
B
ENT (n) ¦ P log B ( P ) (12)
b 1 RSBi
DWPT based DWPT based Spectral
where n 1, 2, ..., N 0 , with N 0 as the number of the best denoising denoising subtraction based
technique by using technique by denoising
nodes. The probability P was defined as:
a P law using
spectral entropy
technique
thresholding
function
N . of wavelet packet coefficien ts in bin b
0
P
Node size in adapted wavelet packet tree -5 -2.46 -0.27 6.96
Step 3. Estimate the spectral magnitude intensity by histogram 0 2.37 3.08 10.04
and the standard deviation of noise for node dependent wavelet 5 6.60 8.11 13.15
thresholding. 10 9.64 11.47 15.51
15 11.29 13.36 17.14
A. Experimental evaluations
In this paper, an Arabic speech database was disposed. It
contains twenty sentences sampled at 16 kHz, ten pronounced TABLE IV FEMALE VOICE CORRUPTED BY WHITE NOISE.
by a male voice and the others by a female’s. These sentences
RSBi
are corrupted by a white noise and a Volvo car noise separately. DWPT based DWPT based Spectral
The SNR, before enhancement, named RSBi, varies from -5 to denoising technique denoising technique subtraction
15dB. by using a by based denoising
P law using
spectral entropy
technique
TABLE I FEMALE VOICE CORRUPTED BY VOLVO NOISE. thresholding
function
RSBi -5 -2.37 -1.17 8.76
DWPT based DWPT based Spectral
0 2.59 3.20 11.96
denoising denoising subtraction based
technique by using technique by denoising 5 7.21 8.95 14.15
a
P law using
spectral entropy
technique 10 11.19 13.18 15.34
15 13.56 18.24 16.42
thresholding
function
-5 0.50 1.76 7.99 The four tables show that the three denoising techniques
0 5.27 6.00 10.87 improve the signal to noise ratio. The results show also that the
5 9.52 10.69 12.84 spectral subtraction based method is better than the two
10 12.75 13.55 14.81 thresholding based denoising methods. We also remark that
15 14.43 17.97 15.97 results obtained by applying the DWPT based denoising
technique using the spectral entropy are better than those
obtained from the DWPT based denoising method
TABLE II MALE VOICE CORRUPTED BY VOLVO NOISE
using P law thresholding function.
RSBi
DWPT based DWPT based Spectral b. Listening tests
denoising denoising subtraction
technique by using technique by based denoising By making the listening tests, we compute the percentage of
a
P law using
spectral entropy
technique the recognized words by using the following formula:
thresholding
number of the recognized words
function u 100
-5 0.36 1.92 4.45 total number of the employed words
0 4.83 5.67 6.82
5 8.44 9.41 10.35 Fig, 2-a, 2-b and 2-c illustrate the recognized words
10 10.67 12.04 14.37 percentage vs SNR for the three techniques.
15 11.69 13.36 16.72
274
World Academy of Science, Engineering and Technology 33 2007
100
those obtained from spectral subtraction. Thus, when speaking
80
of intelligibility, the DWPT based denoising technique using
60
spectral entropy is better than the spectral subtraction based one.
DWPT based denoising The curves also show that when SNR is lower, the denoising
using a modified hard
40 thresholding technique using spectral entropy is better than that using
DWPT based denoising P law thresholding function, and we have the opposite when
20
using spectral entropy SNR is higher.
0
Spectral subtraction
-10 0 10 20 based denoising c. Speech signal representation:
SNR The figures 3, 4 and 5 are examples of speech denoising by
using the DWPT based denoising technique using spectral
(a) Case of female voice corrupted by car noise. entropy, and figures 6and 7are examples of speech enhancement
by employing the DWPT based denoising method using
Recognition percentage vs SNR P law thresholding function. The figure 8 is an example of
120
speech denoising method based on spectral subtraction.
R ecognition percentage
80 0
-1
60 DWPT based 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5
4
denoising using a x 10
40 Noisy speech signal
modified hard 1
thresholding
20 DWPT based 0
denoising using spectral
0 entropy -1
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5
4
-10 0 10 20 Spectral subtraction x 10
Enhanced speech signal
SNR based denoising 1
100 0
-1
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5
80 4
x 10
Noisy speech signal
60 DWPT based 1
denoising using a 0
40 modified hard
-1
thresholding 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5
20 DWPT based 4
x 10
denoising using spectral Enhanced speech signal
0 entropy 1
-10 0 10 20 0
Spectral subtraction
SNR based denoising -1
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5
4
x 10
(c) Case of male voice corrupted by White noise.
Fig. 4. Male voice corrupted by Volvo noise (SNR=10dB).
Fig, 2. (a), (b), (c) Words recognition percentage vs SNR.
275
World Academy of Science, Engineering and Technology 33 2007
1
Clean speech signal
1
0
0
-1
-1 0 1 2 3 4 5 6
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 4
4
x 10
x 10 1
Noisy speech signal
1
0
0
-1
-1 0 1 2 3 4 5 6
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 4
x 10
4
x 10 1
Enhanced speech signal
1
0
0
-1
0 1 2 3 4 5 6
-1
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 Fig, 8. male voice corrupted by Volvo noise (SNR=0dB).
4
x 10
These figures illustrate the time representations of the clean,
Fig, 5. Male voice corrupted by White noise (SNR=10dB).
the noisy and the enhanced speech signals. Figures 3 to 7 show
the superiority of the denoising technique based on DWPT using
spectral entropy when compared with the denoising technique
based on DWPT using the modified hard thresholding function
based on P law algorithm. Our proposed technique is very
efficient in speech denoising. In fact a more important amount of
noise was suppressed from non speech and speech segments
while preserving the majority of the speech signal. We remark
that there is a little difference between the enhanced speech
signal and the clean one. Thus there isn’t any sever distortion of
the denoised signal, when using our proposed technique. The
figure number 8 shows that spectral subtraction reduces a great
part of noise but introduces distortions on the enhanced speech
signal.
The figures 9, 10, and 11 represent respectively the
spectrograms of clean speech, noisy speech and enhanced
speech obtained by applying our proposed denoising technique.
The noise which corrupting the speech signal, is The F16
Fig. 6. Male voice corrupted by Volvo noise (SNR=10dB). cockipt with the SNR is equals to 5dB.
5
Freq (k Hz )
0
Fig, 7. Female voice corrupted by White noise (SNR=10dB). 0 0.5 1 1.5 2
Time (sec)
276
World Academy of Science, Engineering and Technology 33 2007
4
The results obtained from listening tests show also that when
SNR is lower the denoising technique using spectral entropy is
better than that using P law thresholding function, and we
3
have the opposite when SNR is higher.
2
REFERENCES
[1] Y. Ephraim and D. Malah, “Speech enhancement using a minimum mean
1 square error short time spectral amplitude estimator,” IEEE Trans. On
Acoust. Speech Signal Processing, vol. 32, no. 6, pp. 1109-1121, 1984.
[2] D.L. Donoho, “Denoising by soft thresholding,” IEEE Trans on
0 Information Theory, vol. 41, no. 3, pp. 613-627, 1995.
0 0.5 1 1.5 2
[3] I. M. Johnstone and B. W. Silverman, “Wavelet threshold estimators for
Time (sec)
data with correlated noise”, J. Roy. Statist. Soc. B, vol. 59, pp. 319-351,
1997.
[4] X. Huang, A. Acero, H. Hon, “Spoken Language Processing,” Prentice
Hall, p. 474, 2001.
Fig. 10. Spectrogram of the noisy speech.
[5] Sungwook Chang, Y. Kwon, Sung-il Yang, and I-jae Kim. “Speech
enhancement for non-stationary noise environment by adaptive wavelet
packet” IEEE Tans. pp. 561-564, 2000.
[6] S.S. Chen, Basis Pursuit, Phd Thesis, Standford University, November
8 1995.
[7] D. Donoho and I. M. Johnstone. ‘‘Ideal spatial Adaptation via Wavelet
7 Shrinkage’’ Biometrika, 41. pp. 425-455, 1994.
[8] Waleed H. Abdulla. “HMM-based techniques for speech segments
extraction”. ISSN 1058-9244/02/S8.00 © 2002-IOS Press.
6 [9] Mohammed BAHOURA and Jean ROUAT “Wavelet noise reduction:
application to speech enhancement”. CiteSeer, 2000.
[10] [V. Balakrishnan, Nash Borges and Luke Parchment. “Wavelet denoising
5 and speech enhancement,” Spring 2006.
[11] H. Sheikhzadeh and H. Reza Abutalebi. ‘‘An improved wavelet-based
F req (k Hz )
IV.CONCLUSION
In this study, a denoising method is developed under Matlab
and compared with conventional thresholding technique
using P law thresholding function and spectral subtraction
277
World Academy of Science, Engineering and Technology 33 2007
278