
Perceptual WPT and time-adaptive level thresholding based enhancement of degraded speech

Presented by
Nitesh Kumar Chaudhary
Department of Electronics & Communication Engineering
The LNM Institute of Information Technology, Jaipur
Under the supervision of
Dr. Navneet Upadhyay

Why speech enhancement?
The presence of noise in speech can significantly reduce intelligibility and degrade
automatic speech recognition performance.
Noise reduction has therefore become an important issue in speech signal processing systems,
such as speech coding and speech recognition systems.
Sources of speech degradation include:
(a) Additive acoustic noise - noise added to the speech signal when it is recorded in an
environment with noticeable background noise, such as an aircraft cockpit.
(b) Acoustic reverberation - results from the additive effect of multiple reflections of an
acoustic signal.
(c) Convolutive channel effects - an uneven or band-limited response, which can arise when
the communication channel is not modeled well enough for the channel equalizer to remove
the channel impulse response.
(d) Electrical interference
(e) Codec distortion - distortion caused by compression in the coding algorithm
(f) Distortion introduced by recording apparatus - for example, poor microphone response

Keywords: Perceptual wavelet packet transform (PWPT), time-adaptive thresholding, TEO,
probability of detection (Pd) and false alarm (Pf), masking.

Block Diagram

Perceptual Wavelet Packet Transform:

The wavelet packet transform (WPT) is a time-frequency analysis tool. It is a transform
that brings the signal into a domain that contains both time and frequency information.

In wavelet analysis, a signal is split into an approximation and a detail. The approximation
is then itself split into a second-level approximation and detail, and the process is repeated.
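
The repeated approximation/detail split can be sketched in a few lines. This is a minimal illustration that assumes the Haar wavelet for simplicity (the filters compared later in this deck are Daubechies filters); the function names `haar_split` and `wavelet_tree` are hypothetical.

```python
import math

def haar_split(signal):
    """Split an even-length signal into Haar approximation and detail halves."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def wavelet_tree(signal, levels):
    """Split the approximation repeatedly; keep the detail found at each level."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_split(approx)
        details.append(detail)
    return approx, details

x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, details = wavelet_tree(x, levels=2)
print("level-2 approximation:", approx)
print("details per level:", details)
```

Because the Haar split is orthonormal, the energy of `x` equals the combined energy of the final approximation and all details, which is a quick sanity check for any implementation of the split.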

In the corresponding perceptual wavelet packet analysis, each detail coefficient vector is
also decomposed into two parts, using the same approach as for the approximation vector.
17 critical bands are selected, because for speech sampled at 8 kHz, 17 critical bands are
required to cover the entire frequency range.
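
As a worked example of how 17 bands can tile the 0 to 4 kHz range, the sketch below lists the terminal nodes that appear in this deck's per-level plots (levels 3 to 5) and computes a nominal band for each. It assumes that node index k at level j is in frequency order and covers an equal share of 0 to fs/2; real wavelet packet filters overlap and natural node ordering can differ, so these edges are idealized. The helper `band_edges` is hypothetical.

```python
FS = 8000  # sampling rate in Hz, as stated in the text

# Terminal nodes (level, index) as labeled in the deck's level 3 to 5 plots.
LEAVES = (
    [(5, k) for k in range(0, 8)]     # nodes (5,0)..(5,7)
    + [(4, k) for k in range(4, 10)]  # nodes (4,4)..(4,9)
    + [(3, k) for k in range(5, 8)]   # nodes (3,5)..(3,7)
)

def band_edges(level, index, fs=FS):
    """Assumed ideal band of a frequency-ordered node: equal split of 0..fs/2."""
    width = (fs / 2) / (2 ** level)
    return index * width, (index + 1) * width

bands = [band_edges(level, index) for level, index in LEAVES]
for (level, index), (lo, hi) in zip(LEAVES, bands):
    print(f"node ({level},{index}): {lo:6.1f} to {hi:6.1f} Hz")
```

Under these assumptions the bands are 125 Hz wide up to 1 kHz, 250 Hz wide from 1 to 2.5 kHz, and 500 Hz wide from 2.5 to 4 kHz, so the resolution becomes coarser as frequency increases.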

[Figure: Noisy Signal Wavelet Packet Decomposition. Panels: wavelet decomposition tree
(decomposition levels 0 to 5, nodes (0,0) to (5,7)) and the noisy signal (signal magnitude
vs. sample point).]

TEO & level-dependent thresholding

The Teager energy operator (TEO) is a powerful non-linear operator that has been used
successfully in various speech applications. It can be used to estimate the second-moment
angular bandwidth of a signal, as well as the moments of the signal's duration and of its
spectrum, and it can determine the energy functions of quite complicated signals. For a
given band-limited signal x(n), the TEO introduced by Kaiser is given by

\Psi[x(n)] = x^2(n) - x(n+1)\,x(n-1)
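
A short sketch of the operator may help here. It uses the commonly cited discrete form of Kaiser's TEO, psi[n] = x[n]^2 - x[n+1]*x[n-1], and checks it on a pure tone, for which the output is constant. The function name `teager_energy` and the test tone parameters are illustrative choices, not taken from the deck.

```python
import math

def teager_energy(x):
    """Discrete TEO: psi[n] = x[n]^2 - x[n+1]*x[n-1], for interior samples only."""
    return [x[n] * x[n] - x[n + 1] * x[n - 1] for n in range(1, len(x) - 1)]

# For a pure tone A*cos(w*n), the TEO is constant and equals A^2 * sin(w)^2.
A, w = 2.0, 0.3
tone = [A * math.cos(w * n) for n in range(64)]
psi = teager_energy(tone)
expected = (A * math.sin(w)) ** 2
print(min(psi), max(psi), expected)
```

The output depends on both amplitude and frequency of the tone, which is why the operator responds strongly to speech-like components while staying small for low-level noise.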

A time-adaptive threshold is computed for the wavelet coefficients, so that the
time-varying nature of the noise is taken into account.
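
The threshold formula itself is not reproduced in this text, so the sketch below only illustrates the general idea of a threshold that changes over time. As a stand-in, it applies the Donoho-Johnstone universal threshold sigma*sqrt(2 ln N) separately to each frame of coefficients, with sigma estimated from the median absolute coefficient, followed by soft thresholding. This is not necessarily the rule used in the presented method; the frame length and all function names are assumptions.

```python
import math
import statistics

def frame_threshold(frame):
    """Universal threshold sigma*sqrt(2*ln N), sigma estimated as median|c|/0.6745."""
    sigma = statistics.median(abs(c) for c in frame) / 0.6745
    return sigma * math.sqrt(2.0 * math.log(len(frame)))

def soft(c, t):
    """Soft thresholding: shrink towards zero by t, zeroing anything below t."""
    return math.copysign(max(abs(c) - t, 0.0), c)

def time_adaptive_denoise(coeffs, frame_len=8):
    """Apply a separately estimated threshold to each frame of coefficients."""
    out = []
    for start in range(0, len(coeffs), frame_len):
        frame = coeffs[start:start + frame_len]
        t = frame_threshold(frame)
        out.extend(soft(c, t) for c in frame)
    return out

# Two frames with different noise levels; each contains one large coefficient.
quiet = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, 5.0]
loud = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, 9.0]
denoised = time_adaptive_denoise(quiet + loud)
print(frame_threshold(quiet), frame_threshold(loud))
print(denoised)
```

In this toy input the second frame is noisier, so it receives a larger threshold, while the large coefficient in each frame survives the shrinkage.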

[Figure: wavelet decomposition tree (decomposition levels 0 to 5, nodes (0,0) to (5,7)) and
the noisy signal (signal magnitude vs. sample point), as on the previous slide.]

Masking Construction:
For a selected band, the mask is obtained by

The voice activity shape V(n) is calculated by

Time-adaptive threshold calculation:

Level 3, node-by-node denoising
[Figure: Noise signal and denoised signal at level 3 of the wavelet tree, one panel per
node for nodes (3,5), (3,6) and (3,7); signal amplitude vs. frequency in Hz.]

Level 4, node-by-node denoising
[Figure: Noise signal and denoised signal at level 4 of the wavelet tree, one panel per
node for nodes (4,4) to (4,9); amplitude vs. frequency in Hz.]

Level 5, node-by-node denoising
[Figure: Noise signal and denoised signal at level 5 of the wavelet tree, one panel per
node for nodes (5,0) to (5,7); amplitude vs. frequency in Hz.]

Evaluation

To verify the effectiveness of the proposed algorithms, we compared their speech detection
and false-alarm probabilities.
The proposed methods are all evaluated using receiver operating characteristic (ROC)
curves, which show how well the voice activity detection (VAD) discriminates between
noise-only and noisy speech frames in terms of the probability of correct detection (Pd)
and the probability of false alarm (Pf), such that

Performance Evaluation
[Figure: ROC curve, Pd (probability of detection) vs. Pf (probability of false alarm), axes
labeled in powers of 10; plot label: 20.6710 dB; legend: shape-preserving, linear.]

Wavelet filter type    Probability of correct    Probability of false    Computation time
(filter length)        detection (Pd, %)         alarm (Pf, %)           (CP, s)

Daubechies 2           86.4                      15.6                    2.872
Daubechies 4           89.3                      11.7                    2.884
Daubechies 8           91.8                      9.2                     3.023
Daubechies 10          94.3                      5.7                     3.074
Daubechies 12          94.5                      5.5                     3.898
Daubechies 14          94.8                      5.2                     3.899
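
The defining equations for Pd and Pf are not reproduced in this text. The sketch below assumes the usual frame-level definitions: Pd is the share of speech frames that the detector marks as speech, and Pf is the share of noise-only frames that it marks as speech. The function name `detection_rates` and the label sequences are made up for illustration and are unrelated to the results in the table.

```python
def detection_rates(reference, decision):
    """Return (Pd, Pf) in percent from per-frame labels (1 = speech, 0 = noise only)."""
    speech = [d for r, d in zip(reference, decision) if r == 1]
    noise = [d for r, d in zip(reference, decision) if r == 0]
    pd = 100.0 * sum(speech) / len(speech)  # detected speech frames / all speech frames
    pf = 100.0 * sum(noise) / len(noise)    # false alarms / all noise-only frames
    return pd, pf

# Hypothetical hand-labeled frames and detector decisions.
reference = [0, 0, 1, 1, 1, 1, 0, 0, 1, 0]
decision = [0, 1, 1, 1, 0, 1, 0, 0, 1, 0]
pd, pf = detection_rates(reference, decision)
print(f"Pd = {pd:.1f} %, Pf = {pf:.1f} %")
```

Sweeping the detector's decision threshold and plotting the resulting (Pf, Pd) pairs yields an ROC curve of the kind used in the evaluation.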
