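Estimate the clean speech spectrum $S_i(\omega)$ of frame $i$ by applying a gain function $G_i(\omega)$ to the corrupted speech spectrum $Y_i(\omega)$, using the estimated noise spectrum $\hat{N}(\omega)$:

$$\hat{S}_i(\omega) = G_i(\omega)\, Y_i(\omega)$$

$$G_i(\omega) = f\big(Y_i(\omega), \hat{N}(\omega)\big)$$

$$\hat{N}(\omega) = E\{|N_i(\omega)|\} = \frac{1}{M} \sum_{i \,\in\, \text{noise-only frames}} |Y_i(\omega)|$$

where $M$ is the number of noise-only frames.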
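The two steps above (average the noise magnitude over noise-only frames, then multiply each frame by a gain) can be sketched in a few lines of NumPy. This is only an illustration under assumptions: the function names, the boolean `noise_only` mask (standing in for a voice activity decision that the slides do not specify), and the toy data are all hypothetical.

```python
import numpy as np

def estimate_noise_magnitude(Y, noise_only):
    """Average |Y_i(w)| over the frames flagged as noise-only.

    Y          : complex array, shape (frames, bins)
    noise_only : boolean array, shape (frames,), True for noise-only frames
                 (assumed to come from some external speech/pause decision)
    """
    return np.abs(Y[noise_only]).mean(axis=0)

def enhance(Y, N_hat, gain_fn):
    """Apply S_hat_i(w) = G_i(w) * Y_i(w) with G_i(w) = gain_fn(Y, N_hat)."""
    return gain_fn(Y, N_hat) * Y

# Toy data: 4 frames x 3 bins; the first two frames are noise-only.
Y = np.array([[1 + 0j, 0 + 2j, -1 + 0j],
              [3 + 0j, 0 - 2j,  1 + 0j],
              [5 + 0j, 4 + 0j,  3 + 0j],
              [0 + 6j, 4 + 0j,  0 + 3j]])
noise_only = np.array([True, True, False, False])

N_hat = estimate_noise_magnitude(Y, noise_only)

# Placeholder rule G = 1 (no suppression), just to show the plumbing.
unity_gain = lambda Y, N_hat: np.ones(Y.shape)
S_hat = enhance(Y, N_hat, unity_gain)
print(N_hat)
```

Passing the gain rule as a function keeps the framework separate from the particular suppression rule, which is how the rest of the material treats it.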
Magnitude Spectral Subtraction
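Signal model:

$$Y_i(\omega) = S_i(\omega) + N_i(\omega), \qquad Y_i(\omega) = |Y_i(\omega)|\, e^{j\theta_{y,i}(\omega)}$$

Estimation of clean speech spectrum:

$$\hat{S}_i(\omega) = \big[\, |Y_i(\omega)| - \hat{N}(\omega) \,\big]\, e^{j\theta_{y,i}(\omega)}$$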
Spectral Subtraction
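$$\hat{S}_i(\omega) = \big[\, |Y_i(\omega)| - \hat{N}(\omega) \,\big]\, e^{j\theta_{y,i}(\omega)} = \underbrace{\left[ 1 - \frac{\hat{N}(\omega)}{|Y_i(\omega)|} \right]}_{G_i(\omega)} Y_i(\omega)$$

PS: half-wave rectification

$$|\hat{S}_i(\omega)| = \max\big(0,\; |Y_i(\omega)| - \hat{N}(\omega)\big), \qquad G_i(\omega) \leftarrow \max\big(0,\; G_i(\omega)\big)$$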
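A minimal NumPy sketch of magnitude spectral subtraction with half-wave rectification follows. The function name, the `eps` guard against division by zero, and the toy values are assumptions made for illustration; the noisy phase is kept because the gain is real-valued.

```python
import numpy as np

def magnitude_subtraction_gain(Y, N_hat, eps=1e-12):
    """G_i(w) = max(0, 1 - N_hat(w) / |Y_i(w)|).

    eps only guards against division by zero in empty bins (an assumption,
    not part of the rule itself).
    """
    mag = np.abs(Y)
    G = 1.0 - N_hat / np.maximum(mag, eps)
    return np.maximum(G, 0.0)          # half-wave rectification

# One frame with three bins (toy values).
Y = np.array([4 + 0j, 0 + 1j, -2 + 0j])
N_hat = np.array([1.0, 2.0, 2.0])

G = magnitude_subtraction_gain(Y, N_hat)
S_hat = G * Y                          # real gain: the phase of Y is reused
print(G)
```

In the second and third bins the noise estimate is at least as large as the observed magnitude, so the rectified gain is zero and the bin is muted.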
Power Spectral Subtraction
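Signal model:

$$Y_i(\omega) = S_i(\omega) + N_i(\omega)$$

Estimation of clean speech spectrum (speech and noise assumed uncorrelated, so their power spectra add):

$$E\{|S_i(\omega)|^2\} = E\{|Y_i(\omega)|^2\} - E\{|N_i(\omega)|^2\}$$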
Spectral Subtraction
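$$E\{|S_i(\omega)|^2\} = E\{|Y_i(\omega)|^2\} - E\{|N_i(\omega)|^2\} = \left[ 1 - \frac{E\{|N_i(\omega)|^2\}}{E\{|Y_i(\omega)|^2\}} \right] E\{|Y_i(\omega)|^2\} = G_i^2(\omega)\, E\{|Y_i(\omega)|^2\}$$

PS: half-wave rectification

$$|\hat{S}_i(\omega)|^2 = \max\big(0,\; |Y_i(\omega)|^2 - \hat{N}^2(\omega)\big)$$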
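The power-domain rule can be sketched the same way. Here the expectations are replaced by the instantaneous power $|Y_i(\omega)|^2$ and a noise power estimate, which is an assumption of this sketch; the function name, `eps`, and the toy values are hypothetical as well.

```python
import numpy as np

def power_subtraction_gain(Y, N_hat_pow, eps=1e-12):
    """G_i(w) = sqrt(max(0, 1 - N_hat^2(w) / |Y_i(w)|^2))."""
    Y_pow = np.abs(Y) ** 2
    G2 = 1.0 - N_hat_pow / np.maximum(Y_pow, eps)   # squared gain
    return np.sqrt(np.maximum(G2, 0.0))             # half-wave rectification

# One frame with three bins (toy values).
Y = np.array([2 + 0j, 0 + 1j, 1 + 1j])
N_hat_pow = np.array([1.0, 4.0, 1.0])               # estimated noise power

G = power_subtraction_gain(Y, N_hat_pow)
S_hat = G * Y

# Check: |S_hat|^2 equals max(0, |Y|^2 - N_hat^2) bin by bin.
print(np.abs(S_hat) ** 2, np.maximum(0.0, np.abs(Y) ** 2 - N_hat_pow))
```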
Suppression Behavior
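$$G_i^2(\omega) = \left( 1 - \frac{E\{|N_i(\omega)|^2\}}{E\{|Y_i(\omega)|^2\}} \right) = \left( 1 - \frac{1}{R_i(\omega)} \right), \qquad R_i(\omega) = \frac{E\{|Y_i(\omega)|^2\}}{E\{|N_i(\omega)|^2\}}$$

[Figure: gain $G_i(\omega)$ plotted against the noisy-to-noise power ratio $R_i(\omega)$]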
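To get a feel for the suppression behavior, the gains can be tabulated against the ratio of noisy-signal power to noise power. The sketch below is an illustration under assumptions: the name `R` for that ratio is introduced here, and the magnitude-subtraction curve uses the approximation $\hat{N}(\omega)/|Y_i(\omega)| \approx 1/\sqrt{R}$, which treats the magnitudes as square roots of the corresponding powers.

```python
import numpy as np

def gains_vs_ratio(R):
    """Return (magnitude-subtraction gain, power-subtraction gain) for a
    noisy-to-noise power ratio R >= 1."""
    R = np.asarray(R, dtype=float)
    magnitude = np.maximum(1.0 - 1.0 / np.sqrt(R), 0.0)   # approximation, see lead-in
    power = np.sqrt(np.maximum(1.0 - 1.0 / R, 0.0))
    return magnitude, power

R = np.array([1.0, 2.0, 4.0, 10.0, 100.0])
mag, pw = gains_vs_ratio(R)

for r, gm, gp in zip(R, mag, pw):
    print(f"R = {r:6.1f}   magnitude SS: {gm:.3f}   power SS: {gp:.3f}")
```

Both gains go to zero as `R` approaches 1 (the frame looks like pure noise) and approach 1 for large `R`; magnitude subtraction attenuates more strongly in between.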
Wiener Filter in Frequency Domain
Wiener Estimation
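Goal: find the linear filter $G_i(\omega)$ such that the MSE

$$E\Big\{ \big| S_i(\omega) - \underbrace{G_i(\omega)\, Y_i(\omega)}_{\hat{S}_i(\omega)} \big|^2 \Big\}$$

is minimized.

Solution: set to zero the partial derivative of

$$E\big\{ |S_i(\omega) - \hat{S}_i(\omega)|^2 \big\} = E\big\{ \big(S_i(\omega) - G_i(\omega) Y_i(\omega)\big) \big(S_i(\omega) - G_i(\omega) Y_i(\omega)\big)^{*} \big\}$$

with respect to the real part of $G_i(\omega)$, which yields the condition:

$$\frac{\partial\, E\big\{ |S_i(\omega) - \hat{S}_i(\omega)|^2 \big\}}{\partial\, \mathrm{Re}\{G_i(\omega)\}} = 0$$

and hence, for uncorrelated speech and noise, we have:

$$\mathrm{Re}\{G_i(\omega)\} = \frac{E\{|S_i(\omega)|^2\}}{E\{|Y_i(\omega)|^2\}} = \frac{E\{|S_i(\omega)|^2\}}{E\{|S_i(\omega)|^2\} + E\{|N_i(\omega)|^2\}} = \frac{E\{|Y_i(\omega)|^2\} - E\{|N_i(\omega)|^2\}}{E\{|Y_i(\omega)|^2\}}$$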
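The closed form above can be checked numerically. The sketch below computes the Wiener gain from power spectra and confirms, for one bin, that it minimizes the MSE over real-valued gains. It assumes uncorrelated speech and noise, so that the MSE for a real gain $g$ is $(1-g)^2 E\{|S|^2\} + g^2 E\{|N|^2\}$; the function name, the clipping to $[0, 1]$, and the toy powers are assumptions of this sketch.

```python
import numpy as np

def wiener_gain(Y_pow, N_pow, eps=1e-12):
    """G_i(w) = (E|Y|^2 - E|N|^2) / E|Y|^2, clipped to [0, 1].

    The clipping is a practical guard for estimated (noisy) powers and is
    not part of the derivation.
    """
    G = 1.0 - N_pow / np.maximum(Y_pow, eps)
    return np.clip(G, 0.0, 1.0)

# Toy speech and noise powers for three bins.
S_pow = np.array([3.0, 1.0, 0.0])
N_pow = np.array([1.0, 1.0, 2.0])
Y_pow = S_pow + N_pow                   # powers add for uncorrelated S and N

G = wiener_gain(Y_pow, N_pow)

# Brute-force check for bin 0: scan real gains and pick the MSE minimizer.
g = np.linspace(0.0, 1.0, 1001)
mse = (1.0 - g) ** 2 * S_pow[0] + g ** 2 * N_pow[0]
g_best = g[np.argmin(mse)]
print(G, g_best)
```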
Generalized Formula
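Generalized magnitude squared spectral gain function:

$$G_i^2(\omega) = \left( 1 - \frac{E\{|N_i(\omega)|^2\}}{E\{|Y_i(\omega)|^2\}} \right) = \left( 1 - \frac{1}{R_i(\omega)} \right), \qquad R_i(\omega) = \frac{E\{|Y_i(\omega)|^2\}}{E\{|N_i(\omega)|^2\}}$$

Practical heuristic form of spectral subtraction rule:

$$|\hat{S}_i(\omega)|^2 = \left( 1 - \frac{\hat{N}^2(\omega)}{|Y_i(\omega)|^2} \right) |Y_i(\omega)|^2$$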
Ephraim-Malah Suppression Rule (EMSR)
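MMSE Estimation

$$G_i(\omega) = \frac{\sqrt{\pi}}{2} \sqrt{ \left( \frac{1}{1 + \mathrm{SNR}_{post}} \right) \left( \frac{\mathrm{SNR}_{prio}}{1 + \mathrm{SNR}_{prio}} \right) } \; M\!\left[ (1 + \mathrm{SNR}_{post}) \left( \frac{\mathrm{SNR}_{prio}}{1 + \mathrm{SNR}_{prio}} \right) \right]$$

with:

$$M[\theta] = e^{-\theta/2} \left[ (1 + \theta)\, I_0\!\left(\frac{\theta}{2}\right) + \theta\, I_1\!\left(\frac{\theta}{2}\right) \right]$$

where $I_0$ and $I_1$ are modified Bessel functions, and

$$\mathrm{SNR}_{post}(\omega) = \frac{|Y_i(\omega)|^2}{\hat{N}^2(\omega)} - 1$$

$$\mathrm{SNR}_{prio}(\omega) = (1 - \alpha)\, \max\big(\mathrm{SNR}_{post}(\omega),\, 0\big) + \alpha\, \frac{|G_{i-1}(\omega)\, Y_{i-1}(\omega)|^2}{\hat{N}^2(\omega)}$$

The second term of $\mathrm{SNR}_{prio}$ is computed from the previous frame.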
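The EMSR gain can be sketched with NumPy alone. Several things here are assumptions rather than part of the slides: the function names, the weighting value `alpha=0.98`, and the way the Bessel terms are evaluated. NumPy has no $I_1$, so the sketch uses the integral representation $I_n(x) = \frac{1}{2\pi}\int_0^{2\pi} e^{x\cos t}\cos(nt)\,dt$, folding in the factor $e^{-x}$ so that nothing overflows at high SNR; a library routine for exponentially scaled Bessel functions would be the usual choice in practice.

```python
import numpy as np

def scaled_bessel_i(n, x, K=4096):
    """exp(-x) * I_n(x) for x >= 0, by averaging the periodic integrand
    exp(x (cos t - 1)) cos(n t) over K equally spaced points."""
    x = np.asarray(x, dtype=float)
    t = 2.0 * np.pi * np.arange(K) / K
    integrand = np.exp(x[..., None] * (np.cos(t) - 1.0)) * np.cos(n * t)
    return integrand.mean(axis=-1)

def M(theta):
    """M[theta] = exp(-theta/2) [(1 + theta) I0(theta/2) + theta I1(theta/2)]."""
    theta = np.asarray(theta, dtype=float)
    return ((1.0 + theta) * scaled_bessel_i(0, theta / 2.0)
            + theta * scaled_bessel_i(1, theta / 2.0))

def emsr_gain(snr_prio, snr_post):
    """Ephraim-Malah suppression rule; assumes 1 + snr_post > 0."""
    snr_prio = np.asarray(snr_prio, dtype=float)
    snr_post = np.asarray(snr_post, dtype=float)
    w = snr_prio / (1.0 + snr_prio)
    theta = (1.0 + snr_post) * w
    return (np.sqrt(np.pi) / 2.0) * np.sqrt(w / (1.0 + snr_post)) * M(theta)

def a_priori_snr(snr_post, prev_clean_pow, N_hat_pow, alpha=0.98):
    """(1 - alpha) max(snr_post, 0) + alpha |G_{i-1} Y_{i-1}|^2 / N_hat^2.
    alpha = 0.98 is an assumed, commonly quoted value."""
    return (1.0 - alpha) * np.maximum(snr_post, 0.0) + alpha * prev_clean_pow / N_hat_pow

# One frame, three bins (toy values).
Y_pow = np.array([4.0, 1.0, 9.0])            # |Y_i|^2
N_hat_pow = np.array([1.0, 1.0, 1.0])        # noise power estimate
prev_clean_pow = np.array([2.0, 0.1, 8.0])   # |G_{i-1} Y_{i-1}|^2

snr_post = Y_pow / N_hat_pow - 1.0
snr_prio = a_priori_snr(snr_post, prev_clean_pow, N_hat_pow)
G = emsr_gain(snr_prio, snr_post)
print(G)
```

At high SNR the gain approaches the Wiener-like value $\mathrm{SNR}_{prio}/(1+\mathrm{SNR}_{prio})$, which gives a quick sanity check on any implementation.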
Gain functions
- Magnitude Spectral Subtraction
- Power Spectral Subtraction
- Wiener Estimation
- Maximum Likelihood
- Non-linear Estimation
- Ephraim-Malah Suppr. Rule
[Diagram legend: marker = most frequently used in practice]
Interpretation
The Power Spectral Subtraction method can be interpreted as a time-variant filter with magnitude frequency response:

$$G_i(\omega) = \sqrt{ 1 - \frac{\hat{N}^2(\omega)}{|Y_i(\omega)|^2} }$$

The short-time energy spectrum $|Y_i(\omega)|^2$ of the noisy speech signal is calculated directly. The noise level $\hat{N}^2(\omega)$ is estimated by averaging over many non-speech frames where the background noise is assumed to be stationary.

Negative values resulting from spectral subtraction are replaced by zero. This results in musical noise: a succession of randomly spaced spectral peaks emerges in the frequency bands. The residual noise is composed of narrow-band components located at random frequencies that turn on and off randomly in each short-time frame.

[Figure: block diagram, $y[k]$ → magnitude subtraction → $\hat{s}[k]$]
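The origin of musical noise can be illustrated with a small simulation: apply rectified power subtraction to frames that contain only noise and see which bins survive. The setup is an assumption made for illustration (complex Gaussian noise with unit variance per bin, a fixed random seed, and a noise level estimated from the same frames).

```python
import numpy as np

rng = np.random.default_rng(0)
frames, bins = 200, 257

# Noise-only "spectra": complex Gaussian with unit variance per bin.
noise = (rng.standard_normal((frames, bins))
         + 1j * rng.standard_normal((frames, bins))) / np.sqrt(2.0)

Y_pow = np.abs(noise) ** 2                 # short-time energy spectrum
N_hat_pow = Y_pow.mean(axis=0)             # noise level averaged over frames

# Power subtraction with negative values replaced by zero.
S_hat_pow = np.maximum(Y_pow - N_hat_pow, 0.0)

# Fraction of time-frequency bins that are not zeroed out.
surviving = (S_hat_pow > 0.0).mean()
print(f"surviving bins: {surviving:.1%}")
```

For this noise model the periodogram values are exponentially distributed, so in theory about $e^{-1} \approx 37\%$ of the bins exceed their mean and survive. They are scattered at random positions that change from frame to frame, which is what is heard as isolated tones switching on and off.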
Solutions
- Flooring factor
- Over-subtraction factor
- SNR-dependent subtraction factor
- Averaging the estimated noise level over K frames
- Reducing the noise variance at each frequency: apply a simple recursive first-order low-pass filter (the smoothing coefficient p controls the bandwidth and time constant of the LP filter)
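Three of the remedies listed above can be sketched as follows: over-subtraction, a spectral floor, and first-order recursive smoothing of the noise estimate. The parameter names `alpha` (over-subtraction factor) and `beta` (flooring factor), their default values, and the exact form of the floor (a fraction of the noise power) are assumptions of this sketch; `p` is the smoothing coefficient.

```python
import numpy as np

def oversubtract_with_floor(Y_pow, N_hat_pow, alpha=2.0, beta=0.1):
    """Subtract alpha times the noise power, then keep at least
    beta times the noise power in every bin instead of clamping to zero."""
    S_pow = Y_pow - alpha * N_hat_pow
    return np.maximum(S_pow, beta * N_hat_pow)

def smooth_noise(N_prev_pow, Y_pow_noise_frame, p=0.9):
    """First-order recursive low-pass: N_new = p * N_prev + (1 - p) * |Y|^2,
    intended to be called on noise-only frames."""
    return p * N_prev_pow + (1.0 - p) * Y_pow_noise_frame

# Over-subtraction and flooring on one frame (toy values).
Y_pow = np.array([10.0, 2.0, 1.0])
N_hat_pow = np.array([1.0, 1.0, 1.0])
S_pow = oversubtract_with_floor(Y_pow, N_hat_pow)

# Recursive smoothing: the estimate drifts from 1.0 towards the observed 2.0.
N_est = np.array([1.0])
first_step = smooth_noise(N_est, np.array([2.0]))
for _ in range(200):
    N_est = smooth_noise(N_est, np.array([2.0]))
print(S_pow, first_step, N_est)
```

A value of `p` close to 1 gives a long time constant (smooth but slow to track changes), while a smaller `p` tracks faster at the cost of a noisier estimate.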
Solutions
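- Magnitude averaging: replace $Y_i(\omega)$ in the calculation of $G_i(\omega)$ by a local average over frames:

$$\hat{S}_i(\omega) = \underbrace{G_i(\omega)}_{\text{average}} \; \underbrace{Y_i(\omega)}_{\text{instantaneous}}$$

- EMSR (p7)
- Augment $G_i(\omega)$ with soft-decision VAD:

$$G_i(\omega) \rightarrow P\big(H_1 \mid Y_i(\omega)\big) \cdot G_i(\omega)$$

where $P(H_1 \mid Y_i(\omega))$ is the probability that speech is present, given the observation.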
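Magnitude averaging and the soft-decision weighting can be combined in one short sketch. Several choices here are assumptions: the gain rule (rectified magnitude subtraction), the symmetric averaging window with `half_width` frames on each side, and the speech presence probability `p_speech`, which is simply taken as an input because its computation is not given here.

```python
import numpy as np

def local_average(mag, half_width=1):
    """Average |Y_i(w)| over frames i - half_width .. i + half_width
    (windows are truncated at the first and last frame)."""
    frames = mag.shape[0]
    out = np.empty(mag.shape, dtype=float)
    for i in range(frames):
        lo, hi = max(0, i - half_width), min(frames, i + half_width + 1)
        out[i] = mag[lo:hi].mean(axis=0)
    return out

def enhance_with_averaging(Y, N_hat, p_speech=None, half_width=1, eps=1e-12):
    """Gain computed from the averaged magnitude, applied to the
    instantaneous spectrum; optionally weighted by P(H1 | Y)."""
    mag_avg = local_average(np.abs(Y), half_width)
    G = np.maximum(1.0 - N_hat / np.maximum(mag_avg, eps), 0.0)
    if p_speech is not None:
        G = p_speech * G               # soft-decision VAD weighting
    return G * Y

# Three frames, one bin (toy values).
Y = np.array([[2 + 0j], [4 + 0j], [6 + 0j]])
N_hat = np.array([1.0])
p_speech = np.array([[0.0], [0.5], [1.0]])   # hypothetical P(H1 | Y_i)

S_avg = enhance_with_averaging(Y, N_hat)
S_soft = enhance_with_averaging(Y, N_hat, p_speech=p_speech)
print(S_avg.ravel(), S_soft.ravel())
```

Averaging the magnitude reduces the frame-to-frame variance of the gain, and the probability weighting pushes the gain towards zero in frames that are unlikely to contain speech.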