where $w_{\mathrm{freq}}$ is a Hamming window of size $2K_1 + 1$, and the inter-frame smoothed SAP is given by

q_{\mathrm{frame}}(k, l) = \sum_{i=-K_1}^{K_1} w_{\mathrm{frame}}(i)\, q_{\mathrm{local}}(k, l - i). \quad (3)

Finally, the a priori SAP is obtained as

q(k, l) = q_{\mathrm{local}}(k, l)\, q_{\mathrm{freq}}(k, l)\, q_{\mathrm{frame}}(k, l). \quad (4)

Now, the multi-channel speech presence probability proposed in [5] is given by

p(k, l) = \left[ 1 + \frac{q(k, l)}{1 - q(k, l)} \left(1 + \xi(k, l)\right) \exp\!\left( -\frac{\beta(k, l)}{1 + \xi(k, l)} \right) \right]^{-1}. \quad (5)
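As a concrete illustration, the probability in (5) can be evaluated as in the following NumPy sketch. The a priori SNR $\xi(k,l)$ and the term $\beta(k,l)$ are assumed to be estimated elsewhere as in [5]; the function name and the scalar-argument interface are our own illustration.

```python
import numpy as np

def speech_presence_probability(q, xi, beta):
    """Multichannel speech presence probability, Eq. (5).

    q    -- a priori speech absence probability, in [0, 1)
    xi   -- a priori SNR estimate
    beta -- multichannel energy term from [5]
    Arguments may be scalars or NumPy arrays of the same shape.
    """
    prior_ratio = q / (1.0 - q)                       # q / (1 - q)
    evidence = (1.0 + xi) * np.exp(-beta / (1.0 + xi))
    return 1.0 / (1.0 + prior_ratio * evidence)
```

With $q = 0.5$, $\xi = 0$, and $\beta = 0$ the probability is exactly 0.5, and it grows toward 1 as $\beta$ increases, as expected for a speech-dominated bin.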
Using this probability, the noise-update smoothing parameter is calculated as

\tilde{\alpha}_v(k, l) = \alpha_v + (1 - \alpha_v)\, p(k, l), \quad (6)

and used to obtain a noise PSD matrix estimate at frame $l$ as follows:

\Phi_{vv}(k, l) = \tilde{\alpha}_v(k, l)\, \Phi_{vv}(k, l - 1) + \left[1 - \tilde{\alpha}_v(k, l)\right] y(k, l)\, y(k, l)^H. \quad (7)
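The recursive update in (6)–(7) can be sketched as follows; the base smoothing constant $\alpha_v = 0.92$ is an illustrative value, not one taken from the paper.

```python
import numpy as np

ALPHA_V = 0.92  # base smoothing constant alpha_v (illustrative value)

def update_noise_psd(phi_vv_prev, y, p):
    """Recursive noise PSD matrix update, Eqs. (6)-(7).

    phi_vv_prev -- previous N x N noise PSD matrix estimate
    y           -- current length-N observation vector (complex STFT bin)
    p           -- speech presence probability for this bin and frame
    """
    alpha = ALPHA_V + (1.0 - ALPHA_V) * p                    # Eq. (6)
    return alpha * phi_vv_prev + (1.0 - alpha) * np.outer(y, np.conj(y))  # Eq. (7)
```

When $p = 1$ the smoothing parameter becomes 1 and the noise estimate is frozen, so speech frames do not leak into the noise PSD.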
Lastly, we obtain the PMWF gain function $h_W(k, l)$ using the estimated noise PSD matrix $\Phi_{vv}(k, l)$ and the noisy-speech PSD matrix $\Phi_{yy}(k, l)$ as

h_W(k, l) = \frac{\Phi_{vv}^{-1}(k, l)\, \Phi_{yy}(k, l) - I_N}{\gamma + \lambda(k, l)}\, u_N, \quad (8)

where $\lambda(k, l) = \mathrm{tr}\left[\Phi_{vv}^{-1}(k, l)\, \Phi_{yy}(k, l)\right] - N$, $\gamma$ is the trade-off parameter, and $u_N$ is the reference-microphone selection vector.
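A minimal sketch of the gain computation in (8), assuming the PSD matrices for one frequency bin are available; the default trade-off parameter and reference-microphone index are illustrative choices.

```python
import numpy as np

def pmwf_gain(phi_vv, phi_yy, gamma=1.0, ref=0):
    """Parameterized multichannel Wiener filter gain, Eq. (8).

    phi_vv -- N x N noise PSD matrix estimate
    phi_yy -- N x N noisy-speech PSD matrix
    gamma  -- trade-off parameter (illustrative default)
    ref    -- index of the reference microphone
    """
    n = phi_vv.shape[0]
    w = np.linalg.solve(phi_vv, phi_yy)    # Phi_vv^{-1} Phi_yy without an explicit inverse
    lam = np.trace(w).real - n             # lambda(k, l) = tr[Phi_vv^{-1} Phi_yy] - N
    u = np.zeros(n)
    u[ref] = 1.0                           # reference selection vector u_N
    return (w - np.eye(n)) @ u / (gamma + lam)
```

Using `np.linalg.solve` instead of forming $\Phi_{vv}^{-1}$ explicitly is numerically safer when the noise PSD estimate is poorly conditioned.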
III. SINGLE-CHANNEL DATA-DRIVEN APPROACH

Among single-channel speech enhancement algorithms, the data-driven scheme trains the noise reduction gain function using the a priori SNR and the a posteriori SNR [4]. In detail, the noise suppression gain is found by means of a training procedure using speech from a training database. In each frame, for each frequency bin, we have a pair of a priori SNR and a posteriori SNR that falls into one of the parameter cells. A priori and a posteriori SNR pairs from different frequency bins and different frames can fall into the same parameter cell during the course of the training. To each of those pairs corresponds a clean amplitude $A$ and a noisy amplitude $R$. These are collected and, after all the training signals are processed, the optimal value of $G_{ij}$ for parameter cell $(i, j)$ is found by minimizing a distortion measure of interest. We consider the Weighted-Euclidean distortion measure [6], which yields

G_{ij} = \left( \sum_{m=1}^{M_{ij}} A_{ij}^{p+1}(m)\, R_{ij}(m) \right) \Big/ \left( \sum_{m=1}^{M_{ij}} A_{ij}^{p}(m)\, R_{ij}^{2}(m) \right), \quad (9)

where $R_{ij}(m)$ is the $m$th noisy amplitude that fell into parameter cell $(i, j)$ and $A_{ij}(m)$ is the corresponding clean amplitude.
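The training procedure of Eq. (9) can be sketched as follows. The quantization of each (a priori SNR, a posteriori SNR) pair into a cell index $(i, j)$ is assumed to happen upstream, and the weight exponent $p = 1$ is an illustrative choice.

```python
from collections import defaultdict

P = 1.0  # weight exponent p of the Weighted-Euclidean distortion (illustrative)

def train_gain_table(pairs):
    """Data-driven gain training, Eq. (9).

    pairs -- iterable of (i, j, A, R): the parameter-cell indices of a
             quantized SNR pair, the clean amplitude A, and the noisy
             amplitude R that fell into that cell.
    Returns a dict mapping (i, j) to the trained gain G_ij.
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for i, j, a, r in pairs:
        num[(i, j)] += a ** (P + 1) * r        # sum of A^{p+1} R
        den[(i, j)] += a ** P * r ** 2         # sum of A^p R^2
    return {cell: num[cell] / den[cell] for cell in num}
```

For a cell whose collected pairs all have $A = R$ (noise-free training data), the trained gain is exactly 1, as one would expect.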
IV. EXPERIMENTAL RESULT

For the performance evaluation of the proposed method, experiments were conducted with audio samples of 1200 spoken sentences uttered in Korean by 12 speakers (5 female, 7 male). The speech was captured by a mobile phone located 50 cm away from the speaker. Noise signals were captured in various noisy environments such as a street, a cafe, and a car, and mixed with the clean signal. In our task, an ASR system based on a deep-neural-network decoder (KALDI) is used. The experimental results are shown in Table I.

TABLE I
EXPERIMENTAL RESULT WITH WORD RECOGNITION RATE

According to these results, the proposed method is superior to the conventional method. The main reason is the robustness of direction finding: in the proposed algorithm, the TDOA estimation and the noise-update smoothing parameter are both included in the integrated speech presence probability calculation, which helps prevent false gain estimation. Furthermore, the data-driven single-channel enhancement is likely to outperform the conventional method. We can therefore conclude that the proposed algorithm works well across the tested acoustic environments.

V. CONCLUSION

In this paper, we propose a new speech enhancement method for a client-server distributed ASR architecture. The principal contribution of this paper is the combination of the advantages of a multi-microphone noise reduction technique and a single-microphone noise reduction technique. The parametric multi-channel speech enhancement algorithm helps improve performance by using the multi-channel speech presence probability. In the single-channel stage, the speech enhancement method is trained on the output of the multi-channel speech enhancement. The performance of the proposed approach has been found to be superior to that of the conventional method through the ASR test.

REFERENCES

[1] J. Cho, S. Lee and I. Hwang, "A practical approach to robust speech recognition using two microphones in driving environments," in Proc. Audio Engineering Society Convention 137, Los Angeles, USA, 2014.
[2] W. Cui, J. Cho and S. Lee, "A robust TDOA estimation method for in-car-noise environments," in Proc. Interspeech 2014, Singapore, 2014.
[3] M. Souden, J. Chen, J. Benesty and S. Affes, "On optimal frequency-domain multichannel linear filtering for noise reduction," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 2, pp. 260-276, Feb. 2010.
[4] J. Erkelens, J. Jensen and R. Heusdens, "A data-driven approach to optimizing spectral speech enhancement methods for various error criteria," Speech Commun., vol. 49, pp. 530-541, 2007.
[5] M. Souden, J. Chen, J. Benesty and S. Affes, "Gaussian model-based multichannel speech presence probability," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 5, pp. 1072-1077, Jul. 2010.
[6] P. C. Loizou, "Speech enhancement based on perceptually motivated Bayesian estimators of the magnitude spectrum," IEEE Trans. Speech, Audio Process., vol. 13, no. 5, pp. 857-869, Sep. 2005.