Julian Palacino and Rozenn Nicol
Orange Labs
http://acousticalsociety.org/
INTRODUCTION
For several decades, spatial audio was used only by film makers, music composers, and researchers in laboratories [1]. Because of their complexity, 3D audio techniques have remained out of reach for the general public. Dedicated devices such as microphone and loudspeaker arrays are expensive and cannot be used without some expertise in audio capture and reproduction. Nowadays the main barrier preventing a consumer solution for capturing spatial audio is the large number of transducers needed to obtain an accurate 3D sound image [2]. To break down this barrier we propose a new 3D audio recording set-up composed of a three-microphone array able to capture the full 3D audio information. A 2D version, consisting of a two-microphone array, is also available. Sound localization is based on the transducer directivities, with additional information used to solve the angular ambiguity.
This paper first describes the microphone set-up and its associated algorithm. The performance of sound localization is then assessed.
x = r cos θ cos φ    (1)
y = r sin θ cos φ    (2)
z = r sin φ    (3)

where the spherical coordinates are defined by radius r, azimuth angle θ, and elevation angle φ.
The directivity functions of the three microphones are [3]:

M_car(α) = (1 + cos α)/2    (4)

where α is the angle between the source direction and the pointing axis of the microphone. For the two cardioids mounted back-to-back along the x axis and the third cardioid pointing along the z axis, this gives:

M_car,x+(θ, φ) = (1 + cos θ cos φ)/2    (5).a
M_car,x−(θ, φ) = (1 − cos θ cos φ)/2    (5).b
M_car,z(θ, φ) = (1 + sin φ)/2    (5).c
Since the direction is unchanged for any value of r, the radius is fixed to r = 1 in the following expressions.
The sound source produces a signal s_0(t) at the origin of the basis. Assuming that each microphone is located at this point, their output signals s_car,i(t) are:

s_car,i(t, θ, φ) = M_car,i(θ, φ) s_0(t)    (6)
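As a quick numerical check, the directivity model of Eqs. (5)-(6) can be sketched in a few lines of Python (a minimal sketch: NumPy is assumed, and the function names are illustrative, not the paper's):

```python
import numpy as np

def cardioid_gains(azimuth, elevation):
    """Directivity gains of the three cardioids (Eq. (5)): two mounted
    back-to-back along the x axis, one pointing up along z."""
    g_xp = 0.5 * (1 + np.cos(azimuth) * np.cos(elevation))  # (5).a
    g_xm = 0.5 * (1 - np.cos(azimuth) * np.cos(elevation))  # (5).b
    g_z = 0.5 * (1 + np.sin(elevation))                     # (5).c
    return g_xp, g_xm, g_z

def mic_signals(s0, azimuth, elevation):
    """Coincident-array outputs s_car,i(t) = M_car,i(theta,phi) s_0(t), Eq. (6)."""
    return tuple(g * s0 for g in cardioid_gains(azimuth, elevation))
```

For a source on the x axis (θ = φ = 0) the front cardioid has unit gain, the rear one is in its null, and the upward cardioid picks up half the pressure.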
The signals s_car,i(t, θ, φ), i = 1, 2, 3, give access to three quantities:
1) The monophonic signal of the sound source, using relations (5).a, (5).b and (6):

s_0(t) = s_car,x+(t, θ, φ) + s_car,x−(t, θ, φ)    (7)
2) The elevation angle of the sound source, using equations (5).c and (6):

φ = sin⁻¹( 2 s_car,z(t, θ, φ) / s_0(t) − 1 )    (8)
3) The azimuth angle of the sound source, using relations (5).a and (5).b:

θ = cos⁻¹( ( s_car,x+(t, θ, φ) − s_car,x−(t, θ, φ) ) / ( s_0(t) cos φ ) )    (9)
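The three-step estimation of Eqs. (7)-(9) translates directly into code. The sketch below (illustrative names, NumPy assumed) evaluates the signal ratios in the least-squares sense over a frame, which is equivalent to the pointwise ratios of Eqs. (8)-(9) when a single source is present:

```python
import numpy as np

def estimate_source(s_xp, s_xm, s_z):
    """Recover the mono signal and the direction from the three cardioid
    outputs, following Eqs. (7)-(9). The azimuth keeps the sign
    ambiguity of the arccosine."""
    s0 = s_xp + s_xm                                      # (7)
    # ratio s_z/s0 evaluated in the least-squares sense over the frame
    sin_el = 2 * np.dot(s_z, s0) / np.dot(s0, s0) - 1     # (8)
    elevation = np.arcsin(np.clip(sin_el, -1.0, 1.0))
    cos_az = np.dot(s_xp - s_xm, s0) / (np.dot(s0, s0) * np.cos(elevation))
    azimuth = np.arccos(np.clip(cos_az, -1.0, 1.0))       # (9): |theta| only
    return s0, azimuth, elevation
```

On noiseless simulated signals this recovers the source direction exactly, up to the front-rear sign of the azimuth.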
Alternatively, microphones with other directivity patterns can be used to reconstruct the cardioid directivities virtually (see Section: Synthesizing Virtual Cardioid Microphones).
The same method can be applied in a 2D version using only the two microphones of the horizontal plane. In this case an arbitrary elevation must be fixed, which introduces an error that increases with the angular mismatch between the chosen elevation and the real position.
In this first step, the source location is computed exclusively from the microphone directivity patterns. However, the azimuth is estimated with a sign ambiguity (front-rear) due to the cosine in Equation (9). This ambiguity can be solved by moving the microphones perpendicularly to their pointing direction, as illustrated in the next section.
When a microphone is moved away from the origin, its signal is delayed by the propagation time corresponding to the projection of its position onto the source direction. For a displacement of length d along the x, y and z axes respectively, with c the speed of sound, the delays are:

τ_x = (d/c) cos θ cos φ    (10)
τ_y = (d/c) sin θ cos φ    (11)
τ_z = (d/c) sin φ    (12)

so that a displaced microphone outputs the delayed signal

s_car,i(t, θ, φ) = M_car,i(θ, φ) s_0(t − τ_i)    (13)

In the frequency domain s_car,i(t) becomes S_car,i(ω) (where ω = 2πf is the angular frequency, with f the time frequency) by applying a Fourier transform FT{·}:

S_car,i(ω, θ, φ) = FT{ s_car,i(t, θ, φ) } = M_car,i(θ, φ) S_0(ω) e^(−jωτ_i)    (14)

with the corresponding phase

∠S_car,i(ω) = ∠S_0(ω) − ω τ_i    (15)
For the two horizontal microphones this gives

∠S_car,1(ω) = ∠S_0(ω) − ω τ_1    (16)
∠S_car,2(ω) = ∠S_0(ω) − ω τ_2    (17)

Consequently ∠S_car,1(ω) − ∠S_car,2(ω) = −ω (τ_1 − τ_2), and the time delay τ = τ_1 − τ_2 between the two microphones is expressed by:

∠S_car,2(ω) − ∠S_car,1(ω) = ω τ    (18)

τ = ( ∠S_car,2(ω) − ∠S_car,1(ω) ) / ω    (19)
Since this result only serves to solve the ambiguity in relation (9), only its sign is needed. It is inserted into the latter as:
θ = ( τ / |τ| ) cos⁻¹( ( s_car,x+(t, θ, φ) − s_car,x−(t, θ, φ) ) / ( s_0(t) cos φ ) )    (20)
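A minimal sketch of the sign extraction described above (cf. Eqs. (14)-(20)): the function name and the single-bin reading are illustrative choices, valid as long as |ωτ| < π so the phase difference does not wrap:

```python
import numpy as np

def tau_sign(x1, x2, k):
    """Sign of the inter-microphone delay tau = tau_1 - tau_2, read at
    DFT bin k from the phase difference of the two spectra (Eq. (19)).
    Only the sign is needed to disambiguate Eq. (20)."""
    X1 = np.fft.rfft(x1)
    X2 = np.fft.rfft(x2)
    # angle(X2) - angle(X1) = omega * tau (wrapped to (-pi, pi])
    dphi = np.angle(X2[k] * np.conj(X1[k]))
    return np.sign(dphi)
```

The disambiguated azimuth is then `tau_sign(x1, x2, k) * arccos(...)`, i.e. Equation (20) with τ/|τ| replaced by the estimated sign.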
It will now be shown how to use equations (7), (8) and (9) with virtual cardioid microphones synthesized from bidirectional microphones. The method is described here for an array composed of two bidirectional microphones and one cardioid.
Microphone Layout
The array is composed of two bidirectional microphones pointing along the x and y axes in the horizontal plane and a third, cardioid microphone pointing along the z axis (see FIGURE 2). Note that a soundfield microphone [2] could alternatively be used, since the X and Y components of the B-format are equivalent to the bidirectional signals described above.
The bidirectional microphones pick up

s_bi,x(t, θ, φ) = cos θ cos φ s_0(t) and s_bi,y(t, θ, φ) = sin θ cos φ s_0(t).

The virtual signals s_car,i^virt(t, θ, φ) are obtained using the expressions:

s_car,x+^virt(t, θ, φ) = ( s_0(t) + s_bi,x(t, θ, φ) ) / 2    (21)

s_car,x−^virt(t, θ, φ) = ( s_0(t) − s_bi,x(t, θ, φ) ) / 2    (22)

The pressure signal is obtained from the real cardioid pointing along z:

s_0(t) = 2 s_car,z(t, θ, φ) − Z(t)    (23)

where the signal Z(t) refers to the Z component of the B-format. The analogous expressions hold for the y axis:

s_car,y+^virt(t, θ, φ) = ( s_0(t) + s_bi,y(t, θ, φ) ) / 2    (24)

s_car,y−^virt(t, θ, φ) = ( s_0(t) − s_bi,y(t, θ, φ) ) / 2    (25)
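The virtual cardioid synthesis of Eqs. (21)-(22) is a simple half-sum/half-difference of the pressure and bidirectional signals. A sketch under the plane-wave model used above (names are illustrative):

```python
import numpy as np

def virtual_cardioids(w, x):
    """Back-to-back virtual cardioids along x (Eqs. (21)-(22)): half the
    sum and half the difference of a pressure signal w (omnidirectional,
    B-format W) and a bidirectional signal x (B-format X)."""
    s_xp = 0.5 * (w + x)  # (21): virtual cardioid pointing +x
    s_xm = 0.5 * (w - x)  # (22): virtual cardioid pointing -x
    return s_xp, s_xm
```

By construction the two virtual outputs sum back to the pressure signal, which is exactly the property used in Eq. (7).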
θ_I(ω) = tan⁻¹( I_y(ω) / I_x(ω) )    (32).a

where I_x(ω) and I_y(ω) are the x and y components of the acoustic intensity vector.
FIGURE 3 Angle estimation ambiguity: Cardioid directivity method (continuous blue line; see Equation (9)), Acoustic intensity
method (dotted green line; see Equation (32)), Theoretical position (red dashed line)
TABLE 1 Ambiguity resolution by cross-checking the angular estimation from the directivity (Equation (9)) and
intensity (Equation (32)) methods
real θ           | θ ∈ [0°, 90°]   | θ ∈ [90°, 180°]  | θ ∈ [−180°, −90°] | θ ∈ [−90°, 0°]
Directivity θ_D  | θ_D ∈ [0°, 90°] | θ_D ∈ [90°, 180°] | θ_D ∈ [90°, 180°] | θ_D ∈ [0°, 90°]
Intensity θ_I    | θ_I ∈ [0°, 90°] | θ_I ∈ [−90°, 0°]  | θ_I ∈ [0°, 90°]   | θ_I ∈ [−90°, 0°]
Resolved θ       | θ = θ_D         | θ = θ_D           | θ = −θ_D          | θ = −θ_D
Localization based on the cardioid directivity yields the azimuth angle with a front-rear ambiguity, due to the cosine involved in its estimation (Eq. (9)). For non-coincident arrays this ambiguity can be solved by inserting a delay. For coincident arrays it is possible to use the acoustic intensity to estimate the azimuth angle, but this time with a left-right ambiguity due to the inverse tangent in relation (32).a.
As shown by FIGURE 3 and TABLE 1, the front-rear ambiguity is complementary to the left-right ambiguity. Four cases are pointed out, corresponding to the four combinations of the two ambiguous estimations. The real position can then be found by a conditional search.
In theory, once the ambiguity is solved, both methods (i.e. Equation (9) and Equation (32)) give the same result. In practice, however, they may differ slightly. Depending on the sound scene, the stimulus or the noise level, one method may achieve better performance than the other.
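The conditional search can be sketched as follows; this is our reading of the four cases of TABLE 1, with the directivity estimate restricted to the arccosine range [0, π] and the intensity estimate to the arctangent range (−π/2, π/2]:

```python
import numpy as np

def resolve_azimuth(theta_dir, theta_int):
    """Combine the front-rear-ambiguous directivity estimate (Eq. (9),
    theta_dir in [0, pi]) with the left-right-ambiguous intensity
    estimate (Eq. (32).a, theta_int in (-pi/2, pi/2]). Angles in rad."""
    s = 1.0 if theta_int >= 0 else -1.0  # left/right from the intensity sign
    if theta_dir <= np.pi / 2:           # front half-plane per directivity
        return s * theta_dir
    return -s * theta_dir                # rear half: intensity sign flips
```

Each of the four (θ_D, θ_I) combinations maps to exactly one quadrant, so the true azimuth is recovered without ambiguity.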
LOCALIZATION ASSESSMENT
A computer program simulates the signals that would have been recorded by the two microphone array set-ups previously described. Various stimuli were used (music, random noise, noise bands and harmonic tones).
Evaluation Criteria
The azimuth and elevation errors ε_θ and ε_φ are calculated as the angular distance between the real location and the estimated one. The total error ε_t is the angular distance between the real and the estimated location on the sphere (see Eq. (33)). The angular distance is computed from the scalar product of the real and estimated position vectors.
ε = cos⁻¹( ( u_real · u_est ) / ( ‖u_real‖ ‖u_est‖ ) )    (33)

where

u = ( cos θ cos φ, sin θ cos φ, sin φ )    (34)

is the position vector of a direction (θ, φ), evaluated for the real and the estimated locations.
When ε_θ (respectively ε_φ) is calculated, the elevation (respectively azimuth) components of u_real and u_est are fixed to 0. The error is expressed in degrees, where 0° indicates the best estimation and 180° the worst. In addition a new criterion is proposed, computed as the error level reached by at least 75% of the 1/3-octave spectrum. It will be referred to as ε_75.
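The angular-distance criterion of Eqs. (33)-(34) can be sketched as follows (NumPy assumed, names illustrative):

```python
import numpy as np

def direction_vector(azimuth, elevation):
    """Unit position vector of a direction (theta, phi), Eq. (34)."""
    return np.array([np.cos(azimuth) * np.cos(elevation),
                     np.sin(azimuth) * np.cos(elevation),
                     np.sin(elevation)])

def angular_error(az_real, el_real, az_est, el_est):
    """Great-circle angle between real and estimated directions via the
    normalized scalar product (Eq. (33)), expressed in degrees."""
    u = direction_vector(az_real, el_real)
    v = direction_vector(az_est, el_est)
    c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))
```

Antipodal directions give 180°, orthogonal directions 90°, and a perfect estimate 0°, matching the scale described above.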
Results
FIGURE 4 Source localization of a random noise moving around and from the bottom to the top of a 3-cardioid microphone array. The horizontal-plane microphones are separated by 2 cm.
a) Source location at 1036 Hz, azimuth (blue) and elevation (green). b) ε_75 criterion: azimuth (blue), elevation (green), total (red).
c) Elevation localization error. d) Azimuth localization error.
FIGURE 4 depicts the results obtained when localizing a random noise moving around and from the bottom to the top. Localization accuracy is affected by the microphone spacing, because the reconstruction of the omnidirectional pressure S_0 is altered. As shown in FIGURE 4c, high frequencies are more affected, as the wavelength gets closer to the microphone distance. For high elevations the variations of the cardioid pattern (s_car,z) are slow, which results in poor angular discrimination. Since azimuth localization uses the elevation (cf. Eq. (20)), the azimuth estimation is consequently degraded, as shown by FIGURE 4d.
FIGURE 5 Source localization of a random noise in azimuth (blue) and elevation (green) picked up by a 2-bidirectional + 1-cardioid microphone array at 1036 Hz, obtained with a) directivity only, b) intensity only, c) directivity and intensity
When coincident arrays are used, the estimation of the pressure signal is more accurate (see Eq. (23)) and the results are not affected by elevation at any frequency (see FIGURE 5). In practice, however, coincident microphone arrays are impossible to build; as a consequence some artifacts will always occur at high frequencies.
As specified before, it is assumed that only one source is present at each moment and at each frequency. In practice the acoustic field is complex and more than one source is present, which affects localization accuracy (see FIGURE 6 and FIGURE 7). Low-energy signals close to high-energy signals are then localized in the direction of the stronger one. When the target source level is more than 12 dB above the disturbing noise, the localization error is weak. Some front-rear confusions are introduced below 500 Hz, since the phase information of the target source is altered by the disturbing second source; obviously this is only observed in the case of the cardioid array. The influence of a disturbing source becomes insignificant for level differences higher than 20 dB.

FIGURE 6 Source localization evaluation of a random noise: elevation error (1st row), azimuth error (2nd row) and ε_75 criterion (3rd row), picked up with a 3-cardioid microphone array in the presence of a disturbing second noise source at θ = 0° and φ = 0° and a level difference of a) 0 dB, b) −12 dB, c) −20 dB
Contrasting with FIGURE 6, for the bidirectional array the error increases unexpectedly at low elevations (see FIGURE 7-3c). Indeed, the target source level is disadvantaged by the array directivity, which is "deaf" in those directions. This phenomenon can be turned into an advantage if the disturbing noise is placed in the deaf area of the array. On the contrary, with the cardioid array the target source is picked up homogeneously in all directions. As suggested by FIGURE 6, the azimuth error ε_θ does not have the same impact on the total angular distance ε_t for every elevation angle. FIGURE 7 clearly shows that the azimuth error has less impact when the source is near the poles.
FIGURE 7 Source localization evaluation of a random noise: elevation error (1st row), azimuth error (2nd row) and ε_75 criterion (3rd row), picked up with a 2-bidirectional + 1-cardioid microphone array in the presence of a disturbing second noise source at θ = 0° and φ = 0° and a level difference of a) 0 dB, b) −12 dB, c) −20 dB
The sound scene analysis presented in this paper can be used as the first step of an object-based spatial audio representation. Using the source position information, it is possible to render the sound scene over any type of spatial audio system, such as stereo, 5.1, 7.1, 22.2, Higher Order Ambisonics [7] or Wave Field Synthesis [8].
REFERENCES
[1] J. Sunier, The Story of Stereo: 1881–. Gernsback Library, 1960.
[2] R. Nicol, "Représentation et perception des espaces auditifs virtuels", HDR thesis, Université du Maine, Le Mans, France, 2010.
[3] J. Jouhaneau, Notions élémentaires d'acoustique : électroacoustique. Tec & Doc Lavoisier, 1999.
[4] M. A. Gerzon and P. G. Craven, "Coincident microphone simulation covering three dimensional space and yielding various directional outputs", U.S. Patent 4,042,779, 16 Aug. 1977.
[5] M. Bruneau, Manuel d'acoustique fondamentale. Hermès, 1998.
[6] V. Pulkki, "Directional audio coding in spatial sound reproduction and stereo upmixing", in Proc. of the AES 28th Int. Conf., Piteå, Sweden, 2006.
[7] J. Daniel, "Evolving views on HOA: From technological to pragmatic concerns", Ambisonics Symposium 2009, Graz, June 25-27, 2009.
[8] A. J. Berkhout, D. de Vries and P. Vogel, "Acoustic control by wave field synthesis", J. Acoust. Soc. Am., vol. 93, pp. 2764-2778, 1993.