
Catalina Forensic Audio Toolbox

Version 3.0h

User’s Manual

Catalin Grigoras, Ph.D.


Copyright Notice

(C)2007 Catalin Grigoras, Ph.D.


forensicav@techemail.com

Contents

Introducing Catalina Forensic Audio Toolbox
    System Requirements
    Installation
    Getting Help

Interfacing Wavesurfer
    Fundamental Frequency
    Formants
    Long Term Average Spectrum

Catalina Forensic Audio Toolbox
    Basics
        General Plots
        Long Term Formants
        Formant Space
        Long Term Average Spectrum

Recommendations

Future Developments

References

Appendix A

Appendix B

Appendix C

Introducing Catalina Toolbox
Catalina Forensic Audio Toolbox is a software system for forensic audio analysis.
This version has been designed for Windows 98/2000/Me/NT/XP/Vista.

Historical Note
The first version of the Catalina Forensic Audio Toolbox ("Catalina") was developed
in 1993. At the time, PC speed and sound card quality were relatively low
compared to present-day equipment. The most important updates were written
while I completed my Ph.D. dissertation (1998-2001). The program evolved to rely on
an external software program to analyze the speech fundamental
frequency (F0), the formants F1, F2 and F3 (F123) and the long term average spectrum
(LTAS). For the current version I use Wavesurfer 1.8.5, developed at the KTH Royal
Institute of Technology in Stockholm by Kåre Sjölander and Jonas Beskow. More details
about this software can be found at http://www.speech.kth.se/wavesurfer/. The chapter
"Interfacing Wavesurfer" explains how to use this software with Catalina.

History
- v3.0h (2007): added indications for intra-speaker variability,
- v3.0c (2006): first stand-alone version,
- v3.0b (2005): added information on individual vowels,
- v3.0a (2004): added information on long term cumulative formant distribution,
- v3.0 (2003): added information on long term average formants analysis,
- v2.0c (2002): added information on long term average spectrum histogram,
- v2.0 (1998-2001): my Ph.D. thesis, second major version of Catalina,
- v1.0 (1992): first version of Catalina, a Matlab toolbox.

System Requirements
1. A PC running Windows 98 SE, Windows XP or Windows NT
2. A computer having a CPU of at least 133 MHz
3. A copy of Wavesurfer, version 1.8.5 or later

Special Thanks
I wish to thank the International Association for Forensic Phonetics and Acoustics
(IAFPA) for a grant to finish this latest version of the software. I also appreciate the very
important help of Professor Francis Nolan from Cambridge University, Professor
Brandusa Pantelimon from Bucharest University and Durand R. Begault, Ph.D.,
Audio Forensic Centre, Charles M. Salter Associates, Inc., San Francisco, CA, USA.
I am grateful to the Cambridge Colleges Hospitality Scheme for making possible my
visit to Cambridge in summer 2003.

Installation
Run CatalinaSetup.exe. By default the program is installed in C:\Catalina and a
shortcut is placed on the Desktop.

You should get the following folder structure:

C:\Catalina\bin - executable files; do not modify
C:\Catalina\Evidence - WAV files to be analysed and the analysis files saved from Wavesurfer
C:\Catalina\Plots - graphical results (TIFF)
C:\Catalina\Results - numerical results
C:\Catalina\toolbox - do not modify

Getting Help
For further details, you can contact the author directly at forensicav@techemail.com.
In the e-mail subject please indicate "Catalina Toolbox".

Interfacing Wavesurfer
Catalina depends on the long-term average and formant analysis capabilities of
Wavesurfer. Other programs that can provide exported text versions of these analyses
can also be used, but the demonstrations given here use Wavesurfer. There is a
specific naming format that Catalina depends on when exporting data analyses from
Wavesurfer to the 'Evidence' folder that is explained in detail below.
Run Wavesurfer and open a WAV file; linear PCM, 8 kHz, 16 bit, mono is recommended.
A dialog like the one in Figure 1 should appear. Select the Speech analysis
configuration.
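
Before opening a recording in Wavesurfer, it can help to confirm that it matches the recommended format. The following is a minimal sketch using Python's standard wave module; the function name check_wav_format is hypothetical and is not part of Catalina or Wavesurfer.

```python
import wave

def check_wav_format(path):
    """Return a list of deviations from the recommended 8 kHz, 16 bit, mono format.

    An empty list means the file matches. wave.open() itself raises wave.Error
    for WAV files that are not uncompressed PCM.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != 8000:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected 8000 Hz")
        if wav.getsampwidth() != 2:  # 2 bytes per sample = 16 bit
            problems.append(f"sample width is {8 * wav.getsampwidth()} bit, expected 16 bit")
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected mono")
    return problems
```

A call such as check_wav_format(r"C:\Catalina\Evidence\test.wav") would then list anything that needs converting before analysis.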

Figure 1. Wavesurfer, Choose Configuration option

The Wavesurfer display will show 3 plots: waveform, spectrogram with formant
estimator tracking overlay, and fundamental frequency (see figure 2).

Figure 2. Wavesurfer Speech Analysis display (panels: waveform; spectrogram and formants; fundamental frequency)

The following Wavesurfer settings are recommended for correct parameter extraction:

F0 Properties > Pitch contour
    Pitch method: ESPS
    Max pitch value: 200 Hz for male voices and 400 Hz for female voices
    Analysis window length: 0.0075 s
    Frame interval: 0.01 s
Spectrogram
    FFT window length: 256 points
    Analysis window type: Hamming
    Analysis bandwidth: 125 Hz, Window 64 points
    Pre-emphasis factor: 0.97
    Cut spectrogram at: 4000 Hz
Formants
    Number of formants: 4
    Analysis window length: 0.049 s
    Analysis window type: Hamming
    Pre-emphasis factor: 0.7
    Frame interval: 0.01 s
    LPC order: 12
    LPC type: 0
    Down-sampling frequency: 8000 Hz

Fundamental frequency

Fundamental frequency (F0) is the repetition frequency of the (quasi-)periodic
waveform of the voiced speech signal; it corresponds closely to the perceived
pitch of the speech. F0 analysis can be performed with different algorithms, in either
the time or the frequency domain.
In Wavesurfer the analysis can be carried out using the AMDF (Average Magnitude
Difference Function) algorithm or the ESPS (Entropic Signal Processing System) method.
By default Wavesurfer uses the ESPS algorithm. See the Wavesurfer manual for details
about F0 settings.
To reduce spurious (false) F0 values introduced by other sounds or non-normal
speech (e.g. Fig. 3), three methods can be applied: filter out the noise, delete the
affected samples, or change the F0 limits in Wavesurfer. The last technique is primarily
for limiting non-normal speech effects (e.g., falsetto).
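
To illustrate the time-domain approach, here is a minimal sketch of an AMDF estimate for a single voiced frame. It is a textbook form of the technique, not Wavesurfer's implementation; the function name amdf_f0 and the default 60-400 Hz search range are assumptions made for the example. Narrowing f0_min and f0_max mirrors the third method above, since lags outside the range are never considered.

```python
import math

def amdf_f0(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Estimate F0 (Hz) of one voiced frame as the lag that minimises the AMDF.

    AMDF(lag) = mean(|x[n] - x[n + lag]|); it dips towards zero when the lag
    equals the pitch period. The search range is an assumed default.
    """
    min_lag = int(sample_rate / f0_max)
    max_lag = int(sample_rate / f0_min)
    best_lag, best_value = None, float("inf")
    for lag in range(min_lag, max_lag + 1):
        n = len(frame) - lag
        if n <= 0:
            break  # frame too short for longer lags
        value = sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n
        if value < best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag if best_lag else 0.0

# Synthetic test signal: a 100 Hz sine sampled at 8 kHz (40 ms frame).
sample_rate = 8000
frame = [math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(320)]
print(amdf_f0(frame, sample_rate))
```

A practical pitch tracker also needs a voicing decision and smoothing across frames, which this sketch leaves out.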

Figure 3. F0 selection

You may create an F0 text file using the following steps: select the entire wave by
pressing F11, select all with Ctrl+A, right-click on the F0 plot, choose Save data file
and save it as C:\Catalina\Evidence\filename.f0
For example, the file test.wav corresponds to the file test.f0.

Formants

In Wavesurfer the formant analysis can be carried out using linear prediction. By
default Wavesurfer uses a 12th order LPC algorithm. (Refer to the Wavesurfer
manual for details about formant settings.)
Catalina requires a text file containing data for formants F1-F2-F3. You will need to
create an F123 text file using the following steps: (1) select the entire wave by
pressing F11 or select all with Ctrl+A, (2) right-click on the formants plot, (3) export
the formant data file as filename.frm
For example, the file test.wav corresponds to the file test.frm.

Long Term Average Spectrum

The long term average spectrum (LTAS) is the mean of successive short-term
spectral analyses computed over the duration of a given speech sample. Each short-
time spectrum (computed by means of the discrete Fourier transform) reflects the
phonetic quality of the current segment, but the LTAS analysis characterizes the
overall spectral content of the entire sample. The LTAS is influenced by the
combined effect of the analyzed speech, the background noise, the equipment noise
and the frequency response of the transmission chain.
Right-click on the waveform and select LTAS. Make certain that the entire waveform
has been selected and that 'average of selection' has been chosen. The following plot
will be displayed.

Figure 4. LTAS option

You now need to create an LTAS text file by clicking the 'export' option and saving it
as filename.lts. For example, the file test.wav corresponds to the file test.lts.
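
The LTAS definition given in this section can be sketched in a few lines. This is an illustration only, not Wavesurfer's code: the function name ltas_db, the 50% frame overlap, and the choice to average power before converting to dB are assumptions. The 64-point Hamming frame gives 125 Hz per bin at 8 kHz, in line with the spectrogram settings recommended earlier.

```python
import cmath
import math

def ltas_db(samples, frame_len=64, hop=32):
    """Average the power spectra of successive Hamming-windowed frames.

    Returns one dB value per DFT bin from 0 Hz up to half the sample rate.
    """
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    power_sum = [0.0] * n_bins
    n_frames = 0
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        for k in range(n_bins):  # plain DFT, adequate for short examples
            coeff = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
            power_sum[k] += abs(coeff) ** 2
        n_frames += 1
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")
    # Small offset avoids log10(0) for empty bins.
    return [10 * math.log10(p / n_frames + 1e-12) for p in power_sum]

# Synthetic test signal: a 1000 Hz sine at 8 kHz; the peak should fall near 1000 Hz.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(512)]
ltas = ltas_db(signal)
peak_bin = max(range(len(ltas)), key=ltas.__getitem__)
print(peak_bin * sample_rate / 64, "Hz")
```

For real recordings an FFT routine would replace the plain DFT loop, which is kept here only so the example has no dependencies.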
Catalina Forensic Audio Toolbox

Catalina Forensic Audio Toolbox allows an examiner to compute statistics and create
TIFF files containing text information and distribution plots for the data files
exported from Wavesurfer or an equivalent software program:

- fundamental frequency F0,
- formants F1, F2 and F3,
- long term average formant distributions LTAF,
- long term cumulative formant distribution LTCF,
- F1-F2 space for vowels [a], [e], [i] and [o],
- F2-F3 space for vowels [a], [e], [i] and [o],
- long term average spectrum LTAS.

General Plots
Create or copy the filename.f0, filename.frm and filename.lts files to
C:\Catalina\Evidence folder. Run Catalina from the desktop icon or
C:\Catalina\bin\win32\Catalina3x.exe and select a file from the C:\Catalina\Evidence
folder. Catalina will ask for the name of the F0 text file, and it will then search for this
file, along with similarly-named frm and lts text export files, from this same
'Evidence' folder. The program then writes plot files to C:\Catalina\Plots using the
same naming convention.
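
Because Catalina locates the three export files by name, it can be convenient to derive the expected paths from the WAV file name. The sketch below does only that; export_paths is a hypothetical helper, and the Evidence folder is the default location given in the Installation section.

```python
from pathlib import PureWindowsPath

EVIDENCE_DIR = PureWindowsPath(r"C:\Catalina\Evidence")  # default install location

def export_paths(wav_name):
    """Map a WAV file name to the F0, formant and LTAS export files Catalina expects."""
    stem = PureWindowsPath(wav_name).stem
    return {ext: EVIDENCE_DIR / f"{stem}.{ext}" for ext in ("f0", "frm", "lts")}

for ext, path in export_paths("test.wav").items():
    print(ext, path)
```

Checking that all three paths exist before launching Catalina avoids a run in which one of the analyses is silently missing.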

To demonstrate the program, select the included file test20sec. The
software will compute statistics and create TIFF files stored in
C:\Catalina\Plots. The resulting TIFF files are:

01-test20sec.tif    voice profile containing F0, LTAS and F123 histogram plots
02-test20sec.tif    LTAF and LTCF
03-test20sec.tif    all F1 vs F2 formant space
04-test20sec.tif    all F2 vs F3 formant space
05-test20sec.tif    F1 vs F2 formant space
06-test20sec.tif    F2 vs F3 formant space
07-test20sec.tif    LTAS
08-test20sec.tif    LTAS, LTCF and LTAF

Figure 5. General plot obtained for the file test180sec.

The general plot in Figure 5 contains:

- the fundamental frequency F0 histogram, with the mean and standard deviation of F0
  and the total length of voiced signal; Catalina requires files with an 8 kHz sample
  rate to determine the duration correctly,
- the long term average spectrum LTAS,
- the long term average formant distributions LTAF for F1 (red), F2 (green) and F3
  (blue), with the mean and standard deviation and the histograms for F1, F2
  and F3.
Long Term Formants

Plots of the long term average formants (LTAF) and long term cumulative formants
(LTCF) are shown in Figure 6. The LTCF is the vertical sum of the LTAF curves.
Note that the LTCF and LTAF outlines follow the contours of the histograms
seen in the lower plot of Figure 5.
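
The relationship between LTAF and LTCF can be sketched as a sum of per-formant histograms. The 50 Hz bin width, the 4000 Hz upper limit and the function names are assumptions chosen for illustration; Catalina's actual binning is not documented here.

```python
def formant_histogram(values_hz, bin_hz=50, max_hz=4000):
    """Count formant values per frequency bin (one LTAF curve, unnormalised)."""
    counts = [0] * (max_hz // bin_hz)
    for value in values_hz:
        if 0 <= value < max_hz:
            counts[int(value // bin_hz)] += 1
    return counts

def ltcf(f1, f2, f3, bin_hz=50, max_hz=4000):
    """Add the F1, F2 and F3 histograms bin by bin (the 'vertical addition')."""
    ltaf = [formant_histogram(track, bin_hz, max_hz) for track in (f1, f2, f3)]
    return [sum(bins) for bins in zip(*ltaf)]

cumulative = ltcf([500, 520], [1500], [2500, 540])
print(cumulative[10], cumulative[30], cumulative[50])
```

Where two formant distributions overlap in frequency, the LTCF bin holds their combined count, which is why its outline can rise above either LTAF curve.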

Figure 6. Long term formants

Formant Space

Catalina creates the F1 vs F2 and F2 vs F3 formant plots, and automatically detects
the vowels [a], [e], [i] and [o] based on the user-defined settings in the editable
text file C:\Catalina\formants.txt. By default, the settings in formants.txt are as follows:

 601   850   1100  1600   2200  2800   ← vowel [a]
 401   600   1500  2000   2100  2800   ← vowel [e]
 220   400   2000  2400   2400  2900   ← vowel [i]
 370   600    700  1200   2200  2600   ← vowel [o]
  ↑     ↑      ↑     ↑      ↑     ↑
 low   high   low   high   low   high
 limits for F1, limits for F2, limits for F3 (all values in Hz)
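
The default limits can be read as a simple range test on each frame. The sketch below shows one way such a test could work; treating the limits as inclusive and the name classify_vowel are assumptions, and Catalina's own detection logic may differ.

```python
# Default limits from formants.txt: (F1 low, F1 high, F2 low, F2 high, F3 low, F3 high) in Hz
VOWEL_LIMITS = {
    "a": (601, 850, 1100, 1600, 2200, 2800),
    "e": (401, 600, 1500, 2000, 2100, 2800),
    "i": (220, 400, 2000, 2400, 2400, 2900),
    "o": (370, 600, 700, 1200, 2200, 2600),
}

def classify_vowel(f1, f2, f3, limits=VOWEL_LIMITS):
    """Return the vowel whose F1, F2 and F3 ranges all contain the frame, else None.

    Assumption: the limits are inclusive at both ends.
    """
    for vowel, (f1_lo, f1_hi, f2_lo, f2_hi, f3_lo, f3_hi) in limits.items():
        if f1_lo <= f1 <= f1_hi and f2_lo <= f2 <= f2_hi and f3_lo <= f3 <= f3_hi:
            return vowel
    return None

print(classify_vowel(700, 1300, 2500))
print(classify_vowel(300, 2200, 2600))
```

With the default values no two vowel regions overlap in all three formants, so a frame matches at most one vowel; edited limits may not preserve that property.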

Figure 7. F1-F2 and F2-F3 space display

These default values are those indicated in different references for different
languages. Other references may be used to determine the vowel limits for a specific
language, or vowel limits can be derived by inspecting formant values for a specific
set of speakers.

An example of the F1-F2 vowel space display is presented in Figure 8. Filled red circles
indicate the mean of the values from the F123 file, taken only over the time frames for
which a corresponding F0 value exists. When there is no F0 estimate for a time frame,
the corresponding F123 value is discarded from the mean calculation. This removes the
bias that formant values measured during unvoiced sections would otherwise introduce
into the mean estimate.

An example of the F2-F3 vowel space display is presented in Figure 9.

In Figures 8-9, the blue points adjacent to the filled red circles represent the average
values for the first and second halves of all analysed formants. These points and their
values can be useful for analysing intra-speaker variability.
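
The voiced-frame averaging and the first-half and second-half means can be sketched as below. The assumptions are that the F0 and formant tracks are aligned frame by frame, that an F0 of 0 marks a frame without an estimate, and that the halves are taken over the retained frames; the function names are hypothetical.

```python
def mean_point(frames):
    """Component-wise mean of a list of equal-length formant tuples."""
    count = len(frames)
    return tuple(sum(frame[i] for frame in frames) / count
                 for i in range(len(frames[0])))

def vowel_space_points(f0_track, formant_track):
    """Return (overall mean, first-half mean, second-half mean) over voiced frames.

    Assumption: frames with F0 == 0 have no F0 estimate and are discarded.
    """
    voiced = [fm for f0, fm in zip(f0_track, formant_track) if f0 > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    half = len(voiced) // 2
    return mean_point(voiced), mean_point(voiced[:half]), mean_point(voiced[half:])

f0_track = [0, 100, 110, 0, 120, 130]
formant_track = [(900, 900), (500, 1500), (520, 1520), (100, 100), (540, 1540), (560, 1560)]
print(vowel_space_points(f0_track, formant_track))
```

A large distance between the two half means suggests that the speaker's formants drifted during the sample, which is the intra-speaker variability the blue points are meant to reveal.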

Figure 8. F1-F2 vowel space display

Figure 9. F2-F3 vowel space display

Long term average spectrum

The long term average spectrum – Fast Fourier Transform (LTAS-FFT) plot
produced by Catalina is identical to the LTAS plot produced in Wavesurfer.

The LTAS-Histogram plot produced by Catalina shows, for each individual short-
term DFT plot, the number of appearances of each energy level in the spectrum.

These plots may be useful for comparing speech exemplars recorded with the same
level of background noise and the same speech transmission system. Any differences
between the compared LTAS plots can then be explained as resulting primarily from
characteristics of the vocal formants.

As explained earlier, the LTAS is the mean of successive short-term spectral analyses
computed over the duration of a given speech sample. Each short-time spectrum
(computed by means of the discrete Fourier transform) reflects the phonetic quality of
the current segment, but the LTAS analysis characterizes the overall spectral content
of the entire sample. The LTAS is influenced by the combined effect of the analyzed
speech, the background noise and other periodic background sounds, and the
frequency response of the transmission chain.

Figure 10. LTAS-FFT and LTAS-Histogram

Figure 11. LTAS-FFT, LTCF and LTAF plots

Recommendations

When comparing plots generated for different voice samples, such as
questioned and known exemplars, it is recommended that Catalina be used with:

- linear PCM, 8 kHz, 16 bit, mono WAV files, analyzed within
  Wavesurfer,
- known (reference, suspect) and unknown (questioned) exemplar recordings
  made as contemporaneously as is practically possible,
- known (reference, suspect) and unknown (questioned) recordings made with
  the same recording/transmission channel,
- normal/modal phonation samples,
- exemplar durations longer than 10 seconds,
- a speech signal to noise ratio (SNR) greater than 10 dB.
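
The duration and SNR recommendations can be checked with a rough calculation. The sketch below estimates the SNR from the RMS levels of a speech segment and a noise-only segment chosen by the examiner; this is a common approximation, not a method prescribed by Catalina, and the function names are hypothetical.

```python
import math

def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def snr_db(speech_segment, noise_segment):
    """Approximate SNR in dB from a speech segment and a noise-only segment.

    The speech segment still contains noise, so this slightly overestimates
    the true SNR when the two levels are close.
    """
    return 20 * math.log10(rms(speech_segment) / rms(noise_segment))

def meets_recommendations(duration_s, snr):
    """True when duration is over 10 s and SNR is over 10 dB."""
    return duration_s > 10 and snr > 10

print(meets_recommendations(20.0, snr_db([10, -10, 10, -10], [1, -1, 1, -1])))
```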

Users should note that some telephonic transmission systems or other recordings may
have high-pass filter characteristics (visible in the LTAS analysis) that can bias the
estimate of F1 to a higher frequency compared to what would be recorded for the
same voice, using a reference microphone and linear recording system.

Future Developments

Additional options, including a means of calculating a likelihood ratio, will be added
in future releases of the Catalina Forensic Audio Toolbox. Check the website
periodically for updates.

REFERENCES

Baldwin, J. and French, P. (1990) Forensic Phonetics, London: Pinter.
Byrne, C. and Foulkes, P. (2004) ‘The Mobile Phone Effect on Vowel Formants’, International Journal
of Speech, Language and the Law 11(1), 83-102.
Carlson, R., Fant, G., and Granström, B. (1975) ‘Two-formant models, pitch and vowel perception’,
in G. Fant and M.A.A. Tatham (eds), Auditory Analysis and Perception of Speech, London:
Academic, 55-82.
Gonzalez-Rodriguez, J., Ortega-Garcia, J. and Lucena-Molina, J.J. (2001) ‘On the application of the
Bayesian approach in real forensic conditions with GMM-based systems’, Proceedings of 2001:
A Speaker Odyssey - The Speaker Recognition Workshop, 135-138.
Grigoras, C. (2001) ‘Digital voice processing system’, unpublished PhD thesis, University of
Bucharest, Electric Department, Romania
Grigoras, C. (2003) ‘Voice analysis on noisy recordings’, Paper presented at Cambridge Forensic
Phonetics Workshop, August 2003, Cambridge, UK.
Hess, W. (1983) Pitch Determination of Speech Signals: Algorithms and Devices, Berlin: Springer-
Verlag.
Hollien, H. (1990) The Acoustics of Crime: the New Science of Forensic Phonetics, New York:
Plenum.
Hollien, H. (2000) Forensic Voice Identification, New York: Academic Press.
Jessen, M., Köster, O. and Gfroerer, S. (2005) ‘Influence of vocal effort on average and variability of
fundamental frequency’, International Journal of Speech, Language and the Law 12(2), 174-213.
Künzel, H.J. (2001) ‘Beware of the ‘telephone effect’: the influence of telephone transmission on
the measurement of formant frequencies’, Forensic Linguistics 8(1), 80-99.
Ladd, D.R. and Terken, J. (1995) ‘Modelling intra- and inter-speaker pitch range’, Proceedings of
the 13th International Congress of Phonetic Sciences, Stockholm, vol.2, 386-89.
Laver, J. (1980) The Phonetic Description of Voice Quality, Cambridge: Cambridge University
Press.
McDougall, K. (2004) ‘Speaker-specific formant dynamics: an experiment on Australian English
/aI/’, International Journal of Speech, Language and the Law 11(1), 103-130.
Meuwly, D. (2001) ‘Reconnaissance de locuteurs en sciences forensiques: l'apport d'une approche
automatique’, PhD thesis, University of Lausanne.
Nolan, F. (1983) The Phonetic Bases of Speaker Recognition, Cambridge: Cambridge University
Press.
Nolan, F. (1990) ‘The limitations of auditory phonetic speaker recognition’, in H. Kniffka (ed.),
Texte zu Theorie und Praxis forensischer Linguistik, Tübingen: Niemeyer, 457-479.
Nolan, F. (1993) ‘Auditory and acoustic analysis in speaker recognition’, in J. Gibbons (ed.),
Language and the Law, London: Longman, 326-345.
Nolan, F. (2002) ‘The “telephone effect” on formants: a response’, Forensic Linguistics 9(1), 74-82.
Nolan, F. (2005) ‘Forensic speaker identification and the phonetic description of voice quality’, in
W.J. Hardcastle and J. MacKenzie Beck (eds), A Figure of Speech: a Festschrift for John Laver,
Mahwah, N.J.: Erlbaum, 385-411.
Nolan, F. and Grigoras, C. (2005) ‘A case for formant analysis in forensic speaker identification’,
International Journal of Speech, Language and the Law 12(2), 143-173
Rabiner, L.R., Cheng, M.J., Rosenberg, A.E. and McGonegal, C.A. (1976) ‘A comparative study of
several pitch detection algorithms’, IEEE Transactions on Audio, Speech and Signal Processing
24, 399-413.
Repp, B. (1982) ‘Phonetic trading relations and context effects: new experimental evidence for a
speech mode of perception’, Psychological Bulletin 92, 81-110.
Rodman, R., McAllister, D., Bitzer, D., Cepeda, L. and Abbitt, P. (2002) ‘Forensic speaker
identification based on spectral moments’, Forensic Linguistics 9(1), 22-43.
Rose, P.J. (2002) Forensic Speaker Identification, London: Taylor and Francis.

Scherer, K. R. (1986). ‘Voice, stress, and emotion’, in M. H. Appley and R. Trumbull (eds),
Dynamics of Stress: Physiological, Psychological, and Social Perspectives, New York: Plenum,
159-181.
Stevens, K.N. (1989) ‘On the quantal nature of speech’, Journal of Phonetics 17, 3-45.
Wells, J. (1982) Accents of English, Cambridge: Cambridge University Press.

Appendix A - two samples analysis of the same speaker

Appendix B - samples analysis of two different speakers

Appendix C – a short (approx. 10 sec) voice sample analysis
