Version 3.0h
User’s Manual
Page 2 of 31
Contents
Interfacing Wavesurfer
Fundamental Frequency
Formants
Long Term Average Spectrum
Recommendations
Future Developments
References
Appendix A
Appendix B
Appendix C
Introducing Catalina Toolbox
Catalina Forensic Audio Toolbox is a software system for forensic audio analysis.
This version has been designed for Windows 98/2000/Me/NT/XP/Vista.
Historical Note
The first version of the Catalina Forensic Audio Toolbox ("Catalina") was developed
in 1993. At the time, PC speed and sound card quality were relatively low
compared to present-day equipment. The most important updates were written
while I completed my Ph.D. dissertation (1998-2001). The program evolved to use
an external software program to analyze the speech fundamental
frequency (F0), the formants F1-F3 (F123) and the long term average spectrum (LTAS). For the
current version I use Wavesurfer 1.8.5, developed at the KTH Institute in Stockholm
by Kåre Sjölander and Jonas Beskow. More details about this software can be found
at http://www.speech.kth.se/wavesurfer/. In the chapter on "Interfacing Wavesurfer"
I explain the use of this software with Catalina.
History
- v3.0h (2007): added indications for intra-speaker variability,
- v3.0c (2006): first stand-alone version,
- v3.0b (2005): added information on individual vowels,
- v3.0a (2004): added information on long term cumulative formant distribution,
- v3.0 (2003): added information on long term average formants analysis,
- v2.0c (2002): added information on long term average spectrum histogram,
- v2.0 (1998-2001): my Ph.D. thesis, second major version of Catalina,
- v1.0 (1992): first version of Catalina, a Matlab toolbox.
System Requirements
1. A PC running Windows 98 SE, Windows XP or Windows NT
2. A computer having a CPU of at least 133 MHz
3. A copy of Wavesurfer, version 1.8.5 or later
Special Thanks
I wish to thank the International Association for Forensic Phonetics and Acoustics (IAFPA)
for a grant to finish this latest version of the software. I also appreciate the very
important help of Professor Francis Nolan from Cambridge University, Professor
Brandusa Pantelimon from Bucharest University and Durand R. Begault, Ph.D.,
Audio Forensic Centre, Charles M. Salter Associates, Inc., San Francisco, CA, USA.
I am grateful to the Cambridge Colleges Hospitality Scheme for making possible my
visit to Cambridge in summer 2003.
Installation
Run CatalinaSetup.exe. By default the program is installed in C:\Catalina and a
shortcut is placed on the Desktop.
Getting Help
For further details, you can contact the author directly at forensicav@techemail.com.
Please put "Catalina Toolbox" in the e-mail subject line.
Interfacing Wavesurfer
Catalina depends on the fundamental frequency, formant and long term average
spectrum analysis capabilities of Wavesurfer. Other programs that can provide
exported text versions of these analyses can also be used, but the demonstrations
given here use Wavesurfer. Catalina expects a specific naming format for the
analyses exported from Wavesurfer to the 'Evidence' folder; it is explained in detail below.
Run Wavesurfer and open a WAV file (PCM, 8 kHz, 16 bit, mono recommended).
You should get a window like the following one (see Fig. 1). Select the Speech
analysis configuration.
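The recommended file format can be checked before the file is opened in Wavesurfer. The following is a minimal sketch using Python's standard `wave` module; the function name `check_wav_format` is hypothetical and is not part of Catalina or Wavesurfer.

```python
import wave

def check_wav_format(path):
    """Return a list of mismatches against the recommended format
    (PCM, 8 kHz, 16 bit, mono). An empty list means the file matches."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != 8000:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected 8000 Hz")
        if wav.getsampwidth() != 2:  # sample width is reported in bytes
            problems.append(f"sample width is {8 * wav.getsampwidth()} bit, expected 16 bit")
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected mono")
    return problems
```

A file that does not match can still be analyzed; the format is a recommendation, so the function reports mismatches rather than rejecting the file.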
The Wavesurfer display will show 3 plots: waveform, spectrogram with formant
estimator tracking overlay, and fundamental frequency (see figure 2).
[Figure 2. Wavesurfer display; panel labels: waveform, fundamental frequency]
Fundamental frequency
Figure 3. F0 selection
You may create an F0 text file using the following steps: (1) select the entire wave by
pressing F11 or select all with Ctrl+A, (2) right click on the F0 plot, (3) save the data file as
C:\Catalina\Evidence\filename.f0
For example, the file test.wav corresponds to the file test.f0.
Formants
In Wavesurfer, formant analysis can be carried out using linear prediction. By
default Wavesurfer uses a 12th-order LPC algorithm. (Refer to the Wavesurfer
manual for details about formant settings.)
Catalina requires a text file containing data for formants F1, F2 and F3. You will need to
create an F123 text file using the following steps: (1) select the entire wave by
pressing F11 or select all with Ctrl+A, (2) right click on the formants plot, (3) export the
formant data file as filename.frm
For example, the file test.wav corresponds to the file test.frm.
Long Term Average Spectrum
The long term average spectrum (LTAS) is the mean of successive short-term
spectral analyses computed over the duration of a given speech sample. Each
short-term spectrum (computed by means of the discrete Fourier transform) reflects the
phonetic quality of the current segment, whereas the LTAS characterizes the
overall spectral content of the entire sample. The LTAS is influenced by the
combined effect of the analyzed speech, the background noise, the equipment noise
and the frequency response of the transmission chain.
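The averaging described above can be sketched in a few lines of Python. This is an illustration only, not Wavesurfer's implementation: it assumes non-overlapping frames, no window function and averaging of power values, and the frame length of 64 samples is an arbitrary choice made for the example. The function names are hypothetical.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Magnitude of the DFT of one frame (non-negative frequencies only)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]

def ltas(samples, frame_len=64):
    """Mean of the short-term power spectra of consecutive frames."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    if not frames:
        raise ValueError("signal is shorter than one frame")
    totals = [0.0] * (frame_len // 2 + 1)
    for frame in frames:
        for k, mag in enumerate(magnitude_spectrum(frame)):
            totals[k] += mag ** 2  # accumulate power per frequency bin
    return [total / len(frames) for total in totals]
```

Each element of the result is the average power in one frequency bin; bin k corresponds to the frequency k × (sample rate) / frame_len.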
Right click on Waveform and select LTAS. Make certain that the entire waveform
has been selected and that 'average of selection' has been chosen. The following plot
will be displayed.
You now need to create an LTAS text file by clicking the 'export' option and saving it
as filename.lts. For example, the file test.wav corresponds to the file test.lts.
Catalina Forensic Audio Toolbox
Catalina Forensic Audio Toolbox allows an examiner to compute statistics and create
TIFF files containing text information and distribution plots for the data files
exported from Wavesurfer or an equivalent software program.
General Plots
Create or copy the filename.f0, filename.frm and filename.lts files to the
C:\Catalina\Evidence folder. Run Catalina from the desktop icon or from
C:\Catalina\bin\win32\Catalina3x.exe and select a file from the C:\Catalina\Evidence
folder. Catalina will ask for the name of the F0 text file and will then search the same
'Evidence' folder for this file, along with the similarly named frm and lts export files.
The program then writes plot files to C:\Catalina\Plots using the
same naming convention.
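The naming convention can be illustrated with a short Python sketch that derives the three export file names Catalina looks for from the name of the F0 file. The helper `companion_files` is hypothetical and only mirrors the convention described above; it does not reproduce Catalina's own lookup code.

```python
from pathlib import PureWindowsPath

def companion_files(f0_path):
    """Map each expected export type to its path, given the F0 file path.
    All three files share the base name and differ only in extension."""
    f0 = PureWindowsPath(f0_path)
    return {ext: str(f0.with_suffix("." + ext)) for ext in ("f0", "frm", "lts")}
```

For the file C:\Catalina\Evidence\test.f0 this gives test.f0, test.frm and test.lts in the same folder. `PureWindowsPath` is used so that the sketch handles Windows paths on any operating system.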
As an example to demonstrate the program, select the included file test20sec. The
software will compute statistics and create TIFF files stored in
C:\Catalina\Plots. Check the resulting TIFF files in C:\Catalina\Plots:
01-test20sec.tif    voice profile containing F0, LTAS and F123 histogram plots
02-test20sec.tif    LTAF and LTCF
03-test20sec.tif    all F1 vs F2 formant space
04-test20sec.tif    all F2 vs F3 formant space
05-test20sec.tif    F1 vs F2 formant space
06-test20sec.tif    F2 vs F3 formant space
07-test20sec.tif    LTAS
08-test20sec.tif    LTAS, LTCF and LTAF
Figure 5. General plot obtained for the file test180sec.
Plots of the long term average formants (LTAF) and the long term cumulative formants
(LTCF) are displayed in Figure 6. The LTCF is the vertical sum of all LTAF curves.
Note that the LTCF and LTAF outlines follow the same contours as the histograms
seen in the lower plot of Figure 5.
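One reading of the relation between LTAF and LTCF is sketched below: each LTAF curve is treated as a histogram of one formant track over frequency bins, and the LTCF as the bin-by-bin sum of those histograms. The 50 Hz bin width, the 4000 Hz upper limit and the function names are assumptions made for the example; Catalina's actual binning is not documented here.

```python
def ltaf(values_hz, bin_hz=50, max_hz=4000):
    """Histogram of one formant track: number of frames in each frequency bin."""
    counts = [0] * (max_hz // bin_hz)
    for value in values_hz:
        if 0 <= value < max_hz:
            counts[int(value // bin_hz)] += 1
    return counts

def ltcf(ltaf_curves):
    """Bin-by-bin sum of the per-formant LTAF histograms."""
    return [sum(bins) for bins in zip(*ltaf_curves)]
```

With this reading, a frequency region where two formant tracks overlap shows a higher LTCF count than either LTAF curve alone.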
Figure 6. Long term formants
Formant Space
[Annotated listing: six values giving the low and high limits for F1, the low and high limits for F2, and the low and high limits for F3]
Figure 7. F1-F2 and F2-F3 space display
These values are those given in various references for different languages. Other
references may be used to determine the vowel limits for a specific language, or
vowel limits can be determined by inspecting formant values for a specific set of
speakers.
An example of F1-F2 vowel space display is presented in figure 8. Filled red circles
indicate the mean of the supplied values from the F123 file at those times when a
corresponding F0 value has been indicated for that specific time frame. When there is
no estimate for an F0 time frame, the corresponding F123 value is discarded from the
mean calculation. This removes any bias from the mean estimate that would be
caused by formant values analyzed during unvoiced sections.
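The voiced-frame filtering described above can be sketched as follows. The sketch assumes that the F0 and formant tracks have one entry per time frame at the same frame rate, and that frames without an F0 estimate are marked with 0; both points should be verified against the actual export files. The function name is hypothetical.

```python
def voiced_formant_mean(f0, formants):
    """Mean (F1, F2, F3) over the frames that have an F0 estimate.

    f0       -- per-frame F0 values; 0 is assumed to mean "no estimate"
    formants -- per-frame (F1, F2, F3) tuples, aligned with f0
    """
    voiced = [frame for pitch, frame in zip(f0, formants) if pitch > 0]
    if not voiced:
        raise ValueError("no voiced frames in the sample")
    return tuple(sum(column) / len(voiced) for column in zip(*voiced))
```

Formant values from unvoiced frames never enter the sum, so they cannot bias the mean.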
In figures 8-9, the blue points adjacent to the filled red circles represent the average
values for the first and second halves of all analysed formants. These dots and their
values can be useful for analysing intra-speaker variability.
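A minimal sketch of the half-sample averages follows. It assumes that "first and second halves" means the first and second half of the analysed frames in time order, and that the frames passed in have already been restricted to those with an F0 estimate. The function name is hypothetical.

```python
def half_means(frames):
    """Mean formant values for the first and second half of the frames.

    frames -- per-frame tuples of formant values, in time order
    Returns (means_of_first_half, means_of_second_half).
    """
    if len(frames) < 2:
        raise ValueError("at least two frames are needed")
    middle = len(frames) // 2
    halves = (frames[:middle], frames[middle:])
    return tuple(
        tuple(sum(column) / len(half) for column in zip(*half))
        for half in halves
    )
```

A large distance between the two returned points suggests that the speaker's formant averages drifted within the sample.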
Figure 9. F2-F3 vowel space display
The long term average spectrum – Fast Fourier Transform (LTAS-FFT) plot
produced by Catalina is identical to the LTAS plot produced in Wavesurfer.
The LTAS-Histogram plot produced by Catalina shows, for each individual short-
term DFT plot, the number of appearances of each energy level in the spectrum.
These plots may be useful in comparing speech exemplars where the same
level of background noise and the same speech transmission system are present. Any differences
in the compared LTAS plots can then be explained as resulting primarily from
characteristics of vocal formants.
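The counting behind the LTAS-Histogram can be sketched as below. This is one possible reading of the description, not Catalina's documented method: the short-term spectra are assumed to be given in dB, levels are quantised in 5 dB steps (an arbitrary choice), and the counts are kept separately for each frequency bin. The function name is hypothetical.

```python
from collections import Counter

def ltas_histogram(spectra_db, step_db=5):
    """For each frequency bin, count how often each quantised level occurs.

    spectra_db -- list of short-term spectra, each a list of levels in dB
    Returns one Counter per frequency bin, mapping level to frame count.
    """
    if not spectra_db:
        raise ValueError("no spectra given")
    histogram = [Counter() for _ in spectra_db[0]]
    for spectrum in spectra_db:
        for k, level in enumerate(spectrum):
            histogram[k][step_db * round(level / step_db)] += 1
    return histogram
```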
Figure 10. LTAS-FFT and LTAS-Histogram
Recommendations
For comparison between plots generated for different voice samples such as
questioned and known exemplars, it is recommended that Catalina be used with:
- linear PCM, 8 kHz, 16 bits, mono recorded wav files, analyzed within
Wavesurfer,
- known (reference, suspect) and unknown (questioned) exemplar recordings
made as contemporaneously as is practically possible,
- known (reference, suspect) and unknown (questioned) recordings made with
the same recording/transmission channel,
- normal/modal phonation samples,
- exemplar durations longer than 10 seconds,
- speech signal to noise ratio (SNR) greater than 10 dB.
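The duration and SNR thresholds in the list above can be checked with a short Python sketch. The SNR is computed with the standard definition 10·log10(P_speech / P_noise); how the two powers are measured (for example, mean squared amplitude over speech segments and over pauses) is left to the examiner and is an assumption of this example, as are the function names.

```python
import math

def snr_db(speech_power, noise_power):
    """Signal to noise ratio in dB from two power estimates."""
    return 10 * math.log10(speech_power / noise_power)

def meets_recommendations(duration_s, speech_power, noise_power):
    """True if the exemplar is longer than 10 s and its SNR exceeds 10 dB."""
    return duration_s > 10 and snr_db(speech_power, noise_power) > 10
```

For example, a speech power 100 times the noise power corresponds to an SNR of 20 dB, which satisfies the recommendation.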
Users should note that some telephonic transmission systems or other recordings may
have high-pass filter characteristics (visible in the LTAS analysis) that can bias the
estimate of F1 to a higher frequency compared to what would be recorded for the
same voice, using a reference microphone and linear recording system.
Future Developments
Additional options, including a means for calculating a likelihood ratio, will be added in
future releases of the Catalina Forensic Audio Toolbox. Check the website
periodically for updates.
REFERENCES
Scherer, K.R. (1986) ‘Voice, stress, and emotion’, in M.H. Appley and R. Trumbull (eds), Dynamics of Stress: Physiological, Psychological, and Social Perspectives, New York: Plenum, 159-181.
Stevens, K.N. (1989) ‘On the quantal nature of speech’, Journal of Phonetics 17, 3-45.
Wells, J. (1982) Accents of English, Cambridge: Cambridge University Press.
Appendix A - analysis of two samples from the same speaker
Appendix B - analysis of samples from two different speakers
Appendix C - analysis of a short (approx. 10 sec) voice sample