

USING STRANGE ATTRACTORS
TO MODEL SOUND
Submitted to
The University of London
for the Degree of
Doctor of Philosophy
Jonathan Mackenzie
King's College
April 1994
Abstract
This thesis investigates the possibility of applying nonlinear dynamical systems
theory to the problem of modelling sound with a computer. The particular interest is in
the creative use of sound, where its representation, generation and manipulation are
important issues. A specific application, for example, is the modelling of
environmental sound for film sound-tracks.
Recently, there have been a number of major advances in the field of nonlinear
dynamical systems which include chaos theory and fractal geometry. It is argued that
these provide a rich source of ideas and techniques relevant to the issues of modelling
sound. One such idea is that complex behaviour may be generated from simple
systems. Such behaviour can often replicate a wide range of natural phenomena, or is
of interest in its own right because of its aesthetic appeal. Such behaviour has often
been demonstrated through computer-generated images, and so an equivalent is sought
in the audio domain. This work is believed to be the first substantial attempt to do so.
The investigation begins with a consideration of fractal and chaotic properties of
sound and with a comparison between established approaches to modelling and the
alternatives suggested by the new theory. Then, the inquiry concentrates on strange
attractors, which are the mathematical objects central to chaos theory, and on two
ways in which they may be used to model sound.
The first of these involves using static fractal functions to represent sound time
series. A technique is developed for synthesising complex abstract sounds from a
small number of parameters. A class of these sounds has the novel property that each
is simultaneously a rhythm and a timbre. It is believed these have potential for use in
computer music composition. Also considered is the problem of modelling a given
time series with a fractal function. An algorithm for doing this is taken from the
literature, shown to be of limited ability, and then improved. The results indicate that
data compression may be achieved for certain types of sound.
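The fractal-function idea can be illustrated with a short sketch. This is not the specific algorithm developed in the thesis, but the standard construction of a fractal interpolation function rendered with the random iteration ("chaos game") method; the interpolation points and vertical scaling factors below are hypothetical values chosen purely for illustration.

```python
import random

def fif_maps(pts, d):
    """Affine maps w_i(x, y) = (a_i x + e_i, c_i x + d_i y + f_i) whose
    attractor is the fractal interpolation function through pts, with
    vertical scaling factors d[i]."""
    (x0, y0), (xN, yN) = pts[0], pts[-1]
    span = xN - x0
    maps = []
    for i in range(1, len(pts)):
        (xa, ya), (xb, yb) = pts[i - 1], pts[i]
        di = d[i - 1]
        # Endpoint conditions: w_i maps (x0, y0) -> (xa, ya), (xN, yN) -> (xb, yb)
        a = (xb - xa) / span
        e = (xN * xa - x0 * xb) / span
        c = (yb - ya - di * (yN - y0)) / span
        f = (xN * ya - x0 * yb - di * (xN * y0 - x0 * yN)) / span
        maps.append((a, e, c, di, f))
    return maps

def fif_chaos_game(pts, d, n=20000, seed=0):
    """Random-iteration rendering: repeatedly apply a randomly chosen map
    and record the orbit, which settles onto the FIF attractor."""
    rng = random.Random(seed)
    maps = fif_maps(pts, d)
    x, y = pts[0]          # an interpolation point lies on the attractor
    out = []
    for _ in range(n):
        a, e, c, di, f = rng.choice(maps)
        x, y = a * x + e, c * x + di * y + f
        out.append((x, y))
    return out

# Hypothetical interpolation points and vertical scaling factors
points = [(0.0, 0.0), (0.5, 1.0), (1.0, 0.0)]
scales = [0.3, -0.3]
attractor = fif_chaos_game(points, scales)
```

Sorting the resulting points by x and reading off y gives a waveform controlled entirely by the few interpolation points and scaling factors, which is the sense in which complex sounds arise from a small parameter set.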
The second approach focuses on modelling the dynamics of a sound via the
embedded reconstruction of an attractor from a time series. Two models are presented,
one deterministic, the other stochastic. It is demonstrated that with the first of these,
certain sounds may be modelled such that their perceived qualities are preserved. For
some other signals, although the sound is not so well preserved, many statistical
aspects are. The second model is shown to provide a solution to the film sound-track
problem.
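The reconstruction step underlying this second approach is the method of delays (Takens-style embedding), in which a scalar time series is unfolded into state vectors. The sketch below is a minimal illustration, not the models of the thesis itself; the embedding dimension, lag, and sinusoidal test signal are illustrative assumptions.

```python
import math

def delay_embed(series, m, tau):
    """Method of delays: build m-dimensional state vectors
    v_n = (s_n, s_{n+tau}, ..., s_{n+(m-1)*tau}) from a scalar series."""
    last = len(series) - (m - 1) * tau
    return [tuple(series[n + k * tau] for k in range(m)) for n in range(last)]

# A short sinusoid standing in for a sampled sound (illustrative only)
s = [math.sin(0.2 * n) for n in range(100)]
vectors = delay_embed(s, m=3, tau=10)
```

A predictive model fitted to these reconstructed state vectors can then be iterated to generate new signal, which is the basis of the deterministic and stochastic models described above.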
It is concluded that this investigation shows strange attractors to have considerable
potential as a basis for modelling sound and that there are many areas for continued
research.
To
Valerie Duff
Acknowledgements
I would very much like to thank my supervisor, Dr. Mark Sandler, for encouraging
me to begin this research project, for finding the funding for it, and for everything he
has done towards making it such a stimulating and enjoyable experience. I am also
indebted to Solid State Logic for providing the sponsorship and to Chris Jenkins for
arranging it. I doubt whether I would have had the opportunity to pursue the project of
my choice otherwise.
I am enormously grateful to my colleagues at King's College who have always
been helpful, supportive and inspiring. These include Maaruf Ali, Julian Bean, Victor
Bocharov, Rob Bowman, Ian Clark, Chris Dunn, Jason Goldberg, Anthony Hare, Rod
Hiorns, Simon Kershaw, Panos Kudumakis, Anthony Macgrath, Phillipa Parmiter,
Allan Paul, Marc Price, Mark Townsend, Mike Waters, and Jie Yu.
For sharing their knowledge and for always being helpful I would like to thank
Dr. Bill Chambers, Prof. Tony Davies, and Dr. Luke Hodgkin. I am also deeply
grateful to Peter King, Mustaq Mohammed and Talat Malik for their generous
technical support.
Finally, special thanks to Val, my family and friends for their support, enthusiasm,
patience and inspiration and for knowing never to ask "when are you going to finish?"
Contents
Abstract ............................................................................................................... 2
Acknowledgements ....................................................................................................... 4
Contents ............................................................................................................... 5
List of Figures ............................................................................................................... 8
List of Tables ............................................................................................................. 14
List of Sound Examples .............................................................................................. 16
List of Acronyms......................................................................................................... 19
1. Introduction........................................................................................20
2. Modelling Sound.................................................................................24
2.1. Sound and its Representation................................................................... 24
2.2. Music Composition ................................................................................... 25
2.3. The Roomtone Problem........................................................................... 26
2.4. Digital Audio............................................................................................ 27
2.5. The Modelling Framework....................................................................... 28
2.6. Conventional Models ............................................................................... 29
2.6.1. Physical Modelling.................................................................... 29
2.6.2. Additive and Subtractive Synthesis........................................... 29
2.6.3. Frequency Modulation and Waveshaping................................. 32
2.7. Summary .................................................................................................. 33
3. Chaos Theory and Fractal Geometry ..............................................34
3.1. Introduction.............................................................................................. 34
3.2. The Significance of Chaos ....................................................................... 35
3.3. Dynamical Systems and State Space........................................................ 36
3.4. Stability.................................................................................................... 37
3.5. Attractors.................................................................................................. 39
3.6. Chaos........................................................................................................ 40
3.7. Visualisation............................................................................................. 42
3.8. Bifurcation................................................................................................ 44
3.9. Statistical Descriptions of Dynamics ....................................................... 47
3.10. Fractal Geometry.................................................................................... 48
3.11. Iterated Function Systems ...................................................................... 53
3.11.1. Contraction Mappings............................................................. 54
3.11.2. The Random Iteration Algorithm............................................ 56
3.11.3. The Shift Dynamical System................................................... 58
3.11.4. The Collage Theorem.............................................................. 59
3.11.5. The Continuous Dependence of the Attractor on the IFS
Parameters.............................................................................. 60
3.12. Summary ................................................................................................ 60
4. Applying Chaos and Fractals to the Problem of Modelling Sound.........62
4.1. The Reasons for Using Chaos Theory................................................. 62
4.2. Diagnosis of Chaotic Behaviour ......................................................... 64
4.2.1. Chaos and Woodwind Instruments ........................................... 65
4.2.2. Chaos and Gongs....................................................................... 66
4.2.3. Fractal Time Waveforms........................................................... 66
4.2.4. 1/f Noise.................................................................................... 67
4.3. Representing Sound Using Chaos and Fractals................................... 71
4.4. Summary ............................................................................................. 73
5. Fractal Interpolation Functions........................................................75
5.1. Theory ................................................................................................. 75
5.2. The Synthesis Algorithm..................................................................... 78
5.3. Experiments with the Synthesis Algorithm......................................... 80
5.4. Rhythm/Timbres ................................................................................. 85
5.5. Generating Time-Varying FIF Sounds................................................ 87
5.6. A Genetic Parameter Control Interface............................................... 90
5.6.1. Implementation ......................................................................... 91
5.6.2. Experiments .............................................................................. 95
5.7. Conclusions....................................................................................... 101
6. Modelling Sound with FIFs.............................................................103
6.1. Deriving Interpolation Points from Naturally Occurring Sound Waveforms ........... 103
6.2. Mazel's Time Series Models ............................................................. 107
6.3. Comparison with Requantisation...................................................... 109
6.4. Mazel's Inverse Algorithm for the Self-Affine Model ...................... 114
6.4.1. Initial Results .......................................................................... 118
6.4.2. Error Weighting ...................................................................... 121
6.4.3. Interpolation Point Range Restriction..................................... 124
6.5. Conclusions....................................................................................... 128
7. Chaotic Predictive Modelling..........................................................131
7.1. Chaotic Time Series .......................................................................... 131
7.2. Embedding ........................................................................................ 133
7.3. The Analysis/Synthesis Model.......................................................... 135
7.4. The Inverse Problem......................................................................... 138
7.5. A Solution to the Inverse Problem................................................... 140
7.6. Experimental Technique ................................................................... 143
7.7. Experiments with a Lorenz Time Series ........................................... 148
7.8. Experiments with Sound Time Series............................................... 155
7.8.1. Air Noises................................................................................ 155
7.8.2. Gong Sounds ........................................................................... 162
7.8.3. Musical Tones ......................................................................... 164
7.9. Conclusions....................................................................................... 167
7.10. Further Work..................................................................................... 172
7.10.1. Using the Same Model with More Sounds ........................... 172
7.10.2. Optimising the Synthetic Mapping ....................................... 173
7.10.3. Stability Analysis .................................................................. 174
7.10.4. Connections with IFS............................................................ 174
7.10.5. Time Varying Sounds............................................................ 177
8. The Poetry Generation Algorithm..................................................178
8.1. Introduction....................................................................................... 178
8.2. Description of the Algorithm............................................................ 179
8.3. Analysis of the PGA.......................................................................... 184
8.4. Implementation of the PGA for Sound ............................................. 187
8.5. Results............................................................................................... 191
8.6. Conclusions....................................................................................... 197
9. Summary and Conclusions..............................................................200
Appendix A. Previously Published Work..........................................209
AES Preprint ................................................................................................. 210
ISCAS '94...................................................................................................... 221
References ............................................................................................225
List of Figures
Figure 1.1 A synthetic cloud, fern and a Julia set [frac90]. ........................................ 20
Figure 2.1 The analysis-synthesis scheme. ............................................................... 25
Figure 2.2 The sound modelling framework. ............................................................ 28
Figure 2.3 A schematic diagram for additive synthesis. ........................................... 30
Figure 2.4 Karplus-Strong algorithm. Top, simplified recursive linear filter and
bottom, general delay-line view. ................................................................................. 31
Figure 2.5 The basic units used within the FM (top) and waveshaping (bottom)
synthesis techniques. ................................................................................................... 32
Figure 3.1 State space representation of a dynamical system. .................................. 37
Figure 3.2 Illustration of the three regular attractor types. ....................................... 40
Figure 3.3 Sequence of magnifications of the Lorenz attractor showing its fractal,
self-similar property. ................................................................................................... 42
Figure 3.4 Two simulations of the Lorenz system for similar initial conditions
showing sensitive dependence on initial conditions. .................................................. 42
Figure 3.5 Three phase portraits constructed from a time series of observations of
the Lorenz chaotic system. Delay values are: (a) 1, (b) 10, (c) 100. ........................... 43
Figure 3.6 The logistic mapping for λ = 0.9. ............................................................. 45
Figure 3.7 Bifurcation diagram for the logistic mapping with corresponding time
series plots. .................................................................................................................. 46
Figure 3.8 The exactly self-similar, triadic Koch curve. ........................................... 49
Figure 3.9 General formula for similarity dimension derived by inspection of
standard Euclidean shapes. .......................................................................................... 50
Figure 3.10 Iterative construction of the triadic Koch curve. .................................... 52
Figure 3.11 Area of closed Koch curve (dark grey) is within area of circle (light
grey) showing that it is finite. ...................................................................................... 52
Figure 3.12 Three affine contraction mappings on X = R² and their single
combination, W. .......................................................................................................... 55
Figure 3.13 The repeated application of a contractive mapping, W, to some initial
set B, tending to the limit set, or attractor, A. ............................................................. 55
Figure 3.14 Example of the Random Iteration Algorithm (RIA) in operation. The
three images show the results of iterating the Markov process, (a)~100, (b)~300,
(c)~1000 times. ........................................................................................................... 57
Figure 3.15 Examples of RIA attractors where the mappings are weighted with
different associated probabilities. ................................................................................ 58
Figure 3.16 Example of an IFS attractor partitioned into three disjoint subsets
according to the effect of the three individual contraction mappings on the attractor.
..................................................................................................................................... 59
Figure 4.1 Bifurcation diagram showing a Hopf bifurcation occurring at the
threshold of oscillation in a wind instrument as the blowing pressure is increased. ... 65
Figure 4.2 Time series plots and spectral density forms for 1/f noise compared with
white noise and Brown noise. ..................................................................................... 69
Figure 4.3 Power spectral densities of wind noise (left) and an industrial roomtone
(right) showing 1/f characteristic over the audible range of frequencies. ................... 70
Figure 4.4 A demonstration of the property of continuous dependence of IFS
attractors on the parameters that define them. This also illustrates the power of
manipulation possible with chaotic models [frac90]. ................................................. 73
Figure 5.1 An example of the effect of three shear maps, w1, w2 and w3, on the area
A and an illustration of one of the vertical scaling factors, d1. ................................... 77
Figure 5.2 The initial arbitrary set, B, and a sequence of five iterations of the
deterministic algorithm. ............................................................................................... 81
Figure 5.3 FIF for equally spaced interpolation points derived from a single cycle of
a sinewave, but where the vertical scaling factors increase for the mappings from left
to right. ........................................................................................................................ 82
Figure 5.4 FIF where x values are spaced according to a square law. Sequence of
magnifications of windows is shown in (a)-(d). .......................................................... 83
Figure 5.5 Same interpolation points as Figure 5.4, but with 6 iterations showing the
cumulative effect of errors in the algorithm. The bottom plot is a magnification of the
middle ~1000 points of the top plot. ........................................................................... 84
Figure 5.6 FIF generated from random x, y and d values for the interpolation points.
..................................................................................................................................... 84
Figure 5.7 (a) (left) FIF generated with random y values, but evenly spaced x. All
d = 0.9. (b) (right) FIF generated with random y, but square law x values. All d = 0.9.
..................................................................................................................................... 85
Figure 5.8 - see Table 5.1. .......................................................................................... 86
Figure 5.9 Development of two rhythm/timbres from rhythmic design, top, through
interpolation points, middle, to final waveform, bottom. ............................................ 87
Figure 5.10 Control rule for time-varying FIF sound. Left, pseudocode, where
(x_i^j, y_i^j) is the ith interpolation point of the jth FIF and d_i^j is the vertical
scaling factor for the ith map of the jth FIF. Right, graphical depiction of the effect on
the interpolation points through time. ......................................................................... 88
Figure 5.11 Left, time plot of the whole waveform generated with the control rule
shown in Figure 5.10, with selected magnifications of individual FIFs to show how
the sound develops through time. Right, spectrogram of the first half of the sound
showing how it contains complex, time varying partials similar to those found in
naturally occurring musical sounds. ............................................................................ 89
Figure 5.12 Pictorial representation of the FIF parameter control used to generate
the second example of a time-varying FIF sound. ...................................................... 90
Figure 5.13 Schematic diagram of the model for biological evolution. .................... 92
Figure 5.14 Schematic diagram of hardware used for the GEN program. ................. 92
Figure 5.15 Example of mutation, (a), and recombination, (b), of FIF parameters. .. 94
Figure 5.16 A single screen-shot from the program GEN. ........................................ 96
Figure 5.17 A sequence of populations generated with the program GEN. In this
case, the FIFs are produced from 6 interpolation points. At the start (waveform A -
top left) all interpolation points and vertical scaling factors are zeroed. At each stage,
7 mutations are produced and then a single survivor is chosen by the operator (starred
waveform), which reappears as waveform A in the next generation. ......................... 98
Figure 5.18 Starting point (top left) and sequence of starred waveforms from Figure
5.17 shown in more detail. .......................................................................................... 99
Figure 5.19 Mutated variants of an FIF that is defined by a relatively large number
of parameters. It can be seen (and heard) that when this is the case, low factor
mutations are found not to be distinctive from one another. .................................... 100
Figure 6.1 Results of an experiment to extract interpolation points by decimating a
wind sound waveform and then constructing an FIF with them. .............................. 103
Figure 6.2 Original wind sound waveform (top), interpolation of peak points
(bottom left), and reconstructed waveform (bottom right). ...................................... 105
Figure 6.3 Section of original wind sound (left) and part of the composite FIF
(right) constructed using groups of peak points. ....................................................... 106
Figure 6.4 Mapping of amplitudes in the requantisation process. ........................... 110
Figure 6.5 Degradation against compression performance of Mazel's inverse
algorithms for a variety of data and model types compared with the theoretically
expected performance of requantisation. .................................................................. 113
Figure 6.6 First trial pair of interpolation points on the original time series graph.
................................................................................................................................... 115
Figure 6.7 Mapping of the whole time series into the interval between the first pair
of interpolation points. .............................................................................................. 115
Figure 6.8 Maximum vertical extent of part of the original time series between a
pair of consecutive interpolation points and the maximum vertical extent of the
mapped original time series. The vertical scaling factor is calculated so as to make
these two extents equal. ............................................................................................. 117
Figure 6.9 Error weighting function parameterised by a single weighting factor. .. 122
Figure 6.10 Graph of the results shown in Table 6.9. .............................................. 123
Figure 6.11 Comparison of performance between requantisation and the error-
weighted version of Mazel's algorithm. The original is 1000 samples of wind noise
which is processed as 10x100 sample sections. ........................................................ 124
Figure 6.12 Comparison of performance of the window restricted inverse algorithm
with that of requantisation. The original time series is wind noise and is processed as
10x100 sample sections. ........................................................................................... 126
Figure 6.13 Waveform plot of original wind noise (left) and compressed FIF
version (right) using the modified inverse algorithm. The compression ratio in this
case is 8.1:1, and the SNR is 22.6dB. ....................................................................... 127
Figure 6.14 Column chart showing the performance figures given in Table 6.11 for
a variety of different original sound time series. ....................................................... 128
Figure 7.1 The proposed analysis/synthesis model based upon the embedded
attractor and measure representation of a sound time series. ................................... 136
Figure 7.2 Left, an example recursive partition for m=2 and right, the associated
search tree. ................................................................................................................ 142
Figure 7.3 Lorenz input, N=10,000, Q=256 and a variety of embedding dimensions,
m. ............................................................................................................................... 149
Figure 7.4 Lorenz input, N=10,000, m=7, and a variety of numbers of domains, Q.
................................................................................................................................... 150
Figure 7.5 Lorenz input, Q=64, m=7 and a variety of original time series lengths, N.
................................................................................................................................... 151
Figure 7.6 Time series plots from original Lorenz system (left) and the synthetic
one shown as phase portrait in Figure 7.4(f) (right). ................................................ 152
Figure 7.7 Estimates of amplitude probability distributions for original, left, and
synthetic, right, time series shown in Figure 7.6. ..................................................... 153
Figure 7.8 Time series plots and phase portraits for: left, original fan rumble sound
and right, best synthetic output, rc127. ..................................................................... 157
Figure 7.9 Time series plots and phase portraits for some more outputs from the
sound model using the fan rumble as input. Note that, for the sake of clarity, only
about a third of the output length shown in the time series plots appears in the phase
portraits. .................................................................................................................... 159
Figure 7.10 Time series plots (first fifth of top plot shown magnified as second
plot), power spectra and phase portraits for original wind noise, left, and synthetic
version, right. ............................................................................................................ 161
Figure 7.11 Time series plots, phase portraits and amplitude histograms for original,
left, and synthetic, right, lightly-struck gong sound. Both amplitude histograms were
computed with 10,000 samples and 100 bins. .......................................................... 163
Figure 7.12 Time series plots, phase portraits and amplitude histograms for original,
left, and synthetic, right, hard-struck gong sound. Both amplitude histograms were
computed with 10,000 samples and 100 bins. .......................................................... 164
Figure 7.13 Time series plots, power spectra and phase portraits for original, left,
and synthetic, right, tuba tones. ................................................................................ 166
Figure 7.14 Time series and phase portraits for original, left, and synthetic, right,
saxophone tones. ....................................................................................................... 166
Figure 7.15 Relative one-step prediction errors for the best results found for each of
the time series. .......................................................................................................... 168
Figure 7.16 Autocorrelation functions for original, left, and synthetic, right, gently
struck gong sound. The upper plot shows the function up to 8,000 delays, and the
lower up to 100 delays. Both were calculated by convolving 10,000 samples of the
time series with itself for different delays. ................................................................ 171
Figure 8.1 The top line shows the interdependence of the components of the RIA
version of an IFS. The bottom line shows a suggested path to obtain a solution to the
inverse problem. ........................................................................................................ 179
Figure 8.2 Input to the algorithm treated as a circular sequence. ............................ 181
Figure 8.3 Part of the state space, X, corresponding to an example PGA showing
some of the possible states and their associated transitions. .................................... 185
14
Figure Error! Bookmark not defined..74 Crossfade envelopes applied to beginning
and end of original time series which are then added together to form modified time
series. This is then stored in the circular register so that there is no amplitude
discontinuity between its end and its beginning........................................................ 191
Figure Error! Bookmark not defined..75 Time domain plots of the original
roomtone showing 300 (left) and 3000 (right) samples. ........................................... 194
Figure Error! Bookmark not defined..76 Time domain plots of output time series
when (a) I=300, L=1, (b) I=3000, L=3, and (c) I=300, L=4...................................... 194
Figure Error! Bookmark not defined..77 Comparison between original (left) and
synthetic time series (right) showing: (a)&(b) time domain plots, (c)&(d) power
spectral densities calculated by averaging eleven 4096 point FFTs, and (e)&(f)
amplitude histograms calculated from 30,000 samples. ........................................... 195
List of Tables
Table 2.1 A summary of possible sound types. After [ross82]. .................. 25
Table 3 (left) Example set of interpolation points and vertical scaling factors that
define the FIF shown in Figure 27. ............................................ 80
Table 4 (right) Vertical scaling factors used in generating Figure 28. ........ 80
Table 5 and Figure 78 Input data and waveform plot of the resulting FIF that is a
rhythm/timbre. ................................................................ 86
Table 6 Summary of the results obtained by Mazel for his four FIF-based
models/inverse algorithms. .................................................... 109
Table 7 Summary of results for reimplementation of Mazel's algorithm for the
self-affine model. Each original time series of length T_tot has been processed as
m=10 sections of length T=100. ................................................ 119
Table 8 Running the algorithm with wind noise as original time series for a variety
of section lengths T. ......................................................... 120
Table 9 Results of error-weighting the inverse algorithm for a range of weighting
function gradients. The original time series is wind noise and is processed as 10x100
sample sections. .............................................................. 122
Table 10 Performance of modified FIF inverse algorithm with a specified window
restricting the range of the trial interpolation point. ....................... 126
Table 11 Performance figures for the window-restricted inverse algorithm using a
variety of sound time series. Each original time series is processed as 10x100 sample
sections and the restriction window is set at l=15 and r=25 samples. .......... 127
Table 12 Summary of results using fan rumble sound as input to the dynamic
model. ........................................................................ 156
Table 13 Summary of analysis parameters for best results using gong sounds. ... 162
Table 14 Analysis details for the musical tones. .............................. 165
Table 15 Example of the PGA acting on a short paragraph of text for a variety of
values of the seed length parameter. .......................................... 180
Table 16 Example sequence of iterations of the PGA. ........................... 182
Table 17 Simple example showing how the preprocessing reorders the original input
sequence. ..................................................................... 189
Table 18 Summary of results obtained with PGA and industrial roomtone as original
time series. (Numbers in brackets are experiment identification.) ............. 192
Table 19 Summary of results for PGA used with other roomtones having different
qualities. .................................................................... 196
Table 20 Summary of results obtained with PGA and a variety of other background
sounds. ....................................................................... 197
List of Sound Examples
All sounds are created by playing 16-bit sound files at 48kHz or 44.1kHz sample-
rate unless otherwise stated. The sample-rate is indicated by the suffix of the sound
file name given in brackets after each description. For example, '.441' indicates an
original sound recording made with a sample-rate of 44.1kHz or a synthetic version
played-back at that rate. The suffix '.mbi' is used to indicate an abstract waveform
with no intrinsic sample-rate. These files are played at 48kHz.
Playback is via a Digital Audio Labs 'CardD Plus' system connected to an IBM
compatible P.C. This allows an AES/EBU compatible, serial digital audio data-stream
to be generated from the sound file. This is then passed to a Sony TCD-D10 digital
audio tape (DAT) recorder which is used as the digital-to-analogue device.
Chapter 5
1. FIF derived from 17 equally x-spaced interpolation points taken from a single
sinewave cycle, 5 iterations. (sine_5.mbi) .................................................................. 81
2. Same as Sound 1, but with increasing vertical scaling factors. (sine3.mbi) ......... 81
3. FIF derived from 129 square-law x-spaced interpolation points taken from a single
sinewave cycle, 3 iterations. (sine9_3.mbi) ................................................................ 82
4. Same waveform used in Sound 3, but played as a sequence where the speed of
playback is halved at each stage. (sine9_3.mbi) ......................................................... 83
5. FIF derived from randomised interpolation points and vertical scaling factors.
(rand4.mbi).................................................................................................................. 84
6. FIF derived from interpolation points whose y-values are randomised, but that are
regularly x-spaced. (rand2.mbi)................................................................................... 84
7. Same as Sound 6, but with square-law x-spacing. (rand3.mbi) .............................. 84
8. Original FIF rhythm/timbre. (fif1.mbi) ................................................................... 85
9. Same waveform used in Sound 8, but played as a sequence where the speed of
playback is halved at each stage. (fif1.mbi) ............................................................... 85
10. First designed FIF rhythm/timbre. (rhy2_1_x.mbi) .............................................. 86
11. Second designed FIF rhythm/timbre. (rhy4_4.mbi) .............................................. 86
12. Percussive sounding, time-varying FIF. (tv1.mbi)................................................ 89
13. Second example of a time-varying FIF. (tv2.mbi) ................................................ 89
14. Audio output from the program GEN which accompanies Figure 79. Each of the
8 sounds is a member of a single evolved
population of FIFs. Played at 48kHz........................................................................... 95
15. Sounds to accompany Figure 5.17. Each of the 8 sounds is the chosen survivor of
a sequence of generations produced with GEN. Played at 48kHz. ............................. 96
16. Concatenated sequence of ~15 short, evolved FIFs. (mbi1log.mbi)..................... 97
17. Concatenated sequence of 4 related FIF rhythm/timbres. (goodone.mbi) ............ 97
18. Audio output from GEN which accompanies Figure 5.19. Each sound is the
member of one generation evolved from FIF parameters similar to those used in
Sound 3. It can be heard how there is little to distinguish the mutated offspring. Played
at 48kHz. ................................................................................................................... 100
Chapter 6
19. FIF whose interpolation points are the peak-points of a wind noise waveform.
(wp1.mbi) .................................................................................................................. 105
20. As Sound 19, but using groups of peak-points. (wp2.mbi)................................. 106
Chapter 7
All the examples from Chapter 7 are presented as pairs of the original sound and
the synthetic version produced with the chaotic predictive model.
21. Original fan rumble air-noise. (fan_rmb5.48)..................................................... 157
22. Synthetic version of above. (rc127b.48) ............................................................. 157
23. Original wind noise. (wind6.48) ......................................................................... 160
24. Synthetic version of above. (rc162b.48) ............................................................. 160
25. Original lightly-struck gong sound. (gong4.48) .................................................. 162
26. Synthetic version of above. (rc115b.48) ............................................................. 162
27. Original hard-strike gong sound. (gong6.48) ...................................................... 162
28. Synthetic version of above. (rc148b.48) ............................................................. 162
29. Original tuba tone. (tuba2.48) ............................................................................. 165
30. Synthetic version of above. (rc175x.48) ............................................................. 165
31. Original saxophone tone. (sax9.48) .................................................................... 165
32. Synthetic version of above. (rc108x.48) ............................................................. 165
Chapter 8
33. Original industrial roomtone. (rmt4.441)............................................................ 192
34. Synthetic version of Sound 33 produced by PGA where I=300 and L=1.
(rmt4_111.441).......................................................................................................... 192
35. As 34, but I=3,000 and L=2. (rmt4_212.441) ..................................................... 192
36. As 34, but I=3,000 and L=3. (rmt4_213.441) ..................................................... 192
37. As 34, but I=3,000 and L=4. (rmt4_214.441) ..................................................... 192
38. As 34, but I=30,000 and L=2. (rmt4_312.441) ................................................... 192
39. As 34, but I=3,000 and L=5. (rmt4_315.441) ..................................................... 192
40. Original laboratory roomtone, played at 48kHz. (lab_rmt.48)............................ 196
41. Synthetic version of above produced with PGA, played at 48kHz.
(lab_313.48) .............................................................................................................. 196
42. Original 'rumble-like' industrial roomtone. (rmt11.441)..................................... 196
43. Synthetic version of above produced with PGA. (rt11_314.441) ....................... 196
44. Original industrial roomtone with drone. (rmt15.441)........................................ 196
45. Synthetic version of above produced with PGA. (rt15_314.441) ....................... 196
46. Original river sound. (river.48) ........................................................................... 197
47. Synthetic version of above produced with PGA. (rive_313.48) ......................... 197
48. Original wind noise. (wind1.48) ......................................................................... 197
49. Synthetic version of above produced with PGA. (wind_313.48)........................ 197
50. Original audience applause sound. (applause.48) ............................................... 197
51. Synthetic version of above produced with PGA. (appl_312.48)......................... 197
52. Original rainforest ambience. (ecuador.48)......................................................... 197
53. Synthetic version of above produced with PGA. (ecua_314.48) ........................ 197
54. Original speech extract. (speech.48) ................................................................... 199
55. Synthetic version of above produced with PGA. (sp_pga.48) ............................ 199
Summary of Acronyms
DAT Digital Audio Tape
DSP Digital Signal Processor
FFT Fast Fourier Transform
FIF Fractal Interpolation Function
FM Frequency Modulation
IFS Iterated Function System
jpdf joint probability density function
LPC Linear Predictive Coding
pdf probability density function
PGA Poetry Generation Algorithm
RIA Random Iteration Algorithm
rms root mean square
SDS Shift Dynamical System
SNR signal to noise ratio
Chapter 1
Introduction
This thesis is about applying science and technology to the arts. In particular, the
science is that of chaos theory, which includes fractal geometry, the technology is the
computer, and the medium of interest, sound. Fractals and chaos are recent
developments which are revolutionising our understanding of the complex and
irregular nature of the world. Chaos theory is concerned specifically with the
behaviour of nonlinear dynamical systems. It is about the realisation that simple,
deterministic systems can exhibit complex, unpredictable behaviour. Fractal geometry
deals with a class of forms that are not accounted for by conventional, Euclidean
geometry. The two overlap with the concept of a strange attractor which both
embodies the nature of chaotic systems and is itself a fractal object. The relevance and
use of chaos and fractals are currently spreading through a diverse range of subjects. A
number of developing areas of interest are characterised by the overlap of both
scientific and artistic concerns. In particular, two subjects have emerged that have
considerable popularity: visual art and music. Both combine fractal and chaotic
models with computer technology to provide powerful tools for artistic
experimentation. The aim of this work is to seek a parallel to this, but involving
sound.
Consider the images shown in Figure 1.1. These
are examples of the power of fractals and chaos. Using only very simple models it is
possible to create images that can be either complex abstract forms or realistic replicas
of natural objects. The question is, can the same be found in the acoustic domain? For
example, could a complex, naturally occurring sound be represented with a simple
model? Does there exist an aural equivalent of the Julia set?
Figure 1.1 A synthetic cloud, fern and a Julia set [frac90].
Interest in fractal music has concentrated on the arrangement of sequences of notes
with reference to fractal or chaotic models. Although the end product is audio, the
actual sounds used are conventional natural or synthetic ones (for example see
[pres88, gogi91 and jone90]). The time scale on which fractals and chaos are being
used for music, then, is different to that of the sounds themselves. Musical
fluctuations range from thousandths of Hertz up to several Hertz. Audio fluctuations,
however, range from hundreds of Hertz to tens of thousands. An important discovery
that supports the use of fractals and chaos for music composition is that, when
analysed, music from a wide range of cultures and historical periods is found to have
fractal properties [voss78, hsu90 and hsu91]. It has been suggested, however, by
Benoit Mandelbrot, the inventor of the term fractal, that such properties should not
extend beyond the musical structure to the sounds themselves as these are governed
by different mechanisms [mand83].
But why should this necessarily be the case? What about the complex and
irregular side of musical sound, for example the hiss of a breathy saxophone, or the
crash of a cymbal? Also, what about non-musical sound? All around us there are
complex and irregular sounds generated by our environments: a burbling brook,
splashing water, the roaring of the wind, the rumble of thunder and the variety of
screeching, scraping, buzzing and humming noises made by machinery. Is it, perhaps,
that these sounds represent an aural equivalent to the shapes found in nature that have
been neglected by Euclidean geometry and then rediscovered as fractals? Criticising
the conventional Fourier approach to modelling musical sound, the contemporary
composer Iannis Xenakis has said:
"It is as though we wanted to express a sinuous mountain silhouette by portions of
circles." [xena71]
Compare this to what Mandelbrot says in the introduction to his 'The Fractal
Geometry of Nature':
"Clouds are not spheres, mountains are not cones, coastlines are not circles, and
bark is not smooth, nor does lightning travel in straight lines." [mand83]
This thesis, then, presents an exploratory study into the idea of using chaos theory
and fractal geometry to model sound. Apart from the interest in this as a research
topic, the work is practically motivated with the aim of developing computerised tools
that would allow control over complex and irregular sounds for creative uses. The
potential applications for such tools include computer music composition and the
generation of sound effects for film and television.
The overall design of this thesis is as follows: Chapters 2, 3 and 4 present the
background to this thesis and develop specific problems on which to work. Then
Chapters 5, 6, 7 and 8 present original contributions towards the solution of these
problems. Each of these chapters contains its own conclusions and a discussion of
further work where relevant. Chapter 9 contains a summary of the thesis and some
general conclusions. An appendix is included which contains copies of previously
published papers on this work and the thesis ends with a full list of references.
Throughout the thesis, references are made to sound examples which are presented on
an accompanying cassette tape. The sound examples are listed, along with all figures
and tables, after the contents pages. Also included is a summary of acronyms for
reference. The content of each chapter is previewed below.
Chapter 2 defines what is meant by a sound model. It considers what sound is, and
the general concept of its representation via the procedures of analysis and synthesis.
Some specific applications are described, including 'the roomtone problem', which
allows a functional description of a model to be developed. Brief reviews of some
well known models fitting this description are given including some of their
advantages and limitations.
Chapter 3 presents a review of chaos theory and fractal geometry. This includes an
outline of some main features and their significance. The emphasis is on
understanding how complex behaviour arises from simple systems, the importance of
strange attractors, and the introduction of Iterated Function Systems (IFS), which
provide a useful practical framework for manipulating strange attractors.
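The point that simple deterministic systems can behave unpredictably is often illustrated with the logistic map, a one-line system that is a staple of the chaos literature. A minimal sketch is given below for orientation only; it is not one of the models developed in this thesis, and the names are illustrative.

```python
# Logistic map x_{n+1} = r * x_n * (1 - x_n): for r = 4 the orbit stays in
# [0, 1] but is chaotic, with sensitive dependence on initial conditions.
def logistic_orbit(x0, r=4.0, n=40):
    xs = [x0]
    for _ in range(n):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

# Two orbits started a distance of 1e-7 apart soon disagree completely.
a = logistic_orbit(0.2)
b = logistic_orbit(0.2 + 1e-7)
```

Plotting `a` against `b` shows the two orbits tracking each other for a handful of steps and then diverging entirely, which is exactly the complex, unpredictable behaviour from a simple system described above.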
In Chapter 4 the issue of applying the ideas of chaos theory and fractal geometry
to the problem of modelling sound is considered. It is argued that both appear to have
potential use, but that two main questions are raised. Firstly, on a diagnostic level: are
sounds chaotic or fractal? Positive evidence is collected both from the literature and
from original work. The second question is then a practical one: in what way can
sound be represented with chaos or fractals? The conclusion is to concentrate on using
strange attractors in two different ways with an emphasis on involving IFS.
Chapter 5 is concerned with using IFS strange attractors to produce synthetic
sound by generating waveforms with Fractal Interpolation Functions (FIF), a class of
IFS. A basic technique is designed that is then advanced in several ways. The most
important result is the discovery of a new class of sounds that are simultaneously
rhythms and timbres. With these techniques complex sounds may be generated with
small amounts of data and are demonstrated to have potential for musical applications.
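The FIF construction used in Chapter 5 follows Barnsley's standard scheme: each affine map of the IFS carries the whole graph onto the section between one pair of interpolation points, with a vertical scaling factor controlling the roughness of the result. The sketch below shows that standard construction (the function and variable names are illustrative, not the thesis's own code).

```python
def fif_points(pts, d, iterations=4):
    """Refine the graph of a fractal interpolation function (Barnsley's
    construction). pts: interpolation points [(x0,y0),...,(xN,yN)];
    d: one vertical scaling factor per interval, each with |d_i| < 1."""
    (x0, y0), (xN, yN) = pts[0], pts[-1]
    maps = []
    for i in range(1, len(pts)):
        (xa, ya), (xb, yb) = pts[i - 1], pts[i]
        di = d[i - 1]
        # Affine map w_i(x, y) = (a*x + e, c*x + di*y + f) taking the whole
        # graph onto the segment between (xa, ya) and (xb, yb).
        a = (xb - xa) / (xN - x0)
        e = (xN * xa - x0 * xb) / (xN - x0)
        c = (yb - ya - di * (yN - y0)) / (xN - x0)
        f = (xN * ya - x0 * yb - di * (xN * y0 - x0 * yN)) / (xN - x0)
        maps.append((a, e, c, f, di))
    graph = list(pts)
    for _ in range(iterations):
        # One deterministic IFS step: apply every map to every point.
        graph = sorted((a * x + e, c * x + di * y + f)
                       for (a, e, c, f, di) in maps
                       for (x, y) in graph)
    return graph
```

With all |d_i| < 1 the iteration converges to the graph of a continuous function passing through every interpolation point; reading off the y-values as amplitude samples turns that graph into a waveform.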
Chapter 6 keeps the theme of FIF, but considers the analysis and synthesis of a
given sound. An algorithm is taken from the literature which appears suitable for this
task. It is shown, however, to be inadequate; a reason for its failure is found, and the
algorithm is improved. Results indicate that some degree of data compression may be obtained for
certain sounds.
Chapter 7 is concerned with the problem of modelling the dynamics of a sound via
a strange attractor. The assumption is made that a chaotic system is responsible for a
digital audio time series. The system may then be reconstructed from the time series
with a technique known as embedding. Because of the properties preserved by
embedding, the construction of another chaotic system that approximates the
embedded one should produce a time series that is statistically similar to the original.
An approach to this problem is considered which combines techniques taken from
work on the nonlinear prediction of time series with an original method inspired by
the Shift Dynamical System (SDS) version of an IFS. An analysis/synthesis algorithm
is developed and a number of experiments performed. The algorithm is shown to be
capable of modelling known chaotic systems from their time series. Also, despite
some difficulties, the algorithm is capable of successfully reproducing some natural
sound so that it is perceptually similar to the original.
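The embedding referred to here is delay-coordinate (Takens) embedding: successive samples of the scalar time series are grouped into vectors whose trajectory reconstructs the underlying attractor. A minimal sketch, with illustrative names:

```python
def delay_embed(x, dim=3, lag=1):
    """Map a scalar time series to delay vectors
    (x[t], x[t-lag], ..., x[t-(dim-1)*lag])."""
    start = (dim - 1) * lag
    return [tuple(x[t - k * lag] for k in range(dim))
            for t in range(start, len(x))]

series = [0.1, 0.5, 0.9, 0.2, 0.7, 0.3]
print(delay_embed(series, dim=3, lag=1))
```

Nonlinear predictors of the kind drawn on in Chapter 7 are then fitted to the map from each delay vector to the next sample of the series.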
Chapter 8 is also concerned with the problem of modelling the dynamics of a
sound in an embedded state space setting. The model considered, however, is the
Random Iteration Algorithm (RIA) version of an IFS where a Markov chain is used to
model the embedded invariant measure. In the course of this investigation, an
algorithm is developed which solves the roomtone problem for certain ambient
sounds.
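The Random Iteration Algorithm is the familiar "chaos game": pick one of the IFS maps at random, apply it to the current point, and repeat; the generated points fall onto the attractor with a density given by the invariant measure. The sketch below uses a classic IFS (the Sierpinski triangle) rather than the audio model of Chapter 8, and its names are illustrative.

```python
import random

# IFS for the Sierpinski triangle: three maps, each contracting by 1/2
# toward one corner of the triangle.
corners = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]

def chaos_game(n, seed=0):
    random.seed(seed)
    x, y = 0.0, 0.0  # a corner, so the start already lies on the attractor
    pts = []
    for _ in range(n):
        cx, cy = random.choice(corners)        # Random Iteration: pick a map...
        x, y = (x + cx) / 2.0, (y + cy) / 2.0  # ...and apply it
        pts.append((x, y))
    return pts
```

In Chapter 8 the choice of map is governed by a Markov chain rather than independent uniform draws, so that the iteration models the embedded invariant measure of a sound.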
Chapter 9 presents a summary of the thesis and some general conclusions on the
subjects of inverse problems, algorithmic complexity and developments of the work.
Chapter 2
Modelling Sound
This chapter develops a working definition of a sound model. It will consider what
sound is and its representation within an analysis/synthesis framework. Some possible
applications of such a model will be discussed including a specific one concerning
film sound-track editing, known as 'the roomtone problem'. This leads to a set of
useful functions that define the model. Also, a brief review of established modelling
techniques, their advantages and limitations is included.
2.1. Sound and its Representation
What is sound? It can be defined as either an auditory sensation perceived by the
mind, or as the physical disturbance that gives rise to such a sensation [ross82]. A
practical model for sound has, in some way, to represent it in an appropriate form.
Starting from this definition of sound there are a number of levels on which this
representation could take place. Consider these as ordered from the outside in: on the
outside level, a model could be made of the complete physical system that is
responsible for the sound. This might include the source of the disturbance and its
reverberant environment. A list of possible disturbances is shown in Table 2.1.
Secondly, this model may be simplified to include only that which is relevant to
describing the pressure fluctuations in the air at a single point; for example at the ear
or a microphone. Next, a model could be made for the time waveform created by
recording those pressure fluctuations at a single point without any, or little,
consideration of the physical system that created it. The waveform is then an abstract
pattern which is to be modelled. Finally, the model may account for just the
perception of the sound, so that an accurate representation of the time waveform is not
necessary, but a representation is needed that just contains the relevant information to
capture the essential characteristics of the sound.
At whatever level the representation is made, a useful framework within which to
test its validity is provided by the analysis-synthesis scheme shown in Figure 2.1
[riss82]. The important feature is that a listener judges how good the representation is
at capturing the characteristics of the sound. In order to refine this modelling
framework, it will be useful to consider some of the applications where sound models
are, or might be used.
[Diagram: sound → analysis → representation → synthesis → sound, with a listener
comparing the original and synthesised sound.]
Figure 2.1 The analysis-synthesis scheme.
Physical Disturbance                          Example
vibrating solid bodies                        metal bar, speaker cone, violin body
vibrating air column                          pipe organ, woodwind instrument
flow noise in fluids due to turbulence        jet engines, air leaking under
                                              pressure, wind noise
interaction of moving solid with fluid        rotating propeller or fan blade
or moving fluid with solid                    air flow in duct or through grill,
                                              water in pipe, waves breaking on
                                              sea shore
rapid changes in temperature or pressure      thunder and other sounds caused by
                                              electrical discharge, chemical
                                              explosion
shock waves caused by motion or flow at       supersonic boom caused by jet
supersonic speed                              aircraft

Table 2.1 A summary of possible sound types. After [ross82].
2.2. Music composition.
An important aspect of music composition is, obviously, the control over the type
and quality of sound used. This century has seen the use of electronic and, more
recently, computer based techniques grow from the experimental to the mainstream.
Typically, such techniques involve obtaining musical sound and processing it to
modify it, or generating it entirely synthetically. Of importance are the degrees of
musical usefulness and flexibility that are offered by a technique coupled with the
ease and efficiency with which it can be executed.
Imagine the example of a drum synthesiser. What might be its attractive features
for a composer? It might be able to take the recording of an original drum sound and
reproduce it so as to retain its relevant characteristics, discarding any perceptually
unimportant information in the process. It might then allow the sound to be modified
in a way related to its physical attributes, for example, to be able to change the sound
as if it came from a larger version of the same drum, or one that had a tighter skin and
has been struck with a different beater. Furthermore, the synthesiser might allow drum
sounds to be generated that it would not be possible to create with real instruments.
A more detailed discussion of sound modelling techniques used for music
composition is given in the forthcoming sections 2.5 - 2.8.
2.3. The Roomtone Problem
Another area of creative sound use is film sound-track editing. This, as with music
composition, generally involves manipulating sound in a number of ways except that
often the sound is non-musical. A good example of this is the use of sound effects.
Here, the desire is to add certain sounds to a film to enhance or complement what is
taking place visually. Traditionally, this is done by simulating the appropriate sounds
with a variety of acoustic devices or making use of large reference libraries of
recordings. It is, however, often problematic and time consuming to get exactly the
desired sound. A specific example of this is the roomtone problem which was posed
by the company that sponsored this research.
The roomtone problem arises during post-production editing of a film sound-track.
Often, due to problems that have occurred with the location filming, it is necessary to
replace sections of the original sound-track at a later date. For example, this can
involve having them dubbed by the original actors in an acoustically dry sound studio.
The problem occurs when the new pieces of sound track are inserted into the original
as there is often a noticeable lack of background sound. As these background sounds
tend to be characteristic of internal locations, they are known as roomtones. One
traditional solution to this problem involves referring to libraries of roomtone
recordings to find a matching sound. It is often difficult, however, to find exactly the
right sound and the process can also be time consuming. Another solution is to make
use of small snippets of the roomtone found in places on the original recording, for
example between lines of dialogue. These may be spliced together, or looped to form
as long a piece as is necessary. As with the other solution, this can be an intricate and
time consuming process, and the results are often not good enough because the splices
and loops are audible.
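Looping a snippet without an audible splice is usually attempted with a crossfade: the tail of the extract is faded out while the head is faded in over the same interval, so the loop point becomes continuous. A minimal sketch of one common variant (the names are illustrative; this is not the PGA developed in Chapter 8):

```python
def crossfade_loop(x, fade):
    """Make x loop seamlessly: blend its last `fade` samples into its
    first `fade` samples, then drop the blended-away tail."""
    body = list(x)
    n = len(x)
    for i in range(fade):
        w = i / fade                                   # linear ramp 0 -> 1
        body[i] = w * body[i] + (1.0 - w) * body[n - fade + i]
    return body[:n - fade]
```

Even a perfect crossfade, however, repeats the large-scale structure of a short loop exactly, which is one reason the roomtone problem calls for a model that generates new material rather than replaying old.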
An ideal solution to this problem, then, would be some form of sound model that
is able to capture certain essential characteristics of the roomtone from a small
original sample and then produce greater quantities of a synthetic version.
Both the examples of the drum synthesiser and the roomtone problem illustrate a
certain type of creative application for sound models. Generally, the need is for the
model to capture essential characteristics of the sound; for it to allow useful
manipulation of the sound; and/or for it to generate synthetic sound. An important
aspect of such models is that the representation involves a set of parameters. These are
the variables of the model that, with the particular representation, form all the
information extracted by the analysis, and/or used by the synthesis. So for the drum
model, the parameters might include the physical attributes of the drum, or for the
roomtone model, the extract of original sound.
2.4. Digital Audio
Being more specific about the sound model, it is assumed that it will operate
within a computer and therefore rely on digital audio as an intermediate
representation. This brings the enormous advantage that the modelling process may be
implemented as a computer program, which makes it highly flexible, and convenient
to develop [math82]. Digital audio satisfies the definition of a representation for
sound that has been given already. It is a discrete time, discrete amplitude model for
the time waveform generated from recording sound at a single point in space. It
preserves perceived information in the form of all frequencies contained within the
sound up to one half of the sampling frequency. This is guaranteed by Nyquist's
sampling theorem [nyqu28]. It is, however, unwieldy, in that a large amount of data is
required for good quality representation. For example, the industry standard of a
48kHz sampling rate and 16 bits per sample [aes85] means that approximately one
million bytes of data are required to represent ten seconds of sound. Moreover, this data is not in a form that is obviously related to the perceived characteristics of the sound.
This is therefore another reason for further representation of the sound waveform: so
as to reduce the amount of parameter data. Assuming the use of digital audio and
therefore computers also means that the model has to perform its desired functions
within the constraints imposed by the processing ability of the computing devices
used.
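The storage figure quoted above follows from simple arithmetic; as a check, a short sketch in Python, using the sample-format assumptions stated in the text:

```python
# Storage needed for digital audio at the industry-standard format
# assumed in the text: 48 kHz sampling rate, 16 bits (2 bytes) per sample.
sample_rate = 48000       # samples per second
bytes_per_sample = 2      # 16 bits
duration = 10             # seconds

total_bytes = sample_rate * bytes_per_sample * duration
print(total_bytes)        # 960000: approximately one million bytes

nyquist = sample_rate / 2
print(nyquist)            # 24000.0: highest representable frequency in Hz
```

The second figure is the Nyquist limit: all frequencies up to one half of the 48 kHz sampling rate are preserved.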
2.5. The Modelling Framework
Following the discussion developed within this chapter, then, a working functional
description of a sound model is summarised as follows. A sound model is of use if:
1) it can represent the essential perceived characteristics of the sound;
2) there is less parameter data than there is original sound data;
3) the parameter data is of a form such that its manipulation has a useful or
interesting effect on the sound;
4) it can generate new sounds, or replicas of naturally occurring ones, from a little
data and/or a simple model.
Although much is known for particular situations, it is very difficult to say, in general, which physical attributes of a sound must be preserved in the representation so as to satisfy 1). This is still an open question in psychoacoustics [see deut82]. Point 2) on its own may also be described as data compression. Although this tends to be an attractive feature of a model in terms of reducing the amount of storage required, it is considered here also in combination with 3), in the sense that the parameters are more manageable if there are fewer of them. The synthesis capability of
the model, 4), may be derived from the analysis model and used by supplying it
modified, or artificial parameters, or it may exist on its own as a synthesis-only
technique.
It has also been assumed that the model will operate on a digital audio
representation so that it can operate within a computer. A more detailed diagram of
the sound modelling framework, then, is shown in Figure 2.2.
[Figure: the signal chain from sound, through microphone and sample-and-quantise stages, to a digital audio time waveform; an analysis stage produces a parameter representation, which an operator may modify, and a synthesis stage regenerates digital audio that is then reconstructed and amplified to a loudspeaker.]
Figure 2.2 The sound modelling framework.
Now that a general modelling framework has been defined, the next section gives
some brief reviews of particular, well known representations that fit this description.
These serve to illustrate the points made so far, and act as a reference when the issue
of modelling sound using chaos theory is discussed in Chapter 4.
2.6. Conventional Models
2.6.1. Physical Modelling
Physical modelling is a synthesis-only technique that is used to generate musical
sound from a computer representation of the physical system responsible for that
sound. The system can include the action of the musician on the instrument, and the
instrument itself. The system is usually partitioned according to physical, functional or
computational criteria which in fact often coincide. So for example, a violin may be
divided into the bow, strings, bridge and soundboard as separate coupled physical
systems; or into an excitation part (bow on string) that feeds a resonator (string,
bridge, sound board); or into a nonlinear oscillator (exciter) that is input to a linear
filter (resonator).
The appeal of physical modelling is that sounds may be created from a purely
theoretical basis and that the models and parameters are in a form that can be
intuitively understood by the user. The main disadvantage is that despite much basic
theory being known about the physics of musical sound generation, often the models
resulting from a direct implementation of the equations produce sounds that are flat
and lifeless [riss82]. This suggests that there are many subtle aspects of sound production, important to the highly sensitive perceptual mechanisms of the ear and brain, that are not included in the basic theory. This is an area of current
research [cmj92].
2.6.2. Additive and Subtractive Synthesis
Additive and subtractive synthesis are terms used to cover a range of analysis-
synthesis techniques used for modelling musical instrument and voice sounds and
which rely on spectral representations of the time waveform. As mentioned above, a
number of such sounds can be presumed to be the product of some form of excitation
feeding a resonator. A time-varying spectral analysis of the sound can reveal these
components in a form that then suggests suitable further representations. For example,
such an analysis shows a bowed violin sound to consist of an approximately periodic
excitation, revealed as a set of harmonically related spectral lines, or partials, within
an overall spectral envelope, which is attributed to the resonances of the violin body.
A similar result can be found for voiced speech sounds, where the resonances, also
called formants, vary in time. Unvoiced speech sounds, however, show a broad-band
spectrum modulated by the formant envelope.
Additive synthesis seeks to regenerate the sound by adding together a set of
sinewaves whose frequency and amplitude 'trajectories' vary in time [serr90, riss82].
A diagram of this is shown in Figure 2.3. The trajectories are extracted from the
spectral analysis using a variety of methods. In this form, however, a large amount of parameter data can be generated. It has been shown that it is the overall trend of the trajectories that is of greatest perceptual importance, and that their approximation with simple piece-wise linear functions allows a considerable degree of data reduction while maintaining the quality of the reproduced sound [grey75].
Modification of these functions then also allows musically interesting transformations
of the sound.
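The additive scheme described above can be sketched as follows. This is an illustrative reconstruction rather than the analysis-synthesis system of [serr90]: the partials and their envelope breakpoints are invented for the example, and each amplitude and frequency trajectory is a piece-wise linear function of time.

```python
import math

def additive_synth(partials, duration, sr=48000):
    """Sum sinewaves whose amplitude and frequency follow piece-wise
    linear trajectories, each given as a list of (time, value) breakpoints."""
    def interp(bp, t):
        # piece-wise linear interpolation over breakpoints [(t0, v0), ...]
        for (t0, v0), (t1, v1) in zip(bp, bp[1:]):
            if t0 <= t <= t1:
                return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
        return bp[-1][1]

    n = int(duration * sr)
    out = [0.0] * n
    for amp_bp, freq_bp in partials:
        phase = 0.0
        for i in range(n):
            t = i / sr
            phase += 2 * math.pi * interp(freq_bp, t) / sr  # integrate frequency
            out[i] += interp(amp_bp, t) * math.sin(phase)
    return out

# Two harmonically related partials with simple attack/decay envelopes
# (all values illustrative)
partials = [
    ([(0.0, 0.0), (0.05, 1.0), (1.0, 0.0)], [(0.0, 440.0), (1.0, 440.0)]),
    ([(0.0, 0.0), (0.05, 0.5), (1.0, 0.0)], [(0.0, 880.0), (1.0, 880.0)]),
]
signal = additive_synth(partials, 1.0)
```

Modifying the breakpoint lists, rather than the raw trajectories, is what gives the data reduction and musical control described in the text.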
[Figure: a bank of sinewave generators, each controlled by amplitude and frequency trajectories (amp 1/freq 1 through amp 6/freq 6), summed to form the output.]
Figure 2.3 A schematic diagram for additive synthesis.
Additive synthesis works well at representing certain sounds to a high degree of
perceptual accuracy. These are ones with a well defined partial structure arising from
periodic excitation and/or systems with simple vibrational modes. It is, however,
limited in its capability to represent complex or noisy sounds, i.e. ones with broad-
band spectral structures.
Subtractive synthesis also seeks to regenerate the sound using the spectral
information. It does this in the opposite sense to additive synthesis by starting with a
spectrally rich input that is then refined with a time varying filter. The excitation may
be periodic or noise-like, to give harmonic or wide-band spectral structure
respectively. The filter then shapes this to provide the formant envelope.
A powerful method for estimating suitable filters is linear prediction [makh75,
moor90]. This encompasses a number of techniques that allow the estimate of
parameters for a digital, recursive linear filter from the original time series. These
filters are of the form,
    y_n = x_n + Σ_{i=1}^{M} b_i y_{n−i}

where x is the excitation input, y the output, b_i the filter coefficients, and M the filter order; each formant peak corresponds to a pair of filter poles, so the number of formants is approximately one half of M.
This technique is used widely for speech modelling where between 3-7 formants
are required to adequately represent the sound, and so provides a considerable degree
of data reduction. Attempts at modelling drum sounds suggest that approximately 100
are necessary [sand89]. This technique offers the potential for modification of the
individual resonances or the excitation so as to transform the sound in an intuitive
way. There are difficulties, however, associated with the numerical manipulation and
implementation of the high order filters required [sand92].
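A filter of the form given above can be implemented directly. The following sketch realises the synthesis equation; the order-2 coefficients are illustrative and not derived from any real linear-prediction analysis.

```python
def allpole_filter(x, b):
    """All-pole (recursive) filter of the form in the text:
    y[n] = x[n] + sum_{i=1..M} b[i] * y[n-i], with M = len(b)."""
    y = []
    for n, xn in enumerate(x):
        acc = xn
        for i, bi in enumerate(b, start=1):   # i runs 1..M
            if n - i >= 0:
                acc += bi * y[n - i]
        y.append(acc)
    return y

# Illustrative order-2 resonator; coefficients chosen for stability,
# not estimated from any original time series.
impulse = [1.0] + [0.0] * 7
response = allpole_filter(impulse, [1.2, -0.72])
```

An order-2 section like this models a single resonance; a speech model would cascade the equivalent of 3-7 such formant pairs, as noted above.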
A much simplified synthesis-only derivative of the recursive filter model, known
as the Karplus-Strong algorithm, has been found to generate certain sounds very
effectively. These include plucked string, drum and electric guitar timbres [karp83,
jaff83, sull90]. The simplification is in having high order filter models, but with all
the coefficients set to zero except the higher index ones. Variants include the insertion
of other elements, for example randomly controlled switches and nonlinearities, in the
feedback path. It is therefore equivalently described as a delay-line with feedback via
some kind of modifier. Both these views are shown in Figure 2.4. Typically, the sound
is generated by inputting a burst of noise, or a simple periodic waveform to the delay
line.
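The delay-line view of the algorithm can be sketched as follows. The two-point averaging modifier is the basic form described in [karp83]; the decay constant and delay length used here are illustrative.

```python
import random

def karplus_strong(delay_length, n_samples, decay=0.996):
    """Basic Karplus-Strong: a delay line seeded with a noise burst is
    read out and fed back through a two-point averaging 'modifier'."""
    random.seed(0)  # deterministic noise burst for reproducibility
    line = [random.uniform(-1.0, 1.0) for _ in range(delay_length)]
    out = []
    for _ in range(n_samples):
        oldest = line.pop(0)
        # modifier: slightly attenuated average of the two oldest samples
        line.append(decay * 0.5 * (oldest + line[0]))
        out.append(oldest)
    return out

# A delay of 109 samples gives roughly 48000/109 = 440 Hz at 48 kHz
tone = karplus_strong(delay_length=109, n_samples=48000)
```

The averaging acts as a low-pass filter in the feedback path, so the tone loses high frequencies as it decays, much like a plucked string.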
[Figure: top, the simplified recursive linear filter view, a chain of z^-1 delay elements with coefficients summed into the output; bottom, the general view of a delay line of D samples fed back to its input through a modifier.]
Figure 2.4 Karplus-Strong algorithm. Top, simplified recursive linear filter and bottom, general delay-line view.
Finally, a technique for combining both additive and subtractive synthesis has also
been proposed [serr90].
2.6.3. Frequency Modulation and Waveshaping
Frequency modulation (FM) and waveshaping are related synthesis-only
techniques that allow the generation of sounds with complex line spectra using simple
models [chow73 and lebr79]. A basic unit of each technique is shown in Figure 2.5.
The units are then combined either by adding several outputs together or by nesting them so that the output of one forms the input to another. The parameters input to the
model are accessed directly by the user, and/or controlled by simple functions to
generate time-varying sounds.
To their advantage, the sounds produced by these models are often approximate
replicas of musical ones. Both harmonic and inharmonic sounds may be simulated that
are like those generated from string or wind, and percussive instruments, respectively.
It is also possible to generate a wide range of abstract sounds. The relatively small
number of parameters involved allows for easy experimentation by the user and the
simplicity of the models enables them to be easily implemented.
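The basic FM unit can be sketched as follows; the parameter values are illustrative, and a 1:1 carrier-to-modulator frequency ratio is chosen because it gives a harmonic spectrum.

```python
import math

def fm_unit(n_samples, sr=48000, amp=1.0, fc=440.0, fm=440.0, index=2.0):
    """Basic FM unit: a carrier sinewave of frequency fc whose phase is
    modulated by a sinewave of frequency fm at the given index (intensity)."""
    out = []
    for n in range(n_samples):
        t = n / sr
        mod = index * math.sin(2 * math.pi * fm * t)
        out.append(amp * math.sin(2 * math.pi * fc * t + mod))
    return out

# fc:fm in a simple integer ratio gives a harmonic, string-like spectrum;
# an irrational ratio would give an inharmonic, percussive one.
tone = fm_unit(48000)
```

Time-varying sounds are obtained by driving amp, fc, fm and index with simple control functions, as described above.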
[Figure: top, the FM unit, a sinewave generator with output amplitude, carrier frequency, and modulation frequency and intensity inputs; bottom, the waveshaping unit, an input x(t) passed through a nonlinear function f to give the output f[x(t)].]
Figure 2.5 The basic units used within the FM (top) and waveshaping (bottom) synthesis techniques.
The disadvantages of these models are that no analysis methods exist that can
produce a set of parameters from a given sound and that, as with physical modelling,
the sounds can lack certain 'natural' qualities [moor90].
2.7. Summary
This chapter has developed the concept of a model for sound with which to work.
The principal idea is that of representation. There are many levels on which a
representation for sound can take place, from the physical to the perceptual. Also,
several representations may be used together. An example is the chain of
representations that exists within the additive synthesis model: physical system;
pressure fluctuations at microphone; time waveform; digital audio time series; time-
varying spectrum; set of variable amplitude and frequency sinewaves; set of piece-
wise linear functions.
From a consideration of the types of creative applications where such a model
might be used, a functional description has been advanced. Central to this description
is the idea of a parameterised representation, where the parameters consist of less data
than the modelled sound and are of a form that facilitates manipulation of the sound in
useful ways.
Finally, several well known models fitting this description have been reviewed. These
models are primarily for music and speech sounds and, consequently, focus on
representing those elements that characterise such sounds, both physically and
perceptually, for example spectral lines and formant envelopes. The models, therefore,
concentrate mainly on the top two categories of Table 2.1. No models fitting the description given in this chapter have been found in the literature for sounds outside these categories.
Chapter 3
Chaos Theory and Fractal Geometry
3.1. Introduction
This chapter presents an overview of chaos theory and fractal geometry. The
intention is to present a theoretical basis for the forthcoming chapters. Theory relevant
to each experimental chapter is then presented in that chapter. The emphasis is
therefore on the following subjects: the significance of chaos and fractals; strange
attractors; Iterated Function Systems; and several other relevant ideas and tools. The
chapter may be read in its entirety as a concise introduction to chaos and fractals, or
referred to as and when needed during later chapters. Sources for the general theory of
chaos and fractals include [stew92, farm90, glei87, laut88, deva89, schr91, peit88,
moon87, hao84, mand83, barn88].
Chaos theory is about a new understanding of dynamics, the way in which systems
behave through time. It concerns the realisation that deterministic systems which obey
fixed laws, can exhibit unpredictable behaviour. This runs contrary to the established
viewpoint, dating back to Newton, that the behaviour of deterministic systems can be
predicted for all future time. Also, chaotic behaviour, characterised by being irregular
and complex, may be found in very simple systems. This, again, apparently
contradicts the traditional scientific expectation that complex behaviour arises only in
complex systems.
The theory of fractals, however, provides a new understanding of geometry. It is
based on a realisation that there exists a large class of geometric objects not
encompassed by the traditional Euclidean geometry of points, lines and circles, or the
forms of differential calculus, for example smooth curves. Fractal objects have
properties unlike those of their traditional counterparts because of the way they fill
space. For example, they typically have dimensions which are not integers and curves
with infinite length can be contained within a finite volume. Many fractals have the
same form when viewed on different scales, a property known as self-similarity. Like
chaos, it is also possible to construct complex fractal forms using only simple rules.
Of greatest importance, perhaps, is that both chaos and fractals can accurately
represent naturally occurring phenomena. Advances in abstract theory have been
paralleled with discoveries of real-world phenomena which confirm the relevance and
usefulness of chaos and fractals. A selection of the subjects in which this has taken
place are: architecture, art, astrophysics, biology, chemistry, communications,
computing, data compression, economics, electronics, fluid dynamics, geology,
geophysics, linguistics, meteorology, music, physics, signal processing. See [glei87,
pick90, schr91, peit88, cril91, stew90 and moon87] and references therein.
3.2. The Significance of Chaos
Chaos theory concerns the dynamic behaviour of simple nonlinear systems.
Traditionally, the problem of dynamics has been approached in two different ways -
deterministic dynamics and stochastic processes. The deterministic approach assumes
that fixed laws govern the behaviour of a system. These laws may be written down
with linear differential equations, a solution found, and so the behaviour of the system
is known for all time. Such an approach applies to systems with a few degrees of
freedom and where linear relationships, or approximations, exist between the
component parts. The advantage to this approach is that the resulting solution gives
complete, predictive knowledge about the behaviour of the system. The main
disadvantage, however, lies also with the solution - it is not always possible to find
one. Analytic techniques do not provide a universal means of solution to systems of
differential equations, especially if they contain nonlinearities.
The alternative, stochastic, approach makes the assumption that the system under
investigation is too complex to be able to describe explicitly with fixed laws. This is
either because there are too many degrees of freedom, or it is not possible to measure
all the relevant aspects of the system. In this case, a partial description of the system
may be given using probability. That is, the degree of uncertainty about a system's
present state, or future behaviour may be quantified. Instead of describing the dynamic
behaviour of every degree of freedom with an explicit solution, only the likelihoods of
expected behaviour are known. These correspond to the average or typical behaviour
found by empirically accumulating information about the system. This is also a
powerful approach as, for example in thermodynamics, the average properties of particles in a gas provide a useful description despite the exact behaviour of the particles not being known.
Both these approaches have been maintained, side by side, in science for hundreds
of years. The deterministic description is assumed to be nearer to the true behaviour of
the system than the probabilistic one which is thought only to arise because of
ignorance about the system. It is also implied by these approaches that the degree of
complexity of the system relates to that of its behaviour. The analytic solution to a
dynamic system with few degrees of freedom is simple and regular. The complex
motion of particles in a fluid is assumed to be a consequence of the large number of
particles and their interactions.
The significance of chaos theory is that it is an understanding of dynamical
systems that combines elements of the two traditional approaches. In fact, a
deterministic chaotic system and a stochastic process can be indistinguishable to an
observer of the two systems.
Chaotic systems are deterministic and may be written down with explicit fixed
laws. They are, however, nonlinear and so, in general, no analytic solution can be
found. Instead the system may be explored by numerical integration with the help of a
computer. In fact, one of the reasons for the discovery of chaos can be attributed to the
availability of the computer.
Although chaos may occur in systems that are simple and deterministic, chaotic
behaviour is typically complex, irregular and unpredictable. The unpredictability of
chaotic behaviour does arise from ignorance about the system, but is not due to there
being too many degrees of freedom. It is ignorance about the exact state of the system, which is knowable in theory but not in practice, that manifests itself as unpredictability of the system's future behaviour. The complexity and irregularity of a
chaotic system, however, is inherent to the system and has nothing to do with the
ignorance of an observer. Both in theory and practice, the complexity of chaotic
behaviour is manifest.
As well as being a revolution in scientific theory, chaos is significant for being
found to represent many naturally occurring dynamic processes. The list of subjects
given in the introduction to this chapter and the computer generated images shown in
the introduction to the thesis are all examples of this. It is the representation of
naturally occurring acoustic dynamics which concerns this work and will be discussed
further in the next chapter.
3.3. Dynamical Systems and State Space
A general description of a dynamical system begins with its state, x . This could
be, for example, a scalar, vector or function that gives all the information about the
system, or all that is relevant, at any one time. For this work it is considered to be a
vector. The state may then be represented by a single point in state space (sometimes
known as phase space). For a system with d degrees of freedom, x will have d
components and the state space will be d-dimensional. The behaviour of the system is
then charted by the movement of the point in state space through time. The path it
takes is known as a trajectory, and in the discrete-time case it is also known as an
orbit. This is illustrated in Figure 3.1.
If the system is deterministic then the rules governing the temporal evolution of
the point in state space may be described by a single equation. In the continuous time
case,
    dx/dt = f(x, t)    (3.1)

which describes a flow, and in the discrete time case,

    x_{t+1} = F(x_t)    (3.2)

which is a mapping. Because this work concerns modelling digital audio, the discrete time version will be used.
[Figure: the state vector, each dimension of which is a variable of the system, traces a trajectory in state space as the state evolves.]
Figure 3.1 State space representation of a dynamical system.
The function that defines the system, F, may be linear or nonlinear. It may also
depend on variables that change with time, but which change slowly relative to the
dynamics of the state, or that are changed between experiments. The variables that
define the function are, in this case, termed the parameters and are typically scalar
values.
A system for which d-dimensional hypervolumes in state space do not change under the action of the system in time is termed conservative, whereas one for which hypervolumes contract is called dissipative. This relates to a description of
physical systems in which energy is either conserved or dissipated, although the
terminology is also applied to abstract dynamical systems.
3.4. Stability
Stability is a generic term for a system's response to perturbation. If a small
perturbation gives rise to a large change, the system is unstable. If the perturbation
dies away and has no long-term effect, the system is stable. Given two points in state
space, one a slightly perturbed version of the other, what happens to the two
subsequent trajectories into the immediate future? The perturbed trajectory might
converge to the original trajectory, or diverge. This is referred to as local stability at a
point. After an infinite amount of time, what happens to the two trajectories? Again,
the perturbed trajectory may return to the original, or the two trajectories may
continuously separate. This is a matter of asymptotic local stability, and is quantified
by the spectrum of Lyapunov exponents. These measure the rate at which an
infinitesimal d-dimensional ball in state space distorts, on average, under the effects of
the dynamical system mapping in state space. Since the ball is infinitesimal, it distorts
according to the linear part of the mapping, and hence distorts into an ellipsoid. The
Lyapunov exponents measure the rate of change of the principal axes of the ellipsoid
relative to the original ball. If the radius of the ith axis at time t is
    r_i(t)    (3.3)

the Lyapunov exponents are defined as

    λ_i = lim_{t→∞} lim_{r_i(0)→0} (1/t) log_10 [ r_i(t) / r_i(0) ]    (3.4)
Notice that the limit of infinite time and the 1/t term average the result over a trajectory, and the other limit ensures an infinitesimal ball. For a d-dimensional state
space, there will be d Lyapunov exponents. A positive Lyapunov exponent indicates a
direction of instability, a value of zero, marginal stability, and a negative value,
stability. If the exponents are ordered according to size,
    λ_1 ≥ λ_2 ≥ ... ≥ λ_d    (3.5)

then lengths in state space change as

    l(t) ~ l(0) 10^(λ_1 t)    (3.6)

areas as

    a(t) ~ a(0) 10^((λ_1 + λ_2) t)    (3.7)

and volumes as

    v(t) ~ v(0) 10^((λ_1 + λ_2 + λ_3) t)    (3.8)
etc. The largest Lyapunov exponent reflects the asymptotic local stability of the
system. Its polarity indicates whether the distance between perturbed states is
increasing or decreasing and its value measures at what rate. The polarity also relates
to the presence, and type of attractor. This will be discussed in the next section. The
Lyapunov exponents also indicate the dissipative or conservative nature of the system.
The polarity of the sum of Lyapunov exponents measures the rate at which state space
hypervolumes change.
Sets in state space which are unaffected by the action of the dynamical system
mapping are termed invariant. The set B is invariant if
    F^∘t(B) = B    (3.9)
for all time t, where ∘t indicates that the mapping F has been iterated t times. Invariant
sets also have associated stability. For example, if a point in an invariant set is
perturbed to be outside that set, does it return or move away? For example, a single
point which is invariant is termed a fixed point. A repelling fixed point is one which is
unstable, and an attracting fixed point is stable.
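For a one-dimensional mapping, whether a fixed point x* is attracting or repelling can be checked numerically: x* is attracting if |F'(x*)| < 1 and repelling if |F'(x*)| > 1. A sketch, using an illustrative quadratic map chosen for this example:

```python
def fixed_point_stability(F, x_star, h=1e-6):
    """Classify a fixed point x* of a 1-D mapping F via the magnitude of a
    centred-difference estimate of F'(x*): attracting if |F'(x*)| < 1."""
    slope = (F(x_star + h) - F(x_star - h)) / (2 * h)
    return "attracting" if abs(slope) < 1.0 else "repelling"

# Illustrative quadratic map with fixed points at x = 0 and x = 0.6
F = lambda x: 2.5 * x * (1.0 - x)
print(fixed_point_stability(F, 0.0))   # F'(0) = 2.5, so repelling
print(fixed_point_stability(F, 0.6))   # F'(0.6) = -0.5, so attracting
```

A perturbation away from the repelling fixed point grows under iteration, while one away from the attracting fixed point dies away, matching the definitions above.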
3.5. Attractors
An attracting set is named because of the way it appears to "pull" nearby
trajectories towards it (it is stable) and then hold them there (it is invariant).
Conservative systems do not have attractors, but dissipative ones do. Since dissipative
systems shrink volumes at an exponential rate, in the limit of infinite time volumes are
shrunk to zero. Therefore for Equation (3.9) to hold for all time, attractors must be
sets of zero volume. The set of initial conditions in state space which are pulled
towards an attractor is termed the basin of attraction.
Attractors are important because they describe the typical long-term behaviour to
which a system settles after transients. They are useful because they allow dynamical
behaviour to be described with geometry. The geometry is not that of real physical
space, but of abstract state space. Dynamical systems theory has traditionally been
concerned with three types of attractors. These are known as regular attractors and
contain trajectories that are asymptotically locally stable (no positive Lyapunov
exponents). Regular attractors are objects in state space that have Euclidean geometry.
The three types of regular attractor are points, cycles and tori and are illustrated in
Figure 3.2.
[Figure: a point attractor, a limit cycle and a torus, each shown with a trajectory tending to the attractor.]
Figure 3.2 Illustration of the three regular attractor types.
Point attractors correspond to states of rest and are typically found in dissipative
systems that have no input of energy. For example a damped pendulum that is
perturbed will settle to a rest state. Also, linear systems may only exhibit point
attractors if they have attractors at all. A point attractor has associated Lyapunov
exponents with polarity (-,-,-,...). Cycles (or limit cycles) have Lyapunov exponents of
the form (0,-,-,...). They correspond, for example, to sustained, stable oscillations in
nonlinearly driven linear resonators, such as clock mechanisms. If a clock pendulum is
slightly disturbed, it will eventually settle back to a regular oscillation. The geometry
of a limit cycle is that of a closed loop. The periodicity of the system's behaviour
corresponds to the return of the trajectory to the same point in state space and hence
the closed loop form of the attractor. Tori attractors in state space occur for systems
which display quasiperiodic behaviour. That is, oscillations that combine two or more
incommensurate frequencies. Tori attractors have associated Lyapunov exponents of
the form (0,0,-,...). Earlier views of turbulence in fluid systems, for example,
accounted for the development of turbulence as the cumulative addition of modes in
the motion, corresponding to tori attractors of increasing dimension [stew90].
3.6. Chaos
Chaos is a type of dynamic behaviour with properties unlike the regular motion
associated with the regular attractors of the previous section. It occurs in nonlinear,
dissipative systems. Chaotic behaviour corresponds to an attractor in state space that
typically has non-Euclidean, in fact fractal, geometry. Such attractors are termed
strange attractors. Trajectories on the attractor are always asymptotically locally
unstable (Lyapunov spectra of the form (+,-,-,...) for example). Trajectories on strange attractors never meet up, and so the associated motion is not periodic. The motion is
typically irregular and complex despite the system having low order (small d).
For a linear system, asymptotic local instability implies that the resulting
behaviour will be unbounded and the state will "fly off" to infinity. For a chaotic
system, the asymptotic instability occurs in conjunction with the bounding nature of
the nonlinear mapping. Consequently, neighbouring trajectories in state space are
simultaneously pushed apart, but remain on or near the attractor. The former
behaviour is termed sensitive dependence on initial conditions. The main consequence
of it is that the motion becomes unpredictable in the long term. For regular motion,
any uncertainty in the position of an initial condition due to measurement inaccuracies
can be modelled as a slight perturbation to the actual state. Because of the asymptotic
local stability, the trajectories from the actual and errorful states remain together. The
small error in knowledge of the initial conditions remains a small error for all future
time and so the system is predictable. For a chaotic system, the small error
perturbation becomes magnified exponentially at a rate governed by the largest
positive Lyapunov exponent. The slight error in knowledge of the initial condition
grows with time. Consequently, chaotic systems are predictable in the short term, but
not the long term.
An example chaotic system, and one of the first to be studied, is the Lorenz model
of atmospheric convection [lore63]. This is written as a set of three nonlinear
differential equations,
    dx/dt = σ(y − x)
    dy/dt = Rx − y − xz    (3.10)
    dz/dt = xy − bz

where x, y, and z form the state of the system and represent physical aspects of the atmosphere, and

    σ = 10.0,  R = 28.0,  b = 2.67    (3.11)
are the system parameters, with values that give rise to chaotic behaviour. The set of equations is analytically intractable, but may be numerically integrated.
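A minimal numerical integration of Equations (3.10) can be sketched with a simple Euler step; the step size is illustrative, and a careful study would use a higher-order method. Running two nearby initial conditions reproduces the divergence illustrated in Figure 3.4.

```python
def lorenz_step(state, dt=0.005, sigma=10.0, R=28.0, b=2.67):
    """One Euler step of the Lorenz equations (3.10)."""
    x, y, z = state
    return (x + dt * sigma * (y - x),
            y + dt * (R * x - y - x * z),
            z + dt * (x * y - b * z))

def trajectory(state, n_steps):
    states = [state]
    for _ in range(n_steps):
        state = lorenz_step(state)
        states.append(state)
    return states

# Two runs from nearly identical initial conditions: similar at first,
# then diverging exponentially (sensitive dependence on initial conditions)
run_a = trajectory((1.0, 1.0, 1.0), 4000)
run_b = trajectory((1.0, 1.0, 1.0 + 1e-6), 4000)
```

Plotting any pair of the three state variables from such a run gives the 2-dimensional projection of the attractor shown in Figure 3.3.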
Figure 3.3 shows the strange attractor of the Lorenz system. The figure is a 2-dimensional projection of the 3-dimensional state space formed by the variables x, y
and z. Note the way in which the trajectory always passes nearby other trajectories, but
never joins them. In the limit of infinite time, an infinite length trajectory would be
contained within a finite volume. This is typical of fractal objects, and relates to the
self-similar nature of the banding illustrated in the Figure. Successive magnifications
of the trajectory on the attractor reveal the same form of detail on each scale. Figure
3.4 illustrates sensitive dependence on initial conditions by showing one of the state
variables of the Lorenz system for two simulations with similar initial conditions. The
resulting waveforms can be seen to be similar in the short term, but dissimilar in the
long term after divergence. This illustrates how long term prediction of a chaotic
system is impossible for limited knowledge of initial conditions.
3.7. Phase Portraits
An important and often used technique in work on dynamics and especially chaos is
that of 'time-delayed embedding'. In the Lorenz example of the previous section, the
state space attractor can be shown directly because the state variables x,y and z are
directly available. It is often desirable to view the state space attractor to determine its
form, look for closed loops or fractal structure. Often, however, there is no direct
access to the state variables and instead there is only access to one observed variable
as it changes with time. Under certain conditions, however, the state of the system can
be reconstructed by forming vectors of time-delayed observations. This is discussed in
greater detail in Chapter 7 when it will be necessary to be more specific about the
embedding process.
A visualisation technique, known as a phase portrait, uses embedding to
reconstruct a topologically equivalent version of a system's attractor from a time series
of observations. This is done simply by plotting the time-series against a delayed
version of itself to form a 2-dimensional projection of the attractor. Or, three time-
delayed values can be used to construct a simulated 3-dimensional view. This has
been implemented in a program written for this work called PHS which allows phase
portraits to be constructed from a time series. An important variable which relates to
the quality of the phase portrait is the size of the time-delay. This is measured in
multiples of the sampling period and can range from 1 to 100. Figure 3.5 shows the
effect on the phase portrait of different values of the time-delay. This example has
been constructed from a time-series derived from the Lorenz system shown in the
previous section. One of the Lorenz variables (x) has been recorded to form the
observed time series. Note that if the delay is too small, there is little difference
between consecutive values of the time series and something close to a straight line is
plotted. At the other extreme, if the delay is too large, consecutive values become
unrelated and the phase portrait becomes messy. Note also the similarity of the phase
portrait topology to the topology of the strange attractor shown in Figure 3.3.
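As an illustration of the embedding step underlying such phase portraits (this is a reimplementation sketch, not the PHS source; the function name and defaults are invented here), delay vectors can be formed directly from a list of samples:

```python
# Sketch of the time-delay embedding behind a phase portrait plot.
import math

def delay_embed(series, delay, dim=2):
    """Form dim-dimensional vectors (s[i], s[i+delay], ..., s[i+(dim-1)*delay]).

    Plotting the first coordinate against the second (and optionally a third)
    gives a 2-D (or simulated 3-D) projection of the reconstructed attractor.
    """
    n = len(series) - (dim - 1) * delay
    return [tuple(series[i + j * delay] for j in range(dim)) for i in range(n)]

# A sampled sine wave embeds to a closed loop, as expected for a periodic
# signal; the delay controls how 'open' the loop appears.
signal = [math.sin(0.05 * i) for i in range(1000)]
portrait = delay_embed(signal, delay=30)
```

With a very small delay the two coordinates are nearly equal and the portrait collapses towards the diagonal, matching the behaviour described for Figure 3.5.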
Figure 3.3 Sequence of magnifications of the Lorenz attractor showing its fractal,
self-similar property (panels 1-3: the strange attractor of the Lorenz system, a
magnification of the box in 1, and a magnification of the box in 2).
Figure 3.4 Two simulations of the Lorenz system for similar initial conditions,
showing sensitive dependence on initial conditions: the amplitude of one variable is
plotted against time, with the waveforms initially similar and then diverging
exponentially.
Figure 3.5 Three phase portraits constructed from a time series of observations of
the Lorenz chaotic system. Delay values are: (a) 1, (b) 10, (c) 100.
3.8. Bifurcation
Another aspect of stability is the response to perturbations of a system's
parameters. This is termed structural stability. For changes in a system's parameters,
attractors may appear or disappear or change form. It is common for nonlinear systems
to display both regular and chaotic behaviour for different parameter values. Often
these regimes are connected through sequences of bifurcations. This is illustrated by the behaviour of
the simple, one-dimensional system known as the logistic mapping,
x_{n+1} = 4λx_n(1 − x_n),    x_n ∈ (0, 1),  λ ∈ (0, 1]    (3.12)

The mapping is shown in Figure 3.6. The value of λ corresponds to the peak height of
the inverted parabola.
Figure 3.6 The logistic mapping for λ ≈ 0.9.
For small values of λ and any initial condition x_0 ∈ (0, 1), the iterated sequence of
states settles, after a transient, to a single fixed point. At a higher value of λ, the output
settles to a limit cycle consisting of an alternating sequence of two values. As λ
continues to rise, the output again changes, to a limit cycle of period four and then
eight, etc. Such changes are termed bifurcations and this one, in particular, is a period
doubling bifurcation. At a critical value of λ = 0.8925... (where 4λ = 3.569...), the
output becomes chaotic, that is, aperiodic and highly irregular. Further increases in λ
take the output sequence through bands of chaotic behaviour with interspersed
periodic 'windows'.
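The behaviour just described can be verified in a few lines of code. A minimal sketch (the parameter values 0.6, 0.8 and 0.99 are illustrative choices within the regimes described, not values taken from the thesis):

```python
# Sketch: iterating the logistic mapping x_{n+1} = 4*lam*x_n*(1 - x_n) to show
# a fixed point, a period-2 limit cycle and chaotic output at different lam.

def logistic_orbit(lam, x0=0.3, transient=1000, keep=8):
    """Iterate past the transient, then return the next `keep` states."""
    x = x0
    for _ in range(transient):
        x = 4 * lam * x * (1 - x)
    orbit = []
    for _ in range(keep):
        orbit.append(x)
        x = 4 * lam * x * (1 - x)
    return orbit

fixed = logistic_orbit(0.6)     # settles to the fixed point x* = 1 - 1/(4*lam)
period2 = logistic_orbit(0.8)   # settles to an alternating pair of values
chaotic = logistic_orbit(0.99)  # aperiodic, irregular output
```

After the transient, the first orbit repeats a single value, the second alternates between two values, and the third never settles.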
All this behaviour may be displayed with a bifurcation diagram. One is shown in
Figure 3.7 for the logistic mapping. Note that a magnification of a small portion on the
boundary of a periodic window and a neighbouring chaotic band reveals the self-
similarity of this structure.
Figure 3.7 Bifurcation diagram for the logistic mapping (possible values of x_n
plotted against λ, for λ between 0.7 and 1) with corresponding time series plots; a
magnification of the small rectangle is also shown.
The period doubling sequence, known as a route to chaos, is one of several ways
in which chaotic behaviour develops from regular behaviour [moon87].
3.9. Statistical Descriptions of Dynamics
Because of sensitive dependence on initial conditions and the limited accuracy to
which an initial condition can be known in practice, the exact long-term behaviour of
a chaotic system becomes unpredictable. It is therefore necessary to describe
dynamical behaviour of chaotic systems statistically. This allows for a general
description of typical trajectories and the determination of average behaviour. The
way this may be done is to consider the state vector as a random vector. Its behaviour
in time is then described by a multivariate stochastic process.
Coming to a chaotic system for the first time, it may be asked, for example: what is
the likelihood of finding the system in a given state? Since a chaotic system
possesses an attractor, it is therefore known that the state vector must be found
somewhere on this attractor,

x ∈ A    (3.13)
The probability of finding the state in different parts of the attractor is then
described by a probability distribution function (pdf) whose support set is the
attractor. The probability of finding the state in some subset of state space, B, is then

P(x ∈ B) = ∫_B p(x) dx    (3.14)
where p(·) is the pdf. Alternatively, this may be expressed in terms of a probability
measure, μ, so that

P(x ∈ B) = μ(B)    (3.15)
These are equivalent descriptions. The word measure is used to mean both the
probability distribution and the probability associated with a specific set. Note that
since the attractor is the support of the measure,

μ(A) = 1    (3.16)
An invariant measure is one that is not affected by the action of the dynamical
system mapping. That is,

μ(B) = μ(F(B))    (3.17)
This is a statement of the conservation of probability as it says the probability of
finding a state in the subset B is the same as the probability of finding the state in the
subset to which B is mapped by F. A consequence of the measure being invariant is
that the random process representing the system behaviour is stationary. The
distribution of states over the attractor, i.e. the measure, is independent of time.
The measure may be used to calculate the expected value of a function of the state.
In general,

E(f(x)) = ∫ f(x) dμ(x)    (3.18)
for some function f. The state has no absolute value, only a possible one, hence the
integration over its probability distribution. A measure is said to be ergodic if the
expected value in Equation (3.18) may be replaced with a time average, so that

E(f(x(t))) = lim_{T→∞} (1/T) ∫_0^T f(x(t)) dt    (3.19)
A measure that is ergodic may be approximated by the relative frequency with
which a single trajectory visits various parts of the attractor. This may be done in
practice by dividing the attractor into bins and counting the number of times the state
falls within each bin whilst following a single trajectory. This defines a histogram
approximation to the measure. The measure of each bin is given approximately by

μ(B_i) = P(x ∈ B_i) ≈ n_i / N    for large N    (3.20)

where n_i is the count in the i-th bin, B_i, and N is the total count over all bins.
In theory, chaotic systems have many invariant measures associated with invariant
sets in state space. These sets may be reached if the system is given certain exact
initial conditions. They are, however, not stable and are therefore not attractors.
Consequently, in practice, a real trajectory never maps out these invariant sets nor has
the corresponding invariant probability measures. This is because of the presence of
noise (experimental or computational) perturbing the state away from the unstable
sets. The ergodic, invariant measure corresponding to a typical, real trajectory is
termed the natural measure. It is this, for example, that is being approximated by the
histogramming process performed on an actual or computational chaotic system.
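The histogramming procedure can be sketched for the logistic mapping at λ = 1, a standard case for which the invariant density is known in closed form, p(x) = 1/(π√(x(1 − x))). The bin count, trajectory length and starting point below are illustrative choices, not details from the thesis:

```python
# Sketch: approximating the natural measure of the logistic mapping at lam = 1
# by histogramming a single long trajectory (Equation 3.20). The analytic
# invariant density piles up probability near the ends of the interval, and the
# histogram approximation should reproduce this.
import math

def natural_measure(lam=1.0, x0=0.123456789, n=50000, bins=10):
    """Relative visit frequency of each bin along one trajectory."""
    counts = [0] * bins
    x = x0
    for _ in range(1000):              # discard the transient
        x = 4 * lam * x * (1 - x)
    for _ in range(n):
        x = 4 * lam * x * (1 - x)
        counts[min(int(x * bins), bins - 1)] += 1
    return [c / n for c in counts]

mu = natural_measure()
# Analytic measure of a bin [a, b]: (2/pi) * (asin(sqrt(b)) - asin(sqrt(a))).
edge = (2 / math.pi) * math.asin(math.sqrt(0.1))   # first bin, roughly 0.205
```

The estimated measure of the first bin agrees with the analytic value to within sampling error, and the edge bins carry visibly more measure than the central ones.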
3.10. Fractal Geometry
It has already been mentioned that, typically, a strange attractor is a fractal object.
This section outlines the theory of fractal geometry in general and discusses its
significance to chaotic dynamics.
The word fractal was coined by Benoit Mandelbrot from the Latin fractus, meaning
irregular and fragmented. His specific definition is that a fractal is a set whose
Hausdorff-Besicovitch dimension, D_H, is greater than its topological dimension, D_T
[mand83]. To understand this definition, it is necessary to consider the relationship
between two geometric concepts, those of self-similarity and dimension.
An object is self-similar if there is a similarity between the whole object and its
component parts, viewed on different scales. This is also called scale invariance. For
example, a line section, square or cube are self-similar because they may be composed
of scaled-down versions of themselves. See Figure 3.9. The fractal "triadic Koch
curve" shown in Figure 3.8 can be seen to be composed of four one-third-sized Koch
curves. These are examples of exact self-similarity. The property extends, however, to
the case where the similarity on different scales is a statistical one. Consider, for
example, a coastline. It appears to have the same irregular form when viewed on maps
made to different scales. These last two examples of self-similarity, however, may be
distinguished from the first by considering their scaling properties in the context of
dimension.
Figure 3.8 The exactly self-similar, triadic Koch curve (the segment A-B is shown
under ×3 magnification).
The term dimension has several meanings. It is the relationship between these
meanings on which the definition of fractal is based. Dimension is used to describe
the number of independent directions in a space, or the number of coordinates
required to specify a unique point. This is often termed the Euclidean dimension, E,
for real spaces of the form R^E. Objects within such spaces are then classified
according to their topological dimension, D_T. For example, a point has D_T = 0, a
curve D_T = 1, a surface D_T = 2, and a volume D_T = 3. These regular, Euclidean
forms all have integer topological dimensions.
self-similar, or scaling properties of an object. Consider again the straight line section
and its self-similar property. More specifically, let the line be composed of N versions
of the original, each scaled by a factor r<1. If the line's original length is L, then the
combined length of the scaled down parts must equal this, so
N·rL = L    (3.21)

therefore

Nr = 1    (3.22)

See Figure 3.9. Next, consider a square surface. Again this can be divided into N
smaller squares, each scaled by a factor r. In this case, if the length of a side of the
original square is L, by equality of areas,

N(rL)^2 = L^2    (3.23)
and so

Nr^2 = 1    (3.24)

For a cube, the same argument leads to

Nr^3 = 1    (3.25)

A consideration of the self-similar properties of these objects therefore gives the
general relationship

Nr^{D_S} = 1    (3.26)

where D_S agrees with an intuitive notion of dimension and, in these cases, with the
topological dimension. D_S is known as the similarity dimension [peit88].
Figure 3.9 General formula for similarity dimension, Nr^{D_S} = 1, derived by
inspection of standard Euclidean shapes: a whole covered with N parts, each scaled
down by r = 1/N (1-D line), r = 1/N^{1/2} (2-D square) and r = 1/N^{1/3} (3-D cube).
What, however, does this approach yield for the triadic Koch curve? By inspection
of Figure 3.8, it can be seen that the Koch curve can be divided into N = 4 versions of
itself, each scaled down by a factor r = 1/3. From Equation (3.26), the similarity
dimension is found to be

4·(1/3)^{D_S} = 1,    D_S = log 4 / log 3 ≈ 1.262    (3.27)

which is a non-integer value, unlike those of the regular Euclidean objects.
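Solving the relation Nr^{D_S} = 1 for D_S gives D_S = log N / log(1/r), which is easily checked numerically (a small illustrative sketch, not code from the thesis):

```python
# Sketch: the similarity dimension D_S solving N * r**D_S = 1.
import math

def similarity_dimension(N, r):
    """D_S = log(N) / log(1/r) for an object made of N copies scaled by r."""
    return math.log(N) / math.log(1 / r)

d_line = similarity_dimension(2, 1 / 2)    # a line: two half-sized copies
d_square = similarity_dimension(4, 1 / 2)  # a square: four half-sized copies
d_koch = similarity_dimension(4, 1 / 3)    # Koch curve: log 4 / log 3
```

The Euclidean shapes recover their integer dimensions, while the Koch curve gives the non-integer value of roughly 1.262.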
The concept of a dimension relating to the scaling properties of a geometric object
may be expanded to apply generally by using the notion of measurement. Typically,
geometric measurement involves finding lengths, areas and volumes of objects. The
way these values scale with the size of the object relates to their dimension. For
example, the area of a circle scales according to the square of its radius, the volume of
a cube according to the third power of side length. The exponents of these power laws
equate, in these cases, to the topological dimensions of the objects. In general, define a
measurement function to be

M(r) = Nr^{D_H}    (3.28)

where r is the linear extent of a generalised 'ruler' and N is the number of such rulers
required to cover the object being measured. Note that N and r now have different
meanings than they did in the discussion of similarity dimension. For a curve, M is the
length and D_H = 1; for a surface, M is the area and D_H = 2, etc. As r tends to zero, M
becomes a more accurate measurement of these quantities. The value of the scaling
exponent, D_H, may be viewed as the only one that allows M(r) to tend to a positive,
finite value as r tends to zero. If D_H were any larger, then M(r) would tend to infinity;
any smaller and M(r) would tend to zero, and the resulting measurement would be
nonsense. This approach defines D_H as the Hausdorff-Besicovitch dimension
[schr91].
Approaching the Koch curve in this way, its length may be measured with
reference to its means of construction. Figure 3.10 shows how the Koch curve may be
constructed. A generator curve, labelled the original, is, at each stage of construction,
scaled and inserted in the place of each straight line segment. In the limit of infinite
iterations, the result is the Koch curve. Let the original curve be exactly covered with
4 rulers of length r. After each iteration, the number of line segments increases by a
factor of four, but their length decreases by a factor of three. The length of the curve
therefore increases by a factor of 4/3. In the limit, its length will therefore become
infinite. This 'monstrous' property is compounded by the fact that a closed Koch curve
surrounds a finite area despite having infinite perimeter. See Figure 3.11.
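This scaling argument can be checked numerically. In the sketch below (an illustration, not the thesis's code), the ruler count N_k = 4^k and ruler length r_k = 3^{-k} at stage k of the construction give a diverging length for an exponent of D = 1, while M(r) = Nr^D remains constant only for D = log 4 / log 3:

```python
# Sketch: measuring the Koch curve with rulers of length r_k = 3**-k, which
# takes N_k = 4**k rulers. The measured 'length' (exponent D = 1) diverges
# as (4/3)**k, while M(r) = N * r**D is constant only for D = log(4)/log(3).
import math

def koch_measurement(k, D):
    """M(r_k) = N_k * r_k**D at construction stage k."""
    N, r = 4 ** k, 3.0 ** -k
    return N * r ** D

D_H = math.log(4) / math.log(3)
lengths = [koch_measurement(k, 1.0) for k in range(8)]   # (4/3)**k, diverging
measures = [koch_measurement(k, D_H) for k in range(8)]  # all close to 1.0
```

The length sequence grows without bound, while the generalised measurement with the exponent D_H stays finite, which is exactly the property that defines the Hausdorff-Besicovitch dimension.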
Figure 3.10 Iterative construction of the triadic Koch curve: the original generator
followed by the 1st and 2nd iterations.
Figure 3.11 Area of closed Koch curve (dark grey) is within area of circle (light
grey) showing that it is finite.
Now apply the Hausdorff dimension approach to the Koch curve. Is there a value
of D_H that gives a stable, positive value of M as r tends to zero? The initial
measurement is made with N = 4 rulers of length r. After the first iteration, there are
N = 16 rulers of length r/3. For the measurement

M(r) = Nr^{D_H}    (3.29)

to remain constant,

4r^{D_H} = 16(r/3)^{D_H}    (3.30)

and therefore

D_H = log 4 / log 3    (3.31)
which is non-integer and, in this case, the same as the similarity dimension.
Applying the measurement approach to a coastline also gives, empirically, the
same kind of result. Using cartographer's dividers to measure the length of a coastline,
it is found that the length typically increases as the divider width is reduced [mand83].
Also, there exists a definite power law relationship like Equation (3.28), with a
Hausdorff dimension of between 1 and 2, depending on the coastline measured.
It is the values of the Hausdorff dimension, then, that distinguish between regular,
Euclidean objects and those like the Koch curve and coastline. It is these latter two
that satisfy the definition

Hausdorff-Besicovitch dimension D_H > topological dimension D_T,

and which are therefore described as fractals. The Hausdorff dimension, in this case, is
often termed the fractal dimension. This description of fractal objects holds intuitively
as well. The Koch curve is still topologically a curve and so a topological dimension
of D_T = 1 applies. Because it has infinite length contained in a finite region, however,
it fills space to a greater degree than a straight line and more like a surface. A fractal
dimension of 1 < D_H < 2 is therefore descriptive of this property.
Returning to chaos theory, recall that strange attractors are typically fractal objects.
It is their fractal geometry which is significant to the type of dynamical behaviour that
chaotic systems display. The fact that a trajectory of a chaotic system may be confined
to a low-dimensional, bounded subset of state space (the attractor) but does not join
up with itself (the motion is irregular and nonperiodic) means that in the limit of
infinite time, an infinite length trajectory exists within a finite volume. This is exactly
the same kind of fractal property as that possessed by the Koch curve. The behaviour
of chaotic systems is typically complex despite the system's defining equation being
simple. Again this is reflected by the fractal nature of the strange attractor. As has
been shown by the simple iterative construction of the Koch curve, simple rules can
define complex fractal objects.
3.11. Iterated Function Systems
Iterated Function Systems (IFS) is the name given to a scheme developed by
Barnsley and co-workers [barn88] for describing, generating, and manipulating a large
class of fractal objects. The scheme is theoretically well understood and practically
robust, making it an ideal environment for computer experimentation. An IFS may be
described in three equivalent ways: as a set of geometric operations, as a Markov
process, or as a chaotic system. Each of these models may define the same fractal
object, known as an attractor. This equivalence gives insight into chaos theory as it
links fractal geometry, deterministic chaos and random processes. Also, these three
models will feature in the forthcoming experimental chapters. The geometric
construction features in Chapters 5 and 6, the deterministic model in Chapter 7 and
the Markov model in Chapter 8. The background to these three views will be
discussed in turn in the following sections. Proofs and greater detail may be found in
[barn88].
3.11.1 Contraction Mappings
An IFS is defined as a set of contraction mappings acting on a metric space,
{X; w_n, n = 1, ..., N}    (3.32)

Here, X is the metric space with a metric, or distance function, defined between any
two points,

d(x, y)    (3.33)
and the w_n are the N contraction mappings. A contraction mapping is defined as one
whose action brings any pair of points closer together,

d(w_n(x), w_n(y)) ≤ s_n d(x, y)    (3.34)

for all x and y, where 0 ≤ s_n < 1 is the contractivity factor of the mapping. Typically,
the individual w_n are simple affine mappings that combine to form one simple
nonlinear mapping, W,
W(x) = w_1(x) ∪ w_2(x) ∪ ... ∪ w_N(x)    (3.35)

or

W(x) = ∪_n w_n(x)    (3.36)
The contractivity factor of W is the highest of the contractivity factors of the
individual mappings,

s = max{s_n : n = 1, ..., N}    (3.37)
An affine mapping comprises a linear transformation combined with a translation.
Figure 3.12 shows an example of three simple affine contraction mappings in the
metric space X = R^2. Each mapping comprises a scaling by a factor of 0.5 followed
by a shift. Their combination, W, is also shown.
w_1([x, y]) = [0.5x + 0.5, 0.5y]

w_2([x, y]) = [0.5x + 0.5, 0.5y + 0.5]

w_3([x, y]) = [0.5x, 0.5y]

W([x, y]) = w_1([x, y]) ∪ w_2([x, y]) ∪ w_3([x, y])

Figure 3.12 Three affine contraction mappings on X = R^2 (each a scaling by 0.5
combined with a translation, shown acting on a set A) and their single combination, W.
If each mapping is contractive, then so too is their combination W. If this is the
case, the repeated application of W to an arbitrary initial set is guaranteed to converge,
in the limit, to the IFS attractor, A. That is,

A = lim_{n→∞} W^{∘n}(B)    (3.38)

for some initial set, B. This is illustrated in Figure 3.13.
Figure 3.13 The repeated application of a contractive mapping, W, to some initial
set B (giving W(B), W^{∘2}(B), W^{∘3}(B), ...), tending to the limit set, or attractor, A.
As can be seen, in this case the attractor is a fractal object (a Sierpinski triangle).
This fractal attractor is invariant to the mapping, W,
A=W(A) (3.39)
This section has given the definition of an IFS and shown how a fractal object may
be constructed with the repeated application of a simple, nonlinear deterministic
mapping.
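The construction can be sketched in a few lines of code. The three mappings below scale by 0.5 and translate; the particular translations are one standard choice for a Sierpinski triangle and an assumption of this sketch, rather than a transcription of Figure 3.12:

```python
# Sketch: the deterministic IFS view. Repeatedly applying the combined mapping
# W to a set of points converges towards the attractor (here a Sierpinski
# triangle). The translations used are an illustrative choice.

MAPS = [
    lambda p: (0.5 * p[0], 0.5 * p[1]),            # bottom-left copy
    lambda p: (0.5 * p[0] + 0.5, 0.5 * p[1]),      # bottom-right copy
    lambda p: (0.5 * p[0], 0.5 * p[1] + 0.5),      # top-left copy
]

def apply_W(points):
    """W(B) = w1(B) u w2(B) u w3(B), acting on a finite point set."""
    return {w(p) for p in points for w in MAPS}

B = {(0.0, 0.0)}           # arbitrary initial set
for _ in range(5):
    B = apply_W(B)         # each pass refines the approximation to A
```

After five passes the single starting point has become 3^5 = 243 points, all inside the unit square, outlining the attractor ever more finely.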
3.11.2 The Random Iteration Algorithm
In the version of an IFS previously described, a fractal attractor is defined by a set
of simple contraction mappings. Their combined effect on a set - i.e. on a large
number of points in parallel - generates the attractor. If, however, the mappings are
taken individually and applied, one at a time, to a single point in the metric space, X, a
dynamical system is defined. The single point may then be considered the state of a
system, and X the state space. If the individual mappings are chosen at random, a
Markov process is defined. Let each mapping have an associated probability,
0 < p_n < 1    (3.40)

so that

Σ_{n=1}^{N} p_n = 1    (3.41)
Let the initial single point, or state, be

x_0 ∈ X    (3.42)

and the sequence generated by choosing a mapping at random according to the
associated probabilities be

x_{i+1} = w_n(x_i)    (3.43)

where n is chosen at random for each iteration. The generated sequence of points is
found to lie on the attractor of the IFS. This process is known as the 'Random
Iteration Algorithm' (RIA) for generating IFS attractors. The RIA in fact describes a
Markov process which possesses an invariant, ergodic measure whose support set is
the attractor of the IFS, A.
Recall that a measure describes the probability distribution of states in state space
at any one time. Let μ be some such measure. A (discrete time, first order) Markov
process is defined by a Markov operator that determines the probability distribution at
the next future time step entirely from the distribution at the current time,

μ_{n+1} = M(μ_n)    (3.44)

For example, μ may be a probability (row) vector describing the distribution of a
discrete state system, and M may be a square matrix containing all the probabilities of
transfer from one state to another. In this case, multiplication of the vector, μ, by the
matrix, M, determines the distribution of states at the next time step.
An IFS with associated probabilities defines a Markov operator according to

M(μ) = p_1 μ∘w_1^{-1} + p_2 μ∘w_2^{-1} + ... + p_N μ∘w_N^{-1}    (3.45)

and, if the individual mappings are contractive, this possesses an invariant measure, μ,
so that

M(μ) = μ    (3.46)
Because the measure is ergodic, the distribution of points along a typical trajectory
(i.e. a sequence of iterations of the RIA) will approximate the invariant measure. The
RIA 'draws out', a point at a time, the IFS attractor. The relative frequency with which
any part of the attractor is visited by the trajectory approximates the measure of that
part.
Figure 3.14 shows an example of the RIA in operation. The three mappings of the
IFS are the same as those used in the previous section and define a Sierpinski
triangle. In this example, the associated probabilities are equal, so that the resulting
probability distribution over the attractor is uniform. Figure 3.15, however, shows the
result, after ~1000 iterations, when the same three mappings are used, but with
different associated probabilities. In each case, the sum of the probabilities is equal to
1, but the mappings are weighted unevenly.
Figure 3.14 Example of the Random Iteration Algorithm (RIA) in operation. The
three images show the results of iterating the Markov process (a) ~100, (b) ~300 and
(c) ~1000 times.

Figure 3.15 Examples of RIA attractors where the mappings are weighted with
different associated probabilities.
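A sketch of the RIA itself (the three Sierpinski-type mappings, the probabilities and the seed are illustrative assumptions of this sketch, not details from the thesis):

```python
# Sketch: the Random Iteration Algorithm. One map is chosen at random per step
# according to its associated probability; the visited points draw out the
# attractor, with visit frequency approximating the invariant measure.
import random

MAPS = [
    lambda p: (0.5 * p[0], 0.5 * p[1]),            # w1: bottom-left
    lambda p: (0.5 * p[0] + 0.5, 0.5 * p[1]),      # w2: bottom-right
    lambda p: (0.5 * p[0], 0.5 * p[1] + 0.5),      # w3: top-left
]

def ria(probs, n, x0=(0.0, 0.0), seed=1):
    """Iterate x_{i+1} = w_n(x_i) with n drawn according to probs."""
    rng = random.Random(seed)
    x, points = x0, []
    for _ in range(n):
        w = rng.choices(MAPS, weights=probs)[0]
        x = w(x)
        points.append(x)
    return points

pts = ria([1 / 3, 1 / 3, 1 / 3], 30000)
# A point lands in the right half exactly when w2 was the last map applied,
# so the fraction of points with x >= 0.5 estimates p2 = 1/3.
frac_right = sum(1 for x, y in pts if x >= 0.5) / len(pts)
```

Changing the weights changes how densely different parts of the attractor are visited, as in Figure 3.15, without changing the attractor itself.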
3.11.3 The Shift Dynamical System
So far, two different systems derived from the same set of mappings have been
shown to generate the same fractal attractor. Finally, there is a third formulation of an
IFS, called a Shift Dynamical System (SDS). An SDS is a deterministic nonlinear
dynamical system displaying chaotic behaviour and possessing a strange attractor. As
with the Markov process, the SDS acts on a single point in the metric space, which
may be interpreted as the state of a dynamical system. The mapping in this case,
however, is deterministic,
x_{n+1} = S(x_n)    (3.47)

with

S(x) = w_n^{-1}(x)  when  x ∈ w_n(A)    (3.48)
The system mapping, S, consists of a partition and the inverses of the individual
IFS mappings. Note that now the individual mappings are expansive, because they are
the inverses of contractive ones, and so will separate neighbouring initial conditions.
The system therefore displays sensitive dependence on initial conditions. The partition
is formed by the action of the individual contraction mappings, w_n, on A. This is
shown in Figure 3.16. Note that if the individual mappings are nonoverlapping,

w_i(A) ∩ w_j(A) = ∅  for all  i ≠ j    (3.49)

then the partition will be formed with disjoint subsets. This is assumed to be the case
for a deterministic SDS.
Figure 3.16 Example of an IFS attractor partitioned into three disjoint subsets,
w_1(A), w_2(A) and w_3(A), according to the effect of the three individual contraction
mappings on the attractor.
The mapping S is then equal to the inverse of one of the individual mappings,
according to which partition set the state is in. In the case of affine w's, S is locally
linear, but overall is nonlinear. It is a fixed, deterministic rule applied at each iteration,
and the resulting dynamics can be shown to be chaotic [barn88]. The IFS attractor is
then the strange attractor of a chaotic system.
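A sketch of an SDS for a Sierpinski-type IFS (the maps and the orientation of the partition are illustrative assumptions of this sketch): the system mapping doubles distances and undoes whichever contraction produced the current state, away from the just-touching boundaries.

```python
# Sketch: a shift dynamical system for an illustrative Sierpinski IFS.
# S applies the inverse of whichever map's image region contains the state,
# so S expands (doubles) distances within each partition cell and satisfies
# S(w_n(x)) = x away from the just-touching boundaries.

def shift_map(p):
    """S(x) = w_n^{-1}(x) for x in w_n(A), using an assumed partition layout."""
    x, y = p
    if y >= 0.5:                  # w3(A): top-left region
        return (2 * x, 2 * y - 1)
    if x >= 0.5:                  # w2(A): bottom-right region
        return (2 * x - 1, 2 * y)
    return (2 * x, 2 * y)         # w1(A): bottom-left region

# The forward contractions, for checking the inverse property.
w2 = lambda p: (0.5 * p[0] + 0.5, 0.5 * p[1])
w3 = lambda p: (0.5 * p[0], 0.5 * p[1] + 0.5)

p = (0.25, 0.25)                  # a point of the attractor
back_from_w2 = shift_map(w2(p))   # recovers p
back_from_w3 = shift_map(w3(p))   # recovers p
```

Note that (1, 0), the fixed point of w2, is also a fixed point of S, exactly as the contraction/expansion duality suggests.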
In subsections 3.11.1 to 3.11.3, three equivalent versions of an IFS have been
described. They each define the same fractal attractor and, in the case of the RIA and
the SDS, can be shown to have dynamics with the same statistics [barn88]. Therefore,
the geometric construction of a fractal, a Markov process and a deterministic chaotic
system are theoretically closely linked. Each of these three forms is simple and
robust for computer implementation. Two further points about IFS will be of use in
the forthcoming experimental chapters of this thesis and are presented in the next two
subsections.
3.11.4 The Collage Theorem
The inverse problem for IFS is as follows: given a target set, T, find a set of IFS
mappings such that the IFS attractor they define is close to T.
The previous three subsections have shown ways to construct a complex fractal
attractor from a simple nonlinear system. The inverse problem describes the often
desired reverse process. Beginning with a complex object, is it possible to find a
simple system which can generate it? And if so, what is that system? If the target set is
to be approximated by an IFS attractor A, the collage theorem provides a useful error
criterion. It says that an IFS will generate an A that is close to T if the IFS mappings
applied to T form a close collage of T. The closeness between sets of a metric space
may be quantified by the Hausdorff metric,

h(X, Y)    (3.50)

where X and Y are subsets of the metric space. A collage of the target set T with
mapped versions of itself is

∪_{n=1}^{N} w_n(T)    (3.51)

Let the closeness of this collage to the original be bounded, so that

h(T, ∪_{n=1}^{N} w_n(T)) ≤ ε    (3.52)

The collage theorem then states that

h(T, A) ≤ ε / (1 − s)    (3.53)
See [barn88]. That is, the closeness of the target set to the IFS attractor, and therefore
the measure of success of the inverse solution, is bounded according to the collage
error and the contractivity factor of the IFS. The better the collage and the more
strongly contractive the mappings (the smaller s), the better the resulting solution to
the inverse problem.
3.11.5. The Continuous Dependence of the Attractor on the IFS
Parameters
When the mappings of an IFS satisfy the contractivity requirement, the system
they define is structurally stable. Any small change in the values of the mapping
parameters corresponds to a small change in the form of the resulting attractor; in
fact, there is a continuous relationship between the two. Any small error in the
mapping parameters will have only a small effect on the IFS attractor. Also, a small
change in the parameters may be used to effect a small change in the attractor.
3.12. Summary
This chapter has presented an outline of chaos theory and fractal geometry which
will be relevant to the rest of this thesis. Chaos is a class of complex irregular
behaviour that can occur in simple nonlinear dissipative dynamical systems. It is
defined by sensitive dependence on initial conditions and positive Lyapunov
exponents, and is typically associated with fractal strange attractors existing in the
system's state space. Chaos theory is significant because it offers new interpretations
and models of complex and unpredictable behaviour that are somewhere between the
two traditional themes of deterministic dynamics and stochastic processes. In the
setting of deterministic dynamics, chaos corresponds to a state space attractor having a
different geometry to those of traditional regular attractors. This different geometry is
that of fractals, in which dimensions that relate to the space-filling properties of
objects have non-integer values. Viewed from a stochastic process perspective,
chaotic dynamics is equivalent to stationary stochastic behaviour. The probability
distribution of states
over the fractal attractor is described by an invariant, ergodic measure. In physical or
computer experiments, this is termed the natural measure and is seen when following
a typical trajectory of the system.
A well understood and practically convenient scheme for manipulating simple
chaotic systems that possess complex fractal attractors is provided by Barnsley's IFS.
Three equivalent forms of an IFS relate fractal geometry, Markov processes and
deterministic chaos.
Chapter 4
Applying Chaos and Fractals to the Problem of
Modelling Sound
Having outlined the main features and significance of chaos theory and fractal
geometry, this chapter considers their application to the problem of modelling sound.
The chapter begins with a discussion of the reasons for using chaos and fractals, and
speculation about the possibilities. This discussion concludes with two questions.
Firstly, a diagnostic one: is there any evidence for a connection between naturally
occurring sound and chaos or fractals? Secondly, a pragmatic one: how can chaos or
fractals be used to represent sound? The rest of the chapter is then devoted to
considering these issues so that a strategy of investigation for the experimental work
can be devised.
4.1. The Reasons for Using Chaos Theory
The main reasons for applying chaos theory to the problem of modelling sound
come from considering the properties of chaos/fractals, the nature of certain sounds
and the current state of sound modelling techniques.
Consider the following three main features of chaos and fractals. Firstly, chaotic
systems and fractal objects are capable of representing and replicating a wide range of
natural phenomena. In Chapter 3, a number of subjects were listed in which chaotic
models are used to represent natural systems. And, in Chapter 1, the example was
given of computer generated fractal images that replicate natural objects such as
clouds and mountains (see Figure 1.1). Secondly, chaotic behaviour, which is complex
and irregular, may be generated from simple, nonlinear systems 'from the bottom up'.
That is, the complex behaviour emerges as a consequence of the simple nonlinear
rules and is not explicitly specified. This implies that certain complex phenomena may
be modelled with simple systems in this way. This is an alternative to the
methodology whereby complex behaviour is modelled with a complex system, the
detail of the original being explicitly specified, or imposed on the system 'from the top
down'. Thirdly, simple chaotic systems are capable of generating complex abstract
forms that are beautiful in their own right. Again, in Chapter 1, the example of the
Julia set was given. This highly intricate fractal set is the strange attractor of a very
simple system.
Now consider the nature of sound. Sound is a dynamic entity that may also be
complex and irregular. The main sources of sound around us are speech, music,
environmental sound and machine noise. Although both speech and music consist
mainly of regular, semiperiodic sounds, they do have a variety of irregular features.
For example, fricative speech sounds, such as 'ess' or 'eff', or the crash of a cymbal and
the breathy sound of a saxophone. Environmental and machine sounds are primarily
irregular and complex, for example, a burbling brook, splashing water, the roaring of
the wind, the rumble of thunder and the variety of screeching, scraping, buzzing and
humming noises made by machinery.
Is it, then, that chaotic dynamics exist in the systems that are responsible for these
types of sound? Do these sounds form an acoustic parallel to the natural images that
can be replicated, convincingly, with chaotic systems or fractal objects? Is it possible
to model the essential characteristics of complex sound 'from the bottom up' using
simple systems? Recall that a desired feature of a sound model was considered, in
Chapter 2, to be the relatively small amount of parameter data associated with the
representation of a sound. This is important for the reasons of data reduction and
manageability of the model. The prospect of modelling a complex, naturally occurring
sound with a simple system is therefore a desirable one. In any case, is it possible to
create new and beautiful abstract sound with chaotic systems and, for example,
generate an acoustic equivalent of the Julia set? These questions form the main
hypothesis of this work and are the issues that have inspired the investigation that is
presented.
Consider also the state of current sound models as described in Chapter 2. There
are two salient points. Firstly, the types of sound that these models concentrate on are
primarily regular ones, such as musical tones or speech. In fact, no models appear in
the literature for the large range of other, irregular sounds discussed above. Secondly,
the approach taken with some of these models mirrors the traditional approach taken
to problems of dynamics in general. That is, complex behaviour is modelled 'from the
top down' by complex systems and/or the systems used are linear. For example,
additive and subtractive synthesis and LPC consist of linear systems. Consider, in
particular, additive synthesis where a complex tone is modelled by describing each
component of the complex behaviour explicitly with an enveloped sinewave. The
complex detail is imposed on the model 'from the top down'. The resulting model is
always as complex as the tone it represents, the complexity not emerging from a
simpler system. On the other hand, physical modelling and F.M. synthesis are
examples of nonlinear models where complex musical tones do emerge from simpler
systems. For neither of these two, however, does there exist an analysis procedure that
can produce a model from a given sound. Also, both are models of regular musical
sounds.
So, a further aim of developing chaotic models is to complement existing models
in two respects. Firstly, by expanding the range of sound that may be modelled to
include irregular sounds and irregular aspects of primarily regular sounds. Secondly,
to approach the problem of modelling complex sound with the new methodology
whereby complexity is accounted for as emerging 'from the bottom up' in simple,
nonlinear systems.
Having considered the general reasons for, and possible advantages of modelling
sound using chaos theory, it becomes necessary to ask two questions. Firstly, a
diagnostic one: do any naturally occurring sounds have chaotic or fractal properties?
The second question is a practical one: how can chaos/fractals be used to represent
sound as part of a model?
In the following section the issue of diagnosis is discussed and a variety of
positive evidence is presented that suggests naturally occurring sound does have
chaotic and fractal properties. Following this is a discussion of the second question
which concludes this chapter with a strategy for experimental investigation.
4.2. Diagnosis of Chaotic Behaviour
The diagnosis of chaotic behaviour in a real system, or from a signal derived from
a system, is an area of ongoing research where there is still much debate about the
theory and algorithmic techniques used to identify chaos and their validity. For
example, see [farm90, gibi92, casd92 and vass]. The main approaches are based on
identification and characterisation of the two main features of chaotic dynamics:
fractal strange attractors and positive Lyapunov exponents. In both cases, algorithms
have been developed that estimate the dimension of the strange attractor, or the
associated largest Lyapunov exponents, from a signal derived from a chaotic system.
For example, see [moon87]. The basic premise is that if the attractor dimension is
non-integer, or the largest Lyapunov exponent positive, then chaotic behaviour is
present. Both these techniques, however, are computationally difficult to implement
and the results they provide are subject to a number of complicating factors. These
include the presence of noise in the data, the presence of transients (i.e. whether or not
the system has settled to an attractor in state space), the quantity of data and the effect
of processing the data, for example with band-limited measuring devices. There is still
much debate over the effect of these factors and how best to proceed to give
reliable results.
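The Lyapunov-exponent criterion can be illustrated on a standard textbook system. The sketch below (the logistic map, not a system analysed in this thesis) estimates the largest Lyapunov exponent by averaging log|f'(x)| along an orbit; a positive result indicates chaos:

```python
import math

def logistic_lyapunov(r, x0=0.3, n_transient=1000, n_iter=100000):
    """Estimate the largest Lyapunov exponent of x -> r*x*(1-x) by
    averaging log|f'(x)| = log|r*(1 - 2x)| along an orbit."""
    x = x0
    for _ in range(n_transient):      # discard transient behaviour first
        x = r * x * (1.0 - x)
    total = 0.0
    for _ in range(n_iter):
        total += math.log(abs(r * (1.0 - 2.0 * x)))
        x = r * x * (1.0 - x)
    return total / n_iter

chaotic = logistic_lyapunov(4.0)   # fully chaotic: exponent is ln 2 > 0
periodic = logistic_lyapunov(3.2)  # stable period-2 orbit: exponent < 0
```

For a measured signal, by contrast, the exponent must be estimated from a reconstructed attractor, which is where the complicating factors listed above arise.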
In practice, therefore, chaos is diagnosed with a combination of techniques and
some knowledge about the system under investigation, for example knowledge of the
existence of physical nonlinearities. As well as the two techniques mentioned, it is
common to use phase space visualisation to inspect an attractor to identify its form. It
is also common to identify the routes to chaos in a system, for example to investigate
the existence of period doubling bifurcation sequences as a parameter is varied. The
following subsections present two examples where there is evidence for chaotic
behaviour associated with systems that generate sound.
4.2.1. Chaos and Woodwind Instruments
As outlined in Chapter 2, the established physical model of many musical
instruments consists of a periodic excitation feeding a resonator. It has long been
established that the oscillations which form the excitation are the product of a
nonlinear dynamical system, while the resonator is a linear system. Lord Rayleigh
analysed the conditions of such self-sustained oscillations in nonlinear systems and
also identified the existence of a Hopf bifurcation at the threshold of oscillation
[rayl83]. As discussed in Chapter 3, bifurcations can occur in nonlinear dynamical
systems as some parameter of the system is varied. In the case of woodwind
instruments, for example, the main parameter is the blowing pressure at the
mouthpiece, controlled by the player. In woodwinds, the nonlinear system is formed
by the interaction of the reed in the mouthpiece, the incoming air source and the
reflected pressure waves coming from the resonant tube which forms the bulk of the
instrument [keef92]. Low blowing pressures are insufficient to excite the oscillation,
but as the pressure is slowly increased there comes a threshold point at which the
oscillation comes into being. This is an example of a Hopf bifurcation and its form on
a bifurcation diagram is shown in Figure 4.1.
Figure 4.1 Bifurcation diagram showing a Hopf bifurcation occurring at the
threshold of oscillation in a wind instrument as the blowing pressure is increased.
More recent research, [gibi88, gibi92 and lind88], has shown that if the blowing
pressure is increased further, a number of wind instruments, including the recorder,
oboe, clarinet and saxophone, exhibit sequences of period doubling bifurcations that
culminate with noisy chaos. This sequence can be diagnosed by inspection of the
phase portraits derived from the sound signals. In the regular mode of oscillation, the
periodic tone is seen as a simple closed loop attractor. The loop then acquires an extra
part, its length doubling with each period-doubling bifurcation. Chaos is then
exhibited by a complex, fractal attractor that does not form a closed loop.
Some of these unorthodox modes of oscillation are used in modern music and by jazz
musicians, and are known as 'multiphonic tones' [gibi92].
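The period-doubling route described above can be sketched numerically with the logistic map standing in for the instrument, its parameter r playing the role of the blowing pressure (a hypothetical illustration, not a model of any wind instrument):

```python
def attractor_period(r, max_period=8, tol=1e-6):
    """Return the period of the logistic map's attractor at parameter r,
    or None if no period <= max_period is found (e.g. chaos)."""
    x = 0.5
    for _ in range(5000):             # settle onto the attractor
        x = r * x * (1.0 - x)
    orbit = [x]
    for _ in range(max_period):
        x = r * x * (1.0 - x)
        orbit.append(x)
    for p in range(1, max_period + 1):
        if abs(orbit[p] - orbit[0]) < tol:
            return p
    return None

# increasing r plays the role of increasing blowing pressure:
# the attractor's period doubles, 1 -> 2 -> 4 -> ..., en route to chaos
periods = [attractor_period(r) for r in (2.8, 3.2, 3.5)]
```

The same doubling of the closed loop in a phase portrait is what the cited experiments observe in the sound signals themselves.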
4.2.2. Chaos and Gongs
It has also been suggested that chaos is responsible for sounds produced by other
types of musical instruments, in particular, gongs and cymbals [legg89]. A first
indication of the possibility of chaotic behaviour is that the sounds produced by these
instruments are typically complex and irregular despite the instruments themselves
having simple geometric forms. Crucial to the operation of these instruments is the
presence of geometric nonlinearities, for example rims, domes and dimples, in
otherwise regular forms. It has been suggested that these nonlinearities are responsible
for chaotic dynamics in the vibrations of the instruments which are, in turn,
responsible for the noisy, irregular type of sounds they produce. Experiments have
shown that bifurcation sequences occur when such instruments are excited by
external, sinusoidal vibrations of increasing intensity.
Both these examples suggest that chaotic dynamics are responsible for certain
types of sound produced by musical instruments. Other suggestions have included the
unvoiced aspects of speech sounds [mara91], for example fricatives, where turbulent
air flows are responsible for the noise. However, no published accounts of
experimental verification have been found. As well as evidence for chaotic dynamics
being responsible for certain sounds, there is also evidence for the existence of other
fractal properties.
4.2.3. Fractal Time Waveforms
The most obvious place to look for other fractal properties of sound apart from the
strange attractor is in the time domain waveform of the sound. As a geometric object,
however, this has a subtle but significant difference to the types of objects discussed
in Chapter 3 in the section on fractal geometry. Because a sound waveform is a
depiction of amplitude as a function of time, there is no natural relationship between
the scales of the two axes. The amplitude or time scales may be set independently of
one another. But the fractal objects considered in Chapter 3, such as the Koch curve and
coastline, are examples of objects in spaces where all directions are equivalent. All the
spatial dimensions are inherently linked and so do not scale independently, but
together. Such objects are self-similar if there is similarity between the whole object
and scaled versions. The scaling is specified by a single scale factor applied to all
spatial dimensions. For time domain waveforms, however, two scaling factors are
required, one for the time scale, and the other for amplitude. If there is similarity
between the scaled version of the waveform and the whole, this is termed 'self-
affinity'.
To diagnose self-similarity and quantify it with a fractal dimension, it is necessary
to measure the object under investigation with 'rulers' of various length and see if the
relationship

$N(r) = M r^{-D_H}$   (4.1)
holds (see Equation (3.29)). Recall that this technique was used to diagnose the fractal
nature of coastlines. A number of methods and computer algorithms exist which
implement this idea, such as the box-counting method [voss88]. This idea, however,
only translates to the case of the self-affine waveform if the two independent scales of
time and amplitude are fixed in some way. The waveform can then be treated as a
geometric curve in a space where each direction is equivalent. The fixing between
scales is, however, arbitrary, and can affect the value of the fractal dimension
[voss88]. For this and other reasons (see [farm90]), there is still much debate over the
validity of results obtained with fractal dimension estimation techniques.
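A minimal sketch of the box-counting idea behind Equation (4.1), assuming the two axes have already been fixed to a common scale as discussed (the function name is mine); a straight line should come out with a dimension close to 1:

```python
import math

def box_count_dimension(points, sizes):
    """Estimate the fractal dimension of a point set in the unit square
    by box counting: fit log N(r) against log(1/r)."""
    logs = []
    for r in sizes:
        # count the grid boxes of side r that the set touches
        boxes = {(int(x / r), int(y / r)) for x, y in points}
        logs.append((math.log(1.0 / r), math.log(len(boxes))))
    # least-squares slope of log N(r) versus log(1/r)
    n = len(logs)
    mx = sum(a for a, _ in logs) / n
    my = sum(b for _, b in logs) / n
    return (sum((a - mx) * (b - my) for a, b in logs)
            / sum((a - mx) ** 2 for a, _ in logs))

# a digitised straight line: its box-counting dimension should be ~1
line = [(t / 10000.0, t / 10000.0) for t in range(10000)]
D = box_count_dimension(line, [0.1, 0.05, 0.02, 0.01])
```

Re-scaling one axis of `points` before counting changes the estimate, which is precisely the arbitrariness noted in the text.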
Despite this, there are results that assign fractal dimensions to the waveforms
derived from certain sounds. The type of sounds analysed are mostly speech, but
animal noises have also been used [pick90, mara91 and sene92]. The results suggest
that the time domain waveforms have self-affine properties which are related to the
nature of the sound. For example, in speech, fricative sounds have higher fractal
dimensions than voiced sounds. This relationship appears to parallel that found to exist
between fractal dimension and the texture of fractal images. It has been found that the
fractal dimension increases according to the degree of 'roughness' or 'wiggliness' of the
texture [voss88].
4.2.4. 1/f Noise
My own investigations have also produced evidence for the existence of fractal
sound waveforms by showing that wind sound and roomtones are examples of 1/f
noise. 1/f noise is the term used to describe signals whose power spectral densities are
of the form

$S(f) \propto \frac{1}{f^{\beta}}$   (4.2)

[voss88] over several decades of frequency, where

$0.5 \lesssim \beta \lesssim 1.5$   (4.3)
1/f noises are signals that are statistically self-affine. To see this, consider a time
domain signal, v(t), with Fourier Transform V(f), and power spectral density
$S(f) = \left| V(f) \right|^2$   (4.4)

In general, if the time domain signal is scaled along the time axis by a (positive)
factor, $\lambda$, and the amplitude is scaled by a factor, $\mu$, that is,

$v(t) \rightarrow \mu \, v(\lambda t)$   (4.5)

then

$V(f) \rightarrow \frac{\mu}{\lambda} V\left(\frac{f}{\lambda}\right)$   (4.6)

and

$S(f) = \left| V(f) \right|^2 \rightarrow \left| \frac{\mu}{\lambda} V\left(\frac{f}{\lambda}\right) \right|^2 = \frac{\mu^2}{\lambda^2} \left| V\left(\frac{f}{\lambda}\right) \right|^2 = \frac{\mu^2}{\lambda^2} S\left(\frac{f}{\lambda}\right)$   (4.7)

i.e.

$S(f) \rightarrow \frac{\mu^2}{\lambda^2} S\left(\frac{f}{\lambda}\right)$   (4.8)

If the signal is a 1/f noise and has power spectral density

$S(f) = \frac{1}{f^{\beta}}$   (4.9)

and is then scaled by $\mu$ in amplitude and $\lambda$ in time, the power spectral density will
change according to (4.8) giving,

$\frac{\mu^2}{\lambda^2} \cdot \frac{1}{(f/\lambda)^{\beta}} = \mu^2 \lambda^{\beta-2} \cdot \frac{1}{f^{\beta}}$   (4.10)

If the scaling factors are chosen correctly, so that

$\mu^2 \lambda^{\beta-2} = 1$   (4.11)

then the power spectral density will therefore remain unchanged. A 1/f noise is
therefore invariant to changes in scale of the time domain waveform. Since power
spectral density is an average measure, the scale invariance, or self-affinity (time and
amplitude are scaled by different factors) is a statistical one.
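The invariance argument can be checked numerically. In this sketch, `lam` is the time-scaling factor and `mu` the amplitude factor; choosing mu so that mu² · lam^(β−2) = 1, as in condition (4.11), leaves the prefactor of the scaled spectral density equal to one:

```python
def scaled_psd_prefactor(beta, lam, mu):
    """After v(t) -> mu * v(lam * t), a spectral density S(f) = 1/f**beta
    becomes (mu**2 / lam**2) * S(f / lam); return its prefactor."""
    return (mu ** 2 / lam ** 2) * lam ** beta

beta = 1.0
lam = 4.0                        # compress time by a factor of 4
mu = lam ** ((2.0 - beta) / 2)   # amplitude factor satisfying (4.11)
prefactor = scaled_psd_prefactor(beta, lam, mu)
```

With beta = 2 (Brown noise) the condition gives mu = 1: rescaling time alone leaves Brownian motion statistically unchanged.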
Figure 4.2 shows an example time series plot and the spectral form of 1/f noise
compared with that of white noise and Brown noise (one-dimensional Brownian
motion, or integrated white noise) which are also noises with power spectral densities
of the form
$S(f) = \frac{1}{f^{\beta}}$   (4.12)

but where $\beta = 0$ (white noise) and $\beta = 2$ (Brown noise).
Figure 4.2 Time series plots and spectral density forms for 1/f noise compared
with white noise and Brown noise.
1/f noise occurs naturally in a wide variety of situations, for example electronic
component noise, signals in nerve membranes, and time series derived from sunspot
activity, but a single model for its physical origins remains unknown [voss78]. More
relevant to this work is that music has been shown to be a 1/f noise. Specifically, the
fluctuations of audio power and instantaneous frequency of musical signals have a 1/f
power spectral density over the range of frequencies 10⁻⁴–10 Hz [voss78, hsu90 and
hsu91]. The 1/f noise, or fractal property, is therefore associated with the patterns or
fluctuations contained in the music itself, and not the sound that is heard since sounds
are fluctuations in the frequency range 10–10⁴ Hz. The 1/f property occurs for music
taken from a wide range of cultures and historical periods. The 1/f property of music
is confirmed by constructing aleatoric music with the three noises shown in Figure
taken from a wide range of cultures and historical periods. The 1/f property of music
is confirmed by constructing aleatoric music with the three noises shown in Figure
4.2. The music is made by mapping the noise signal to the pitch and timing parameters
of a sequence of notes. Most listeners will agree that the sequences of notes generated
with white noise are too random and uncorrelated to sound musical. Those generated
with Brown noise are the opposite; too correlated and varying too slowly in time. 1/f
noise is found to be somewhere in the middle of these and produces the most
'music-like' fluctuations of the three.
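A sketch of this kind of aleatoric construction, using the well-known Voss-McCartney approximation to 1/f noise (an assumed choice, not necessarily the generator used in [voss78]) mapped onto a pentatonic pitch set:

```python
import random

def voss_pink(n_samples, n_generators=8, seed=1):
    """Approximate 1/f noise (Voss-McCartney scheme): sum several random
    generators, where generator j is re-drawn every 2**j samples, so the
    output mixes fluctuations on octave-spaced time scales."""
    rng = random.Random(seed)
    gens = [rng.random() for _ in range(n_generators)]
    out = []
    for i in range(n_samples):
        for j in range(n_generators):
            if i % (2 ** j) == 0:
                gens[j] = rng.random()
        out.append(sum(gens))
    return out

# map the slowly correlated values onto a pentatonic pitch set
scale = [0, 2, 4, 7, 9]                  # semitone offsets
values = voss_pink(64)
lo, hi = min(values), max(values)
notes = [scale[int((v - lo) / (hi - lo + 1e-12) * len(scale))]
         for v in values]
```

Replacing `voss_pink` with independent draws (white noise) or a running sum of them (Brown noise) gives the too-random and too-correlated extremes described above.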
In [mand83] Mandelbrot relates the presence of 1/f noise in music to the
hierarchical temporal structure that it typically possesses. He also points out that the
same 1/f structure cannot be expected to be found in the sounds themselves as the
mechanism of production is different. This may be true for resonant musical
instruments, but is not the case for some environmental sounds.
Figure 4.3 shows the power spectral densities of two environmental sounds - wind
noise and a roomtone. The wind noise is taken from a BBC sound effects compact
disc, described as "blustery wind on a beach", and is sampled at 44.1 kHz [bbc87].
The roomtone is taken from a library used for film soundtracks and described as
"industrial room tone, small, with ventilation noise" and is also sampled at 44.1kHz
[ssl89]. Both power spectral densities have been computed from approximately one
second of audio by averaging a sequence of 12 × 4096-point, non-overlapping, non-
windowed FFTs. The spectral density plots are displayed on log-log graphs to reveal
the 1/f relationship as a straight line. The gradient of the line relates to the exponent,
$\beta$, according to,

$S(f) = \frac{1}{f^{\beta}} \;\Rightarrow\; \log S(f) = -\beta \log f$   (4.13)
The magnitude scale shown in the graphs is actually 20 log S(f). Estimated
gradients give values of $\beta \approx 1.5$ for the wind noise and $\beta \approx 1.3$ for the roomtone.
Figure 4.3 Power spectral densities of wind noise (left) and an industrial roomtone
(right) showing 1/f characteristic over the audible range of frequencies.
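The averaging-and-fitting procedure just described can be sketched as follows. The pure-Python FFT and the 1024-sample segment length are illustrative choices, not the parameters used for Figure 4.3; the method is verified here on a synthetic signal built to have an exactly 1/f power spectrum:

```python
import cmath
import math

def fft(x):
    """Radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return x[:]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def psd_slope(signal, seg_len):
    """Average the periodograms of non-overlapping, non-windowed
    segments (as in the text) and fit a line to log S(f) vs log f."""
    n_seg = len(signal) // seg_len
    psd = [0.0] * (seg_len // 2)
    for s in range(n_seg):
        seg = [complex(v) for v in signal[s * seg_len:(s + 1) * seg_len]]
        spec = fft(seg)
        for k in range(seg_len // 2):
            psd[k] += abs(spec[k]) ** 2 / n_seg
    pts = [(math.log(k), math.log(psd[k]))
           for k in range(1, seg_len // 2) if psd[k] > 0]
    mx = sum(a for a, _ in pts) / len(pts)
    my = sum(b for _, b in pts) / len(pts)
    return (sum((a - mx) * (b - my) for a, b in pts)
            / sum((a - mx) ** 2 for a, _ in pts))

# synthetic test signal: bin k carries power k**(-beta), so the fitted
# gradient of the log-log plot should be -beta
N, beta = 1024, 1.0
seg = [sum(k ** (-beta / 2) * math.cos(2 * math.pi * k * t / N)
           for k in range(1, N // 2)) for t in range(N)]
slope = psd_slope(seg * 12, N)
```

Applied to sampled audio instead of the synthetic segment, `-slope` is the estimate of the exponent reported above for the wind noise and roomtone.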
In the previous sections a variety of evidence has been presented for both the
occurrence of chaotic dynamics in systems that generate sound and the existence of
fractal sound waveforms. It can therefore be concluded that there is a physical basis to
support the idea of using chaos/fractals to model sound. Moreover, attempting to
model naturally occurring sound with chaotic/fractal models would itself be a rigorous
diagnostic test. A successful model, i.e. one that satisfied the criteria given in Chapter
2 of accurate representation and compactness, would confirm the sound to be of the
same type as the model. For example, recall that LPC modelling of speech is
successful because the structure of the model - excitation feeding a linear resonator -
matches the physical structure of the system generating the sound. Likewise, if a
chaotic system successfully models a sound, for example, then this is a strong reason
for supposing that the sound is a product of a chaotic system itself.
Having confirmed the physical relevance of chaotic/fractal models for sound, the
next section considers the practical problem of how exactly to represent sound using
chaos and fractals.
4.3. Representing Sound Using Chaos and Fractals
Although the use of fractals and chaotic systems for algorithmic music
composition has received considerable attention - see for example [gogi91, pres88 and
jone88] - there is little published work in the literature on the idea of modelling sound
itself, and few experimental reports.
In [trua90], the author considers the idea of modelling sound with chaos to have
potential for a number of reasons, for example, its apparent relevance to several
unsolved problems in musical acoustics and the influence of the science on the arts in
general. The author proposes a synthesis technique in which simple systems
known to chaos theory are used, arbitrarily, to control acoustic grains in the
time/frequency domain. He reports interesting sounding results, but no attempt is
made to model naturally occurring sound. Experiments with the same granular model
are also reported by the authors in [wasc90], who point out what they see as a
problem with 'perceptual discontinuity'. This term describes the difference between
the way in which a fractal image may be perceived at all scales at once, whereas
differing time scales of sound are perceived in different ways as rhythm and timbre.
This issue will be discussed again in Chapter 5. Work reported in [rode93 and
maye93] is also on the subject of creating abstract synthetic sound with a known
chaotic system, in these cases the chaotic electronic circuit of Chua. In both cases, the
authors report the generation of classes of music-like sounds by converting one
variable of the chaotic system into an acoustic signal.
Although such accounts are relevant to the subject of this thesis, no accounts could
be found in the literature that consider the problem of an analysis/synthesis model for
sound. It is this subject that is discussed next.
How, then, can chaos theory be used to model sound? In other words, what is a
suitable representation for sound that can exploit the properties of chaos theory and
that is physically relevant to sound? The previous section has shown that there is some
evidence for chaotic systems being responsible for sound and that graphs of some
sound waveforms have fractal properties. This, then, suggests two approaches:
1) to represent the dynamics of the sound with a chaotic system, and
2) to represent the graph of a sound signal with a fractal object.
Recall from Chapter 3 that strange attractors embody the characteristics of chaotic
dynamics and may themselves be fractal objects. They are therefore suitable for
representing sound in both the forms described above. In the first case, assuming the
system generating the sound is chaotic, its dynamics may be represented with a
strange attractor. If this strange attractor is modelled, then effectively, so too is the
sound. In the second case, the sound is treated as a static waveform, or geometric
object. If this has fractal properties, it can be modelled with a fractal object, such as a
strange attractor. In both cases, the desire is to find a solution to the inverse problem.
That is, given a fractal object, find a system whose strange attractor matches it.
It is important to understand the difference between these two cases of the inverse
problem. In the first case, chaotic dynamics are assumed to be responsible for the
sound, and it is these which are to be modelled via the strange attractor. In the second
case, however, there is no assumption or modelling of chaotic dynamics. Instead, the
signal waveform, or graph, is being treated as a static object with fractal geometry and
modelled as such. Strange attractors may then be used as a source of fractal objects,
but the dynamics associated with them have nothing to do with the dynamics of the
sound. This situation is similar to that where fern leaves are replicated with IFS
strange attractors (see Figure 1.1). The dynamics associated with the IFS attractors
have no relevance to the fern leaf, since a fern leaf is not a chaotic system. It is,
however, a fractal object and it is this which is being replicated with a fractal strange
attractor.
Having concluded that strange attractors are suitable objects for representing
sound, it is further proposed that IFS are emphasised as the framework within which
to work. There are several reasons for using IFS which relate to their properties
outlined in Chapter 3. Firstly, they allow the generation of a large class of fractal
strange attractors with simple systems. Secondly, they are robust to computer
implementation. Thirdly, they are well understood and a number of theories about
them are of practical use. One of these is the collage theorem, which sets an error
criterion for the solution of the inverse problem (see Chapter 3). Another is the
property of continuous dependence of attractors on parameters. This is an important
property of IFS if the model is to be used to manipulate sound. It guarantees that,
while the IFS mappings are contractive, any small change in the mapping parameters
will effect only a small change in the IFS attractor. There is a continuous relationship
between the two. It would be undesirable if this was not the case, since a small change
in parameters, either intentional or due to errors in implementation, would effect a
drastic change in the behaviour of the model. To illustrate the property of continuous
dependence and again suggest the possible advantages of chaotic modelling of sound,
Figure 4.4 shows a set of IFS attractors, each of which has been generated by making
small changes to the parameters of the fern leaf IFS shown in Figure 1.1. It can be
seen how the small changes in the parameters result in small, but impressive,
transformations of the image.
This example also illustrates the power of modelling objects with strange
attractors. Not only are the complex IFS attractors defined by just 28 parameters,
making the model easily manageable, but the images may be easily and impressively
manipulated. Again, the question may be asked: could an acoustic equivalent of this
model be found?
Figure 4.4 A demonstration of the property of continuous dependence of IFS
attractors on the parameters that define them. This also illustrates the power of
manipulation capable with chaotic models [frac90].
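For concreteness, the sketch below runs the random iteration algorithm on the widely published fern IFS coefficients (assumed here; the parameters behind Figures 1.1 and 4.4 may differ): 4 × 6 map parameters plus 4 probabilities, 28 numbers in all, define the whole attractor:

```python
import random

# four affine maps (a, b, c, d, e, f) and their probabilities; these are
# the standard published fern coefficients, assumed for illustration
MAPS = [
    (0.0,   0.0,   0.0,  0.16, 0.0, 0.0),
    (0.85,  0.04, -0.04, 0.85, 0.0, 1.6),
    (0.2,  -0.26,  0.23, 0.22, 0.0, 1.6),
    (-0.15, 0.28,  0.26, 0.24, 0.0, 0.44),
]
PROBS = [0.01, 0.85, 0.07, 0.07]

def fern_points(n, seed=0):
    """Random iteration algorithm: apply a randomly chosen affine map at
    each step; the orbit settles onto the IFS attractor."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    pts = []
    for _ in range(n):
        a, b, c, d, e, f = rng.choices(MAPS, weights=PROBS)[0]
        x, y = a * x + b * y + e, c * x + d * y + f
        pts.append((x, y))
    return pts

pts = fern_points(20000)
```

Perturbing any of the 24 map parameters slightly and re-running yields a slightly deformed fern, which is the continuous dependence illustrated in Figure 4.4.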
4.4. Summary
This chapter has discussed the idea of applying chaos theory to the problem of
modelling sound. This discussion has included the reasons and possible advantages of
using chaos theory, the physical evidence to support the idea and concluded with an
approach to the problem. This is to use strange attractors to represent sound in two
different ways and to emphasise the use of IFS. The next four chapters (Chapters 5-8)
present a number of experimental investigations that explore this idea. The first two
concern using Fractal Interpolation Functions (FIF) to model the waveforms of sound
signals. FIFs are a class of IFS whose attractors form continuous, single-valued
functions of one variable and are therefore ideal for representing signal waveforms.
Chapters 7 and 8 then present an exploration of dynamic models for chaotic sound.
Note, then, that the approach labelled 2) in Section 4.3 is considered first.
Chapter 5
Fractal Interpolation Functions
This chapter presents the first of the experimental approaches to modelling sound
with strange attractors. It investigates the use of IFS as a synthesis-only technique. A
particular class of IFS are used called 'Fractal Interpolation Function' or 'FIF' whose
attractors are used to represent sound waveforms. The theory of FIFs is outlined, and a
sound synthesis algorithm developed. This is followed by a number of experimental
explorations which have two aims: to develop an understanding of the nature of FIFs
and the synthesis algorithm; and to search for interesting and potentially useful
sounds. This is followed by a number of advanced experiments where the basic FIF
algorithm is used as part of a more sophisticated model.
5.1. Theory
An FIF is an IFS whose attractor has the special form of being a single-valued
function of one variable. An FIF is therefore defined on the plane, $\mathbb{R}^2$, and is an object
with the same geometry as the waveform of a sound signal. An FIF is constructed so
that, as its name suggests, a set of points is interpolated by a fractal function. FIFs
therefore complement the traditional non-fractal interpolating functions such as piece-
wise linear functions and polynomials. Also, complex fractal functions may be
constructed to interpolate only a few interpolation points.
The interpolation points, in combination with another set of values, uniquely
define the FIF as they define the mapping parameters of an IFS. The parameters that
define an FIF are a restricted class of affine mapping parameters on the plane. Let
$\{(x_n, y_n) \in \mathbb{R}^2 : n = 0, 1, 2, \ldots, N\}$   (5.1)

be a set of interpolation points lying on the plane and let them be ordered such that

$x_0 < x_1 < \cdots < x_N$   (5.2)
The points may then be interpolated by any continuous function of the form
$f : [x_0, x_N] \rightarrow \mathbb{R}$   (5.3)

such that

$f(x_n) = y_n \quad \text{for all } n$   (5.4)
Now consider how the interpolation points define a set of contraction mappings.
For each consecutive pair of points,
$(x_{n-1}, y_{n-1}), \; (x_n, y_n)$   (5.5)
a mapping can be defined that maps the end points,
$(x_0, y_0)$ and $(x_N, y_N)$   (5.6)
to this pair. By choosing this mapping to be an affine shear map of the form,
$w_n \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} a_n & 0 \\ c_n & d_n \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} e_n \\ f_n \end{pmatrix}$   (5.7)
the parameters $a_n$, $c_n$, $e_n$ and $f_n$ may be derived from the constraints given by the
interpolation points. That is, the end points must map to the consecutive pair of
interpolation points
$w_n \begin{pmatrix} x_0 \\ y_0 \end{pmatrix} = \begin{pmatrix} x_{n-1} \\ y_{n-1} \end{pmatrix}$   (5.8)
and
$w_n \begin{pmatrix} x_N \\ y_N \end{pmatrix} = \begin{pmatrix} x_n \\ y_n \end{pmatrix}$   (5.9)
for each n=1,2,...,N.
Solving (5.7), (5.8) and (5.9) gives,
$a_n = \frac{x_n - x_{n-1}}{x_N - x_0}$   (5.10)

$c_n = \frac{y_n - y_{n-1} - d_n (y_N - y_0)}{x_N - x_0}$   (5.11)

$e_n = \frac{x_N x_{n-1} - x_0 x_n}{x_N - x_0}$   (5.12)

and

$f_n = \frac{x_N y_{n-1} - x_0 y_n - d_n (x_N y_0 - x_0 y_N)}{x_N - x_0}$   (5.13)
while the $d_n$ remain variable. It can be seen that such a mapping is contractive in the
x-direction since

$x_N - x_0 > x_n - x_{n-1}$   (5.14)

It is also a shear transform, so that lines parallel to the y-axis remain so after being
mapped. The lengths of lines parallel to the y-axis, however, change by a factor $d_n$.
The $d_n$ are therefore called the 'vertical scaling factors'. Limiting $|d_n| < 1$ ensures that
the mapping, $w_n$, is contractive. Figure 5.1 shows an example of four interpolation
points and the effect of the three shear transforms that they define.
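Equations (5.10)-(5.13) translate directly into code. The sketch below (function names are mine) computes the shear-map parameters from the interpolation points and vertical scaling factors, and can be checked against the endpoint conditions (5.8) and (5.9):

```python
def fif_maps(points, d):
    """Compute shear-map parameters (a_n, c_n, d_n, e_n, f_n) of an FIF
    from interpolation points (x_n, y_n) and vertical scaling factors
    d_n, following Equations (5.10)-(5.13)."""
    (x0, y0), (xN, yN) = points[0], points[-1]
    span = xN - x0
    maps = []
    for n in range(1, len(points)):
        (xp, yp), (xn, yn) = points[n - 1], points[n]
        a = (xn - xp) / span
        c = (yn - yp - d[n - 1] * (yN - y0)) / span
        e = (xN * xp - x0 * xn) / span
        f = (xN * yp - x0 * yn - d[n - 1] * (xN * y0 - x0 * yN)) / span
        maps.append((a, c, d[n - 1], e, f))
    return maps

def apply(m, x, y):
    """Apply one shear map (Equation (5.7)) to the point (x, y)."""
    a, c, d, e, f = m
    return a * x + e, c * x + d * y + f

points = [(0.0, 0.0), (0.4, 0.5), (0.7, 0.2), (1.0, 0.0)]
maps = fif_maps(points, [0.3, -0.3, 0.3])
```

Each map sends the first and last interpolation points onto one consecutive pair, exactly as (5.8) and (5.9) require.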
Figure 5.1 An example of the effect of the three shear maps, $w_1$, $w_2$ and $w_3$, on the
area A and an illustration of one of the vertical scaling factors, $d_1$.
A set of interpolation points defines a set of the mappings, $w_n$, each of which is
contractive. The set of mappings therefore define an IFS that is guaranteed to possess
an attractor. This is the main theorem of FIFs [barn88]: given the IFS,

$\{\mathbb{R}^2;\; w_n, \; n = 1, 2, \ldots, N\}$   (5.15)

where the $w_n$ are derived from (5.10)-(5.13) and $|d_n| < 1$, then there exists an attractor,
$G \subset \mathbb{R}^2$, that is a continuous interpolation function satisfying

$\{(x_n, y_n) : n = 0, 1, \ldots, N\} \subset G$   (5.16)
This theorem is demonstrated in practice in the forthcoming section, 5.3, after a
synthesis algorithm has been developed to implement it.
The vertical scaling factors, $d_n$, control the 'wiggliness' or 'roughness' of G in a way
that is analogous to the way a 'depth of modulation' parameter controls amplitude
modulation, for example. If all the $d_n = 0$, the attractor, G, reduces to a non-fractal
piece-wise linear interpolation of the data points $\{(x_n, y_n)\}$. The $d_n$ in fact relate to the
fractal dimension, D, of G via

$\sum_{n=1}^{N} |d_n| \, a_n^{D-1} = 1$   (5.17)

when

$\sum_{n=1}^{N} |d_n| > 1$   (5.18)

and the interpolation points do not all lie on a single straight line.
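Equation (5.17) has no closed form in general, but because its left-hand side decreases monotonically in D when all $a_n < 1$, it can be solved by bisection. A sketch, checked against the special case of N equal intervals with equal |d_n|, where D = 2 + log|d| / log N:

```python
import math

def fif_dimension(a, d, tol=1e-12):
    """Solve sum_n |d_n| * a_n**(D - 1) = 1 (Equation (5.17)) for the
    fractal dimension D of an FIF attractor by bisection; valid when
    sum |d_n| > 1 (condition (5.18)) and the points are not collinear."""
    assert sum(abs(dn) for dn in d) > 1.0
    g = lambda D: sum(abs(dn) * an ** (D - 1.0)
                      for an, dn in zip(a, d)) - 1.0
    lo, hi = 1.0, 2.0            # the dimension of a graph lies in [1, 2]
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:           # g decreases with D since all a_n < 1
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# two equal intervals (a_n = 1/2), equal scaling factors |d_n| = 0.7
D = fif_dimension([0.5, 0.5], [0.7, 0.7])
```

Inverting this relationship is one way to choose the $d_n$ so that a synthesised FIF has a prescribed roughness.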
5.2. The Synthesis Algorithm
In Chapter 3, three alternative forms of an IFS were described that define an
attractor from a set of mappings. Two of these forms are commonly used as the basis
for generation algorithms for displaying the IFS fractal attractors graphically. These
are the deterministic, geometric approach and the random iteration algorithm
[barn88]. Since FIFs are IFS attractors, either of these algorithms may be used to
generate the FIF from the interpolation points via the contraction mappings. For this
work, however, the desired result is not a fractal image, but a fractal waveform that
can be converted into sound. In particular, the waveform will be digital audio and
therefore a sequence of integer values. For this reason, a new generation algorithm
has been devised which produces an FIF in the correct format to be converted directly
into sound.
The synthesis algorithm is a version of the deterministic IFS algorithm which
works in the two-dimensional discrete space

$\mathbf{I} \times \mathbf{T}$   (5.19)

which is a quantised approximation to $\mathbb{R}^2$, the space on which an IFS is defined.
I is the set of quantised amplitude values of the waveform:
$\mathbf{I} = \{-Q, -Q+1, \ldots, Q-2, Q-1\}$   (5.20)

where 2Q is the number of quantisation levels and $\mathbf{T}$ is discrete time

$\mathbf{T} = \{0, 1, 2, \ldots, T-1\}$   (5.21)
where T is the length, in samples, of the waveform. The set of interpolation points are
now
$\{(x_n, y_n) \in \mathbf{T} \times \mathbf{I} : n = 0, 1, 2, \ldots, N\}$   (5.22)
with the added restrictions that
$x_0 = 0 \quad \text{and} \quad x_N = T - 1$   (5.23)
so that the first and last interpolation points span the entire length of the waveform
which is of T samples.
The input to the synthesis algorithm is the set of interpolation points and the set of
vertical scaling factors. The algorithm consists of two parts. Firstly, the mapping
parameters are calculated from the interpolation points, then secondly, the mappings
are used to generate the FIF. The output of the algorithm is a discrete time, discrete
amplitude version of the FIF, G:
    G = { z_t \in I : t = 0, 1, ..., T-1 }        (5.24)
The FIF generation part of the algorithm is an implementation of Equation (3.38)
which indicates that the attractor may be obtained by iterating the combined
contraction mappings of the IFS,
    A = \lim_{i \to \infty} W^i(B)        (5.25)

for some arbitrary initial set B. For this, the FIF case, the combined mapping is the
union of the shear mappings,

    W = \bigcup_{n=1}^{N} w_n        (5.26)
and the attractor is the sequence z_t, which is the discrete time, discrete amplitude
approximation to the FIF G. The iteration of (5.25) is implemented by mapping the
contents of one array to another, back and forth, a finite, not infinite, number of times.
Let,
    { u_t \in I : t = 0, 1, ..., T-1 }        (5.27)

be the contents of one array and

    { v_t \in I : t = 0, 1, ..., T-1 }        (5.28)
be the other and let i be the number of iterations. The synthesis algorithm consists of
the following steps:
Calculate the mapping parameters
- Input the set of N+1 interpolation points, { (x_n, y_n) : n = 0, ..., N }, and the set of N
vertical scaling factors, { d_n : n = 1, ..., N }
- From these, calculate the parameters of the N mappings, w_n, that is the set of
parameters { a_n, c_n, e_n, f_n : n = 1, ..., N }, from Equations (5.10)-(5.13)
The deterministic algorithm
- Initialise the array, u_t, with some arbitrary initial set B, e.g. let u_t = 0 for
t = 0, 1, ..., T-1
- Use W to map the contents of the array u_t into the contents of the array v_t.
That is, for each of the mappings w_n, map each element of u_t to the other array
according to

    v_j = c_n t + d_n u_t + f_n        (5.29)

where

    j = int[ a_n t + e_n ]        (5.30)

and int[.] returns the integer value.
- Copy the contents of v_t into u_t
- Iterate the above two steps i times.
- Either array then contains an approximation to the FIF,
G ~ { z_t \in I : t = 0, 1, ..., T-1 }, and may be treated as digital audio and converted into
sound with a suitable digital-to-analogue converter.
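The two parts of the algorithm can be sketched in Python as follows. This is an illustrative transcription rather than the thesis's 'C' implementation, and the coefficient formulas used are Barnsley's standard FIF expressions, which are assumed here to correspond to Equations (5.10)-(5.13):

```python
def fif_maps(pts, d):
    """Part 1: map coefficients (a_n, c_n, e_n, f_n) for the N shear maps,
    computed from the N+1 interpolation points and N vertical scaling
    factors using Barnsley's standard FIF formulas."""
    (x0, y0), (xN, yN) = pts[0], pts[-1]
    span = xN - x0
    maps = []
    for n in range(1, len(pts)):
        (xp, yp), (xn, yn) = pts[n - 1], pts[n]
        dn = d[n - 1]
        a = (xn - xp) / span
        e = (xN * xp - x0 * xn) / span
        c = (yn - yp - dn * (yN - y0)) / span
        f = (xN * yp - x0 * yn - dn * (xN * y0 - x0 * yN)) / span
        maps.append((a, c, e, f, dn))
    return maps

def fif_synthesise(pts, d, T, iterations, Q=32768):
    """Part 2: deterministic generation.  Iterate the combined mapping W on
    an integer array of T samples, starting from the all-zero set B."""
    u = [0] * T
    maps = fif_maps(pts, d)
    for _ in range(iterations):
        v = [0] * T
        for a, c, e, f, dn in maps:
            for t in range(T):
                j = int(a * t + e)                 # new time index
                val = int(c * t + dn * u[t] + f)   # new amplitude value
                if 0 <= j < T:
                    v[j] = max(-Q, min(Q - 1, val))
        u = v
    return u
```

With the all-zero initial set, one iteration produces a piece-wise linear interpolant of the points (up to the time-quantisation artifacts discussed later in this chapter), and each further iteration adds detail on a finer scale.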
5.3. Experiments with the Synthesis Algorithm
The first set of experiments use simple patterns of interpolation points derived
from sinewaves to illustrate the way the synthesis algorithm works, and to begin to
understand the properties of FIFs.
Table 5.1 shows a set of 17 interpolation points derived from equally spaced
samples of a single cycle of a sinewave. Note that there are only 16 vertical scaling
factors as each pair of interpolation points defines a mapping. In this case T has been
set to 48,000 samples which makes the resulting FIF last one second as a sound since
the sampling rate used is 48kHz. Also, Q=32768, since 16-bit digital audio is being
used. Note also that, in general, no two x values may be the same and all x values
must be in ascending order.
    interpolation points        vertical scaling    vertical scaling
    x value        y value      factor, d           factor, d
    0              0            -                   -
    3000           5740         0.5                 0.05
    6000           10607        0.5                 0.1
    9000           13858        0.5                 0.15
    12000          15000        0.5                 0.2
    15000          13858        0.5                 0.25
    18000          10607        0.5                 0.3
    21000          5740         0.5                 0.35
    24000          0            0.5                 0.4
    27000          -5740        0.5                 0.45
    30000          -10607       0.5                 0.5
    33000          -13858       0.5                 0.55
    36000          -15000       0.5                 0.6
    39000          -13858       0.5                 0.65
    42000          -10607       0.5                 0.7
    45000          -5740        0.5                 0.75
    48000          0            0.5                 0.8

Table 5.1 (left) example set of interpolation points and vertical scaling factors that
define the FIF shown in Figure 5.2. Table 5.2 (right) vertical scaling factors used in
generating Figure 5.3.
Figure 5.2 The initial arbitrary set, B (here y=0), and a sequence of five iterations
(i=1 to i=5) of the deterministic algorithm.
Figure 5.2 shows the initial arbitrary set, B, and a sequence of iterations of the
deterministic algorithm. What is shown is the contents of one of the arrays at each
stage of the generation process. B is chosen to be the function y=0, in other words
each element of the array is made to be zero. The first iteration therefore results in the
interpolation points being interpolated by a piece-wise linear function. The remaining
waveforms show the sequence converging to the FIF which is the IFS attractor of the
mappings. Notice the small difference between the fourth and fifth iterates. Although
this example illustrates the FIF generation algorithm, the resulting sound is only a
simple regular tone and can be heard as Sound 1.
To see the effect of the vertical scaling factors on the FIF, Figure 5.3 shows the
result after 5 iterations using the same interpolation points, but where the vertical
scaling factors are those shown in Table 5.2. As the value of the vertical scaling factor
increases the 'depth of modulation' in the result increases. This FIF can be heard as
Sound 2.
Figure 5.3 FIF for equally spaced interpolation points derived from a single cycle
of a sinewave, but where the vertical scaling factors increase for the mappings from
left to right.
The sounds of these examples are regular because the interpolation points are
regularly spaced along the x-axis in an ordered pattern. The next
experiment, however, also derives a set of interpolation points from a single cycle of a
sinewave, but such that the points are spaced in the x direction according to a square
law. This uneven spacing has a considerable effect on the resulting sound. The
interpolation point values are found from the simple rules,

    x_j = 4 j^2        (5.31)

and

    y_j = \sin( 2 \pi x_j / 65536 )        (5.32)

where

    j = 0, 1, 2, ..., 128        (5.33)

is the interpolation point index. This gives a total of 129 interpolation points and an
FIF length of 65,536 samples. All vertical scaling factors are set to 0.9 to exaggerate
the modulation effect.
The first waveform of Figure 5.4 shows the resulting FIF after 3 iterations. As a
result of using unevenly spaced interpolation points, the corresponding sound of this
FIF is much more complex than that of the previous example. It consists of a complex
pulsing sequence of percussive tones. The tones fall in pitch and the rhythm
slows down, a consequence of the same structure existing on different scales, but
being perceived differently. This can be heard as Sound 3. The nature of this sound is
a consequence of the self-similarity of the FIF. This 'acoustically fractal' property is
further illustrated by playing the sound at different speeds, for example with a
variable-speed tape player, a musical 'sampler', or by varying the playback sampling
rate of the digital-to-analogue converter. When playing the waveform at one octave
lower, or half the speed, the sound remains similar to the original: there is no change
in the perceived pitch although the sound lasts twice as long. This can be heard in
Sound 4 where the sound is played as a sequence of descending octave intervals.
The fractal property of this FIF can also be seen in the sequence of magnifications
shown in Figure 5.4. Notice how the third in this sequence of waveforms is similar to
the first.
Figure 5.4 FIF where x values are spaced according to a square law. A sequence of
magnifications of windows is shown in panels (a)-(d).
This experiment also illustrates a particular problem encountered with the
synthesis algorithm due to the limited resolution of the array. The array is only a
discrete approximation to the continuous space required by IFS theory and therefore
limits the detail of the result. When one of the mappings maps the whole waveform to
in-between a pair of consecutive interpolation points, a sub-sampling process takes
place in the time domain. This occurs when Equation (5.30) is implemented. When
there is a large amount of detail in the whole waveform, an effect akin to aliasing
occurs so that the mapped version contains distortions. When these distortions are
compounded over a number of iterations, a significant amount of noise can be added
to the resulting FIF. This can be avoided by reducing the number of iterations to
generate partially self-similar FIFs, where there is a limit to the number of scales on
which detail is added. This is why the previous example was only iterated three times.
The effect of continuing to iterate the same mappings to i=6 is shown in Figure 5.5.
As can be seen, the result contains more irregular noise than that shown in Figure 5.4.
This problem also explains the irregularity shown in the detail of the third waveform
in Figure 5.4.
The next set of interpolation data is, in contrast to the ordered ones derived from
sinewaves, based on uniformly distributed pseudo-random numbers. Figure 5.6 shows
one example of the result when all parameters are randomised. The x values are scaled
in the range 0-T, where again T=48,000, and then sorted in ascending order of size.
The y values cover the whole amplitude range of -Q to Q-1 and the d values cover
the range -1 to 1. The sound produced is difficult to describe, being rough and fragmented and
having a jolting, fluttering quality. It can be heard as Sound 5. If only the y values are
randomised, the x values evenly spaced and the d values made constant, the result is,
despite appearances (see Figure 5.7(a) ) predominantly tone-like with a background
that sounds like the wind. This is presented as Sound 6. This example again
emphasises the importance of the x values - their regularity dominates the resulting
sound. Finally, random y values are combined with x values following the square law
of Equation (5.31), producing a similar effect as with the sinewave of an apparently
falling pitch. In this case, there is no definite pitch, but a 'rushing' noise. This
waveform is shown in Figure 5.7(b) and can be heard as Sound 7. Notice the
similarity of the waveform plots for the last two examples, despite a large difference
in their sounds.
Figure 5.5 Same interpolation points as Figure 5.4, but with 6 iterations showing
the cumulative effect of errors in the algorithm. The bottom plot is a magnification of
the middle ~1000 points of the top plot.
Figure 5.6 FIF generated from random x,y and d values for the interpolation
points.
Figure 5.7 (a) (left) FIF generated with random y values, but evenly spaced x. All d
= 0.9. (b) (right) FIF generated with random y, but square law x values. All d = 0.9.
This set of experiments with the FIF synthesis algorithm has shown a number of
things. They have illustrated the sequence of iterations that takes place in the
algorithm; shown how regularly spaced interpolation points produce regular sounds;
presented examples of complex waveforms generated from a relatively small number
of parameters; shown the fractal properties of such waveforms both visually and
acoustically; and shown a problem arising from using a discrete implementation
which has limited resolution.
5.4. Rhythm/Timbres
The FIF generated in the previous section that has audibly fractal properties
(Sound 3) is an example of a sound where both its rhythm and the timbre are the
expression of the same information. This information exists simultaneously on
different time scales which are then perceived differently. This one example suggests
that there exists a new class of fractal rhythm/timbre sounds which do not occur
naturally, nor have been generated with other synthesis techniques. Further
experimentation has confirmed that this is the case. Table 5.3 and Figure 5.8 show the
input parameters and a waveform plot of the resulting FIF whose sound is an abstract
percussive rhythm. This may be heard as Sound 8. The input data was the product of
heuristic experimentation with the placing of interpolation points.
The self-similar properties of this waveform can be clearly seen from the plot,
where the left-hand half appears on the right scaled by a half, quarter, eighth etc.
Again, this self-similarity can also be heard by playing the waveform at different
speeds, the overall quality of the sound remaining unchanged except for its length.
This can be heard in the sequence that comprises Sound 9.
Table 5.3 and Figure 5.8 Input data and waveform plot of the resulting FIF that is
a rhythm/timbre.
Having confirmed that rhythm/timbres may easily be generated with a small
number of input parameters, the next set of experiments show how it is possible to
design such sounds by specifying the form of the rhythm first. The method used is to
fit interpolation points around a design for the desired macro-level rhythm, leaving the
FIF procedure to fill in the micro-level detail to provide the timbre. Figure 5.9(a)
shows how an original rhythm design, top, is made into a crude waveform with 18
interpolation points, middle. The function is the same for each of the three 'beats' with
vertical scaling factors chosen to decrease along the waveform to create a decay in the
sound. This is then processed by the FIF algorithm to produce the resulting sound,
bottom. This result can be heard as Sound 10. Figure 5.9(b) shows a similar
construction, where the end note has been moved to the first half of the bar. Also, a
less complex waveform is used, without such a harsh attack to the beats. The result is
quite different, emphasising the way in which the rhythm and timbre are highly related
via the self-similar property of the waveforms. The second rhythm can be heard as
Sound 11.
The rhythm/timbre sounds presented in this section, then, are examples of
complex, interesting and potentially useful musical sounds that may be easily
generated with the simple FIF model using only small amounts of data. It is believed
that they form a new class of previously unheard sounds. Also, they may be
constructed to approximate a desired rhythm and have a range of novel, unusual
timbres.
    interpolation points        vertical scaling
    x value        y value      factor, d
    0              0            -
    3              4            0.5
    6              8            0.5
    12             16           0.5
    24             32           0.5
    48             64           0.5
    96             128          0.5
    192            256          0.5
    384            512          0.5
    768            1024         0.5
    1536           2048         0.5
    3072           4096         0.5
    6144           8192         0.5
    10000          0            0.5
    20000          0            0.5
Figure 5.9 Development of two rhythm/timbres, (a) and (b), from rhythmic design
(beats marked by x's), top, through interpolation points, middle, to final waveform,
bottom.
5.5. Generating Time-Varying FIF Sounds
The previous experiments have concentrated on generating a sound with a single
FIF. Typically, the FIF is ~50,000 samples long corresponding to a sound lasting ~1
second. An alternative is to create sounds from a combination of many shorter FIFs,
for example where each FIF is only hundreds of samples long and corresponds to a
single cycle of a tone. The principle behind the next few experiments is to form
sounds by concatenating a large number of short FIFs such that each consecutive FIF
is related by having similar parameters. By virtue of the 'continuous dependence of
attractors on parameters' property of IFS, a gradual variation of the parameters over
the duration of the sound will effect a gradual transformation of the sound. The result
is then a dynamic, time-varying sound whose properties are swept as the parameters
are swept.
The general algorithmic template for these experiments relies on iterating a simple
indexed loop. Within the loop, a set of FIF parameters are generated from a simple
rule that is a function of the loop index, and then an FIF is generated from these
parameters and appended to the composite sound.
The algorithm template
- for index = start to end
- calculate interpolation point/vertical scaling factor parameters as a
function of index
- generate a short FIF with the parameters
- accumulate generated samples in output file
- next
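The template above amounts to the following loop. This is a hypothetical Python rendering: `control` and `generate_fif` stand in for the parameter rule and the FIF synthesiser, and are not code from the thesis:

```python
def concatenate_fifs(n_fifs, control, generate_fif):
    """Build a composite sound by appending one short FIF per loop index.
    `control` maps an index to (interpolation points, scaling factors);
    `generate_fif` is any FIF synthesiser returning a list of samples."""
    sound = []
    for index in range(n_fifs):
        pts, d = control(index)   # parameters as a function of the index
        sound.extend(generate_fif(pts, d))
    return sound
```

With a control rule that sweeps the parameters slowly, consecutive FIFs differ only slightly and the concatenation is heard as a single, gradually evolving sound.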
The first example of this scheme is a sound made of 220 separate FIFs each being
220 samples long. The sound is then ~1 second long at the 48kHz sample rate used.
The FIF length was chosen to correspond to a fundamental period of ~220Hz, which
is one octave below middle A. Each FIF is generated from 4 interpolation points, the
first and last being fixed to y=0. The interpolation point control function then modifies
the middle two points in three ways. Firstly, it reduces their amplitude from full scale
to ~0 in an exponentially decreasing way. This is chosen to mimic the decay of a
percussive sound. The vertical scaling factors are likewise decreased to vary the
degree of modulation of the waveform from high to low. The x positions of the two
points are modified to cause some evolution of the spectral content of the waveform.
At all times, however, the y values are calculated to lie on a single cycle of a
sinewave. This control rule is shown in Figure 5.10.
    for i = 1 to 220
        d_1^i = d_2^i = d_3^i = 0.9 x 0.99^i
        x_0^i = 0,   x_3^i = 220
        x_1^i = 4 + 0.29 i
        x_2^i = 191 - 0.4 i
        y_0^i = 0,   y_3^i = 0
        y_1^i = 15000 x 0.9 x 0.99^i x \sin( 2 \pi x_1^i / 220 )
        y_2^i = 15000 x 0.9 x 0.99^i x \sin( 2 \pi x_2^i / 220 )
    next

Figure 5.10 Control rule for time-varying FIF sound. Left, pseudocode where
(x_i^j, y_i^j) is the ith interpolation point of the jth FIF and d_i^j is the vertical scaling factor
for the ith map of the jth FIF. Right, graphical depiction of the effect on the
interpolation points (x_0, y_0) to (x_3, y_3) through time.
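Transcribed into Python, the Figure 5.10 rule reads roughly as follows. This is a sketch: the sweep constants are as recovered from the garbled pseudocode and should be treated as approximate, and the function name is invented:

```python
import math

def control_rule(i):
    """Parameter trajectory in the style of Figure 5.10: returns the four
    interpolation points and three scaling factors for the ith short FIF."""
    env = 0.9 * 0.99 ** i               # shared exponential decay envelope
    x1 = 4 + 0.29 * i                   # middle points sweep through time
    x2 = 191 - 0.4 * i
    y = lambda x: 15000 * env * math.sin(2 * math.pi * x / 220)
    pts = [(0, 0), (x1, y(x1)), (x2, y(x2)), (220, 0)]
    d = [env, env, env]                 # vertical scaling factors decay too
    return pts, d
```

The y values always lie on one cycle of a 220-sample sinewave, the amplitude and scaling factors decay exponentially, and the middle x positions drift, producing the percussive, evolving spectrum described above.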
Figure 5.11 shows the resulting waveform of the complete sound including two
details of individual FIFs shown at the beginning and end of the control sequence.
Also shown is a spectrogram of the first half of the signal. The corresponding sound is
an interesting, complex percussive one which is the combination of a struck, damped
bell and a synthetic 'phasing' effect. This can be heard as Sound 12.
Figure 5.11 Left, time plot of the whole waveform generated with the control rule
shown in Figure 5.10 with selected magnifications of individual FIFs to show how the
sound develops through time. Right, spectrogram of the first half of the sound
showing how it contains complex, time varying partials similar to those found in
naturally occurring musical sounds.
The second example of this technique again uses FIFs generated with 4
interpolation points, but the control rule is slightly different. The end two points are
fixed, as before, but the middle pair are kept at constant y values with their x values
swept in opposite directions. This rule is shown in Figure 5.12. In this case 110 FIFs
were generated of 440 samples each, with all the vertical scaling factors set constant at
0.5. The resulting sound, again, has dynamic spectral properties being similar to that
of a complex tone passing through a swept band-pass filter. This can be heard as
Sound 13.
Figure 5.12 Pictorial representation of the FIF parameter control used to generate
the second example of a time-varying FIF sound.
5.6. A Genetic Parameter Control Interface
The results from the previous set of experiments suggest that FIFs are capable of
generating many interesting, unusual and potentially useful synthetic sounds from
simple models. These results, however, comprise a very few selections from the
complete space of FIF sounds, or the corresponding space of FIF parameters. These
selections have been made somewhat arbitrarily, or with the intention of creating a
certain type of sound. Also, the sets of parameters have been generated one at a time
by entering values into a file by hand, or by constructing one-off 'C' programs to
calculate them. A useful development of this work would therefore be to devise a
user-friendly interface that would allow easier navigation through the space of FIF
sounds, and would provide faster feedback to the user of the sound corresponding to
the specified parameters.
In his book, 'The Blind Watchmaker', Richard Dawkins presents a simple
computer-based scheme for demonstrating the cumulative effects of small mutations
on the evolution of algorithmically represented organisms which he calls 'biomorphs'
[dawk88]. The biomorphs are recursively generated line graphics that may take on
complex forms, despite the generative algorithm being simple and it being defined by
only a small number of parameters. The user selects, at each generation, one of
several, slightly mutated variants of a biomorph which then 'survives' and is passed on
to the next generation. The user is acting as 'artificial selection' as opposed to 'natural
selection' and may select the surviving biomorph according to any criteria. In effect, a
small random perturbation is added to the parameters of a biomorph to create the
mutation. The repeated selection at each generation effects a connected path through
parameter space and hence biomorph space. With this scheme, Dawkins demonstrates
how complex, aesthetically appealing, or intended designs may be produced by the
iterated accumulation of small moves in parameter space.
The scheme has also been applied by two computer artists, William Latham and
Karl Sims, as a scheme for evolving complex computer images [lath91] and [sims91].
The images, like the biomorphs, are complex, procedurally generated structures. The
procedures are generally simple, each having a small set of associated parameters,
although the combined effect of many nested procedures is often complex. The
genetic parameter interface is shown to be a powerful tool in exploring the possible
forms that the algorithms can generate.
There is therefore a strong similarity between the biomorph model, the computer
art models and the FIF sound model. All of these attempt to produce aesthetically
interesting or desired results from simple models that generate complex forms. This
section, then, is dedicated to presenting a similar genetic parameter control scheme to
be used for the FIF synthesis algorithm.
5.6.1. Implementation
The genetic models of Dawkins, Latham and Sims contain several common
components based on a model of biological evolution. These are: a population of
organisms each defined by a small set of parameters called their genotype; an
algorithm that expresses the genotype as a complex organism with its own distinctive
characteristics, or phenotype; a user interface that allows selection of either one
organism for mutation or two organisms for mating; and a procedure for
implementing the mutation or combination of genetic material. Such a scheme is
shown in Figure 5.13. These components have been implemented as part of a program
called GEN which runs on an IBM compatible PC and a Texas Instruments
TMS320C30 (C30) digital signal processor (DSP) chip. This hardware combination
provides the necessary interfaces and processing power to realise the GEN scheme.
The hardware organisation is shown in Figure 5.14. The DSP is needed to speed up
the generation of FIFs so that a population of FIFs may be generated in a time that is
acceptable to the user. The program used in the previous section to generate the FIFs
runs only on a PC (a 20MHz, 386 machine) and produces a one second sound in
approximately 30 seconds to 1 minute. Running on the C30 DSP, however, the
processing time is about 2-3 seconds. Therefore, a population of 10 FIFs may be
generated, viewed and heard in about 30 seconds using the DSP, and not in 5-10
minutes as it would if the DSP were not used. This order of magnitude difference is
crucial in making the GEN scheme a viable one.
[Diagram: a population of genotypes (parameter sets), each expressed as a phenotype
(organism) by the synthesis algorithm; artificial selection of the 'fittest' by the user;
modification and/or combination of the genes; the population of the next generation;
and iteration of the whole cycle.]

Figure 5.13 Schematic diagram of the model for biological evolution.
[Diagram: the PC processor (maintains the evolutionary environment and operator
input/output via the VDU and input device), the C30 DSP card (fast generation of
FIFs and of serial-format digital audio) and a DAT player (produces sound and
stores results).]

Figure 5.14 Schematic diagram of hardware used for GEN program.
The evolutionary scheme shown in Figure 5.13 is implemented on the hardware as
follows. The population of organisms, in this case the FIFs, is displayed on the VDU
as a set of time domain waveforms. Simultaneously, as the waveforms are being
displayed, the FIFs may be heard as sounds via the audio output of the DAT. Artificial
selection takes place when the user chooses one of two options. Either a single FIF is
selected for mutation, or a pair of FIFs are selected for mating. In the mutation case, a
mutation factor is also chosen which controls the strength of the mutation. Once the
choice is made, the program running on the PC generates a new genotype from the
user's input information according to the mutation or mating algorithms. These
algorithms are explained below.
Mutation.
The parameters of the single chosen FIF are reproduced to form the basis of the
new population. A random number, whose size is a function of the mutation factor, is
added to each parameter. Let

    { (x'_i, y'_i) : i = 0, ..., N },  { d'_i : i = 1, ..., N }        (5.34)

be the interpolation points and vertical scaling factors of the chosen FIF. A new
population of size P,

    { (x_i^j, y_i^j) : i = 0, ..., N,  d_i^j : i = 1, ..., N,  j = 1, ..., P }        (5.35)

is then generated where,
    x_i^j = x'_i + r_x  :  i = 1, ..., N-1
    y_i^j = y'_i + r_y  :  i = 1, ..., N-1
    d_i^j = d'_i + r_d  :  i = 1, ..., N        (5.36)

Note that the first and last interpolation points are not modified, which keeps the
length of the FIF equal at all times. r_x, r_y and r_d are random numbers with uniform
pdfs over the ranges

    -T/(2\alpha) <= r_x <= T/(2\alpha),   -Q/\alpha <= r_y <= Q/\alpha,   -1/\alpha <= r_d <= 1/\alpha        (5.37)
where T is the length of the FIFs in samples, 2Q is the number of amplitude
quantisation levels and \alpha is a divisor related to the mutation factor, \mu, according to

    \alpha = 2^{9-\mu}        (5.38)

where the mutation factor, \mu, is an integer in the range 1-9. The range of mutation is
therefore an exponential one that corresponds to mutations whose effects are nearly
imperceptible (\mu = 1) to a complete randomisation of the parameters (\mu = 9). All
modified FIF parameters are maintained in their allowed ranges of

    0 < x < T,   -Q <= y <= Q-1,   -1 < d < 1        (5.39)

using wraparound overflow. Also, after modification the interpolation points are
sorted to maintain the order of ascending x values. An example set of mutated
parameters is shown in Figure 5.15(a).
    interpolation points    vertical scaling              interpolation points    vertical scaling
    x value    y value      factor, d                     x value    y value      factor, d
    0          0            -                             0          0            -
    3000       3830         0.5           becomes         3115       3190         0.56
    6000       7070         0.5                           5780       7210         0.51
    ...        ...          ...                           ...        ...          ...
    48000      0            0.5                           48000      0            0.45

    (a)
A parent genotype whose x, y and d values all come from parent A, combined with a
second parent whose values all come from parent B, becomes a child genotype in
which each individual x, y and d value is inherited at random from one of the two
parents:

    interpolation points    vertical scaling
    x value    y value      factor, d
    A          B            -
    B          B            A
    A          B            B
    ...        ...          ...
    B          A            A

    (b)
Figure 5.15 Example of mutation, (a), and recombination, (b), of FIF parameters.
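A minimal sketch of the mutation step in Python, assuming the wraparound and sorting behaviour described above (the helper names are illustrative, not from the thesis):

```python
import random

def mutate(pts, d, factor, T, Q=32768):
    """Produce one mutated genotype: add uniform noise whose range halves
    with each step down in the mutation factor (alpha = 2**(9 - factor)),
    wrap results into their legal ranges and re-sort the x values."""
    alpha = 2 ** (9 - factor)
    wrap = lambda v, lo, hi: lo + (v - lo) % (hi - lo)  # wraparound overflow
    mid = [(wrap(x + random.uniform(-T / (2 * alpha), T / (2 * alpha)), 0, T),
            wrap(y + random.uniform(-Q / alpha, Q / alpha), -Q, Q))
           for x, y in pts[1:-1]]
    new_pts = [pts[0]] + sorted(mid) + [pts[-1]]        # endpoints stay fixed
    new_d = [wrap(dn + random.uniform(-1 / alpha, 1 / alpha), -1, 1)
             for dn in d]
    return new_pts, new_d
```

At factor 1 the perturbations are a few hundredths of a percent of each range, which matches the near-imperceptible mutations described in the text; at factor 9 the divisor is 1 and the parameters are effectively randomised.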
Mating.
The mating algorithm does not modify the FIF parameters in the same way as the
mutation algorithm, but reuses the information in a new way. As with biological
genetic recombination, information from two genotypes is randomly mixed to produce
information for the new genotype. The operation of the algorithm is shown in Figure
5.15(b). Again, the interpolation points are sorted after
recombination to maintain the correct order.
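The recombination can be sketched as follows. This is a hypothetical Python reading of Figure 5.15(b), in which each x, y and d value is inherited whole from a randomly chosen parent:

```python
import random

def mate(parent_a, parent_b):
    """Mix two genotypes field by field: each x, y and d value of the child
    comes from one parent or the other, chosen at random, then the interior
    interpolation points are re-sorted into ascending x order."""
    (pts_a, d_a), (pts_b, d_b) = parent_a, parent_b
    pts = [(random.choice((xa, xb)), random.choice((ya, yb)))
           for (xa, ya), (xb, yb) in zip(pts_a, pts_b)]
    pts[1:-1] = sorted(pts[1:-1])
    d = [random.choice((da, db)) for da, db in zip(d_a, d_b)]
    return pts, d
```

Both parents are assumed to have the same number of points and the same fixed endpoints, so the child FIF keeps the common length.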
After the genetic material for the new population has been generated, the new
population of FIFs is itself generated by passing the parameters, one at a time, to the
DSP which runs the synthesis algorithm. The program has now returned to the
beginning state of Figure 5.13, the choice of organisms may be repeated and the cycle
iterated as many times as desired. It is also possible to initialise the FIF parameters
from a previously created file so that the evolutionary cycle may begin from a known
point.
5.6.2. Experiments
This section presents a number of experiments that illustrate the way in which the
genetic scheme works and demonstrates some example uses.
Figure 5.16 shows a single screen-shot from the program giving an example of
what is presented to the user. In this case, the population size is chosen to be 8, the
FIFs have 6 interpolation points and the synthesis algorithm is iterated 6 times. The
manifestation of the original parameter set, which was created arbitrarily, is shown as
waveform A, while waveforms B to H are mutated versions. In Sound 14 the
corresponding audio output of the program can be heard and consists of the eight FIFs
played in sequence. The similarity between the mutations is evident.
Figure 5.16 A single screen-shot from the program GEN.
Another example of the program in operation is shown in Figure 5.17. This figure
shows the screen shot over many generations. In this example, the parameter set is
initialised so that the x values of the interpolation points are evenly spaced and the y
values and vertical scaling factors, d, are zeroed. In each generation, a single FIF is
chosen and mutated with a mutation factor that was chosen to have quite a high value
of 7 out of 9. The chosen survivor at each generation is shown in more detail in Figure
5.18 and this sequence can be heard as Sound 15.
This example demonstrates the way in which a complex FIF waveform can be
developed from nothing by accumulating small changes in the parameters at each
generation. The similarity between each generation can be clearly seen and
demonstrates the structured exploration of FIF parameter space. In practice, it is found
that a good operational approach is to start with a high mutation factor, and end with a
low one. This permits large jumps in parameter space at the beginning of a session
allowing the exploration of a wide variety of possible FIFs. Once one of interest is
chosen, it can be developed, with a medium mutation factor, so as to keep the general
form, but explore the possible forms around it. Finally, a small mutation factor can be
used to make fine changes to produce the end result.
The GEN program has been used to develop the idea presented in the previous
section of concatenating a number of shorter FIFs to produce a time-varying FIF
sound. Starting from an interesting sounding FIF of approximately one third of a
second duration, an entire evolutionary sequence of FIFs have been concatenated to
form Sound 16. A variety of matings and mutations with a low mutation factor have
been used which ensures a gradual and even development of the sound.
The final two experiments use the GEN program as a means of modifying an
existing parameter set, instead of evolving one from nothing. An approximation to a
desired result is input initially and then the GEN program is used as a tool to modify
the sound within the parameter neighbourhood. In the first experiment, a
rhythm/timbre design made from 6 interpolation points is used as an initial input.
Also, a slight modification has been made to the program GEN. So as to keep the
timing of the rhythm design close to the original, the x values remain unchanged by
the mutation procedure. Four of the evolved results have been selected and then
concatenated to form a longer rhythm/timbre which can be heard as Sound 17.
Figure 5.17 A sequence of populations generated with the program GEN. In this
case, the FIFs are produced from 6 interpolation points. At the start (waveform A - top
left) all interpolation points and vertical scaling factors are zeroed. At each stage, 7
mutations are produced and then a single survivor is chosen by the operator (starred
waveform), which reappears as waveform A in the next generation.
Figure 5.18 Starting point (top left) and sequence of starred waveforms from
Figure 5.17 shown in more detail.
The second experiment illustrates a limitation of the genetic interface. A variant of
the set of FIF parameters described in Section 5.3 is used as initial input where the
interpolation points are derived from a sinewave and the x values follow a square law.
In this case, there are 46 interpolation points which is many times more than has been
used so far with the genetic scheme. After the first mutation, it is found that for low
mutation factors the resulting offspring are, perceptually, nearly indistinguishable
from one another. For medium to high factors, the structure of the original data set is
lost and so it is not possible to explore subtle variations of a desired FIF. For example,
Figure 5.19 shows the first generation of mutations for a low mutation factor of 3.
Each mutation sounds like a noisy version of the original FIF. These can be heard as a
sequence in Sound 18. In this example, the number of iterations for the FIF generation
algorithm has been kept to 2 so that the results can be seen clearly.
Despite this problem, it is still possible to evolve interesting FIFs which have large
numbers of parameters. It is, however, easier to create subtle variants of a given FIF
sound when it is defined by only a small number of parameters.
Figure 5.19 Mutated variants of an FIF that is defined by a relatively large number
of parameters. It can be seen (and heard) that when this is the case, low-factor
mutations are not distinct from one another.
5.7. Conclusions
This chapter has presented an investigation into using FIFs, a form of strange
attractor, as a sound synthesis technique. A variety of experiments and techniques
have been presented including a selection of the resulting sounds themselves. With
these experiments a number of the possibilities of FIF synthesis and some of the
problems have been demonstrated. A number of general conclusions can be drawn.
Most generally, these experiments demonstrate that there is an acoustic equivalent
to abstract fractal images. Even if the sounds generated with the FIF technique are not
as immediately stunning as many fractal images, they are interesting and unusual.
Also, only a small subset of all possible FIF sounds have been explored, and FIF
synthesis is only one technique of synthesising abstract fractal sounds.
The biggest advantage of FIF synthesis, which has been demonstrated by the
experiments, is that FIFs may produce complex sounds even though the model that
generates them is simple and requires only a small number of parameters. It is
therefore possible to produce a range of interesting, unusual and potentially useful
sounds with a simple, easy to implement and manageable model. Many of the sounds
have qualities quite unlike those generated with other synthesis techniques. It is also
possible to use FIFs for producing musical sound, for example the bell-like tone of
Section 5.5.
Perhaps the most interesting and useful result has been the discovery of a new
class of sounds that are simultaneously rhythms and timbres. It has been found in this
case that the problem with fractal sound suggested by Waschka (see Section 4.3) of a
perceptual discontinuity in the acoustic domain is actually an advantage. Generating
both micro-level timbre and macro-level rhythm with the same structure on different
scales is believed to be a novel technique that has potential use for computer music
composition.
It was found, however, to be difficult to isolate those sounds of interest from the
large class of possible FIF sounds. This is similar to the situation found with IFS
images, [barn88], where the set of aesthetically pleasing or interesting images is very
small relative to the space of all IFS images and is widely scattered within it.
Generally, parameter sets with a high degree of structure produced the most
interesting sounds. A considerable degree of experimentation and intuition is also
required to find the more interesting sounds.
Unlike the case of IFS images, no obvious replicas of naturally occurring sound
have been found so far which parallel something like the IFS fern images. Some FIF
sounds, however, do have elements of naturally occurring sound, for example they
contain certain echoey rumbles, or wind-like noises.
The genetic scheme was devised to allow a more ordered navigation through the
space of FIF sounds and is successful as a captivating, interactive piece of software.
With the genetic scheme it is possible to discover interesting FIF sounds without the
need to think about and provide a set of FIF parameters. It allows sounds to be
discovered by combining both an element of chance and an ordered process.
Finally, it is believed that there are many opportunities for further experimentation
with the FIF synthesis model. Any of the techniques presented could be pursued
further, with more time spent exploring the sounds that can be generated.
Chapter 6
Modelling Sound with FIFs
In the previous chapter, a synthesis-only scheme was investigated where FIFs are
used as a source of abstract time domain waveforms that are converted into sound. In
this chapter, the focus is on using FIFs as part of an analysis/synthesis model for
representing naturally occurring sound. The chapter begins by exploring the
interpolation capabilities of FIFs by applying the synthesis technique of the previous
chapter to data derived from naturally occurring sound waveforms. The limitations of
this technique prompt a different approach to the modelling problem, which is to
consider the inverse problem for FIFs. Most of this chapter is then dedicated to
investigating a published algorithm which is claimed to be a solution of the inverse
problem. It is shown, however, that this algorithm does not solve the problem
satisfactorily. A modified version of this algorithm is then developed which does give
some successful results.
6.1. Deriving Interpolation Points from Naturally
Occurring Sound Waveforms
Fractal waveforms that are either exactly or statistically self-affine have the
property that there is a relationship between the information present on different time
scales. This therefore implies that there is some kind of redundancy in the waveform
that could be exploited to compress the waveform within an analysis/synthesis model.
The information common to all scales need only be represented once by the model and
then reused on each scale to reconstruct the original.
FIFs generate all the scales of a waveform from the information provided by the
interpolation points which themselves define the coarsest time scale. This suggests the
possibility of reducing a naturally occurring fractal waveform to a set of interpolation
points which can then be used to reconstruct either the original waveform, or
something with similar fractal properties. The most obvious way to do this is to take
the interpolation points from the original waveform itself, in other words to sub-
sample it. The following set of experiments present an investigation of this idea where
an extract of wind noise is used as the original waveform since this has already been
shown, in Chapter 4, to have statistically self-affine properties. In all the following
experiments, samples are taken from the wind sound waveform and then used directly
as interpolation points for the FIF synthesis algorithm. The amplitude of the sample
becomes the y value of the interpolation point and the sample index is used to derive
the x value.
In the first experiment, the original waveform extract is 5,000 samples long and
the interpolation points are taken regularly every 50 samples. This corresponds to sub-
sampling by a factor of 100. The vertical scaling factors are set to be 0.3 for every
mapping, and the FIF synthesis algorithm is iterated 4 times. These figures have been
chosen so that the resulting FIF approximates, visually, the original waveform. The
original waveform, a piece-wise linear interpolation of the interpolation points and the
resulting FIF can all be seen in Figure 6.1. Despite the apparent similarity between the
original waveform and the FIF, a magnified view reveals that, not surprisingly, the
regular spacing in the x-direction of the interpolation points results in a regular, almost
periodic FIF. This can be seen in the magnified view where the same waveform
pattern is repeated between each pair of interpolation points.
Note that in all the waveform plots shown in this section, the time scale is marked
in 'points', meaning samples. Since the sample rate used to capture the original wind
noise was 48kHz, 48,000 points corresponds to one second.
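The extraction step of this experiment can be sketched in Python as follows. The wind recording itself is not reproduced here, so a synthetic waveform stands in for it; the function name is illustrative:

```python
import numpy as np

def decimate_to_interpolation_points(waveform, step):
    """Take every step-th sample as an FIF interpolation point:
    the sample index supplies x and the amplitude supplies y."""
    idx = np.arange(0, len(waveform), step)
    return list(zip(idx.tolist(), waveform[idx].tolist()))

# a 5,000-sample extract sampled every 50 samples, as in the experiment
waveform = np.sin(np.linspace(0.0, 40.0 * np.pi, 5000))  # stand-in for the wind extract
points = decimate_to_interpolation_points(waveform, 50)
```

The resulting points would then be passed, with a chosen set of vertical scaling factors, to the FIF synthesis algorithm of Chapter 5.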
(a) original waveform; (b) piece-wise interpolation of interpolation points;
(c) resulting FIF; (d) magnification of small portion of (c).
Figure 6.1 Results of an experiment to extract interpolation points by decimating a
wind sound waveform and then constructing an FIF with them.
Because of the dependence of the resulting FIF on the x-spacing of the
interpolation points, information has to be extracted from the original of both
amplitude and time patterns. The next idea is to use amplitude zero-crossing and peak
values of the waveform to provide such information.
The interpolation points for the next experiment are chosen to be the points in the
original waveform whose amplitude is maximum in between its zero-crossing points.
This procedure produces well spaced points that capture the general shape of the
original. The original wind waveform and the extracted points, having been piece-
wise linearly interpolated, are shown in Figure 6.2.
Figure 6.2 Original wind sound waveform (top), interpolation of peak points
(bottom left), and reconstructed waveform (bottom right).
Also shown in Figure 6.2 is an FIF constructed using the peak points as
interpolation points and where all vertical scaling factors are set to 0.2. Although the
amplitude of the sections that are mapped in between the interpolation points roughly
match that of the original, there appears to be a discrepancy in their shape, the FIF
having too much high frequency content. This is confirmed by listening to the
waveform which can be heard as Sound 19. The sound is also much rougher than the
softer, smoother sound of the original. There is, however, some element of similarity.
The excessive high frequency content occurs because the ratio of the x-distance
between the beginning and end interpolation point pair and the x-distance between any
two consecutive interpolation points is too high. Consequently, the whole waveform is
mapped to in between too small a space and the resulting detail is too fine.
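The peak-point extraction used in this experiment can be sketched as follows; this is an illustrative reading of the procedure described above (one point of largest absolute amplitude kept between each pair of consecutive zero crossings), not the original implementation:

```python
import numpy as np

def peak_points(waveform):
    """Between each pair of consecutive zero crossings, keep the sample
    of largest absolute amplitude as an interpolation point (index, amplitude)."""
    signs = np.sign(waveform)
    crossings = np.where(np.diff(signs) != 0)[0] + 1  # first index after each sign change
    bounds = np.concatenate(([0], crossings, [len(waveform)]))
    points = []
    for a, b in zip(bounds[:-1], bounds[1:]):
        if b <= a:
            continue
        k = a + int(np.argmax(np.abs(waveform[a:b])))
        points.append((int(k), float(waveform[k])))
    return points
```

Unlike regular decimation, the x spacing of the points returned here follows the timing of the waveform's own oscillations.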
The last experiment with this technique is an attempt to reduce the high frequency
content of the reconstructed waveform. The idea is to use the same interpolation
points derived from the peaks of the original waveform as above, but to divide them
into a number of sets and generate many shorter FIFs. Having control over the length
of the FIFs in this way then allows control over the ratio of interpolation point spacing
and hence over the degree of detail in the resulting FIF. The interpolation points from
the previous experiment are separated into groups of ten, with the last of one group
forming the first of the next. This ensures continuity between the consecutive FIFs so
they can be concatenated to form the complete resulting waveform. Figure 6.3 shows
the original waveform and the resulting FIF where it can be seen that there is a
stronger visual similarity than for the previous experiments. The resulting sound,
however, is a little disappointing. Although being closer to the original than the other
results, it is still unacceptably different from the original. This can be heard as Sound
20.
Figure 6.3 Section of original wind sound (left) and part of the composite FIF
(right) constructed using groups of peak points.
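The grouping rule used here, groups of ten points with the last point of one group repeated as the first of the next, can be sketched as below (an illustrative helper, not the original code):

```python
def split_with_shared_endpoints(points, group_size=10):
    """Split interpolation points into groups of group_size, repeating
    the last point of each group as the first point of the next so the
    FIFs built from them join continuously when concatenated."""
    groups, start = [], 0
    while start < len(points) - 1:
        groups.append(points[start:start + group_size])
        start += group_size - 1
    return groups
```

Each group is then synthesised as a separate short FIF and the results concatenated; the shared endpoints guarantee continuity at the joins.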
The limited success of the approach taken in this section, together with the heuristic
nature of the investigation, suggests that a more rigorous approach to the FIF modelling
problem needs to be taken. The desire is to find an FIF that best approximates a given
waveform and that is as simple, i.e. having a small number of parameters, as possible.
This is the inverse problem for FIFs. Recall that the inverse problem for IFS is, given
some set, find a set of contraction mappings that define an IFS attractor that is as close
as possible, in the Haussdorf sense, to the original set. There have been a number of
approaches to the IFS inverse problem for the case where the set to be approximated is
a two-dimensional quantised image. For example, see [mant89, barn91, fish92 and
mant92]. The results in [fish92] indicate that naturally occurring images may be
modelled with compression ratios of up to 70:1. The approaches taken to the IFS
inverse problem may be divided into two categories: search and optimisation. The
former approach involves systematically searching a subset of the space of all IFS
mapping parameters to find a set that minimises the collage error (see Section
3.11.4.). The latter approach also seeks to minimise the collage error, but using
iterative optimisation techniques. It is not possible, however, to directly apply either
of these techniques to the FIF inverse problem because of the different form of data.
The IFS models use two-dimensional affine mappings to collage a two-dimensional
image. For the FIF model, two-dimensional shear mappings must collage one-
dimensional waveforms. The general approaches of search or optimisation can,
however, be appropriated and applied to the FIF inverse problem. Note that at this
stage of investigation, the aim is not necessarily to find an elegant, efficient algorithm
to solve the inverse problem, but to find out whether it is possible at all to solve the
FIF inverse problem for sound waveforms and model a naturally occurring sound with
a relatively simple FIF.
It was found that an algorithm exists in the literature that attempts to solve the FIF
inverse problem using a search technique. The rest of this chapter is concerned with
assessing this algorithm, applying it to the problem of modelling sound waveforms
and to improving its performance.
6.2. Mazel's Time Series Models
Mazel presents four time series models and their associated inverse algorithms in
[maze91] and [maze92]. The models are for general, discrete time series and are based
on FIFs. He calls them the self-affine, piece-wise self-affine, hidden variable and
piece-wise hidden variable fractal models. The self-affine version models a time series
with a single FIF in the same way as has been investigated in this and the last chapter.
The other three models are more complicated and use the recurrent and higher
dimensional variants of an FIF, see [barn88] and [barn89]. This section reviews the
results obtained with these models and their associated inverse algorithms as reported
by Mazel. A comparison of the results with the performance of amplitude
requantisation puts Mazel's models into context as compression algorithms.
Mazel presents results for each of his algorithms using a number of different time
series. The performance of the algorithms is measured by the degree of compression
obtained, and the resulting amount of degradation of the time series. The degree of
compression is measured as the ratio of the number of bits used to represent the
original time series to the number of bits required to represent the FIF parameters. In
most cases, the FIF parameters are quantised to the smallest number of bits such that
no further errors are introduced by the model. This is possible because Mazel shows
there is a threshold in the accuracy of parameter representation with respect to
degradation of the original time series [maze91]. The degradation is measured as a
signal to noise ratio (SNR). The SNR is defined as the power in the original signal
divided by the power of the error signal introduced by the model. The error signal is
the difference between the original signal and the FIF version. Let the original signal
(time series) be
    { x_t : t = 0 ... T-1 }    (6.1)
and let the FIF version be
    { f_t : t = 0 ... T-1 }    (6.2)
then the SNR is given by,
    SNR = 10 log_10 [ Σ_{t=0}^{T-1} x_t^2 / Σ_{t=0}^{T-1} (x_t − f_t)^2 ] dB    (6.3)
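Equation (6.3) translates directly into code; the following sketch is an illustration of the measure, not part of Mazel's implementation:

```python
import numpy as np

def snr_db(original, model):
    """SNR of Equation (6.3): power of the original signal divided by
    the power of the error signal introduced by the model, in dB."""
    x = np.asarray(original, dtype=float)
    e = x - np.asarray(model, dtype=float)
    return 10.0 * np.log10(np.sum(x ** 2) / np.sum(e ** 2))
```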
These measures of compression ratio and SNR quantify the performance of the
algorithm at compressing the original time series. The basic aim of successful
performance of a compression algorithm is to obtain the highest values of these
measures as possible.
Table 6.1 shows a summary of the results reported by Mazel for all four of his
models and a variety of signal types. The first thing that can be noticed is that the self-
affine model appears to perform substantially better than the other models. Mazel,
however, only presents this one result for the self-affine model and so it is not
possible to know from his experiments whether this is a freak occurrence, or an
example of the performance which is typically to be expected. By inspection of the
mountain profile data plotted in [maze91] it can be seen that approximately one third
of it is close to being a straight line. This might account for the good performance
relative to the other model/signal type combinations.
Model Type                   Time Series         Bits per sample   Compression   SNR
                                                 of original       Ratio         in dB
self-affine                  mountain profile    9                 22:1          35
piece-wise self-affine       well logging data   16                6.4:1         16.8
piece-wise self-affine       ECG                 12                5.3:1         23.2
piece-wise self-affine       seismic data        12                5.4:1         19.7
piece-wise self-affine       speech              16                6.4:1         16.9
hidden variable              sunspot data        8                 4.6:1         10.1
hidden variable              ECG                 12                7.5:1         12.1
piece-wise hidden variable   seismic data        12                9.2:1         10.8
piece-wise hidden variable   speech              16                5.6:1         15.5
Table 6.1 Summary of the results obtained by Mazel for his four FIF based
models/inverse algorithms [maze91 and 92].
It can also be seen that there is a general relationship between the SNR and
compression ratio such that as the compression ratio increases, the SNR decreases.
This is to be expected as more errors are likely to be generated as the signal is further
compressed. How, though, are these results to be interpreted? It would be revealing to
compare the relationship between compression ratio and SNR with those of other
compression schemes since Mazel does not do this himself. To put Mazel's results in
perspective, then, they are compared below with the theoretically expected
performance of an amplitude requantisation of the original time series.
6.3. Comparison with Requantisation
Amplitude requantisation is, effectively, a simple and crude way of compressing a
time series by discarding some of the sample amplitude information. It therefore
provides a simple benchmark against which any compression scheme may be
compared. If the performance measures of a compression scheme are equal to or
worse than those of requantisation, then it would be just as effective, and easier, to
discard sample information to compress the signal. However, Mazel's original signal
data is unavailable to requantise so that a direct comparison with his FIF models
cannot be made. It is therefore necessary to make a comparison of the FIF models'
performance with the theoretically expected performance of requantisation.
The following analysis approximates the errors involved in the requantisation
process under certain general conditions so that an idea of the relationship between
compression and SNR can be obtained.
Begin with a general, complex, digital signal,
    { x_t : t = 0 ... T-1 }    (6.4)
whose samples have been linearly quantised to r amplitude levels, or log_2 r bits. For
example, consider the signal to be a general digital audio time series. Assume the
amplitude range is normalised so that
    |x_t| ≤ 1    (6.5)
Let the amplitudes of the samples be requantised, by rounding, to q levels where
    q < r    (6.6)
The full original amplitude range of 2 will be mapped onto the q levels of the
requantised signal, and therefore original amplitude ranges of size 2/q will be mapped
onto the individual quantisation levels. Let the requantised signal be x'_t. The
requantisation process will generate an error signal, ε_t, where

    x_t = x'_t + ε_t    (6.7)
The maximum amplitude error, per sample, of the requantisation process will be of
magnitude,
    |ε_t| ≤ 1/q    (6.8)
This is demonstrated in Figure 6.4.
Figure 6.4 Mapping of amplitudes in requantisation process.
It is common in the digital signal processing literature to assume, under these
conditions, that the quantisation error signal will be a zero mean, uniformly
distributed, white noise process that is uncorrelated with the original signal [carl86].
The amplitude probability distribution function of the error signal, p(ε_t), will be
constant over the range

    −1/q ≤ ε_t ≤ 1/q    (6.9)

and have a value of q/2, so that the total area under the pdf is unity. The power of the
error signal is equal to its expected square value, or its variance. That is,
    P_ε = E(ε_t^2) = ∫_{−1/q}^{1/q} ε_t^2 p(ε_t) dε_t = (q/2) [ε_t^3/3]_{−1/q}^{1/q} = 1/(3q^2)    (6.10)
To estimate the maximum signal to noise ratio, SNR_max, of the original signal to
the requantisation noise signal, consider that the maximum power of the original is
limited to P_x ≤ 1 because it has normalised amplitude. Therefore the upper bound on
the SNR is

    SNR_max = P_x / P_ε ≤ 1 / (1/(3q^2)) = 3q^2 = 10 log_10(3q^2) dB    (6.11)
or, in terms of bits,
    SNR_max = 10 log_10(3 · 2^{2b}) dB = 4.77 + 6.02b dB    (6.12)

where

    q = 2^b    (6.13)
This gives an approximate relationship for the expected SNR for a simple,
rounding requantisation of the original time series. Note, however, that the result
obtained is an upper bound on the SNR. In practice, a signal might not have the
maximum power assumed, and so the SNR is likely to be smaller than that given by
this result. The effective compression of the requantisation process is simply given by

    compression ratio = log_2 r / log_2 q    (6.14)
Now compare the performance of requantisation with that of one of Mazel's
algorithms. Take, for example, the case of using speech as the input time series for the
piece-wise self-affine model. According to the results shown in Table 6.1, a
compression of 6.4:1 is obtained with a corresponding degradation described by a
SNR of 16.9dB. Since the original signal was quantised to 16 bits, a requantisation
achieving the same compression would require 16/6.4=2.5 bits per sample. According
to Equation (6.12), an expected maximum SNR for the requantisation would be
    4.77 + 6.02 × 2.5 = 19.8 dB    (6.15)
This performance is of the same order as, and under the assumptions made actually
better than, that obtained with Mazel's algorithm. This result suggests, then, that at a first
inspection, Mazel's algorithm is no better than a simple requantisation of the original
time series.
To compare all of Mazel's results with the theoretical performance of
requantisation, Figure 6.5 shows graphs of SNR against compression ratio. Four
graphs are shown to account for the fact that the original time series used by Mazel are
originally quantised to different numbers of bits. As a result of this, although the
theoretical SNR remains the same when requantising to a certain number of bits (as
long as it is less than the original), the resulting compression ratio changes.
The performance of Mazel's algorithms relative to the theoretically derived
requantisation performance is indicated by the position of the SNR/compression data
pairs relative to the line. Pairs below the line indicate performance that is worse than
requantisation, on or near the line indicates similar performance, and above the line
indicates better performance. As can be seen, most of the results are worse, or only
slightly better, than the theoretically expected performance of requantisation. The
exceptional case is, as already mentioned, that of the mountain profile time series with
the self-affine model, whose performance is, by far, the best.
In order to try and confirm these findings, and to further experiment with Mazel's
techniques, the following section presents results for a reimplementation of one of
Mazel's algorithms using complex sounds as input.
[Four graphs of SNR in dB against compression ratio: piece-wise hidden variable
(speech), piece-wise self-affine (speech, well-logging, ECG, seismic), hidden variable
(ECG, sunspot data) and self-affine (mountain profile) results, each plotted against
the line of theoretically expected requantisation performance.]
Figure 6.5 Degradation against compression performance of Mazel's inverse
algorithms for a variety of data and model types compared with the theoretically
expected performance of requantisation.
6.4. Mazel's Inverse Algorithm for the Self-Affine Model
The self-affine model is chosen for reimplementation for several reasons. Most
importantly, this model is of the same form as that used for the experiments in the
previous chapter where a portion of sound waveform is represented by a single FIF.
Secondly, it is the simplest of Mazel's model/algorithm pairs and is therefore the
easiest to reimplement. Finally, as discussed in the review of Mazel's results, more
experiments are needed with the self-affine model to determine its typical behaviour.
Mazel's inverse algorithm for the self-affine model is a search technique that seeks
to find a set of FIF parameters given some time series. The parameters found are such
that they define an FIF attractor that approximates the original time series. The FIF
parameters consist of a set of interpolation points and vertical scaling factors. The
interpolation points are derived from the samples of the original time series and so the
search is for a smaller subset of the original samples that define the resulting FIF. The
search exploits the collage theorem to find this subset of samples.
Recall that the collage theorem allows an error criterion to be established for the
IFS inverse problem (see Section 3.11.4). Given some original set, the collage
theorem states that a collection of contraction mappings will define an IFS attractor
that is close to the original set if the mappings form a close collage of the original set.
A close collage is one where the difference, or error, between the original set and the
collage of that set is small. Mazel's algorithm applies the collage theorem by searching
for a set of interpolation points whose associated shear mappings form a good collage
of the original time series.
Let the original time series be the set of samples,
    { u_t : t = 0 ... T-1 }    (6.16)
and let the interpolation points to be found be
    { (x_n, y_n) : n = 0 ... N }    (6.17)
These are restricted to be a subset of the original time series samples and so
    (x_n, y_n) ∈ { (t, u_t) : t = 0 ... T-1 }    (6.18)
which in effect means that only the x positions of the interpolation points need to be
found since the y values are implied by the original time series sample values. It is
also necessary to find a set of vertical scaling factors, one for each consecutive pair of
interpolation points,
    { d_n : n = 1 ... N }    (6.19)
The main function of the algorithm is to test pairs of consecutive interpolation
points chosen from the original time series samples. To begin the search, the left-hand
point of a pair is fixed on the first sample of the original time series, i.e.
    (x_0, y_0) = (0, u_0)    (6.20)
The second, or right-hand, point of the pair is then tested at every value of t along the
original time series, except the very closest to the left-hand point, i.e. from
    x_1 = 2 ... T-1    (6.21)
The closest point is not tested because a neighbouring pair of samples define a trivial
mapping. An example pair of interpolation points is shown in Figure 6.6.
Figure 6.6 First trial pair of interpolation points on the original time series graph.
Figure 6.7 Mapping of whole time series to in between the first pair of
interpolation points.
Each test involves a calculation of the collage error for the piece of collage defined
by mapping the whole original time series waveform to in between the trial pair of
interpolation points. Each test consists of the following steps:
- calculate a value of the vertical scaling factor for the pair of interpolation
points so that a shear mapping is defined,
- apply the shear mapping to the whole time series to form a collage of the
portion of original time series between the pair of interpolation points. This is shown
in Figure 6.7,
- calculate the closeness of fit, i.e. the error, of this piece of collage
The result of each test is a single error value which is then temporarily stored. At
the end of the sequence of tests, a collage error is known for each possible position of
the mobile right-hand interpolation point, (x_1, y_1). These errors are then compared so
as to determine which position of (x_1, y_1) generated the lowest error and therefore the
best collage. The chosen position is then stored, with its associated vertical scaling
factor, as part of the resulting FIF parameter set.
The search sequence is then repeated, but the fixed point is made to be (x_1, y_1),
and a new trial point, (x_2, y_2), is introduced. This is then tested at every position along
the original time series,

    (x_2, y_2) = (t, u_t) : t = x_1 + 2 ... T-1    (6.22)
Comparison of the resulting test errors yields another part of the final solution.
This routine is repeated until the last trial interpolation point is chosen to be at, or
near, the end of the time series. The result is then a set of interpolation points that
define mappings that form a collage of the original time series.
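The overall search has a simple greedy structure, which can be sketched independently of the error calculation. The sketch below is an illustration of the search pattern just described, not Mazel's own code; collage_error is a placeholder for the per-pair test:

```python
def greedy_fif_search(u, collage_error, min_gap=2):
    """Greedy left-to-right search: fix the left interpolation point,
    test every admissible right point, keep the one with the lowest
    collage error, then restart from there.  collage_error(u, a, b)
    returns the error of collaging u between indices a and b."""
    T = len(u)
    points, left = [0], 0
    while left < T - 1:
        candidates = range(left + min_gap, T)
        if len(candidates) == 0:
            points.append(T - 1)          # too close to the end: finish
            break
        best = min(candidates, key=lambda r: collage_error(u, left, r))
        points.append(best)
        left = best
    return points
```

Note how the number of interpolation points is not fixed in advance but emerges from wherever the lowest-error right-hand points fall.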
In the test procedure, the vertical scaling factor of the mapping associated with
each trial pair of interpolation points is calculated so that the maximum vertical extent
of the mapped original time series equals the maximum vertical extent of the original
time series between the pair of interpolation points. This is illustrated in Figure 6.8.
Although this method does not minimise the resulting collage error, it is used because
it is simple to implement and gives a good approximation to the optimal value. Mazel
experiments with other more complicated methods, but the corresponding results are
not significantly different [maze91].
Figure 6.8 Maximum vertical extent of part of the original time series between a
pair of consecutive interpolation points and the maximum vertical extent of the
mapped original time series. The vertical scaling factor is calculated so as to make
these two extents equal.
The collage error for a piece of collage is calculated as the mean square difference
in amplitudes between mapped and original time series. Let
    (x_n, y_n), (x_{n+1}, y_{n+1})    (6.23)
be an interpolation point pair with an associated shear mapping (see Section 5.1),
    w_n [x; y] = [a_n 0; c_n d_n] [x; y] + [e_n; f_n]    (6.24)
where the vertical scaling factor, d_n, is defined by setting the maximum vertical
extents equal. Let w_n map the original time series,

    { u_t : t = 0 ... T-1 }    (6.25)
into
    { v_t : t = x_n ... x_{n+1} }    (6.26)
where
    v_j = c_n t + d_n u_t + f_n    (6.27)
and
    j = int(a_n t + e_n)    (6.28)
for t=0...T-1 (see Section 5.2). The collage error is then found from
    Δ_n = Σ_{t=x_n}^{x_{n+1}} (u_t − v_t)^2    (6.29)
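The per-pair test of Equations (6.23) to (6.29) can be sketched as follows. This is one consistent reading of the scheme rather than Mazel's exact implementation: the coefficients c_n, e_n and f_n are chosen here so that the endpoints of the series map exactly onto the trial interpolation points, and d_n is taken as the ratio of vertical extents, as in Figure 6.8:

```python
import numpy as np

def collage_error_pair(u, xl, xr):
    """Collage error for the trial pair (xl, u[xl]), (xr, u[xr]): shear-map
    the whole series onto [xl, xr] and sum squared differences (Eq. 6.29)."""
    u = np.asarray(u, dtype=float)
    T = len(u)
    a = (xr - xl) / (T - 1)                  # horizontal contraction a_n
    whole = np.ptp(u)
    d = np.ptp(u[xl:xr + 1]) / whole if whole else 0.0   # vertical scaling d_n
    e = float(xl)
    # c_n, f_n fix the endpoints: w(0, u_0) = (xl, u[xl]), w(T-1, u_{T-1}) = (xr, u[xr])
    c = (u[xr] - u[xl] - d * (u[-1] - u[0])) / (T - 1)
    f = u[xl] - d * u[0]
    v = np.zeros(xr - xl + 1)
    for t in range(T):
        j = int(round(a * t + e)) - xl       # index mapping, as in Eq. (6.28)
        v[j] = c * t + d * u[t] + f          # mapped value, as in Eq. (6.27)
    return float(np.sum((u[xl:xr + 1] - v) ** 2))
```

For a perfectly self-affine stretch of data the error is near zero; pairs whose enclosed section does not resemble a shrunken copy of the whole series score badly.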
What is convenient about Mazel's search algorithm is that no value for the number of mappings, N, is chosen in advance. N is determined by the operation of the search since it depends on the position of the interpolation points with the lowest collage errors. As a result, however, only a partial search of all possible configurations of interpolation points is carried out. That is, only a subset of the full parameter space is explored. For example, although the position of (x_1, y_1) is chosen to minimise the collage error for the section between (x_0, y_0) and (x_1, y_1), it is then fixed. This precludes a solution where the error for this section may be suboptimal, but the overall collage error will be lower. For more details on the operation of Mazel's algorithms, see [maze91] and [maze92].
6.4.1. Initial Results
The algorithm, as outlined above, has been implemented so that sounds may be
used as the original time series. The algorithm has been given the additional capability
that a time series may be processed as a number of consecutive shorter sections. That
is, an original time series of length T_tot may be modelled as m separate time series of
length T with m concatenated FIFs so that

T_tot = m·T    (6.30)
This capability has been added for a number of reasons. Firstly, it allows the
performance of the model/algorithm to be evaluated as an average which is considered
to give a more reliable result than the performance for a single FIF. So, for example,
the performance of a T length FIF may be averaged over m sections of a time series
taken from the same source. Secondly, it allows variation of the FIF length to see if
this variable affects the performance of the algorithm. Thirdly, it allows long sound
time series to be processed without prohibitive processing time.
To see this last point, consider the following analysis of the computational processing time of the algorithm, in the worst case giving the largest number of tests that the algorithm will have to evaluate for a given original time series. Let T be the length of the original time series. The first interpolation point is fixed at t=0 and the second is tested at t=2...T-1, giving T-2 tests. The worst case arises when the best collage is found to be the one for which the second interpolation point is at t=2. The next sequence of tests, for the third interpolation point, must then cover the values t=4...T-1, giving T-4 tests. If the testing routines continue in this way so that the last sequence of tests is from t=T-3...T-1,
giving 2 tests, then the total number of tests will be

t = (T−2) + (T−4) + (T−6) + ... + 4 + 2    (6.31)

So,

t = Σ_{n=1}^{T/2 − 1} 2n    (6.32)

which is a finite arithmetic series, and so the expression for the total number of tests becomes

t = T²/4 − T/2    (6.33)

The computational processing time of the inverse algorithm is therefore of order O(T²). Processing a long sound time series as a number of shorter FIFs will therefore require less computational time than processing it as a single long FIF.
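The count can be verified directly (a quick sketch assuming T is even, as in the 100-sample sections used below; the function name is not from the thesis):

```python
def worst_case_tests(T):
    """Worst-case number of collage tests: (T-2) + (T-4) + ... + 4 + 2."""
    return sum(range(2, T - 1, 2))

# The closed form T**2/4 - T/2 agrees, confirming the O(T**2) growth.
for T in (10, 100, 1000):
    assert worst_case_tests(T) == T ** 2 // 4 - T // 2
print(worst_case_tests(100))   # 2450 tests for one 100-sample section
```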
The first experiment with the reimplementation of the algorithm is with a range of
different sound time series. Table 6.2 shows a summary of the resulting performance
figures. Because the original sound time series have been processed as a number of
shorter sections, each modelled as an individual FIF, the figures for compression ratio
and SNR give an average performance for the types of sound used.
original sound        length of       number of shear   compression   SNR
time series           resulting FIF   mappings used     ratio         in dB
wind noise            996             368               1.28:1        35.4
filling bath          994             364               1.29:1        10.4
industrial roomtone   995             397               1.18:1        41.3
river                 996             395               1.19:1        19.8
gong                  995             428               1.09:1        25.6
violin                993             467               1.0:1         31.8

Table 6.2 Summary of results for reimplementation of Mazel's algorithm for the self-affine model. Each original time series of length T_tot has been processed as m=10 sections of length T=100.
number of   section   length of   number of   compression   SNR
sections    length    output      mappings    ratio         in dB
10          10        93          38          1.15:1        35.7
10          20        196         88          1.05:1        35.5
10          30        295         121         1.15:1        35.9
10          40        396         163         1.14:1        34.5
10          50        494         195         1.19:1        10.2
10          60        593         228         1.23:1        12.9
10          70        693         293         1.11:1        37.1
10          80        796         321         1.17:1        37.7
10          90        893         387         1.09:1        34.9

Table 6.3 Running the algorithm with wind noise as the original time series for a variety of section lengths T.
Note that the algorithm often produces an FIF model of slightly fewer samples
than that asked for. For example, when asked to process 10x100 sample sections, the
algorithm produces an FIF model of only 996 samples. This is a consequence of the
algorithm choosing a trial interpolation point that is close to the end of the section of
original time series. Instead of then forcing another interpolation point to be at the
very end of the section, which will probably form a bad collage, the algorithm is
allowed to terminate.
One time series, that of wind noise, has also been processed with a variety of
section lengths to see if this has an effect on performance. These results are shown in
Table 6.3. All of the performance figures shown in the tables have been calculated in
the same way as by Mazel to enable direct comparison. In particular, it is assumed that
the interpolation points may be represented with 9 bits for both the x and y values,
whereas 16 bits are required for the vertical scaling factors. Since the original time
series are all quantised to 16 bits, the compression ratio is found from

16 T_tot / ( 34 N_tot )    (6.34)

where N_tot is the total number of mappings used, equal to the sum of the Ns, the number of mappings used per FIF section.
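Equation (6.34) can be checked against the tabulated wind-noise figures (a quick sketch; the helper name is not from the thesis):

```python
def compression_ratio(T_tot, N_tot):
    """Compression ratio: 16 bits per original sample against
    9 + 9 + 16 = 34 bits per shear mapping (interpolation point x and y
    values plus the vertical scaling factor)."""
    return 16 * T_tot / (34 * N_tot)

# Wind noise row of Table 6.2: 1000 samples, 368 mappings -> 1.28:1.
print(round(compression_ratio(1000, 368), 2))   # 1.28
```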
As can be seen from the tables, the results are of a very different nature compared
with Mazel's single result for the mountain profile. For this reimplementation there is
barely any compression at all, while the algorithm makes the time series deteriorate to
varying degrees. Changing the length of each section over the range shown does not
affect this result. By inspection of the operation of the algorithm, the problem appears
to be that the successful tests, when an interpolation point position is on trial, occur
for very low distances between the interpolation point pairs. The result is to generate
too many interpolation points which results in a low compression ratio. The choice of
interpolation points is determined by the closeness of the collage measured by the
collage error. The small distance between interpolation points therefore gives rise to
the lowest error. This, however, is a consequence of limited resolution. As discussed
in Chapter 5, when the whole time series is mapped to in between close interpolation
points there is effectively a severe sub-sampling. The error is then measured by
comparing a few points of the original with the few points of the mapped time series.
The detail of the mapped time series is therefore lost and does not contribute to the
error.
6.4.2. Error Weighting
A possible solution to this problem is to weight the error according to the distance
between the interpolation points so as to positively discriminate for those which are
further apart when a choice is made. Increasing this distance will increase the amount
of compression. This idea has been implemented by weighting the error as a function
of the trial interpolation point pair spacing. The weighting function is a linear one
constructed so that the error is decreased proportionally to the distance between the
interpolation point pair. The gradient of the linear weighting function may be varied
via a parameter o which is normalised to operate in the range 0 1 s s . For o=0, the
weighting is constant and equal to unity, and so the algorithm is no different from
Mazel's original. At the other end of the range, o=1, the function is designed so that
for the greatest distance between the trial pair of interpolation points, the collage error
is weighted to be near zero, and therefore guaranteeing a high chance that that pair of
interpolation points will be chosen. Choices of o in this range allows control over the
spacing of resulting interpolation points and therefore the degree of compression of
the algorithm. The error weighting function is illustrated in Figure 6.9.
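A minimal sketch of this weighting follows (hypothetical function names; the exact form used in the thesis is shown only graphically in Figure 6.9). It assumes the weight falls linearly from 1, for a trial point adjacent to the fixed point at t = x_n, down to 1 − α at the end of the section:

```python
def error_weight(t, x_n, T, alpha):
    """Linear weight applied to the collage error of a trial interpolation
    point at position t. With alpha = 0 the error is unchanged; with
    alpha = 1 the weight falls to zero at the maximum spacing t = T - 1."""
    return 1.0 - alpha * (t - x_n) / ((T - 1) - x_n)

def weighted_collage_error(err, t, x_n, T, alpha):
    """Collage error after weighting by trial-point spacing."""
    return err * error_weight(t, x_n, T, alpha)
```

Because the weight decreases with spacing, widely spaced trial points are positively discriminated for when the lowest weighted error is chosen.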
The results shown in Table 6.4, and again in Figure 6.10, show how the performance figures are affected by error weighting. Again, wind noise has been used as the original time series.
Figure 6.9 Error weighting function parameterised by α. The weighting is plotted against the position of the variable interpolation point, over its range from the fixed interpolation point at t = x_n to the end of the original time series at t = T−1; the α = 0 line is constant at 1, while the α = 1 line falls linearly towards zero.
gradient      length      mappings   compression   SNR
parameter α   of output   used       ratio         in dB
0             994         421        1.11:1        35.4
0.1           999         397        1.19:1        29.2
0.2           999         348        1.35:1        19.5
0.3           999         308        1.53:1        14.6
0.4           999         242        1.94:1        9.5
0.5           999         207        2.27:1        8.9
0.6           999         167        2.82:1        7.5
0.7           999         132        3.56:1        6.84
0.8           999         102        4.6:1         5.67
0.9           999         77         6.14:1        3.46
1.0           999         55         8.56:1        1.06

Table 6.4 Results of error weighting the inverse algorithm for a range of weighting function gradients, α. The original time series is wind noise and is processed as 10x100 sample sections.
Figure 6.10 Graph of the results shown in Table 6.4: compression ratio and SNR in dB plotted against the gradient parameter α.
These results show that the compression ratio may indeed be controlled with the parameter α. They also show the familiar trade-off between compression ratio and signal to noise ratio: as the amount of compression increases, the quality of the resulting signal decreases.
To put these results in perspective, as was done with Mazel's results, it is necessary to plot them alongside the performance of requantisation. This was done with Mazel's algorithms by plotting the theoretically expected performance of requantisation. It is now also possible to plot the actual performance of requantising the original sound time series, as the original data is now available. This provides a fairer and more realistic comparison for the FIF inverse algorithm. It also allows a test of the requantisation theory and a comparison between the actual and theoretically expected performance. It is expected that the actual requantisation performance will be worse than predicted, because the original signal does not contain the maximum power for its dynamic range. As a result, the FIF model performance should appear improved relative to requantisation.
Figure 6.11 shows a comparison of the theoretically expected and actual
requantisation performance with that of the error-weighted, self-affine model using
wind noise as input. The actual requantisation has been carried out by rounding the
sample amplitudes to simulate the process described in Section 6.3.
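The rounding process can be sketched as follows (a minimal implementation, not the thesis code). Requantising a 16-bit signal to b bits gives a compression ratio of simply 16/b, so the comparison curve is generated by sweeping b:

```python
import numpy as np

def requantise_snr(u, bits):
    """Requantise a 16-bit-range signal to `bits` bits by rounding the
    sample amplitudes to a coarser grid, and return the SNR in dB of the
    result against the original."""
    step = 2 ** (16 - bits)              # new quantisation step size
    q = np.round(u / step) * step        # round amplitudes to the grid
    noise = u - q
    return 10 * np.log10(np.sum(u ** 2) / np.sum(noise ** 2))
```

As expected from quantisation theory, each bit removed costs roughly 6 dB of SNR, with the actual figure depending on how much of the dynamic range the signal occupies.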
Figure 6.11 Comparison of performance (SNR in dB against compression ratio) between requantisation and the error-weighted version of Mazel's algorithm, showing the theoretically expected requantisation performance, the actual requantisation performance, and the performance of the FIF inverse algorithm with error weighting in the range α = 0 to α = 1. The original is 1000 samples of wind noise which is processed as 10x100 sample sections.
Two things can be seen from the graph shown in Figure 6.11. Firstly, the actual
requantisation performance is of the same order as that which is theoretically
expected. Also, as anticipated, the actual requantisation performance is slightly worse
due to the original time series not having the full power assumed in the theoretical
model. Secondly, the graph shows that the degree of compression relative to signal
degradation is no better than that achieved by requantisation. This result, in
conjunction with Mazel's original results and those obtained for the other sound time
series, therefore suggests that the FIF model is no model at all. There appears to be no
advantage to representing a time series with an FIF. The modelling process is
computationally costly and yields results that are worse than those obtained by simply
requantising the original signal.
6.4.3. Interpolation Point Range Restriction
By studying the working of the inverse algorithm with error weighting, it is found
that the poor performance is still related to the distance between the chosen
interpolation point pairs. The process of error weighting is intended to stop close
spacing of the interpolation points. Unfortunately, the effect is to continue to produce
closely spaced points, but to also produce a number of widely spaced points as well.
The combination of both close and widely spaced interpolation points produces poor
results. The close interpolation points decrease the compression ratio, while the
distant ones considerably reduce the SNR. What is needed is for the algorithm to
choose interpolation points that are more evenly spaced and lie somewhere in between
the two extremes described above.
To test this hypothesis, the algorithm has been further modified so as to restrict the
range of the trial interpolation point. That is, instead of allowing it to be tested for the
whole range of positions between the fixed interpolation point and the end of the time
series, it is tested within a chosen window of values. The restriction window is
described by its left- and right-hand positions, in samples, relative to the section of
original time series being processed. Let these positions be l and r respectively where
2 ≤ l ≤ T − 3    (6.35)
and
l + 2 ≤ r ≤ T − 1    (6.36)
This fixed window, like error weighting, allows control over the distances
between the interpolation points and therefore over the degree of compression. Table
6.5 shows the performance figures where the original time series is wind noise and a
range of different window positions and lengths are used. These performance figures
are plotted alongside the requantisation performance in Figure 6.12.
As can be seen from the graph, the effect of restricting interpolation point
positioning is to significantly improve the performance of the algorithm. Now the
performance is, in most cases, better than that of both the theoretically expected and
actual requantisation. For high compression ratios, the performance relative to actual
requantisation is considerably better. For example, requantising the wind noise to give
a compression ratio of 5.3:1 results in a SNR of 13.1dB. A similar degradation of
14.1dB produced by using the modified algorithm corresponds, however, to a
compression of 16.7:1. This is over three times the amount. For a visual comparison
of an extract of the original time series with the resulting FIF version, see Figure 6.13.
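The effect of the window on compression can be estimated with a back-of-envelope bound (a sketch, not from the thesis): if every interpolation point lies between l and r samples from the previous one, a T_tot-sample series needs between roughly T_tot/r and T_tot/l mappings, and Equation (6.34) then bounds the achievable ratio:

```python
def compression_bounds(T_tot, l, r):
    """Approximate bounds on the compression ratio of Eq. (6.34) when
    interpolation point spacing is restricted to the window [l, r]."""
    n_max = T_tot / l                    # closest allowed spacing
    n_min = T_tot / r                    # widest allowed spacing
    return 16 * T_tot / (34 * n_max), 16 * T_tot / (34 * n_min)

lo, hi = compression_bounds(1000, 15, 25)
print(round(lo, 1), round(hi, 1))        # 7.1 11.8
```

The measured 8.1:1 for the l=15, r=25 row of Table 6.5 falls inside this range, the bound being loose because the algorithm may terminate early near section ends.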
left of     right of    window   length of   mappings   compression   SNR
window, l   window, r   width    output      used       ratio         in dB
5           15          10       981         151        3.1:1         30.0
10          20          10       970         79         5.8:1         25.2
15          25          10       930         54         8.1:1         22.6
20          30          10       952         40         11.2:1        19.7
18          22          4        972         50         9.2:1         19.4
10          30          20       962         73         6.2:1         25.7
15          20          5        938         55         8.0:1         22.9
15          30          15       922         52         8.4:1         22.6
15          40          25       941         52         8.5:1         22.7
15          80          65       941         49         9.0:1         17.2
20          25          5        883         40         10.4:1        16.3
20          40          20       942         39         11.4:1        19.6
25          50          25       893         29         14.5:1        14.6
30          60          30       921         26         16.7:1        14.1

Table 6.5 Performance of the modified FIF inverse algorithm with a specified window restricting the range of the trial interpolation point.
Figure 6.12 Comparison of performance (SNR in dB against compression ratio) of the window restricted inverse algorithm with that of requantisation, showing the theoretically expected performance of requantisation, the performance of actual requantisation, and the performance of the FIF inverse algorithm with interpolation point position restriction. The original time series is wind noise and processed as 10x100 sample sections.
Figure 6.13 Waveform plot of original wind noise (left) and compressed FIF
version (right) using the modified inverse algorithm. The compression ratio in this
case is 8.1:1, and the SNR is 22.6dB.
sound                   length of   mappings   compression   SNR
                        output      used       ratio         in dB
gong                    943         52         8.5:1         1.9
river                   934         51         8.6:1         2.4
filling bath            956         52         8.7:1         0.9
industrial roomtone     959         55         8.2:1         23.4
violin                  961         57         7.9:1         10.0
chattering crowd        950         55         8.1:1         8.4
city skyline            928         54         8.1:1         21.9
Ecuadorian rainforest   948         51         8.8:1         2.3
laboratory roomtone     953         53         8.5:1         15.5
audience laughter       935         55         8.0:1         12.4
seawash                 932         54         8.1:1         8.4

Table 6.6 Table of performance figures for the window restricted inverse algorithm using a variety of sound time series. Each original time series is processed as 10x100 sample sections and the restriction window is set at l=15 and r=25 samples.
Figure 6.14 Column chart showing the performance figures (SNR in dB and compression ratio) given in Table 6.6 for a variety of different original sound time series, with the theoretical requantisation performance included for comparison.
As a final experiment, the modified inverse algorithm is applied to a wide range of
other sound time series. The restriction window is fixed and chosen to give
approximately 8:1 compression. The degradations for each type of sound may then be
compared. The performance figures for this experiment are shown in Table 6.6 and
graphically in Figure 6.14. Although the restriction window results in a similar
compression ratio for each sound, the associated degradation varies considerably. The
sounds which achieve the best performance and which give better
compression/degradation figures than those theoretically expected for requantisation
are wind noise and some other ambient environmental sounds such as the industrial
roomtone and the city skyline.
6.5. Conclusions
In this chapter, the FIF inverse problem for sound time series has been considered.
That is, given an original sound, find a subset of its time series samples that, when
used as interpolation points, define an FIF that is close to the original. In the first
section of this chapter, a number of heuristic experiments were presented based on the
idea of extracting information from the coarsest scale of statistically self-affine time
series. This information was then used to specify the interpolation points that define
an FIF. The experiments with this technique, however, have not generated any FIFs
that sound enough like the original for the technique to be of any use. It was
concluded that a more rigorous technique is required to see if the FIF inverse problem
has any solutions for sound time series.
The rest of the chapter has then concentrated on a search technique found in the
literature which was devised by David Mazel. This, it has been claimed, solves the
FIF inverse problem by partially searching the space of subsets of original time series
samples. It has been shown, however, that the performance quoted for the inverse
algorithm, and for a number of other more complicated FIF-based models/inverse
algorithms, is no better than the theoretically expected performance of simple
amplitude requantisation. That is, for a given degree of compression, the signal
degradation introduced by the FIF model is greater than, or approximately equal to,
that introduced by requantising the sample amplitudes. The inverse algorithm has
been reimplemented and used to process a wide variety of sound time series and to
obtain average results. The results from this have confirmed the fact that Mazel's self-
affine model and inverse algorithm have poor performance relative to requantisation.
It has also been concluded that the good result reported by Mazel for the mountain
profile data is an exception and not typical of the FIF model/inverse algorithm.
The realisation that Mazel's models/algorithms are no better than requantisation
has prompted a number of experiments to modify the self-affine inverse algorithm. By
inspection of the workings of the algorithm, it was found that the poor performance is
a result of the uneven spacing of the chosen interpolation point pairs. A novel version
of the algorithm which eliminates this problem has been demonstrated which
produces some results that perform much better than amplitude requantisation. The
sound time series for which the best results have been obtained are those of wind
noise, industrial roomtone and a city skyline. Since the performance figures are
considerably better than for requantisation, it can be concluded that the FIF model is
suitable for these particular sounds and that there is something inherent to them that
the algorithm is exploiting to achieve relatively low-degradation compression. Since
the FIF model is based on representing complex fractal waveforms with simple
systems, it can be concluded that the model/inverse algorithm works by exploiting some
of the fractal redundancy present in the original signals. This conclusion is confirmed
by considering that in Chapter 4 it was shown that both wind noise and the industrial
roomtone are examples of 1/f noises which are necessarily statistically self-affine
signals. The success of the modified algorithm is therefore a satisfying result since it
both confirms that these signals do have fractal properties and demonstrates that these
properties can be exploited by a fractal model.
The inverse algorithm, however, has not been designed to be computationally
efficient, nor does it provide optimal results. Since the algorithm only searches part of
the space of possible solutions, there is still room for improving the performance of
the algorithm. This could be achieved with a complete search which, however, would
be very computationally intensive, or by using a global optimisation technique, such
as a genetic algorithm. Also, no attempt has been made to reimplement or modify
Mazel's other more complicated models/algorithms. There is therefore potential for
further work on FIF based models for sound and the results presented in this chapter
have indicated the potential of pursuing this line of research.
Chapter 7
Chaotic Predictive Modelling
The previous two chapters have concentrated on the problem of modelling a sound
by representing the graph of its time domain waveform with a strange attractor. Recall
from Chapter 4 that the other suggested approach is to represent the dynamics of the
sound with a strange attractor. This approach is explored in this and the next chapter.
As has been established in Chapter 2, a sound is presumed to be represented with
digital audio and is therefore already in a form directly compatible with the FIF
models considered in Chapters 5 and 6 which represent the time series waveform. In
order to model the dynamics of the sound, however, it is necessary to develop a
suitable further representation. This will involve considering the relationship between
a chaotic system and a time series derived from it. The beginning of this chapter is
devoted to this issue and concludes by describing an analysis/synthesis model
requiring the solution of a specific inverse problem. The rest of this chapter then
focuses on an approach to this problem inspired by work on time series prediction.
The following chapter presents a related idea that is found to solve the roomtone
problem.
7.1. Chaotic Time Series
The founding assumption of this approach is that a chaotic system is responsible
for the sound that is to be modelled. Because the sound is in the form of digital audio
and therefore the model is to operate in the discrete time domain, it is convenient to
begin with a discrete dynamical system defined by a mapping:
x_{n+1} = F( x_n ),    x_n ∈ X = R^d,    n ∈ Z^+    (7.1)

or

x_n = F^n( x_0 )    (7.2)

where x is the d-dimensional state vector in state space X, n is discrete time, F is an invertible nonlinear mapping, and let x_0 be the initial condition of the system. Assume that this system possesses a strange attractor, A, and an associated physical measure, μ. Recall that the attractor represents the long term dynamical behaviour of the system as
it is the set on which any typical trajectory of the system will lie, after transients. That is,

x_n ∈ A ⊂ X for sufficiently large n    (7.3)
For the rest of this chapter, n is considered to be any time value sufficiently large for
transients to have passed. Because of the unpredictable nature of chaos, the state can
be interpreted as a random vector. The physical measure then describes the
probabilistic distribution of states on the attractor. Let this system be known as the
original system and denote it by
( x , X , F , A , μ )    (7.4)
that is,
system=(state vector, state space, system mapping, attractor, associated measure)
In general, consider that not much will be known about this system as the state and
the mapping are not directly accessible. It is therefore not possible to construct,
directly, a physical model for the sound. Instead, assume the only source of
information available is the sound in the form of a digital audio time series. Consider
that this is generated by observing the original system through an observation
function,
u_n = o( x_n ),    o : R^d → R    (7.5)
where u is the time series and o is the observation function. The observation function
could represent, for example, the process of monitoring a sound field at a single point
in space with a microphone.
The interpretation of the state as a random vector implies that the observations
may themselves be viewed as random variables. Consequently, the time series, u, may
be interpreted as a realisation of a stochastic process,
{ U_n , n ∈ Z^+ }    (7.6)
Because a natural measure is assumed, recall that this has been defined as one that is invariant under the mapping F. The distribution of states at some time n is then the same as that at time n+1, and so the distribution of the random variable U_n is the same as that of U_{n+1}. Hence the stochastic process will be stationary [tayl91].
To summarise so far: the above provides a general model for how a digital audio
time series results from recording, or observing, a chaotic system. The observing
process is summarised by the function o and the result, u, may be viewed as the
realisation of a stationary stochastic process. The objective, however, is to model the
sound by representing its dynamics. Ideally, this would involve modelling the attractor
and associated measure of the original system. Since the original system is not directly
accessible, however, it is necessary to reverse the observing process and gain
information about the original system from the observed time series. This may be
done with a technique known as embedding.
7.2. Embedding
An embedding is a mapping with certain properties that allows one dynamical
system to be mapped to another while preserving essential features in the process
[take81, broo91]. In particular, such a mapping is continuous, differentiable and
invertible (a diffeomorphism). Consequently, each point in the original state space
maps to a unique point in the embedded state space, neighbouring points mapping to
neighbouring points, trajectories to trajectories, and therefore attractors map to
attractors. The properties of the embedding ensure that both the topology of the
attractor and the associated probability structure are preserved.
Define a (column) vector to be composed of m consecutive values of the time
series u,
y_n = ( u_n , u_{n+1} , u_{n+2} , ..., u_{n+m−1} )^T    (7.7)
This is equivalent to a vector of observations of the state vector, x, and so, with Equations (7.2) and (7.5), it may be rewritten as

y_n = ( o(x_n) , o(x_{n+1}) , o(x_{n+2}) , ..., o(x_{n+m−1}) )^T
    = ( o(x_n) , o(F(x_n)) , o(F²(x_n)) , ..., o(F^{m−1}(x_n)) )^T    (7.8)
and therefore this may be viewed as a mapping of the state vector,
y_n = H( x_n )    (7.9)
Generally, the mapping, H, is itself an embedding when [take81]
m ≥ 2d + 1    (7.10)
It will therefore map the inaccessible states of the original system onto accessible
states of what is known as the embedded system. Define another mapping, G, that
comprises a shift on a sequence of observations as viewed through an m-length
register. That is,
G( u_n , u_{n+1} , ..., u_{n+m−1} ) = ( u_{n+1} , u_{n+2} , ..., u_{n+m} )    (7.11)
which, again, may be rewritten as,

G( ( o(x_n) , o(F(x_n)) , ..., o(F^{m−1}(x_n)) )^T ) = ( o(F(x_n)) , o(F²(x_n)) , ..., o(F^m(x_n)) )^T    (7.12)

and therefore,

G( H( x_n ) ) = H( F( x_n ) )    (7.13)

This shows that the evolution of the original state under the mapping F is equivalent to the evolution of the embedded state under the shift mapping G [tayl91]. An equation can therefore be written for the embedded system that relates consecutive states with G,

y_{n+1} = G( y_n )    (7.14)
Write the embedded system in full as
( y , Y , G , B , ν )    (7.15)

where,

Y = H( X ) ⊂ R^m    (7.16)

is the embedded state space and m the embedding dimension. This system has an attractor

B = H( A )    (7.17)

and also the probability of the embedded state being in some subset, b, of the attractor is given by

ν( b ) = μ( H^{−1}( b ) ),    b ⊂ B    (7.18)
In words: the probability of finding the embedded state in some subset, b, of the
embedded attractor may be found by mapping the subset back into the original system
with the inverse of the embedding mapping and then finding the probability of the
original state being in this mapped subset.
If the embedded system is itself observed by a projection of one coordinate of the
embedded vector, the original observed time series will result. That is,
u_n = p( y_n )    (7.19)
where
p( y ) = y_1 ,    y = ( y_1 , y_2 , ..., y_m )^T    (7.20)
is the projection function.
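The delay-vector construction and the shift mapping G above can be sketched directly in Python (hypothetical helper names; m and the series are arbitrary):

```python
import numpy as np

def embed(u, m):
    """Delay-embedding vectors y_n = (u_n, u_{n+1}, ..., u_{n+m-1}) of
    Eq. (7.7), one per row."""
    return np.array([u[n:n + m] for n in range(len(u) - m + 1)])

def shift(y, next_sample):
    """The shift mapping G of Eq. (7.11): drop the oldest observation
    and append the next one."""
    return np.concatenate([y[1:], [next_sample]])
```

Iterating an embedded state with the shift map reproduces the embedding of the next time step, which is exactly Eq. (7.14): shift(Y[n], u[n + m]) equals Y[n + 1].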
To summarise, the embedding procedure enables the reconstruction of the original
system by mapping inaccessible states and trajectories onto an accessible embedded
state space. The original and embedded systems may be viewed as equivalent systems
that generate exactly the same time series via different observation functions. The
embedding procedure may therefore be seen as a means of representing the dynamics
of a sound from its digital audio time series with an attractor and measure in
embedded state space. Given that the time series may be viewed as the realisation of a
stochastic process, the embedding procedure creates a representation that preserves
this viewpoint. To see this, consider the mth order joint probability density function (jpdf)

P_U( u_n , u_{n+1} , ..., u_{n+m−1} )    (7.21)

that partly describes the stationary process, { U_n }. This may be approximated with an m-dimensional histogram of the occurrences of the m-tuples,

( u_n , u_{n+1} , ..., u_{n+m−1} )    (7.22)

This is exactly the same procedure as would be necessary to approximate the measure of the embedded system, ν, by accumulating occurrences of the m-dimensional embedding vectors. The jpdf and the embedded measure are therefore equivalent descriptions of the time series as a stationary stochastic process.
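The histogram approximation described above can be sketched as follows; the bin count here is an arbitrary choice, not from the thesis:

```python
import numpy as np

def embedded_measure(u, m, bins=8):
    """m-dimensional histogram of the m-tuples (u_n, ..., u_{n+m-1}),
    normalised so that it approximates the embedded measure (equivalently,
    the mth order jpdf of the stationary process)."""
    Y = np.array([u[n:n + m] for n in range(len(u) - m + 1)])
    hist, _ = np.histogramdd(Y, bins=bins)
    return hist / hist.sum()
```

Applied to a long sample of a time series, this yields a normalised array of occupation probabilities over the cells of embedded state space.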
Having established a suitable representation for the dynamics of a sound with an
embedded attractor and measure, the objective is now to fit this within a possible
analysis/synthesis model for the sound.
7.3. The Analysis/Synthesis Model
Figure 7.1 shows the proposed analysis/synthesis model which may be compared
with the general sound model shown in Figure 2.2. The analysis involves embedding
the original time series to form an accessible representation of the dynamics of the
original system in the form of the embedded attractor and measure. From this is
constructed a synthetic system, also in embedded state space, that has a similar
attractor and measure to that of the embedded system. Synthesis then consists of
iterating the synthetic system and observing it, via the projection function, to generate
a synthetic time series. Crucial to the analysis procedure, and which is described as the
inverse problem, is the construction of the synthetic system. Since the embedded
system is defined by the mapping G, the inverse problem may be recast as finding a
similar mapping for the synthetic system - let this be denoted G
~
. A trivial solution to
the inverse problem is to make G G =
~
. The synthetic system would then be expected
to exactly model the original time series. Equation (7.11) shows, however, that G is
defined by the original time series and so the problem becomes: represent a time series
with a mapping that is defined by that time series. This is a trivial problem and clearly
of no use.
[Diagram: the original system is observed to give the original time series, which is embedded to form the embedded system; solving the inverse problem yields the synthetic system, which is observed to give the synthetic time series. The embedded and synthetic systems are identical if $\tilde{G} = G$; the time series are then statistically similar.]
Figure 7.1 The proposed analysis/synthesis model based upon the embedded
attractor and measure representation of a sound time series.
A more realistic and useful inverse problem is: find a $\tilde{G}$ that is an approximation
to G given only a finite sequence of the original time series. Preferably, $\tilde{G}$ should be
as simple as possible, and yet define a system whose attractor and measure adequately
match those of the embedded system. By adequate, it is meant that the model
preserves qualities of the sound so as to maintain perceptual similarity between the
original and synthetic versions. Note that no attempt is being made to exactly model
the original time series in this case, only the dynamics of the system responsible for it.
Consequently, a typical sequence of iterates of the synthetic system will not match
those of the embedded system, but will lie on a similar attractor.
More formally, let the synthetic system be,
$(Z, Y, \tilde{\mu}, \tilde{B}, \tilde{G})$   (7.23)
i.e.
$z_{n+1} = \tilde{G}(z_n)$   (7.24)
(7.24)
and let this system produce a time series via the projection observation function,
$v_n = p(z_n)$   (7.25)
which again may be viewed as the realisation of a stationary stochastic process,
$\{V_n\}$   (7.26)
with an mth order jpdf
$P_V(v_n, v_{n+1}, \ldots, v_{n+m-1})$   (7.27)
The inverse problem is to find a $\tilde{G}$ such that
$\tilde{G} \approx G$   (7.28)
and then
$\tilde{\mu} \approx \mu$ and $\tilde{B} \approx B$   (7.29)
and it follows that
$P_V \approx P_U$   (7.30)
This shows that a solution of the inverse problem would allow the original time
series to be modelled not exactly, but such that the synthetic time series is statistically
similar to the original. More precisely, they will share similar mth order jpdfs.
Whether or not this is adequate to preserve the perceived qualities of the sound must
be determined by experimental investigation.
Because of the shifting nature of G, see Equation (7.11), and because it is a
deterministic function of the embedded state vector, see Equation (7.14), it may be
rewritten as,
$G\big((u_n, u_{n-1}, \ldots, u_{n-m+1})^T\big) = \big(g(y_n), u_n, \ldots, u_{n-m+2}\big)^T$   (7.31)
where,
$g: \mathbb{R}^m \to \mathbb{R}$   (7.32)
is a vector to scalar function. The inverse problem therefore reduces to approximating
g, i.e. finding $\tilde{g}$. To see how this may be done, consider embedding the original time
series and forming data pairs comprising an embedded vector at time n and the value
of the time series at time n+1,
$(y_n, u_{n+1})$   (7.33)
These data pairs satisfy
$g(y_n) = u_{n+1}$   (7.34)
by definition - see Equation (7.31). Assume that a sequence of N samples is taken
from the original time series (after transients) and is available for embedding. Let
these be written,
$\{u_i\}_{i=0}^{N-1}$   (7.35)
These may also be formed into a set of data pairs,
$\{(y_i, u_{i+1})\}_{i=m-1}^{N-2}$   (7.36)
Note that only N-m pairs can be formed from N samples. An approximation to g may
be found with a function that is satisfied by only the N-m data pairs:
$\tilde{g}(y_i) = u_{i+1}, \quad i = m-1, \ldots, N-2$   (7.37)
which is a problem of function interpolation. This provides a basis for finding $\tilde{g}$ by
specifying its value at particular places, but not elsewhere. This condition may
therefore be relaxed slightly by replacing the equality in (7.37) to give
$\tilde{g}(y_i) \approx u_{i+1}$   (7.38)
and still maintaining that
$\tilde{g} \approx g$   (7.39)
To summarise, this section has proposed an original analysis/synthesis model for
sound based on an embedded attractor/measure representation of a digital audio time
series. This model relies on the solution of an inverse problem which reduces to one
of function interpolation/approximation.
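The embedding and data-pair construction summarised above can be sketched as follows. This is an illustrative implementation, not the thesis's code; it assumes the delay vector $y_i = (u_i, u_{i-1}, \ldots, u_{i-m+1})^T$ as defined earlier, and the names are my own.

```python
import numpy as np

def embed_pairs(u, m):
    """Form the N-m data pairs (y_i, u_{i+1}) of Eq. (7.36), where
    y_i = (u_i, u_{i-1}, ..., u_{i-m+1}) is the delay vector."""
    N = len(u)
    # Delay vectors y_i for i = m-1, ..., N-2 (most recent sample first).
    Y = np.array([u[i - m + 1:i + 1][::-1] for i in range(m - 1, N - 1)])
    # Targets u_{i+1}: the next sample after each delay vector.
    t = u[m:]
    return Y, t

u = np.arange(10.0)     # toy "time series" 0..9
Y, t = embed_pairs(u, m=3)
print(Y.shape)          # (7, 3): N - m = 10 - 3 pairs
print(Y[0], t[0])       # [2. 1. 0.] 3.0
```

Each row of Y with its target satisfies $g(y_i) = u_{i+1}$ by construction, which is the set of constraints the interpolation must meet.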
7.4. The Inverse Problem
Several strategies for the solution of a similar problem have been proposed in
work concerned with the accurate short-term prediction of time series believed to have
come from chaotic systems [broo91, farm87, casd89, tayl91, sing92]. The sound
model proposed is in fact an extension of this work, being based on ideas from it, but
having a different purpose. In the prediction work the emphasis is on finding a
function, known as the forecast, or prediction, function, which is of the same form as
Equation (7.37) and which enables as accurate a prediction of a given time series as
possible. As a consequence of this, the strategies used are not ideally suited for use in
the proposed sound model. The following is a discussion of the reasons for this and
concludes with the need for a different approach which is then developed in the next
section.
Firstly, the strategies are most concerned with accurate prediction which is
equivalent to recreating a given sequence of the original time series as closely as
possible. Recall that the intention is for the sound model to capture the form of the
dynamics of a given time series, and not recreate it exactly. As a consequence of the
emphasis on accuracy, there is little interest in producing a function that is simple to
describe. For the sound model, a different situation is sought: a function that is as
simple as possible, to allow the model to be conveniently parameterised, but that can
adequately capture the perceptually important properties of the sound. This is not to
say that the function sought should not be accurate at forecasting, but that there is a
trade-off between accuracy and simplicity that will be determined by the different
requirements of the sound model. Also, when the concern is for short-term
forecasting, there is no consideration of the computational cost of iterating the
forecasting function, as this will only be done a relatively small number of times - up
to the order of 100. For the generation of a sound, however, the synthetic system will
be expected to produce ~50,000 values for each second of output. There is therefore a
difference of several orders of magnitude between the number of iterations required.
Finally, this leads to another consideration, that of stability. There is no concern in the
forecasting work for whether iteration of the forecasting function produces a stable
chaotic system.
There are two forms of prediction function found to have been used in the
literature: global and local. The global functions are single nonlinear functions, such
as polynomials or radial basis functions, which are defined over the whole embedded
state space. The local functions are piece-wise linear, or low-order polynomials whose
domains are defined according to a nearest-neighbour criterion. That is, to predict the
future behaviour of any given vector in embedded state space, a number of nearest
neighbours are found from the embedded sequence, a linear or low-order polynomial
is fitted to satisfy Equation (7.38), and the resulting function used to predict the next
vector.
Global nonlinear prediction functions are known to be much more costly to
compute, especially for large values of N and m, than their piece-wise linear
counterparts [casd89]. Also, the piece-wise linear functions have strong similarities to
affine IFS. Recall the intention presented in Chapter 4 to concentrate on IFS as they
are a well understood means of manipulating strange attractors (this will be discussed
again in the Further Work section later in this chapter). For these reasons, piece-wise
linear functions are a preferable choice. As mentioned, however, the natural way in
which they are used in the prediction work involves calculating the domain of each
linear section with a nearest neighbour method. This is a computationally intensive
process [broo91] that is required once per iteration of the prediction function, the
complete prediction function not being calculated once in advance, as in the proposed
sound model. It is not obvious how this technique could be adapted so as to achieve
this.
What is therefore required is a new solution to the inverse problem that is more
suitable for use with the proposed sound model. Preferably this would result in the
specification of an easily implemented, piece-wise linear function to be used as $\tilde{g}$ in
the synthetic system. The next section presents such a solution and is followed by a
number of experimental results that reveal some of the capabilities of the resulting
sound model.
7.5. A Solution to the Inverse Problem
The inverse problem is now: given the set of data pairs in Equation (7.36), find a
piece-wise linear function, $\tilde{g}$, that satisfies Equation (7.38). The full specification of
the piece-wise linear function requires two parts, the partition of embedded state space
that defines a set of disjoint domains for the individual linear functions, and the linear
functions themselves. Let the partition into domains be a set of subsets of the
embedded state space,
$\{D_j\}_{j=1}^{Q}$   (7.40)
such that
$\bigcup_{j=1}^{Q} D_j = Y$   (7.41)
and
$D_j \cap D_{j'} = \varnothing, \quad j \ne j'$   (7.42)
For each domain, define a linear function,
$l_j(y) = a_j y + b_j = (a_j^1 \; a_j^2 \; \cdots \; a_j^m)\,(y^1, y^2, \ldots, y^m)^T + b_j$   (7.43)
Then define,
$\tilde{g}(y) = l_j(y)$ when $y \in D_j$   (7.44)
i.e. $\tilde{g}$ is one of the linear functions depending on which domain the state is in.
The problem may therefore be split into two halves: firstly, construct the partition
and secondly, fit a linear function within each domain to those data pairs whose
embedded vector components fall within that domain. Each linear function in
Equation (7.43) must be made so as to best satisfy Equation (7.38) for those
embedded vectors contained within each domain. Since Equation (7.43) describes an
m-dimensional hyperplane, it can exactly interpolate m+1 data pairs or best fit a
greater number. The partition, therefore, must divide the set of data pairs in Equation
(7.36) so that there are at least m+1 in each domain.
The scheme created achieves this task by recursively dividing the set of data pairs
into two subsets containing approximately equal numbers. This is done until any
further subdivision would reduce the number of data pairs in a set to less than a
prespecified minimum, made to be at least m+1. Each successive division is with
respect to a different coordinate of the embedded state space, the coordinate
incrementing with each level of recursion. This generates an m-dimensional search
tree of the type used in multidimensional range searching [sedg83] and effectively
divides the embedded state space into a set of Q hypercuboid domains. In more detail:
The set of data pairs,
$\{(y_i, u_{i+1})\}_{i=m-1}^{N-2}$   (7.45)
are divided into two subsets according to,
$S_0 = \{(y_i, u_{i+1}) : y_i^1 \le c_1^1\}$, and
$S_1 = \{(y_i, u_{i+1}) : y_i^1 > c_1^1\}$   (7.46)
where $y_i^1$ denotes the first component of $y_i$, and c is chosen to make the number of
points in the two sets as close as possible,
$\#S_0 \approx \#S_1$   (7.47)
This effectively partitions Y with the hyperplane,
$y^1 = c_1^1$   (7.48)
This process is then repeated at the next level of recursion so that $S_0$ is divided into,
$S_{00} = \{(y_i, u_{i+1}) \in S_0 : y_i^2 \le c_1^2\}$, and
$S_{01} = \{(y_i, u_{i+1}) \in S_0 : y_i^2 > c_1^2\}$   (7.49)
and $S_1$ into
$S_{10} = \{(y_i, u_{i+1}) \in S_1 : y_i^2 \le c_2^2\}$, and
$S_{11} = \{(y_i, u_{i+1}) \in S_1 : y_i^2 > c_2^2\}$   (7.50)
The recursion continues until any further subdivision would violate
$\#S_j \ge M \ge m+1$   (7.51)
where M is the prespecified minimum number of points per domain.
Note that at each level of recursion, the division is with respect to a different
component of $y_i$, the index of which increments and wraps around to 1 after m.
The resulting set of c's then forms both the boundary values of the hypercuboid
domains, $D_j$, and a search tree which is used to determine the domain in which any
vector in embedded state space is located. A simple example of the form of partition
resulting from this process is shown in Figure 7.2, which is of an m=2 dimensional
embedded state space.
142
c
1
1
c
2
2
c
2
1
c
1
2
c
1
4
c
1
3
c
1
5
c
2
3
c
2
4
c
2
5
c
2
6
c
2
7
c
2
8
c
2
9
c
2
10
range of
embedded
vectors
domains
c
1
1
c
2
1
c
2
2
c
1
2
c
1
3
c
1
4
c
1
5
c
2
3
c
2
4
c
2
5
c
2
6
c
2
7
c
2
8
c
2
9
c
2
10
> s
Figure 7.2 Left, an example recursive partition for m=2 and right, the associated
search tree.
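The recursive splitting and the resulting search tree can be sketched as follows. This is an illustrative reconstruction, not the thesis's code: a median split realises the $\#S_0 \approx \#S_1$ balancing, the split coordinate cycles through the embedding dimensions, and leaves keep at least M data pairs.

```python
import numpy as np

def build_partition(Y, targets, M, depth=0):
    """Recursively split the data pairs into a binary search tree of
    hypercuboid domains (cf. Eqs. 7.46-7.51). Leaves hold the data
    pairs of one domain."""
    m = Y.shape[1]
    if len(Y) < 2 * M:              # a further split would violate #S_j >= M
        return ('leaf', Y, targets)
    k = depth % m                   # coordinate for this level (wraps after m)
    c = np.median(Y[:, k])          # threshold giving #S0 ~ #S1
    left = Y[:, k] <= c
    if left.all() or (~left).all(): # degenerate split (ties): stop here
        return ('leaf', Y, targets)
    return ('node', k, c,
            build_partition(Y[left], targets[left], M, depth + 1),
            build_partition(Y[~left], targets[~left], M, depth + 1))

def find_domain(tree, y):
    """Descend the search tree to the leaf (domain) containing y."""
    while tree[0] == 'node':
        _, k, c, lo, hi = tree
        tree = lo if y[k] <= c else hi
    return tree

rng = np.random.default_rng(1)
Y = rng.uniform(-1, 1, size=(1000, 2))
t = rng.uniform(-1, 1, size=1000)
tree = build_partition(Y, t, M=100)
leaf = find_domain(tree, np.array([0.3, -0.2]))
```

With continuous data the median split leaves every domain holding between M and 2M-1 pairs, so the fitting step that follows always has enough points.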
A fitting error for each domain, j, can then be written as,
$e_j = \sum_{i : y_i \in D_j} \big(l_j(y_i) - u_{i+1}\big)^2$   (7.52)
which is the sum, over all the data pairs contained within the jth domain, of the
squared difference between the value according to the hyperplane being fitted and the
actual value given by the data pair. The best fit is therefore obtained by minimising
this error function with respect to the mapping parameters which is a standard linear
least-squares problem [nag91].
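The per-domain least-squares fit can be sketched as follows. This is illustrative rather than the thesis's NAG routine: appending a constant column lets the offset $b_j$ be fitted together with $a_j$, a standard reformulation.

```python
import numpy as np

def fit_hyperplane(Y_j, t_j):
    """Least-squares fit of l_j(y) = a_j . y + b_j to the data pairs of
    one domain, minimising the error e_j of Eq. (7.52)."""
    # Augment each delay vector with a constant 1 so that b_j appears
    # as the last fitted coefficient.
    A = np.hstack([Y_j, np.ones((len(Y_j), 1))])
    coeffs, *_ = np.linalg.lstsq(A, t_j, rcond=None)
    return coeffs[:-1], coeffs[-1]   # a_j, b_j

# Toy check: data generated exactly by a known hyperplane is recovered.
rng = np.random.default_rng(2)
Y_j = rng.standard_normal((50, 3))
a_true, b_true = np.array([0.5, -1.0, 2.0]), 0.25
t_j = Y_j @ a_true + b_true
a, b = fit_hyperplane(Y_j, t_j)
```

Since each domain holds at least m+1 pairs, the augmented system is never underdetermined.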
7.6. Experimental Technique
The program implemented to test the proposed sound model consists of the
following steps:
Analysis
- Input the original time series u.
- Embed the time series to form the set of data pairs $\{(y_i, u_{i+1})\}_{i=m-1}^{N-2}$.
- Recursively subdivide the data pairs to form sets S sorted into domains D that
partition the embedded state space. Also formed is the search tree.
- Fit within each domain D a linear function, l, to the data pairs S. Each linear
function is parameterised by a and b. The fit uses least squares to minimise the error
e.
- The search tree and the set of l then define the function $\tilde{g}$ and hence the
mapping $\tilde{G}$.
Synthesis
- Initialise the synthetic system by setting $z_0$ equal to one of the embedded
vectors.
- Use the search tree to find which domain the synthetic vector $z_n$ is located in.
- Apply the linear mapping associated with that domain and hence calculate
$z_{n+1} = \tilde{G}(z_n)$   (7.53)
- Observe the synthetic vector with the projection function to get the synthetic
time series,
$v_n = p(z_n)$   (7.54)
- Iterate the above three steps as many times as is desired or is possible.
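The synthesis steps above can be sketched as follows. This is illustrative: `tree_predict` stands in for the fitted $\tilde{g}$ (here a toy function of my own invention), and the bound implements the instability cut-off used in the experiments.

```python
import numpy as np

def synthesise(tree_predict, z0, n_samples, bound=32767):
    """Iterate the synthetic system z_{n+1} = G~(z_n) (Eq. 7.53) and
    observe it with the projection p(z) = z[0] (Eq. 7.54).
    tree_predict maps a state vector to the next scalar sample (g~).
    Iteration stops early if the output leaves the given range."""
    z = np.asarray(z0, dtype=float)
    out = []
    for _ in range(n_samples):
        v = tree_predict(z)                # g~(z_n): the next sample
        if abs(v) > bound:                 # stability criterion
            break
        z = np.concatenate([[v], z[:-1]])  # shift the state: this is G~
        out.append(z[0])                   # projection observation
    return np.array(out)

# Toy g~: a damped echo of the oldest state component (hypothetical).
v = synthesise(lambda z: 0.9 * z[-1], z0=[1.0, 0.0, 0.0], n_samples=5)
print(len(v))   # 5
```

In the full model, `tree_predict` would descend the search tree to find the domain of $z_n$ and apply that domain's linear map.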
The variables that define the analysis are summarised as follows:
N - the length, in samples, of the original time series used for the analysis,
m - the embedding dimension,
M - the minimum number of points to result per domain after partitioning,
Q - the number of domains that form the partition - determined by N and M
and the nature of the data.
In order to experimentally evaluate the performance of this model, consider the
following criteria:
1) stability - does the synthetic system remain stable over the desired period of
iteration so that an output is generated?
2) predictability - since $\tilde{g}$ is a forecasting function, how accurately can it make
predictions of the original time series?
3) attractor similarity - if the synthetic system is stable, does a resulting trajectory
lie on an attractor and have a distribution (the associated measure) that are similar to
the embedded ones?
4) time series similarity - how do the original and synthetic time series compare?
5) sound similarity - of greatest importance, does the synthetic time series, v, when
converted to audio, sound like the original?
6) model complexity - how complex are the analysis and synthesis processes and
how many parameters are required to specify the synthetic system?
The following discusses these criteria in greater detail and develops an
experimental approach for testing the sound model.
1) Stability. The stability of the system is determined by whether or not the
synthetic vector, $z_n$, remains bounded within a predefined range. Since the input and
output time series are represented as 16-bit integers, this provides a natural range. None
of the original time series used fills the full dynamic range, so each leaves
some headroom for the synthetic system to operate within. If the synthetic time series
exceeds this range, then the system is considered to have become unstable and the
synthesis process is terminated.
2) Predictability. The accuracy of the prediction function may be calculated by
comparing a portion of synthetic time series with that of the original. In the prediction
literature this is done as follows [farm87]. The original time series is split into two
pieces; the first is embedded and used for the analysis, the second is used for the error
calculation. This second piece is divided into I sequences of length L+m, one
sequence for each trial. A trial consists of initialising the synthetic system with the
first m values of the sequence and then iterating the system L times to produce
predictions for L times ahead. Let
$u_n, \quad n = 1, \ldots, N$   (7.55)
be the N samples of the original time series used for the analysis, and
$u_{i,1}, \ldots, u_{i,L+m}, \quad i = 1, \ldots, I$   (7.56)
be the I trial sequences. Let
$z_{i,0} = (u_{i,m}, u_{i,m-1}, \ldots, u_{i,1})^T$   (7.57)
be the initial value of the synthetic vector for the ith trial. Then iterate
$z_{i,n+1} = \tilde{G}(z_{i,n})$   (7.58)
L times and observe it via p to give the synthetic series
$v_{i,1}, \ldots, v_{i,L}, \quad i = 1, \ldots, I$   (7.59)
The prediction error as a function of the time ahead, j, is then defined as
$\varepsilon(j) = \left(\frac{1}{I}\sum_{i=1}^{I}\left(u_{i,m+j} - v_{i,j}\right)^2\right)^{1/2} \bigg/ \left(\frac{1}{N}\sum_{n=1}^{N}\left(u_n - \bar{u}\right)^2\right)^{1/2}$   (7.60)
where
$\bar{u} = \frac{1}{N}\sum_{n=1}^{N} u_n$   (7.61)
is the mean of the portion of the original time series used for embedding. The
numerator of Equation (7.60) gives the r.m.s. prediction error averaged over the I
trials and the denominator is the square root of the variance of the portion of the
original time series used by the analysis. This denominator normalises the prediction
error so that $\varepsilon = 0$ for perfect predictions and $\varepsilon \approx 1$ if the predictions are
constantly made equal to the mean of the original.
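The normalised prediction error of Equation (7.60) can be computed as in the following sketch. The array names are hypothetical, and the two checks at the end confirm the limiting cases just described.

```python
import numpy as np

def prediction_error(trials_true, trials_pred, u):
    """Normalised r.m.s. prediction error, cf. Eq. (7.60).
    trials_true[i, j]: true continuation of trial i, j+1 steps ahead.
    trials_pred[i, j]: model prediction for the same point.
    u: the N analysis samples used for the normalising variance."""
    rms = np.sqrt(np.mean((trials_true - trials_pred) ** 2, axis=0))
    norm = np.sqrt(np.mean((u - u.mean()) ** 2))
    return rms / norm               # eps(j) for j = 1, ..., L

rng = np.random.default_rng(3)
u = rng.standard_normal(10_000)
truth = rng.standard_normal((20, 5))     # I=20 trials, L=5 steps
# Perfect predictions give eps = 0 at every horizon.
print(prediction_error(truth, truth, u).max())   # 0.0
# Constantly predicting the analysis mean gives eps ~ 1.
eps = prediction_error(truth, np.full_like(truth, u.mean()), u)
```

The one-step value $\varepsilon(1)$ is the quantity plotted in the experiments below.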
3) Attractor similarity. As already explained, the long-term trajectory of the
synthetic system is not expected to be similar to that of the embedded system, but, if
the model is good, is expected to lie on a similar attractor and the states be similarly
distributed. The most immediate way of making a comparison is by visual inspection
of the phase portraits derived from the original and synthetic time series. Since a
phase portrait and an embedding are related through the method of taking delayed
versions of the time series, the portrait gives a direct view of the domain in which the
model is working. The phase portrait is considered to be sufficient to show if the
synthetic system possesses an attractor of similar form to the embedded one. A
quantitative comparison would involve calculating the 'closeness' of the embedded
and synthetic attractors/measures. Such a closeness function is provided by the
Hausdorff and Hutchinson metrics [barn88] which give distances between subsets and
their measures, respectively, of the embedded state space. So let
$d_{HA}(B, \tilde{B})$   (7.62)
be the distance between the embedded and synthetic attractors and let
$d_{HU}(\mu, \tilde{\mu})$   (7.63)
(7.63)
be the distance between their associated measures. In practice, these distances must be
estimated from sample trajectories lying on the attractors. It is not known, presently,
how to do this, but instead consider the following. Recall from Equation (7.28) that
the assumption is made that if
$\tilde{G} \approx G$   (7.64)
which is equivalent to
$\tilde{g} \approx g$   (7.65)
then
$\tilde{\mu} \approx \mu$ and $\tilde{B} \approx B$   (7.66)
So, define another metric
$d_{\mathrm{map}}(g, \tilde{g})$   (7.67)
that quantifies the similarity between the embedded and synthetic mappings and
propose that minimising this will minimise both (7.62) and (7.63). Now, it can be seen
from Equation (7.60) that $\varepsilon(1)$ is an estimate of $d_{\mathrm{map}}$ as it is an average over I trials of
the difference between
$u_{i,m+1} = g(z_{i,0}) = g\big((u_{i,m}, u_{i,m-1}, \ldots, u_{i,1})^T\big)$   (7.68)
and
$v_{i,1} = \tilde{g}(z_{i,0}) = \tilde{g}\big((u_{i,m}, u_{i,m-1}, \ldots, u_{i,1})^T\big)$   (7.69)
That is, it is the difference between the value of the embedded system function g, at
some point given by the trial vector, and the value of the synthetic approximation, $\tilde{g}$,
at that same point. So, instead of directly calculating the closeness of the embedded
and synthetic attractors/measures to determine the accuracy of the model, it is
proposed that the prediction error for one time step ahead gives a measure of the
expected closeness.
4) Time series similarity. Also recall from Section 7.3 that if the synthetic
attractor/measure is close to the embedded one, then the synthetic time series should
be statistically similar to the original, i.e. it is expected that
$P_V \approx P_U$   (7.70)
Since these m-dimensional jpdfs are equivalent to the m-dimensional measures,
the same problem exists of not having a means to practically compare them. Instead,
however, it is possible to easily estimate the one-dimensional pdf by considering that,
because a natural measure is presumed, the processes are stationary and ergodic, and
so an estimate of the amplitude distributions of the processes is an estimate of the
pdfs. An estimate of the amplitude distribution is found by calculating a histogram of
the relative frequencies that the sample amplitudes take. That is, divide the amplitude
range of the time series into a number of bins and calculate the number of times the
sample amplitude falls within each bin divided by the total number of samples used.
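The amplitude-histogram estimate just described can be sketched as follows (an illustrative helper of my own, not the thesis's implementation):

```python
import numpy as np

def amplitude_distribution(x, n_bins=100):
    """Estimate the one-dimensional pdf of a time series as a histogram
    of relative frequencies over equal-width amplitude bins."""
    counts, edges = np.histogram(x, bins=n_bins)
    return counts / len(x), edges

rng = np.random.default_rng(4)
x = rng.standard_normal(10_000)
p, edges = amplitude_distribution(x)
print(len(p))   # 100 bins; relative frequencies sum to 1
```

Comparing the histograms of the original and synthetic series gives the partial statistical comparison used in Figure 7.7.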
5) Sound similarity. This necessarily requires a subjective comparison of both the
original and synthetic time series when converted into sound. As well as presenting a
report of my own opinion on the comparison, many of the sounds are included on the
accompanying cassette tape so that readers of this thesis may judge for themselves.
6) Model complexity. An assessment of the model complexity may be divided into
two aspects. Firstly, the computational complexity of the analysis and synthesis
processes that determine the time taken to generate the synthetic system and the time
taken to generate the synthetic time series. Secondly, the number of parameters
required to describe the synthetic system. This second aspect is of greater importance
to this work than the first since the emphasis is on determining whether or not simple
chaotic systems can represent complex sounds and not on the computational
efficiency of such techniques. The synthetic system is defined by the mapping $\tilde{g}$
which in turn is defined by the partition and the associated set of locally linear
functions. The number of parameters required to represent the partition is
approximately equal to the number of partition domains, Q. Each of the Q linear
maps, $l_i$, is defined by an m-dimensional vector $a_i$ and a scalar $b_i$. The total number of
parameters that define the mapping is therefore given by
$P(\tilde{g}) = Q + (m+1)Q = (m+2)Q$ parameters   (7.71)
In practice, for the particular implementation of the model used in the experiments,
the partition is defined by a search tree of approximately 2Q nodes, each of which
requires 7 bytes of storage. The mapping parameters each use 8 bytes. The system
complexity in bytes is therefore given by,
$B(\tilde{g}) = 14Q + 8(m+1)Q = (8m+22)Q$ bytes   (7.72)
These two measures quantify the complexity of the synthetic system seen from the
view of user manageability (number of parameters) and computer storage (number of
bytes).
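The two complexity measures can be evaluated directly; a minimal sketch (the function name is my own) reproducing Equations (7.71) and (7.72):

```python
def model_complexity(m, Q):
    """Parameter count (Eq. 7.71) and storage (Eq. 7.72) of the
    synthetic system mapping g~."""
    P = (m + 2) * Q          # Q partition parameters + Q*(m+1) map parameters
    B = (8 * m + 22) * Q     # 14Q bytes of tree nodes + 8(m+1)Q bytes of maps
    return P, B

print(model_complexity(7, 128))   # (1152, 9984)
```

The printed values match the worked example for the Lorenz model given at the end of the next section.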
The following experiments divide into two parts. Firstly, some tests are carried out
on the model with an artificially generated original time series. The intention is to
confirm that the model performs in accordance with the theory and to gain insight into
the relationship between the model parameters and its behaviour under known
conditions. Secondly, the model is used with a number of actual sounds with the aim
of achieving the best possible performance according to all, but mostly the fifth,
criteria.
7.7. Experiments with a Lorenz Time Series
In the following experiments, the time series used as input to the analysis has been
generated with a numerical simulation of Lorenz's chaotic system. The method of
simulation and the system parameter values are those given in [bidl92]. The Lorenz
system is a set of three differential equations:
, )
n n n
n n n n
n n
bz y x z
z x y Rx y
z y x
=
=
=


(7.73)
which are numerically integrated using
148
t z z z
t y y y
t x x x
n n
n n
n n

+ =
+ =
+ =
+
+
+
1
1
1
(7.74)
with parameter values
o c = = = = 10 0 28 0 2 67 0 01 . , . , . , . R b t
(7.75)
which put the system into a chaotic regime. The time series derives from observing
one of the three state variables, in this case x, after any transients have decayed. This
provides a time series from a known, stationary, noise-free, low-dimensional (d=3),
numerical chaotic system with which to test the sound model. The results presented on
the following three pages have been generated by varying each of the three analysis
parameters, N, m, and Q, in turn while keeping the other two fixed. Each set is
presented on a separate page showing a phase portrait derived from the original time
series and a number of portraits of the synthetic time series. All phase portraits are
derived from 10,000 samples of their respective time series, where possible, and have
a delay value of 10 samples. Also shown are graphs of the prediction error for one
time step ahead against the variable parameter.
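The Lorenz time-series generation described above can be sketched as follows. This is an illustrative Euler integration with the stated parameter values; the initial state and transient length are my own choices, since the exact settings of [bidl92] are not reproduced here.

```python
import numpy as np

def lorenz_series(n, sigma=10.0, R=28.0, b=2.67, dt=0.01, skip=1000):
    """Euler integration of the Lorenz equations (7.73)-(7.75),
    observing the x variable after discarding `skip` transient steps."""
    x, y, z = 1.0, 1.0, 1.0              # arbitrary initial state
    out = np.empty(n)
    for i in range(skip + n):
        dx = sigma * (y - x)             # Eq. (7.73)
        dy = R * x - y - x * z
        dz = x * y - b * z
        x = x + dx * dt                  # Euler step, Eq. (7.74)
        y = y + dy * dt
        z = z + dz * dt
        if i >= skip:
            out[i - skip] = x
    return out

u = lorenz_series(10_000)
print(u.shape)   # (10000,)
```

The resulting series switches irregularly between the two lobes of the attractor, as the x variable of the chaotic Lorenz system should.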
[Figure panels: (a) Original; (b) Prediction error $\varepsilon(1)$ against m; (c) m=2; (d) m=3; (e) m=4; (f) m=6; (g) m=7; (h) m=10.]
Figure 7.3 Lorenz input, N=10,000, Q=256 and a variety of embedding dimensions,
m.
[Figure panels: (a) Original; (b) Prediction error $\varepsilon(1)$ against Q; (c) Q=16; (d) Q=32; (e) Q=64; (f) Q=128; (g) Q=256; (h) Q=512.]
Figure 7.4 Lorenz input, N=10,000, m=7, and a variety of number of domains, Q.
[Figure panels: (a) Original; (b) Prediction error $\varepsilon(1)$ against N; (c) N=750; (d) N=1,000; (e) N=5,000; (f) N=10,000; (g) N=50,000; (h) N=75,000.]
Figure 7.5 Lorenz input, Q=64, m=7 and a variety of original time series lengths, N.
The results presented in Figures 7.3 to 7.5 show three main features. Firstly, they
confirm that the proposed model in the form described is successful at modelling the
dynamics of a chaotic time series via the embedded attractor. Secondly, that the
performance of the model depends considerably on the values of the analysis
parameters, m, Q, and N. And thirdly, that the one-step prediction error, $\varepsilon(1)$, does
relate to the performance of the model.
By successful it is meant that the model is capable of creating a synthetic system
that generates a trajectory in state space lying on an attractor that is similar to the
original/embedded one. This can be seen by comparing the phase portrait derived
from the original time series with a number of those derived from the synthetic
system, for example 7.3(g) and 7.4(f), and seeing that they have the same overall
form, size and features. It is then expected that the similarity of the attractors
corresponds to statistically similar observed time series. Figure 7.6 shows portions of
the original time series and the synthetic one used to generate the phase portrait in
7.4(f). The synthetic system in this case was initialised with a vector derived from the
first seven (equal to the dimension of the embedding space) values of the original time
series. It can be seen that, as expected, the synthetic time series approximately
matches the original for the first few time steps, corresponding to accurate short-term
prediction, and then diverges rapidly, a consequence of sensitive dependence on initial
conditions. The overall form of the two waveforms, however, can be seen to be very
similar.
Figure 7.6 Time series plots from original Lorenz system (left) and the synthetic one
shown as phase portrait 7.4(f) (right).
To partially compare the statistics of the two time series, Figure 7.7 shows an
estimate of their pdfs in the form of histograms of their respective amplitude values.
These were calculated with time series of length 10,000 samples and 100 bins of equal
width. Again, it can be seen that the form of the graphs is close enough to conclude
that the model succeeds in generating a synthetic time series with
statistics similar to those of the original.
[Two histograms: amplitude a against P(a), with P(a) ranging from 0 to 0.06 over a from -0.3 to 0.3.]
Figure 7.7 Estimates of amplitude probability distributions for original, left, and
synthetic, right, time series shown in Figure 7.6.
It can be seen from the results that the similarity of the phase portraits depends
considerably on the parameters that define the analysis. In particular, Figure 7.3 shows
there to be a large increase in similarity from the ill-formed result when m=2 to when
m=4. As m is further increased, there is not such a dramatic change in the result. This
behaviour can be seen to be paralleled by the value of the one-step prediction
error $\varepsilon(1)$, where there is a large drop in its value between m=2 and m=4 and then little
change from m=6 onwards.
According to the theory presented earlier in this chapter, the embedding procedure
should only preserve the trajectories and attractor of the original system when the
embedding dimension m satisfies Equation (7.10),
$m \ge 2d + 1$   (7.76)
Consequently, it would be expected that there should be a significant change in
performance between the cases where m < 7 and m ≥ 7, since d=3 for the Lorenz
system. It is not known why, in this case, this does not occur and the model can be seen to
produce good results for m < 7. The value of $\varepsilon(1)$, however, is at a
minimum for m=7.
Figure 7.4 shows that there is also an increase in similarity of the attractors as the
number of domains of the partition, Q, increases and that there is a point, in this case
approximately Q=128, where further increase results in no appreciable improvement.
This relationship is again paralleled by the variation in (1). Note, however, that there
is not an exact relationship between the accuracy of the model and the value of (1) as
can be seen by comparing Figures 7.3(e) and 7.4(c) which have similar corresponding
prediction error values of ~2.5e-4 and yet differ in their similarity to the original
attractor.
The third set of results, shown in Figure 7.5, reveals another similar trend. This time,
the quality of the results improves with an increase in the length of the original time
series, N, and again this is strongly related to the value of $\varepsilon(1)$. The results shown in
(c) and (d) are examples of unstable synthesis. The number of iterations before the
output time series goes out of range are 182 and 2368 respectively. The phase portraits
show clearly the trajectory leaving the site of the intended attractor. Note how these
unstable systems correspond to prediction errors that are considerably higher than the
other stable ones. Finally, notice that the best performance for this set, shown in (h),
corresponds to the lowest prediction error.
To conclude on these results, it appears that there are definite relationships
between the performance of the model and the analysis parameters. These can be seen
by visual inspection of the phase portraits and by the value of the one-step prediction
error. The exact nature of these relationships, however, is not known. For the Lorenz
case it appears that performance can be maximised by maximising the length of the
original time series, N, and the number of partition domains Q while optimising the
embedding dimension according to the state space dimension of the original system
and Equation (7.10). The objective of the model is to maximise the performance so
that it is perceptually acceptable while minimising the complexity of the synthetic
system mapping. The above relationships, then, should be considered along with the
fact that the number of parameters required to specify the synthetic system mapping,
$P(\tilde{g})$, is proportional to both m and Q, but not N.
Finally, to be specific about the synthetic system complexity, the example used so
far as the successful synthetic system, whose phase portrait is shown in Figure 7.4(f),
is described by

N_P = (m + 2)Q = (7 + 2) × 128 = 1152 parameters (7.77)

and correspondingly

N_B = (8m + 22)Q = (8 × 7 + 22) × 128 = 9984 bytes (7.78)
7.8. Experiments with Sound Time Series
This section presents results generated with the sound model using a test set of six
different sounds as input. The six sounds divide into three pairs of similar type such
that one of the pair is a more complex example of that type than the other. The three
types are: air noises, gong sounds, and musical tones. This test set has been chosen
because the sounds are the product of what are, or are believed to be, nonlinear
dynamical systems. This choice has been made on the basis of the physical nature of
the system, the type of behaviour of the sound time series, and the knowledge of
subjects discussed in Chapter 4. To summarise: the air noises are products of turbulent
fluid systems, one of which is wind noise and known to have fractal properties; gongs
are nonlinear systems exhibiting properties associated with chaos whose sound
waveforms are irregular and complex; and the musical tones are examples of
nonlinear systems exhibiting limit cycle behaviour. Each sound will be discussed in
greater detail in the forthcoming subsections. Note also that all sounds have been
represented with 16-bit digital audio using a sampling rate of 48kHz.
As mentioned in the section on experimental technique, the intention of the work
described in this section is to ascertain whether the sound model works for real sound
time series having confirmed that it works for a synthetic chaotic signal as described
above. The aim is therefore to find as much useful evidence as possible that will allow
a conclusion to be drawn and to give insight into the nature of the model for possible
future work.
7.8.1. Air Noises
The two air noises are described as 'fan rumble' and 'wind noise'. The first of these
was created by fixing a microphone in a constant stream of air produced by a small
ventilation fan. This causes a low frequency, irregular rumble to be induced within the
head of the microphone which can be monitored through a microphone amplifier. The
microphone used was a Sony electret condenser type ECM-979. Varying the position
of the microphone transversely to the direction of the air stream varies the quality of
the sound produced. When the microphone is on the edge of the airstream, only a
quiet, high frequency hiss can be heard. At a certain point moving towards the centre
of the air stream, a louder, deep, irregular rumble also starts to appear. The severity
and volatility of this sound increases as the microphone is moved into the centre of the
air stream. The rumble is due to turbulence in the air stream as it passes through the
head of the microphone. Furthermore, it is believed that the point at which the rumble
occurs corresponds to weak turbulence and therefore low-order chaotic dynamics.
This idea is supported by the fact that several systems of fluid flow display low-order
chaotic dynamics during the onset of fully turbulent behaviour. That is, as some
controlling parameter is increased, for example the speed of fluid flow, the dynamics
of the system follow a bifurcating sequence which includes low-dimensional chaos
before becoming fully turbulent [crut86] and [goll75].
Several seconds of the fan rumble sound were sampled at a rate of 48kHz and low
pass filtered to remove the high frequency hiss as well as other extraneous noise
entering the microphone so as to leave just the low frequency rumble. The cut-off
frequency of this filter was approximately 3kHz. This has then been processed with
the sound model with a variety of analysis parameters. Table 7.1 shows a summary of
the resulting prediction errors and descriptions of the output time series for a selection
of analysis parameters.
experiment   length of   embedding   min. points   resulting    prediction   comments on synthetic
id           original    dimension   per domain    number of    error        time series
             series, N   m           M             domains, Q   ε(1)
----------------------------------------------------------------------------------------------------
rc159          5,000      5           10              258       0.0056       unstable after 194 iterations
rc158         25,000     10           20            1,020       0.0015       unstable after 7,457 iterations
rc157         12,000     20           40              255       0.0014       irregular modulation of
                                                                             sinusoidal oscillation
rc156         25,000     20           40              512       0.0013       slight irregular modulation of
                                                                             sinusoidal oscillation
rc127         50,000     20           40            1,021       0.0018       best result - similar to original
rc132        100,000     20           40            2,047       0.0013       as good as above, but no better
rc155         50,000      5           80              512       0.0012       transient leading to nearly
                                                                             periodic oscillation
rc129         50,000     10           80              512       0.0014       as above, but with very long
                                                                             transient
rc126         50,000     20           80              512       0.0012       similar, but lower-amplitude
                                                                             version of original
rc130         50,000     30           80              512       0.0013       as above
rc131         50,000     40           80              512       0.0014       as above
rc128         50,000     20          160              256       0.0011       as above
Table 7.1 Summary of results using fan rumble sound as input to the dynamic
model.
Time series plots and phase portraits for both the original and the best synthetic
time series result, rc127, are shown in Figure 7.8. The mapping complexity for the
synthetic system is, in this case,
N_P = (m + 2)Q = 22 × 1021 = 22,462 parameters (7.79)

which corresponds to

N_B = (8m + 22)Q = 182 × 1021 ≈ 186 kbytes (7.80)
Figure 7.8 Time series plots and phase portraits for: left, original fan rumble
sound and right, best synthetic output, rc127.
As can be seen, there is a strong similarity between the two sounds when viewed
in these domains. The similarity is not as strong as that between the original and
synthetic Lorenz time series, but does show that some of the dynamic characteristics
of the original are being captured. This result, however, reveals itself to be very good
when the two time series are compared as sounds. A 3 second length (~150,000
iterations) of the synthetic time series rc127 has been generated for this purpose. This
and the original sound can be heard as Sound examples 21 and 22. The synthetic
version captures many of the fundamental perceived qualities of the original, such that
the overall quality of the two is nearly indistinguishable. The difference that does exist is
that the original sounds slightly more 'boomy' than the synthetic version.
This is considered to be a very significant result for a number of reasons. Firstly, it
is a demonstration that the model can work for a real chaotic time series, and not only
for a synthetic one as in the previous section. Secondly, it shows that dynamic
modelling with a synthetic system whose strange attractor approximates that of the
embedded original can capture the perceived characteristics of a sound. I believe this
to be both an original idea and the first demonstration that it is possible.
Figure 7.9 shows some plots of the other results to show the behaviour resulting
with different analysis parameters. In general, it appears that these behaviours are
associated with state space trajectories that are 'stuck' in subsets of the embedded
original attractor, or at least unevenly distributed over it. For example, the result of
experiment rc126 shows, after a small transient, the trajectory settling to an attractor
that is similar to the inner part of the embedded original. The same applies to the
result of rc156, but the trajectory lies mostly on the outermost band of the original
embedded attractor. The result of rc157 shows an uneven distribution of the trajectory
where it stays at the edge and mid-part of the attractor much more than it does in the
middle. Note, however, the way in which this result captures the outermost parts of
the orbit, or spikes as they appear in the time plot, that exist in the original.
This tendency for the trajectory to be caught in regions of embedded state space
for longer periods than does the original, and the consequent unevenness of
distribution sets these results apart from that of rc127, the best result. These results,
however, also highlight what is different between rc127 and the original: a less
exaggerated version of the same thing. Inspection of the time series plots in Figure 7.8
show the same tendency for the output to change between different parts of the
attractor at a slower rate than the original, which is, by comparison, modulating with
greater volatility.
Note, however, that with these experiments, the one-step prediction error, ε(1),
does not have such a strong relationship to the performance as it does in the Lorenz
case. This can be seen in Table 7.1 where, although the error for experiment rc159 is
highest by a factor of 3 or 4 and is the most unstable, there is not much difference
between all the other errors. That is, there is no reliable pattern in the error values that
distinguishes between the occurrences of different types of behaviour.
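To make the quantity being compared concrete, a normalised one-step prediction error can be estimated as below. This is a hypothetical stand-in, a mean-square error normalised by the signal variance; the exact definition and normalisation of ε(1) are fixed earlier in the thesis and may differ:

```python
import numpy as np

def one_step_error(actual, predicted):
    """Mean-square one-step prediction error normalised by the signal
    variance - a hypothetical stand-in for epsilon(1), whose exact
    normalisation the thesis defines earlier."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return float(np.mean((actual - predicted) ** 2) / np.var(actual))

# A perfect predictor gives zero error; a slightly noisy one, a small value.
x = np.sin(np.linspace(0, 20 * np.pi, 2000))
print(one_step_error(x[1:], x[1:]))  # 0.0
noisy = x[1:] + 0.01 * np.random.default_rng(0).standard_normal(len(x) - 1)
print(one_step_error(x[1:], noisy))  # small, on the order of 2e-4
```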
(panels: rc126b, rc129b, rc156b, rc157b)
Figure 7.9 Time series plots and phase portraits for some more outputs from the
sound model using the fan rumble as input. Note that, for clarity, only about a third of
the length of output shown in the time series plots appears in the phase portraits.
The second air noise to be used is the sound of the wind which has already been
discussed in Chapter 4 and shown to have a 1/f power spectrum. This is used in
contrast to the fan rumble signal as it is an example of strong turbulence and therefore
expected not to be low-dimensional chaos, but an example of something else, possibly
high-dimensional chaos. For discussions on the possible relationships between
turbulence, high-dimensional chaos and 1/f noise see [casd92], [bak91] and [mann80].
A portion of the wind noise has been chosen from a ten second recording which is
judged to be as steady-state in form as possible. This limits the resulting portion to
about one second (48,000 samples) for processing by the model. Figure 7.10 shows
time series plots, phase portraits and power spectra for the wind noise and
for the best of the results found covering a similar range of parameters as in the
previous experiment. In this case the analysis parameters were N = 46,000, m = 20 and
Q = 1008, resulting in synthetic system complexities of N_P ≈ 22,000 parameters and
N_B ≈ 183 kbytes.
As can be seen, a strong similarity is again apparent between the original and
synthetic time series. In this case, the similarity is particularly strong between the
power spectra of the two - the synthetic version having the same 1/f structure. In the
audio domain, the result is slightly disappointing given these strong similarities.
Although the synthetic version possesses elements of the original, including the same
'roaring' quality, it is more of a 'flapping' sound. The two sounds can be heard as
Sound examples 23 and 24.
Figure 7.10 Time series plots (first fifth of top plot shown magnified as second
plot), power spectra and phase portraits for original wind noise, left, and synthetic
version, right.
7.8.2. Gong Sounds
Figures 7.11 and 7.12 show time series plots, phase portraits and amplitude
histograms for the best results found for two gong sound inputs. The first is of a gently
struck gong, and the second of a hard strike. The analysis parameters and resulting
model complexities are shown in Table 7.2. For both results it can be seen that the
trajectories in state space for the synthetic systems approximately match those of the
originals, but that the time series themselves appear different. In particular, the long
term structure, which for the gently struck gong is present as a strong fundamental
periodic component, is not preserved in the synthetic time series. This results in the
synthetic versions sounding quite unlike the originals. The original softly-struck gong
can be heard as Sound 25 and the synthetic version as Sound 26. The original and
synthetic versions of the hard-strike gong sound can be heard as Sounds 27 and 28
respectively. Despite this, it can be seen that the amplitude histograms of the synthetic
time series have the same overall form as those of the originals.
Note the difference in models used for these two examples. In the lightly-struck
case, a high model order has been used with a relatively low number of partition
domains. For the hard strike the opposite is the case: a very low-order model, with a
relatively high number of partition domains.
description       length of   embedding   number of   prediction   synthetic system
of sound          original    dimension   domains     error        complexities
                  series, N   m           Q           ε(1)         N_P and N_B
----------------------------------------------------------------------------------
lightly struck     20,000     30            241       0.0028       7,712 params
gong                                                               63 kbytes
hard-strike        10,000      3          1,024       0.0063       5,120 params
gong                                                               47 kbytes
Table 7.2 Summary of analysis parameters for best results using gong sounds.
Figure 7.11 Time series plots, phase portraits and amplitude histograms for
original, left, and synthetic, right, lightly-struck gong sound. Both amplitude
histograms were computed with 10,000 samples and 100 bins.
Figure 7.12 Time series plots, phase portraits and amplitude histograms for
original, left, and synthetic, right, hard-strike gong sound. Both amplitude histograms
were computed with 10,000 samples and 100 bins.
7.8.3. Musical Tones
The two musical tones chosen for analysis were recorded from a tuba and a
saxophone. Both extracts of the tones were devised to be as constant as possible,
avoiding any transient qualities or time-varying effects such as vibrato. These are
examples of nonlinear systems providing a periodic excitation to a resonator and
therefore are expected to exhibit limit cycles in their state spaces. The difference
between the two is that the tuba tone is a much purer tone than that of the saxophone,
having a less complex spectral structure.
The original tuba waveform is very nearly a regular sinusoid. The spectrum of this
signal, however, shows the presence of a number of harmonics and the
phase portrait reveals the slight irregularity in amplitude causing a thickening of the
closed-loop limit cycle. These can be seen in Figure 7.13. Also shown is the best
synthetic version found, which is very close to the original. This similarity is also
preserved for the perceived sound which can be heard as Sound 29. The original can
be heard as Sound 30. As well as the result for the tuba sound being an accurate one, it
was also found to be possible with relatively simple models. In this case, only a four-
dimensional embedding and 32 partition domains were used. This results in synthetic
system complexity that is lower by several orders of magnitude compared to many of
the other sounds used. Full details of the analysis parameters for this and the next
experiment are shown in Table 7.3.
description       length of   embedding   number of   prediction   synthetic system
of sound          original    dimension   domains     error        complexities
                  series, N   m           Q           ε(1)         N_P and N_B
----------------------------------------------------------------------------------
tuba                8,000      4             32       0.00032      192 params
                                                                   1,728 bytes
saxophone          60,000     20          1,024       0.0047       22.5 kparams
                                                                   186 kbytes
Table 7.3 Analysis details for the musical tones.
Figure 7.14 shows the time series plots and phase portraits for a saxophone tone
and the best synthetic version using the dynamic model. These can be heard as Sounds
31 and 32 respectively. Note the difference in complexity between this and the tuba
tone which can be seen in both the time series plots and phase portraits. The topology
of the trajectory in state space is the same as that for the tuba: a thickened closed loop.
In this case, however, the loop is tangled in state space due to the presence of strong
harmonics. The synthetic version is not as close to the original as it is for the tuba
tone. A particular fault, which also occurs with the gently struck gong, but not for the
tuba, is that the synthetic time series loses the essential periodic structure of the
original. This can be understood by considering the state space trajectories. For the
saxophone, parts of the trajectory pass close to one another and are therefore likely to
be confused by the analysis when the state space is divided into partition domains.
Consequently, synthetic trajectories cross over from one part of the attractor to another
due to inaccurate predictions. As a result, the trajectory stays close to the original
attractor, but does not cover parts of the loop in the same sequence as the original.
Figure 7.13 Time series plots, power spectra and phase portraits for original, left
and synthetic, right, tuba tones.
Figure 7.14 Time series and phase portraits for original, left, and synthetic, right,
saxophone tones.
7.9. Conclusions
In this chapter, an original nonlinear dynamical analysis/synthesis model has been
proposed, implemented and tested with a number of sound time series. The overall
conclusion is that the results are good enough to confirm the feasibility of this
approach and to warrant further investigation. The main results are demonstrations:
1) of the ability of the model to recreate the Lorenz attractor to a high degree of
accuracy with a synthetic system;
2) that the modelling of an attractor can be sufficient to preserve the perceived
characteristics of sound, especially the irregular fan rumble sound and the regular tuba
tone;
3) that other sounds can be partially modelled such that the spectrum or amplitude
pdf is preserved although the perceived sound is not so well preserved;
4) that the one-step prediction error relates to the similarity of the synthetic and
original attractors for the Lorenz case;
5) that the relative one-step prediction errors relate well to the relative
performance with different time series (see below).
The performance of the model has been evaluated by both qualitative and
quantitative means. The former includes subjective comparisons of the perceived
sounds as well as visual inspection of the time series plots, phase portraits, power
spectra and amplitude histograms. From these it can be concluded that the model is
preserving characteristics of the original time series. Each sound has been analysed
using a similar range of analysis parameters with the aim of maximising the
performance of the model using the qualitative criteria. There is some variety to the
degree of success depending on the source of the time series. This relative, qualitative
assessment is paralleled quantitatively by the values of the one-step prediction errors.
This can be seen by inspection of Figure 7.15 which orders the best prediction errors
obtained for each sound as: Lorenz, tuba tone, fan rumble sound and the worst as
hard-strike gong. This ranking agrees with what was found qualitatively.
It was found that the analysis parameters have a strong relationship to the
performance of the model. In general it was found that increasing the order of the
model, m, the number of partition domains, Q, and the length of the original time
series, N, improves performance, but that there is a limit to the best performance
depending on the time series used. This limit is also paralleled by a floor on the value
of the prediction errors which tend to be approximately the same for the set of best
results for any one time series.
Figure 7.15 Relative one-step prediction errors, ε(1), for the best results found for
each of the time series: Lorenz 0.00018, tuba 0.00032, fan rumble 0.0018, gong
(gentle) 0.0028, wind 0.0029, saxophone 0.0047 and gong (hard) 0.0063.
There are a number of possible reasons for the limits on performance of this
particular model. The following are four main suggestions. Firstly, that the time series
used is not produced by a low-order chaotic system, or one that has a low-order
attractor in its state space. Since no attempt has been made to diagnose low-order
chaos in the time series before modelling, it is not possible to confirm this. Instead,
the performance of the model itself may be viewed as diagnostic evidence for chaos.
Secondly, that the excerpts of time series used do not correspond to the steady-
state behaviour of a system that has settled, after transients, to an attractor. This is
most likely for the gong sounds which are intrinsically transient. Their long term
behaviour is actually a rest state corresponding to a fixed point attractor. The
transients are, however, very long, taking up to a minute to die away. For this reason
the excerpts were assumed to be pseudo-stationary over the portion used for the
analysis. This might not, however, be satisfactory for the model, which assumes an
ergodic, stationary input. A consequence of this problem is that there is also a conflict
between the effects of the original time series length parameter, N. One effect of
increasing N is that it provides the analysis with more data which generally improves
the performance of the model, as it did for the Lorenz input - see Figure 7.5. Against
this, the pseudo-stationarity of the input is more likely to be violated as its length is
increased.
An alternative might be to drive the gong with a regular, steady state excitation,
such as a motorised periodic beater, configured so that the sound produced is constant
in quality. A similar problem may occur for the musical tones, especially that of the
saxophone, where the sound, although contrived to be constant, suffers from the
irregularities of human performance. Again, the performer could be replaced with a
mechanical air source, but it is also felt that it might be an advantage to model these
irregularities as they are, as they contribute to the natural quality of the sounds. For the
tuba tone, these irregularities have been preserved to an extent. This can be seen from
the preservation of the thickened form of the state space attractor seen in the phase
portraits - see Figure 7.13.
Thirdly, although a variety of analysis parameters have been used with each time
series, the form of the state space partition is fixed for the model. The means of
partitioning the embedded vectors was based mainly on the computational
convenience of implementation. Recall that it divides the embedded vectors so that
each domain contains approximately an equal number of them.
therefore has some relationship to the form of the attractor, but it is not designed to
optimise the performance of the model.
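An equal-occupancy partition of the kind described can be sketched as a recursive median split over the embedded vectors. This is only a sketch: the thesis's actual tree layout and splitting rule are not reproduced here, and the splitting criterion (widest coordinate) is an assumption:

```python
import numpy as np

def partition(vectors, min_points):
    """Recursively split the embedded vectors at the median of their
    widest coordinate until each domain holds fewer than 2*min_points
    vectors, so every domain ends up with roughly equal occupancy.
    A sketch; the thesis's implementation details may differ."""
    vectors = np.asarray(vectors)
    if len(vectors) < 2 * min_points:
        return [vectors]                      # leaf: one partition domain
    spread = vectors.max(axis=0) - vectors.min(axis=0)
    axis = int(np.argmax(spread))             # split along the widest axis
    order = np.argsort(vectors[:, axis])
    half = len(vectors) // 2
    left, right = vectors[order[:half]], vectors[order[half:]]
    return partition(left, min_points) + partition(right, min_points)

rng = np.random.default_rng(1)
domains = partition(rng.standard_normal((1000, 3)), min_points=60)
print(len(domains), min(len(d) for d in domains))  # 16 domains, each >= 60 points
```

Because the splits follow the data rather than a fixed grid, the resulting partition tracks the shape of the attractor, as noted above, without being optimised for prediction accuracy.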
Finally, a possibility already mentioned and related to the state space partition is
that of trajectories being confused during the analysis. For example, neighbouring
trajectories on different parts of the attractor may be included in the same domain
during the analysis and consequently averaged out by the map fitting procedure.
Consequently, during synthesis, the mapping in that domain will not represent either
of the original neighbouring trajectories, but will be an amalgam of the two and
therefore unlike the original.
Another important point to reconsider is: what is actually expected of the model's
performance? In practice, the synthetic versions of the fan rumble and tuba sounds
preserve both the form of the attractor and the perceived sound. For the gently struck
gong sound, however, although the form of the attractor was approximated, the
perceived sound was not. For both the tuba and gong, it is long term correlations of
the time series which are important to the nature of the sound since they have strong
periodic elements.
The initial mathematical analysis of the model, however, concluded with the result
that only the mth-order jpdf of the original time series should be preserved given that
the time series is stationary due to the presence of an ergodic attractor - see Section
7.2. In other words, it can only be expected that the short-term structure of the time
series will be preserved. This is illustrated by the result obtained using the softly
struck gong sound and shown in Figure 7.11. It can be seen that the forms of the
original and synthetic attractor are similar, indicating that the synthetic system
preserves the set of possible states of the original; that the amplitude histograms are
similar, indicating that the probability distributions of one component of these states
over the attractor are similar (this distribution may be thought of as the view of a slice
through the centre of the attractor); but that the synthetic time series lacks the same
regularity present in the original. Detailed examination of the synthetic time series
reveals that it is in fact made up of shorter sequences from the original strung together
without the same long term order as the original. The regularity of the original
corresponds to a long term correlation that the model cannot be expected to preserve if
it only preserves its m=30th order jpdf. Correlations between samples up to 30 time
steps apart can be expected to be preserved, but not correlations over 30 steps. This is
because the autocorrelation function of a stationary stochastic process relates to its
jpdf. The autocorrelation is the expected value of the product of the process with a
delayed version of itself as a function of the delay. Let

{X_t, t ∈ Z} (7.81)

be a stochastic process; then the autocorrelation function is defined as

a(τ) = E[X_t X_{t+τ}] (7.82)

the expected product of the process with a delayed version of itself. The expected
value can be calculated from the jpdf of the process,

a(τ) = ∫∫ x_t x_{t+τ} P(x_t, x_{t+τ}) dx_t dx_{t+τ} (7.83)
and so the autocorrelation function is directly related to the jpdf. Thus if the model
produces a synthetic time series which preserves the mth-order jpdf of the original, it
will also preserve the autocorrelation function up to a delay of τ = m. Figure 7.16 shows
the autocorrelation functions of both the original and synthetic gong time series
computed by convolving the time series with themselves over a range of delay times.
As can be seen, the short-term correlation is indeed preserved, whereas in the long
term it is not. The details show the functions having strong similarities up to τ = 30, as
expected, and somewhat beyond, but not over the long term, τ > 200.
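The estimate used for such a comparison amounts to the following computation; a sketch assuming the biased 1/N sample estimate of a(τ) = E[X_t X_{t+τ}]:

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Sample estimate of a(tau) = E[x_t * x_{t+tau}] for tau = 0..max_lag,
    using the biased 1/N normalisation. This mirrors convolving the series
    with a delayed copy of itself over a range of delays."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.array([np.dot(x[:n - tau], x[tau:]) / n
                     for tau in range(max_lag + 1)])

# A periodic signal keeps its correlation over whole periods.
t = np.arange(4000)
periodic = np.sin(2 * np.pi * t / 50)
a = autocorrelation(periodic, 200)
print(round(a[0], 3), round(a[50], 3))  # 0.5 0.494 - one full period apart
```

For a chaotic series the same estimate decays quickly with τ, which is why preserving only the short-term structure suffices for the fan rumble but not for the gong.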
So, it can only be expected that the short term structure of the time series can be
preserved by the model. In the case of the fan rumble, this is sufficient to preserve the
perceived sound because of its inherent irregular nature and lack of long term
correlation, a consequence of it being chaotic. For the gently struck gong, however, it
is not sufficient as the sound contains the long term structure of a periodic component.
But, for the tuba, the model does preserve the long term structure and is therefore
achieving more than is expected. This is probably due to the simplicity of the tuba
sound since when a more complex periodic time series is used (the saxophone) again
only the short term structure is preserved.
Figure 7.16 Autocorrelation functions for original, left, and synthetic, right, gently
struck gong sound. The upper plot shows the function up to 8,000 delays, and the
lower up to 100 delays. Both were calculated by convolving 10,000 samples of the
time series with itself for different delays.
One of the main aims of this work was to ascertain whether or not simple chaotic
systems could be used to represent complex and irregular sounds. The experiments in
this chapter have contributed positive evidence to show that chaotic systems can
indeed model sound, but no real attempt has been made to find simple chaotic
models. The work has been oriented towards preserving characteristics of the original
sounds by exploring the full ranges of analysis parameters available. The emphasis has
been on finding the best possible performance of the model and consequently the
resulting synthetic models are often far from being simple. For example, the model of
the fan rumble that produces perceptually similar results to the original (rc127)
requires over 22,000 parameters to define it. In data terms this is equivalent, for the
particular implementation used, to 186 kbytes. Relative to digital audio this is
equivalent to about 2 seconds of the original sound (16 bit, 48kHz sample rate). Since
the model is capable of producing unlimited quantities of the sound, it is an
efficient representation of the sound if, say, several minutes of the sound are required.
It is not currently known, however, how many of the model parameters are
relevant to the quality of the sound, or whether, and by how much, they can be quantised to
reduce the storage of the model. For example, the mapping parameters are stored to
double floating-point precision, which may be excessively accurate. The model has
been implemented with more regard to experimental convenience than to simplicity of
the resulting model. This is the case with, for example, the implementation of the state
space partition and the related search tree.
In one case, that of the tuba tone, since the model was able to preserve the tone
very well, the analysis parameters were varied with the specific intent of finding the
simplest model for which the quality of output could be maintained. Hence, the
relative simplicity of the resulting model which requires only 192 parameters.
Despite the large number of parameters associated with most of the synthetic
systems, their computational complexity is relatively low. Each iteration of the
synthetic system, and hence each output sample, requires looking up a search tree
followed by the computation of a linear mapping of the state vector. Running on a Sun
IPX Sparcstation the synthetic system generates the time series at a rate of
approximately 3 seconds per 1 second of digital audio. It is therefore easily within the
range of real-time generation, for example using a DSP. The analysis times, however,
range from about 30 seconds to several minutes.
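The synthesis loop just described, a domain lookup followed by an affine map of the state vector, can be sketched as below. For brevity the search tree is replaced here by a hypothetical nearest-domain-centre lookup, and all numerical values are invented for illustration:

```python
import numpy as np

def synthesise(x0, centres, A, b, n_samples):
    """Iterate the synthetic system: find the domain containing the
    current state (here, nearest domain centre instead of the thesis's
    search tree), apply that domain's affine map to predict the next
    sample, then shift it into the state vector."""
    state = np.array(x0, dtype=float)
    out = []
    for _ in range(n_samples):
        j = int(np.argmin(np.sum((centres - state) ** 2, axis=1)))
        nxt = A[j] @ state + b[j]        # affine one-step prediction
        out.append(nxt)
        state = np.roll(state, 1)
        state[0] = nxt                   # newest sample enters the state
    return np.array(out)

# Toy two-domain system on a 2-dimensional embedding (hypothetical values).
centres = np.array([[1.0, 1.0], [-1.0, -1.0]])
A = np.array([[0.5, 0.2], [0.5, 0.2]])   # one row of map coefficients per domain
b = np.array([0.1, -0.1])
print(synthesise([1.0, 0.5], centres, A, b, 3))  # approx. [0.7, 0.65, 0.565]
```

The per-sample cost is one lookup plus one m-term dot product, which is why real-time generation is plausible despite the large parameter counts.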
7.10. Further Work
A number of areas for further work have been identified and are presented in the
following sub-sections.
7.10.1. Using the Same Model with More Sounds
An area of immediate further work would be to use the model as it is and
experiment with more sound time series. There are three possibilities: to use new
versions of the sounds already used; to use other naturally occurring sounds; or to use
more sounds produced from experiments contrived to produce chaotic behaviour. As mentioned
earlier, using gong sounds generated with a regular, possibly mechanical, excitation to
reduce transient qualities may improve the performance of the model. Some
experiments, not reported here, were conducted with a possibly chaotic, overblown
saxophone sound. This sound, however, was found to be very unstable and difficult to
maintain for long enough periods. Some work concentrating on mechanical blowing
techniques might allow better time series such as these to be generated for analysis.
To try the model with other naturally occurring sounds, more recordings need to be
made that, with judgement, may be suited to the modelling technique - e.g. steady
state, irregular sounds. This would involve, for example, more 'field work' with a
portable tape recorder or time spent examining sound libraries. The third idea is to set
up more experiments with the intention of producing chaotic sounds, as was the case
with the fan rumble sound. This could involve constructing physical systems with
known chaotic behaviour, for example more turbulent fluid systems, or forced
oscillations of nonlinear systems. A number of such systems are described in
[moon87].
7.10.2. Optimising the Synthetic Mapping
The next suggested area of further work concerns modifying the form of the model
with both the aims of improving performance, in terms of preservation of properties
from original to synthetic time series, and of reducing the complexity of the synthetic
system. The most immediate option is to experiment with other forms of the
embedded state space partition.
Recall that the partition is generated with respect to the embedded vectors such
that each partition domain contains at least M of them. Within each domain, a
linear function is then fitted to the data pairs associated with the vectors using a least
squares algorithm. This minimises
e_j = \sum_{i : y_i \in D_j} \bigl( l_j(y_i) - u_{i+1} \bigr)^2    (7.84)
for each domain (see Equation (7.52)). Minimising all of these effectively minimises
the difference between the embedded system function and the synthetic system
function,
d_{map}( g, \tilde{g} )    (7.85)
for the particular partition used (see equation 7.67). Different forms of partition may
allow for (7.85) to be reduced below the limit imposed by the recursive partition used
in the model. As mentioned in the previous section, this particular partition was
chosen for computational ease, and not with the aim of minimising (7.85). A better
approach would be to achieve an overall minimisation of (7.85) with respect to both
the partition and the linear mappings simultaneously. This may possibly be achieved
with some nonlinear global optimisation routine, such as simulated annealing or a
genetic algorithm - see [gold88] and [laar87]. Also useful in this context is the one-
step prediction error which gives an estimate of (7.85). The results presented in this
chapter have shown that the value of prediction error is related to the performance of
the algorithm - the lower the error, the greater the similarity between original and
synthetic time series. It is therefore expected that any scheme that can reduce the
prediction error would improve the quality of the model's results. Hence it is
suggested that a good approach would be a global optimisation of the one-step
prediction error with respect to both the mappings and the partition parameters.
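As a sketch of how such a global optimisation might be organised, the following generic simulated-annealing loop is illustrative; here `cost()` stands in for the one-step prediction error evaluated as a function of the partition and mapping parameters, and all names are my own rather than those of the implementation described in this chapter.

```python
import math
import random

def anneal(cost, x0, perturb, n_steps=5000, t0=1.0, cooling=0.999, rng=None):
    """Generic simulated annealing: minimise cost(x) over candidate
    parameter vectors x, accepting uphill moves with Boltzmann probability."""
    rng = rng or random.Random(0)
    x, c = x0, cost(x0)
    best_x, best_c = x, c
    t = t0
    for _ in range(n_steps):
        x_new = perturb(x, rng)            # propose a nearby candidate
        c_new = cost(x_new)
        # always accept improvements; sometimes accept worse candidates
        if c_new <= c or rng.random() < math.exp((c - c_new) / t):
            x, c = x_new, c_new
            if c < best_c:
                best_x, best_c = x, c
        t *= cooling                       # cool the temperature schedule
    return best_x, best_c
```

In the context of this chapter, x would collect both the partition parameters and the local linear map coefficients, and cost(x) would evaluate the one-step prediction error of the resulting synthetic system.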
7.10.3. Stability Analysis
Another area of concern connected with the model's performance is that of
stability. Typically, when the analysis parameters are chosen to give very simple
synthetic systems, they tend to be unstable, the state space trajectories not being
attracted to a set, but tending to infinity. An understanding of this instability might
allow the simpler systems to be corrected to make them stable. Their attractors may
then be compared to the original/embedded ones. The main tool for examining
stability and determining the conditions for chaotic behaviour is the set of Lyapunov
exponents. Recall that these measure the rate of separation of neighbouring initial
conditions in state space. The polarity of the Lyapunov exponents relates to the type of
attractor that the system possesses. Ideally, then, an expression relating the synthetic
system mapping parameters to the Lyapunov exponents would be the most useful.
This would then allow, for example, the synthetic system mappings to be restricted
during analysis so as to maintain stable chaotic behaviour of the resulting system. It is
not known how to form this expression since the synthetic system mapping is
relatively complex. For example, a derivation exists for the chaotic Baker map, but
this consists of only two locally linear maps and has a symmetrical partition
[moon87]. Alternatively, the Lyapunov exponents may be found numerically by
iterating the synthetic system and measuring the separation of a set of initial
conditions.
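The numerical approach can be sketched as follows for a one-dimensional map. This is a minimal illustration: the fully chaotic logistic map stands in for the synthetic system (whose mapping would be substituted for f), and the exponent is estimated by repeatedly renormalising the separation of two nearby trajectories.

```python
import math

def largest_lyapunov(f, x0, n_iter=5000, n_transient=500, d0=1e-9):
    """Estimate the largest Lyapunov exponent of a 1-D map f by following
    two nearby trajectories and renormalising their separation each step."""
    x = x0
    for _ in range(n_transient):           # discard the initial transient
        x = f(x)
    total = 0.0
    for _ in range(n_iter):
        x1 = f(x)
        x2 = f(x + d0)                     # perturbed neighbour
        total += math.log(abs(x2 - x1) / d0)
        x = x1                             # renormalise: restart the neighbour
    return total / n_iter

# For the fully chaotic logistic map x -> 4x(1-x) the exponent is ln 2.
lam = largest_lyapunov(lambda x: 4.0 * x * (1.0 - x), 0.2)
```

A positive result, as here, indicates sensitive dependence on initial conditions; for a stable attractor the estimate would be negative.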
7.10.4. Connections with IFS
A theoretical concern is the connection between the synthetic system, O, and
Iterated Function Systems (IFS). As was mentioned in the Inverse Problem section,
7.5, the locally linear form of the synthetic system mapping was chosen with the form
of IFS in mind. IFS have been referred to throughout this thesis as they are a well
understood and manageable framework for manipulating chaotic systems. If the
synthetic system could be made to be of the same form as an IFS then it would allow
full knowledge of the conditions under which it is stable and chaotic. It would also
allow the use of several other theorems relating to IFS such as the collage theorem.
What is currently known is that there are strong similarities between the synthetic
system and the Shift Dynamical System (SDS) form of an affine IFS. Both are locally
linear systems where the linear function is itself a function of the state of the system.
An SDS is the system
x_{n+1} = S( x_n )    (7.86)
where
S(x) = w_j^{-1}(x)  \quad \text{when} \quad  x \in w_j(A)    (7.87)
for a set of non-overlapping and contractive maps, {w}. A is the attractor of the
system. In the affine case the maps are of the form
w(x) = M x + N    (7.88)
where M is a square matrix and N a vector.
The synthetic system, O, is of the form,
z_{n+1} = \tilde{G}( z_n )    (7.89)
where
\tilde{G}(z) = \tilde{G}( z_1, z_2, \ldots, z_m ) = \bigl( \tilde{g}(z), z_1, \ldots, z_{m-1} \bigr)^T    (7.90)
and
\tilde{g}(z) = l_j(z)  \quad \text{when} \quad  z \in D_j    (7.91)
describes the choice of map according to the partition and
l(z) = a \cdot z + b    (7.92)
describes the linear map.
The system mapping may therefore be rewritten as
\tilde{G}(z) =
\begin{pmatrix}
  a_1 & a_2 & \cdots & a_{m-1} & a_m \\
  1 & 0 & \cdots & 0 & 0 \\
  0 & 1 & \cdots & 0 & 0 \\
  \vdots & & \ddots & & \vdots \\
  0 & 0 & \cdots & 1 & 0
\end{pmatrix} z +
\begin{pmatrix} b \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}    (7.93)
which is a special case of (7.88) or its inverse which features in SDS. The difference
between the two systems, then, is the criterion with which the linear maps are chosen,
in other words, the form of the partition. In particular, the difference is in the
relationship between the linear mappings and the partition domains. For the SDS they
are directly related, but for the synthetic system they are independent, although in
practice there will be some relationship since both the domains and the linear maps
derive from the same set of embedded vectors.
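The equivalence between the delay-shift form of the synthetic mapping and the affine, companion-matrix form of (7.93) is easy to verify numerically. The following sketch (with an arbitrary three-dimensional example; names and values are illustrative) applies both forms and checks they agree:

```python
def synthetic_step(z, a, b):
    """One step of the synthetic mapping: the new first coordinate is the
    local linear map l(z) = a.z + b; the rest is a shifted delay line."""
    return [sum(ai * zi for ai, zi in zip(a, z)) + b] + z[:-1]

def affine_step(z, a, b):
    """The same step written explicitly as G(z) = M z + N, with a as the
    first row of M and a shifted identity below it, as in (7.93)."""
    m = len(z)
    M = [list(a)] + [[1.0 if j == i else 0.0 for j in range(m)]
                     for i in range(m - 1)]
    N = [b] + [0.0] * (m - 1)
    return [sum(M[r][c] * z[c] for c in range(m)) + N[r] for r in range(m)]
```

Both functions produce identical results for any z, a and b, confirming that the synthetic mapping is affine within each partition domain.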
These connections could be used to modify the form of the synthetic system to
make it like an IFS. It would also be worth exploring the connection with Recurrent
IFS, a more complex version of IFS, in which the partition and the linear maps are
related in different ways. If the synthetic system is made to be like some form of an
IFS, then the collage theorem may be employed as an error criterion for the inverse
problem. In fact something very similar is already being used for fitting the linear
function within each domain. Recall that the collage theorem relates to finding an IFS
whose attractor, A, best matches a given set, L. The aim is therefore to minimise
d_{HA}( L, A )    (7.94)
According to the collage theorem this can be done by minimising
d_{HA}\Bigl( L, \bigcup_j w_j(L) \Bigr)    (7.95)
which describes the closeness to L of a collage made of mapped versions of L, since
d_{HA}( L, A ) \le \frac{1}{1 - s}\, d_{HA}\Bigl( L, \bigcup_j w_j(L) \Bigr)    (7.96)
where s is the contractivity of the mappings. So, the better the collage, the closer the
IFS attractor, A, to the desired set L.
Now, the inverse problem for the sound model can be seen to be similar. Given an
embedded attractor, the set B, the goal is to find a system whose attractor matches it.
In other words, to minimise
d_{HA}( B, \tilde{B} )    (7.97)
After the partition has been computed, the linear maps in each domain are fitted so
as to minimise each of the errors
e_j = \sum_{i : y_i \in D_j} \bigl( l_j(y_i) - u_{i+1} \bigr)^2    (7.98)
which is equivalent to the squared difference between a point in B and a mapped point
in B. The sum of all errors
e_{tot} = \sum_j e_j    (7.99)
is therefore equivalent to the difference between B and \tilde{G}(B), which is a measure of
the closeness to B of a collage made of mapped versions of parts of B. So, the analysis
scheme used for the model is minimising some kind of collage error to give the
synthetic system mapping. Again, therefore, a strong similarity exists between the
synthetic system used for the sound model and an IFS which, if explored, it is
believed will provide greater theoretical understanding and practical help in
improving the model.
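To make the correspondence concrete: if the synthetic mapping were contractive with contractivity s < 1 (an assumption, not something established for the model in this chapter), the collage bound of (7.96) applied to the embedded attractor would suggest

```latex
d_{HA}\bigl( B, \tilde{B} \bigr) \;\le\; \frac{1}{1 - s}\, d_{HA}\bigl( B, \tilde{G}(B) \bigr)
```

so that driving down the collage-style error between B and \tilde{G}(B), which the total fitting error e_{tot} of (7.99) approximates, would directly bound the mismatch between the embedded and synthetic attractors.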
7.10.5. Time Varying Sounds
The final suggested area of further work concerns modelling time-varying sounds.
One of the main general conclusions drawn throughout this work is that the inherent
nature of naturally occurring sound and of interesting synthetic sound is not steady
state, but time-varying and transient. The model presented in this chapter works on the
assumption that the original sound comes from a system that has settled to ergodic,
stationary chaotic behaviour. A further goal, then, would be to develop a new model
that is itself inherently time-varying and therefore better suited to modelling natural
sounds as they are. Since the model as it stands has been shown to be successful at
modelling irregular dynamics, it could therefore form the basis of a time-varying model. The
idea is then to model not only the dynamics at the level of the individual values of a
time series, but on a range of scales in a hierarchical, fractal manner. For example, one
system could be used to model the dynamics of short sequences of the time series and
then another used to model the dynamics of the systems themselves - i.e. the dynamics
of the parameters of the systems. Such a 'meta-dynamical' system could then
incorporate the techniques found to work so far while suiting time-varying sound. A
good example of such time varying sound and one worth considering for other
potential applications of the model is speech.
Chapter 8
The Poetry Generation Algorithm
This chapter is a sequel to the previous chapter in which the same framework is
used, but with a different model. The intention, as before, is to tackle the problem of
modelling a complex sound with an IFS. In the course of investigating this problem, a
solution to the roomtone problem is found.
8.1. Introduction
In the previous chapter, the central idea was to model a sound by modelling the
physical measure, induced by embedding its time series, with the strange attractor and
measure of a chaotic system. Using the notation of the previous chapter, let u_n be the
original time series. It was assumed that this may be equivalently interpreted as either
the product of observing a chaotic system or the realisation of a stationary
stochastic process, {U_n}. The time series is then embedded to give a system with
embedded attractor, B, and measure, \mu. A synthetic system is then constructed in
embedded state space to have an attractor and measure, \tilde{B} and \tilde{\mu}, that are similar to B
and \mu. The synthetic system is constructed to be a deterministic chaotic system defined
by a single mapping, \tilde{G}. This mapping is constructed with reference to a set of
embedded vectors derived from the original time series.
In this chapter, an alternative to this approach is considered where the aim is to
construct the synthetic system to be the RIA version of an IFS. This approach
therefore requires a solution to the RIA variant of the IFS inverse problem: given \mu,
find a set of IFS mappings and associated probabilities, {w_i, p_i}, that define an IFS
attractor with an associated invariant measure that is similar to \mu. This chapter
presents an approach to this problem which, although not a complete solution, is
considered to be progress in the right direction.
Recall that the RIA of an IFS is a Markov process with an invariant measure. That
is, the set of IFS contraction mappings and associated probabilities define a Markov
operator that leaves a unique measure unchanged by its action. Since the inverse
problem requires finding a set of mappings that define an invariant measure which
approximates some given measure, a first step in the solution might be to identify a
Markov process with an invariant measure similar to the desired one. That is, if \nu is
the given measure to be approximated, the first step of the solution is to find a Markov
process defined by an operator, M, that possesses an invariant measure, \pi, that is
similar to \nu. The second step of the solution would then be to find a set of mappings
and associated probabilities that define the Markov operator, M. This idea is
summarised in Figure 8.1.
In the next section, an algorithm is introduced which suggests a solution to the
first part of the inverse problem and also a solution to the roomtone problem.
[Figure: top line — {w_i, p_i} (IFS mappings and associated probabilities) define M
(Markov operator), which has the invariant measure \nu = M(\nu). Bottom line — given
some measure \nu, find a Markov operator M which leaves \nu invariant, then obtain
{w_i, p_i} as the solution to the inverse problem.]
Figure 8.1 The top line shows the interdependence of the components of the RIA
version of an IFS. The bottom line shows a suggested path to obtain a solution to the
inverse problem.
8.2. Description of the Algorithm
The Poetry Generation Algorithm (PGA) is my own name for an algorithm that
has been found to exist in a number of forms, but which is usually presented as an
interesting diversion in the field of computer science [pi], [lanc]. It is a simple
algorithm that takes as input some passage of text, and then produces an output of
unlimited nonsense text that appears to have strong stylistic similarities to the original.
The example presented in Table 8.1 shows a short input passage and a number of
different output sequences obtained with different values of the algorithm's only
parameter, labelled 'seed length'.
Original Text (by Albert Einstein)
We are in the position of a little child entering a huge library whose walls are covered to the
ceiling with books in many different tongues. The child does not understand the languages in which
they are written. He notes a definite plan in the arrangement of the books, a mysterious order which
he does not comprehend, but only dimly suspects.
Seed length=2
We not undere covere whichichilin mystang witionly suspects. He posite ary whilin whichild the
libran to ton ton of the in ordered the plange lingues in they with books are a mystered then. The ch
to th hent only withe whose libran in orderstan the plangues. He lithe defin mystes not on ing
writerstang whichild the notent und to the derin they differionly sus ordefinitious note wittle
comprehe wrin thent und, arrandefin th tonges notent on manguage language linges. He a
linguagement und ton th huges ared ent they dimly dimly dimly dimly a many a does on of ary differe
are chichich but und the a does...
Seed length=3
We are ceiling a library which they are written. He not understand the position of the are walls
are walls are ceiling with books, a library which the litten. He not coverenterious order whose walls
are walls are ceiling a litten. He not understand the ceiling with books in whose written. He notes in
the child ent of the ceiling a litten. He not comprehend, but only suspects.We are child does in many
dimly suspects.We are comprehend, but only suspects.We arranguages notes notes notes not
comprehend, but only different of a library which he does in many dimly suspects.We arranguages
not understand to they are walls are in they...
Seed length=4
We are covered to they are written. He not understand the arrangement of the languages in the
arrangement of the are covered to the arrangement of a little child enterious order which he does not
comprehend, but only different tongues. The child enterious order which they are covered to the
ceiling a huge library whose walls are covered to the ceiling a huge library whose walls are written.
He not comprehend, but only different of the position of a little child enterious order which he does
notes a definite plan in the are written. He not understand they are written. He not comprehend, but
only dimly suspects.We are in the languages in the arrangement tongues. The ceiling a huge library
whose walls are written. He not understand the...
Seed length=15
We are in the position of a little child entering a huge library whose walls are covered to the
ceiling with books in many different tongues. The child does not understand the languages in which
they are written. He notes a definite plan in the arrangement of the books, a mysterious order which
he does not comprehend, but only dimly suspects. We are in the position of a little child entering a
huge library whose walls are covered to the ceiling with books in many different tongues. The child
does not understand the languages in which they are written. He notes a definite plan in the
arrangement of the books, a mysterious order which he does not comprehend, but only dimly
suspects. We are in the position of a little child entering...
Table 8.1 Example of the PGA acting on a short paragraph of text for a variety of
values of the seed length parameter.
As can be seen, the generated text is a chopped-up, regurgitated version of the
original text that maintains certain orderings of letters and words. The effect is that the
output, although often nonsense, retains aspects of the style of the original. The
algorithm strikes a balance between merely repeating the sequences of letters in the
same order as they originally appeared and reproducing them in an unstructured,
random way. The single 'seed length' parameter allows control over the tuning of this
balance. For a small value of seed length, the output becomes highly jumbled. For a
sufficiently high value, the output merely loops the original sequence over and over,
with no change to its structure. This can be seen in Table 8.1 when seedlength=15.
To understand how the algorithm works, consider a much shorter input sequence,
THE_CAT_SAT_ON_THE_MAT_
where a space is denoted with an underscore and is considered to be an addition to the
26 letter alphabet. The algorithm treats the input sequence as a circular one where the
first 'T' follows the end space. This is shown in Figure 8.2.
[Figure: the letters T H E _ C A T _ S A T _ O N _ T H E _ M A T _ arranged in a
circle, with an arrow indicating the order of the sequence.]
Figure 8.2. Input to the algorithm treated as a circular sequence.
Accompanying the circular register of letters is a smaller linear register known as
the 'seed'. For example, consider a seed whose length is two letters. This register is
initialised with any consecutive sequence of two letters from the original passage. For
example, let these be the first two, 'T H'. The initial contents of the seed may also be
considered to form the initial output of the algorithm. The algorithm then consists of
iterating of the following sequence of operations:
- search for and record the position of all occurrences of the seed sequence in
the original passage. For example, if the seed is 'T H', this occurs twice in the original
sequence (shown in parentheses):
(TH)E_CAT_SAT_ON_(TH)E_MAT_
Because the seed is initialised with part of the original sequence, it can always be
found to occur at least once;
- choose one of the occurrences of the seed sequence. If there is more than one,
choose any of these at random with equal probabilities. For example, choose the
second one:
THE_CAT_SAT_ON_(TH)E_MAT_
- the next letter in the sequence after the chosen occurrence of the seed then
forms the output of the algorithm. Note that if the seed occurs at the end of the
original sequence, the first letter of the sequence is chosen. This is a consequence of
the circular treatment of the original. In this example, 'E' will be output;
- shift the output letter into the seed register from the right and discard the letter
at the far left of the seed. In this example, the seed will change from 'T H' to 'H E',
with the 'T' being discarded.
These four operations are iterated as often as is required to form an output
sequence, one letter at a time. An example sequence of iterations for the algorithm is
shown in Table 8.2
iteration
number
state of
seed
number of
occurrences of
seed
occurrence
chosen
output
at initialisation TH - - TH
1 TH 2 2nd E
2 HE 2 1st _
3 E_ 2 1st C
4 _C 1 1st A
5 CA 1 1st T
6 AT 3 3rd _
7 T_ 2 2nd T
8 _T 2 1st H
9 TH 2 1st E
...etc ... ... ... ...
Table 8.2 Example sequence of iterations of the PGA.
So for the example shown, the generated output sequence is
THE_CAT_THE...
Note that at the 7th iteration, the seed chosen is at the very end of the original
sequence and so the letter taken for the output is from the beginning.
From this description it can be seen how this algorithm works. The seed is
initialised so that it contains part of the original sequence. When the seed is updated
with the shift operation, this situation is maintained and so there is always at least one
occurrence of the seed in the passage. Therefore, at each iteration there is always an
output and the seed can be updated. Consequently, any generated output sequence of
letters is guaranteed to have occurred in the same order as in the original passage. So,
for example, when the seed length is two, any consecutive two letter sequences in the
output can be found to occur somewhere in the original passage. It is this property of
the algorithm that maintains the similarity of the output with the original. Also,
therefore, the length of the seed will control how fragmented the output is. A higher
seed length ensures that the output always contains longer sequences that have
occurred in the original, such as whole words or even phrases.
When the seed length is one, there will be the highest number of occurrences of
the seed in the passage. As the seed length increases, it becomes more and more
difficult to find any particular sequence of letters. There is a limiting value for the
seed length at which there is only ever one occurrence of any seed sequence in the
original passage. If this occurs, the output is always the next letter in the original
sequence and no random jumps occur. The result is an output that consists of the
original sequence looped over and over again.
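The four operations above can be sketched directly in code. The following is a minimal illustrative implementation (the names are my own; it is not the program used to generate Table 8.1, though it follows the same rules):

```python
import random

def pga(text, seed_len, n_letters, rng=None):
    """Poetry Generation Algorithm: every (seed_len + 1)-length run of the
    output is guaranteed to occur somewhere in the circular original."""
    rng = rng or random.Random(0)
    I = len(text)
    circ = text + text[:seed_len]              # unroll the circular register
    seed = text[:seed_len]                     # initialise with the opening letters
    out = seed                                 # the seed forms the initial output
    for _ in range(n_letters):
        # 1. record all occurrences of the seed in the (circular) original
        hits = [i for i in range(I) if circ[i:i + seed_len] == seed]
        # 2. choose one occurrence at random with equal probabilities
        i = rng.choice(hits)
        # 3. the next letter after the chosen occurrence is the output
        nxt = text[(i + seed_len) % I]
        out += nxt
        # 4. shift the output letter into the seed, discarding the oldest
        seed = seed[1:] + nxt
    return out
```

For example, pga('THE_CAT_SAT_ON_THE_MAT_', 2, 40) produces a string in which every consecutive three-letter run occurs somewhere in the circular original.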
The function of this algorithm appears to be exactly that which is needed to solve
the roomtone problem. Recall from Chapter 2 that a solution to the roomtone problem
involves generating unlimited quantities of a sound given only a small fragment of
some original roomtone. The generated sound must have the same perceived
properties as the original and also not just be a periodically looped version of the
original. The PGA achieves this kind of function, but for passages of text. For a seed
length that is neither too high nor too low, for example when seedlength=4 in the
example shown in Table 8.1 , the output of the PGA is an unlimited passage of text
that has the same perceived qualities as the original passage and is not merely a
looped version of it. Further analysis of the PGA, which will be presented next, also
shows that it defines a Markov process on the embedded state space of the original
sequence that possesses an invariant measure. This is exactly what is required for the
first part of the solution to the RIA IFS inverse problem described in the introduction
to this chapter.
8.3. Analysis of the PGA
Consider the PGA to be a dynamical system where the seed represents the state of
the system at discrete instances of time. Let the seed length be L, and write the state as
a discrete vector
x \in X    (8.1)
where the state space, X, comprises the symbol alphabet set, A (26 letters plus the
space character), combined with itself L times. That is,
X = A \times A \times \cdots \times A = A^L  \quad (L \text{ times})    (8.2)
Let the initial state of the seed be,
x_0    (8.3)
and let
x_n    (8.4)
denote the state of the seed after n iterations of the algorithm. Let
\{ a_i \in A : i = 0, \ldots, I-1 \}    (8.5)
represent the original passage of text. The state of the seed will then be restricted to
the subset of state space defined by L-length sequences of letters taken from the
original text passage. That is,
Y = \bigl\{ x = ( a_i, a_{(i+1) \bmod I}, \ldots, a_{(i+L-1) \bmod I} ) : i = 0, \ldots, I-1 \bigr\} \subset X    (8.6)
where (mod I) represents the nature of the circular register used to store the original
text sequence. Note that Y is the space of time-delay embedded vectors of the original
text sequence. Let Y be the number of distinct vectors in Y and let them be ordered
arbitrarily so that
Y = \{ y_1, y_2, \ldots, y_Y \}    (8.7)
For each iteration of the PGA, the state of the system changes from one of the
states in the space Y to another. The sequence of state changes forms a trajectory in
state space. For example, Figure 8.3 shows part of the state space and some of the
possible states and trajectories for the case where the original text is the sequence
THE_CAT_SAT_ON_THE_MAT_. In this case the seed length, L, is two and
therefore the state space is two-dimensional.
The transition from one state to another is determined by the algorithm operation
rules given in the previous section. These declare that the state changes stochastically
such that the probability of the new state is entirely determined from the current state
of the system. Thus the dynamical system defined by the PGA is a first-order Markov
process, or a Markov chain, since it is discrete-time and discrete-state. A Markov
process is defined by the Markov operator, which in this case is a probability
transition matrix.
[Figure: a grid with the letters _ A C E H M N O S T on each axis (first letter of
seed against second letter of seed), marking states such as TH, HE, E_, _C, CA, _T,
_M and MA, with arrows between them; where alternative trajectories exist, one is
chosen at random.]
Figure 8.3 Part of the state space, X, corresponding to an example PGA showing
some of the possible states and their associated transitions.
Let
\pi_n = \bigl( \pi_n(y_1), \pi_n(y_2), \ldots, \pi_n(y_Y) \bigr)    (8.8)
be the state probability distribution at time n. This is a probability vector where \pi_n(y)
is the probability that the state of the PGA at time n is y and
\sum_{i=1}^{Y} \pi_n(y_i) = 1    (8.9)
The probability distribution at the next time step is then determined entirely by the
transition matrix, M,
\pi_{n+1} = \pi_n M    (8.10)
or,
\bigl( \pi_{n+1}(y_1), \pi_{n+1}(y_2), \ldots, \pi_{n+1}(y_Y) \bigr) =
\bigl( \pi_n(y_1), \pi_n(y_2), \ldots, \pi_n(y_Y) \bigr)
\begin{pmatrix}
  p_{11} & p_{12} & \cdots & p_{1Y} \\
  p_{21} & p_{22} & \cdots & p_{2Y} \\
  \vdots & & \ddots & \vdots \\
  p_{Y1} & p_{Y2} & \cdots & p_{YY}
\end{pmatrix}    (8.11)
where p_{ij} is the probability that the state will change from y_i to y_j. M is composed
of probability (row) vectors so that
\sum_{j=1}^{Y} p_{ij} = 1  \quad \text{for each } i    (8.12)
The values of the elements of the transition matrix are determined by the nature of
the original sequence. A transition probability will only be non-zero if the two states
appear consecutively in the original sequence. That is, if
y_i = \bigl( a_k, a_{(k+1) \bmod I}, \ldots, a_{(k+L-1) \bmod I} \bigr)    (8.13)
then
y_j = \bigl( a_{(k+1) \bmod I}, a_{(k+2) \bmod I}, \ldots, b \bigr)    (8.14)
where
b \in A    (8.15)
The probability of going from state y_i to y_j will then be equal to the number of
times y_j appears in the original sequence divided by the number of times y_i
appears. That is,
p_{ij} = \frac{\# y_j}{\# y_i}    (8.16)
If there is only one state which follows y_i then the probability of going to it from
y_i is obviously 1. If there are two distinct states which follow y_i then the probability
of moving to either is 0.5. For the example sequence,
THE_CAT_SAT_ON_THE_MAT_
and when L=2, the space of distinct subsequences will be
Y={_C,_S,_O,_T,_M,AT,CA,E_,HE,MA,N_,ON,SA,TH,T_} (8.17)
and so Y=15. So, for example, the probability of transition from y_8 = (E_) to
y_1 = (_C) is p_{81} = 1/2, since \# y_8 = 2 and \# y_1 = 1.
Every possible state of the Markov chain may be reached from any other state and
also every possible state may recur. To see this, consider that the states happen to be
chosen in the same order as in the original sequence until it eventually loops round the
circular register. Such a Markov chain is termed 'irreducible, positive recurrent'.
Consequently, the Markov chain has the property that it possesses a unique stationary
distribution [hoel72]. That is, there exists a unique distribution, \pi, such that
\pi_n = \pi_0 M^n \to \pi  \quad \text{as} \quad  n \to \infty    (8.18)
and
and
\pi = \pi M    (8.19)
Or, in the language of IFS theory used in Chapter 3, the Markov operator defined
by the PGA possesses a unique invariant measure, \pi. The Markov chain is defined on
the space of embedded vectors, Y, which is a subset of the full state space X. The
above analysis has therefore shown that the PGA induces a Markov chain on the set of
embedded vectors of the original sequence and which possesses an invariant measure
in state space. Different original sequences will define different subsets of X with
different invariant distributions.
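These properties can be checked numerically. The sketch below (illustrative code, not part of the thesis software) builds the transition matrix of the PGA chain for the example sequence and power-iterates a uniform initial distribution, which converges to an invariant distribution \pi with \pi = \pi M:

```python
from collections import Counter

def pga_chain(seq, L):
    """States and transition matrix of the Markov chain induced by the PGA
    on the L-length seeds of a circular sequence."""
    I = len(seq)
    ext = seq + seq[:L]
    seeds = [ext[i:i + L] for i in range(I)]
    states = sorted(set(seeds))
    idx = {s: k for k, s in enumerate(states)}
    counts = Counter(seeds)                       # occurrence count of each seed
    M = [[0.0] * len(states) for _ in states]
    for i in range(I):                            # each position contributes one
        cur, nxt = seeds[i], seeds[(i + 1) % I]   # equally likely transition
        M[idx[cur]][idx[nxt]] += 1.0 / counts[cur]
    return states, M

def stationary(M, n=5000):
    """Approximate pi = pi M by power iteration from a uniform distribution."""
    Y = len(M)
    pi = [1.0 / Y] * Y
    for _ in range(n):
        pi = [sum(pi[i] * M[i][j] for i in range(Y)) for j in range(Y)]
    return pi
```

For THE_CAT_SAT_ON_THE_MAT_ with L=2 this yields the 15 states of (8.17), rows of M summing to one, and an invariant \pi that is unchanged, to numerical precision, by a further application of M.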
The PGA therefore presents a solution to the first half of the RIA IFS inverse
problem described in the introduction of this chapter. It provides a means of deriving a
Markov operator from an original sequence that defines a stochastic dynamical system
that can approximate the original sequence. The next section therefore considers using
the PGA with a digital audio time series as the original sequence instead of a passage
of text.
8.4. Implementation of the PGA for Sound
Keeping the notation of the previous chapter, let
\{ u_i : i = 0, \ldots, I-1 \}    (8.20)
represent an I-length sequence of some sound time series. Assume that this is a
product of a chaotic system or, equivalently, a realisation of the stationary random
process
\{ U_n \}    (8.21)
which is partly described by the mth order jpdf
P_U\bigl( u_i, u_{i-1}, \ldots, u_{i-m+1} \bigr)    (8.22)
Alternatively, consider the set of m-length embedding vectors derived from the
original time series,
y_i = \bigl( u_i, u_{i+1}, \ldots, u_{i+m-1} \bigr)^T : i = 0, \ldots, I-m    (8.23)
(8.23)
which are distributed in embedded state space according to the measure \mu. The
measure in m-dimensional embedded state space and the mth order jpdf are equivalent
descriptions of the probability distribution of the original time series. The stationarity
of the original sequence u_n is equivalent to the invariance of the measure under the
action of the embedded dynamical system.
In the previous chapter, the aim was to find a deterministic chaotic system
possessing an invariant measure, \tilde{\mu}, that approximates \mu. Now, the aim is to
implement a Markov dynamical system on the embedded state space that possesses a
stationary distribution, \pi, that approximates \mu. Again, the information present in the
transitions between embedded vectors is used to derive the dynamical system.
How, though, can the PGA be implemented for digital audio? There are a number
of important differences between using the PGA with text and using it for sound.
Firstly, the number of possible discrete symbols in a digital audio sequence is far
higher than the number in a text sequence. For example, 16-bit digital audio has
65,536 possible symbols, compared to 27 for text. This suggests a much lower
likelihood of finding several occurrences of subsequences of the original sequence
that are at least two symbols long. The algorithm can only function if several
occurrences of a seed can be found so that a random choice can be made. Another
difference is that useful digital audio sequences are typically far longer than the
lengths of text typically processed by the PGA. This therefore implies considerably
greater processing power is needed to maintain the operational speed of the PGA
which, for the text implementation used to generate the examples in Table 8.1, is
satisfactory.
To investigate both these issues, the PGA algorithm used to generate the text
examples was slightly adapted to accept digital audio sequences as input. Trial runs
with the program have established the following. Firstly, the large increase in the
alphabet size is not a problem. Multiple occurrences of subsequences up to 5 in length
are typical in digital audio extracts of ~1/2 second taken from a steady-state source
such as a roomtone. This appears to be because of the correlations present in such
signals. Any value that recurs in a time series is likely to have the same neighbouring
values also recurring. The second point established is that, as expected, the processing
time for digital audio inputs of ~1/2 second and outputs of several seconds is
prohibitive, being of the order of tens of hours. It is therefore necessary to consider
alternative implementations of the PGA so as to increase the speed of the algorithm.
There are two alternative implementations that represent extremes in
terms of processing speed and memory usage. One extreme is the algorithm
mentioned above and described in Section 8.2 in which the Markov transition
probabilities are calculated 'on the fly' by performing a comparison of the seed with all
subsequences of the original sequence. This algorithm has low memory usage as the
Markov transition matrix is not stored in its entirety, but a single transition probability
is calculated at a time. This algorithm, however, is slow because of the computational
cost of repeatedly searching the whole original sequence. An alternative to this
method is to calculate the transition matrix in its entirety, once. This would involve a
significant preprocessing procedure, but the result would be a matrix that could be
referred to directly as a look-up table. This would then make iteration of the algorithm
much faster. This implementation, however, would require enormous amounts of
memory to store the transition matrix. Recall that this matrix is a 2-dimensional
square matrix of size Y, where Y is the number of distinct subsequences of the original
sequence. Consider the following estimate of Y: say an extract of original time series
uses 5000 distinct amplitude levels. Say that the neighbouring samples of any given
sample are restricted to having only 100 different values out of 5000 due to
correlation. Then for a seed length of only L=2 there will be Y = 5 x 10^5 distinct
subsequences of length 2. The transition matrix will therefore have the square of this,
of the order of 10^11 elements, which is prohibitively large.
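This estimate is easy to reproduce. The sketch below (Python is used for illustration; the thesis itself presents no code) restates the arithmetic, with the 5000 amplitude levels and 100 correlated neighbours taken as the assumptions made above:

```python
# Estimate of the transition-matrix size for the full look-up-table approach.
# Assumed figures, as in the text: ~5000 distinct amplitude levels, with
# correlation restricting each sample's successor to ~100 of those levels.
levels = 5000          # distinct amplitude values in the extract
neighbours = 100       # admissible successors per value, due to correlation
L = 2                  # seed length

Y = levels * neighbours ** (L - 1)   # distinct subsequences of length L
matrix_elements = Y * Y              # square transition matrix

print(Y)                # 500000, i.e. 5 x 10^5
print(matrix_elements)  # 250000000000, i.e. of the order of 10^11
```

At 4 bytes per element this is hundreds of gigabytes, which makes the point that the full matrix cannot be stored on the hardware of the day.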
A novel implementation of the PGA has therefore been developed which is
somewhere between the two extremes described above. It does not calculate the entire
transition matrix, but uses additional memory to speed up the original search version
of the algorithm. The advanced algorithm is a variant of the algorithm described in
Section 8.2. It achieves a substantial increase in speed by preprocessing the original
sequence. This results in a reduction of the number of comparisons made with the
seed for each iteration. A greater amount of memory is needed, however, to store the
preprocessed data. The preprocessing operation sorts the original sequence into
ascending order while maintaining the whereabouts of the samples in the original
sequence. This is accomplished by forming an array of 2-dimensional vectors of the
data. One component of each vector stores an amplitude value, while the other stores
its position in the original sequence. The vectors are then sorted according to the value
of the first element, the amplitude value. This is done with a modified 'quicksort'
routine that uses the first element of the vector in the sort procedure, but always
maintains the pairing of the two elements when shuffling occurs. A simple example of
the action of the sort routine is shown in Table 8.3.
    before preprocessing          after preprocessing

    sample   position in          sample   position in
    value    sequence             value    sequence
      23         1                  -90         9
      35         2                  -89        10
      67         3                  -36         8
     126         4                   12         7
      98         5                   23         1
      45         6                   35         2
      12         7                   45         6
     -36         8                   67         3
     -90         9                   98         5
     -89        10                  126         4
Table 8.3 Simple example showing how the preprocessing reorders the original
input sequence.
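The preprocessing amounts to sorting (value, position) pairs by value while keeping each pair intact. A minimal sketch, reproducing the example of Table 8.3, with Python's built-in sort standing in for the modified quicksort:

```python
# Preprocessing: pair each sample with its position, then sort by amplitude
# while the position tags travel with their values, as in Table 8.3.
samples = [23, 35, 67, 126, 98, 45, 12, -36, -90, -89]

# Positions are 1-based to match the table.
paired = [(value, pos) for pos, value in enumerate(samples, start=1)]
sorted_pairs = sorted(paired)   # sorts on the amplitude value

print(sorted_pairs[0])   # (-90, 9)
print(sorted_pairs[-1])  # (126, 4)
```

Any comparison sort that preserves the pairing would serve equally well; the choice of quicksort in the thesis is purely for speed.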
For the operation of the algorithm, a copy of both the above tables is kept. The
crucial time saving then occurs when the seed is being checked against the original
sequence. In the original version of the algorithm, the first element of the seed is
checked against every value in the original sequence. If a match occurs, then the
second element is checked and so on. Only when all elements match is a record kept
of the position in the sequence for the subsequent random choice part of the cycle.
With the above tables, however, any values of the original sequence that match the
first element of the seed can easily be looked up in the sorted table using a binary
search. The associated position tag is then used in conjunction with the
unpreprocessed table to check the next values in the sequence against the seed. As a
result, only a few subsequences are tested for matching as opposed to a comparison
for every value in the original sequence.
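One search cycle of this accelerated scheme can be sketched as follows. This is an illustrative reconstruction, not the thesis's own code: a binary search over the sorted values finds the candidate start positions, the rest of the seed is then verified against the unsorted sequence, and one match is chosen at random:

```python
import bisect
import random

def pga_step(seq, sorted_pairs, seed):
    """One iteration of the accelerated PGA search.

    seq          -- original sequence, treated as a circular register
    sorted_pairs -- (value, position) pairs sorted by value (the preprocessing)
    seed         -- the last L output samples, to be matched against seq
    Returns the sample that follows a randomly chosen occurrence of the seed.
    """
    I, L = len(seq), len(seed)
    values = [v for v, _ in sorted_pairs]

    # Binary search for the run of entries whose value equals seed[0].
    lo = bisect.bisect_left(values, seed[0])
    hi = bisect.bisect_right(values, seed[0])

    # Verify the remaining seed elements against the unsorted sequence,
    # wrapping round the circular register.
    matches = []
    for _, pos in sorted_pairs[lo:hi]:
        if all(seq[(pos + k) % I] == seed[k] for k in range(1, L)):
            matches.append(pos)

    # Random choice among the occurrences; the following sample is the output.
    pos = random.choice(matches)
    return seq[(pos + L) % I]

# Toy example: the seed [1, 2] occurs at positions 0 and 3,
# so the next output sample is either 3 or 4.
seq = [1, 2, 3, 1, 2, 4]
pairs = sorted((v, i) for i, v in enumerate(seq))
nxt = pga_step(seq, pairs, [1, 2])
```

Only the handful of entries sharing the seed's first value are tested, instead of every position in the original sequence.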
There is one final problem that was encountered when running trials of the
original PGA with sound time series. This is the presence of an amplitude
discontinuity between the end of the original time series and its beginning when it is
stored in the circular register. The end value in the sequence is presumed to be
followed by the beginning value even though they may differ considerably in
amplitude. Consequently, the output of the algorithm includes large discontinuities,
heard as clicks, which are not present in the original time series. The solution to this
problem is to cross-fade the end of the time series with the beginning as is done when
splicing two recorded audio signals together. A small linear envelope is applied to the
amplitude values at the beginning and end of the original time series. The beginning
and end of the time series are then added together when stored in the circular register.
That is, the original sequence now becomes

    { u_i ; i = 0, 1, ..., I+E }                                      (8.24)

where E is the length of the two envelope functions. The envelope function is defined
as

    e_i^B = i/E,              0 <= i < E                              (8.25)

for the beginning of the time series and

    e_i^E = 1 - (i - I)/E,    I <= i < I+E                            (8.26)

for the end. The original time series is then modified by the two envelopes to become

    u'_i = e_i^B u_i + e_{I+i}^E u_{I+i},    i = 0, ..., E-1
    u'_i = u_i,                              i = E, ..., I-1          (8.27)
This process is depicted in Figure 8.4.
Figure 8.4 Crossfade envelopes applied to beginning and end of original time
series which are then added together to form modified time series. This is then stored
in the circular register so that there is no amplitude discontinuity between its end and
its beginning.
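The crossfade of equations (8.24)-(8.27) can be sketched as follows, taking the original extract to hold I+E samples; variable names and index conventions are illustrative:

```python
def crossfade_loop(u, E):
    """Crossfade the last E samples of u into the first E, per eqs (8.24)-(8.27).

    u -- original sequence of length I + E
    E -- envelope length
    Returns the modified sequence u' of length I, suitable for a circular
    register with no amplitude discontinuity at the wrap point.
    """
    I = len(u) - E
    out = []
    for i in range(E):
        e_b = i / E          # rising envelope at the beginning, e_i^B
        e_e = 1.0 - i / E    # falling envelope at the end, e_{I+i}^E
        out.append(e_b * u[i] + e_e * u[I + i])
    out.extend(u[E:I])       # middle samples pass through unchanged
    return out

# Toy example: a linear ramp of I + E = 12 samples with E = 4.
u = list(range(12))
u_mod = crossfade_loop(u, 4)
# u_mod[0] equals u[8], so wrapping from u_mod[-1] back to u_mod[0]
# continues smoothly with no discontinuity.
```

At the wrap point the rising envelope is zero and the falling envelope is one, so the first modified sample is exactly the sample that originally followed the register's last sample, which is what removes the click.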
8.5. Results
In this section, results are presented for three sets of experiments with the digital
audio version of the PGA. The first set of experiments use a single roomtone as the
original time series with a range of combinations of the two algorithm parameters;
length of original time series, I; and seed length, L. After gaining an understanding of
the effects of these parameters, the algorithm is tried with two other roomtones having
different characteristics to the first. Finally, the PGA is tried with a number of other
naturally occurring sounds.
In the first set of experiments, an industrial roomtone has been chosen for the
original time series. This comes from a sound-effects library and is described as
'roomtone, small [room size], ventilation noise' [ssl89]. This has been chosen because
it is a constant, steady state sound without any extra artefacts such as knocks or
clunks. It is therefore likely to satisfy the condition of being a stationary random
process. Three extracts of this sound having different lengths, I, are used as input to
the PGA. These extracts have the first and last 100 samples windowed and then added
together to remove any amplitude discontinuity. The window length of 100 was
chosen to be the smallest value such that the windowed result, when played in a
continuous loop, has no perceivable clicks or introduced artefacts. Each processed
extract is then tried with a range of seed lengths, L. The PGA is stopped after a 3
second synthetic time series has been generated. The resulting sounds for each
combination of I and L are compared with the original roomtone to evaluate how good
the PGA has been at creating a longer version of the original with the same perceived
properties. These results are summarised in Table 8.4. A selection of the resulting
sounds, and the original roomtone are presented as sound examples and indicated on
the table. The original roomtone can be heard as Sound 33.
Figure 8.5 shows waveform plots of 300 and 3000 sample extracts of the original
roomtone. Figure 8.6(a) shows a plot of the resulting time series from the PGA when
I=300 and L=1. It can be seen that the output consists only of the sample values that
exist in the original which, in this case, are mostly negative. Figure 8.6(b) shows a
plot of the output when I=3000 and L=3. Here, the perceived 'buzzing' quality of the
result can be seen as the output consists of multiple, looped sections of the original.
Figure 8.6(c) shows the resulting time series when the PGA continuously loops the
original.
length of original time series, I = 300:
    L=1 (111): not too bad, but sounds too much like white noise and lacks
               the low frequency depth of the original. Sound 34
    L=2 (112): buzzy; large percentage of result is looped version of original
    L=3 (113): buzzy
    L=4 (114): buzz; continuously loops original
    L=5 (115): buzz

length of original time series, I = 3,000:
    L=1 (211): too fluttery and hissy
    L=2 (212): good, but not smooth enough. Sound 35
    L=3 (213): buzzy; large percentage of result is looped version of
               original. Sound 36
    L=4 (214): buzz; continuously loops original. Sound 37
    L=5 (215): buzz

length of original time series, I = 30,000:
    L=1 (311): sounds too much like white noise and lacks the low frequency
               depth of the original
    L=2 (312): very good; despite slight lack of depth is almost
               indistinguishable from the original. Sound 38
    L=3 (313): also very good
    L=4 (314): occasional, slight buzz
    L=5 (315): occasional, slight buzz. Sound 39

Table 8.4 Summary of results obtained with PGA and industrial roomtone as
original time series. (Numbers in brackets are experiment identification.)
The very good perceived similarity of the result for I=30,000 and L=2 is confirmed
by comparison of the time series plots, power spectral densities and amplitude
histograms shown in Figure 8.7 where there is also a high degree of similarity between
the original and synthetic versions. Note, however, that the spectral peaks present in
the power spectral density of the original at about 60 and 120 Hz are not present in the
spectrum of the synthetic version. This explains the slight difference between the
sound of the original and synthetic, the original containing a slight hum, which the
synthetic does not. As pointed out in both the previous section and previous chapter, it
is only to be expected that short-term correlations will be preserved as only the Lth-
order jpdf of the original is preserved by the PGA. Therefore it is not guaranteed that a
tonal element, which has long term correlations, will be preserved.
These results give some insight into the way in which the PGA operates. Firstly,
the longer the original time series, the better the results. There is, however, no need to
increase the length beyond a limiting value, as the quality of the result for I=30,000,
for example, is good enough for the synthetic version to pass as the original.
Secondly, there is an optimal value, or small range of values, for the seed length
which produces the best results for any given length I. If the seed length is too low, the
result is too uncorrelated and the sound too 'white'. For too high a value of L, the
output contains sections in which portions of the original are looped, causing a
buzzing sound. Increasing L beyond a threshold produces a result that is merely the
original time series being continuously looped. The relationship between the
behaviour of the PGA and L is dependent on the length of the original time series, I.
The output loops the input for smaller values of L when I is smaller. This is to be
expected since the chance of finding more occurrences of the same subsequences in
the original decreases as its length decreases.
Figure 8.5 Time domain plots of the original roomtone showing 300 (left) and
3000 (right) samples.
(a)
(b) (c)
Figure 8.6 Time domain plots of output time series when (a) I=300, L=1, (b)
I=3000, L=3, and (c) I=300, L=4.
(a)-(f): figure panels as described in the caption below; the amplitude histograms
(e) and (f) plot P(a) against normalised amplitude, a.
Figure 8.7 Comparison between original (left) and synthetic time series (right)
showing: (a)&(b) time domain plots, (c)&(d) power spectral densities calculated by
averaging eleven 4096 point FFTs, and (e)&(f) amplitude histograms calculated from
30,000 samples.
Now that a range of values for I and L has been determined that gives successful
operation of the PGA with a roomtone as input, the algorithm is tried with three other
roomtones with different qualities. In each case, the length of the original used is
I=30,000 and the seed length is varied to find the resulting synthetic time series that is
most like the original.
The first of the roomtones is described as 'laboratory roomtone' and is a recording
of a fairly large workspace where a number of pieces of electrical equipment are in
operation. A number of people were present, and a window was open, at the time of
recording; both contribute discrete artefacts to the sound, such as clunks and passing
traffic. This roomtone sound has been chosen because it is not so pure and steady
state as that used in the previous experiment. The best result was
found for the case where L=3. This sound, and the original roomtone, can be heard as
Sounds 40 and 41 respectively. As can be heard, the result is not very good. This is a
consequence of the discrete artefacts present in the original, which mean that it does
not satisfy the stationarity requirement. The action of the PGA is to repeat parts of
these artefacts with timing and spacing that is different from the way in which they
appear in the original. The result sounds like a 'minced' version of the original.
The other two roomtones used are, like that of the first experiment, steady state
sounds, but ones which have different perceived qualities [ssl89]. One of these is a
deep, rumble-like sound, the other is higher in frequency and includes a pronounced
high frequency tonal drone. These two sounds and the best synthetic versions
produced with the PGA can be heard as Sounds 42 - 45. Table 8.5 shows a summary
of this set of experiments.
original sound    sound       length of     seed        comments
                  examples    original, I   length, L
laboratory        40 and 41   30,000        3           poor result because of discrete
roomtone                                                artefacts in original
deep rumble-      42 and 43   30,000        4           very good; result almost
like roomtone                                           indistinguishable from original
roomtone with     44 and 45   30,000        4           poor result because of tonal
drone                                                   component in original

Table 8.5 Summary of results for PGA used with other roomtones having different
qualities.
In the final set of experiments, four other naturally occurring, steady state sounds
which are not roomtones are used as input to the PGA. The four sounds are: the sound
of a river; wind noise; the sound of audience applause; and the sound of a rainforest.
These sounds have been chosen because they are fairly constant and steady-state, and
are sounds that are likely to occur as background sounds in film sound tracks. Again,
the length of the original time series in each case is set at I=30,000 and the seed length
is varied to find the best sounding result. The originals and the results can be heard as
Sounds 46 to 53 and are summarised in Table 8.6.
original sound    sound       length of     seed        comments
                  examples    original, I   length, L
river             46 and 47   30,000        3           good, but some audible looping
                                                        of parts of the original
wind noise        48 and 49   30,000        3           good, retains some of the
                                                        flapping, fluttering sound of
                                                        the original
applause          50 and 51   30,000        2           very good, almost
                                                        indistinguishable from the
                                                        original
rainforest        52 and 53   30,000        4           o.k., but timing and spacing of
                                                        artefacts in the original not
                                                        retained

Table 8.6 Summary of results obtained with PGA and a variety of other
background sounds.
8.6. Conclusions
In this chapter, an algorithm, the PGA, has been presented that offers a solution to
the roomtone problem described in Chapter 2. The PGA is an implementation of a
Markov model that is based on the same framework as that of the previous chapter.
That is, given an original time series that is presumed to be a realisation of a stationary
random process, the model generates a synthetic version that preserves the Lth-order
jpdf of the original. This is achieved by constructing an irreducible, positive recurrent,
first-order Markov chain that acts on the space of embedded vectors of the original
time series. This type of Markov process is guaranteed to possess a stationary
distribution, i.e. an invariant measure. The transition matrix of the Markov chain is
constructed with reference to the transitions of the embedded vectors of the original
time series. The stationary distribution of the Markov chain then models the
embedded measure of the original and therefore preserves the Lth-order jpdf. The
PGA model is therefore like the model presented in the previous chapter where the
mapping of a chaotic system is constructed from the embedded vector transitions such
that its invariant measure models the invariant measure of the embedded system.
The PGA offers a solution to the roomtone problem because it allows unlimited
quantities of a synthetic roomtone to be generated from less than one second of the
original. The quality of the results, however, depends on the nature of the original
sound. When the PGA is used with original sounds that are perceived to be constant
and steady state, the resulting synthetic time series are nearly indistinguishable from
the originals. This has been shown to be the case for two roomtones, and the sound of
audience applause. Good results, in which there are slight differences between the
original and synthetic versions, were found for the sound of a river, wind noise and
the sound of a rainforest. It was found that the PGA performed badly when the
original sound is not constant, because of the presence of discrete artefacts, or when it
contains tonal components. This is to be expected for sounds containing discrete
artefacts because their time series do not adequately satisfy the condition of
stationarity required by the theory. It can therefore be presumed that this condition is
met when the sound is perceived as constant and steady state as is the case for those
sounds for which the PGA performed very well. Poor performance is also to be
expected for sounds with tonal components, as it is not guaranteed that the model will
preserve the long-term correlations present in their time series since only the Lth order
jpdf is preserved.
A novel implementation of the PGA has been presented that uses acceptable
amounts of computer memory and produces results in an acceptable time. Generally,
for an original time series of length I, the amount of memory used by the algorithm is
approximately 8I bytes. For good results, that is ones that sound like the original, it
was found that I should be of the order of tens of thousands. Consequently, the
memory use is up to one quarter of a megabyte. On average, the time taken to generate
1 second of a synthetic time series is approximately 40 seconds using a 66 MHz 486
PC. This algorithm is therefore about 10 times slower than the synthetic system
described in the previous chapter.
The PGA model presented in this chapter is also a step towards the solution of the
inverse problem for the RIA version of an IFS. In the introduction it was proposed that
this inverse problem can be broken into two steps. The first step consists of
constructing a Markov process possessing an invariant distribution from an original
time series. The RIA of an IFS defines a Markov process possessing an invariant
distribution, or measure. Therefore, the second step of the inverse problem is to find a
set of IFS contraction mappings and associated probabilities that define a similar
Markov process to that constructed from the original time series. The PGA, therefore,
offers a solution to the first step of this process. It allows the construction of a Markov
chain that models certain original time series to a high degree of perceptual accuracy.
Further work is therefore required to address the second step of the problem and find a
way of extracting a set of IFS mappings and probabilities from the transition matrix
that defines the Markov chain.
Another suggested area of further work is into the possibilities of modifying
sounds once they have been successfully modelled with the PGA. The possibilities
include modifying the probability weightings involved during the random choice of
sub-string occurrences, or transforming the transition probability matrix M. It is
expected that a given sound may then be subtly altered since modifying the
probabilities allows control over the invariant statistics of the sound time series which
are relevant to its perceived qualities. It may also be possible to form hybrid sounds by
combining the transition matrices obtained from several different sounds.
Finally, some preliminary experiments using non-steady-state sounds with the
PGA, for example speech, indicate that curious special effects can be produced that
may be useful for creative purposes. As an example, some original speech and a
processed version may be heard as Sounds 54 and 55. (For this example, I=30,000 and
L=4).
Chapter 9
Summary and Conclusions
This thesis has presented an exploration of the idea of applying chaos theory and
fractal geometry to the problem of modelling sound. It is believed to be the first
substantial investigation of this idea.
The thesis began with the suggestion that since chaos theory and fractal geometry
are significant new developments that are having a substantial impact on many fields
in both the arts and sciences, they may be applied to the problem of modelling sound.
This idea was emphasised with the example of computer images generated with
chaotic and fractal models. These are examples of simple systems that can both model
naturally occurring images and generate complex abstract forms.
Chapter 2 discussed what is meant by a sound model. For this work, it is taken to
be a computer-based model that represents sound for a practical purpose. The uses of
such models are considered to be creative ones, such as music composition and film
sound-track editing. A specific application was described called the 'roomtone
problem' in which the desire is to extrapolate sound; that is, to produce a greater
quantity of a short original sound such that it is perceptually the same as the original.
From a consideration of these applications, a functional definition of a sound model
has been developed. This is that the model should be able to represent the perceived
characteristics of a naturally occurring sound, and that the model operates with less
parameter data than sound data. The parameter data may then be stored instead of the
sound, thereby achieving data compression, and/or the parameters can be used as
'handles' with which the sound may be manipulated. Also, the model, which is
preferably simple, may be used to generate new, abstract sounds. Examples of
conventional models which satisfy these requirements were reviewed, and it was
found that the only ones existing in the literature are ones for musical sound or
speech.
Chapter 3 presented a review of chaos theory and fractal geometry with the aim of
providing a basis to the theory used in later chapters. The first issue considered was
that of the significance of chaos and fractals. Chaos describes a class of dynamical
behaviour exhibited by nonlinear systems. It is characterised by two main features: it
may be complex despite the system being simple; and it is unpredictable despite being
governed by deterministic rules. Fractals are geometric objects not encompassed by
traditional, Euclidean geometry. They exhibit self-similarity and have space-filling
properties unlike Euclidean objects, which is reflected in their having non-integer
dimensions. Of great significance is that chaos and fractals provide effective models
for many naturally occurring phenomena. These phenomena are typically complex and
irregular and have not, until now, been understood nor readily modelled in other ways.
A theoretical framework for chaos was developed by applying geometry to the
state space of a dynamical system. This introduces the central concept of the strange
attractor. The importance of the strange attractor is that it is a geometric embodiment
of chaotic behaviour and is itself a fractal object. The special dynamical properties of
chaotic systems are then understood to be related to the unusual geometric properties
of the fractal strange attractor. This treatment was extended by considering the
description of the statistical behaviour of chaotic dynamics with an invariant measure
whose support set is the strange attractor in state space.
Also introduced in Chapter 3 were Barnsley's Iterated Function Systems (IFS). It
was discussed how IFS provide a well understood framework for manipulating
complex fractal strange attractors with simple systems that have several advantages.
For example, they are robust for computer implementation and have already been
shown to effectively model natural images. It was also shown how the three
equivalent views of an IFS, those of contraction mappings, Shift Dynamical Systems
(SDS) and Random Iteration Algorithms (RIA), unite fractal geometry, chaotic
dynamics and Markov processes. Other features of chaos theory were then reviewed
including bifurcation, attractor visualisation, and fractal dimension.
Chapter 4 presented a discussion of the idea of applying chaos and fractals to the
problem of modelling sound. It was argued that chaos and fractals appear ideal for use
in sound models because their properties coincide with the main functional elements
required of a sound model. That is, they can represent naturally occurring phenomena,
and generate appealing abstract forms with simple systems requiring few data
parameters. This idea prompted two specific questions. Firstly, is there any evidence
for a connection between naturally occurring sound and chaos or fractals? Secondly,
how can sound be represented with chaos or fractals? In answer to the first question,
several pieces of positive evidence were presented which suggest that such a
connection does exist. The evidence includes the existence of bifurcation sequences in
musical instruments such as woodwinds and gongs; the determination of fractal
dimensions for speech sound waveforms; and the fact that wind noise and roomtones
are examples of 1/f noise and are therefore statistically self-affine, or fractal, signals.
In response to the second question, it was suggested that a sound may be
represented in either of two ways with a strange attractor: by representing the
dynamics of a sound and therefore assuming it is the product of a chaotic system; or
by representing the static waveform of the sound and therefore assuming that the
waveform is a fractal object. It is these suggestions that form the main concern of this
work and were the subject of investigation of the rest of the thesis. Chapters 5, 6, 7 and
8 presented four different experimental investigations and contain original
contributions towards the solution of the problem.
Chapter 5 presented a variety of experiments with a synthesis-only technique that
allows strange attractors to be turned into complex abstract sound waveforms. The
sounds are generated from simple systems that require small amounts of parameter
data. The model is based on Barnsley's Fractal Interpolation Functions (FIF) which are
a class of IFS attractors that are self-affine waveforms. The chapter began by
reviewing the theory of FIFs and developing an algorithm suitable for generating
digital audio. This algorithm was then used with a variety of input parameter sets to
gain an understanding of the nature of the model. The most significant result of this
chapter was that the FIF model can be used to generate a new class of sounds which
have both rhythm and timbre. The sounds are interesting, unusual and unlike those
produced with conventional synthesis techniques. They are considered to have
potential use for computer music composition. They are novel because the waveforms
are composed of patterns that are common to a range of time scales. The same
patterns are then perceived differently as both rhythm and timbre. Any modification to
the input parameters effects a change to both the nature of the rhythm and the quality
of the sound.
The FIF model was then incorporated into a more elaborate scheme where the
input parameters are controlled by a genetic model. The user then acts as 'artificial
selection' and can evolve sounds by accumulating small changes due to the
recombination and mutation of the parameters. This was found to be an effective and
compelling way of exploring the space of FIF sounds.
It was concluded that the products of the FIF model demonstrate that there is an
acoustic equivalent to the abstract fractal images such as the Julia set. That is,
appealing abstract fractal sounds can be readily generated from simple systems.
Chapter 6 continued with an investigation of FIFs, but as a means to represent
naturally occurring sound as part of an analysis/synthesis model. It was therefore an
investigation of the FIF inverse problem; given an original sound time series, find a
set of FIF parameters that specify an FIF that is similar to the original. Initial
investigations into this problem using interpolation points derived from the original
time series led to the conclusion that the problem is a difficult one and a systematic
approach is required.
The rest of the chapter presented an investigation into work by David Mazel found
in the literature. Mazel presents a number of FIF-based models for time series and
associated analysis algorithms which he claims offer solutions to the inverse problem.
An analysis of Mazel's results was conducted by comparing the
degradation/compression performance with that expected for simple amplitude
requantisation. It was found, however, that the performance of Mazel's models/inverse
algorithms are not significantly better than that of requantisation. To confirm this
finding, one of his inverse algorithms was reimplemented for use with sound time
series. The algorithm used is that of the self-affine model and was chosen because it is
directly applicable to the FIF model used in this thesis. Also, Mazel only presents one
result for this algorithm, which is the best of all his results. It was therefore decided
that more results are needed to reach a conclusion on the ability of the algorithm. The
reimplementation of the algorithm with a number of sounds as the original time series
yielded poor results. Inspection of the operation of the algorithm, however, led to a
proposition as to why the results were poor. The proposition was tested by modifying
the algorithm to counteract the problem. Some of the results from the modified
algorithm were then found to be significantly better than those of the unmodified
algorithm, Mazel's other models/algorithms, and amplitude requantisation. The best
results were found to occur for those sounds that have 1/f power spectral densities
such as wind noise and roomtones. It was concluded that this is a satisfying result as it
shows the model is exploiting the fractal redundancy of the statistically self-similar 1/f
noises.
Chapter 7 presented an account of the main approach taken in this thesis to the
problem of modelling sound with strange attractors. It is the other approach proposed
in Chapter 4 where the dynamics of a sound are represented by a strange attractor and
associated invariant measure. The chapter began by presenting the necessary
theoretical framework of how a time series obtained from observing a chaotic system
relates to that system. This involves the theory of embedding which shows how an
embedded system which shadows the original system may be constructed from a set of
time-delayed values of the observed time series. Using the theory of embedding, an
analysis/synthesis model was proposed where the dynamics of the sound are modelled
in embedded state space by a synthetic chaotic system whose attractor and associated
measure match those of the embedded system. It was argued that modelling the
embedded attractor and measure is equivalent to modelling the jpdf of the original
time series. The dynamics of a sound can therefore be modelled by finding a solution
to the inverse problem. This is: from the embedded sound time series, find a synthetic
system mapping that defines an attractor and measure that are similar to the embedded
ones.
A novel solution to this inverse problem was then proposed based on work found
in the literature on time series prediction. It involves partitioning embedded state
space and finding a locally linear function for each partition such that a nonlinear
synthetic system mapping is defined. This procedure forms the analysis half of the
model and iteration of the piece-wise linear synthetic system mapping then forms the
synthesis half. The linear functions are fitted using a least squares procedure that
minimises the function's prediction error with reference to an embedded extract of the
original time series.
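A minimal sketch of this analysis/synthesis scheme is given below. The nearest-centroid partition used here is purely illustrative and stands in for whatever partitioning of embedded state space is actually adopted; the toy data is an exactly linear system so that the fit can be verified.

```python
import numpy as np

def fit_piecewise_linear(states, nexts, centroids):
    """Analysis: for each partition (nearest centroid), least-squares fit
    an affine map A s + b minimising the one-step prediction error."""
    labels = ((states[:, None, :] - centroids[None, :, :]) ** 2).sum(-1).argmin(1)
    maps = []
    for k in range(len(centroids)):
        S, Y = states[labels == k], nexts[labels == k]
        X = np.column_stack([S, np.ones(len(S))])    # affine design matrix
        coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
        maps.append(coef)
    return maps

def synthesise(s0, centroids, maps, n):
    """Synthesis: iterate the fitted piecewise-linear map from state s0,
    reading one coordinate off as the output signal."""
    s, out = np.asarray(s0, float), []
    for _ in range(n):
        k = ((centroids - s) ** 2).sum(-1).argmin()
        s = np.append(s, 1.0) @ maps[k]
        out.append(s[0])
    return np.array(out)

# Toy demonstration on an exactly linear system (a 2-D rotation/contraction):
R = np.array([[0.9, -0.3], [0.3, 0.9]])
states = np.random.default_rng(0).standard_normal((200, 2))
maps = fit_piecewise_linear(states, states @ R.T, np.zeros((1, 2)))
signal = synthesise([1.0, 0.0], np.zeros((1, 2)), maps, 50)
```

With many partitions, the same least-squares machinery yields a genuinely nonlinear piece-wise linear mapping whose iteration constitutes the synthesis half of the model.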
The model was tested with a time series derived from the numerical integration of
the Lorenz chaotic system. It was found that, for a range of analysis parameters, the
synthetic system possesses a strange attractor that is very similar to that of the
embedded system derived from the original time series. Trials with ranges of analysis
parameters led to an understanding of the nature of the analysis scheme and showed
that the degree of similarity between the embedded attractor and the synthetic version
relates to the size of the prediction error. Having confirmed that the model can work
with simulated chaotic data, it was then tried with a range of naturally occurring sound
time series. These were chosen for their steady-state qualities and the belief that they
are generated by nonlinear dynamical systems. It was found that the scheme is capable
of modelling the sound of 'air noise' to a high degree of perceived similarity. This is
considered to be the first demonstration that a chaotic system is capable of modelling
sound by representing its dynamics with a strange attractor. A similarly good result
was also obtained for a tuba sound.
The results obtained using other sounds, although not so good at preserving the
perceived qualities of the original, were shown to preserve other features such as
embedded attractor shape, power spectral density and amplitude pdf. This was found
for the sounds of the wind, a gong, and a saxophone. It was found that the relative
performance for the different sounds was reflected by the relative values of the one-
step prediction errors. The computational complexity and size of the model were
investigated showing that the model is simple to implement, but requires large
amounts of parameter data. It was suggested that further work is required on the
optimisation of the analysis procedure with respect to the size of the model.
It was concluded that the results obtained with this model are good enough to
confirm the feasibility of the approach and to warrant further investigation. The strong
link between the model and Barnsley's IFS was also discussed. A number of further
experiments and immediate improvements were suggested as well as ideas for a
longer-term strategy.
Chapter 8 presented the final experimental investigation which is an approach to
the solution of the roomtone problem based on the theoretical framework of Chapter
7. Instead of modelling the embedded attractor and measure of an original time series
with a deterministic nonlinear dynamical system, a Markov model was proposed. This
idea came from a consideration of the equivalence between the deterministic SDS
version of an IFS and the RIA version, which defines a Markov process. The inverse
problem for the RIA version of an IFS was considered and a two stage solution
proposed. The first stage involves determining a Markov chain from a given
realisation of a stationary process, then the second stage involves obtaining the
mappings and associated probabilities for the RIA of an IFS from the Markov
transition matrix.
The Poetry Generation Algorithm, found in the literature, was then presented
which allows unlimited quantities of synthetic text to be generated in the style of some
given original passage. It was argued that this algorithm, if modified to work with
digital audio, presents a solution to the roomtone problem. It was then shown that the
PGA works by determining a Markov chain that operates on the set of embedded
sequences of the original text sequence and that possesses an invariant measure. The
PGA therefore also presents a solution to the first half of the proposed RIA inverse
algorithm.
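The mechanism can be sketched on a character sequence. This is a toy illustration of the idea only; the implementation details of the PGA in the literature and in the thesis differ, particularly once it is adapted to digital audio.

```python
import random

def pga(source, k, n, seed=0):
    """Poetry Generation Algorithm sketch: at each step, find every
    occurrence of the current length-k context in the source and emit the
    symbol following a randomly chosen occurrence. This realises a Markov
    chain on the embedded length-k subsequences of the source."""
    rng = random.Random(seed)
    out = source[:k]                    # seed with an initial context
    for _ in range(n):
        context = out[-k:]
        followers = [source[i + k] for i in range(len(source) - k)
                     if source[i:i + k] == context]
        if not followers:               # context occurs only at the very end
            break
        out += rng.choice(followers)
    return out

text = "the cat sat on the mat and the man sat on the cat"
synthetic = pga(text, k=4, n=80)
```

By construction, every length-(k+1) window of the output also occurs somewhere in the source, which is exactly the sense in which the synthetic sequence is "in the style of" the original.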
An implementation of the PGA was then presented that works with digital audio.
Some modifications to the original algorithm were made so as to produce results in an
acceptable time and using acceptable amounts of computer memory. Results were
then presented for the PGA used with a variety of roomtone time-series. It was found
that the PGA provides a solution to the roomtone problem for certain types of sound.
That is, unlimited quantities of a sound may be generated from less than a second of
an original such that the synthetic version sounds almost indistinguishable from the
original. It was concluded that the PGA works well when the original sound is steady-
state, and therefore presumed to be stationary. Good results were also presented for
other background sounds such as that of a river and that of audience applause.
My overall conclusion to this work is that chaos theory and fractal geometry
provide a rich source of ideas and techniques to be applied to the topic of modelling
sound with a computer for creative uses. Chaos and fractals are, in their own right,
compelling and inspiring subjects that have great intuitive appeal. It is also easy to see
how they might relate to a wide range of everyday complex phenomena. When
beginning this work, I felt intuitively that there must be a strong relationship between
chaos/fractals and naturally occurring sound. This intuition was a product of the
cogent nature of chaos theory and of observations of the complex and irregular sounds
that occur in nature that are not musical or speech sounds. The intuition was also that
the development of sound models has paralleled the conventional treatment of
dynamics in science and engineering. That is, complex, irregular and nonlinear
phenomena have received less attention than simple, regular, linear phenomena for the
reason that the theory and models of the latter are well developed, while for the former
they have been less well understood. Hence the emphasis on modelling regular
musical sound with linear systems and using Fourier theory. While this approach has
been very successful, its inadequacies are well described by the quote from Iannis
Xenakis given in the Introduction. It was therefore hoped that applying chaos and
fractals to sound might provide new models and techniques that would complement
existing ones, but that would be nonlinear and concern complex and irregular sound.
Given the results of this thesis, my conclusion is that the original intuition was
correct and that I have initiated a number of useful models and techniques. Some of
these are ready for use, for example the FIF synthesis technique and the roomtone
model, while others provide material for further research, such as the predictive
chaotic model. The results on which this conclusion is based may be further
summarised as follows. Firstly, the assembled evidence indicates that there are a
number of strong connections between naturally occurring sound and chaos and
fractals. Sound made from the complex strange attractors of very simple systems
(FIFs) can have interesting and musically useful properties. Sound waveforms with
statistically self-affine properties may be compressed by representing them with
fractal waveforms (which are again FIFs). The dynamics of naturally occurring chaotic
sound, such as 'air noise', may be modelled via a strange attractor with a synthetic
chaotic system such that the perceived qualities of the original are preserved.
Although the chaotic system requires many parameters, it is very simple to implement,
and there are good reasons to expect that the number of parameters can be reduced. In
the case where the sound is predominantly regular, but which includes irregularities
(the tuba sound), the chaotic model may be made very simple. Steady-state ambient
noises may be convincingly modelled with a stochastic system that has strong
similarities to an alternative view of a chaotic system (the RIA variant of an IFS). It is
expected that this model may be developed to exploit this connection.
While these results satisfy many of the questions posed at the outset, it is felt that
there are considerable limitations to the models developed. These limitations concern
the type of sound signals that the models are suited to. For example, the FIF model
represents signals that have exactly self-affine waveforms while the predictive chaotic
model and the PG algorithm assume stationary behaviour with invariant measures in
state space. Although these models have been shown to be capable of representing
sounds, these sounds form a limited subset of those occurring naturally. It is unlikely,
for example, that a sound will be exactly self-affine; it is more likely to be
statistically self-affine. There are, however, many other fractal models that may be
used. There are several partially self-affine variants of the FIF model and a range of
statistical models that have been used for image modelling.
Also, sound is typically not steady-state and stationary as required by the
predictive chaotic model. It is more likely to be time-varying or to be a discrete event
with transient behaviour. In fact it is these time-varying qualities of sound that are of
enormous perceptual significance, a fact widely acknowledged by computer
musicians. Again, however, there are many ways in which the chaotic model may be
developed to represent time-varying behaviour. One idea has already been mentioned
in the Conclusions of Chapter 7 of developing meta-dynamical systems in which
nonlinear dynamics are responsible for behaviour on several time scales. There are also
many other nonlinear models which feature in what has become known collectively as
'complexity theory' that could be investigated. One example is that of cellular
automata (CA) which are nonlinear dynamical systems capable of many types of
behaviour. One type is known as 'class IV' behaviour which is poised between order
and chaos and has been described as behaviour having effectively very long transients.
Again, observing this type of behaviour, the intuition is that class IV CA may be
capable of capturing the dynamics of natural sound.
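For concreteness, an elementary one-dimensional CA such as rule 110, which is commonly cited as exhibiting class IV behaviour, can be sketched in a few lines. This is an illustration of the CA idea only, not something implemented or investigated in this thesis.

```python
def step(cells, rule=110):
    """One synchronous update of an elementary cellular automaton on a
    ring: each cell's next state is the bit of `rule` indexed by its
    3-cell neighbourhood (left, self, right)."""
    n = len(cells)
    return [(rule >> (cells[(i - 1) % n] * 4 + cells[i] * 2
                      + cells[(i + 1) % n])) & 1
            for i in range(n)]

# Grow a pattern from a single seed cell:
cells = [0] * 64
cells[32] = 1
history = [cells]
for _ in range(20):
    cells = step(cells)
    history.append(cells)
```

Printing the rows of `history` shows the characteristic mixture of regular and irregular structure; the long-lived propagating structures in such patterns are what motivates the intuition that class IV dynamics might capture the transients of natural sound.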
The conclusion drawn, then, is that this work has provided an initial investigation
into the idea of using nonlinear dynamics to model sound with enough positive results
to warrant further investigation. This work has concentrated on several types of
nonlinear model that are strictly fractal or strictly chaotic. What is then needed are
developments which modify the models to be better suited to the overall dynamics of
naturally occurring sound. For example, instead of choosing an unvarying part of a
gong sound and trying to fit a steady-state chaotic model to it, it would be better to
have a model that is based on transient nonlinear dynamics.
In any case, another general conclusion, and one that is expected to apply quite
widely, is that the problem in hand is a difficult one. The desire, in general,
is to have a simple, compact, manageable model for a complex sound. This therefore
involves some kind of solution to the 'inverse problem': finding a way of going from a
complex data set to a simple algorithmic description of it.
Intuitively, this seems like a difficult thing to do. Experience has shown that there is
no problem in producing complex data sets from simple nonlinear systems, but that to
fit a simple system to any given, naturally occurring data set is difficult. It has been
suggested that the reason for this has something to do with algorithmic complexity
[mant92].
The algorithmic complexity of a set is a measure of its information content in the
context of its generation - see [chai88]. Specifically, it is the length of the shortest
program that, when executed on a Universal Turing Machine, generates the original
set. Typically, both the original set and the program code are represented as binary
strings. Their information content, in bits, is then simply their respective lengths. For a
truly random binary sequence, which has no redundancy, each of the digits has to be
stored explicitly in the program as there is no other means of generating them. The
algorithmic complexity is therefore approximately equal to the length of the original
sequence. At the other extreme, a sequence that is made up of all 1's, and therefore has
high redundancy, may be represented by a simple program containing a loop. The size
of this program will therefore be much smaller than the size of the original sequence.
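Although a general-purpose compressor is only a crude, computable stand-in for the length of the shortest Turing-machine program, it makes the two extremes concrete:

```python
import os
import zlib

# Compressed length as a computable upper bound on (an approximation to)
# algorithmic complexity: the all-ones string has a tiny description,
# while a (pseudo)random string is essentially incompressible.
# zlib is only a proxy; true algorithmic complexity is uncomputable.
ones = b"\x01" * 100_000
rand = os.urandom(100_000)

ones_len = len(zlib.compress(ones))
rand_len = len(zlib.compress(rand))
```

Here `ones_len` comes out at a few hundred bytes at most, while `rand_len` remains close to the original 100,000 bytes.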
Take, for example, the inverse problem for IFS which can be seen in relation to
the concept of algorithmic complexity. The solution of the inverse problem requires a
means of computing, or at least approximating, the algorithmic complexity of a given
set. That is, consider that the IFS generation algorithm and associated parameters form
the program code. Running this program then specifies a complex IFS attractor. The
inverse problem is to find the fewest IFS parameters to specify an attractor of a
desired form. The amount of data required to store the model and the parameters is
then an approximation to the algorithmic complexity of the set being modelled.
The difficulty, however, is that it is not possible, in general, to obtain the
algorithmic complexity of some given set [ford86, mant89]. This is because the
problem is undecidable - it is equivalent to the Turing halting problem, a form of
Gödel's Incompleteness Theorem - see [hofs79]. That is, it is not possible to tell in
advance whether a program for computing the algorithmic complexity will eventually
stop or not. The only possible option is to try the program and see. This may explain
why it is difficult to find good solutions to inverse problems. A good solution will
always imply that the algorithmic complexity is being calculated, which is inherently
problematical. This difficulty, however, concerns the general inverse problem.
In practice, a solution is typically sought for a specific case where a certain type of
redundancy is being exploited by the model. There is no doubt that the redundancies
implied by chaos theory and fractal geometry, as well as occurring in a variety of
natural phenomena, are relevant to the problem of modelling sound.
On the basis of the results outlined in this thesis, I feel that applying chaos theory
and fractal geometry to sound modelling has considerable future potential. I imagine
that it may provide a basis for modelling irregular sound in much the same way as
linear theory has for the regular case. Further, powerful, computer-based tools and
techniques could be developed for anyone interested in the creative use of sound.