Proc. of the 11th Int. Conference on Digital Audio Effects (DAFx-08), Espoo, Finland, September 1-4, 2008

ABSTRACT

An overview of studies dealing with onset detection and tempo extraction is reformulated under the new conceptual and computational framework defined by MIRtoolbox [1]. Each approach can be specified as a flowchart of general high-level signal-processing operators that can be tuned along diverse options. This framework encourages more advanced combinations of the different approaches and offers the possibility of comparing multiple approaches under a single optimized flowchart. Besides, a composite model explaining pulse clarity judgments is decomposed into a set of independent factors related to various musical dimensions. To evaluate the pulse clarity model, 25 participants rated the pulse clarity of one hundred excerpts from movie soundtracks. The mapping between the model predictions and the ratings was carried out via regressions.

* This work has been supported by the European Commission (NEST project "Tuning the Brain for Music", code 028570) and by the Academy of Finland (project number 119959).

1. INTRODUCTION

MIRtoolbox is a Matlab toolbox offering an extensive set of signal-processing operators and musical feature extractors [1]. The objective is to design a tool capable of analyzing a large range of musical dimensions from extensive sets of audio files. The first public version, released last year, contains the core of the framework, enabling a broad overview of the musical dimensions investigated in computational music analysis. The aim of current research is mainly to improve the set of tools by integrating a large range of approaches currently discussed in the research community.

This paper focuses on the joint questions of onset extraction and tempo estimation. A synthetic overview of studies in this domain is reformulated using the operators defined in MIRtoolbox. Section 2 shows various methods to compute the onset detection curve, and section 3 deals with the description of that curve, in particular the detection of the onsets themselves. The estimation of tempo from the onset curve is dealt with in section 4. Throughout this review, each approach is modeled as a flowchart of general high-level signal-processing operators available in MIRtoolbox, with multiple options and parameters to be tuned accordingly. This framework encourages more advanced combinations of the different approaches and offers the possibility of comparing multiple approaches under a single optimized flowchart.

In section 5, a composite model explaining pulse clarity judgments is decomposed into a set of independent factors related to various musical dimensions. To evaluate the pulse clarity model, 25 participants rated the pulse clarity of one hundred excerpts from movie soundtracks. The mapping between the model predictions and the ratings, discussed in section 6, was carried out via regressions.

2. COMPUTING THE ONSET DETECTION FUNCTION

2.1. Preprocessing

First of all, the audio signal is loaded from a file:

a = miraudio('myfile.wav') (1)

The audio signal can be segmented into characteristic and similar regions based on novelty [2] by calling the mirsegment operator [1]:

a = mirsegment(a) (2)

When the tempo is supposed to remain stable within each segment [3], command (2) automatically ensures that the tempo will be computed for each segment separately.

2.2. Filterbank decomposition

The estimation of the onset positions generally requires a decomposition of the audio waveform along particular frequency regions. The simplest method consists in discarding the high-frequency components by filtering the signal with a narrow bandpass filter [4]:

a = mirfilter(a,'Scale',50,20000) (3)

More subtle models require a multi-channel decomposition of the signal mimicking auditory processes. This can be done through filterbank decomposition [5, 6]:

b = mirfilterbank(a,'CriticalBand','Scale',44,18000) (4)

where more precise specifications can optionally be indicated. Alternatively, the decomposition can be performed via a time-frequency representation computed through the STFT [7, 3]:

s = mirspectrum(a,'Frame','FFT','WinLength',.023,'s','Hop',50,'%') (5)
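The framing arithmetic behind command (5) can be illustrated in plain Python. This is only a sketch of the windowing and a naive DFT, not MIRtoolbox code; the helper names `frame_signal` and `magnitude_spectrum` are assumptions of this sketch.

```python
import cmath, math

def frame_signal(x, sr, win_len_s=0.023, hop_ratio=0.5):
    """Split x into 23 ms frames by default, hopping by half a frame (50% hop)."""
    n = max(1, int(round(win_len_s * sr)))
    hop = max(1, int(round(n * hop_ratio)))
    return [x[i:i + n] for i in range(0, len(x) - n + 1, hop)]

def magnitude_spectrum(frame):
    """Naive DFT magnitudes (fine for short illustrative frames)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]
```

Applying `magnitude_spectrum` to each frame returned by `frame_signal` yields the time-frequency representation that the onset models below operate on.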
2.3.1. Envelope extraction

The description of this temporal evolution of energy results from an envelope extraction, basically through rectification (or squaring) of the signal, low-pass filtering, and finally downsampling, using the following command:

od = mirenvelope(x) (10)

where x can be either the undecomposed audio signal a, the filterbank-decomposed b (see footnote 2), or the middle-frequency band m.

Further refinement enables an improvement of the peak picking: first the logarithm of the signal is computed [5] (see footnote 3):

od = mirenvelope(od,'Log') (11)

and the result is differentiated and half-wave rectified (see footnote 4):

od = mirenvelope(od,'Diff','HWR') (12)

Some approaches advocate the use of a smoothed FIR differentiator instead, based on exponential weighting (see footnote 5) [8] (available as a parameter 'Smooth' of the 'Diff' option).

The contrast between successive frames is observed through differentiation, leading to a spectral flux:

od = mirflux(s) (17)

where diverse distances can be specified using the 'Dist' parameter, such as the L1-norm [12] or the L2-norm [8]. Components contributing to a decrease of energy can be ignored [8] using the 'Inc' option. Instead of simple differentiation, an FIR filter differentiator [13] can be specified. Each distance between successive frames can be normalized by the total energy of the first frame ('Norm' option) in order to ensure a better adaptiveness to volume variability [8]. Besides, the computation can be performed in the complex domain ('Complex' option) in order to include the phase information [14].

The novelty curve designed for musical segmentation, as mentioned in section 2.1, can actually be considered as a more refined way of evaluating the distance between frames [15]. We notice in particular that the use of novelty on multi-pitch extraction results [16] leads to particularly good results when estimating onsets from violin soli (see Figures 1-4).

f = mirpitch(a,'Frame') (18)
od = mirnovelty(f) (19)

2.5. Post-processing

If necessary, the onset detection function can be smoothed through low-pass filtering [15]:

od = smooth(od) (20)

In order to adapt further computations (such as peak picking or periodicity estimation) to the local context, the onset detection curve can be detrended by removing the median [17, 13, 15]:

od = detrend(od,'Median') (21)

1. This second call of the command mirspectrum does not mean that a second FFT is computed. It just indicates a further operation on a mirspectrum object already computed.
2. It should be mentioned that if x is a mirspectrum object, command (10) should include the 'Band' keyword in order to specify that the envelope should be computed along bands, and not from the spectrum decomposition in each frame.
3. A µ-law compression [7] can be specified as well, using the 'Mu' option.
4. A weighted average of the original envelope and its differentiated version [7] can be obtained using the 'Weight' option.
5. The logarithmic transformation might prevent this loss of information, though.
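The envelope-based chain of commands (10)-(12) can be sketched in plain Python. This is a rough illustrative analogue, not MIRtoolbox code; the helper names, the one-pole smoothing constant, and the decimation factor are assumptions of this sketch.

```python
import math

def envelope(x, alpha=0.99, decim=10):
    """Rectify, low-pass with a one-pole filter, keep every decim-th sample."""
    env, out = 0.0, []
    for i, v in enumerate(x):
        env = alpha * env + (1 - alpha) * abs(v)  # rectification + low-pass, cf. (10)
        if i % decim == 0:
            out.append(env)                       # downsampling
    return out

def onset_curve(env, eps=1e-6):
    """Log compression, differentiation, half-wave rectification."""
    logged = [math.log(e + eps) for e in env]     # cf. command (11)
    diffs = [b - a for a, b in zip(logged, logged[1:])]
    return [max(d, 0.0) for d in diffs]           # 'Diff' + 'HWR', cf. command (12)
```

On a signal that jumps from silence to full amplitude, the resulting curve peaks at the transition, which is exactly what the peak picking of section 3 exploits.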
Figure 2: Frame-decomposed generalized and enhanced autocorrelation function [16] used for the multi-pitch extraction (pitch vs. temporal location of events, in s).

Figure 3: Similarity matrix (temporal location of frame centers vs. temporal location of frame centers, in s).

Figure 4: Novelty curve estimated along the diagonal of the similarity matrix [2], and onset detection (circles) featuring one false positive (the second onset) and one false negative (around time t = 12.5 s).
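The novelty computation along the diagonal of the similarity matrix can be sketched as follows. This is an illustrative plain-Python reduction in the spirit of [2] (cosine similarity between feature frames, a checkerboard kernel slid along the diagonal), not the mirnovelty implementation; the kernel half-width is an assumption of this sketch.

```python
def novelty(frames, half=2):
    """Novelty score per frame: high where past and future frames differ."""
    def sim(a, b):  # cosine similarity between two feature vectors
        num = sum(x * y for x, y in zip(a, b))
        den = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
        return num / den if den else 0.0
    n = len(frames)
    s = [[sim(frames[i], frames[j]) for j in range(n)] for i in range(n)]
    out = []
    for t in range(n):
        score = 0.0
        for i in range(-half, half):
            for j in range(-half, half):
                if 0 <= t + i < n and 0 <= t + j < n:
                    # checkerboard kernel: same-side quadrants count positively,
                    # cross (past vs. future) quadrants negatively
                    sign = 1.0 if (i < 0) == (j < 0) else -1.0
                    score += sign * s[t + i][t + j]
        out.append(score)
    return out
```

On two homogeneous segments back to back, the curve peaks exactly at the boundary, which is where the onset detection circles of Figure 4 would fall.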
3.3. Attack characterization

If the note onset temporal position is estimated using an energy-based strategy (section 2.3), some characteristics related to the attack phase can be assessed as well.

If the note onset positions are found at local maxima of the energy curve (the amplitude envelope or RMS in particular), they can be considered as the ending positions of the related attack phases. A complete determination of the attack therefore requires an estimation of its starting position, through the extraction of the preceding local minimum using an appropriately smoothed version of the energy curve. Figure 5 shows the output of the command:

at = mironsets('ragtime.wav','Attacks') (28)

The characteristics of the attack phase can then be its duration or its mean slope [20]. Figure 6 shows the output of the command:

as = mirattackslope(at) (29)

p = entropy(ph,'Lag') (34)

An emphasis towards the best perceived periodicities can be obtained by multiplying the autocorrelation function (or the spectrum) with a resonance curve [23, 10] ('Resonance' option).

4.1.3. Comb filters

Another strategy commonly used for periodicity estimation is based on a bank of comb filters [6, 7]:

ph = mirspectrum(od,'Comb') (35)

6. More subtle combination processes have been proposed [5], based on detailed auditory modeling, but they are not integrated in the toolbox yet.
7. Following our discussion initiated in footnote 2, the 'Band' option is explicitly mentioned here, as the fluctuation pattern is usually computed from a time-frequency representation. The 'Band' keyword will not be mentioned in the following commands for clarity's sake.
8. In MIRtoolbox 1.0, mirspectrum was strictly related to the FFT, whereas mirautocor was related to the autocorrelation function. In the new version, mirspectrum should be understood as a general representation of the energy distribution along frequencies, implemented by various methods.
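The comb-filter idea behind command (35) can be sketched in plain Python: each filter y[t] = x[t] + g * y[t - L] resonates when its lag L matches the period of the onset curve, so the lag with the highest output energy estimates the pulse period. The helper names, gain, and lag grid are assumptions of this illustrative sketch, not MIRtoolbox internals.

```python
def comb_energy(x, lag, gain=0.8):
    """Output energy of one comb filter of the bank, applied to curve x."""
    y = [0.0] * len(x)
    for t in range(len(x)):
        y[t] = x[t] + (gain * y[t - lag] if t >= lag else 0.0)
    return sum(v * v for v in y)

def best_lag(x, lags):
    """Lag whose comb filter resonates most strongly with x."""
    return max(lags, key=lambda L: comb_energy(x, L))
```

Sub- and super-multiples of the true period also resonate, but more weakly, which is why the harmonic-relation tests discussed in section 4.2 remain useful after this stage.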
Figure 5: amplitude as a function of time (s).

Figure 6: Attack Slope: coefficient value at the temporal location of the events (in s).
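The attack characterization of section 3.3 can be sketched in plain Python: a local maximum of the energy curve marks the end of an attack, the preceding local minimum its start, and the mean slope is the amplitude rise over the attack duration. The helper name and the treatment of plateaus are assumptions of this sketch, not the mirattackslope implementation.

```python
def attack_slope(env, dt=1.0):
    """Return (start, end, mean slope) for the strongest attack in env."""
    # the attack end is the highest local maximum of the energy curve
    peak = max(range(1, len(env) - 1),
               key=lambda i: env[i] if env[i - 1] <= env[i] >= env[i + 1]
               else float('-inf'))
    start = peak
    while start > 0 and env[start - 1] <= env[start]:
        start -= 1  # walk back to the preceding local minimum (plateaus included)
    dur = (peak - start) * dt
    return start, peak, (env[peak] - env[start]) / dur if dur else 0.0
```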
4.1.4. Event-wise

Periodicities can also be estimated from actual onset positions, either detected from the onset detection curve (o, computed in section 3.2), or from onset dates read from MIDI files:

o = mironsets('myfile.mid') (36)

The periodicities can be displayed with a histogram showing the distribution of all the possible inter-onset intervals [24]:

h = mirhisto(mirioi(o,'All')) (37)

which can be represented in the frequency domain:

p = mirspectrum(h) (38)

Alternatively, the MIDI file can be transformed into an onset detection curve by summing Gaussian kernels located at the onset points of each note [23]. The onset detection curve can then be fed to the same analyses as for audio files, as presented at the beginning of this section.

4.2. Peak picking

The previous paragraphs gave an overview of diverse methods for the estimation of rhythmic periodicity: FFT, autocorrelation function, comb filter output, histogram. Following the unifying view encouraged in the MIRtoolbox framework, all these diverse representations can be considered as one single periodicity spectrum p, which can be further analyzed as follows.

The periodicity estimations on separate bands can be summed before the peak picking:

p = sum(p,'Band') (39)

The main pulse can be estimated by extracting the global maximum in the spectrum:

mp = peak(p) (40)

The summation of the peaks across bands can also be performed after the peak picking [25], with a clustering of the close peaks and summation of the clusters:

mp = sum(mp,'Band','Tolerance',.02,'s') (41)

More refined tempo estimations are available as well. For instance, three peaks can be collected for each periodicity spectrum, and if a multiplicity is found between their lags, the fundamental is selected [13]. Similarly, harmonics of a series of candidate lag values can be searched in the autocorrelation function [10].

Finally, the peaks in the autocorrelation can be converted into BPM using the mirtempo operator:

t = mirtempo(mp) (42)

5. MODELING PULSE CLARITY

The computations developed in the previous sections help to offer a description of the metrical content of a musical work in terms of tempo. But further analyses may produce additional important information related to rhythm. In particular, one important way of describing musical genres and particular works relates to the amount of pulsation, more precisely to the clarity of its expression. The understanding of pulse clarity may yield new ways to improve automated genre classification in particular.

5.1. Previous work

At least one previous work has studied this dimension [26], termed beat strength. The proposed solution is based on the computation
of the autocorrelation function of the onset detection curve decomposed into frames:

p = mirspectrum(o,'Autocor','Frame') (43)

The three best periodicities are then extracted [11]. These periodicities, or more precisely their related autocorrelation coefficients, are collected into a histogram:

h = mirhisto(t) (45)

From the histogram, two estimations of beat strength are proposed; the SUM measure sums all the bins of the histogram.

5.2. Statistical description of the autocorrelation curve

For that purpose, the analysis focuses on the autocorrelation function p itself, as defined in equation (43), and tries to extract from it any information related to the dominance of the pulsation. The most evident descriptor is the amplitude of the main peak, hence the global maximum of the curve:

MAX = max(p) (48)

It seems that the global minimum is usually (inversely) related to the importance of the main pulsation.

5.3. Harmonic relations between pulsations

The clarity of a pulse seems to decrease if pulsations with no harmonic relations coexist. We propose to formalize this idea as follows. First, a certain number of peaks are selected from the autocorrelation curve p. Let the list of peak lags be P = {l_i}, i in [0, N], and let the first peak l_0 be the one considered as the main pulsation, as determined in paragraph 4.2. The list of peak amplitudes is {p(l_i)}, i in [0, N].

A peak is considered inharmonic if the remainder of the Euclidean division of its lag by the lag of the main peak (and of the inverted division as well) is significantly high. This defines the set of inharmonic peaks H.

Other descriptors have been added that do not relate directly to the periodicity of the pulses, but indicate factors of energy variability that could contribute to the perception of a clear pulsation. Some factors defined in section 3 have been included:

• the articulation ARTI, based on the Average Silence Ratio (24),
• the attack slope ATAK (3.3).

Finally, a variability factor VAR sums the amplitude differences between successive local extrema of the onset detection curve.

The whole flowchart of operators required for the estimation of the pulse clarity factors is indicated in Figure 7.
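The harmonic-relation test of section 5.3 can be sketched in plain Python: a peak lag is flagged inharmonic when neither the remainder of dividing it by the main lag, nor that of the inverted division, is close to zero (or to a full period). The helper name and the tolerance value are assumptions of this sketch; the paper does not specify a threshold.

```python
def inharmonic_peaks(lags, tol=0.15):
    """Return the lags not in near-integer ratio with the main (first) lag."""
    def near_integer_ratio(a, b):
        r = (a % b) / b          # normalized remainder of the Euclidean division
        return min(r, 1.0 - r) < tol
    l0 = lags[0]
    return [l for l in lags[1:]
            if not (near_integer_ratio(l, l0) or near_integer_ratio(l0, l))]
```

For example, with a main lag of 0.5 s, a peak at 1.0 s is harmonic (double period) whereas a peak at 0.75 s is flagged as inharmonic, lowering the pulse clarity.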
Figure 7: Flowchart of operators for the estimation of the pulse clarity factors:
od = mironsets(a,'Detect','No')
o = mironsets(od,'Detect','Yes')
VAR = mirpulseclarity(o,'Variability')
AS = mirattackslope(od)
ART = mirlowenergy(od,'Threshold',.5)
p = mirspectrum(o,'Autocor')
mp = mirpeak(p); TEMP = mirtempo(mp); KURT = kurtosis(mp)
pp = mirpeaks(p); HARM = mirpulseclarity(pp,'Harmony')
MAX = max(p); MIN = min(p); ENTR = entropy(p)
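The statistical descriptors gathered at the bottom of the flowchart (MAX, MIN, ENTR) can be sketched in plain Python. The normalization of the autocorrelation curve into a probability distribution before taking the entropy is an assumption of this sketch; a flat curve (no dominant pulse) yields maximal entropy, a single sharp peak (clear pulse) yields zero.

```python
import math

def pulse_clarity_stats(p):
    """MAX, MIN and entropy descriptors of an autocorrelation curve p."""
    mx, mn = max(p), min(p)                        # MAX = max(p), MIN = min(p)
    total = sum(v for v in p if v > 0)
    probs = [v / total for v in p if v > 0]        # normalize positive part
    entr = -sum(q * math.log(q) for q in probs)    # ENTR = entropy(p)
    return {'MAX': mx, 'MIN': mn, 'ENTR': entr}
```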
ter of acoustic musical signals," IEEE Trans. Audio Speech Language Proc., vol. 14, no. 1, pp. 342–355, 2006.

[8] C. Duxbury, M. Sandler, and M. Davies, "A hybrid approach to musical note onset detection," in Proc. Digital Audio Effects (DAFx-02), Hamburg, Germany, Sep. 26-28, 2002, pp. 33–38.

[9] A. Friberg, E. Schoonderwaldt, and P. N. Juslin, "Cuex: An algorithm for extracting expressive tone variables from audio recordings," Acustica / Acta Acustica, no. 93, pp. 411–420, 2007.

[10] D. Eck and N. Casagrande, "Finding meter in music using an autocorrelation phase matrix and Shannon entropy," in Proc. Intl. Conf. on Music Information Retrieval, London, UK, Sep. 11-15 2005, pp. 504–509.

[11] G. Tzanetakis and P. Cook, "Musical genre classification of audio signals," IEEE Trans. Speech Audio Proc., vol. 10, no. 5, pp. 293–302, 2002.

[12] P. Masri, Computer Modeling of Sound for Transformation and Synthesis of Musical Signal, Ph.D. thesis, University of Bristol, 1996.

[13] M. Alonso, B. David, and G. Richard, "Tempo and beat estimation of musical signals," in Proc. Intl. Conf. on Music Information Retrieval, Barcelona, Spain, Oct. 10-14 2004, pp. 158–163.

[14] J. P. Bello, C. Duxbury, M. Davies, and M. Sandler, "On the use of phase and energy for musical onset detection in the complex domain," IEEE Sig. Proc. Letters, vol. 11, no. 6, pp. 553–556, 2004.

[15] J. P. Bello, S. Abdallah, L. Daudet, C. Duxbury, M. Davies, and M. Sandler, "A tutorial on onset detection in music signals," IEEE Trans. Speech Audio Proc., vol. 13, no. 5, pp. 1035–1047, 2005.

[16] T. Tolonen and M. Karjalainen, "A computationally efficient multipitch analysis model," IEEE Trans. Speech Audio Proc., vol. 8, no. 6, pp. 708–716, 2000.

[17] M. Davies and M. Plumbley, "Comparing mid-level representations for audio based beat tracking," in Proc. Digital Music Res. Network Summer Conf., Glasgow, July 23-24 2005.

[18] J. J. Burred and A. Lerch, "A hierarchical approach to automatic musical genre classification," in Proc. Digital Audio Effects (DAFx-03), London, UK, Sep. 8-11 2003, pp. 344–349.

[19] Y. Feng, Y. Zhuang, and Y. Pan, "Popular music retrieval by detecting mood," in Proc. Intl. ACM SIGIR Conf. on Res. Dev. Information Retrieval, Toronto, Canada, Jul. 28-Aug. 1 2003, pp. 375–376.

[20] G. Peeters, "A large set of audio features for sound description (similarity and classification) in the CUIDADO project (version 1.0)," Tech. Rep., Ircam, 2004.

[21] E. Pampalk, A. Rauber, and D. Merkl, "Content-based organization and visualization of music archives," in Proc. Intl. ACM Conf. on Multimedia, 2002, pp. 570–579.

[22] J. C. Brown, "Determination of the meter of musical scores by autocorrelation," J. Acoust. Soc. Am., vol. 94, no. 4, pp. 1953–1957, 1993.

[23] P. Toiviainen and J. S. Snyder, "Tapping to Bach: Resonance-based modeling of pulse," Music Perception, vol. 21, no. 1, pp. 43–80, 2003.

[24] F. Gouyon, S. Dixon, E. Pampalk, and G. Widmer, "Evaluating rhythmic descriptors for musical genre classification," in Proc. AES Intl. Conf., London, UK, June 17-19 2004, pp. 196–204.

[25] S. Dixon, E. Pampalk, and G. Widmer, "Classification of dance music by periodicity patterns," in Proc. Intl. Conf. on Music Information Retrieval, London, UK, Oct. 26-30 2003, pp. 504–509.

[26] G. Tzanetakis, G. Essl, and P. Cook, "Human perception and computer extraction of musical beat strength," in Proc. Digital Audio Effects (DAFx-02), Hamburg, Germany, Sep. 26-28, 2002, pp. 257–261.

[27] G. E. P. Box and D. R. Cox, "An analysis of transformations," J. Roy. Stat. Soc., no. 26, pp. 211–246, 1964.