ISMIR 2008 – Session 3a – Content-Based Retrieval, Categorization and Similarity 1
Lidy et al. [6] discuss a genre classification experiment utilizing both timbre and melodic features. They use the typical timbre measures obtained through spectral analysis, and also employ a music transcription system they developed to generate MIDI representations of audio music files. From these files, they extract 37 different features, involving attributes of note pitches, durations, and non-diatonic notes. The combined timbre and melodic features are used to conduct genre classification experiments. They report classification accuracies with combined feature sets ranging from 76.8% to 90.4% using standard benchmarks (e.g., ISMIR 2004 audio data).

Cano et al. [4] report on a music recommendation system called MusicSurfer. The primary dimensions of similarity used are timbre, tempo, and rhythm patterns. Using a corpus of 273,751 songs from 11,257 artists, their system achieves an artist identification rate of 24%. On the ISMIR 2004 artist identification set they report a success rate of 60%, twice as high as that of the next best systems. MusicSurfer has a comprehensive user interface that allows users to find songs based on artist, genre, and similarity, among other characteristics, and allows selection of different types of similarity. However, it does not include melodic features.

2.1. Power Laws and Music Analysis

Our music recommender system prototype employs both melodic and timbre features based on power laws. Although power laws, fractal dimension, and other self-similarity features have been used extensively in information retrieval (e.g., [5]), to the best of our knowledge, there are no studies of content-based music recommendation systems utilizing power-law similarity metrics.

A power law denotes a relationship between two variables where one is proportional to a power of the other. One of the most well-known power laws is Zipf's law:

P(f) ~ 1 / f^n                                          (1)

where P(f) denotes the probability of an event of rank f, and n is close to 1. Zipf's law is named after George Kingsley Zipf, the Harvard linguist who documented and studied natural and social phenomena exhibiting such relationships [18]. The generalized form is:

P(f) ~ a / f^b                                          (2)

where a and b are real constants. This generalized form is known as the Zipf-Mandelbrot law, after Benoit Mandelbrot.

Numerous empirical studies report that music exhibits power laws across timbre and melodic attributes (e.g., [8, 16, 18]). In particular, Voss and Clarke [16] demonstrate that music audio properties (e.g., loudness and pitch fluctuation) exhibit power-law relationships. Using 12 hours' worth of radio recordings, they show that power fluctuations in music follow a 1/f distribution.

Manaris et al. [8] report various classification experiments with melodic features based on power laws. These studies include composer identification with 93.6% to 95% accuracy. They also report an experiment using emotional responses from humans. Using a corpus of 210 music excerpts in a 12-fold cross-validation study, artificial neural networks (ANNs) achieved an average success rate of 97.22% in predicting (within one standard deviation) human emotional responses to those pieces.

3. MELODIC FEATURE EXTRACTION

As mentioned earlier, we employ hundreds of power-law metrics that calculate statistical proportions of music-theoretic and other attributes of pieces.

3.1. Melodic Metrics

We have defined 14 power-law metrics related to proportion of pitch, chromatic tone, duration, distance between repeated notes, distance between repeated durations, melodic and harmonic intervals, melodic and harmonic consonance, melodic and harmonic bigrams, chords, and rests [9]. Each metric calculates the rank-frequency distribution of the attribute in question, and returns two values:

• the slope of the trendline, b (see equation 2), of the rank-frequency distribution; and
• the strength of the linear relation, r^2.

We also calculate higher-order power-law metrics. For each regular metric we construct an arbitrary number of higher-order metrics (e.g., the difference of two pitches, the difference of two differences, and so on), an approach similar to the notion of the derivative in mathematics.

Finally, we also capture the difference of an attribute value (e.g., note duration) from the local average. Local variability, locVar[i], for the ith value is

locVar[i] = abs(vals[i] - avg(vals, i)) / avg(vals, i)  (3)

where vals is the list of values, abs is the absolute value, and avg(vals, i) is the local average of the last, say, 5 values. We compute a local variability metric for each of the above metrics (i.e., 14 regular metrics × the number of higher-order metrics we decide to include).

Collectively, these metrics measure hierarchical aspects of music. Pieces without hierarchical structure (e.g., aleatory music) have significantly different measurements than pieces with hierarchical structure and long-term dependencies (e.g., fugues).

3.2. Evaluation

Evaluating content-based music features through classification tasks based on objective descriptors, such as artist or genre, is recognized as a simple alternative to listening tests for approximating the value of such features for similarity prediction [7, 11].

We have conducted several genre classification experiments using our set of power-law melodic metrics to extract features from music pieces. Our corpus
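To illustrate the three kinds of metrics described in Section 3.1, the following is a minimal Python sketch (our own illustration, not the authors' implementation; all function names are ours). It fits a log-log trendline to the rank-frequency distribution of an attribute, returning the slope and r^2 as in equation 2; builds one derivative-like higher-order metric by differencing; and computes local variability per equation 3, assuming a five-value window for the local average:

```python
import math
from collections import Counter

def power_law_fit(values):
    """Fit a straight line to the rank-frequency distribution of `values`
    in log-log space (cf. equation 2). Returns (slope of the trendline,
    r^2 strength of the linear relation)."""
    freqs = sorted(Counter(values).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx                # trendline slope in log-log space
    r2 = sxy ** 2 / (sxx * syy)      # strength of the linear relation
    return slope, r2

def higher_order(values):
    """One derivative-like step: differences of consecutive values
    (e.g., pitches -> melodic intervals)."""
    return [b - a for a, b in zip(values, values[1:])]

def local_variability(vals, window=5):
    """Equation 3: distance of each value from the average of the
    preceding `window` values, normalized by that average."""
    out = []
    for i in range(window, len(vals)):
        local_avg = sum(vals[i - window:i]) / window
        out.append(abs(vals[i] - local_avg) / local_avg)
    return out
```

A piece whose attribute frequencies follow a Zipfian distribution yields a slope near -1 with r^2 close to 1, whereas unstructured (e.g., aleatory) material yields noticeably different values.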
based on power laws. Experimental results suggest that power-law metrics are very promising for content-based music querying and retrieval, as they seem to correlate with aspects of human emotion and aesthetics. Power-law feature extraction and classification may lead to innovative technological applications for information retrieval, knowledge discovery, and navigation in digital libraries and the Internet.

8. ACKNOWLEDGEMENTS

This work is supported in part by a grant from the National Science Foundation (#IIS-0736480) and a donation from the Classical Music Archives (http://www.classicalarchives.com/). The authors thank Brittany Baker, Becky Buchta, Katie Robinson, Sarah Buller, and Rondell Burge for assistance in conducting the evaluation experiment. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the sponsors.

9. REFERENCES

[1] Aucouturier, J.-J. and Pachet, F. (2004). “Improving Timbre Similarity: How High Is the Sky?”, Journal of Negative Results in Speech and Audio Sciences, 1(1).

[2] Barrett, L. F. and Russell, J. A. (1999). “The Structure of Current Affect: Controversies and Emerging Consensus”, Current Directions in Psychological Science, 8, pp. 10-14.

[3] Bradley, M.M. and Lang, P.J. (1994). “Measuring Emotion: The Self-Assessment Manikin and the Semantic Differential”, Journal of Behavioral Therapy and Experimental Psychiatry, 25(1), pp. 49-59.

[4] Cano, P., Koppenberger, M., and Wack, N. (2005). “An Industrial-Strength Content-based Music Recommendation System”, in Proceedings of the 28th Annual International ACM SIGIR Conference, Salvador, Brazil, p. 673.

[5] Faloutsos, C. (2003). “Next Generation Data Mining Tools: Power Laws and Self-Similarity for Graphs, Streams and Traditional Data”, Lecture Notes in Computer Science, 2837, Springer-Verlag, pp. 10-15.

[6] Lidy, T., Rauber, A., Pertusa, A., and Iñesta, J.M. (2007). “Improving Genre Classification by Combination of Audio and Symbolic Descriptors Using a Transcription System”, in Proceedings of the 8th International Conference on Music Information Retrieval (ISMIR 2007), Vienna, Austria, pp. 61-66.

[7] Logan, B. and Salomon, A. (2001). “A Music Similarity Function Based on Signal Analysis”, in Proceedings of the 2001 IEEE International Conference on Multimedia and Expo (ICME ’01), Tokyo, Japan, pp. 745-748.

[8] Manaris, B., Romero, J., Machado, P., Krehbiel, D., Hirzel, T., Pharr, W., and Davis, R.B. (2005). “Zipf’s Law, Music Classification and Aesthetics”, Computer Music Journal, 29(1), pp. 55-69.

[9] Manaris, B., Roos, P., Machado, P., Krehbiel, D., Pellicoro, L., and Romero, J. (2007). “A Corpus-based Hybrid Approach to Music Analysis and Composition”, in Proceedings of the 22nd Conference on Artificial Intelligence (AAAI-07), Vancouver, BC, pp. 839-845.

[10] Pampalk, E., Flexer, A., and Widmer, G. (2005). “Improvements of Audio-Based Music Similarity and Genre Classification”, in Proceedings of the 6th International Conference on Music Information Retrieval (ISMIR 2005), London, UK.

[11] Pampalk, E. (2006). “Audio-Based Music Similarity and Retrieval: Combining a Spectral Similarity Model with Information Extracted from Fluctuation Patterns”, 3rd Annual Music Information Retrieval eXchange (MIREX’06), Victoria, Canada.

[12] Salingaros, N.A. and West, B.J. (1999). “A Universal Rule for the Distribution of Sizes”, Environment and Planning B: Planning and Design, 26, pp. 909-923.

[13] Schroeder, M. (1991). Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise. W. H. Freeman and Company.

[14] Spehar, B., Clifford, C.W.G., Newell, B.R., and Taylor, R.P. (2003). “Universal Aesthetic of Fractals”, Computers & Graphics, 27, pp. 813-820.

[15] Tzanetakis, G., Essl, G., and Cook, P. (2001). “Automatic Musical Genre Classification of Audio Signals”, in Proceedings of the 2nd International Conference on Music Information Retrieval (ISMIR 2001), Bloomington, IN, pp. 205-210.

[16] Voss, R.F. and Clarke, J. (1978). “1/f Noise in Music: Music from 1/f Noise”, Journal of the Acoustical Society of America, 63(1), pp. 258-263.

[17] Walker, E. L. (1973). “Psychological Complexity and Preference: A Hedgehog Theory of Behavior”, in Berlyne, D. E. and Madsen, K. B. (eds.), Pleasure, Reward, Preference: Their Nature, Determinants, and Role in Behavior, New York: Academic Press.

[18] Zipf, G.K. (1949). Human Behavior and the Principle of Least Effort, Hafner Publishing Company.