The Chroma Toolbox has been developed by
Meinard Müller
and Sebastian Ewert.
It contains MATLAB implementations for extracting various types of
novel pitch-based and chroma-based audio features. The MATLAB
implementations provided on this website are published under
the terms of the General Public License (GPL).
A general overview of the chroma toolbox is given in
[1].
If you publish results obtained using these implementations,
please cite [1].
For technical details on the features, please cite [2], [3], [4], [5].
Chroma-based audio features have turned out to be a powerful tool for
various analysis tasks in Music Information Retrieval,
including
chord labeling, music summarization, structure analysis,
music synchronization, and audio alignment.
A 12-dimensional chroma feature encodes the
short-time energy distribution of the underlying music signal over
the twelve chroma bands, which correspond to the twelve traditional pitch
classes of the equal-tempered scale encoded by the attributes
C, C#, D, D#, ..., B. Such features strongly correlate with the harmonic
progression of the music signal, which is often prominent in Western music.
By treating spectral components that differ by one or more musical octaves
as equivalent, chroma features possess a significant degree of robustness to changes in
timbre and instrumentation.
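The octave equivalence described above can be illustrated with a minimal Python sketch. It is not part of the Chroma Toolbox (which is MATLAB code). It assumes twelve-tone equal temperament with A4 = 440 Hz and the MIDI convention that pitch 69 is A4, and the function name frequency_to_chroma is hypothetical.

```python
import math

# Pitch-class attributes as listed in the text above.
CHROMA_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def frequency_to_chroma(freq_hz, a4_hz=440.0):
    """Return the name of the chroma band nearest to freq_hz.

    Assumption: equal temperament with A4 = a4_hz and MIDI pitch 69 = A4.
    The function name is hypothetical and not taken from the toolbox.
    """
    # Nearest MIDI pitch number for the given frequency.
    midi_pitch = round(69 + 12 * math.log2(freq_hz / a4_hz))
    # Pitches that differ by whole octaves share the same value modulo 12.
    return CHROMA_NAMES[midi_pitch % 12]


# Frequencies one octave apart fall into the same chroma band.
for f in (110.0, 220.0, 440.0, 880.0):
    print(f, frequency_to_chroma(f))  # each line reports the chroma A
```

Because only the pitch class survives the modulo operation, the octave in which a partial occurs no longer matters, which is one intuition for the robustness to timbre and instrumentation mentioned above.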
The Chroma Toolbox contains MATLAB implementations for the extraction
of various musically meaningful features from waveform-based audio signals.
In particular, it contains feature extractors for pitch features as well as
parameterized families of variants of chroma-like features. We briefly
describe these features and refer to the literature for details. A general
overview of the features and the toolbox is given in
[1].
- Pitch features: In a first step, we decompose a given audio
signal into 88 frequency bands with center frequencies corresponding to the
pitches A0 to C8 (MIDI pitches p = 21 to p = 108).
In our implementation, the decomposition is realized by a suitable
multirate filter bank consisting of elliptic filters. Then, for each subband,
we compute the short-time mean-square power (STMSP). The features
measure the local energy content of each pitch subband and indicate the
presence of certain musical notes within the audio signal.
MATLAB Function: audio_to_pitch_via_FB.m
Literature:
[2]
- CP and CLP features: Chroma-based features represent the short-time
energy of the signal in each of the 12 pitch classes. Often these chroma
features are computed by suitably pooling spectral coefficients obtained
from a short-time Fourier transform. For our CP features, we use the
pitch decomposition to derive chroma features by suitably summing up
the pitch subbands that correspond to the same chroma. For example, to
compute the chroma A, we add up the STMSPs of the
pitches A0, A1, ..., A7.
The resulting chroma vectors can then be normalized with respect to
different norms.
Furthermore, to account for the logarithmic sensation of sound intensity by the human auditory system,
one can apply a logarithmic compression to the pitch features before
the chroma computation. The resulting features are referred to as CLP features.
MATLAB Function: pitch_to_chroma.m
Literature:
[2]
- CENS features:
Adding a further degree of abstraction by considering short-time
statistics over energy distributions within the chroma bands, one obtains
CENS (Chroma Energy Normalized Statistics)
features, which constitute a family of scalable and robust audio features.
CENS features, which were first introduced in
[3],
strongly correlate to the short-time harmonic content of the
underlying audio signal and absorb variations of properties such as dynamics,
timbre, articulation, execution of note groups, and temporal micro-deviations.
Furthermore, because of their low temporal resolution, CENS features can be
processed efficiently.
MATLAB Function: pitch_to_CENS.m
Literature:
[2]
[3]
- CRP features:
To boost the degree of timbre invariance, a novel family of chroma-based
CRP audio features has been introduced in
[4].
The general idea is to discard timbre-related information similar to that
expressed by certain mel-frequency cepstral coefficients (MFCCs).
More precisely, recall that the mel-frequency cepstrum is obtained by
taking a discrete cosine transform (DCT) of a log power spectrum on the
logarithmic mel scale. It is generally accepted that the
lower MFCCs are closely related to timbre. Therefore,
intuitively speaking, one should achieve some degree of timbre invariance
by discarding exactly this information.
Combining this idea with the concept of chroma features,
one first replaces the nonlinear mel scale with a nonlinear pitch scale
and then applies a DCT to the logarithmized pitch representation
to obtain pitch-frequency cepstral coefficients (PFCCs).
Then one only keeps the upper coefficients, applies an inverse DCT,
and finally projects the resulting pitch vectors onto
12-dimensional chroma vectors. These vectors are referred to as
CRP (Chroma DCT-Reduced log Pitch) features.
MATLAB Function: pitch_to_CRP.m
Literature:
[4]
[5]
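The CP, CLP, and CRP steps described above can be sketched for a single feature frame as follows. This is a minimal pure-Python illustration under stated assumptions, not the toolbox's MATLAB implementation: all function names are hypothetical, the compression constant c and the cut-off keep_from are illustrative values that do not come from this page, the DCT is applied directly to the 88 pitch bands, and the Euclidean norm is used as the default normalization.

```python
import math

LOWEST_MIDI = 21   # A0, the lowest of the 88 pitch bands
NUM_PITCHES = 88   # A0 (MIDI 21) to C8 (MIDI 108)


def fold_to_chroma(pitch_frame, lowest_midi=LOWEST_MIDI):
    """Sum all pitch bands that share a pitch class (index 0 = C, ..., 11 = B)."""
    chroma = [0.0] * 12
    for i, value in enumerate(pitch_frame):
        chroma[(lowest_midi + i) % 12] += value
    return chroma


def normalize(vec, order=2):
    """Scale vec to unit 1-norm or 2-norm; an all-zero vector is returned unchanged."""
    if order == 1:
        norm = sum(abs(v) for v in vec)
    else:
        norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def log_compress(pitch_frame, c=100.0):
    """Logarithmic compression log(1 + c * x); the constant c is an assumption."""
    return [math.log(1.0 + c * v) for v in pitch_frame]


def _scale(k, n):
    # Scaling factors that make the DCT-II below orthonormal.
    return math.sqrt((1.0 if k == 0 else 2.0) / n)


def dct(x):
    """Orthonormal DCT-II of a list of floats."""
    n = len(x)
    return [_scale(k, n) * sum(x[j] * math.cos(math.pi * (j + 0.5) * k / n)
                               for j in range(n))
            for k in range(n)]


def idct(coeffs):
    """Inverse of dct() (orthonormal DCT-III)."""
    n = len(coeffs)
    return [sum(_scale(k, n) * coeffs[k] * math.cos(math.pi * (j + 0.5) * k / n)
                for k in range(n))
            for j in range(n)]


def cp_frame(pitch_frame):
    """CP-style frame: fold pitch energies to chroma, then normalize."""
    return normalize(fold_to_chroma(pitch_frame))


def clp_frame(pitch_frame, c=100.0):
    """CLP-style frame: log-compress the pitch bands before folding."""
    return normalize(fold_to_chroma(log_compress(pitch_frame, c)))


def crp_frame(pitch_frame, keep_from=20, c=100.0):
    """CRP-style frame: discard the lower cepstral coefficients before folding."""
    coeffs = dct(log_compress(pitch_frame, c))
    # Keep only the upper coefficients (index keep_from and above).
    coeffs = [v if k >= keep_from else 0.0 for k, v in enumerate(coeffs)]
    return normalize(fold_to_chroma(idct(coeffs)))


# Usage: a toy frame with energy in A4 (MIDI 69) and A3 (MIDI 57) only.
frame = [0.0] * NUM_PITCHES
frame[69 - LOWEST_MIDI] = 1.0
frame[57 - LOWEST_MIDI] = 0.5
print(cp_frame(frame))
print(crp_frame(frame))
```

Setting keep_from to 0 keeps every coefficient, in which case crp_frame reduces to clp_frame up to rounding error; larger values remove progressively more of the smooth spectral envelope that the text associates with timbre.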
The MATLAB implementations provided on this website are published under
the terms of the General Public License (GPL), version 2 or later. If you publish results obtained using these
implementations, please cite the references below.
Download Chroma Toolbox (Version 2.0, last update: 2011-08-31): [zip]
The ZIP file contains the following MATLAB files and folders:
- wav_to_audio.m
Converts WAV (any sampling rate) to WAV (mono, 22050 Hz).
- estimateTuning.m
Estimates a deviation from the standard tuning for a given audio recording.
- audio_to_pitch_via_FB.m
Converts WAV (mono, 22050 Hz) to pitch representation and stores
the result (MAT file) into the folder data_feature.
- pitch_to_chroma.m, pitch_to_CENS.m, pitch_to_CRP.m
Converts pitch representation into chroma, CENS, and
CRP representations.
To this end, the pitch representations are loaded from the folder
data_feature.
- normalizeFeature.m, smoothDownsampleFeature.m
Post-processing of features: normalization, smoothing, and downsampling.
- visualizePitch.m, visualizeChroma.m, visualizeCRP.m
Visualizes pitch and chroma representations.
- demoChromaToolbox.m, test_convert_audio_to_pitch.m, test_convert_pitch_to_chroma.m,
test_convert_pitch_to_CENS.m, test_convert_pitch_to_CRP.m
Test and example functions.
- generateMultiratePitchFilterbank.m
Designs the six filter banks used by audio_to_pitch_via_FB.m.
- MIDI_FB_ellip_pitch_60_96_22050_Q25, ...
Various pre-computed multirate pitch filter banks needed in audio_to_pitch_via_FB.m.
- data_WAV
Folder containing example WAV files.
- data_feature
Folder containing pitch representations.
Important Notes:
- The Chroma Toolbox requires the MATLAB Signal Processing Toolbox.
- The implementations have been tested using MATLAB 2008b or newer.
- To try out the code, simply execute demoChromaToolbox.m.
As an alternative, we provide test files whose names start
with test_; these illustrate batch processing and highlight individual feature extractors.
Note that test_convert_audio_to_pitch.m must be executed before
any of the other test files,
since the pitch representations have to be
pre-computed and stored prior to any chroma computations.
- [1] Meinard Müller and Sebastian Ewert.
Chroma Toolbox: MATLAB Implementations for Extracting Variants of Chroma-Based Audio Features.
Proceedings of the International Conference on Music Information Retrieval (ISMIR), 2011.
- [2] Meinard Müller.
Information Retrieval for Music and Motion.
Monograph, Springer, 2007.
ISBN: 978-3-540-74047-6
- [3] Meinard Müller, Frank Kurth, and Michael Clausen.
Audio matching via chroma-based statistical features.
Proceedings of the International Conference on Music Information Retrieval (ISMIR), pp. 288-295, 2005.
- [4] Meinard Müller, Sebastian Ewert, and Sebastian Kreuzer.
Making chroma features more robust to timbre changes.
Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Taipei, Taiwan, pp. 1869-1872, 2009.
- [5] Meinard Müller and Sebastian Ewert.
Towards timbre-invariant audio features for harmony-based music.
IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 3, pp. 649–662, 2010.