Chroma Toolbox: Pitch, Chroma, CENS, CRP

The Chroma Toolbox has been developed by Meinard Müller and Sebastian Ewert. It contains MATLAB implementations for extracting various types of novel pitch-based and chroma-based audio features. The MATLAB implementations provided on this website are published under the terms of the General Public License (GPL). A general overview of the Chroma Toolbox is given in [1].

If you publish results obtained using these implementations, please cite [1]. For technical details on the features, please cite [2], [3], [4], and [5].

Description of Pitch, Chroma, CENS, CRP features

Chroma-based audio features have turned out to be a powerful tool for various analysis tasks in Music Information Retrieval, including chord labeling, music summarization, structure analysis, music synchronization, and audio alignment. A 12-dimensional chroma feature encodes the short-time energy distribution of the underlying music signal over the twelve chroma bands, which correspond to the twelve traditional pitch classes of the equal-tempered scale, denoted by C, C#, D, D#, ..., B. Such features strongly correlate with the harmonic progression of the music signal, which is often prominent in Western music. Because spectral components that differ by one or more octaves are assigned to the same chroma band, chroma features possess a significant degree of robustness to changes in timbre and instrumentation.
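The octave-folding idea can be made concrete with the usual equal-tempered mapping p = 69 + 12 · log2(f / 440 Hz), which assigns MIDI pitch 69 to A4 = 440 Hz; the chroma of a pitch p is its pitch class p mod 12, with 0 standing for C. The following is a minimal Python sketch assuming standard tuning; the function names are hypothetical illustrations and not part of the toolbox.

```python
import math

# Pitch-class labels; index 0 is C, following the MIDI convention (MIDI 60 = C4).
CHROMA_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def freq_to_midi(freq_hz: float, ref_a4: float = 440.0) -> float:
    """Map a frequency to a (fractional) MIDI pitch, with A4 = 69 at ref_a4 Hz."""
    return 69.0 + 12.0 * math.log2(freq_hz / ref_a4)


def freq_to_chroma(freq_hz: float) -> str:
    """Round to the nearest equal-tempered pitch and keep only its pitch class."""
    pitch = round(freq_to_midi(freq_hz))
    return CHROMA_NAMES[pitch % 12]


if __name__ == "__main__":
    # 110 Hz (A2), 220 Hz (A3), and 440 Hz (A4) differ by octaves,
    # so all three land in the same chroma band.
    for f in (110.0, 220.0, 440.0, 261.63):
        print(f"{f:7.2f} Hz -> {freq_to_chroma(f)}")
```

Discarding the octave information (the integer part of p / 12) is exactly what makes the representation insensitive to the register in which a note is played.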

The Chroma Toolbox contains MATLAB implementations for extracting various musically meaningful features from waveform-based audio signals. In particular, it contains feature extractors for pitch features as well as parameterized families of chroma-like feature variants. We briefly describe these features below and refer to the literature for details. A general overview of the features and the toolbox is given in [1].

Pitch features: In a first step, we decompose a given audio signal into 88 frequency bands with center frequencies corresponding to the pitches A0 to C8 (MIDI pitches p = 21 to p = 108). In our implementation, the decomposition is realized by a suitable multirate filter bank consisting of elliptic filters. Then, for each subband, we compute the short-time mean-square power (STMSP). The features measure the local energy content of each pitch subband and indicate the presence of certain musical notes within the audio signal.
MATLAB Function: audio_to_pitch_via_FB.m
Literature: [2]
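The pitch bands are centered at the equal-tempered frequencies f(p) = 440 · 2^((p - 69) / 12) Hz, which gives 27.5 Hz for A0 (p = 21) and about 4186 Hz for C8 (p = 108). The Python sketch below covers only these center frequencies and the STMSP computation on one subband that is assumed to be already filtered; it does not reproduce the elliptic multirate filter bank of audio_to_pitch_via_FB.m. The function names, the absence of padding, and the window length and hop size in the usage lines are illustrative assumptions.

```python
import math


def pitch_center_freq(p: int, ref_a4: float = 440.0) -> float:
    """Equal-tempered center frequency (Hz) of MIDI pitch p, with A4 = 69."""
    return ref_a4 * 2.0 ** ((p - 69) / 12.0)


def stmsp(subband, win_len: int, hop: int):
    """Short-time mean-square power of one (already filtered) subband signal.

    Each frame is the mean of the squared samples in a window of win_len
    samples; consecutive windows start hop samples apart. Incomplete
    windows at the end of the signal are dropped (no padding).
    """
    frames = []
    for start in range(0, len(subband) - win_len + 1, hop):
        window = subband[start:start + win_len]
        frames.append(sum(x * x for x in window) / win_len)
    return frames


if __name__ == "__main__":
    # Center frequencies of the 88 pitch bands A0 (p = 21) ... C8 (p = 108).
    centers = [pitch_center_freq(p) for p in range(21, 109)]
    print(len(centers), round(centers[0], 2), round(centers[-1], 2))

    # A unit-amplitude sine has mean-square power 0.5 over whole periods.
    sr = 22050
    tone = [math.sin(2 * math.pi * 440.0 * n / sr) for n in range(sr)]
    print(stmsp(tone, win_len=4410, hop=2205)[:3])
```

In the full pipeline, stmsp would be applied to each of the 88 subband signals, yielding one 88-dimensional pitch vector per frame.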
CP and CLP features: Chroma-based features represent the short-time energy of the signal in each of the 12 pitch classes. Such chroma features are often computed by suitably pooling spectral coefficients obtained from a short-time Fourier transform. For our CP features, we instead use the pitch decomposition and derive chroma features by summing up the pitch subbands that correspond to the same chroma. For example, to compute the chroma A, we add up the STMSPs of the pitches A0, A1, ..., A7. The resulting chroma vectors can then be normalized with respect to different norms. Furthermore, to account for the logarithmic sensation of sound intensity in the human auditory system, one can apply a logarithmic compression to the pitch features before the chroma computation. The resulting features are referred to as CLP features.
MATLAB Function: pitch_to_chroma.m
Literature: [2]
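A minimal Python sketch of the CP and CLP computation for a single frame, assuming the pitch energies are stored in a list indexed by MIDI pitch (so that p mod 12 = 0 corresponds to C and 9 to A) and that the compression has the form log10(1 + eta · energy) with a hypothetical constant eta. The fallback to a uniform vector for near-silent frames is also an assumption, not taken from pitch_to_chroma.m.

```python
import math


def pitch_to_chroma_vector(pitch_energy, log_compress=False, eta=1.0):
    """Fold one pitch vector into 12 chroma bins (CP, or CLP if log_compress).

    pitch_energy[p] is the STMSP of MIDI pitch p (e.g. a 128-entry list in
    which only p = 21..108 are nonzero). Bin 0 is C, bin 9 is A.
    """
    chroma = [0.0] * 12
    for p, energy in enumerate(pitch_energy):
        # CLP: compress each pitch energy logarithmically before summing.
        value = math.log10(1.0 + eta * energy) if log_compress else energy
        chroma[p % 12] += value
    return chroma


def normalize_l2(vec, threshold=1e-6):
    """Scale vec to unit Euclidean norm; near-silent frames become uniform."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm < threshold:
        return [1.0 / math.sqrt(len(vec))] * len(vec)
    return [v / norm for v in vec]


if __name__ == "__main__":
    pitch = [0.0] * 128
    pitch[57] = 3.0   # A3
    pitch[69] = 4.0   # A4
    pitch[60] = 1.0   # C4
    cp = normalize_l2(pitch_to_chroma_vector(pitch))
    clp = normalize_l2(pitch_to_chroma_vector(pitch, log_compress=True, eta=100.0))
    print([round(v, 3) for v in cp])
    print([round(v, 3) for v in clp])
```

Comparing the two printed vectors shows the effect of the compression: the weak C component carries more relative weight in the CLP vector than in the CP vector.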
CENS features: Adding a further degree of abstraction by considering short-time statistics over energy distributions within the chroma bands, one obtains CENS (Chroma Energy Normalized Statistics) features, which constitute a family of scalable and robust audio features. CENS features, which were first introduced in [3], strongly correlate with the short-time harmonic content of the underlying audio signal and absorb variations in properties such as dynamics, timbre, articulation, execution of note groups, and temporal micro-deviations. Furthermore, because of their low temporal resolution, CENS features can be processed efficiently.
MATLAB Function: pitch_to_CENS.m
Literature: [2] [3]
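The CENS steps can be sketched as follows: normalize each chroma vector by its l1 norm, quantize the relative energies, smooth each chroma band over time with a window, downsample, and normalize each result with the l2 norm. In this Python sketch, the thresholds 0.05, 0.1, 0.2, and 0.4, the Hann-shaped window of length 41, and the downsampling factor 10 are assumptions chosen for illustration; see [2], [3], and pitch_to_CENS.m for the exact parameters.

```python
import math

# Assumed quantization thresholds on the l1-normalized chroma energy.
QUANT_THRESHOLDS = (0.05, 0.1, 0.2, 0.4)


def quantize(chroma):
    """Map each band's share of the frame energy to 0..4 via the thresholds."""
    total = sum(chroma)
    if total <= 0.0:
        return [0.0] * len(chroma)
    return [float(sum(1 for t in QUANT_THRESHOLDS if v / total > t)) for v in chroma]


def hanning(length):
    """Hann-shaped window without zero end points, scaled to sum to 1."""
    w = [0.5 * (1.0 - math.cos(2.0 * math.pi * (n + 1) / (length + 1)))
         for n in range(length)]
    s = sum(w)
    return [x / s for x in w]


def normalize_l2(vec, threshold=1e-6):
    """Unit Euclidean norm; near-zero vectors fall back to a uniform vector."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm < threshold:
        return [1.0 / math.sqrt(len(vec))] * len(vec)
    return [v / norm for v in vec]


def cens(chroma_frames, win_len=41, downsample=10):
    """Quantize, smooth over time, downsample, and l2-normalize chroma frames."""
    quantized = [quantize(frame) for frame in chroma_frames]
    window = hanning(win_len)
    half = win_len // 2
    n_frames = len(quantized)
    result = []
    # Evaluate the smoothed sequence only at every downsample-th frame.
    for center in range(0, n_frames, downsample):
        smoothed = [0.0] * 12
        for k, weight in enumerate(window):
            idx = center + k - half
            if 0 <= idx < n_frames:  # zero padding outside the sequence
                for band in range(12):
                    smoothed[band] += weight * quantized[idx][band]
        result.append(normalize_l2(smoothed))
    return result


if __name__ == "__main__":
    # 100 frames of a sustained C major triad (C, E, G).
    frame = [0.0] * 12
    frame[0], frame[4], frame[7] = 1.0, 0.8, 0.6
    features = cens([frame] * 100)
    print(len(features), [round(v, 3) for v in features[0]])
```

With win_len=41 and downsample=10, the output rate is one tenth of the input rate; a 10 Hz chroma sequence would thus yield 1 Hz CENS features, which illustrates the low temporal resolution mentioned above.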
CRP features: To boost the degree of timbre invariance, a novel family of chroma-based audio features called CRP features has been introduced in [4]. The general idea is to discard timbre-related information similar to that expressed by certain mel-frequency cepstral coefficients (MFCCs). More precisely, recall that the mel-frequency cepstrum is obtained by taking a discrete cosine transform (DCT) of a log power spectrum on the logarithmic mel scale. A generally accepted observation is that the lower MFCCs are closely related to timbre. Therefore, intuitively speaking, one should achieve some degree of timbre invariance by discarding exactly this information. Combining this idea with the concept of chroma features, one first replaces the nonlinear mel scale with a nonlinear pitch scale and then applies a DCT to the logarithmized pitch representation to obtain pitch-frequency cepstral coefficients (PFCCs). One then keeps only the upper coefficients, applies an inverse DCT, and finally projects the resulting pitch vectors onto 12-dimensional chroma vectors. These vectors are referred to as CRP (Chroma DCT-Reduced log Pitch) features.
MATLAB Function: pitch_to_CRP.m
Literature: [4] [5]
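The CRP pipeline described above (logarithmic compression, DCT, discarding the lower coefficients, inverse DCT, chroma folding) can be sketched for one frame as follows. This is a plain-Python illustration, not the code of pitch_to_CRP.m: the 120-entry pitch vector (entry i holding MIDI pitch i + 1), the compression constant eta = 1000, and keeping coefficients from 0-based index 54 onward are assumptions to be checked against [4], [5], and the toolbox parameters. The orthonormal DCT-II and its inverse are written out directly for self-containment; an FFT-based library DCT would be faster.

```python
import math


def _dct_scale(k: int, n: int) -> float:
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct_ortho(x):
    """Orthonormal DCT-II of a list (O(n^2), fine for n = 120)."""
    n = len(x)
    return [
        _dct_scale(k, n) * sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n))
        for k in range(n)
    ]


def idct_ortho(c):
    """Inverse of dct_ortho (orthonormal DCT-III)."""
    n = len(c)
    return [
        sum(_dct_scale(k, n) * c[k] * math.cos(math.pi / n * (i + 0.5) * k) for k in range(n))
        for i in range(n)
    ]


def normalize_l2(vec, threshold=1e-6):
    """Unit Euclidean norm; near-zero vectors fall back to a uniform vector."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm < threshold:
        return [1.0 / math.sqrt(len(vec))] * len(vec)
    return [v / norm for v in vec]


def crp_frame(pitch_energy, first_kept=54, eta=1000.0):
    """CRP-style vector for one frame of 120 pitch energies (entry i = MIDI i + 1)."""
    # 1. Logarithmic compression of the pitch energies.
    log_pitch = [math.log10(1.0 + eta * e) for e in pitch_energy]
    # 2. DCT to pitch-frequency cepstral coefficients (PFCCs).
    pfcc = dct_ortho(log_pitch)
    # 3. Discard the lower coefficients, which mainly carry timbre information.
    reduced = [0.0 if k < first_kept else v for k, v in enumerate(pfcc)]
    # 4. Back to the pitch domain.
    pitch_back = idct_ortho(reduced)
    # 5. Fold to 12 chroma bins (MIDI pitch i + 1 goes to bin (i + 1) % 12; 0 = C).
    chroma = [0.0] * 12
    for i, v in enumerate(pitch_back):
        chroma[(i + 1) % 12] += v
    return normalize_l2(chroma)


if __name__ == "__main__":
    frame = [0.0] * 120
    for midi in (60, 64, 67):  # C4, E4, G4
        frame[midi - 1] = 1.0
    print([round(v, 3) for v in crp_frame(frame)])
```

Since the low-order coefficients are removed before the inverse DCT, CRP values can be negative, unlike CP or CENS values.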

MATLAB Code

The MATLAB implementations provided on this website are published under the terms of the General Public License (GPL), version 2 or later. If you publish results obtained using these implementations, please cite the references below.

Download: Chroma Toolbox, version 2.0 (last update: 2011-08-31), ZIP archive.

The ZIP file contains the following MATLAB files and folders:

wav_to_audio.m
Converts WAV (any sampling rate) to WAV (mono, 22050 Hz).
estimateTuning.m
Estimates the deviation of a given audio recording from standard tuning.
audio_to_pitch_via_FB.m
Converts WAV (mono, 22050 Hz) to pitch representation and stores the result (MAT file) into the folder data_feature.
pitch_to_chroma.m, pitch_to_CENS.m, pitch_to_CRP.m
Converts pitch representation into chroma, CENS, and CRP representations. To this end, the pitch representations are loaded from the folder data_feature.
normalizeFeature.m, smoothDownsampleFeature.m
Post-processing of features: normalization, smoothing, and downsampling.
visualizePitch.m, visualizeChroma.m, visualizeCRP.m
Visualizes pitch and chroma representations.
demoChromaToolbox.m, test_convert_audio_to_pitch.m, test_convert_pitch_to_chroma.m, test_convert_pitch_to_CENS.m, test_convert_pitch_to_CRP.m
Test and example functions.
generateMultiratePitchFilterbank.m
Designs the six filter banks used by audio_to_pitch_via_FB.m.
MIDI_FB_ellip_pitch_60_96_22050_Q25, ...
Various pre-computed multirate pitch filter banks used by audio_to_pitch_via_FB.m.
data_WAV
Folder containing example WAV files.
data_feature
Folder containing pitch representations.
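The list entry for estimateTuning.m only names the task. One generic way to quantify a tuning deviation, which is not necessarily the method implemented in estimateTuning.m, is to measure how many cents each detected spectral peak lies from its nearest equal-tempered pitch (relative to A4 = 440 Hz) and to average these deviations. The Python sketch below assumes the peak frequencies are already given; the function names and the optional per-peak weights are hypothetical.

```python
import math


def cents_deviation(freq_hz: float, ref_a4: float = 440.0) -> float:
    """Signed deviation (cents, about -50..50) from the nearest equal-tempered pitch."""
    pitch = 69.0 + 12.0 * math.log2(freq_hz / ref_a4)
    return 100.0 * (pitch - round(pitch))


def estimate_tuning_cents(peak_freqs, peak_weights=None):
    """Weighted circular mean of per-peak deviations, in cents.

    Deviations live on a circle with period 100 cents (a peak 49 cents sharp
    is only 2 cents away from a peak 49 cents flat of the next pitch), so
    they are averaged as angles rather than as plain numbers.
    """
    if peak_weights is None:
        peak_weights = [1.0] * len(peak_freqs)
    sin_sum = cos_sum = 0.0
    for freq, weight in zip(peak_freqs, peak_weights):
        angle = 2.0 * math.pi * cents_deviation(freq) / 100.0
        sin_sum += weight * math.sin(angle)
        cos_sum += weight * math.cos(angle)
    return 100.0 * math.atan2(sin_sum, cos_sum) / (2.0 * math.pi)


if __name__ == "__main__":
    # Hypothetical peaks from a recording tuned about 20 cents sharp.
    sharp = 2.0 ** (20.0 / 1200.0)
    peaks = [f * sharp for f in (220.0, 261.63, 329.63, 440.0)]
    print(round(estimate_tuning_cents(peaks), 1))
```

The circular mean matters near the wrap-around point: with peaks at +48 and -48 cents, a plain arithmetic mean would report 0 (perfect tuning), whereas the circular mean reports a deviation of about half a semitone (plus or minus 50 cents).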

Important Notes:

The Chroma Toolbox requires the MATLAB Signal Processing Toolbox.
The implementations have been tested with MATLAB R2008b and newer.
To try out the code, simply execute demoChromaToolbox.m. Alternatively, the test files whose names start with test_ illustrate batch processing and highlight the individual feature extractors. Note that test_convert_audio_to_pitch.m has to be executed before any of the other test files, since the pitch representations have to be computed and stored before any chroma-based features can be derived from them.

References

[1]: Meinard Müller and Sebastian Ewert
Chroma Toolbox: MATLAB Implementations for Extracting Variants of Chroma-Based Audio Features
Proceedings of the International Conference on Music Information Retrieval (ISMIR), 2011.
[2]: Meinard Müller
Information Retrieval for Music and Motion
Monograph, Springer, 2007.
ISBN: 978-3-540-74047-6
[3]: Meinard Müller, Frank Kurth, and Michael Clausen
Audio matching via chroma-based statistical features
Proceedings of the International Conference on Music Information Retrieval (ISMIR), pp. 288-295, 2005.
[4]: Meinard Müller, Sebastian Ewert, and Sebastian Kreuzer
Making chroma features more robust to timbre changes
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Taipei, Taiwan, pp. 1869-1872, 2009.
[5]: Meinard Müller and Sebastian Ewert
Towards timbre-invariant audio features for harmony-based music
IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 3, pp. 649-662, 2010.