South Indian Music Tonic Recognition Dataset (Audio)

Published: 10 April 2024 | Version 1 | DOI: 10.17632/nkdm57hvw3.1
Contributors:
Samhita Konduri, Kriti Pendyala,

Description

Few Indian classical music datasets currently exist, and fewer still are large enough and annotated well enough to train classification or prediction models. The tonic pitch, or base pitch, plays an important role in music, so much so that it is sometimes called the keynote. The vocalists and the accompanying instrumental ensemble tune to this keynote to render the composition. However, unlike in Western music, where the tonic of a composition is predetermined, in Indian Classical Music (ICM) the lead artist or vocalist chooses the tonic and the other artists attune their rendition to it. To create a large dataset with useful tonic annotations, we compiled these data. The dataset contains snippets covering four different tonics, ranging from F# to A, which are the most commonly used tonics.
The dataset contains two main files: "Carnatic_Dataset_Snippets.zip", a ZIP file containing the 20-second audio snippets, and "Carnatic_Dataset.csv", a CSV file containing the metadata and tonic annotations. Unzipping the ZIP file extracts a folder containing four subfolders, one for each of the four tonics represented in the dataset, each labeled with the tonic name, such as "F#". Each tonic folder contains 20-second snippets of songs in South Indian classical, or Carnatic, music. The snippets are named as follows: {songName}_{tonic}_chunk{number}.mp3. For example, the first snippet of the song Raravenu, recorded in tonic F#, is named "Raravenu_F#_chunk0.mp3". The CSV file contains each snippet's file path and tonic annotation. Continuing the above example, the corresponding entry in the CSV file has "Carnatic_Dataset_Snippets/F#/Raravenu_F#_chunk0.mp3" in the first cell and "F# Scale (4.5 Kattai)" in the second. (Kattai is the South Indian classical music term for tonic, and we included kattai conversions in the annotations as well.)
There are 300 snippets in F#, 207 in G, 240 in G#, and 280 in A, for a total of 1,027 snippets.
Other researchers can reuse this dataset to train classification or prediction models and other automated systems that predict and make use of the tonic. These data can also help new or experienced music learners test their pitch and find which tonic works best for them.
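
The naming scheme and CSV layout described above can be read programmatically. The following is a minimal sketch, not part of the published code: the function names `parse_snippet_name` and `read_annotations` are hypothetical, and the sketch assumes that the first two cells of each CSV row hold the path and the annotation, and that any row whose first cell is not an .mp3 path (such as a possible header row) can be skipped.

```python
import csv
import io
import re

# Pattern for the naming scheme {songName}_{tonic}_chunk{number}.mp3.
# The tonic alternatives are the four tonics listed in the description.
SNIPPET_RE = re.compile(
    r"^(?P<song>.+)_(?P<tonic>F#|G#|G|A)_chunk(?P<chunk>\d+)\.mp3$"
)


def parse_snippet_name(path):
    """Split a snippet path into (song name, tonic, chunk index)."""
    name = path.rsplit("/", 1)[-1]  # drop any leading folders
    match = SNIPPET_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected snippet name: {name!r}")
    return match.group("song"), match.group("tonic"), int(match.group("chunk"))


def read_annotations(csv_file):
    """Yield (path, annotation) pairs from the first two cells of each row.

    Assumption: rows whose first cell is not an .mp3 path (for example a
    header row, if the file has one) carry no annotation and are skipped.
    """
    for row in csv.reader(csv_file):
        if len(row) >= 2 and row[0].endswith(".mp3"):
            yield row[0], row[1]


# In-memory stand-in for one row of Carnatic_Dataset.csv, taken from the
# example entry in the description.
sample = io.StringIO(
    "Carnatic_Dataset_Snippets/F#/Raravenu_F#_chunk0.mp3,F# Scale (4.5 Kattai)\n"
)
for path, annotation in read_annotations(sample):
    print(parse_snippet_name(path), annotation)
```

To read the real file, the `io.StringIO` stand-in could be replaced with `open("Carnatic_Dataset.csv", newline="")`, assuming the CSV sits in the working directory.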

Steps to reproduce

The second and third authors of this paper, who are vocalists themselves, recorded songs in four different tonics: F#, G, G#, and A. Using the Python library pydub, we segmented each song, all of which run longer than 3 minutes, into 20-second snippets, keeping any remainder shorter than 20 seconds as a separate snippet. The raw audio snippets are available in folders separated by tonic, and a directory file (the CSV) lists each snippet’s file path and tonic. The code can be found here: https://github.com/Sam-Kon/SouthICMAudioDataset_Code.
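
The segmentation step can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the repository linked above: the function names `chunk_bounds` and `split_song` and their parameters are hypothetical. The sketch relies on pydub's `AudioSegment.from_file`, millisecond-based slicing, and `export`, and MP3 export through pydub additionally requires ffmpeg to be installed.

```python
import os


def chunk_bounds(duration_ms, chunk_ms=20_000):
    """Return (start, end) offsets in milliseconds for consecutive chunks.

    The final pair covers whatever remains when the duration is not an
    exact multiple of chunk_ms, so the remainder becomes its own snippet.
    """
    return [
        (start, min(start + chunk_ms, duration_ms))
        for start in range(0, duration_ms, chunk_ms)
    ]


def split_song(src_path, song_name, tonic, out_dir, chunk_ms=20_000):
    """Export consecutive snippets of one recording and return their paths."""
    # Imported here so chunk_bounds stays usable without pydub installed.
    from pydub import AudioSegment

    audio = AudioSegment.from_file(src_path)
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for index, (start, end) in enumerate(chunk_bounds(len(audio), chunk_ms)):
        # Follows the naming scheme {songName}_{tonic}_chunk{number}.mp3.
        out_path = os.path.join(out_dir, f"{song_name}_{tonic}_chunk{index}.mp3")
        audio[start:end].export(out_path, format="mp3")  # slices are in ms
        paths.append(out_path)
    return paths
```

For instance, a recording of 3 minutes 10 seconds (190,000 ms) yields ten snippets under this scheme: nine of 20 seconds each and a final one holding the remaining 10 seconds.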

Categories

Music, Multimedia, Artificial Intelligence in Music, Deep Learning

Licence