KritiSamhita: South Indian Music Tonic Recognition Dataset (Audio)

Published: 11 April 2024 | Version 2 | DOI: 10.17632/nkdm57hvw3.2
Samhita Konduri, Kriti Pendyala,


The term "kriti" means "creation" or "work" in Sanskrit and designates the primary form of musical composition in Carnatic music. A kriti is a song with a particular structure, and it is the fundamental unit of a Carnatic music performance in terms of musical form. The Sanskrit word "samhita" means a collection and expresses the idea of things being compiled or assembled following a particular methodology. KritiSamhita is therefore a systematic collection of vocal compositions, classified by the tonic of each composition.
There are currently few Indian classical music datasets, especially ones that are large enough and carry useful annotations for training classification or prediction models. The tonic pitch, or base pitch, plays such an important role in music that it is sometimes called the keynote. The vocalists and the accompanying instrumental ensemble tune to this keynote to render the composition. However, unlike in Western music, where the tonic for a composition is predetermined, in Indian Classical Music (ICM) the lead artist or vocalist chooses the tonic and the other artists attune their rendition to it. To create a large dataset with useful tonic annotations, we compiled these data.
This dataset contains snippets covering four different tonics, ranging from F# to A, which are the most commonly used tonics. The dataset contains two main files: "", a ZIP file containing the 20-second audio snippets, and "Carnatic_Dataset.csv", a CSV file containing the metadata and tonic annotations. Unzipping the ZIP file extracts a folder containing four subfolders, one for each of the four tonics represented in the dataset, each labeled with the tonic name, such as "F#". Each tonic folder contains 20-second snippets of songs in South Indian classical, or Carnatic, music. They are named as follows: {songName}_{tonic}_chunk{number}.mp3. For example, the first snippet of the song Raravenu, recorded in tonic F#, is named "Raravenu_F#_chunk0.mp3". The CSV file contains each snippet's file path and tonic annotation. Continuing the above example, an entry in the CSV file has "Carnatic_Dataset_Snippets/F#/Raravenu_F#_chunk0.mp3" in the first cell and "F# Scale (4.5 Kattai)" in the second. (Kattai is the South Indian classical music term for tonic, and we included kattai conversions in the annotations as well.)
In total, there are 300 snippets in F#, 207 in G, 240 in G#, and 280 in A. This dataset can be reused by other researchers to train classification or prediction models and other automated systems that predict and leverage the tonic. These data can also help new or experienced music learners test their pitch and find which tonic works best for them.
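The naming scheme and annotation format described above can be read programmatically. The following is a minimal sketch in Python, assuming that every file name and annotation follows the single example given in this description. The function names and regular expressions are illustrative and are not part of the dataset; in particular, the annotation pattern is inferred from the one example "F# Scale (4.5 Kattai)" and may need adjusting for other entries.

```python
import re
from pathlib import PurePosixPath

# Assumed pattern: {songName}_{tonic}_chunk{number}.mp3
# The song name may itself contain underscores, so it is matched greedily.
SNIPPET_RE = re.compile(
    r"^(?P<song>.+)_(?P<tonic>[A-G]#?)_chunk(?P<number>\d+)\.mp3$"
)

# Assumed pattern, inferred from the single example "F# Scale (4.5 Kattai)".
ANNOTATION_RE = re.compile(
    r"^(?P<tonic>[A-G]#?) Scale \((?P<kattai>\d+(?:\.\d+)?) Kattai\)$"
)


def parse_snippet_path(path):
    """Return (song name, tonic, chunk number) for a snippet file path."""
    name = PurePosixPath(path).name  # drop the folder part of the path
    match = SNIPPET_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected snippet file name: {name!r}")
    return match["song"], match["tonic"], int(match["number"])


def parse_annotation(label):
    """Return (tonic, kattai value) for a tonic annotation string."""
    match = ANNOTATION_RE.match(label.strip())
    if match is None:
        raise ValueError(f"unexpected annotation: {label!r}")
    return match["tonic"], float(match["kattai"])


if __name__ == "__main__":
    # The example entry quoted in the description above.
    print(parse_snippet_path("Carnatic_Dataset_Snippets/F#/Raravenu_F#_chunk0.mp3"))
    print(parse_annotation("F# Scale (4.5 Kattai)"))
```

Applying both functions to each row of "Carnatic_Dataset.csv" (for example with Python's csv module) gives a simple consistency check: the tonic in the file name should agree with the tonic in the annotation.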


Steps to reproduce

To create a large dataset with useful tonic annotations, we compiled these data. The second and third authors of this paper, who are vocalists themselves, recorded songs in four different tonics: F#, G, G#, and A. Using the Python library pydub, we segmented each song (3+ minutes long) into 20-second snippets, keeping any remainder shorter than 20 seconds as a separate snippet. The raw audio snippets are available in folders separated by tonic, and the CSV file lists each snippet's file path and tonic. The code can be found here:
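A minimal sketch of the segmentation step, written for illustration and not taken from the authors' code, might look like the following. It assumes pydub (with ffmpeg) is installed; the function names, arguments, and output layout are hypothetical. The chunk boundaries are computed in plain Python so that the handling of the remainder is easy to inspect.

```python
from pathlib import Path

CHUNK_MS = 20_000  # 20 seconds; pydub measures audio length in milliseconds


def chunk_bounds(duration_ms, chunk_ms=CHUNK_MS):
    """Return (start, end) offsets in ms; the last chunk holds any remainder."""
    return [
        (start, min(start + chunk_ms, duration_ms))
        for start in range(0, duration_ms, chunk_ms)
    ]


def split_song(path, song_name, tonic, out_dir, chunk_ms=CHUNK_MS):
    """Split one recording into snippets named {songName}_{tonic}_chunk{number}.mp3.

    Hypothetical helper: the arguments and folder layout are assumptions.
    """
    from pydub import AudioSegment  # third-party; imported here so chunk_bounds works without it

    audio = AudioSegment.from_file(path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    written = []
    for number, (start, end) in enumerate(chunk_bounds(len(audio), chunk_ms)):
        out_path = out_dir / f"{song_name}_{tonic}_chunk{number}.mp3"
        # Slicing an AudioSegment by milliseconds returns a new AudioSegment.
        audio[start:end].export(str(out_path), format="mp3")
        written.append(out_path)
    return written
```

For a recording of 3 minutes 5 seconds (185,000 ms), chunk_bounds yields nine full 20-second chunks followed by one 5-second remainder, which matches the description of keeping the remainder as a separate snippet.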


Music, Multimedia, Artificial Intelligence in Music, Deep Learning