Hindi Verse Dataset

Published: 27 April 2022 | Version 1 | DOI: 10.17632/cp6htsbbpp.1
Dr Milind Audichya, Jatinderkumar Saini


This dataset was constructed by collecting Hindi verses. Between December 2017 and January 2021, 3330 text entries encoded in Unicode Transformation Format (UTF-8) were collected and stored in tab-separated values (TSV) files. The dataset is divided into two sections. The first section contains the raw data. The second section contains the analyzed data, which is categorized by an automatic metadata generator based on Hindi verse writing norms. The raw data and the analyzed data are stored in separate folders. The readme.txt file contains additional information regarding file naming conventions.
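
As a rough illustration of how the UTF-8 encoded TSV files described above might be loaded, the following Python sketch reads one file into a list of rows. The function name read_tsv_rows is hypothetical, and the sketch makes no assumption about the column layout or file names, which are documented in readme.txt rather than here. It assumes that quote characters inside verse text should be kept literally instead of being interpreted as field delimiters.

```python
import csv
from pathlib import Path


def read_tsv_rows(path):
    """Return all rows of a UTF-8 encoded TSV file as lists of strings.

    Hypothetical helper: the dataset's actual column layout is not assumed.
    """
    # newline="" lets the csv module handle line endings itself.
    with Path(path).open(encoding="utf-8", newline="") as handle:
        # QUOTE_NONE keeps quotation marks in the verse text as literal characters.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return [row for row in reader]
```

Calling read_tsv_rows with the path to any file from the raw or analyzed folder should return one list of strings per line, with Devanagari text preserved as long as the file is opened as UTF-8.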



Linguistics, Literature, Computer Science, Computational Linguistics, Natural Language Processing, Metadata, Hindi Language, Indian Literature