Assamese Authorship Attribution Language Corpora-AAALC
Published: 5 September 2024 | Version 2 | DOI: 10.17632/m24w2n2jtb.2
Contributor:
Smriti Priya Medhi
Description
The dataset includes both modern and historical texts, offering a broad and diverse range of content. It features literary works such as novels, articles, and short stories by 16 prominent poets, writers, and novelists from the Assamese community.
Steps to reproduce
The data were gathered in electronic format from multiple online repositories. Python libraries and OCR tools were used to convert the documents from their electronic form into machine-readable text. Python programs were then written to merge the collection into CSV format and to annotate the data automatically.
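The merge-and-annotate step might look roughly like the sketch below. It is not the contributor's actual script: the directory layout (one folder per author holding UTF-8 .txt files produced by the OCR step), the function name merge_corpus, and the column names author, source_file, and text are all assumptions made for illustration. The author label is taken from the folder name, which is one simple way to annotate rows automatically.

```python
import csv
from pathlib import Path


def merge_corpus(root: Path, out_csv: Path) -> int:
    """Merge <root>/<author>/*.txt into one CSV and return the number of rows written.

    Assumed layout (hypothetical, not confirmed by the dataset record):
    each sub-folder of `root` is named after one author and contains
    plain-text files already extracted by the OCR step.
    """
    rows_written = 0
    # newline="" lets the csv module handle line endings inside quoted text.
    with out_csv.open("w", encoding="utf-8", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["author", "source_file", "text"])  # assumed column names
        for author_dir in sorted(p for p in root.iterdir() if p.is_dir()):
            for txt_path in sorted(author_dir.glob("*.txt")):
                text = txt_path.read_text(encoding="utf-8").strip()
                if not text:
                    continue  # skip files where extraction produced nothing
                # Automated annotation: the folder name serves as the author label.
                writer.writerow([author_dir.name, txt_path.name, text])
                rows_written += 1
    return rows_written
```

Calling merge_corpus(Path("corpus"), Path("merged.csv")) on such a layout would write one row per non-empty text file. UTF-8 is used throughout so that Assamese script survives the round trip.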
Institutions
Gauhati University, Assam Don Bosco University
Categories
Natural Language Processing, Statistical Natural Language Processing, Machine Learning, Deep Learning, Statistical Analysis, Applied Machine Learning