Assamese Authorship Attribution Language Corpora-AAALC

Published: 5 September 2024 | Version 2 | DOI: 10.17632/m24w2n2jtb.2
Contributor:
Smriti Priya Medhi

Description

The dataset includes both modern and historical texts, offering a broad and diverse range of content. It features literary works such as novels, articles, and short stories by 16 prominent poets, writers, and novelists from the Assamese community.

Steps to reproduce

The texts were gathered in electronic format from multiple online repositories. Python libraries and OCR tools were used to convert them into machine-readable text. Python programs were then written to merge the collection into CSV format and to annotate the data automatically.
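
The merge-and-annotate step could look roughly like the sketch below. It is illustrative only and is not the contributor's actual script: the directory layout (one folder per author containing UTF-8 `.txt` files), the function name `merge_corpus`, and the column names `author`, `source_file`, and `text` are all assumptions made for this example. The author label is taken from the folder name, which is one simple way to annotate texts automatically.

```python
import csv
from pathlib import Path


def merge_corpus(root: Path, out_csv: Path) -> int:
    """Merge per-author .txt files under `root` into a single CSV file.

    Assumed (hypothetical) layout: root/<author_name>/<document>.txt
    Returns the number of data rows written.
    """
    rows = 0
    with out_csv.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        # Hypothetical column names; the published CSV may use different ones.
        writer.writerow(["author", "source_file", "text"])
        for author_dir in sorted(p for p in root.iterdir() if p.is_dir()):
            for txt in sorted(author_dir.glob("*.txt")):
                text = txt.read_text(encoding="utf-8").strip()
                if not text:
                    continue  # skip files that are empty after OCR/conversion
                # Automated annotation: the folder name serves as the label.
                writer.writerow([author_dir.name, txt.name, text])
                rows += 1
    return rows
```

Calling `merge_corpus(Path("corpus"), Path("aaalc.csv"))` would write one row per non-empty text file; both paths here are placeholders. UTF-8 is used throughout so that Assamese script survives the round trip.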

Institutions

Gauhati University, Assam Don Bosco University

Categories

Natural Language Processing, Statistical Natural Language Processing, Machine Learning, Deep Learning, Statistical Analysis, Applied Machine Learning

Licence