BV-Class
Description
This dataset contains 3,001 Bangla sentences, each annotated with a grammatical verb class: Transitive (1,634 sentences) or Intransitive (1,367 sentences). The sentences were manually collected and labeled by native Bangla speakers with linguistic expertise to ensure accuracy. Each entry consists of a raw Bangla sentence and its verb classification.

-------Value of the Data-------
- First publicly available dataset for Bangla verb transitivity classification.
- Supports research in syntactic parsing, semantic role labeling, machine translation, and verb-based NLP tasks.
- Near-balanced class distribution (about 54% Transitive, 46% Intransitive) limits class-imbalance bias when benchmarking machine learning models.
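As a rough check on the class distribution, the per-class counts stated above can be turned into class shares and a majority-class baseline, which is the accuracy a model would reach by always predicting the larger class. This is a minimal sketch; the names `class_counts`, `class_shares`, and `majority_baseline` are illustrative and not part of the dataset.

```python
# Per-class sentence counts as stated in the description.
class_counts = {"Transitive": 1634, "Intransitive": 1367}

total = sum(class_counts.values())

# Share of each class in the dataset.
class_shares = {label: n / total for label, n in class_counts.items()}

# Accuracy of a trivial classifier that always predicts the larger class.
majority_baseline = max(class_shares.values())

for label, share in class_shares.items():
    print(f"{label}: {share:.1%}")
print(f"Majority-class baseline accuracy: {majority_baseline:.3f}")
```

Any reported model accuracy is more informative when read against this baseline than against an assumed 50%.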
Files
Steps to reproduce
The following pipeline was used to prepare the dataset from raw collected sentences:

1. Data Collection
- Sentences were sourced from textbooks, storybooks, online Bangla articles, and conversational data.
- Each sentence was reviewed to ensure it contained a clear main verb.

2. Annotation
- Each sentence was manually labeled as Transitive or Intransitive:
- Transitive verbs require an object to complete their meaning (e.g., “আমি বই পড়ি।” → পড়া = Transitive).
- Intransitive verbs do not require an object (e.g., “সে ঘুমায়।” → ঘুমানো = Intransitive).
- At least two annotators cross-checked each label for consistency.

3. Cleaning & Standardization
- Removed null values, duplicates, punctuation noise, numbers, and special characters.
- Retained only grammatically valid Bangla sentences.
- Verified class balance (dataset is ~50/50).

4. Final Dataset
- Exported into .xlsx (Excel) and .csv (comma-separated) formats.
- Final size: 2,616 sentences with balanced verb classes.
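The cleaning and balance-check steps could be approximated along the following lines. This is a sketch under stated assumptions, not the authors' actual script: the function names are hypothetical, rows are assumed to be `(sentence, label)` pairs, and "Bangla text" is assumed to mean characters in the Unicode Bengali block (U+0980 to U+09FF) excluding the Bengali digits, plus whitespace and zero-width joiners. The grammatical-validity check was a manual step and is not modeled here.

```python
import re
from collections import Counter

# Assumption: anything outside the Bengali block (minus the digits U+09E6-U+09EF),
# whitespace, and the zero-width (non-)joiners counts as noise. This drops ASCII
# digits, Bengali digits, Latin text, and punctuation such as the danda (U+0964).
_NON_BANGLA = re.compile(r"[^\u0980-\u09E5\u09F0-\u09FF\u200c\u200d\s]")


def clean_sentence(text):
    """Strip non-Bangla characters and collapse runs of whitespace."""
    text = _NON_BANGLA.sub(" ", text)
    return " ".join(text.split())


def clean_rows(rows):
    """Drop null entries, empty results, and duplicate sentences."""
    seen = set()
    cleaned = []
    for sentence, label in rows:
        if not sentence or not label:
            continue  # null or empty value
        normalized = clean_sentence(sentence)
        if not normalized or normalized in seen:
            continue  # nothing left after cleaning, or a duplicate
        seen.add(normalized)
        cleaned.append((normalized, label))
    return cleaned


def class_balance(rows):
    """Return the share of each label among the given rows."""
    counts = Counter(label for _, label in rows)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Example input using the two sentences from the annotation step,
# plus a duplicate, a null entry, and a row with no Bangla text.
sample = [
    ("আমি বই পড়ি।", "Transitive"),
    ("সে ঘুমায়।", "Intransitive"),
    ("আমি বই পড়ি।", "Transitive"),
    (None, "Transitive"),
    ("123 !!", "Intransitive"),
]
cleaned = clean_rows(sample)
balance = class_balance(cleaned)
```

Deduplicating on the cleaned form rather than the raw string means that two sentences differing only in punctuation are treated as the same entry; whether that matches the original procedure is not stated in the source.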