IndicLegalQA Dataset
The dataset comprises 10,000 question-answer (QA) pairs prepared from 1,256 judgment documents: 538 criminal cases and 718 civil cases. Each QA pair is derived from a detailed judgment of the apex court (i.e., the Supreme Court of India). Questions are framed to capture essential legal issues, principles, or facts, and answers are extracted directly from the judgment text. The dataset covers a balanced mix of legal topics in criminal and civil law, such as constitutional matters, property disputes, criminal offenses, and procedural matters. It also includes metadata such as case names and judgment dates.
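To make the record structure described above concrete, the following is a minimal sketch of how one QA pair and its metadata might be represented and tallied by domain. The class name `QAPair`, every field name, and the helper `count_by_domain` are assumptions made for illustration; the description does not specify the released file format or column names, so check the actual files before relying on any of them. The sample records are placeholders, not dataset content.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class QAPair:
    # All field names are hypothetical; the released files may use others.
    case_name: str
    judgment_date: str  # kept as a string because the date format is not specified
    domain: str         # assumed to be "criminal" or "civil"
    question: str
    answer: str


def count_by_domain(pairs):
    """Return a Counter mapping each domain label to its number of QA pairs."""
    return Counter(p.domain for p in pairs)


# Placeholder records for illustration only; not actual dataset content.
sample = [
    QAPair("<case A>", "<date>", "criminal", "<question>", "<answer>"),
    QAPair("<case B>", "<date>", "civil", "<question>", "<answer>"),
    QAPair("<case B>", "<date>", "civil", "<question>", "<answer>"),
]

counts = count_by_domain(sample)
print(counts["criminal"], counts["civil"])
```

Applied to the judgment-level metadata of the full dataset, the same kind of tally should be consistent with the figures stated above (538 criminal and 718 civil judgments, 1,256 in total).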
Steps to reproduce
The dataset was created by collecting 1,256 judgments of the Supreme Court of India, extracting their text using Large Language Models (LLMs), generating question-answer pairs that were reviewed by legal experts, adding metadata, and categorizing them by legal domain. NLP tools and annotation platforms were used for preprocessing.
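The steps above can be sketched as a small pipeline skeleton. This is an illustration under stated assumptions, not the authors' actual tooling: the function `build_qa_records`, the dictionary keys, and the two callables `generate_qa` (standing in for the LLM-based generation step) and `approve` (standing in for legal expert review) are all hypothetical names, and the stubs in the usage example only demonstrate the control flow.

```python
def build_qa_records(judgments, generate_qa, approve):
    """Assemble QA records from judgments.

    judgments   : iterable of dicts with hypothetical keys
                  "case_name", "judgment_date", "domain", and "text"
    generate_qa : callable taking the judgment text and returning
                  (question, answer) tuples; stands in for the LLM step
    approve     : callable taking (question, answer) and returning bool;
                  stands in for legal expert review
    """
    records = []
    for judgment in judgments:
        for question, answer in generate_qa(judgment["text"]):
            if not approve(question, answer):
                continue  # drop pairs that the review step rejects
            records.append({
                "case_name": judgment["case_name"],
                "judgment_date": judgment["judgment_date"],
                "domain": judgment["domain"],
                "question": question,
                "answer": answer,
            })
    return records


# Stub callables and a placeholder judgment, for illustration only.
def stub_generate(text):
    return [("<question 1>", "<answer 1>"), ("", "<answer 2>")]


def stub_approve(question, answer):
    # Reject pairs with an empty question or answer.
    return bool(question.strip() and answer.strip())


placeholder_judgments = [{
    "case_name": "<case A>",
    "judgment_date": "<date>",
    "domain": "civil",
    "text": "<judgment text>",
}]

records = build_qa_records(placeholder_judgments, stub_generate, stub_approve)
print(len(records))
```

Passing the generation and review steps in as callables keeps the skeleton independent of any particular LLM or annotation platform, neither of which is named in the description.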