IndicLegalQA Dataset

Published: 16 October 2024 | Version 1 | DOI: 10.17632/gf8n8cnmvc.1

Description

The dataset comprises 10,000 question-answer (QA) pairs prepared from 1,256 judgment documents: 538 criminal cases and 718 civil cases. Each QA pair is derived from a detailed judgment of the apex court (the Supreme Court of India). Questions are framed to capture essential legal issues, principles, or facts, and answers are extracted directly from the judgment text. The dataset covers a balanced mix of topics in criminal and civil law, such as constitutional matters, property disputes, criminal offenses, and procedural matters. It also includes metadata such as case names and judgment dates.
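As a rough illustration of how records of this kind might be loaded and sanity-checked, the sketch below parses a JSON array of QA pairs and counts pairs per case. The JSON layout and the field names `case_name`, `judgment_date`, `question`, and `answer` are assumptions made for this example, as are the placeholder records in `SAMPLE`; the released files may use a different format and different keys.

```python
import json
from collections import Counter

# Hypothetical field names; the actual keys in the released files may differ.
REQUIRED_FIELDS = ("case_name", "judgment_date", "question", "answer")


def load_qa_pairs(json_text):
    """Parse a JSON array of QA records, keeping only complete ones.

    A record is kept when every required field is present and non-empty.
    """
    records = json.loads(json_text)
    return [r for r in records if all(r.get(f) for f in REQUIRED_FIELDS)]


def pairs_per_case(records):
    """Count how many QA pairs belong to each case name."""
    return Counter(r["case_name"] for r in records)


# Placeholder records for illustration only; these are NOT taken from the dataset.
SAMPLE = json.dumps([
    {"case_name": "Case A (placeholder)", "judgment_date": "2000-01-01",
     "question": "Placeholder question 1?", "answer": "Placeholder answer 1."},
    {"case_name": "Case A (placeholder)", "judgment_date": "2000-01-01",
     "question": "Placeholder question 2?", "answer": "Placeholder answer 2."},
    {"case_name": "Case B (placeholder)", "judgment_date": "2001-01-01",
     "question": "Placeholder question 3?", "answer": ""},  # incomplete: empty answer
])

records = load_qa_pairs(SAMPLE)
counts = pairs_per_case(records)
```

With the real files, the same two functions could be pointed at the downloaded data (after adjusting the field names) to check that the total number of complete pairs and the number of distinct cases match the figures given in the description.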

Files

Steps to reproduce

The dataset was created by collecting 1,256 Indian court judgments, extracting their text using Large Language Models (LLMs), generating question-answer pairs that were reviewed by legal experts, adding metadata, and categorizing the pairs by legal domain. NLP tools and annotation platforms supported preprocessing.

Institutions

National Institute of Technology Srinagar

Categories

Information Retrieval, Intelligent Information Retrieval, Answer Extraction, Legal Studies, Legal Interpretation

Licence