IDTheftCase-JudgmentCorpus: Indonesian Theft Case Judgment Corpus - Levels of Court
Description
IDTheftCase-JudgmentCorpus: Indonesian Theft Case Judgment Corpus - Levels of Court is a dataset containing the full text of written judgments handed down by Indonesian courts in criminal theft cases at three levels: the court of first instance, the appellate court, and the cassation court. The dataset was created to support research and development in information extraction and natural language processing, specifically the processing and understanding of legal texts and court documents. The judgments include information about defendants, judges, types of punishment, hearing dates, and other data relevant for analysis.

The dataset is organized into several files:

1. Manually Annotated Judgment Documents (annotated)
• JSON Files:
o 1-pertama.json: Annotated judgments from the court of first instance.
o 2-banding.json: Annotated judgments from the appellate court.
o 3-kasasi.json: Annotated judgments from the cassation court.
• CSV Files:
o 1-pertama.csv: Tokenized version of the first-instance court judgments from the related JSON file, with a corresponding BIO tag for each token.
o 2-banding.csv: Tokenized version of the appellate court judgments from the related JSON file, with a corresponding BIO tag for each token.
o 3-kasasi.csv: Tokenized version of the cassation court judgments from the related JSON file, with a corresponding BIO tag for each token.

2. Non-Annotated Judgment Documents
• 1-pertama-not-annotated.json: Non-annotated judgments from the court of first instance.

3. Metadata File
metadata.csv: Contains contextual and hierarchical information about the judgment documents, structured into the following columns:
• Id: Unique case identifier.
• Id Putusan (Decision ID): Original ID of the document, distinguishing records from the Supreme Court's website.
• Tingkat Proses (Process Level): Indicates the court level (First Instance, Appellate, Cassation).
• Id Pertama (First-Instance ID): Related first-instance document ID.
• Id Banding (Appeal ID): Related appellate document ID.
• Id Kasasi (Cassation ID): Related cassation document ID.

All documents in this dataset were obtained from public records on the official website of the Supreme Court of the Republic of Indonesia (https://putusan3.mahkamahagung.go.id/). As such, the dataset represents real-world cases and reflects the legal form of Indonesian court documents.

IDTheftCase-JudgmentCorpus is intended for research in named entity recognition and extraction, analysis of punishment imposition patterns, and automatic document classification in the Indonesian legal context. It is also useful for developers and researchers who aim to build and deploy machine learning models that extract, group, and analyze judgment documents at different court levels.
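The CSV files described above pair each token with a BIO tag. As a minimal sketch of how such tags are commonly turned back into entity spans, the Python function below groups consecutive B-/I- tags into (label, text) pairs. The sample tokens and the label names DEFENDANT and PUNISHMENT are made up for illustration; the dataset's actual column names and tag set are not stated here and should be checked in the files themselves.

```python
from typing import Iterable, List, Optional, Tuple


def bio_to_spans(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Group (token, BIO tag) pairs into (label, text) entity spans."""
    spans: List[Tuple[str, str]] = []
    label: Optional[str] = None  # label of the span currently being built
    tokens: List[str] = []       # tokens collected for that span

    for token, tag in pairs:
        prefix, _, name = tag.partition("-")
        if prefix == "I" and name == label:
            # Continuation of the open span.
            tokens.append(token)
            continue
        # Any other tag closes the open span, if there is one.
        if label is not None:
            spans.append((label, " ".join(tokens)))
        if prefix in ("B", "I") and name:
            # Start a new span; a stray I- tag is treated leniently as B-.
            label, tokens = name, [token]
        else:
            # "O" or an unrecognized tag: no span is open.
            label, tokens = None, []

    if label is not None:
        spans.append((label, " ".join(tokens)))
    return spans


# Hypothetical rows, NOT taken from the dataset.
rows = [
    ("Terdakwa", "O"),
    ("Budi", "B-DEFENDANT"),
    ("Santoso", "I-DEFENDANT"),
    ("dijatuhi", "O"),
    ("pidana", "O"),
    ("penjara", "B-PUNISHMENT"),
    ("6", "I-PUNISHMENT"),
    ("bulan", "I-PUNISHMENT"),
]

for span_label, span_text in bio_to_spans(rows):
    print(span_label, span_text)
```

To apply this to one of the CSV files, the rows could be read with Python's csv module or pandas and passed in as (token, tag) pairs once the real column names are known.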
Funding
Direktorat Riset Dan Pengabdian Kepada Masyarakat
0459/E5/PG.02.00/2024