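Data Mining Tokoh Politik Indonesia Berdasarkan Artikel Wikipedia Dalam Membangun Data Semantik (Data Mining of Indonesian Political Figures Based on Wikipedia Articles for Building Semantic Data)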

Published: 9 May 2026 | Version 1 | DOI: 10.17632/pyhytzgrmm.1
Contributor:
Stefanie Laksmi Lakshita

Description

This dataset contains 470 relations between Indonesian political entities, extracted from Indonesian-language Wikipedia articles using rule-based named entity recognition built on regular expressions (regex). Each entry is a single structured relation triplet of the form ⟨source, relasi, target⟩, accompanied by a relation type and the context sentence that serves as linguistic evidence from the source text.

The dataset has six columns: index, a sequential identifier; source, the name of the political figure that is the origin entity; target, the destination entity (a position, institution, party, or another figure); relasi, the relationship label in natural language; tipe, the relation type, which is one of JABATAN (position), ORGANISASI (organization), KEMITRAAN (partnership), KOALISI (coalition), or KELUARGA (family); and konteks, the original sentence from which the relation was extracted.

The JABATAN relation type dominates the dataset, reflecting how extensively Wikipedia articles discuss the career history and formal positions of political figures. The dataset can be used for knowledge graph construction, training Indonesian relation extraction models, and systematic political network analysis.
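As an illustration of the knowledge graph use case, the following minimal Python sketch reads rows with the six columns described above, counts relations per tipe, and groups triplets by source entity. It assumes the data is available as CSV with a header row in the column order listed in the description. The file layout, the sample rows, and the function names load_triplets, count_by_type, and build_graph are assumptions made for this example; the placeholder names are not taken from the dataset.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical sample rows with placeholder names; NOT taken from the dataset.
# The CSV layout (header row, comma-separated) is also an assumption.
SAMPLE_CSV = """index,source,target,relasi,tipe,konteks
1,Tokoh A,Gubernur,menjabat sebagai,JABATAN,Tokoh A menjabat sebagai Gubernur.
2,Tokoh A,Partai X,anggota,ORGANISASI,Tokoh A adalah anggota Partai X.
3,Tokoh B,Tokoh A,anak dari,KELUARGA,Tokoh B adalah anak dari Tokoh A.
"""


def load_triplets(handle):
    """Read every row into a dict keyed by the six column names."""
    return list(csv.DictReader(handle))


def count_by_type(rows):
    """Count how many relations fall under each tipe value."""
    return Counter(row["tipe"] for row in rows)


def build_graph(rows):
    """Map each source entity to its outgoing (relasi, target) edges."""
    graph = defaultdict(list)
    for row in rows:
        graph[row["source"]].append((row["relasi"], row["target"]))
    return dict(graph)


rows = load_triplets(io.StringIO(SAMPLE_CSV))
type_counts = count_by_type(rows)
graph = build_graph(rows)
```

To run this against the real file, replace the io.StringIO(SAMPLE_CSV) argument with an opened file handle, for example open(path, newline="", encoding="utf-8"), where path points to the downloaded data file.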

Files

Steps to reproduce

Apply relation extraction rules to identify semantic relationships between pairs of identified entities. Each valid relationship is recorded as a structured triplet consisting of source (origin entity), relasi (relationship label), and target (destination entity).
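The rule-application step can be sketched in Python as follows. This is a minimal illustration and not the author's actual rule set: the NAME pattern, the two rules in RULES, the trigger phrases, and the function name extract_triplets are all assumptions made for this example. Each rule pairs a regular expression with a relasi label and a tipe, and every match is recorded as a triplet together with its context sentence.

```python
import re

# Assumed entity pattern: one or more capitalized words separated by spaces.
NAME = r"[A-Z]\w*(?: [A-Z]\w*)*"

# Hypothetical rules (pattern, relasi label, tipe); the dataset's real rules
# are not published in this description.
RULES = [
    (
        re.compile(rf"(?P<source>{NAME}) menjabat sebagai (?P<target>{NAME})"),
        "menjabat sebagai",
        "JABATAN",
    ),
    (
        re.compile(rf"(?P<source>{NAME}) adalah anggota (?P<target>{NAME})"),
        "anggota",
        "ORGANISASI",
    ),
]


def extract_triplets(sentence):
    """Apply every rule to one sentence and return the matched triplets."""
    triplets = []
    for pattern, relasi, tipe in RULES:
        for match in pattern.finditer(sentence):
            triplets.append(
                {
                    "source": match.group("source"),
                    "relasi": relasi,
                    "target": match.group("target"),
                    "tipe": tipe,
                    "konteks": sentence,  # keep the sentence as evidence
                }
            )
    return triplets


# Placeholder sentence, not taken from the dataset.
example = extract_triplets("Tokoh A menjabat sebagai Gubernur Jawa Barat.")
```

Keeping the label and type next to each pattern means a new relation type can be supported by appending a rule, without changing the extraction loop.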

Institutions

Categories

Political Science, Computational Linguistics, Data Mining, Information Extraction, Social Network Analysis, Data Analysis, Text Mining

Licence