Dataset Question Answering for Admission of Higher Education Institution

Published: 26 September 2023| Version 2 | DOI: 10.17632/jc4df8srcb.2
Contributors:
Emny Yossy,
,
,

Description

The data collection process commenced with web scraping of a selected higher education institution's website, collecting any data that relates to the admission topic of higher education institutions, during the period from July to September 2023. This resulted in a raw dataset primarily cantered around admission-related content. Subsequently, meticulous data cleaning and organization procedures were implemented to refine the dataset. The primary data, in its raw form before annotation into a question-and-answer format, was predominantly in the Indonesian language. Following this, a comprehensive annotation process was conducted to enrich the dataset with specific admission-related information, transforming it into secondary data. Both primary and secondary data predominantly remained in the Indonesian language. To enhance data quality, we added filters to remove or exclude: 1) data not in the Indonesian language, 2) data unrelated to the admission topic, and 3) redundant entries. This meticulous curation has culminated in the creation of a finalized dataset, meticulously prepared and now readily available for research and analysis in the domain of higher education admission.

Files

Institutions

Bina Nusantara University

Categories

Computer Science, Education, Marketing, Natural Language Processing

Licence