Manually Curated Ethiopian Family Code QA Dataset

Published: 20 September 2024| Version 1 | DOI: 10.17632/c46kvgr9kw.1
Contributors:
Beimnet Bekele Guta,

Description

This dataset contains collection of question-and-answer pairs that have been manually generated from the revised family code of Ethiopia. The data generation process involves a review of each article of the family code of Ethiopia and generating questions and their answer for those question from the article they were extracted from. After the extraction each question-and-answer pair each of them was reviewed by people with domain knowledge to ensure the accuracy of them. Moreover, there was a second-round review to ensure the meaning accuracy of each pair. This dataset is created to fine tune llama-2 model to create a model that will be able to answer questions related with the revised family code of Ethiopia without utilization of any back-and-forth translation.

Files

Institutions

Addis Ababa Institute of Technology

Categories

Natural Language Processing, Machine Learning, Llama, Deep Learning, Large Language Model

Licence