Event Extraction Andersen's Fairy Tales

Published: 25 April 2024 | Version 1 | DOI: 10.17632/22v3kcgks3.1
Erna Daniati


This dataset is the result of extracting events from the fairy tales of Hans Christian Andersen. The source text was taken from the Project Gutenberg website and then run through an extraction process. Each fairy tale is broken down into sentences and their entity domains, and the number of words and the number of sentences are also recorded. The dataset is in JSON format with the attributes title, number of sentences, number of words, and events.
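As a rough illustration of the record shape described above, the following Python sketch parses one hand-written record and summarizes it. The key names (`title`, `num_sentences`, `num_words`, `events`), the fields inside each event, and all values are assumptions made for illustration; the actual keys in the published JSON file may differ.

```python
import json

# Hypothetical record: every key name and value below is an illustrative
# assumption, not the confirmed schema of the published dataset.
raw = """
{
  "title": "Example Tale",
  "num_sentences": 1,
  "num_words": 6,
  "events": [
    {
      "sentence": "The king opened the old gate.",
      "entity_domain": "PERSON",
      "entities": ["king"]
    }
  ]
}
"""

record = json.loads(raw)


def summarize(record):
    """Return (title, sentence count, word count, event count) for one record."""
    return (
        record["title"],
        record["num_sentences"],
        record["num_words"],
        len(record["events"]),
    )


summary = summarize(record)
print(summary)
```

If the real file uses different key names, only the dictionary lookups in `summarize` would need to change.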


Steps to reproduce

1. Retrieve data from the Gutenberg repository with the following link: https://www.gutenberg.org/cache/epub/1597/pg1597.txt
2. Calculate the number of words and sentences.
4. Identify the events in the fairy tale.
5. Identify the entity domain in the event.
6. Determine the entities involved in the fairy tale events.
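The first two steps above can be sketched in Python as follows. This is a minimal sketch, not the author's actual procedure: the whitespace-based word count and the punctuation-based sentence split are assumptions, and the function names are hypothetical. Only the URL comes from the steps listed here.

```python
import re
import urllib.request

GUTENBERG_URL = "https://www.gutenberg.org/cache/epub/1597/pg1597.txt"


def fetch_text(url=GUTENBERG_URL):
    """Step 1: download the plain-text file (requires network access)."""
    with urllib.request.urlopen(url) as response:
        # "utf-8-sig" also strips a leading byte-order mark if one is present.
        return response.read().decode("utf-8-sig")


def count_words(text):
    """Step 2 (assumed rule): count whitespace-separated tokens."""
    return len(text.split())


def split_sentences(text):
    """Step 2 (assumed rule): split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def count_sentences(text):
    """Number of sentences under the naive splitting rule above."""
    return len(split_sentences(text))


# A made-up sample keeps the example runnable without a network connection.
sample = "The king opened the gate. A bird sang! Did anyone hear it?"
print(count_words(sample), count_sentences(sample))
```

Calling `fetch_text()` and passing the result to the two counting functions would give totals for the whole file. The naive sentence rule will miscount abbreviations and quoted dialogue, so the published counts may not match. The remaining steps (events, entity domains, entities) are not sketched because the method used for them is not stated.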


Universitas Negeri Malang


Natural Language Processing, Text Extraction