Kuala Lumpur Travel blogs Dataset

Published: 26 February 2018 | Version 1 | DOI: 10.17632/9wb5rv45j5.1
Contributors:
Erum Haris

Description

This dataset contains three folders:

1) Training: The first sub-folder, "raw training files", contains travel text extracted from 36 travel blog posts related to Kuala Lumpur. The second sub-folder, "labeled files", consists of .xml versions of the raw text files containing 500 annotated spatial triplets of the form "trajector, spatial indicator, landmark" for spatial relation extraction.
2) Testing: The first sub-folder, "raw testing files", contains travel text extracted from 10 travel blog posts related to Kuala Lumpur. The second sub-folder, "labeled files", is the gold standard for evaluation and consists of .xml versions of the raw text files containing 200 annotated spatial triplets of the form "trajector, spatial relation, landmark".
3) Related files: This folder contains the annotation scheme definition (.xml) for the training and testing files.
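As a rough illustration of how the labeled .xml files might be read, the sketch below collects (trajector, spatial indicator, landmark) triplets with Python's standard library. The element and attribute names (SENTENCE, TRAJECTOR, SPATIAL_INDICATOR, LANDMARK, RELATION, and the id references) and the sample sentence are assumptions made for illustration only. The actual names are defined by the annotation scheme file in "Related files" and may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: tag and attribute names here are assumptions, not the
# dataset's confirmed schema. Check the annotation scheme definition first.
SAMPLE = """
<DOC>
  <SENTENCE id="s1">
    <TEXT>The Petronas Towers stand near KLCC Park.</TEXT>
    <TRAJECTOR id="t1">The Petronas Towers</TRAJECTOR>
    <SPATIAL_INDICATOR id="sp1">near</SPATIAL_INDICATOR>
    <LANDMARK id="l1">KLCC Park</LANDMARK>
    <RELATION trajector="t1" indicator="sp1" landmark="l1"/>
  </SENTENCE>
</DOC>
"""


def extract_triplets(xml_text):
    """Return a list of (trajector, indicator, landmark) text triplets."""
    root = ET.fromstring(xml_text)
    triplets = []
    for sentence in root.iter("SENTENCE"):
        # Map each annotated span's id to its text within this sentence.
        spans = {
            el.get("id"): (el.text or "").strip()
            for el in sentence
            if el.get("id")
        }
        # Resolve the id references of each relation into span text.
        for rel in sentence.iter("RELATION"):
            triplets.append((
                spans.get(rel.get("trajector")),
                spans.get(rel.get("indicator")),
                spans.get(rel.get("landmark")),
            ))
    return triplets


triplets = extract_triplets(SAMPLE)
```

If the real files use different names, only the strings passed to `iter` and `get` should need to change. For files on disk, `ET.parse(path).getroot()` can replace `ET.fromstring`.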

Categories

Natural Language Processing, Blog, Information Extraction, Travel Behavior

Licence