Enriched Tourism Dataset Paris (POIs)

Published: 30 October 2024 | Version 1 | DOI: 10.17632/vh4g4g2322.1
Contributors:
Ramon Hermoso

Description

This dataset contains the Paris subset of the Tourpedia dataset, restricted to points of interest (POIs) categorized as attractions (original data available at http://tour-pedia.org/download/paris-attraction.csv). The original subset comprises 4,351 entries covering a variety of attractions across Paris. For each POI it provides a unique identifier, the POI name, category, location information (address), latitude, longitude, specific details, and user-generated reviews. The reviews contain textual feedback aggregated from platforms such as Google Places, Foursquare, and Facebook, offering qualitative insight into each location.

Because a high proportion of the original entries were incomplete or inconsistently structured, a rigorous cleaning process was applied. Erroneous and incomplete data points were removed, reducing the dataset to 477 entries that meet the quality and structural-coherence criteria. These entries were then further validated to ensure data integrity, giving a more accurate representation of Paris's attractions.

- Paris.csv: contains the cleaned POIs, with columns for a unique identifier, POI name, category, location information (address), latitude, longitude, specific details, and user-generated reviews. The reviews were previously retrieved from Google Places, Foursquare, and Facebook and pre-processed, and are provided in several formats: all words, nouns only, nouns + verbs, nouns + adjectives, and nouns + verbs + adjectives.
- Paris_annotated.csv: contains the ground truth for Paris.csv, consisting of manual (human) annotations that categorize each POI into 12 predefined categories. It has the following columns:
  * POI name
  * POI's address
  * One column for each of the 12 categories, where 1 means that the POI belongs to the category and a blank cell means that it does not.
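Because Paris_annotated.csv stores one binary column per category, it can be turned into a multi-label mapping with a few lines of Python. The sketch below is illustrative only and rests on assumptions: that the file is comma-delimited with a header row whose first two columns are the POI name and address, and that every remaining column is a category. `load_annotations` is a hypothetical helper, and the category names in the sample (Museum, Park, Monument) are invented placeholders, not the dataset's actual 12 categories.

```python
import csv
import io

# Assumption: columns 0 and 1 hold the POI name and address; all later
# columns are category flags ("1" = belongs, blank = does not).
NAME_COL, ADDRESS_COL = 0, 1
FIRST_CATEGORY_COL = 2


def load_annotations(text: str, delimiter: str = ",") -> dict[tuple[str, str], set[str]]:
    """Map each (POI name, address) pair to the set of categories marked "1"."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    categories = header[FIRST_CATEGORY_COL:]
    labels: dict[tuple[str, str], set[str]] = {}
    for row in reader:
        if not row:
            continue  # csv.reader yields [] for empty lines; skip them
        flags = row[FIRST_CATEGORY_COL:]
        # Pad short rows in case trailing blank cells were omitted entirely.
        flags += [""] * (len(categories) - len(flags))
        key = (row[NAME_COL], row[ADDRESS_COL])
        labels[key] = {cat for cat, cell in zip(categories, flags) if cell.strip() == "1"}
    return labels


# Invented sample data with placeholder categories, for illustration only.
sample = (
    "name,address,Museum,Park,Monument\n"
    "Musée du Louvre,Rue de Rivoli,1,,1\n"
    "Jardin du Luxembourg,Rue de Médicis,,1,\n"
)
labels = load_annotations(sample)
print(labels[("Musée du Louvre", "Rue de Rivoli")])  # set order may vary
```

Keying by (name, address) rather than by name alone guards against two POIs that share a name. To read the real file, one could pass `open(path, encoding="utf-8", newline="").read()` to the helper; the UTF-8 encoding and comma delimiter are assumptions to check against the downloaded file.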


Institutions

Universidad de Zaragoza

Categories

Tourism, Data Analysis, France, Recommendation System

Funding

Agencia Estatal de Investigación

PID2020-113037RB-I00

Gobierno de Aragón

T64_23R

Licence