User profiles based on user reviews extracted from OpinRank dataset

Published: 1 July 2020 | Version 2 | DOI: 10.17632/pjmb4rbdms.2
João Paulo Dias de Almeida


The OpinRank dataset contains hotel reviews and aspect ratings. There are five aspect ratings related to hotels: cleanliness, value, service, location, and room. The aspect rating values are on a scale of 1 to 5. The authors manually created textual queries related to each aspect rating. These queries were based on real queries made by users in popular search engines, so they reflect natural user queries. For example, the query "great location" is related to the aspect rating "location". Given this query, the dataset lists the "location" aspect rating value of each hotel. The rating values are given by TripAdvisor users when evaluating the hotels they have visited. In essence, the OpinRank dataset contains five hotel aspects; each aspect is related to user queries, and each hotel has one rating value per aspect. In addition, the dataset contains hotel profiles composed of unlabeled reviews written by users who have visited the respective hotel.


Steps to reproduce

A user profile describes a user's preferences, using the user's past reviews to indicate the best item for that user. Each hotel has five aspect ratings in the OpinRank dataset: cleanliness, room, service, location, and value. We build one user profile for each aspect rating to simulate a real user profile. Each user profile consists of twenty user reviews and their respective labels: 0 for a bad review (negative class) and 1 for a good review (positive class). For instance, when building the profile for the "service" aspect rating, a specialist selected ten reviews about the hotels with the highest "service" aspect rating value and another ten about the hotels with the lowest. This way, the "service" profile represents a user who has visited hotels with good and bad service and commented on them on TripAdvisor.
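
The profile construction described above can be approximated in code. The following is only a minimal sketch, not the procedure actually used: in this dataset a specialist selected the reviews by hand, whereas the sketch simply takes the first reviews found when walking hotels from the highest (or lowest) aspect rating. The names Hotel, take_reviews, and build_profile, as well as the toy hotels, are hypothetical and do not come from the dataset files.

```python
from dataclasses import dataclass, field


@dataclass
class Hotel:
    # Hypothetical container; the real OpinRank files use their own layout.
    name: str
    ratings: dict[str, float]               # aspect name -> rating on the 1-5 scale
    reviews: list[str] = field(default_factory=list)


def take_reviews(hotels, n):
    """Collect up to n reviews, walking the hotels in the given order."""
    picked = []
    for hotel in hotels:
        for review in hotel.reviews:
            if len(picked) == n:
                return picked
            picked.append(review)
    return picked


def build_profile(hotels, aspect, per_class=10):
    """Return (review, label) pairs: 1 for reviews of the best-rated hotels, 0 for the worst."""
    ranked = sorted(hotels, key=lambda h: h.ratings[aspect], reverse=True)
    positive = take_reviews(ranked, per_class)            # highest aspect rating first
    negative = take_reviews(reversed(ranked), per_class)  # lowest aspect rating first
    # Assumption: there are enough hotels that the two groups do not overlap.
    return [(r, 1) for r in positive] + [(r, 0) for r in negative]


# Toy data (invented for illustration): six hotels with five reviews each.
hotels = [
    Hotel(f"hotel_{i}", {"service": rating},
          [f"hotel_{i} review {j}" for j in range(5)])
    for i, rating in enumerate([1.0, 2.0, 3.0, 4.0, 4.5, 5.0])
]
profile = build_profile(hotels, "service")
```

With the toy data, profile holds twenty labeled reviews: ten from the two hotels with the highest "service" rating (label 1) and ten from the two with the lowest (label 0). A manual selection, as done for this dataset, would replace take_reviews with the specialist's choice of reviews.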


Universidade Federal da Bahia


Information Retrieval, Classification (Machine Learning), Design for Personalization