Amazon movie reviews

Published: 09-08-2020 | Version 2 | DOI: 10.17632/kb5nv7dbtm.2
Atiquer Rahman Sarkar


Secondary Data. Primary Data could be found here:

File allReviews.csv consists of 7911684 movie reviews from Amazon. The reviewer's profile name, the review summary, and the review text have been removed from the primary data. The data span a period of more than 10 years, including all reviews up to October 2012. Each row contains 6 fields:

1. Product ID (e.g. B003AI2VGA)
2. User ID (e.g. A141HP4LYPWMSR)
3. Count of thumbs-up received by this review (e.g. 7)
4. Total thumb count of this review (sum of thumbs-up and thumbs-down, e.g. 7)
5. Given rating on a discrete Likert scale of 1 to 5 (e.g. 3)
6. Time of the review (Unix time, e.g. 1182729600)

For example, a sample row from this file is: B003AI2VGA,A1I7QGUDP043DG,8,10,5,1164844800

File Reviewers50plus.csv contains the user IDs of all 16341 reviewers with more than 50 reviews each.

File MovieID177k.csv contains the product IDs of all 177111 movies reviewed by the reviewers with more than 50 reviews.

File Set2userid2000.csv contains the user IDs of the 2000 reviewers from Reviewers50plus.csv who have the largest difference between thumbs-up and thumbs-down.

The four files in the "Product Ratings" folder contain “product ratings” of all the movies from MovieID177k.csv, derived using 4 different techniques. Each file consists of 2 columns: product ID and product rating.

The 9 files in the "Recommended Experts" folder contain 37 different sets of “recommended expert reviewers”. Each file contains 200 rows of user IDs.

Primary data citation: "McAuley, J. J., & Leskovec, J. (2013, May). From amateurs to connoisseurs: modeling the evolution of user expertise through online reviews. In Proceedings of the 22nd international conference on World Wide Web (pp. 897-908)."
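As an illustration of the six-field row layout of allReviews.csv described above, the following is a minimal Python sketch of how such rows might be parsed. It is not part of the dataset. The names Review and parse_reviews are hypothetical, and the sketch assumes the file is comma-separated with no header row (any row that does not have six fields, or whose numeric fields do not parse, is skipped). The thumbs-down count is derived as the total thumb count minus the thumbs-up count, following the field definitions.

```python
import csv
import io
from datetime import datetime, timezone
from typing import Iterable, Iterator, NamedTuple


class Review(NamedTuple):
    """One row of allReviews.csv (hypothetical field names)."""
    product_id: str
    user_id: str
    thumbs_up: int      # field 3: thumbs-up received by the review
    thumbs_total: int   # field 4: thumbs-up plus thumbs-down
    rating: int         # field 5: rating on a 1 to 5 scale
    unix_time: int      # field 6: review time in Unix seconds

    @property
    def thumbs_down(self) -> int:
        # Derived value: total thumb count minus thumbs-up count.
        return self.thumbs_total - self.thumbs_up

    @property
    def reviewed_at(self) -> datetime:
        # Interpret the Unix timestamp as UTC.
        return datetime.fromtimestamp(self.unix_time, tz=timezone.utc)


def parse_reviews(lines: Iterable[str]) -> Iterator[Review]:
    """Yield a Review for each well-formed six-field row."""
    for row in csv.reader(lines):
        if len(row) != 6:
            continue  # assumption: malformed rows are skipped, not repaired
        try:
            yield Review(row[0], row[1], int(row[2]), int(row[3]),
                         int(row[4]), int(row[5]))
        except ValueError:
            continue  # e.g. an unexpected header row


# The sample row quoted in the description above.
SAMPLE = "B003AI2VGA,A1I7QGUDP043DG,8,10,5,1164844800\n"
reviews = list(parse_reviews(io.StringIO(SAMPLE)))
first = reviews[0]
print(first.product_id, first.rating, first.thumbs_down,
      first.reviewed_at.date().isoformat())
```

For the full file, the same function could be given an open file handle, for example `open("allReviews.csv", newline="")`, so that the reviews are streamed one row at a time instead of being loaded into memory at once.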