Dataset of User Reviews from Low-Rated Applications on the Amazon Appstore

Published: 30 June 2025 | Version 1 | DOI: 10.17632/zztwrn36n8.1
Contributor:
nek dil khan

Description

This dataset contains 79,821 end-user reviews of 64 low-rated software applications on the Amazon Appstore. It is curated to focus on applications with significant user dissatisfaction, which makes it a resource for studying the root causes of software failure and negative user reception. The applications span 14 distinct categories, covering user feedback across different software domains.

The data was collected systematically with an automated web scraping tool. Applications were selected if they had a user rating of 3 stars (out of 5) or lower and at least 400 reviews, so that the data reflects a broad consensus of user opinion. The dataset is provided in CSV format and contains three primary columns for each review: the user's rating ('Stars'), the 'Title_of_Review', and the full text of the 'Base_Review'.

This resource is intended for researchers, software developers, requirements engineers, and educators. It is particularly useful for studies in software quality assessment, user experience (UX) analysis, sentiment analysis, and issue detection, and for training and validating machine learning and natural language processing (NLP) models. Its focus on negative feedback provides a concentrated source of data for understanding why software products fail to meet user expectations.
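As a starting point for working with the three columns described above, the following is a minimal sketch that reads the CSV with Python's standard library and counts reviews per star rating. The column names come from the description; the exact format of the 'Stars' cell is not documented here, so the sketch assumes it begins with a number (for example "1" or "1.0 out of 5 stars"). The sample rows are made up for illustration and are not taken from the dataset.

```python
import csv
import io
import re
from collections import Counter

# Column names as given in the dataset description.
COLUMNS = ("Stars", "Title_of_Review", "Base_Review")


def parse_stars(raw):
    """Return the first number found in a rating cell as a float, or None.

    Assumption: the cell starts with the rating, e.g. "2" or
    "2.0 out of 5 stars". The real cell format may differ.
    """
    match = re.search(r"\d+(?:\.\d+)?", raw or "")
    return float(match.group()) if match else None


def load_reviews(fileobj):
    """Read reviews from an open CSV file object into a list of dicts."""
    reader = csv.DictReader(fileobj)
    missing = [c for c in COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return [
        {
            "stars": parse_stars(row["Stars"]),
            "title": (row["Title_of_Review"] or "").strip(),
            "text": (row["Base_Review"] or "").strip(),
        }
        for row in reader
    ]


def rating_distribution(reviews):
    """Count how many reviews carry each star rating."""
    return Counter(r["stars"] for r in reviews)


# Made-up sample standing in for the real file.
SAMPLE = io.StringIO(
    "Stars,Title_of_Review,Base_Review\n"
    "1.0 out of 5 stars,Crashes,App closes on launch.\n"
    "2,Too many ads,An ad after every tap.\n"
    "1,Broken login,Cannot sign in.\n"
)
reviews = load_reviews(SAMPLE)
dist = rating_distribution(reviews)
```

To use it on the published file, pass an open file handle instead of the sample, for example `open(path, newline="", encoding="utf-8")`, where `path` is wherever the CSV was saved; the encoding is an assumption.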

Steps to reproduce

The dataset was created by following a systematic process of application selection, data extraction, and data cleaning. The workflow can be reproduced using the steps outlined below:

1. Application Identification and Selection
   - Identify a pool of applications on the Amazon Appstore.
   - Apply specific filtering criteria to select the final applications for inclusion. The criteria used for this dataset were:
     - A user rating of 3.0 stars (out of 5) or lower.
     - A minimum of 400 individual user reviews.
   - Compile a final list of applications that meet these requirements. For this dataset, 64 applications were selected.

2. Data Extraction
   - Utilize an automated web scraping tool. This dataset was created using the "Instant Data Scraper" Google Chrome extension.
   - For each selected application, navigate to its user reviews pages on the Amazon Appstore.
   - Configure the scraping tool to extract the following data fields for each individual review: the user's star rating (Stars), the title of the review (Title_of_Review), and the full review text (Base_Review).
   - Export the scraped data into a spreadsheet format, such as CSV or XLSX.

3. Data Verification and Cleaning
   - Manually cross-check a sample of the extracted data against the live Amazon Appstore listings to verify accuracy.
   - Consolidate the data scraped from all applications into a single master dataset.
   - Process the dataset to remove any duplicate entries, formatting inconsistencies, or irrelevant artifacts from the scraping process.

4. Manual Categorization
   - Manually review the primary function of each of the 64 selected software applications.
   - Assign each application to one of the 14 predefined functional categories outlined in the associated data paper.

The final output of this reproducible process is the single, cleaned CSV file contained in this repository.
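The selection criteria in step 1 and the consolidation and cleaning in step 3 can be sketched in a few lines of Python. This is an illustrative sketch, not the procedure actually used: the source does not say how duplicates or artifacts were identified, so the sketch assumes that a duplicate is a row whose three fields match exactly after whitespace is collapsed, and that rows with neither a title nor review text are scraping artifacts. The `App` record, the function names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass

# Thresholds stated in the selection criteria.
MAX_RATING = 3.0
MIN_REVIEWS = 400

FIELDS = ("Stars", "Title_of_Review", "Base_Review")


@dataclass(frozen=True)
class App:
    """Hypothetical record for one candidate application."""
    name: str
    rating: float
    review_count: int


def select_apps(apps):
    """Keep apps rated 3.0 or lower that have at least 400 reviews."""
    return [
        a for a in apps
        if a.rating <= MAX_RATING and a.review_count >= MIN_REVIEWS
    ]


def normalise(text):
    """Collapse runs of whitespace and trim both ends."""
    return " ".join((text or "").split())


def consolidate(per_app_rows):
    """Merge per-app review rows into one list and drop duplicates.

    Assumptions (not from the source): duplicates are exact matches on
    all three fields after whitespace normalisation, and rows with no
    title and no review text are treated as artifacts and skipped.
    """
    seen = set()
    merged = []
    for rows in per_app_rows:
        for row in rows:
            cleaned = {k: normalise(row.get(k)) for k in FIELDS}
            if not cleaned["Title_of_Review"] and not cleaned["Base_Review"]:
                continue
            key = tuple(cleaned[k] for k in FIELDS)
            if key in seen:
                continue
            seen.add(key)
            merged.append(cleaned)
    return merged


# Made-up candidates: only the first two satisfy both criteria.
candidates = [
    App("app_a", 2.4, 950),
    App("app_b", 3.0, 400),
    App("app_c", 3.1, 5000),
    App("app_d", 1.9, 399),
]
selected = select_apps(candidates)

# Made-up scraped rows for two apps.
scraped = [
    [
        {"Stars": "1", "Title_of_Review": "Bad ", "Base_Review": "Does  not work"},
        {"Stars": "1", "Title_of_Review": "Bad", "Base_Review": "Does not work"},
    ],
    [
        {"Stars": "2", "Title_of_Review": "", "Base_Review": ""},
        {"Stars": "3", "Title_of_Review": "Meh", "Base_Review": "Slow"},
    ],
]
master = consolidate(scraped)
```

Because deduplication here runs across the merged data, two different applications that happen to receive an identical short review would be collapsed into one row; deduplicating within each application first is an alternative if that matters for the analysis.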

Institutions

Beijing University of Technology

Categories

Computer Science, Software Engineering

Licence