A Dataset of End-User Reviews from the Apple App Store for Software Evolution Research

Published: 27 October 2025 | Version 4 | DOI: 10.17632/75g52mjmpf.4
Contributor:
Mumrez Khan

Description

This dataset provides a comprehensive collection of 158,910 end-user reviews from 84 applications across 17 distinct categories on the Apple App Store (AAS). The dataset was created to address the need for rich, contextual user feedback for research in software engineering, particularly in the areas of requirements engineering, user experience analysis, and software evolution. The data was systematically collected using a custom Python script that leverages the `app_store_scraper` library. Each record in the dataset includes the review title, the full review text, and the user's star rating (1-5), offering a multi-faceted view of user needs, reported issues, and overall sentiment. This resource is intended for researchers, software developers, and educators. It is particularly useful for training and validating machine learning models for natural language processing tasks, performing sentiment analysis, identifying software issues, and extracting functional and non-functional requirements. The dataset's focus on detailed, structured reviews from the AAS makes it a valuable asset for data-driven software improvement and innovation.
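To show the record structure, the sketch below loads the reviews with pandas, joins each title and review body into one text field, and derives a coarse sentiment label from the star rating. It is a minimal sketch under stated assumptions. It assumes a CSV file with `Title`, `Review`, and `Rating` columns, which are the headers the extraction script writes. The function name `load_reviews` is hypothetical, and the 1-2 / 3 / 4-5 star thresholds are an illustrative heuristic, not part of the dataset.

```python
import pandas as pd


def load_reviews(path):
    """Load review records and add a combined text field and a coarse sentiment label.

    Hypothetical helper. Assumes the columns 'Title', 'Review' and 'Rating'
    written by the extraction script. The star-to-label thresholds are an
    illustrative heuristic, not part of the dataset.
    """
    # Read text columns as strings; pandas treats 'N/A' placeholders as missing.
    df = pd.read_csv(path, dtype={"Title": str, "Review": str})

    # Keep only rows with a valid 1-5 star rating.
    df["Rating"] = pd.to_numeric(df["Rating"], errors="coerce")
    df = df[df["Rating"].between(1, 5)].copy()
    df["Rating"] = df["Rating"].astype(int)

    # Title and body together carry the full feedback of each review.
    df["Text"] = (df["Title"].fillna("") + " " + df["Review"].fillna("")).str.strip()

    # Heuristic mapping (assumption): 1-2 stars negative, 3 neutral, 4-5 positive.
    df["Sentiment"] = pd.cut(
        df["Rating"], bins=[0, 2, 3, 5], labels=["negative", "neutral", "positive"]
    )
    return df
```

For example, `reviews = load_reviews("app_store_reviews.csv")` (a hypothetical file name) followed by `reviews["Sentiment"].value_counts()` gives a quick overview of the rating-based label distribution.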


Steps to reproduce

The dataset was generated by programmatically extracting user reviews from the Apple App Store. The process can be reproduced by following the steps below.

**1. Prerequisites**

* Ensure that Python 3 is installed on your system.
* Install the required Python libraries using pip:

```bash
pip install app_store_scraper pandas
```

**2. Data Extraction Script**

* The following Python script fetches the reviews for a single application. It requires the app's name as it appears in its App Store URL, its numeric App ID, the country code of the App Store storefront, the number of reviews to fetch, and an output CSV filename.

```python
import csv

from app_store_scraper import AppStore


def fetch_reviews(app_name, app_id, app_country, num_reviews, csv_filename):
    """
    Fetches reviews for a given app from the Apple App Store and saves them to a CSV file.
    """
    # Initialize the AppStore object. Passing the numeric app_id explicitly
    # avoids the library's automatic ID lookup from app_name.
    app = AppStore(country=app_country, app_name=app_name, app_id=app_id)

    # Fetch reviews
    print(f"Fetching {num_reviews} reviews for app ID: {app_id}...")
    app.review(how_many=num_reviews)

    # Prepare data for CSV; any missing field is written as 'N/A'
    # (the returned review dicts may not carry an 'id' key).
    reviews_data = []
    for review in app.reviews:
        reviews_data.append([
            review.get('id', 'N/A'),
            review.get('userName', 'N/A'),
            review.get('date', 'N/A'),
            review.get('title', 'N/A'),
            review.get('review', 'N/A'),
            review.get('rating', 'N/A'),
        ])

    # Define CSV column headers
    csv_headers = ['Review ID', 'User Name', 'Date', 'Title', 'Review', 'Rating']

    # Write reviews to CSV
    with open(csv_filename, mode='w', newline='', encoding='utf-8') as file:
        writer = csv.writer(file)
        writer.writerow(csv_headers)
        writer.writerows(reviews_data)

    print(f"Successfully saved {len(reviews_data)} reviews to {csv_filename}")


if __name__ == "__main__":
    # --- Example Configuration ---
    # To run this for another app, replace the following values
    APP_NAME = 'netflix'      # App name as it appears in the App Store URL
    APP_ID = 363590051        # Example: Netflix's App ID (the number after "id" in the URL)
    APP_COUNTRY = 'us'        # Example: United States
    NUM_REVIEWS = 5000        # Number of reviews to fetch
    CSV_FILENAME = 'App_Store_Reviews.csv'

    # Fetch and save reviews
    fetch_reviews(APP_NAME, APP_ID, APP_COUNTRY, NUM_REVIEWS, CSV_FILENAME)
```

**3. Execution and Data Consolidation**

* To reproduce the full dataset, run the script for each of the 84 applications listed in the associated data paper, setting `APP_NAME`, `APP_ID`, and `NUM_REVIEWS` accordingly (and `APP_COUNTRY` if a different storefront is needed). Give each run a distinct `CSV_FILENAME` so that earlier output is not overwritten.
* After running the script for all applications, consolidate the individual CSV files into a single master dataset.
* Perform a final cleaning step to remove any duplicates or inconsistencies that may have arisen during collection. The final dataset should contain 158,910 unique reviews.
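The consolidation and cleaning steps could look like the following minimal pandas sketch. It assumes that every per-app file uses the six headers written by the extraction script. The function name `consolidate_reviews`, the added `Source File` column, and the choice of duplicate key are illustrative assumptions, not the authors' documented procedure. `Review ID` is deliberately left out of the key because the script falls back to `'N/A'` when a review has no `id` field.

```python
from pathlib import Path

import pandas as pd


def consolidate_reviews(csv_paths, output_path):
    """Merge per-app review CSVs into one file and drop duplicate reviews.

    Hypothetical helper. Assumes every input uses the headers written by the
    extraction script and adds a 'Source File' column for traceability.
    """
    frames = []
    for path in csv_paths:
        # Read everything as text and keep 'N/A' / empty fields as literal strings.
        df = pd.read_csv(path, dtype=str, keep_default_na=False)
        df["Source File"] = Path(path).stem
        frames.append(df)
    combined = pd.concat(frames, ignore_index=True)

    # Normalise surrounding whitespace so trivially different copies compare equal.
    for col in ["Title", "Review"]:
        combined[col] = combined[col].str.strip()

    # Drop rows that have no review text at all.
    combined = combined[(combined["Review"] != "") & (combined["Review"] != "N/A")]

    # Treat reviews with the same user, timestamp, title, text and rating as duplicates.
    key = ["User Name", "Date", "Title", "Review", "Rating"]
    combined = combined.drop_duplicates(subset=key).reset_index(drop=True)

    combined.to_csv(output_path, index=False, encoding="utf-8")
    return combined
```

A call such as `consolidate_reviews(sorted(Path("reviews").glob("*.csv")), "all_reviews.csv")` (hypothetical paths) writes the merged file and returns the cleaned frame. Leaving `Source File` out of the key means that an app scraped twice under different file names is still de-duplicated.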

Institutions

  • Xi'an University of Technology

Categories

Computer Science, Software Engineering, Requirement Engineering, Natural Language Processing, Sentiment Analysis

Licence