Indonesian Customer Reviews for Food SMEs: Raw Dataset for Text Summarization

Published: 17 April 2026 | Version 1 | DOI: 10.17632/j8tzn2cxvm.1
Contributors:
Moch Darul Gusti Alief, Supriyono Supriyono

Description

This dataset contains raw customer reviews collected from various digital platforms, including Google Maps, online marketplaces, and other review portals, specifically targeting Indonesian small and medium-sized food enterprises (SMEs). The dataset is intended for research in Natural Language Processing (NLP), particularly for text summarization tasks using models such as IndoBERT. It includes reviews in Bahasa Indonesia that reflect customer opinions on product quality, service experience, and overall satisfaction. The data has been anonymized to protect personal information and is provided in CSV format suitable for preprocessing and model training.

Researchers and practitioners can use this dataset to:
- Develop extractive or abstractive text summarization models.
- Conduct sentiment analysis or other NLP tasks.
- Evaluate the performance of language models on Indonesian customer-generated text.

Dataset Details:
Language: Indonesian
Format: CSV
Number of Reviews: [insert number of rows in your CSV]
Fields Included: review_text, rating (optional), source_platform (optional)
Usage License: [e.g., CC BY 4.0 or Open Data Commons]
Keywords: Indonesian, Customer Reviews, Food SMEs, Text Summarization, NLP, IndoBERT
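Because rating and source_platform are listed as optional, it may help to check which columns a given copy of the file actually contains before relying on them. The following is a minimal sketch using only the Python standard library. The helper name check_fields and the two sample rows are invented for illustration and are not part of the dataset; the sketch also assumes that review_text is always present, which the field list implies but does not state outright.

```python
import csv
import io

# Assumption: review_text is mandatory, the other two fields are optional.
REQUIRED_FIELDS = {"review_text"}
OPTIONAL_FIELDS = {"rating", "source_platform"}


def check_fields(header):
    """Return (missing_required, present_optional) for a CSV header row."""
    columns = set(header)
    missing_required = sorted(REQUIRED_FIELDS - columns)
    present_optional = sorted(OPTIONAL_FIELDS & columns)
    return missing_required, present_optional


# Tiny in-memory sample standing in for the real file (invented rows).
sample_csv = io.StringIO(
    "review_text,rating\n"
    "Makanannya enak dan pelayanannya ramah,5\n"
    "Pengiriman lama,2\n"
)

reader = csv.DictReader(sample_csv)
missing, present = check_fields(reader.fieldnames)
rows = list(reader)

print("Missing required fields:", missing)
print("Optional fields present:", present)
print("Number of reviews:", len(rows))
```

To run the same check on the downloaded file, replace the in-memory sample with an open file handle, for example open(path, newline="", encoding="utf-8").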

Files

Steps to reproduce

1. Download the dataset: Obtain the CSV file from the Mendeley Data repository.

2. Load the dataset: Use Python, R, or any preferred data analysis tool to load the CSV file. For example, in Python:

    import pandas as pd
    data = pd.read_csv("indonesian_customer_reviews_food_smes.csv")

3. Explore the dataset: Examine the structure, fields, and sample reviews:

    print(data.head())
    data.info()  # info() prints its summary directly and returns None

4. Preprocess the text (optional): Remove punctuation, emojis, or special characters. Normalize the text (lowercasing, and stemming or lemmatization if needed). Tokenize sentences or words using a tokenizer compatible with IndoBERT or other NLP models.

5. Use the dataset for NLP tasks: For text summarization, apply extractive or abstractive summarization using IndoBERT or other transformer-based models. For sentiment analysis, map ratings (if included) to sentiment labels or analyze the review text directly.

6. Evaluate models: Use ROUGE or BLEU to assess summarization models and accuracy to assess classification models.
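The preprocessing and evaluation steps above can be sketched as follows. This is an illustrative sketch only, not the authors' pipeline: the function names normalize_review and rouge1_f1 are hypothetical, the emoji ranges are a rough assumption rather than an exhaustive list, and the ROUGE-1 function is a simplified unigram-overlap F1 on whitespace tokens, so it should not be treated as a substitute for an established ROUGE implementation. The example review and summaries are invented.

```python
import re
from collections import Counter

# Rough emoji and pictograph ranges; an assumption, not an exhaustive list.
EMOJI_PATTERN = re.compile(
    "[\U0001F300-\U0001FAFF\U00002600-\U000027BF\U0001F1E6-\U0001F1FF]"
)


def normalize_review(text):
    """Lowercase, drop emojis and punctuation, and collapse whitespace."""
    text = text.lower()
    text = EMOJI_PATTERN.sub(" ", text)
    text = re.sub(r"[^\w\s]", " ", text)  # replace punctuation with spaces
    text = re.sub(r"\s+", " ", text)      # collapse repeated whitespace
    return text.strip()


def rouge1_f1(candidate, reference):
    """Simplified ROUGE-1 F1: clipped unigram overlap on whitespace tokens."""
    cand_counts = Counter(candidate.split())
    ref_counts = Counter(reference.split())
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Invented example review and summaries, for illustration only.
raw_review = "Enak banget!!! 😍 Harga murah, pelayanan ramah."
cleaned = normalize_review(raw_review)
score = rouge1_f1("makanan enak", "makanan enak dan murah")

print(cleaned)
print(round(score, 4))
```

With pandas, the same normalization could be applied to the whole column via data["review_text"].astype(str).map(normalize_review). Stemming and IndoBERT-compatible tokenization are left out here because they depend on the specific libraries and model checkpoint chosen.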

Categories

Artificial Intelligence, Natural Language Processing, Text Extraction

Licence