Tokopedia E-Commerce Review in Indonesia
Description
This dataset contains customer reviews of a motorcycle oil product sold on Tokopedia, an Indonesian e-commerce platform, collected through web scraping with Python. The reviews are preserved in their original form, including typos, abbreviations, and informal language, to maintain linguistic authenticity. The research tests the hypothesis that reviews with overly polished language, generic praise lacking product-specific details, and perfect punctuation are more likely to be fake, whereas reviews containing detailed product references, informal expressions, spelling mistakes, and an emotional tone are more likely to be genuine. The analysis indicates that a small number of reviews appear potentially fake; these typically consist of short, generic praise unrelated to the product. Most reviews appear authentic and contain specific complaints, such as an unusual smell, abnormal oil thickness, leaking packaging, missing barcodes, and suspected counterfeit goods. These findings suggest that most negative feedback likely reflects actual quality issues or the circulation of fake products. E-commerce platforms can use these findings to prioritize relevant reviews and to develop linguistics-based fake review detection systems.
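To make the hypothesis concrete, the cues it names can be turned into a simple rule-based score. The sketch below is an illustration only, not the authors' method. The Indonesian cue word lists (`PRODUCT_TERMS`, `INFORMAL_TERMS`, `GENERIC_PRAISE`), the punctuation proxy (capitalized start plus sentence-final punctuation), and the equal weighting are all hypothetical assumptions. Spelling mistakes and emotional tone are not modeled.

```python
import re
from dataclasses import dataclass

# Hypothetical cue lists for illustration only; NOT derived from the dataset.
PRODUCT_TERMS = {"oli", "kental", "encer", "bau", "bocor", "barcode", "segel", "kemasan", "palsu"}
INFORMAL_TERMS = {"gan", "gak", "ga", "bgt", "sih", "dong", "kok", "udh"}
GENERIC_PRAISE = {"mantap", "bagus", "recommended", "puas", "terima", "kasih", "sesuai"}


@dataclass
class Cues:
    product: int                # product-specific references (lean genuine)
    informal: int               # informal expressions / slang (lean genuine)
    generic: int                # generic praise words (lean fake)
    polished_punctuation: bool  # crude proxy for "perfect punctuation" (lean fake)


def extract_cues(text: str) -> Cues:
    """Count hypothesis cues in one review, keeping the raw text untouched."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stripped = text.strip()
    # Assumed proxy: starts with a capital letter and ends with '.' or '!'.
    polished = bool(stripped) and stripped[0].isupper() and stripped[-1] in ".!"
    return Cues(
        product=sum(t in PRODUCT_TERMS for t in tokens),
        informal=sum(t in INFORMAL_TERMS for t in tokens),
        generic=sum(t in GENERIC_PRAISE for t in tokens),
        polished_punctuation=polished,
    )


def label_review(text: str) -> str:
    """Equal-weight score: a positive value leans toward 'potentially fake'."""
    c = extract_cues(text)
    score = c.generic + int(c.polished_punctuation) - c.product - c.informal
    return "potentially fake" if score > 0 else "likely genuine"
```

For example, a short, polished "Barang bagus, pengiriman cepat. Terima kasih!" scores as potentially fake, while an informal complaint about smell and thickness scores as likely genuine. A real detector would learn these weights from labeled data rather than fix them by hand.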
Steps to reproduce
1. Set up the Python environment: use Python 3.8 or later to run the data collection.
2. Develop the web scraping script: implement the scraper with Selenium to automate browser actions and extract review data from the target e-commerce store or product page.
3. Select the target store or product: identify the specific store or product page from which review data will be collected, and ensure that the page contains enough reviews for analysis.
4. Run the scraper and save the data: execute the script to collect the reviews (including review text, rating, and other relevant metadata), then store the extracted data in a structured format such as CSV for further processing and classification.
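The scraping and saving steps can be sketched as follows, assuming Selenium 4 with Chrome. The CSS selectors, the `Review` fields, and the function names are hypothetical placeholders, not taken from the original scraper. Tokopedia's actual page markup must be inspected and may change, and pagination or dynamic loading of additional reviews is not handled.

```python
import csv
from dataclasses import asdict, dataclass, fields
from typing import Iterable, List

# Hypothetical placeholder selectors; inspect the real product page to replace them.
REVIEW_SELECTOR = "article.review"
TEXT_SELECTOR = "span.review-text"
RATING_SELECTOR = "div.review-rating"


@dataclass
class Review:
    text: str    # review text, kept verbatim (typos and slang preserved)
    rating: str  # rating as displayed on the page


def scrape_reviews(url: str) -> List[Review]:
    """Open one product page and collect the review cards visible on it."""
    # Imported lazily so save_csv can be reused without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.implicitly_wait(10)  # wait up to 10 s for elements to appear
        driver.get(url)
        reviews = []
        for card in driver.find_elements(By.CSS_SELECTOR, REVIEW_SELECTOR):
            text = card.find_element(By.CSS_SELECTOR, TEXT_SELECTOR).text
            rating = card.find_element(By.CSS_SELECTOR, RATING_SELECTOR).text
            reviews.append(Review(text=text, rating=rating))
        return reviews
    finally:
        driver.quit()  # always close the browser, even on errors


def save_csv(reviews: Iterable[Review], path: str) -> int:
    """Write reviews to a UTF-8 CSV with a header row; return the row count."""
    names = [f.name for f in fields(Review)]
    count = 0
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=names)
        writer.writeheader()
        for review in reviews:
            writer.writerow(asdict(review))
            count += 1
    return count
```

Usage would look like `save_csv(scrape_reviews(product_url), "reviews.csv")`, where `product_url` is the chosen Tokopedia product page. Separating scraping from saving keeps the CSV step testable without a browser.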
Institutions
- Bina Nusantara University