Dataset: The Impact of e-WOM Sentiment on Sales: An Attribute Level Analysis of Search and Experience Attributes
Description
Data, codes, and process for the paper "The Impact of e-WOM Sentiment on Sales: An Attribute Level Analysis of Search and Experience Attributes."

This document outlines the data sources, coding steps, and processing workflow required to replicate the analysis presented in the paper.

Data sources:
- Twitter (X) - e-WOM data
- Amazon - e-WOM data
- VGchartz - sales data
Steps to reproduce
Data for this analysis were obtained from X (formerly Twitter) and Amazon for e-WOM, and from VGchartz for sales metrics. The accompanying code and data enable the reproduction of the topic and sentiment analyses. However, to create the final dataset (provided in the Dataset folder), some calculations need to be performed in Excel or any other program of your choice. The complete process from raw data to the final dataset is outlined below; wherever code or a dataset is included in the uploaded folders, the relevant folder is indicated. Illustrative sketches of several steps follow the list.

1. **Obtain raw data** from X, Amazon, and VGchartz (data provided in the Raw Data folder).
2. **Process Twitter Data:**
   - 2.1. Merge all CSV files into one file (code and data available at `twitter/0. Raw Data`; see the first sketch after this list).
   - 2.2. Clean the tweets as shown in the code (`twitter/1. Cleaned Data`). The output data is in a .rar file in that folder.
   - 2.3. Embed the text using the sentence-embedding code, then apply topic modeling with the two Python files (code and data in `twitter/2. Topiced Data`; a generic illustration follows the list).
   - 2.4. Create a separate dataset for sentiment analysis (`twitter/3. For Sentiment`).
   - 2.5. You now have two tweet datasets: one with topics and one prepared for sentiment scoring. Use the sentiment analysis model detailed in the paper to score each tweet, storing the results for the next step.
3. **Process Amazon Data** (code and datasets provided in the relevant folder).
4. **Merge Tweet and Amazon Scores** based on weekly dates (see the merge sketch below).
5. **Aggregate Data by Topic Category:** Using a weighted average, group the data by week, then organize it into the topic categories outlined in the paper (see the weighted-average sketch below).
6. **Add First and Second Data Lags:** Calculate averages for the topic categories to create the variables `avg_SF`, `avg_SA`, `avg_EP`, and `avg_EO`, and add their first and second lags (see the lag sketch below).
7. **Obtain ABSACS_Final_Dataset** (provided in the folder).
8. **Repeat the Process for the Benchmark Model:** Calculate averages as described above to produce `avg_SF`, `avg_SA`, `avg_EP`, and `avg_EO`, but use the simpler sentiment model mentioned in the paper to create the **Benchmark Final Dataset**.
9. **Generate the Correlation Matrix** using `correlation file.ipynb` (a minimal equivalent appears below).
10. **Run the Regression Analysis:** Use the variables listed in the regression results table and run a robust regression in STATA (a Python alternative is sketched below).
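For step 2.1, the merging code and data are provided in `twitter/0. Raw Data`. As a point of reference only, a minimal pandas sketch of the same operation (the glob pattern and output file name are assumptions, not the repository's actual names) is:

```python
# Minimal sketch: combine all raw tweet CSV exports into a single file.
# The folder path and output name are illustrative; the actual code and data
# are provided in `twitter/0. Raw Data`.
import glob

import pandas as pd

csv_files = sorted(glob.glob("twitter/0. Raw Data/*.csv"))
merged = pd.concat((pd.read_csv(f) for f in csv_files), ignore_index=True)
merged.to_csv("tweets_merged.csv", index=False)
```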
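Step 2.3 is implemented by the two Python files in `twitter/2. Topiced Data`. As a generic illustration of the pattern (sentence embeddings fed into a topic model), and not the paper's actual model or settings, a sentence-transformers + BERTopic sketch would look roughly like this:

```python
# Generic embed-then-topic-model illustration for step 2.3.
# NOT the paper's exact model or parameters; the actual code is in
# `twitter/2. Topiced Data`. Assumes sentence-transformers and bertopic
# are installed; file and column names are placeholders.
import pandas as pd
from bertopic import BERTopic
from sentence_transformers import SentenceTransformer

tweets = pd.read_csv("tweets_cleaned.csv")
docs = tweets["text"].astype(str).tolist()

embedder = SentenceTransformer("all-MiniLM-L6-v2")   # any sentence-embedding model
embeddings = embedder.encode(docs, show_progress_bar=True)

topic_model = BERTopic()
topics, probs = topic_model.fit_transform(docs, embeddings)

tweets["topic"] = topics
tweets.to_csv("tweets_with_topics.csv", index=False)
```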
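Step 4 is a plain join on the weekly date key; if you prefer pandas over Excel, it can be done as below (the file and column names, including `week`, are placeholders):

```python
# Sketch of step 4: merge weekly Twitter and Amazon sentiment scores on the
# weekly date. File and column names ("week") are placeholders.
import pandas as pd

tweets_weekly = pd.read_csv("tweets_weekly_scores.csv", parse_dates=["week"])
amazon_weekly = pd.read_csv("amazon_weekly_scores.csv", parse_dates=["week"])

merged = tweets_weekly.merge(amazon_weekly, on="week", how="inner", suffixes=("_tw", "_am"))
merged.to_csv("ewom_weekly_merged.csv", index=False)
```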
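Step 5's weighted aggregation can likewise be reproduced in pandas. Here `sentiment`, `n_reviews` (the weight), and `topic_category` are placeholder column names, not the dataset's actual headers:

```python
# Sketch of step 5: weekly weighted-average sentiment per topic category.
# Column names are illustrative placeholders for the provided datasets.
import pandas as pd

df = pd.read_csv("ewom_weekly_merged.csv", parse_dates=["week"])

df["weighted"] = df["sentiment"] * df["n_reviews"]            # weight by volume
agg = df.groupby(["week", "topic_category"])[["weighted", "n_reviews"]].sum()
agg["weighted_sentiment"] = agg["weighted"] / agg["n_reviews"]

# One column per topic category, one row per week.
wide = agg["weighted_sentiment"].unstack("topic_category")
wide.to_csv("weekly_topic_averages.csv")
```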
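For step 6, first and second lags of each weekly series can be generated with `shift`. Only the `avg_SF`, `avg_SA`, `avg_EP`, and `avg_EO` names come from the paper; the file names are illustrative:

```python
# Sketch of step 6: add first and second lags of each topic-category average.
# Assumes the weekly file already contains the avg_* columns; file names are
# illustrative.
import pandas as pd

panel = pd.read_csv("weekly_topic_averages.csv", parse_dates=["week"]).sort_values("week")

for col in ["avg_SF", "avg_SA", "avg_EP", "avg_EO"]:
    panel[f"{col}_lag1"] = panel[col].shift(1)   # previous week
    panel[f"{col}_lag2"] = panel[col].shift(2)   # two weeks back

panel.to_csv("weekly_topic_averages_with_lags.csv", index=False)
```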
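Step 9's correlation matrix is produced by `correlation file.ipynb`; its core typically reduces to a one-liner like the following (restricting to numeric columns is an assumption about the notebook's behavior):

```python
# Minimal equivalent of the correlation step: pairwise Pearson correlations
# among the numeric model variables.
import pandas as pd

df = pd.read_csv("ABSACS_Final_Dataset.csv")      # file name assumed
corr_matrix = df.select_dtypes("number").corr()
print(corr_matrix.round(3))
```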
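Step 10 is run as a robust regression in STATA. For a quick cross-check without leaving Python, an OLS with heteroskedasticity-robust (HC1) standard errors in `statsmodels` is broadly analogous to STATA's `, robust` option; the formula below is a placeholder, not the paper's specification:

```python
# Placeholder specification: regress weekly sales on lagged topic-category
# sentiment averages. Substitute the variables listed in the paper's
# regression results table.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("ABSACS_Final_Dataset.csv")   # file name assumed

model = smf.ols("sales ~ avg_SF_lag1 + avg_SA_lag1 + avg_EP_lag1 + avg_EO_lag1", data=df)
result = model.fit(cov_type="HC1")             # heteroskedasticity-robust standard errors
print(result.summary())
```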