YouTube Conflict Sentiment Dataset — Bilingual English-Turkish Labelled Comments (2024-2025)
Description
This dataset contains 4,672 manually labelled YouTube comments in English (n=3,000) and Turkish (n=1,672) about two major global conflicts: the Israel-Hamas-Palestine war and the Ukraine-Russia war. Each comment carries one of three sentiment labels: Positive, Negative, or Neutral. English comments were labelled directly using standard sentiment guidelines. Turkish comments were translated into English to assist manual labelling; the original Turkish text was retained for model training.

The labelled dataset was used to train and evaluate nine sentiment classification models across three paradigms as part of a Master's thesis at FMV Işık University: classical machine learning (Logistic Regression, Random Forest, XGBoost, SVC), deep learning (Bidirectional LSTM), and transformer-based large language models (BERT, Turkish BERT, mBERT, RoBERTa, XLM-RoBERTa). The best-performing model was Multilingual BERT (mBERT), which achieved a macro F1 of 0.75.

Author usernames have been removed to comply with privacy regulations (KVKK/GDPR).
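For readers unfamiliar with the reported metric, the following is a minimal sketch of how a macro F1 score over the three sentiment classes can be computed. It is not the thesis evaluation code: the function name `macro_f1` and the toy label lists are illustrative assumptions, and only the three class names come from the description above.

```python
from collections import Counter

# The three sentiment classes named in the dataset description.
LABELS = ("Positive", "Negative", "Neutral")


def macro_f1(y_true, y_pred, labels=LABELS):
    """Return the unweighted mean of per-class F1 scores.

    Hypothetical helper for illustration; each class contributes
    equally regardless of how many comments it has.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for true_label, pred_label in zip(y_true, y_pred):
        if true_label == pred_label:
            tp[true_label] += 1
        else:
            fp[pred_label] += 1  # predicted class gains a false positive
            fn[true_label] += 1  # true class gains a false negative

    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(labels)


# Toy data (made up for this example, not taken from the dataset).
y_true = ["Positive", "Negative", "Neutral", "Negative", "Neutral", "Positive"]
y_pred = ["Positive", "Negative", "Negative", "Negative", "Neutral", "Neutral"]
score = macro_f1(y_true, y_pred)
print(f"Macro F1: {score:.3f}")
```

Because macro averaging weights each class equally, a model that performs poorly on a minority class is penalised more than it would be under accuracy or a weighted average.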