YouTube Conflict Sentiment Dataset — Bilingual English-Turkish Labelled Comments (2024-2025)

Name: YouTube Conflict Sentiment Dataset — Bilingual English-Turkish Labelled Comments (2024-2025)
Creator: Umair Ali Khan
Published: 2026-06-01T15:08:55.548Z
Keywords: Social Media, Machine Learning, War, Deep Learning, Public Sentiment, Sentiment Analysis, Large Language Model

doi:10.17632/g3tv6kfrbf.1

Published: 1 June 2026 | Version 1 | DOI: 10.17632/g3tv6kfrbf.1

Contributor: Umair Ali Khan

Description

This dataset contains 4,672 manually labelled YouTube comments in English (n=3,000) and Turkish (n=1,672) related to two major global conflicts: the Israel-Hamas-Palestine war and the Ukraine-Russia war. Each comment is labelled with one of three sentiment classes: Positive, Negative, or Neutral. English comments were labelled directly using standard sentiment guidelines. Turkish comments were translated to English to assist manual labelling, but the original Turkish text was retained for model training.

The labelled dataset was used to train and evaluate nine sentiment classification models across three paradigms as part of a Master's thesis at FMV Işık University: classical Machine Learning (Logistic Regression, Random Forest, XGBoost, SVC), Deep Learning (Bidirectional LSTM), and transformer-based Large Language Models (BERT, Turkish BERT, mBERT, RoBERTa, XLM-RoBERTa). The best-performing model was Multilingual BERT (mBERT), which achieved a macro F1 of 0.75. Author usernames have been removed to comply with privacy regulations (KVKK/GDPR).
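For readers unfamiliar with the reported metric, macro F1 is the unweighted mean of the per-class F1 scores, so each of the three sentiment classes counts equally regardless of how many comments it contains. The following is a minimal sketch in plain Python; the function name `macro_f1` and the toy label lists are illustrative assumptions and are not taken from the dataset or the thesis code.

```python
from collections import Counter

# The three sentiment classes named in the dataset description.
LABELS = ("Positive", "Negative", "Neutral")


def macro_f1(y_true, y_pred, labels=LABELS):
    """Return the unweighted mean of per-class F1 scores."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1      # correct prediction for this class
        else:
            fp[pred] += 1       # predicted class gains a false positive
            fn[truth] += 1      # true class gains a false negative

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(labels)


# Toy labels for illustration only (hypothetical, not real dataset rows).
y_true = ["Positive", "Negative", "Neutral", "Negative", "Neutral", "Positive"]
y_pred = ["Positive", "Negative", "Negative", "Negative", "Neutral", "Neutral"]
score = macro_f1(y_true, y_pred)
print(round(score, 3))
```

Because the score averages over classes rather than over comments, a model that ignores a minority class is penalised more heavily than it would be under plain accuracy.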