Arabic news and public opinion dataset from YouTube

Published: 6 January 2025 | Version 1 | DOI: 10.17632/3mnjw5hjkh.1
Contributor:
Hezam Gawbah

Description

• The dataset contains 2,047,433 public comments and replies on 70,000 news videos published by 20 renowned Arabic news YouTube channels. Each channel contributes 3,500 news videos, providing a comprehensive corpus for analysis.
• The data contains 16 properties: video URL, ID, title, likes, views, date of publishing, hashtags, description, comment author, comment time, comment, likes on the comment, reply author, reply time, reply, and likes on the reply.
• Data curation involves refining and organizing the collected data to ensure quality and usability. This step includes:
- Data cleaning: Duplicate rows are removed, mistakes are fixed, and the dataset is checked for consistency. To ensure privacy, the commentator's name is removed from the dataset.
- Data structuring: The data is organized in a standardized CSV format, making it easy to access and analyze. It includes 16 columns and 3,500 records.
- Data save: The final curated datasets are saved in 20 primary files; each news channel has a separate file. Each file contains the raw data with the 16 properties listed above.
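As an illustration of the curation steps described above, the following minimal Python sketch reads one channel file, drops exact duplicate rows, and counts comments per video. The column names in COLUMNS are hypothetical placeholders for the 16 properties, and the one-row-per-comment layout is an assumption; the actual headers and layout of the published CSV files may differ and should be checked first.

```python
import csv
import io
from collections import Counter

# Hypothetical column names for the 16 properties; the real CSV headers may differ.
COLUMNS = [
    "video_url", "video_id", "title", "video_likes", "views", "published_at",
    "hashtags", "description", "comment_author", "comment_time", "comment",
    "comment_likes", "reply_author", "reply_time", "reply", "reply_likes",
]


def read_rows(handle):
    """Read CSV rows as dicts, dropping exact duplicates and keeping the first copy."""
    seen = set()
    rows = []
    for row in csv.DictReader(handle):
        key = tuple(row.get(col, "") for col in COLUMNS)
        if key not in seen:
            seen.add(key)
            rows.append(row)
    return rows


def comments_per_video(rows):
    """Count rows with a non-empty comment for each video ID (assumes one row per comment)."""
    counts = Counter()
    for row in rows:
        if row["comment"]:
            counts[row["video_id"]] += 1
    return dict(counts)


def demo():
    """Build a tiny in-memory CSV in the assumed layout and run both steps on it."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    base = dict.fromkeys(COLUMNS, "")
    writer.writerows([
        {**base, "video_id": "v1", "comment": "تعليق أول"},
        {**base, "video_id": "v1", "comment": "تعليق أول"},  # exact duplicate row
        {**base, "video_id": "v1", "comment": "تعليق ثان"},
        {**base, "video_id": "v2", "comment": "تعليق ثالث"},
    ])
    buffer.seek(0)
    rows = read_rows(buffer)
    return rows, comments_per_video(rows)
```

For a real channel file, the same functions could be applied to a handle opened with open(path, newline="", encoding="utf-8-sig"), where path is whichever of the 20 CSV files is being analyzed; the encoding is also an assumption and may need adjusting.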

Institutions

Ibb University

Categories

Public Opinion, Natural Language Processing, Arabic Language, Temporal Analysis, Sentiment Analysis
