"Bangla News Article Sentiment Analysis Dataset"

Published: 21 August 2024 | Version 1 | DOI: 10.17632/93m3f8h6dz.1
Contributor:
Ehsanul Haque

Description

This dataset was built by scraping news articles from various Bangla online newspapers. It contains more than 1,000 articles, and each one includes both its title and its full text. Together, these articles yield over 35,000 individual sentences. The sentences or the whole articles can be used for sentiment analysis tasks, such as gauging public sentiment, categorizing news content, or identifying the emotional tone of Bangla-language text. Because the articles come from a range of newspapers, the topics covered, the opinions expressed, and the writing styles vary widely.
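The description does not say how the articles were segmented into sentences. The following is a minimal sketch, assuming that a sentence ends at the Bangla dari (।, U+0964), a question mark, or an exclamation mark followed by whitespace. The function name `split_bangla_sentences` is hypothetical and not part of the dataset.

```python
import re

# Assumption: a sentence boundary is a dari (U+0964), '?' or '!' followed by whitespace.
# The dataset author's actual segmentation method is not documented.
_SENTENCE_END = re.compile(r"(?<=[\u0964?!])\s+")


def split_bangla_sentences(text):
    """Split Bangla text into sentences, keeping each terminating punctuation mark."""
    parts = _SENTENCE_END.split(text.strip())
    # Drop empty fragments (e.g. when the input is empty or only whitespace).
    return [p.strip() for p in parts if p.strip()]
```

For example, `split_bangla_sentences("আমি ভাত খাই। তুমি কি যাবে? চলো!")` returns three sentences. Real news text also contains abbreviations and quoted speech, so a rule this simple will occasionally split or merge sentences incorrectly.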


Steps to reproduce

To generate the dataset, I first made sure that the required Python libraries were installed, namely newspaper3k, requests, and BeautifulSoup4. I then identified several Bangla newspapers from Bangladesh and used the newspaper3k library to automate the extraction of article titles and content from them. To interact with the websites and locate the individual article links, I used requests for the HTTP requests and BeautifulSoup to parse the HTML. The titles and full texts of the collected articles were saved in Excel format. Importantly, the raw data was not modified in any way: it is presented exactly as acquired from the sources, so the dataset provides the news documents in unprocessed form.
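The pipeline described above can be sketched as follows. This is not the author's script. It assumes that requests, beautifulsoup4, newspaper3k, pandas, and openpyxl are installed. The function names, the same-site link filter, the section URLs, and the output filename `bangla_news.xlsx` are hypothetical placeholders. The third-party imports sit inside the functions so that the pure link-filtering helper can be used without network access.

```python
from urllib.parse import urljoin, urlparse


def filter_article_links(hrefs, base_url):
    """Resolve relative hrefs and keep unique http(s) links on the same site.

    Hypothetical heuristic: a real scraper would need site-specific rules
    to tell article pages apart from navigation links.
    """
    site = urlparse(base_url).netloc
    seen, links = set(), []
    for href in hrefs:
        url = urljoin(base_url, href)
        parsed = urlparse(url)
        if parsed.scheme in ("http", "https") and parsed.netloc == site and url not in seen:
            seen.add(url)
            links.append(url)
    return links


def find_article_links(section_url):
    """Fetch a newspaper section page with requests and collect links with BeautifulSoup."""
    import requests
    from bs4 import BeautifulSoup

    resp = requests.get(section_url, timeout=30)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    hrefs = [a["href"] for a in soup.find_all("a", href=True)]
    return filter_article_links(hrefs, section_url)


def extract_article(url):
    """Download and parse one article with newspaper3k, keeping title and text unmodified."""
    from newspaper import Article

    # The newspaper3k configuration the author used is not stated; defaults are used here.
    article = Article(url)
    article.download()
    article.parse()
    return {"title": article.title, "content": article.text}


def scrape_to_excel(section_urls, out_path="bangla_news.xlsx"):
    """Scrape every article linked from the given section pages and save them to Excel."""
    import pandas as pd

    rows = []
    for section_url in section_urls:
        for url in find_article_links(section_url):
            try:
                rows.append(extract_article(url))
            except Exception as exc:  # skip pages that fail to download or parse
                print(f"skipped {url}: {exc}")
    # No cleaning step: titles and texts are written exactly as extracted.
    pd.DataFrame(rows, columns=["title", "content"]).to_excel(out_path, index=False)
    return len(rows)
```

Consistent with the description, the sketch does not clean or normalize the extracted text before writing it to the spreadsheet.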

Institutions

East West University

Categories

Data Science, Natural Language Processing, Machine Learning

Licence