MapIntel Case Study Dataset
Description
Daily news articles from multiple international sources, collected using NewsAPI (https://newsapi.org/) between October 2020 and June 2021. The dataset contains 334,925 documents and is stored in JSON format. The raw API results were cleaned so that each document is unique, is written in English, and contains no HTML tags or other anomalous patterns. Each record is a dictionary with the following keys:
- "text": Cleaned content of the news article: the concatenation of the "title", "description", and "content" fields received from the API request, where "content" is truncated to 200 characters.
- "title": The headline or title of the article.
- "url": The direct URL to the article.
- "timestamp": The date and time at which the article was published, in UTC (+0000), formatted as "%Y-%m-%dT%H:%M:%SZ".
- "snippet": Excerpt of the document displayed in the user interface of MapIntel.
- "image_url": The URL to a relevant image for the article.
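The record layout described above can be read with a few lines of Python. The following is a minimal sketch, not part of the dataset itself: the helper names `parse_record` and `load_records` are hypothetical, the sample record is made up for illustration, and `load_records` assumes that the JSON file holds a top-level list of records, which the description does not confirm.

```python
import json
from datetime import datetime, timezone

# Timestamp format and key names as stated in the dataset description.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"
EXPECTED_KEYS = {"text", "title", "url", "timestamp", "snippet", "image_url"}


def parse_record(record):
    """Check that a record has the documented keys and parse its timestamp as UTC."""
    missing = EXPECTED_KEYS - record.keys()
    if missing:
        raise KeyError(f"record is missing keys: {sorted(missing)}")
    parsed = dict(record)
    # strptime treats the trailing "Z" as a literal, so attach UTC explicitly.
    parsed["timestamp"] = datetime.strptime(
        record["timestamp"], TIMESTAMP_FORMAT
    ).replace(tzinfo=timezone.utc)
    return parsed


def load_records(path):
    """Load all records from a file, ASSUMING it contains a JSON list of dictionaries."""
    with open(path, encoding="utf-8") as handle:
        return [parse_record(record) for record in json.load(handle)]


# Made-up record used only to illustrate the documented structure.
sample = {
    "text": "Example headline. Example description. Example content...",
    "title": "Example headline",
    "url": "https://example.com/article",
    "timestamp": "2021-01-15T08:30:00Z",
    "snippet": "Example description.",
    "image_url": "https://example.com/image.jpg",
}

parsed = parse_record(sample)
print(parsed["timestamp"].isoformat())  # 2021-01-15T08:30:00+00:00
```

If the file instead stores one JSON object per line, `load_records` would need to call `json.loads` on each line rather than `json.load` on the whole file.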
Funding
Fundação para a Ciência e a Tecnologia
DSAIPA/DS/0116/2019