Soybean Market News Dataset

Published: 29 December 2023 | Version 1 | DOI: 10.17632/f8fdmpp6yh.1


This dataset comprises a comprehensive collection of soybean market news articles, carefully curated and labeled for relevance. Collected from the prominent Brazilian website "" between January 2015 and June 2023, it covers international, national, and regional perspectives.

Features:
Data attributes: date, headline, content, label, and embeddings from three pre-trained BERT models (Paraphrase Multilingual, DistilBERT Multilingual, and BERTimbau).
Labeling: each article is labeled as either relevant or irrelevant, giving a binary classification target that is easy to analyze.
Coverage: the dataset spans several aspects of the soybean market, including climate conditions, research findings, consulting information, technological advances, diseases and pests, and logistics.
Language: Portuguese.
Potential applications: natural language processing (NLP) tasks, machine learning (ML) tasks, and multimodal predictions.
Benefits: diverse sources (544 international, national, and regional providers); precomputed embeddings for advanced NLP applications; broad coverage of soybean market topics.
Usage: researchers and practitioners in agriculture, economics, and data science can use this dataset for in-depth analyses, model development, and trend exploration within the soybean market.
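Because every article carries a binary relevance label alongside precomputed embeddings, a simple baseline is to classify articles by comparing their embedding to the average embedding of each class. The sketch below is a minimal nearest-centroid classifier using cosine similarity, written with the standard library only. It assumes each record has been loaded into a dict with hypothetical keys "embedding" (a list of floats from one of the three models) and "label" (1 = relevant, 0 = irrelevant); the actual file format and column names are not given in this record, and the toy two-dimensional vectors stand in for real embeddings.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical record layout: the keys "embedding" and "label" are
# assumptions for illustration, not the dataset's documented schema.
Record = Dict[str, object]


def mean_vector(vectors: List[Sequence[float]]) -> List[float]:
    """Component-wise mean of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def fit_centroids(records: List[Record]) -> Dict[int, List[float]]:
    """Average the embeddings of each class (1 = relevant, 0 = irrelevant)."""
    by_label: Dict[int, List[Sequence[float]]] = {}
    for record in records:
        by_label.setdefault(int(record["label"]), []).append(record["embedding"])
    return {label: mean_vector(vecs) for label, vecs in by_label.items()}


def predict(centroids: Dict[int, List[float]], embedding: Sequence[float]) -> int:
    """Return the label whose centroid is most cosine-similar to the embedding."""
    return max(centroids, key=lambda label: cosine(centroids[label], embedding))


# Toy usage with made-up 2-D vectors (real embeddings are much longer).
train = [
    {"embedding": [0.9, 0.1], "label": 1},
    {"embedding": [0.8, 0.2], "label": 1},
    {"embedding": [0.1, 0.9], "label": 0},
    {"embedding": [0.2, 0.7], "label": 0},
]
centroids = fit_centroids(train)
print(predict(centroids, [0.7, 0.3]))  # → 1
```

A nearest-centroid rule is only a starting point; since the embeddings come from three different models, the same baseline can be rerun per model to compare how well each one separates relevant from irrelevant news before moving to a trained classifier.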



Universidade de São Paulo Campus de São Carlos, Universidade do Estado de Minas Gerais


Natural Language Processing, Labeling Technique, Text Processing, Text Mining


Fundação de Amparo à Pesquisa do Estado de Minas Gerais

PCRH BPG-00054-210