Dataset on Sentiment Analysis of Global Climate Change Education Abstracts (2015–2024)

Published: 31 July 2025 | Version 1 | DOI: 10.17632/6gxknsrk27.1
Contributor:
Enrique de Paz

Description

This dataset contains supporting data for a manuscript currently in preparation, exploring rhetorical tone, sentiment, and vulnerability framing in climate change education literature published between 2015 and 2024. It includes metadata and sentiment analysis results for 777 peer-reviewed abstracts, covering sentiment polarity, subjectivity, hedging/certainty metrics, rhetorical style classification, vulnerability scores, and author-region metadata. This dataset is openly available to support transparency, reuse, and further research in the field of climate change education and science communication.

Steps to reproduce

Systematic Literature Search: The search was conducted in Scopus, Google Scholar, and EBSCO using Boolean keywords related to climate change, education, and higher education. It covered January 2015 to December 2024. A total of 1,931 articles were retrieved and screened in Covidence, resulting in 777 peer-reviewed abstracts included in the final dataset.

Vulnerability Indexing: Composite Vulnerability Scores (CSV) were computed for each country by combining the normalised INFORM Risk Index (60% weight) and the normalised WorldRiskIndex (40% weight). Scores were binned into five categories (Very Low to Very High) for comparison.

Disciplinary Classification (ASJC Codes): Each article was assigned one or more disciplines based on its ASJC journal classification codes retrieved from SciVal. Matching was performed using a keyword-based filter to flag inclusion across nine major fields.

Sentiment Analysis (Polarity): The sentiment polarity of each abstract was computed using the Hugging Face model distilbert-base-uncased-finetuned-sst-2-english. Polarity scores were calculated as the difference between the model’s positive and negative probabilities (S = p – n), then discretised into tonal categories.

Subjectivity Analysis: TextBlob was used to generate a subjectivity score (U ∈ [0,1]) for each abstract, capturing the level of personal interpretation or evaluation. Scores were binned into five subjectivity categories.

Communication Style Classification: Polarity (S) and subjectivity (U) were combined to classify each abstract into one of seven rhetorical styles (e.g. Optimistic, Factual, Sceptical, Mixed-Positive).

Hedging and Certainty Metrics: A curated list of hedging and certainty keywords was used to count occurrences in each abstract. Normalised densities (per 1,000 words) and net certainty values were calculated and binned into rhetorical categories (e.g. Mostly Certain, Balanced, Mostly Hedged).
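The arithmetic behind three of the steps above (composite vulnerability score, polarity, and hedging/certainty densities) could be sketched in Python roughly as follows. This is an illustrative sketch only, not the contributor's code. Several details are assumptions because the description does not specify them: min-max scaling as the normalisation, the short keyword lists, the definition of net certainty as certainty density minus hedging density, and all function names. The probabilities p and n are taken as given here; in the described workflow they would come from the distilbert-base-uncased-finetuned-sst-2-english classifier.

```python
import re

def min_max(values):
    """Rescale values to [0, 1]. ASSUMPTION: the description says only
    'normalised' and does not name the method used."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def composite_vulnerability(inform, world_risk, w_inform=0.6, w_wri=0.4):
    """Weighted sum of the two normalised indices, one value per country
    (60% INFORM Risk Index, 40% WorldRiskIndex, as stated in the description)."""
    pairs = zip(min_max(inform), min_max(world_risk))
    return [w_inform * a + w_wri * b for a, b in pairs]

def polarity(p_positive, p_negative):
    """S = p - n; ranges from -1 (fully negative) to 1 (fully positive)."""
    return p_positive - p_negative

# HYPOTHETICAL keyword lists; the curated lists used for the dataset
# are not given in the description.
HEDGES = {"may", "might", "could", "suggest", "suggests", "possibly"}
CERTAINTY = {"clearly", "demonstrate", "demonstrates", "confirm", "confirms", "must"}

def keyword_densities(text):
    """Return (hedge density, certainty density, net certainty), each per
    1,000 words. ASSUMPTION: net certainty = certainty - hedging."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0, 0.0, 0.0
    scale = 1000 / len(words)
    hedge_density = sum(w in HEDGES for w in words) * scale
    certainty_density = sum(w in CERTAINTY for w in words) * scale
    return hedge_density, certainty_density, certainty_density - hedge_density
```

The category boundaries used for binning (five vulnerability levels, tonal categories, Mostly Certain / Balanced / Mostly Hedged) are not stated in the description, so the sketch stops at the continuous scores rather than guessing thresholds.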

Institutions

  • University of the Sunshine Coast

Categories

Social Sciences, Geography, Information Science, Education, Environmental Science

Licence