Queries to identify climate change research that takes an engineering approach

Published: 31 May 2022 | Version 1 | DOI: 10.17632/gxgj4kmsvz.1


This dataset was created as part of work conducted for the NSF Engineering Research Visioning Alliance's inaugural Visioning Event, entitled "The Role of Engineering in Addressing Climate Change". The dataset is composed of <ol><li>four text (.txt) files, each containing a single query used to identify research relevant to climate change; and</li> <li>three comma-separated values (.csv) files including information about how engineering research was defined.</li></ol>The goal of this work was to conduct bibliometric analyses aimed at better understanding the role engineering plays in climate change research. To do so, the set of documents returned by the climate change queries was crossed (intersected) with the engineering publication set. Files 01 through 04 each contain a single Scopus query that aims to capture either general climate change research (01) or specific topics of interest within climate change research (02-04). These queries can be run using the scopus.com advanced document search feature. Because engineering is a broad field, keyword-based queries were not an efficient way to capture all relevant articles. Instead, three complementary classification schemes were used to capture this research. Files 05a through 05c list the parts of each classification scheme that were considered predominantly engineering-focused. An effort was made to keep this publication set maximally inclusive, such that it also includes some wider applied research subfields. File 05a contains the All Science Journal Classification (ASJC) classes that were identified as being engineering for the purpose of this project. The ASJC is a journal-level classification, which means that all articles from a given journal are classified in the same subject area(s). A complete list of ASJC codes can be found <a href = "https://service.elsevier.com/app/answers/detail/a_id/15181/supporthub/scopus/" target="_blank">here</a>. 
File 05b lists the Science-Metrix (SM) subfields identified as being engineering. The SM classification is also made at the journal level, but articles from multidisciplinary journals such as Science or Nature are reclassified at the article level by a machine learning model (<a href = "https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0251493" target="_blank">Read the article here</a>), which captures more articles than ASJC alone. File 05c lists the SciVal Topics that were included in the engineering publication set. Topics are collections of documents that share a common intellectual interest, as identified through their direct citation patterns. For a high-level description of SciVal Topics, see <a href = "https://www.elsevier.com/solutions/scival/features/topic-prominence-in-science" target="_blank">this page</a>. A detailed description of the methodology used to create Topics is also available in <a href = "https://onlinelibrary.wiley.com/doi/abs/10.1002/asi.22748" target="_blank">this article</a>.
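The crossing of the climate change and engineering publication sets described above can be illustrated with a minimal sketch. It assumes each publication is represented by a unique identifier and that the engineering set is the union of the documents captured by the three classification schemes; the function names and the sample identifiers are hypothetical and do not come from the dataset files.

```python
def build_engineering_set(asjc_ids, sm_ids, topic_ids):
    """Union of the documents captured by the three classification schemes.

    Assumption: the schemes are complementary, so a document captured by
    any one of them belongs to the engineering publication set.
    """
    return set(asjc_ids) | set(sm_ids) | set(topic_ids)


def cross_sets(climate_ids, engineering_ids):
    """Documents that are both climate change research and engineering."""
    return set(climate_ids) & set(engineering_ids)


# Hypothetical document identifiers, for illustration only.
climate = {"d1", "d2", "d3", "d4"}
engineering = build_engineering_set({"d2"}, {"d2", "d5"}, {"d4"})

climate_engineering = cross_sets(climate, engineering)
share = len(climate_engineering) / len(climate)

print(sorted(climate_engineering))  # ['d2', 'd4']
print(share)                        # 0.5
```

The resulting share is the kind of indicator a bibliometric analysis of this crossing might report: the fraction of climate change publications that also fall in the engineering set.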


Steps to reproduce

The keyword-based climate change queries used here were based on past work conducted by Elsevier to identify climate change research. In particular, the publication sets developed to identify research relevant to the Sustainable Development Goals (queries available <a href = "https://elsevier.digitalcommonsdata.com/datasets/9sxdykm8s4" target="_blank">here</a>) proved very useful as a starting point from which to build. This led to the creation of the query presented in file 01. Queries described in files 01 to 04 were otherwise built using the following dataset construction approach: <ol><li>A first query composed only of key phrases evidently pertinent to the topic at hand was developed.</li> <li>Publications returned by the query were assessed to ensure almost no off-topic articles were returned.</li> <li>A text-mining approach based on TF-IDF weighting was used to identify additional pertinent key phrases used in the set of returned articles.</li> <li>The content returned by these additional key phrases was assessed to ensure it was still pertinent to the topic at hand. If it was, the key phrase was added to the query.</li> <li>Using the expanded query, steps 2 to 4 were repeated until almost no more content was added to the publication set between iterations.</li> <li>Precision of the publication set was manually assessed through a sampling approach to ensure at least 95% of the content was pertinent. If too many off-topic publications were found, terms were modified or exclusions were added to the query to ensure the off-topic content was not included in the final publication set.</li></ol>The engineering publication set was built in two main steps. First, all ASJC fields (file 05a) or SM subfields (file 05b) that had the words <i>engineering</i>, <i>applied</i>, <i>application</i> or <i>technology</i> in their title, or that were part of a higher-level class that included one of these words and were not part of the social sciences & humanities, were included. 
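The name-based selection of ASJC fields and SM subfields described in this first step can be sketched as follows. This is only an illustration under stated assumptions: matching is done by case-insensitive substring, the social sciences & humanities exclusion is applied to both conditions, and the classification names below are hypothetical examples, not rows taken from files 05a or 05b.

```python
KEYWORDS = ("engineering", "applied", "application", "technology")


def has_keyword(name):
    """True if the name contains one of the four keywords (case-insensitive)."""
    lowered = name.lower()
    return any(keyword in lowered for keyword in KEYWORDS)


def is_engineering_class(name, parent_name, in_social_sciences_humanities):
    """Apply the name-based rule to one field or subfield.

    Assumption: classes in the social sciences & humanities are excluded
    whether the keyword appears in their own name or in their parent's.
    """
    if in_social_sciences_humanities:
        return False
    return has_keyword(name) or has_keyword(parent_name)


# Hypothetical (name, higher-level class, is social sciences & humanities) rows.
rows = [
    ("Civil Engineering", "Engineering", False),
    ("Acoustics", "Applied Sciences", False),
    ("Applied Psychology", "Psychology", True),
    ("Ecology", "Environmental Science", False),
]

selected = [name for name, parent, ssh in rows
            if is_engineering_class(name, parent, ssh)]
print(selected)  # ['Civil Engineering', 'Acoustics']
```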
Also included were all articles classified under the ASJC's <i>Energy</i> and <i>Materials Science</i> subject areas, as these are also applied sciences. The SM subfields <i>Energy</i>, <i>Materials</i>, <i>Optoelectronics & Photonics</i>, <i>Mining & Metallurgy</i>, <i>Operations Research</i> and <i>Aerospace & Aeronautics</i> were also included for this same reason. The second step was to use SciVal Topics (file 05c) to expand on the use of journal-based classifications. This is a valuable addition because articles from the same journal are not necessarily found in the same Topic, so engineering articles published in non-engineering journals can also be captured in this way. To identify which Topics to include, the share of every Topic that was already captured by the two other classification schemes was computed. For every Topic in which at least 80% of the publications were already captured, all publications in that Topic were included. This rate was chosen to maximize precision.
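The Topic-based expansion can be illustrated with a short sketch. It assumes each publication belongs to exactly one Topic and that the share is computed as captured publications divided by all publications in the Topic; the function names, identifiers and Topic labels are hypothetical.

```python
from collections import defaultdict


def select_topics(doc_to_topic, captured_ids, threshold=0.80):
    """Return Topics whose already-captured share meets the threshold."""
    totals = defaultdict(int)
    captured = defaultdict(int)
    for doc_id, topic_id in doc_to_topic.items():
        totals[topic_id] += 1
        if doc_id in captured_ids:
            captured[topic_id] += 1
    return {topic for topic, n in totals.items()
            if captured[topic] / n >= threshold}


def expand_with_topics(doc_to_topic, captured_ids, threshold=0.80):
    """Add every publication from the selected Topics to the captured set."""
    selected = select_topics(doc_to_topic, captured_ids, threshold)
    added = {doc for doc, topic in doc_to_topic.items() if topic in selected}
    return set(captured_ids) | added


# Hypothetical data: T1 is 4/5 captured (80%), T2 is 1/2 captured (50%).
doc_to_topic = {
    "d1": "T1", "d2": "T1", "d3": "T1", "d4": "T1", "d5": "T1",
    "d6": "T2", "d7": "T2",
}
journal_based = {"d1", "d2", "d3", "d4", "d6"}

expanded = expand_with_topics(doc_to_topic, journal_based)
print(sorted(expanded))  # ['d1', 'd2', 'd3', 'd4', 'd5', 'd6']
```

In this example, d5 is added because its Topic clears the 80% threshold, while d7 is left out because only half of its Topic was captured by the journal-based classifications.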


Elsevier BV


Data & Analytics


Engineering, Climate Change, Bibliometrics