Dataset | Sustainable and Circular Construction Terms

Published: 10 October 2024 | Version 3 | DOI: 10.17632/6w74d7x8s4.3

Description

This dataset comprises 480 academic papers and supports a study of the relationship between "sustainability" and "circular economy" in the construction industry using Natural Language Processing (NLP) techniques: TextRank, TF-IDF, and Concept Matrix. The papers were gathered from two focused searches in the Scopus database: "circular economy and construction" and "sustainability and construction." Each paper was processed to extract author details, abstracts, full text, and quantitative metrics such as word counts and sentence structures. The text was then cleaned by removing elements that would distort the analysis, such as URLs, abstracts, and references. Preprocessing removed numbers and stop-words so that meaningful terms stand out. The analysis showed that circular construction focuses heavily on operational aspects such as resource recovery and waste management, while sustainable construction takes a broader, holistic scope that covers urban planning, community development, and long-term environmental impact. The two fields overlap in areas such as environmental assessment but differ in how they approach resource use and sustainability goals. The analysis was performed in Jupyter notebooks, where clustering and evaluation techniques were applied to assess the results. The CSV files store the terms extracted by each method, separated into terms common to both fields and terms unique to circular economy or to sustainable construction, so the two concepts can be compared directly.
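The split into common and unique terms described above can be sketched as follows. This is a minimal illustration rather than the notebooks' actual code: the function names `tfidf_top_terms` and `compare_terms`, the smoothed IDF variant, and the toy token lists are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_top_terms(docs: list[list[str]], k: int) -> set[str]:
    """Return the k terms with the highest summed TF-IDF score over docs.

    Each document is a list of already preprocessed tokens.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    scores: Counter = Counter()
    for doc in docs:
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            # Smoothed IDF (one common variant; an assumption of this sketch).
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1
            scores[term] += tf * idf
    return {term for term, _ in scores.most_common(k)}


def compare_terms(circular: set[str], sustainable: set[str]) -> dict[str, list[str]]:
    """Split two term sets into shared terms and terms unique to each field."""
    return {
        "common": sorted(circular & sustainable),
        "circular_only": sorted(circular - sustainable),
        "sustainable_only": sorted(sustainable - circular),
    }


# Toy corpora standing in for the two Scopus result sets (hypothetical data).
circular_docs = [
    ["waste", "recovery", "waste", "assessment"],
    ["waste", "reuse", "assessment"],
]
sustainable_docs = [
    ["urban", "planning", "assessment", "urban"],
    ["community", "assessment", "urban"],
]

comparison = compare_terms(
    tfidf_top_terms(circular_docs, k=2),
    tfidf_top_terms(sustainable_docs, k=2),
)
print(comparison)
```

Each of the three lists in `comparison` corresponds to one group of terms that could be written to its own CSV file.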

Steps to reproduce

The data for this research were collected through a systematic review of academic literature in the Scopus database. Two search queries were used, "circular economy and construction" and "sustainability and construction," restricted to papers published between 2021 and 2024. After the abstracts were screened for relevance, 480 papers were selected. Text was extracted from PDFs with PdfReader, Apache Tika, and PDFMiner, and from HTML documents with Beautiful Soup. The extracted text was cleaned to remove irrelevant content such as URLs, abstracts, and references, and then preprocessed with stop-word removal and lemmatization. Key terms were extracted with the NLP techniques TextRank, TF-IDF, and Concept Matrix. Finally, clustering and evaluation methods were applied to interpret the results and compare the key concepts in circular and sustainable construction.
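The cleaning and preprocessing steps can be illustrated with a short standard-library sketch. It is an approximation under stated assumptions, not the pipeline used to build the dataset: the `preprocess` function, the regular expressions, and the tiny stop-word list are hypothetical, and lemmatization is omitted because it needs an external NLP library.

```python
import re

# Tiny illustrative stop-word list (an assumption; a real run would use a fuller one).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on", "at", "see"}

# Matches http(s) links and bare www. addresses.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Matches alphabetic tokens, optionally hyphenated; digits never match.
TOKEN_RE = re.compile(r"[a-z]+(?:-[a-z]+)*")


def preprocess(text: str) -> list[str]:
    """Lowercase the text, drop URLs, numbers, and stop-words, and return tokens."""
    text = URL_RE.sub(" ", text.lower())
    tokens = TOKEN_RE.findall(text)
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 1]


sample = "Recycling of 25 tonnes of concrete waste, see https://example.org/report"
print(preprocess(sample))
```

Token lists produced this way are the kind of input that term-extraction methods such as TF-IDF or TextRank operate on.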

Institutions

Kazakh British Technical University, Institute of Information and Computer Technology, Nazarbayev University

Categories

Natural Language Processing, Machine Learning, Sustainable Construction, Circular Economy, Cluster Analysis