ComPara: A Corpus Linguistics Dataset of Computation in Architecture

Published: 20 October 2022 | Version 6 | DOI: 10.17632/7ktscvmxvg.6
Anca-Simona Horvath


A corpus linguistics dataset built to study the language of computational architecture, that is, architecture that focuses on technological developments. The corpus includes (1) the volume titles, article titles, and keywords associated with the Introduction article of each volume of the journal Architectural Design (AD), to capture the language of the theoretical discourse around computation in architecture, and (2) the titles and abstracts of winning entries and honorable mentions of the eVolo Skyscraper competition, to capture the words used in conceptual project titles and their descriptions. The dataset contains around 100,000 words and can serve as a basis for quantitative, qualitative, or mixed-method analysis of the language used in AD and the eVolo Skyscraper competition between 2005 and 2019. As AD is recognized as one of the journals focusing on the 'digital turns' in architecture, and eVolo is arguably the most prestigious architectural competition focusing on technological advances in architecture, ComPara can be considered representative of the language of computational architecture between 2005 and 2019. It includes .txt and .csv files as well as .svg word clouds.


Steps to reproduce

This corpus was retrieved using the web scraping tool Octoparse. The word clouds were created using the Cirrus tool of Voyant Tools.
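A word cloud is a visualization of word frequencies, so a rough equivalent of the counts behind it can be computed from any of the .txt files. The following is a minimal sketch in Python, not the procedure used by Voyant Tools: the stop-word list, the tokenization rule, the function name word_frequencies, and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, very short stop-word list; Voyant Tools applies its own list.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "is"}


def word_frequencies(text, stop_words=STOP_WORDS):
    """Count lowercase word tokens in text, skipping stop words."""
    # Assumed tokenization: runs of letters, optionally joined by - or '.
    tokens = re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower())
    return Counter(t for t in tokens if t not in stop_words)


# Illustrative sample text, not taken from the corpus.
sample = "Digital architecture and the digital turn in architecture"
freqs = word_frequencies(sample)

# Print the most frequent words with their counts.
for word, count in freqs.most_common(3):
    print(word, count)
```

To apply this to the dataset, the sample string would be replaced by the contents of one of the corpus .txt files, read with open(...).read().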


Aalborg Universitet


Architecture, Corpus Linguistics, Architectural Design, Architectural History, Architectural Theory, Architectural Technology