ComPara: A Corpus Linguistics Dataset of Computation in Architecture

Published: 21 October 2020 | Version 5 | DOI: 10.17632/7ktscvmxvg.5
Anca-Simona Horvath


A corpus-linguistics dataset built to study the language of computational architecture, that is, architecture focused on technological developments. The corpus includes: (1) the volume titles, article titles and introduction keywords of the journal Architectural Design (AD), to capture the key themes of theoretical discourse, and (2) the titles and abstracts of the winning and honourable-mention entries of the eVolo Skyscraper Competition, to capture the vocabulary of conceptual project titles and their descriptions. The dataset contains around 100,000 words and can serve as a basis for quantitative, qualitative or mixed-methods analysis of the language used in AD and the eVolo Skyscraper Competition from 2005 to 2019. Because AD is recognized as one of the journals that have followed the 'digital turns' in architecture, and eVolo is arguably the most prestigious architectural competition focused on technological advances, ComPara can be considered representative of the language of computational architecture over these 15 years. It includes .txt and .csv files as well as .svg word clouds.


Steps to reproduce

This corpus was retrieved using the web scraping tool Octoparse. The word clouds were created using Cirrus, the word-cloud tool of Voyant Tools.
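A word cloud such as Cirrus sizes each term by how often it occurs once common function words are filtered out. For readers who want to inspect those frequencies directly from the .txt files, the following is a minimal sketch, not the procedure used to build the published .svg clouds. It assumes plain English text, a small hypothetical stopword list (Voyant Tools applies its own, much larger list) and a simple letters-only tokenizer, so its counts will differ somewhat from Voyant's.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stopword list (an assumption for illustration;
# it is not the list Voyant Tools uses).
STOPWORDS = {"the", "and", "of", "a", "an", "in", "to", "for", "on", "with", "is", "as", "by", "its"}

# Tokens are runs of ASCII letters, optionally joined by hyphens or apostrophes
# (e.g. "data-driven"); digits and punctuation are dropped.
TOKEN_RE = re.compile(r"[a-z]+(?:['-][a-z]+)*")


def term_frequencies(text: str, stopwords=STOPWORDS) -> Counter:
    """Lower-case the text, tokenize it and count every token not in the stopword list."""
    tokens = TOKEN_RE.findall(text.lower())
    return Counter(t for t in tokens if t not in stopwords)


def top_terms(text: str, n: int = 25):
    """Return the n most frequent terms as (term, count) pairs, most frequent first."""
    return term_frequencies(text).most_common(n)


def top_terms_in_file(path: str, n: int = 25):
    """Read one UTF-8 .txt file from the corpus and return its n most frequent terms."""
    with open(path, encoding="utf-8") as f:
        return top_terms(f.read(), n)
```

To use it, call `top_terms_in_file` with the path of one of the dataset's .txt files (no specific file name is assumed here). Counting tokens with `Counter` keeps the sketch dependency-free; for lemmatization or a fuller stopword list, a dedicated NLP library would be the natural next step.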


Aalborg Universitet


Architecture, Corpus Linguistics, Architectural Design, Architectural History, Architectural Theory, Architectural Technology