Complexity of parliamentary speeches, Czech Republic 2017–2021

Published: 28 March 2023 | Version 1 | DOI: 10.17632/f47tb35gcz.1
Petr Voda


The source of data was the stenographic protocols of the Czech Chamber of Deputies for the 2017–2021 parliamentary term. The core of the data is a corpus of parliamentary speeches delivered during the electoral term that began in October 2017 and ended in October 2021. Because the corpus available for download is not preprocessed, each MP's speech is split across separate paragraphs (n = 223,643), and these had to be collapsed so that each whole speech is handled as a single case (n = 82,550). Before the analysis, some speeches were dropped and several pieces of information were merged into the speeches. First, speeches made by the chair (moderator) were deleted (n = 42,577). Because some speeches were interrupted by the chair (e.g., to ask for silence in the chamber), the subsequent remarks of the same speaker were merged into one speech. Next, we deleted speeches not related to any parliamentary document (e.g., parliamentary questions, voting sessions) and discussions in which more than one parliamentary document was debated simultaneously, because the link between a speech and a single document is key to recognizing which committees were involved in the discussion. This left a final set of 17,875 speeches.
In its initial form, the dataset contained only the name of the speaker, his or her function (MP, minister, Prime Minister), the speech itself, the title of the parliamentary document to which the speech relates, the parliamentary document number, and the number and type of the session. The dataset was then supplemented with additional information, mostly from the parliamentary database. First, we added identification codes to the names of MPs, which allowed us to link the speeches to other records automatically, including MPs' committee memberships. Because both speeches and memberships are dated, we were able to determine whether an MP was a member of a given committee at the time of his or her speech and the discussion.
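The preprocessing and committee-membership steps described above can be sketched in Python as follows. This is a minimal illustration, not the author's code: the `Speech` and `Membership` records, their fields (`speaker`, `is_chair`, `doc_ids`, `day`, `start`, `end`), and the rule that consecutive paragraphs with the same speaker, day and document(s) form one speech are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Speech:
    # Hypothetical record: one paragraph or, after merging, one whole speech.
    speaker: str          # MP identification code
    is_chair: bool        # True if spoken by the chair (moderator)
    text: str
    doc_ids: tuple        # parliamentary documents under discussion
    day: date


@dataclass(frozen=True)
class Membership:
    # Hypothetical committee-membership record from the parliamentary database.
    mp_id: str
    committee: str
    start: date
    end: Optional[date]   # None = membership still running


def same_turn(a: Speech, b: Speech) -> bool:
    """Assumed rule: same speaker and role, same day, same document(s)."""
    return (a.speaker, a.is_chair, a.day, a.doc_ids) == (b.speaker, b.is_chair, b.day, b.doc_ids)


def merge_runs(items: list[Speech]) -> list[Speech]:
    """Join consecutive items that belong to the same turn into one speech."""
    merged: list[Speech] = []
    for item in items:
        if merged and same_turn(merged[-1], item):
            merged[-1] = replace(merged[-1], text=merged[-1].text + " " + item.text)
        else:
            merged.append(item)
    return merged


def preprocess(paragraphs: list[Speech]) -> list[Speech]:
    speeches = merge_runs(paragraphs)                    # paragraphs -> speeches
    speeches = [s for s in speeches if not s.is_chair]   # drop the chair's speeches
    speeches = merge_runs(speeches)                      # re-join remarks split by the chair
    return [s for s in speeches if len(s.doc_ids) == 1]  # keep single-document debates only


def was_member(memberships: list[Membership], mp_id: str, committee: str, on: date) -> bool:
    """True if the MP sat on the committee on the given date."""
    return any(
        m.mp_id == mp_id and m.committee == committee
        and m.start <= on and (m.end is None or on <= m.end)
        for m in memberships
    )
```

Running `merge_runs` a second time after the chair's interventions are removed re-joins only those remarks of an MP that were separated by the chair alone, since any other speaker in between keeps the two parts non-adjacent.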
Furthermore, we calculated the number of electoral terms in which each MP had sat in the Chamber. From the database we were also able to obtain each MP's party membership, age, gender and education. Finally, we added further information on party membership from party websites and politicians' wiki pages. The dependent variable of our study is the linguistic complexity of parliamentary speech, measured by the Flesch Reading Ease score (FRE) adapted for Czech:
FRE(CZ) = 206.935 - 1.672 × (total words / total sentences) - 62.18 × (total syllables / total words)
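The score can be computed with a short Python sketch. The second ratio is taken as syllables per word, as in the original Flesch formula. The source does not say how words, sentences and syllables were counted, so the regex tokenization and the vowel-group syllable counter below are assumptions; in particular, the counter does not handle Czech syllabic r and l (as in "vlk") beyond counting every word as at least one syllable.

```python
import re

# Assumed tokenization: words are runs of word characters; sentences end in . ! or ?
WORD = re.compile(r"\w+")
SENTENCE_END = re.compile(r"[.!?]+")
# Czech vowel letters; each run of vowels counts as one syllable, so the
# diphthongs "ou", "au" and "eu" are counted once (a crude heuristic).
VOWEL_RUN = re.compile(r"[aáeéěiíoóuúůyý]+", re.IGNORECASE)


def count_syllables(word: str) -> int:
    # Words without a vowel letter (e.g. "vlk", but also the preposition "v")
    # are counted as one syllable; syllabic r/l is not otherwise handled.
    return max(1, len(VOWEL_RUN.findall(word)))


def fre_cz(words: int, sentences: int, syllables: int) -> float:
    """Czech Flesch Reading Ease from raw counts."""
    return 206.935 - 1.672 * (words / sentences) - 62.18 * (syllables / words)


def fre_cz_text(text: str) -> float:
    """Czech Flesch Reading Ease of a speech, using the heuristic counts above."""
    words = WORD.findall(text)
    sentences = max(1, len(SENTENCE_END.findall(text)))
    syllables = sum(count_syllables(w) for w in words)
    return fre_cz(len(words), sentences, syllables)
```

Higher scores indicate easier text: a single 10-word sentence with 20 syllables scores about 65.9, and longer sentences or longer words both push the score down.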



Masaryk University


Parliamentary Politics