Quote Following

Published: 6 December 2022 | Version 1 | DOI: 10.17632/g8c5wgmfgr.1
Joseph Schlessinger


This dataset accompanies the forthcoming paper "Exposing the Obscured Influence of State-Controlled Media: A Causal Estimation of Influence Between Media Outlets Via Quotation Propagation." It is partially sanitized, as a compromise between preserving valuable intellectual property and allowing replication.
The original dataset has 618,328 quotes from 123,396 articles published between May 2018 and October 2019. It covers 454 outlets, and each article is labeled with one of 167 topics and one of 418 sentiments. Each quote carries additional information: the date of the article, the country of media origin, any geographic references in the quotation, the quote speaker, and the outlet where the quote and article appeared. Every quote concerns geopolitical news, and the quotes are drawn from articles published by the most prominent outlets in 24 European countries.
The published dataset contains the data necessary to replicate the analysis. We present the data after quote matching has been conducted on the original dataset of roughly 600,000 quotes. Each row represents an instance of "quote following," where one outlet, the "Following Outlet," used a quote in an article after the "Source Outlet" had used the same quote in an article. The source article using the quote was published on "Leading Article Date" and the following outlet's article using the quote was published on "Following Article Date". The article was hand-labeled with a given "Topic" and "Quote Sentiment". "N Sources Using Quote" gives the number of outlets that published an article using the quote, while "N Followings" gives the number of times the following outlet used the quote. Finally, "Leading English Quote" gives the quote used by the source outlet, while "Following English Quote" gives the quote used by the following outlet. Where either the source or the following outlet used multiple variations of a quote, the quote given is the first version that outlet used.
Topics and sentiments have been sanitized to preserve intellectual property. Topics are labeled as either "Nuclear Cooperation" or "All Topics". This allows for replication on either all topics or nuclear cooperation, as done in the companion paper. Similarly, all sentiments except those taking a stance either for or against Russia or the United States are labeled "All Sentiments". English quotes for Russian-language media have been translated using Google Translate.
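As a rough illustration of how the rows described above might be used, the following minimal Python sketch counts quote-following instances per pair of source and following outlet, optionally restricted to one topic label. It assumes the data is available as CSV with the column names quoted in this description ("Source Outlet", "Following Outlet", "Topic"); the actual file format and exact headers may differ. The function name `count_followings` and the sample rows are hypothetical placeholders, not part of the dataset or the companion paper's method.

```python
import csv
import io
from collections import Counter


def count_followings(rows, topic=None):
    """Count quote-following rows per (source outlet, following outlet) pair.

    rows  : iterable of dicts keyed by column name (assumed headers).
    topic : if given, keep only rows whose "Topic" equals this label;
            if None, use every row (i.e. replicate on all topics).
    """
    counts = Counter()
    for row in rows:
        if topic is not None and row["Topic"] != topic:
            continue
        counts[(row["Source Outlet"], row["Following Outlet"])] += 1
    return counts


# Placeholder rows for illustration only; these are NOT real records.
SAMPLE_CSV = """Source Outlet,Following Outlet,Topic
Outlet A,Outlet B,Nuclear Cooperation
Outlet A,Outlet B,All Topics
Outlet C,Outlet B,All Topics
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# All rows, regardless of topic label.
all_counts = count_followings(rows)

# Only rows labeled "Nuclear Cooperation".
nuclear_counts = count_followings(rows, topic="Nuclear Cooperation")
```

To run this on the published file, the in-memory sample would be replaced by opening the downloaded file with `csv.DictReader`, after checking that its headers match the names assumed here.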



Communication, Geopolitics, International Media