Data for: Mapping the Dutch Vaccination Debate on Twitter: Identifying Communities, Narratives, and Interactions

Published: 25-04-2019 | Version 1 | DOI: 10.17632/fjvk93bc5m.1
Roel Lutkenhaus


*Tweets* We retrieved all Dutch Twitter messages (statuses or tweets) written between 07-28-17 and 12-02-17 (MM-DD-YY) that included any of the words ‘vaccinatie’, ‘vaccineer’, ‘vaccineert’, ‘vaccineren’, ‘vaccineerde’, ‘vaccineerden’, ‘gevaccineerd’, ‘gevaccineerden’, ‘vaccin’, ‘vaccins’, ‘inenting’, or ‘inenten’. This produced a collection of 2,869 tweets by 1,684 unique users.

Many of these tweets resulted from (multiple) interactions between users: 823 of the 2,869 tweets in this initial set (28.7%) were replies, 414 (14.4%) were retweets, and 249 (8.7%) were quotes. Many of these statuses would not have been written without an original tweet to retweet, quote, or reply to. As we wanted our data to reflect this context, we also retrieved the (chains of) tweets that triggered the retweets, quotes, and replies in our initial set. This yielded 2,437 extra tweets by 1,197 unique users, of whom 324 were already present in our initial data set. The resulting sample comprised 5,306 unique messages written by 2,557 unique users.

*Nodelist and edgelist* Only a small fraction of all registered Twitter users actively tweet; many users merely lurk or are inactive [21,22]. However, connections between non-tweeting and tweeting users make up a large part of the digital infrastructure that facilitates the circulation of vaccine-related content, and they can be used to reveal the underlying social context. Therefore, for each of the unique Twitter accounts in our earlier-retrieved set of tweets (the authors), we retrieved all their followers (accounts following the authors: 34,135,154) and followees (accounts followed by the authors: 1,288,618).

We were interested in identifying online communities based on shared interests (who the authors are following) and shared audiences (who the authors are followed by). We therefore excluded followers and followees who were not connected to at least 15 authors.
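The exclusion rule described above can be illustrated with a minimal Python sketch. It is not the code used to build this data set. The function name `accounts_linked_to_authors`, the `(follower, followee)` edge format, and the decision to count follow relations in either direction towards the threshold are all assumptions made for illustration; the description does not state how the counting was implemented or how the authors themselves were treated.

```python
def accounts_linked_to_authors(edges, authors, min_authors=15):
    """Return accounts connected to at least `min_authors` distinct authors.

    edges   : iterable of (follower, followee) pairs (assumed format)
    authors : collection of account ids that wrote tweets in the sample
    A connection counts in either direction (assumption): the account
    follows the author, or the author follows the account.
    """
    authors = set(authors)
    linked = {}  # account -> set of distinct authors it is connected to
    for follower, followee in edges:
        if follower == followee:
            continue  # ignore self-follows
        if followee in authors:
            linked.setdefault(follower, set()).add(followee)
        if follower in authors:
            linked.setdefault(followee, set()).add(follower)
    return {acct for acct, auths in linked.items() if len(auths) >= min_authors}


# Toy example with a threshold of 2 instead of 15.
toy_authors = {"a1", "a2", "a3"}
toy_edges = [
    ("u1", "a1"), ("u1", "a2"), ("a3", "u1"),  # u1 is linked to three authors
    ("u2", "a1"),                              # u2 is linked to one author
    ("a1", "a2"),                              # authors may follow each other
]
kept = accounts_linked_to_authors(toy_edges, toy_authors, min_authors=2)
print(kept)
```

In this toy example only `u1` passes the threshold. The sketch applies the same rule to every account that appears in the edge list, including authors.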
We determined this cut-off point by examining the distribution of the number of connections with authors, and chose it so that the resulting network stayed within the limits of what our hardware and software could handle for visualization. The final network included 121,623 Twitter accounts and 3,706,124 connections. We used the Louvain algorithm to detect communities in this network; it is known as a fast yet relatively accurate method for detecting communities in large-scale networks.
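The Louvain algorithm searches for a partition of the network that maximizes modularity, a score comparing the share of connections that fall within communities to the share expected if connections were placed at random. As a hedged illustration of that objective only (not of the full algorithm, and not of the software used for this data set), the sketch below computes modularity for a given partition. It assumes an unweighted network whose connections are treated as undirected, which the description does not specify; the names `modularity`, `edges`, and `community_of` are hypothetical.

```python
def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges        : list of (u, v) pairs, each undirected edge listed once
    community_of : dict mapping every node to its community label
    Q = sum over communities c of  L_c / m - (d_c / (2 m)) ** 2,
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the summed degree of the nodes in c.
    """
    m = len(edges)
    intra = {}       # community -> number of edges inside it
    degree_sum = {}  # community -> total degree of its nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


# Two triangles joined by a single bridge edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {n: "A" for n in range(6)}

q_split = modularity(toy_edges, two_groups)  # 5/14, about 0.357
q_single = modularity(toy_edges, one_group)  # 0.0
```

Splitting the toy graph into its two triangles scores higher than lumping all nodes together, which is the kind of improvement Louvain looks for as it moves nodes between communities and then merges communities. For a network of the size reported here, a library implementation would be the practical choice; NetworkX, for instance, provides `networkx.community.louvain_communities`.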