Topological and Conversational Features in a Large Mastodon Network
Description
This dataset was created as part of the research paper 'Leading the Mastodon Herd: Analysing the Traits of Influential Leaders on a Decentralised Social Media Platform' by Luke Gassmann, Dr Matthew Edwards and Dr Ryan McConville. The dataset's original purpose was to analyse how influence and conversational metrics interact within a decentralised social network environment. It contains a network of Mastodon users linked by conversational traits, mentions and reposts. Covering the years 2001 to 2023, the dataset has 150,000 users, 70,000,000 total collected conversational and user features, 1,000,000 repost connections, 1,000,000 mention connections, and 1,000,000 post and reply connections. The data was collected using a three-hop method starting from seed users referring to BBC News in the Mastodon network. Personal information has been redacted from the dataset.
Steps to reproduce
The dataset is split over 12 CSV files, many of which connect to others via their associated IDs. Each file is described below:
tbl_community_to_user: A connection table showing which communities users belong to.
tbl_community: Lists the seed users' communities (for this dataset only BBC News is used).
tbl_hashtag: A list of hashtags found in posts or replies.
tbl_link: A list of links found in posts or replies.
tbl_post_mention_to_user: A connection table associating a piece of text with a user mentioned in it.
tbl_post_to_hashtag: A connection table associating a piece of text with a hashtag used inside it.
tbl_post_to_link: A connection table associating a piece of text with a link used inside it.
tbl_post_to_reply: A connection table linking a post to a reply. Both columns of this table refer to the tbl_post file.
tbl_post: A table containing all posts and replies in the network alongside features extracted from the content. The content itself has been removed but can be accessed via the Mastodon IDs.
tbl_user_to_repost: A connection table linking a user to a post that they have reposted.
tbl_user_topology-post_reply_only: A table containing user information with the corresponding communities and topological influence metrics. This table only contains users associated with posts and replies.
tbl_user: Contains all users alongside some minor platform features. Users' identities have been removed from the dataset.
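As a rough illustration of how the connection tables described above might be used, the following Python sketch loads whichever of the 12 files are present in a folder and joins two of them on an ID column. It is only a sketch under stated assumptions: the file names are assumed to be the table names with a ".csv" suffix, the helper names load_tables and join_on are hypothetical, and the column names are not documented here, so they are passed in as parameters and must be checked against the actual CSV headers.

```python
import csv
from pathlib import Path

# Table names as listed in the description; the ".csv" suffix is an assumption.
TABLES = [
    "tbl_community_to_user",
    "tbl_community",
    "tbl_hashtag",
    "tbl_link",
    "tbl_post_mention_to_user",
    "tbl_post_to_hashtag",
    "tbl_post_to_link",
    "tbl_post_to_reply",
    "tbl_post",
    "tbl_user_to_repost",
    "tbl_user_topology-post_reply_only",
    "tbl_user",
]


def load_tables(data_dir):
    """Read each CSV found in data_dir into a list of row dicts, keyed by table name."""
    tables = {}
    for name in TABLES:
        path = Path(data_dir) / f"{name}.csv"
        if not path.exists():
            continue  # skip files that have not been downloaded
        with path.open(newline="", encoding="utf-8") as fh:
            tables[name] = list(csv.DictReader(fh))
    return tables


def join_on(left_rows, right_rows, left_key, right_key):
    """Inner join: pair rows where left[left_key] == right[right_key].

    Column names are supplied by the caller because they are not
    documented in this description. On a column-name clash, the value
    from the left row is kept.
    """
    index = {}
    for row in right_rows:
        index.setdefault(row[right_key], []).append(row)
    joined = []
    for row in left_rows:
        for match in index.get(row[left_key], []):
            joined.append({**match, **row})
    return joined
```

For example, join_on(tables["tbl_post_to_reply"], tables["tbl_post"], left_key, right_key) would attach post features to each post and reply pair once the real ID column names are filled in. This version holds every row in memory, so for the full-size files it may be preferable to stream rows or use a dataframe library instead.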