Religious Beliefs on Social Media: Large Dataset of Tumblr Posts and Bloggers Consisting of Religion Based Tags

Published: 13 August 2016 | Version 1 | DOI: 10.17632/8hp39rknns.1
Swati Agarwal,


The dataset contains 8 different types of Tumblr posts carrying tags commonly used in religion-based posts. It includes linguistic and contextual metadata of Tumblr posts, bloggers and Notes available since 2007. If you need the raw data collected for the tags, please contact:

The dataset contains the following information:

1. Tumblr Posts Collected for a Tag
   post_id: unique id of the post
   blogger: username of the blogger
   timestamp: timestamp of the post
   date: date and time of the post (GMT)
   state: status of the post (published, draft, queued, ...)
   slug: URL slug of the post, derived from its title
   format: html or markdown
   short_url: shortened, direct URL of the post
   note_count: number of notes on the post (including likes and reblogs)
   type: text, audio, video, photo, url/link, chat, answer or quote
   reblogged_key: unique key of the post if it gets reblogged, linking reblogs to the source post
   reblogged_from: if the post is a reblog, the id of the parent blogger
2. Merged_Data
   post_id: unique id of the post
   Content: content of the post (non-English content translated into English)
3. Tag
   post_id: unique id of the post
   tag: tag associated with the post
4. Notes
   post_id: unique id of the post
   blogger: author of the post
   note_type: like or reblog
   note_by: user who liked or reblogged the post
   note_timestamp: timestamp of the note
5. Blogger
   blogger: id of the blogger
   blogger_name: name of the blogger
   last_updated: timestamp of the blogger's last update (last activity)
   blogger_ask: whether the blogger allows questions/messages
   number_posts: total number of posts made by the blogger
   blogger_title: title of the blogger
   blogger_description: description of the blogger
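The Tag, Notes and Merged_Data groups all link back to a post through post_id, so most analyses are joins. The following is a minimal sketch, not the dataset's actual schema: it uses Python's built-in sqlite3 as a stand-in for MySQL, assumes hypothetical table names posts, tag and notes with only a few of the listed columns, and fills them with invented toy rows. It shows two joins: counting likes versus reblogs per tag, and flagging posts whose note_count is larger than the number of Notes rows collected for them.

```python
import sqlite3

# HYPOTHETICAL schema: table names (posts, tag, notes) and column types are
# assumptions modeled on the field list above, not read from
# Tumblr_Religious_Conflicts.sql. Only a subset of columns is shown.
SCHEMA = """
CREATE TABLE posts (
    post_id    INTEGER PRIMARY KEY,
    blogger    TEXT,
    type       TEXT,
    note_count INTEGER
);
CREATE TABLE tag (
    post_id INTEGER,
    tag     TEXT
);
CREATE TABLE notes (
    post_id        INTEGER,
    blogger        TEXT,
    note_type      TEXT,   -- 'like' or 'reblog'
    note_by        TEXT,
    note_timestamp INTEGER
);
"""


def build_toy_db():
    """In-memory database with invented toy rows (not taken from the dataset)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO posts VALUES (?, ?, ?, ?)",
                     [(1, "alice", "text", 3), (2, "bob", "photo", 1)])
    conn.executemany("INSERT INTO tag VALUES (?, ?)",
                     [(1, "faith"), (1, "religion"), (2, "religion")])
    conn.executemany("INSERT INTO notes VALUES (?, ?, ?, ?, ?)",
                     [(1, "alice", "like", "carol", 0),
                      (1, "alice", "reblog", "dave", 1),
                      (2, "bob", "like", "erin", 2)])
    return conn


def note_breakdown_by_tag(conn):
    """Return {tag: (likes, reblogs)} by joining tag to notes on post_id."""
    rows = conn.execute("""
        SELECT t.tag,
               SUM(CASE WHEN n.note_type = 'like'   THEN 1 ELSE 0 END),
               SUM(CASE WHEN n.note_type = 'reblog' THEN 1 ELSE 0 END)
        FROM tag AS t
        JOIN notes AS n ON n.post_id = t.post_id
        GROUP BY t.tag
        ORDER BY t.tag
    """).fetchall()
    return {tag: (likes, reblogs) for tag, likes, reblogs in rows}


def posts_with_uncollected_notes(conn):
    """Post ids whose note_count is larger than the number of notes rows."""
    rows = conn.execute("""
        SELECT p.post_id
        FROM posts AS p
        LEFT JOIN notes AS n ON n.post_id = p.post_id
        GROUP BY p.post_id, p.note_count
        HAVING p.note_count > COUNT(n.post_id)
        ORDER BY p.post_id
    """).fetchall()
    return [post_id for (post_id,) in rows]


if __name__ == "__main__":
    conn = build_toy_db()
    print(note_breakdown_by_tag(conn))         # {'faith': (1, 1), 'religion': (2, 1)}
    print(posts_with_uncollected_notes(conn))  # [1]
```

Comparing note_count with the number of Notes rows is one way to gauge how complete the collected Notes are for a post. Since note_count is described as including likes and reblogs, it may also count other kinds of notes, so a mismatch is a hint rather than proof of missing data.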


Steps to reproduce

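Log in to the MySQL client (enter your password when prompted), then create the database, select it and load the SQL dump:

   mysql -u root -p
   create database Feature_Space;
   use Feature_Space;
   source Tumblr_Religious_Conflicts.sql;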


Religion, Information Retrieval, Applied Linguistics, Data Mining, Social Media, Patient Social Context, Natural Language Processing, Social Behavior, Text Extraction, Government Affair