Enhanced Dataset of Citizen Centric Complaints and Grievances on Twitter

Published: 23 July 2016 | Version 1 | DOI: 10.17632/w2cp7h53s5.1
Swati Agarwal,


The dataset "Complaints_Reports_Data.sql" contains public complaint tweets posted to four public-service Twitter accounts of the Government of India (@RailMinIndia, @IncomeTaxIndia, @DelhiPolice and @dtpTraffic). The file holds the raw tweets together with records of their users, hashtags, user mentions and other contextual metadata about the tweets and their authors. The dataset also includes a sample of tweets pre-processed in three steps ("pre1", "pre3" and "pre4"): hashtag expansion, spelling-error correction, and internet and slang expansion.

The metadata of each table is given below:

Table 1 (Annotated): tweet_ID, text, class (complaint or unknown)
Table 2 (Hashtags): tweet_ID, hashtag
Table 3 (Posts): tweet_ID, text, url_count, image_count, video_count, user_id, timestamp, organization (Indian Government account), language, latitude, longitude, replied_to_tweet_id, replied_to_user_id, retweet
Tables 4, 5, 6 (Pre1, Pre3, Pre4): tweet_ID, text, organization
Table 7 (User_Mentions): tweet_ID, user_ID
Table 8 (Users): user_ID, screen_name, name, verified?, location, created_at
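To show how the tables relate through tweet_ID, here is a minimal sketch in Python. It builds a simplified, hypothetical SQLite mirror of the Posts and Annotated tables, then counts complaint-labelled tweets per government account by joining the two. Only a subset of columns is kept, and the column types are assumptions. The rows are made-up placeholders, not dataset records. It also assumes that Annotated.class stores the literal strings 'complaint' and 'unknown' and that organization stores the account name; the real MySQL dump may encode either differently.

```python
import sqlite3

# Hypothetical, simplified mirror of two tables from the dump.
# Only some columns are kept and the types are assumptions.
SCHEMA = """
CREATE TABLE Posts (
    tweet_ID     TEXT PRIMARY KEY,
    text         TEXT,
    user_id      TEXT,
    organization TEXT
);
CREATE TABLE Annotated (
    tweet_ID TEXT,
    text     TEXT,
    class    TEXT  -- assumed to hold 'complaint' or 'unknown'
);
"""

# Count complaint-labelled tweets per government account by joining on tweet_ID.
COMPLAINTS_PER_ORG = """
SELECT p.organization, COUNT(*) AS complaints
FROM Annotated AS a
JOIN Posts AS p ON p.tweet_ID = a.tweet_ID
WHERE a.class = 'complaint'
GROUP BY p.organization
ORDER BY complaints DESC, p.organization
"""


def complaints_per_org(conn):
    """Return (organization, complaint_count) pairs, largest count first."""
    return conn.execute(COMPLAINTS_PER_ORG).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Placeholder rows for illustration only; they are NOT taken from the dataset.
conn.executemany("INSERT INTO Posts VALUES (?, ?, ?, ?)", [
    ("1", "train late again", "u1", "RailMinIndia"),
    ("2", "signal broken at junction", "u2", "dtpTraffic"),
    ("3", "thanks for the update", "u3", "RailMinIndia"),
    ("4", "coach not cleaned", "u4", "RailMinIndia"),
])
conn.executemany("INSERT INTO Annotated VALUES (?, ?, ?)", [
    ("1", "train late again", "complaint"),
    ("2", "signal broken at junction", "complaint"),
    ("3", "thanks for the update", "unknown"),
    ("4", "coach not cleaned", "complaint"),
])

print(complaints_per_org(conn))
```

With the placeholder rows this prints [('RailMinIndia', 2), ('dtpTraffic', 1)]. The same SELECT can be run unchanged at the mysql> prompt once the dump is loaded. Note that MySQL table names are case-sensitive on Linux by default, so they may need to match the exact case used in the dump.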


Steps to reproduce

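1. From a shell, start the MySQL client as root. It will prompt for your password:
   mysql -u root -p
2. At the mysql> prompt, create and select the database, then load the dump. Start the client from the directory that contains the file, or give its full path:
   create database citizen_complaints_sampled;
   use citizen_complaints_sampled;
   source Complaints_Reports_Data.sql;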
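The interactive MySQL steps can also be scripted. The following Python sketch is one possible approach and is not part of the published dataset. It calls the same mysql client through subprocess in two steps. First, it creates the database with the client's -e option. Second, it streams the dump on stdin, with the database name given as a positional argument, which has the same effect as the use and source commands. The function names build_commands and load_dump and the "root" default are illustrative assumptions. The script also assumes mysql is on PATH. Because -p is given without a value, the client prompts for the password once per call.

```python
import re
import subprocess

DB_NAME = "citizen_complaints_sampled"
DUMP_FILE = "Complaints_Reports_Data.sql"


def build_commands(user: str, db: str) -> list[list[str]]:
    """Return two mysql invocations: create the database, then load into it.

    Hypothetical helper for illustration, not part of the dataset.
    """
    # The name is interpolated into SQL, so accept only plain identifiers.
    if not re.fullmatch(r"[A-Za-z0-9_]+", db):
        raise ValueError(f"unsafe database name: {db!r}")
    # IF NOT EXISTS makes a rerun harmless instead of failing.
    create = ["mysql", "-u", user, "-p", "-e", f"CREATE DATABASE IF NOT EXISTS {db}"]
    # A positional database name selects it, like `use` at the prompt.
    load = ["mysql", "-u", user, "-p", db]
    return [create, load]


def load_dump(user: str = "root", db: str = DB_NAME, dump: str = DUMP_FILE) -> None:
    """Create the database and load the SQL dump into it via the mysql client."""
    create, load = build_commands(user, db)
    # -p with no value makes mysql prompt for the password on the terminal.
    subprocess.run(create, check=True)
    with open(dump, "rb") as fh:
        # Feeding the dump on stdin has the same effect as `source` in the client.
        subprocess.run(load, stdin=fh, check=True)
```

Calling load_dump() from the directory containing the dump performs the same load as the manual steps. Any failing mysql call raises subprocess.CalledProcessError because check=True is set.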


Indraprastha Institute of Information Technology Delhi


Information Retrieval, Data Mining, Social Media, Patient Social Context, Government Computing, Textual Databases, Public Record