E-GOVERNMENT TEXT MINING DATA IN WUHAN

Published: 13 December 2021 | Version 1 | DOI: 10.17632/5rzx2gj766.1
Contributor:
xiaolin lao

Description

• The dataset is sourced from a local e-government platform in China: the Wuhan City Message Board.
• It contains messages about the most pressing COVID-19-related public issues that Wuhan residents want the government to address.
• The data are listed in chronological order, so they can reflect changes in public opinion over time.
• The data can be used for text mining. Sentiment analysis scores can show how the mood of Wuhan residents changes over time and how high and low sentiment is distributed.
• Semantic analysis and Word2vec analysis of the data can reveal the distribution of topics raised by Wuhan residents at each stage of the period covered.
• The public and policymakers can use these changes in sentiment and themes to understand their situation in the post-normal conditions triggered by COVID-19.
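Because the messages are in chronological order, a simple first step before any sentiment or topic analysis is to bucket them by month. The following is a minimal sketch, not the contributor's method: the field name `message_time` and the timestamp format `%Y-%m-%d %H:%M:%S` are assumptions and may need to be adjusted to the actual file, and the sample records are invented for illustration.

```python
from collections import Counter
from datetime import datetime


def messages_per_month(records, time_key="message_time", fmt="%Y-%m-%d %H:%M:%S"):
    """Count messages per calendar month, returned in chronological order.

    `time_key` and `fmt` are assumed names/formats, not taken from the dataset.
    """
    counts = Counter()
    for rec in records:
        ts = datetime.strptime(rec[time_key], fmt)
        counts[ts.strftime("%Y-%m")] += 1
    # "YYYY-MM" keys sort lexicographically in chronological order.
    return dict(sorted(counts.items()))


# Invented sample records, for illustration only.
sample_messages = [
    {"message_time": "2020-02-03 09:15:00"},
    {"message_time": "2020-01-25 18:40:12"},
    {"message_time": "2020-01-30 07:05:44"},
]
monthly = messages_per_month(sample_messages)
print(monthly)
```

The same monthly buckets could then hold per-message sentiment scores or topic labels, so that changes over time can be compared stage by stage.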

Steps to reproduce

1) The data were crawled using Python. The collection URL is http://liuyan.cjn.cn/. The collection date is 27 October 2021.
2) First, a search was performed using the Chinese keyword "疫情" ("epidemic"). Second, "View Details" was clicked on each message to open its secondary (detail) page. Third, the fields "title," "inquiry code," "user ID," "message time," "civic message content," "government response time," and "government response content" were crawled from the secondary pages. Each message has a unique inquiry code, which can be used to filter out duplicates after crawling. In total, 13,598 unique messages (3,490,054 Chinese characters) were collected.
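The deduplication and character count described in step 2 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the contributor's actual script: the field names `inquiry_code` and `message_content` are hypothetical, the sample records are invented, and the source does not say which fields the reported character total covers, so here only the message content is counted. Chinese characters are matched with the basic CJK Unified Ideographs range (U+4E00 to U+9FFF).

```python
import re

# Basic CJK Unified Ideographs block; punctuation and Latin text are not counted.
CJK_CHAR = re.compile(r"[\u4e00-\u9fff]")


def deduplicate(records, key="inquiry_code"):
    """Keep the first record seen for each inquiry code (assumed field name)."""
    seen = set()
    unique = []
    for rec in records:
        code = rec[key]
        if code not in seen:
            seen.add(code)
            unique.append(rec)
    return unique


def count_chinese_chars(records, fields=("message_content",)):
    """Count Chinese characters across the given text fields of all records."""
    return sum(
        len(CJK_CHAR.findall(rec.get(field) or ""))
        for rec in records
        for field in fields
    )


# Invented sample records, for illustration only.
sample = [
    {"inquiry_code": "A001", "message_content": "疫情期间小区封闭"},
    {"inquiry_code": "A002", "message_content": "请问何时复工?"},
    {"inquiry_code": "A001", "message_content": "疫情期间小区封闭"},
]
unique_records = deduplicate(sample)
total_chars = count_chinese_chars(unique_records)
print(len(unique_records), total_chars)
```

Keying on the inquiry code rather than on the message text keeps two distinct messages with identical wording as separate records, which matches the description of the code as the unique identifier.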

Categories

Textual Database

Licence