300 topics in "Exploring the Subject Heterogeneity of Scientific Research Project Funding"

Published: 18 January 2022 | Version 1 | DOI: 10.17632/dyy2wxf3kk.1
Contributor:
Xinyue Yi

Description

We extract the topic distribution using the Latent Dirichlet Allocation (LDA) topic model. We first build a text corpus from the titles, abstracts, and keywords of 115,813 papers using standard natural language processing normalization steps: word segmentation, stop-word removal, and stemming. We then fit the LDA model to extract topics, each represented by several high-frequency words. The number of topics is chosen by perplexity: the lower the perplexity, the better the model fits the corpus. As the perplexity graph shows, perplexity reaches its minimum at 300 topics, so we use 300 topics as the basis for the subsequent analysis of topic attributes. The table lists the topic words for all 300 topics.
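The preprocessing and topic-count selection described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' code: the stop-word list and the `crude_stem` suffix stripper are hypothetical stand-ins (a real pipeline would use a proper stemmer such as Porter's), and the LDA fit itself is omitted. Here, `theta` (document-topic proportions) and `phi` (topic-word probabilities) are assumed to come from whatever LDA implementation was used. `perplexity` computes the standard exp(-log-likelihood / N) over the documents passed in. The description does not say whether perplexity was measured on held-out documents.

```python
import math
import re

# Hypothetical stop-word list for illustration; the list actually used for the
# 115,813-paper corpus is not given in the description.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "we", "with"}


def crude_stem(token):
    """Toy suffix stripper (hypothetical) standing in for a real stemmer such as Porter's."""
    for suffix in ("ing", "es", "s"):
        # Keep at least three characters so short words are not mangled.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, split into alphabetic tokens, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def perplexity(docs, theta, phi):
    """Perplexity = exp(-(sum of log p(w | d)) / N) over all N tokens.

    docs:  list of token lists
    theta: theta[d][k], topic proportions of document d
    phi:   phi[k][w], probability of word w under topic k (one dict per topic)
    Assumes every word has non-zero probability, as smoothed LDA estimates do.
    """
    log_likelihood = 0.0
    n_tokens = 0
    for d, doc in enumerate(docs):
        for w in doc:
            # Probability of the word under the document's mixture of topics.
            p = sum(theta[d][k] * phi[k].get(w, 0.0) for k in range(len(phi)))
            log_likelihood += math.log(p)
            n_tokens += 1
    return math.exp(-log_likelihood / n_tokens)


def select_num_topics(perplexity_by_k):
    """Return the topic count K whose fitted model has the lowest perplexity."""
    return min(perplexity_by_k, key=perplexity_by_k.get)
```

Given one perplexity value per candidate topic count, `select_num_topics` returns the count with the lowest value. This is the selection rule the description applies when choosing 300 topics.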

Categories

Dirichlet Series