Nuclear Engineering and Technology Journal titles dataset and LSA outputs

Published: 25 August 2023 | Version 1 | DOI: 10.17632/9j6std925r.1
Contributor:
Vincent Kuo

Description

The dataset consists of data on articles from the Nuclear Engineering and Technology journal, together with the outputs of each step of Latent Semantic Analysis (LSA) applied to the article titles. The Excel file contains the following tabs:

Tab 1: All articles, with related bibliographic and other data, up to 2023.
Tab 2: The titles of the 2643 articles published between 2013 and 2023 that were used for LSA.
Tab 3: The term-document matrix (TDM), whose entries give the occurrence count of term i in document j.
Tab 4: The same TDM weighted with TF-IDF.
Tab 5: The singular values from the S matrix of the Singular Value Decomposition (SVD), their squares, their cumulative percentages, and a scatter plot of the squared singular values against dimensionality.
Tab 6: The output semantic U matrix, with a dimensionality of 100, describing the term vectors.
Tabs 7-8: The 2D coordinates and t-SNE plots of the top 20, 100 and 500 vectors ranked by TF-IDF.
Tab 9: Common n-grams parsed from the dataset, for interest.
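The steps behind Tabs 3 and 4 can be sketched as follows. This is a minimal illustration, not the author's pipeline: the description does not state the tokenization or the exact TF-IDF variant, so this sketch assumes lowercase whitespace tokenization and the common weighting tf × log(N / df). The three titles are hypothetical toy inputs, not rows from the dataset.

```python
import math
from collections import Counter

def term_document_matrix(docs):
    """Return (terms, tdm) where tdm[i][j] is the count of term i in document j."""
    # Assumption: lowercase + whitespace tokenization only; the dataset's real
    # preprocessing (stop words, stemming, punctuation handling) is not specified.
    tokenized = [doc.lower().split() for doc in docs]
    terms = sorted({tok for toks in tokenized for tok in toks})
    counts = [Counter(toks) for toks in tokenized]
    tdm = [[c[t] for c in counts] for t in terms]
    return terms, tdm

def tfidf(tdm):
    """Weight a raw-count TDM with tf * log(N / df), one common TF-IDF variant."""
    n_docs = len(tdm[0]) if tdm else 0
    weighted = []
    for row in tdm:
        df = sum(1 for x in row if x > 0)  # documents containing this term (always >= 1 here)
        idf = math.log(n_docs / df)
        weighted.append([x * idf for x in row])
    return weighted

# Hypothetical toy titles, for illustration only.
titles = [
    "reactor core neutron flux",
    "neutron transport in reactor shielding",
    "fuel cladding corrosion",
]
terms, tdm = term_document_matrix(titles)
weights = tfidf(tdm)
for term, raw, w in zip(terms, tdm, weights):
    print(f"{term:10s} {raw} {[round(v, 3) for v in w]}")
```

Terms that appear in every title get an IDF of zero under this variant; libraries such as scikit-learn offer smoothed alternatives, and which one the dataset used is not stated.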
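Tab 5 pairs each singular value with its square and a cumulative percentage. A minimal sketch of that calculation, assuming the percentage is the running sum of squared singular values divided by their total (the share of the matrix's squared Frobenius norm kept by a rank-k approximation). The singular values below are hypothetical, and the threshold helper is only one illustrative way to read the table; the description does not say how the dimensionality of 100 was chosen.

```python
def variance_table(singular_values):
    """Return rows of (k, sigma, sigma**2, cumulative % of total sigma**2)."""
    squares = [s * s for s in singular_values]
    total = sum(squares)
    rows, running = [], 0.0
    for k, (s, sq) in enumerate(zip(singular_values, squares), start=1):
        running += sq
        rows.append((k, s, sq, 100.0 * running / total))
    return rows

def dims_for_threshold(rows, percent):
    """Hypothetical helper: smallest k whose cumulative percentage reaches `percent`."""
    for k, _, _, cum in rows:
        if cum >= percent:
            return k
    return len(rows)

sigma = [4.0, 3.0, 2.0, 1.0]  # hypothetical singular values, not from the dataset
rows = variance_table(sigma)
for row in rows:
    print(row)
print(dims_for_threshold(rows, 90.0))  # 3: 29/30 of the total is reached at k = 3
```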
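The reduced U matrix in Tab 6 comes from truncating the SVD of the weighted TDM. A minimal sketch with NumPy, assuming the weighted TDM is laid out as terms × documents, as in Tab 3. The random matrix stands in for real data and `truncated_term_vectors` is a hypothetical helper name. Some LSA variants use U_k S_k rather than U_k alone as term coordinates; the description only mentions U.

```python
import numpy as np

def truncated_term_vectors(weighted_tdm, k):
    """Return the first k columns of U and the top k singular values of A = U S V^T."""
    a = np.asarray(weighted_tdm, dtype=float)
    # full_matrices=False gives the thin SVD; NumPy returns singular values in
    # descending order, so the first k columns of U are the leading directions.
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    k = min(k, s.size)  # k cannot exceed min(n_terms, n_docs)
    return u[:, :k], s[:k]

rng = np.random.default_rng(0)
toy = rng.random((50, 30))  # hypothetical 50-term x 30-document weighted TDM
u_k, s_k = truncated_term_vectors(toy, 10)
print(u_k.shape, s_k.shape)  # (50, 10) (10,)
```

Each row of `u_k` is then a k-dimensional term vector; with the real data, k would be 100.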

Categories

Semantics, Knowledge Management, Nuclear Engineering, Natural Language Processing, Data Visualization, Text Processing, Competency Modeling
