Context
Scientific papers, like other types of documents, can be identified by a set of keywords. Typically, authors are free to choose their keywords; when they do, we call them Authors' Keywords (AK). Sometimes keywords are imposed, limited, or inferred by algorithms. KeyWordsPlus © (KP) tries to use information from the bibliographic references of an article to infer keywords.
Content
This dataset contains information about the AK and KP of 69,000 articles. All the articles were retrieved from Web of Science (WoS): https://www.webofknowledge.com
The data is split into three collections:
- Raw: the raw data, as it comes from WoS. The records are distributed across CSV files. Columns "DE" and "ID" refer to AK and KP, respectively.
- filtered: we have removed all articles that do not contain both AK and KP.
- pre_processed: we have cleaned the keywords to remove special characters, and lowercased and stemmed all of them.
In filtered and pre_processed you will find two text files, "ak.txt" and "kp.txt". The same line number in both files refers to the same article. For example, article number 8 has the following keywords:
- AK: Automated knowledge assessment; concept map; linking phrase; semantic analysis
- KP: SCIENCE
After pre-processing, article number 8 has the following keywords:
- AK: automknowledgassess;conceptmap;linkphrase;semant_analysi
- KP: scienc
Acknowledgements
We want to thank Web of Science for giving access to its database.
Steps to reproduce
This data can be retrieved from Web of Science by searching for the term "Computer science" without applying any additional filter. Results should be exported in CSV format.
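Once the results are exported, the filtered collection could in principle be rebuilt by keeping only the records where both the "DE" (AK) and "ID" (KP) columns are non-empty. The following is a minimal Python sketch of that idea, not the exact script used to build this dataset. It assumes a comma-delimited export with a header row and semicolon-separated keywords; the function names, the "TI" column, and the sample rows are hypothetical and used only for illustration.

```python
import csv
import io


def filter_records(rows):
    """Keep only records where both DE (AK) and ID (KP) are non-empty."""
    kept = []
    for row in rows:
        ak = (row.get("DE") or "").strip()
        kp = (row.get("ID") or "").strip()
        if ak and kp:
            kept.append((ak, kp))
    return kept


def split_keywords(field):
    """Split a semicolon-separated keyword field into a list of keywords."""
    return [kw.strip() for kw in field.split(";") if kw.strip()]


# In-memory stand-in for one exported CSV file (assumed layout, made-up rows).
sample = io.StringIO(
    "TI,DE,ID\n"
    "Paper A,Automated knowledge assessment; concept map,SCIENCE\n"
    "Paper B,,NETWORKS\n"
    "Paper C,graph theory,\n"
)

pairs = filter_records(csv.DictReader(sample))

# One entry per article in each list, so line n of "ak.txt" and line n of
# "kp.txt" would describe the same article.
ak_lines = [ak for ak, _ in pairs]
kp_lines = [kp for _, kp in pairs]

for ak, kp in zip(ak_lines, kp_lines):
    print(split_keywords(ak), split_keywords(kp))
```

Writing ak_lines and kp_lines to two separate files, one entry per line, preserves the line-by-line alignment between AK and KP described in the Content section.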