arXiv Scientific Research Paper Dataset
Description
This dataset comprises structured metadata from the arXiv repository, a widely used preprint server for scientific research. It includes paper titles, abstracts, categories (subject areas), and submission dates, making it a valuable resource for research in natural language processing (NLP), bibliometrics, machine learning, and scientific trend analysis.
Content
The dataset contains the following columns:
1. id: Unique arXiv identifier for each paper.
2. title: Title of the research paper.
3. summary: The paper's abstract (summary), as extracted from arXiv.
4. summary_word_count: Number of words in the summary.
5. category: Subject categories assigned by arXiv.
6. category code: arXiv category code for the research paper.
7. published_date: Date the paper was published on arXiv.
8. updated_date: Date the paper was last updated on arXiv.
9. authors: Authors of the research paper.
10. first_author: First author listed on the paper.
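As a minimal sketch of working with the columns above, the following Python reads the data, checks that the stored word counts agree with the summaries, and counts papers per publication year in one category (a basic form of trend analysis). It rests on several assumptions not stated in the description: the data is a CSV file whose header row uses the column names listed above, published_date begins with an ISO YYYY-MM-DD date, summary_word_count is a whitespace-token count, and category is compared by exact equality even though the real field may hold several categories. The function names and the sample rows are hypothetical and for illustration only.

```python
import csv
import io
from collections import Counter
from datetime import date

# Columns this sketch relies on; assumed to match the CSV header exactly.
REQUIRED_COLUMNS = ("id", "title", "summary", "summary_word_count", "category", "published_date")


def load_papers(file_obj):
    """Read the CSV into a list of dicts, failing early if a required column is missing."""
    reader = csv.DictReader(file_obj)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def whitespace_word_count(text):
    """Count whitespace-separated tokens (an assumed definition of summary_word_count)."""
    return len(text.split())


def published_year(row):
    """Extract the year, assuming published_date starts with an ISO YYYY-MM-DD date."""
    return date.fromisoformat(row["published_date"][:10]).year


def papers_per_year(rows, category):
    """Count papers per publication year whose category field equals `category`."""
    return Counter(published_year(r) for r in rows if r["category"] == category)


# Made-up rows for illustration only; they are not taken from the dataset.
SAMPLE_CSV = """id,title,summary,summary_word_count,category,published_date
p1,Paper A,We study a toy problem.,5,Machine Learning,2021-03-01
p2,Paper B,A second short abstract.,4,Machine Learning,2022-07-15
p3,Paper C,Another one here.,3,Machine Learning,2022-11-30
p4,Paper D,Off-topic abstract text.,3,Computation and Language,2022-01-10
"""

if __name__ == "__main__":
    papers = load_papers(io.StringIO(SAMPLE_CSV))
    for p in papers:
        # csv yields strings, so convert the stored count before comparing.
        assert int(p["summary_word_count"]) == whitespace_word_count(p["summary"])
    print(papers_per_year(papers, "Machine Learning"))
```

To run this against the real data, replace `io.StringIO(SAMPLE_CSV)` with an open file handle (for example `open(path, newline="", encoding="utf-8")`, where `path` is wherever the file is stored). If the category field turns out to contain several categories per paper, the equality test in `papers_per_year` would need to become a membership test on the split field.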