arXiv Scientific Research Paper Dataset

Name: arXiv Scientific Research Paper Dataset
Creator: Sumit Mishra
Published: 2025-02-19T17:52:53.482Z
Keywords: Data Science, Natural Language Processing, Machine Learning, Bidirectional Encoder Representations From Transformers

Mishra, Sumit

doi:10.17632/mm6kst3krj.1

Published: 19 February 2025 | Version 1 | DOI: 10.17632/mm6kst3krj.1

Contributor:

Sumit Mishra

Description

This dataset comprises structured metadata from the arXiv repository, a widely used preprint server for scientific research. It includes paper titles, abstracts, categories (subject areas), and submission dates, making it a resource for research in natural language processing (NLP), bibliometrics, machine learning, and scientific trend analysis.

Content

The dataset contains the following columns:

1. id: Unique arXiv identifier for each paper.
2. title: Title of the research paper.
3. summary: Summary of the paper's content, extracted from arXiv.
4. summary_word_count: Word count of the summary.
5. category: Subject categories assigned by arXiv.
6. category code: Category code for the research paper.
7. published_date: Publication date of the research paper.
8. updated_date: Date the paper was last updated.
9. authors: Authors of the research paper.
10. first_author: First author listed on the paper.
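The column list above can be turned into a quick sanity check when loading the data. The following is a minimal sketch, not the author's own tooling. It assumes the data is distributed as a CSV file with a header row matching the column names above, and that summary_word_count was produced by splitting on whitespace; neither is confirmed by this page. The sample row is a made-up placeholder, not a record from the dataset, and load_rows and word_count_mismatches are hypothetical helper names.

```python
import csv
import io

# Column names as listed in the dataset description.
EXPECTED_COLUMNS = [
    "id", "title", "summary", "summary_word_count", "category",
    "category code", "published_date", "updated_date", "authors",
    "first_author",
]

# Made-up placeholder row for illustration only (not real dataset content).
SAMPLE_CSV = (
    "id,title,summary,summary_word_count,category,category code,"
    "published_date,updated_date,authors,first_author\n"
    'x-0001,Placeholder title,"A short placeholder summary.",4,'
    "Placeholder category,xx.YY,2025-01-01,2025-01-02,"
    '"A. Author, B. Author",A. Author\n'
)


def load_rows(handle):
    """Read CSV rows as dicts and fail early if a described column is absent."""
    reader = csv.DictReader(handle)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def word_count_mismatches(rows):
    """Return ids whose stored count differs from a whitespace-split count.

    Assumption: the stored count was computed by whitespace splitting.
    """
    return [
        row["id"]
        for row in rows
        if int(row["summary_word_count"]) != len(row["summary"].split())
    ]


rows = load_rows(io.StringIO(SAMPLE_CSV))
print(len(rows), word_count_mismatches(rows))
```

To run the same check against the downloaded file, replace io.StringIO(SAMPLE_CSV) with an open file handle, for example open(path, newline="", encoding="utf-8"), where path points to wherever the file was saved. A non-empty mismatch list would suggest that the counts were computed with a different tokenization than the one assumed here.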
