AI4RSE
Description
This dataset supports the study titled “Advancing Research Software Engineering with AI: A Research Framework”, which explores the emerging domain of Artificial Intelligence for Research Software Engineering (AI4RSE). It contains all materials used in the large-scale empirical analysis of over 1,500 open-source research software repositories.

The repository includes:

1) Repository Metadata and Scores: A curated list of 1,512 research software repositories hosted on Zenodo, along with associated metadata extracted from GitHub. Each entry includes annotations for AI usage, engineering maturity, and compliance with the FAIR Principles for Research Software (FAIR4RS).
2) IEEE Taxonomy Mapping: A filtered subset of IEEE Taxonomy 2025 terms used to categorize the repositories and structure the domain-specific analysis.
3) Related Literature: A collection of key papers and survey articles that informed the theoretical framing of AI4RSE, quadrant modeling, and reproducibility metrics.
4) Analysis Scripts: Python and Jupyter-based scripts used to collect, process, score, and classify the repository data. These include tools for static and semantic code inspection, GenAI usage detection, FAIR4RS checks, and quadrant classification.

All materials are made available to promote transparency, reproducibility, and further research at the intersection of AI, software engineering, and open science.