Open Data in Social Sciences
Description
This dataset accompanies the study titled “Open Data in Social Sciences: Growth, Impact, and Equity in Data Paper Publishing,” which presents the first large-scale empirical assessment of peer-reviewed data papers in the domain of social sciences. With the increasing emphasis on data transparency, FAIR (Findable, Accessible, Interoperable, and Reusable) principles, and equitable research practices, this dataset provides a robust and replicable foundation for exploring how data papers are transforming scholarly communication, research impact, and global collaboration in the age of scientific crisis.
Drawing on a curated corpus of 3,957 peer-reviewed data papers indexed in Scopus, the dataset includes detailed bibliographic metadata, citation counts from major citation indices (Scopus, Web of Science, Google Scholar, Dimensions), and Altmetric attention scores reflecting societal engagement across platforms such as news media, policy documents, patents, and social media. The dataset further captures collaboration networks, institutional affiliations, funding acknowledgments, and journal-level compliance with FAIR and CARE (Collective Benefit, Authority to Control, Responsibility, and Ethics) principles.
The accompanying analytical materials include:
• Preprocessed datasets in CSV and Excel formats for reproducibility.
• Altmetric scores mapped to individual publications.
• Research collaboration network, co-citation analysis, and bibliographic coupling files generated using VOSviewer (Map and Network files).
• R Workspace file used for bibliometric analysis via the Bibliometrix R package.
• High-resolution PNG images of all figures and tables used in the publication.
Plotted data and analytical visualizations presented in the manuscript are also made available in this repository to enable full reproducibility of the findings and facilitate secondary analyses.
This dataset enables replication, extension, and methodological innovation in bibliometric, altmetric, and open science research. It is intended to support scholars, data librarians, research policymakers, and open science advocates in examining patterns of data publication, exploring citation and attention dynamics, and evaluating the role of funding and policy in shaping open data ecosystems. All materials are openly available for reuse under open access principles, promoting transparency, reproducibility, and inclusive participation in the data-driven transformation of the social sciences.
File Types Included:
• CSV (.csv): Scopus data
• Excel (.xlsx): Citation and Altmetric metrics
• Text (.txt): Network and co-citation files
• RData (.RData): Bibliometrix workspace
• Image (.png): Figures and tables
Steps to reproduce
To reproduce the findings presented in the study, begin by downloading the complete dataset package, which includes bibliographic metadata (Data Papers (n=3957).csv and .xlsx), cleaned analytical data (Analysis.xlsx), and Altmetric information (Altmetrics.xlsx). These files contain the publication, citation, and engagement metrics needed for quantitative and visual analysis.
For bibliometric analysis, open the provided R workspace file (Bibliometrix.RData) in RStudio. This workspace contains structured data prepared with the Bibliometrix package and can be used to replicate publication trends, thematic clustering, collaboration mapping, and citation impact assessments. Install the required R packages (bibliometrix, dplyr, and ggplot2) beforehand.
Network visualizations of co-cited journals, source clustering, and the collaboration network can be reproduced using VOSviewer. Load the files Sources_Network.txt and Sources_Map.txt into the software to generate visual representations of the intellectual and disciplinary structure underlying the dataset.
Statistical analyses, such as the Pearson correlation between citation counts and Altmetric scores, can be reproduced using standard statistical tools including Excel, R, or Python. The cleaned .xlsx files contain all variables used for these calculations, including open-access status, citation metrics from multiple databases, and Altmetric platform-level engagement scores.
Finally, all figures and tables used in the manuscript are included as .png images to support visual validation. Users can compare regenerated plots against these originals to check analytical consistency.
Together, these files and instructions are intended to make the study fully reproducible and to support secondary analysis, meta-research, and extension of the findings in the field of open science and research communication.
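For readers who choose Python for the correlation step described above, the following is a minimal sketch rather than the authors' own script. The function names pearson_r and load_pairs are hypothetical, and the column names must be supplied by the user after inspecting the .xlsx files, since the description does not list them. The optional loader assumes pandas and openpyxl are installed. The sample values are invented for illustration only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def load_pairs(path, citation_col, altmetric_col):
    """Hypothetical helper: read two columns from an Excel sheet.

    Assumes pandas and openpyxl are installed. The column names are NOT
    taken from the dataset description; pass whatever the file uses.
    Rows with a missing value in either column are dropped.
    """
    import pandas as pd  # imported lazily so pearson_r works without pandas

    frame = pd.read_excel(path)[[citation_col, altmetric_col]].dropna()
    return frame[citation_col].tolist(), frame[altmetric_col].tolist()


if __name__ == "__main__":
    # Invented toy values, not taken from the dataset.
    citations = [0, 2, 5, 9, 14]
    attention = [1, 3, 4, 10, 12]
    print(round(pearson_r(citations, attention), 3))
```

With the real files, a call such as load_pairs("Altmetrics.xlsx", <citation column>, <Altmetric column>) would supply the two sequences. Citation and attention counts are often heavily skewed, so it may be worth comparing the result with a rank-based measure before drawing conclusions.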