Perspectives on venture capital: A practitioner-oriented bibliometric dataset of abstracts
Description
This dataset contains the underlying search results, screening outputs, classification data, and derived analytical files used in the article Perspectives on Venture Capital: A practitioner-oriented bibliometric and systematic review. The dataset was generated with the survey-results / article-analysis-public NLP toolkit, which supports a PRISMA-based literature review workflow: it queries multiple digital libraries, removes duplicate records, screens articles against inclusion and exclusion criteria, and automatically tags records according to predefined thematic properties. The search covers publications from 2010 to 2022 and targets venture-capital-related literature identified through keyword-based queries such as "venture capital", "private equity", "angel investor", and "business angel".

The dataset documents the full article selection pipeline, from the initial retrieval of records to the final analytical corpus used in the manuscript. It includes raw or intermediate search outputs, deduplicated records, filtered article tables, search configuration files, bibliometric summaries, and the derived visualizations and tabulations that support the review. Records are classified from their title and abstract text into the main thematic areas examined in the article: ESG and sustainability, innovation, and exit strategy. Depending on the file, variables may include article title, authors, year, source database, URL or DOI, abstract, matched search keyword, assigned topic tags, relevance indicators, duplicate status, and inclusion or exclusion decisions.

The dataset is intended to support the transparency and reproducibility of the review process, allowing readers to trace how the final corpus was constructed and how the descriptive and bibliometric outputs were produced.
In addition to the data files, the associated code repositories provide the software used to generate the screening, tagging, aggregation, and visualization outputs reported in the article.
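To make the pipeline stages described above more concrete, the following is a minimal Python sketch of deduplication, year screening, and keyword-based topic tagging. It is an illustration only and is not taken from the article-analysis-public toolkit: the record fields (title, abstract, year, doi), the function names, and the keyword stems in TOPIC_KEYWORDS are all assumptions chosen for the example, and the rule of keeping only records that receive at least one tag is likewise a simplification.

```python
import re

# Hypothetical keyword stems per thematic area. These are illustrative
# assumptions, not the thematic properties configured in the actual toolkit.
TOPIC_KEYWORDS = {
    "esg_sustainability": ["esg", "sustainab"],
    "innovation": ["innovat", "patent"],
    "exit_strategy": ["exit", "ipo"],
}


def normalize_title(title):
    """Lowercase a title and collapse punctuation so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(records):
    """Keep the first record seen for each DOI or normalized title."""
    seen = set()
    unique = []
    for rec in records:
        keys = {"title:" + normalize_title(rec["title"])}
        if rec.get("doi"):
            keys.add("doi:" + rec["doi"].lower())
        if keys & seen:
            continue  # already retrieved from another library
        seen |= keys
        unique.append(rec)
    return unique


def in_period(rec, start=2010, end=2022):
    """Screening criterion: publication year inside the search window."""
    return start <= rec["year"] <= end


def tag(rec):
    """Return the sorted topic tags whose stems start a word in title + abstract."""
    text = f"{rec['title']} {rec.get('abstract') or ''}".lower()
    return sorted(
        topic
        for topic, stems in TOPIC_KEYWORDS.items()
        if any(re.search(r"\b" + re.escape(stem), text) for stem in stems)
    )


def run_pipeline(records):
    """Deduplicate, screen by year, tag, and keep records with at least one tag."""
    corpus = []
    for rec in deduplicate(records):
        if not in_period(rec):
            continue
        tags = tag(rec)
        if tags:  # assumption: untagged records are excluded from the corpus
            corpus.append({**rec, "tags": tags})
    return corpus
```

Calling run_pipeline on a list of dictionaries with the fields named above returns the retained records, each with an added "tags" list. The data files in this dataset record the outcome of the corresponding stages of the real workflow (duplicate status, inclusion or exclusion decisions, and assigned topic tags).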
Steps to reproduce
The scraping and NLP tool used to generate this dataset is available in a public repository: https://gitlab.com/magix.ai/article-analysis-public