Share your research data

Mendeley Data is a free and secure cloud-based communal repository where you can store your data, ensuring it is easy to share, access and cite, wherever you are.

Create a Dataset

Find out more about our institutional offering, Digital Commons Data

Recently published

140744 results
  • HausaSRS: A Hausa-English Parallel Corpus of Software Requirements Specifications
    This study hypothesizes that software requirements written in English can be systematically transformed into a high-quality Hausa Requirements Engineering dataset through a controlled pipeline that combines document harvesting, domain filtering, glossary-guided translation, weak annotation, and expert validation. A related hypothesis is that, in a low-resource setting, combining automated corpus construction with Human-in-the-Loop review can produce data of sufficient quality for downstream NLP tasks such as translation, FR/NFR classification, and token-level entity extraction. The data consists of a parallel and annotated corpus derived from ~350 Software Requirements Specification (SRS) documents collected across the health, education, and finance domains. These source documents were processed from PDF & DOCX formats. Text was extracted using pdfplumber and python-docx, cleaned with rule-based preprocessing to remove non-semantic artifacts such as page numbers, repeated spacing, URLs, and boilerplate labels, and filtered to retain requirement-like content using requirement-engineering keywords and modal patterns. English segments were retained, technical terms were anchored through a custom SRS glossary and named-entity recognition, and the retained text was translated into Hausa. Hausa outputs were normalized and weakly annotated with BIO tags and FR/NFR labels. Synthetic IEEE-style Hausa requirement templates were also introduced to strengthen corpus structure. After cleaning, deduplication, and removal of malformed rows, the dataset was stored as a cleaned silver corpus and partitioned into train, validation, and test subsets. Results show it is feasible to construct a domain-specific Hausa RE resource from heterogeneous SRS documents using a semi-automated workflow. 
It also shows that a glossary-aware translation and annotation strategy can preserve important software engineering concepts such as actors, system entities, constraints, and quality attributes in Hausa. Notably, automated annotation alone is not sufficient for reliable low-resource RE data; expert correction by a Hausa-speaking NLP specialist was necessary to refine mistranslations, resolve ambiguous labels, and correct token boundaries. This confirms the importance of expert validation in producing a gold-standard corpus from an initially silver dataset. The dataset is a structured representation of software requirements knowledge in Hausa, aligned with common RE tasks. The parallel English–Hausa component supports machine translation and cross-lingual modeling. The FR/NFR labels support requirement classification, while the BIO tags support sequence labeling and information extraction. Researchers can use the data for translation benchmarking, low-resource RE classification, domain-adaptive pretraining, or Hausa-specific entity extraction. More broadly, the data demonstrates a reproducible pathway for creating Requirements Engineering datasets in under-resourced languages.
  • Planned Congestion Dataset: PCU-Based Infrastructure Capacity Analysis for Phnom Penh, Cambodia
    This dataset accompanies the manuscript “Planned Congestion: How Development Approvals Embed Latent Infrastructure Failure in Rapidly Urbanising Cities — Evidence from Phnom Penh, Cambodia.” It provides all parameters, calculations, and scenario outputs required to reproduce the Passenger Car Unit (PCU)-based infrastructure capacity analysis presented in the paper. The dataset operationalises the concept of planned congestion, defined as infrastructure failure embedded in the development process through the decoupling of development approval from capacity-based planning. The case study focuses on Phnom Penh, Cambodia, where rapid speculative urban development has outpaced infrastructure provision. The Excel workbook is structured to ensure full transparency and reproducibility. The Parameters sheet contains all baseline inputs, including total floor area, worker density, modal split assumptions, PCU conversion factors, peak-hour factors, and network capacity. The Base_Calculation sheet reconstructs the core V/C calculation under current and full occupancy conditions. The Occupancy_Scenarios sheet presents V/C ratios across occupancy levels from 20% to 100%, corresponding to Figure 2 in the manuscript. The Modal_Split_Sensitivity sheet provides sensitivity analysis across different motorcycle–car modal shares, corresponding to Appendix A4.2. Additional sheets include extended sensitivity analyses and figure-ready datasets. All calculations are formula-based and visible within the workbook to facilitate verification. No external or proprietary data are used; the dataset is fully synthetic and derived from transparent assumptions documented in the manuscript. This dataset enables replication of all reported results, including the finding that the system operates at Level of Service F (V/C ≈ 1.17) at current occupancy (~60%) and reaches systemic overload (V/C ≈ 1.95) under full occupancy.
  • Coded analytical matrix of Environmental Impact Assessment reports for managed free-roaming cat colonies in protected natural areas of Gran Canaria, Spain (2025)
    This dataset contains the complete analytical coding matrix used in the study: Structural bias outweighs ecological evidence in assessing the environmental impacts of managed free-roaming cat colonies in protected natural areas (Manzanares-Fernández et al., 2026, Environmental Science & Policy). The dataset covers the full corpus of 72 Environmental Impact Assessment (EIA) reports produced between June and October 2025 by the consultancy ECOS Estudios Ambientales y Oceanografía SL on behalf of 11 municipalities in Gran Canaria (Canary Islands, Spain), under the mandate of the 2024 Canary Islands Regional Resolution on the management of community cat colonies in protected natural spaces. The reports evaluated registered managed free-roaming cat colonies located within the Canarian Network of Protected Natural Spaces and Natura 2000 sites (Rural Parks, Protected Landscapes, Natural Reserves, and Natural Monuments). Each of the 72 reports was coded independently across 19 variables organised in five analytical dimensions: (1) impact construction (definition mode, evidence type, empirical grounding); (2) risk framing and burden of proof; (3) animal representation and welfare incorporation; (4) procedural standardisation (structural, textual, and measures-level); and (5) assessment quality indicators (internal coherence and normative silences). Coding was applied holistically at the document level following a critical discourse analysis framework. Inter-coder reliability was assessed on a 14% sub-sample; Cohen's kappa values ranged from 0.71 to 0.82. The workbook comprises four sheets. The README sheet provides dataset metadata, variable definitions, and citation information. The Dataset sheet contains the full 72 × 19 coded matrix, with one row per report and anonymised report identifiers (RPT_001–RPT_072). The Codebook sheet documents each variable's data type, description, category labels, and coding notes. 
The Summary_statistics sheet provides cross-tabulations of all categorical variables by conclusion type (Incompatible, n = 34; Compatible, n = 21; Deferred, n = 17).
  • DIGITAL MATURITY AND TECHNOLOGY ADOPTION IN THE CONSTRUCTION INDUSTRY: A GLOBAL BIBLIOMETRIC ANALYSIS OF TRENDS, THEMATIC CLUSTERS, AND EMERGING FRONTIERS (2015–2025)
    Purpose: This study provides a detailed global bibliometric analysis of 446 articles indexed in the Scopus database and published in 2015-2025. It describes publication dynamics, identifies the most impactful contributors, visualises thematic clusters, and detects emerging research areas through keyword bursts. Design/Methodology/Approach: The PRISMA 2020 protocol guided systematic screening of the Scopus database. The Boolean search query “digital maturity” OR “digital technology” AND “construction industry” returned 778 initial records, narrowed to 446 by sequential subject-area, document-type, and language filters. VOSviewer was used to map keyword co-occurrence networks, and an automaton-based burst detection algorithm with silhouette scoring of temporal keywords was adopted. Findings: The dataset accumulates 7,462 citations across 446 documents, with a field h-index of 44 and a mean of 16.73 citations per document. The number of publications has increased fivefold, with 28 documents in 2015-2018 and 195 documents in 2024-2025. China has the largest number of documents (93), whereas the United Kingdom has the highest citation impact (2,096 citations). South Africa is the most prolific African contributor (49 documents). Australia shows a high citation density (47 documents, 1,806 citations; 38.4 per document). According to keyword burst detection, the most active frontiers of the field are productivity (burst strength = 5.817, 2019-2023, silhouette = 1.000), Nigeria as a research geography (4.056, 2023-2024, silhouette = 0.896), sustainable construction (3.564, 2019–2024, silhouette = 0.763), and infrastructure (3.060, 2019–2024, silhouette = 1.000). Four thematic clusters were also defined. Research Limitations: The analysis is limited to Scopus-indexed, English-language publications. 
Exclusion of papers during manual content screening introduces a degree of subjectivity. Practical Implications: Policymakers in developing economies should prioritise digital infrastructure investment. Research funders should direct resources toward the identified active frontiers. Originality/Value: This bibliometric analysis comprehensively maps the intellectual space of digital maturity and technology uptake in construction, using burst detection and silhouette scoring alongside traditional bibliometric techniques over the entire 2015-2025 horizon, and provides a data-based prioritisation of future research directions. Keywords: Digital maturity; digital technology adoption; construction industry; bibliometric analysis; VOSviewer; PRISMA; Construction 4.0; sustainable construction
  • Temperature and Humidity Dependent progression and maturation rate of mosquito
    The study involved rearing mosquitoes under simulated temperatures and relative humidity. A total of four hundred (400) eggs of An. arabiensis were obtained from the JU TIDRC insectary and reared to adults. Mosquitoes were reared across temperatures ranging from a minimum of 14.55°C to a maximum of 34.35°C and relative humidity spanning from 64% to 86%. Each temperature and humidity combination was replicated four times, with one hundred mosquito eggs placed in petri dishes containing wet filter paper. After a 24-hour period, the eggs were carefully washed and transferred into hatching trays. Throughout the larval stage, adequate nourishment was provided, and observations were recorded until pupation occurred. Subsequently, pupae were relocated into cages, and the emergence of adult mosquitoes was closely monitored until the end of their lifespan. Detailed records were maintained, capturing key parameters such as the number of days required for egg hatching, as well as the survival or mortality rates of hatching eggs, larvae, pupae, and adult mosquitoes under each temperature and humidity condition.
  • Irradiation-induced void swelling dataset for austenitic alloy under EBR-II and HFIR reactor conditions
    Supplementary material for the study "Machine learning-based prediction and mechanistic interpretation of irradiation-induced swelling in austenitic steels: The roles of irradiation conditions and alloying elements"
  • Notes on replicating the results of “Employment trade-offs in climate policy: Evidence from China carbon market”
    Notes on replicating the results of “Employment trade-offs in climate policy: Evidence from China carbon market”
  • Data for: Gender bias in bird ringing: uncovering the massive female dropout in volunteer ecological monitoring
    This dataset contains the anonymized quantitative responses (N=561) from a nationwide survey of the French bird ringing community. It includes demographic variables (gender, age class, license status, parenthood) and categorical responses regarding self-perceived ornithological knowledge, feelings of illegitimacy, training support, integration, and safety during fieldwork. These data were collected to investigate gender disparities and experiences within the bird ringing community, and they support the findings presented in the article "Gender bias in bird ringing: uncovering the massive female dropout in volunteer ecological monitoring" submitted to Biological Conservation. Ethical note: In compliance with the European General Data Protection Regulation (GDPR) and to ensure participant confidentiality, all personally identifiable information and free-text comments (including qualitative interview transcripts) have been completely removed from this dataset.
  • Infrastructure development within the Belt and Road Initiative as a factor in expanding interaction between Chinese and Russian enterprises
    thesis
  • patient data
    This study investigates the additional impact of Leap Motion-based VR therapy, when used alongside conventional treatment (CT), on the upper extremity function, physical activity, and participation of stroke patients. Forty post-stroke patients were randomly assigned to either a control group (CG) or an experimental group (EG). Both groups received 60 minutes of CT, while the EG received an additional 40 minutes of VR therapy, five days a week for four weeks. Outcome measures included the Fugl-Meyer Assessment for Upper Extremity (FMA-UE), Selective Control of the Upper Extremity Scale (SCUES), Nine-Hole Peg Test (NHPT), Disabilities of the Arm, Shoulder and Hand questionnaire (DASH-T), and Stroke-Specific Quality of Life Scale (SS-QOL). Outcome assessments were performed by an independent assessor at baseline (T0), post-intervention (4 weeks, T1), and at 1-month (T2) and 3-month (T3) follow-ups. Patient demographic data and assessment results for each time period are presented individually in Excel.
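
The HausaSRS entry above describes rule-based preprocessing that removes page numbers, repeated spacing, and URLs, then keeps requirement-like text using modal patterns and drops duplicates. The following is a minimal sketch of that idea only, not the dataset authors' code: the regular expressions, the modal keyword list, and the function names `clean_line` and `extract_requirements` are all assumptions made for illustration, and the real pipeline's rules are not given in the listing.

```python
import re

# Hypothetical patterns: the dataset's actual rules and keywords are not published here.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
PAGE_NUMBER_RE = re.compile(r"^\s*(?:page\s+)?\d+(?:\s+of\s+\d+)?\s*$", re.IGNORECASE)
MODAL_RE = re.compile(r"\b(?:shall|must|should|will)\b", re.IGNORECASE)


def clean_line(line: str) -> str:
    """Remove URLs and collapse repeated whitespace in a single line."""
    line = URL_RE.sub("", line)
    return re.sub(r"\s+", " ", line).strip()


def extract_requirements(text: str) -> list[str]:
    """Return cleaned, deduplicated lines that contain a modal keyword."""
    seen = set()
    kept = []
    for raw in text.splitlines():
        if PAGE_NUMBER_RE.match(raw):
            continue  # drop lines that are only a page number
        line = clean_line(raw)
        if not line or not MODAL_RE.search(line):
            continue  # drop empty lines and lines without a modal keyword
        key = line.lower()
        if key not in seen:  # case-insensitive deduplication
            seen.add(key)
            kept.append(line)
    return kept
```

Calling `extract_requirements` on text already extracted from a PDF or DOCX file returns the retained English segments in their original order. Translation, glossary anchoring, and annotation are outside the scope of this sketch.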
GREI

The Generalist Repository Ecosystem Initiative

Elsevier's Mendeley Data repository is a participating member of the National Institutes of Health (NIH) Office of Data Science Strategy (ODSS) GREI project. The GREI includes seven established generalist repositories funded by the NIH to work together to establish consistent metadata, develop use cases for data sharing, train and educate researchers on FAIR data and the importance of data sharing, and more.

Why use Mendeley Data?

Make your research data citable
Unique DOIs and easy-to-use citation tools make it easy to refer to your research data.
Share data privately or publicly
Securely share your data with colleagues and co-authors before publication.
Ensure long-term data storage
Your data is archived for as long as you need it by Data Archiving & Networked Services.
Keep access to all versions
Mendeley Data supports versioning, making longitudinal studies easier.

The Mendeley Data communal data repository is powered by Digital Commons Data.

Digital Commons Data provides everything that your institution will need to launch and maintain a successful Research Data Management program at scale.