Recently published
146600 results
- OMANISHA: A Benchmark Dataset for Identifying and Categorizing Bengali Misogynistic Text
  OMANISHA (Online Misogynistic Annotated Natural-language Instances for Sentiment and Hate Analysis) is a curated Bengali dataset developed to support the automatic detection of misogynistic discourse in online spaces. Misogynistic content on social media has serious psychological, social, and institutional consequences for women, as it contributes to gender inequality, normalizes gender-based violence, and discourages women from participating freely in digital communities. Despite the global significance of Bengali, computational resources for detecting gender-based online abuse in Bengali remain limited. OMANISHA addresses this gap by providing a reliable, publicly accessible dataset for online misogyny detection. The dataset consists of 7,017 annotated Bengali text samples collected from diverse online platforms, including Facebook, YouTube, TikTok, Instagram, Twitter and online news portals. Each instance is assigned to one of four predefined categories:
    - Non-misogynistic: 2,420 samples
    - Stereotype: 1,744 samples
    - Derogation: 1,527 samples
    - Sexual harassment: 1,326 samples
  The dataset includes both formal and informal Bengali texts, reflecting real-world online communication patterns. English translations are also provided to enhance cross-lingual accessibility and support comparative NLP research. To ensure annotation reliability, each sample was independently annotated by two native Bengali annotators selected from a pool of four annotators with diverse gender, religious, ethnic and geographical backgrounds. Annotation disagreements were resolved through structured consultation with a third annotator. Annotation quality was evaluated using Cohen’s Kappa (κ = 0.76) and Krippendorff’s Alpha (α = 0.75), indicating substantial inter-annotator agreement. Additionally, pairwise Jaccard Similarity scores among the classes range from 0.12 to 0.21, suggesting clear taxonomic distinction across the defined categories. The preprocessing pipeline includes duplicate removal, text normalization and coherence checking to ensure data quality and integrity. Unlike binary misogyny detection datasets that simply classify content as misogynistic or non-misogynistic, OMANISHA offers fine-grained category-level annotations, enabling more precise analysis and content moderation. By making this dataset publicly available for research purposes, OMANISHA aims to advance low-resource Bengali NLP, support explainable AI-driven content moderation and encourage further innovation and collaboration within the Bengali NLP community.
- Supplemental Data - Effect of Dietary Choline and Diet Fermentability on Performance and Feeding Behavior of Postpartum Dairy Cows
  Supplemental Figures and Tables for: Pasch, K.R., F. Viganti, and W.E. Brown. 2026. Effect of Dietary Choline and Diet Fermentability on Performance and Feeding Behavior of Postpartum Dairy Cows. Dairy. 7:XXX. Accepted June 26, 2026.
- Hacimusalar Geophysical Survey Dataset
  This dataset provides raw and processed ERT (electrical resistivity tomography), GPR (ground-penetrating radar) and magnetometry data acquired along a buried ancient wall at Hacımusalar Höyük, Elmalı, Antalya, Türkiye. The dataset shows anomalies detected by all three methods, allowing integrated interpretation of a shallowly buried, rectangular body with contrasts in conductivity, magnetic susceptibility and dielectric properties. Researchers who use this dataset in scientific research should cite it together with the following article once it is published: (Aydın et al., 202X, TBA).
- Anonymized_IDA_T2DM_Data
  Raw data for Prevalence of Iron Deficiency Anemia in Type 2 Diabetes: A Descriptive Cross-Sectional Study
- Dataset of bibliometric records and topic modelling outputs for AI-driven boiler optimisation in thermal power plants (2014–2025)
  This dataset supports the study of artificial intelligence (AI)-driven boiler optimisation in thermal power plants through bibliometric analysis and topic modelling. It comprises bibliographic records retrieved from the Scopus database for English-language journal articles and review papers published between 2014 and 2025. The dataset was developed to examine the evolution of research on AI applications for boiler performance optimisation, combustion control, emissions reduction, predictive maintenance, fault diagnosis, and intelligent monitoring in thermal power generation. It includes raw bibliometric records, processed text data, document–term matrices, Latent Dirichlet Allocation (LDA) outputs, temporal topic trends, and supporting bibliometric statistics. The LDA model identifies seven latent research topics from document titles, abstracts, and author keywords. The dataset contains document–topic probability distributions (theta.csv), topic–term probability distributions (beta.csv), dominant topic assignments, top-ranked topic terms, topic labels, and annual topic prevalence. These files enable users to investigate thematic structures, analyse the evolution of research topics over time, and compare topic distributions across publications. The dataset also includes processed text files (cleaned_corpus.csv and dtm.csv) to facilitate replication of the topic modelling workflow. Supporting files such as document_topic_full.csv and country_stats.csv provide integrated bibliometric metadata and publication statistics for further analysis. The data can be interpreted at multiple levels. Bibliometric records support analyses of publication trends, research productivity, collaboration patterns, and institutional or country contributions. The LDA outputs provide probabilistic representations of document topics, where higher topic probabilities indicate stronger thematic relevance. Topic–term probabilities identify the most representative terms within each topic, while annual topic prevalence enables assessment of changes in research emphasis over time. Researchers may use this dataset to reproduce the published analysis, evaluate alternative topic modelling approaches, benchmark text mining methods, conduct scientometric studies, or investigate emerging trends in AI applications for thermal power plants and energy systems. The dataset is compatible with R, Python, MATLAB, and other software environments that support CSV-formatted data.
- Supplementary Material for "Alopecia Areata: Advances in Clinical Evaluation and Pathogenesis"
  This supplement includes Supplementary Table 1 for the Journal of the American Academy of Dermatology Continuing Medical Education article titled "Alopecia Areata: Advances in Clinical Evaluation and Pathogenesis."
- Time, Edits, and Effort: Quantifying Cognitive Load in Post-Editing with EduApp Logs
  This study investigates post-editing behaviour in translator training through automatically logged process metrics, exploring how temporal and technical indicators may approximate relative cognitive effort. Using 20 valid EduApp task-level records drawn from a classroom cohort of 32 undergraduate translation students, the study analyses edits per minute, character throughput, and edit density to identify provisional behavioural profiles. Correlational analysis shows a strong inverse relationship between time and throughput, suggesting that longer task duration may reflect greater processing demands, task difficulty, or extended evaluation, although the logged metrics remain indirect proxies rather than direct measures of cognitive load. Exploratory profiling distinguishes three task-level patterns (fast minimalists, deliberate revisers, and rewriters), each reflecting a different balance between fluency, evaluation, and revision intensity. The findings suggest that transparently and ethically captured process indicators can support classroom discussion of post-editing effort, strengthen learner reflection on machine-translation literacy, and help connect experimental cognitive translation studies with pedagogical practice.
- The fitting coefficients of ngWSGG-PS and the codes for calculating the parameters of ngWSGG-PS
  This dataset provides MATLAB code to calculate the absorption coefficients, weight factors, wall emissivities and wall absorptivities of ngWSGG-PS at a given thermodynamic state. It applies to conditions with nongray walls, pressures of 1-50 atm, temperatures of 300-2800 K, H₂O/CO₂ molar ratios of 0.05-4, and soot volume fractions of 0-20 ppm.
- Carbon Border Adjustment, Green Compliance Capability, and Export Readiness in the Global South: A Global CBAM-GCCI Panel Study
  This dataset provides the replication package for the study titled “Carbon Border Adjustment, Green Compliance Capability, and Export Readiness in the Global South: A Global CBAM-GCCI Panel Study.” The study examines how EU-facing Carbon Border Adjustment Mechanism (CBAM) exposure and national green-compliance readiness jointly shape export readiness and development vulnerability across a global country-year panel covering 2007-2024. The dataset includes processed HS07-based EU-facing export exposure measures, direct CBAM, near-CBAM, and broader green-compliance product classifications, the weighted CBAM exposure index, alternative exposure-weight specifications, normalized exposure intensity, Green Composite Capability Index (GCCI) variables, macroeconomic and governance controls, diagnostic outputs, robustness tables, and R scripts used for empirical estimation. The empirical framework combines three layers: product-level CBAM exposure construction, GCCI readiness construction, and panel econometric testing. The main econometric approach uses two-way fixed effects with Driscoll-Kraay robust standard errors, supported by robustness checks using alternative exposure weights, normalized exposure intensity, exchange-rate exclusion, and group-wise estimations. The original source data were obtained from CEPII-BACI HS07 trade data, World Development Indicators, and Worldwide Governance Indicators. This repository contains the cleaned and processed analytical files, documentation, variable dictionary, source register, and replication scripts required to reproduce the tables, diagnostics, and empirical results reported in the manuscript.
- XJTU-SES
  The XJTU-SES dataset simulates various operating conditions of a photovoltaic IES and encompasses multiple sub-datasets. Our code is also available on GitHub. The open-source dataset contains three sub-datasets with sampling frequencies of 1 Hz, 0.5 Hz, and 0.1 Hz, respectively, and the data length of each sub-dataset is greater than 20,000. The details of the 33 sensors are shown in Table V. This is a forecasting dataset similar to ETTh1, with a specific focus on individual photovoltaic energy systems. It can be used for research on predictive maintenance and on energy control and allocation. Grid-level IES datasets are currently relatively abundant, especially in the smart grid field. However, very few public datasets exist for single photovoltaic devices, especially for spacecraft, mainly because of the difficulty of data acquisition and confidentiality requirements. Nevertheless, given the significant economic value of these devices and the potential losses incurred when they fail, predictive maintenance is essential. We have therefore released this dataset to contribute to research on unit-level photovoltaic energy systems. Our experiment platform is shown in Fig. 4. It includes an SA (Solar Array Simulator) that simulates the solar array to provide power, a BCR (Battery Charge Regulator) that regulates the battery charge, a BAT (Battery Set) that stores the power, a PDM (Power Distribute Module) that distributes the power, a PCU (Power Control Unit) that controls the power, and a LOAD (Load Simulator) that simulates the load consuming the power. Detailed experimental settings and sensor information can be found in the Appendix.
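
The OMANISHA entry above reports inter-annotator agreement as Cohen's Kappa. As a rough illustration of what that statistic measures, the sketch below computes it for two annotators over the four category labels. The function name `cohens_kappa` and both label lists are invented for illustration; they are not drawn from the dataset and this is not the authors' evaluation code.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label rates.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1.0 - expected)


# Hypothetical annotations, for illustration only.
annotator_a = ["non", "stereotype", "derogation", "harassment",
               "non", "stereotype", "derogation", "non"]
annotator_b = ["non", "stereotype", "derogation", "harassment",
               "non", "derogation", "derogation", "stereotype"]

kappa = cohens_kappa(annotator_a, annotator_b)
print(round(kappa, 3))
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is usually lower than the plain percentage of matching labels.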

The Generalist Repository Ecosystem Initiative
Elsevier's Mendeley Data repository is a participating member of the National Institutes of Health (NIH) Office of Data Science Strategy (ODSS) GREI project. The GREI includes seven established generalist repositories funded by the NIH to work together to establish consistent metadata, develop use cases for data sharing, train and educate researchers on FAIR data and the importance of data sharing, and more.
Why use Mendeley Data?
Make your research data citable
Unique DOIs and easy-to-use citation tools make it easy to refer to your research data.
Share data privately or publicly
Securely share your data with colleagues and co-authors before publication.
Ensure long-term data storage
Your data is archived for as long as you need it by Data Archiving & Networked Services.
Keep access to all versions
Mendeley Data supports versioning, making longitudinal studies easier.
The Mendeley Data communal data repository is powered by Digital Commons Data.
Digital Commons Data provides everything that your institution will need to launch and maintain a successful Research Data Management program at scale.