Mendeley Data Showcase
135348 results
- ClassTrim
  The experiment dataset is organized into two main folders: ClassTrim and baseline. The ClassTrim folder contains all raw experimental data produced by our proposed model. It includes detailed execution records, refactoring recommendations, and intermediate results generated during the experiments. The baseline folder contains the complete records of how the baseline tools were executed, along with the refactoring results produced by each baseline approach. Each folder is accompanied by a dedicated README.md file that explains the directory structure, file formats, and the semantics of each data file. At the top level of the dataset, experiment-result.xlsx and baseline.xlsx provide summarized results derived from the raw data. These two files aggregate all experimental outcomes and serve as the direct data source for the tables reported in the paper.
- UzNER-5Style Corpus
  As part of the study, an annotated corpus of the Uzbek language was created for training and evaluating named entity recognition (NER) models. The corpus consists of 5,000 sentences (65,608 words) compiled from the following sources:
  - Legislative and official documents: Part of the data was extracted from the publicly available lex.uz database, which contains official and normative legal texts.
  - Mass media sources: To enrich the corpus, materials were collected from the online news platform kun.uz and from transcripts of videos on youtube.com covering various topics.
  - Literary sources: Excerpts from the novel “Kecha va kunduz” were selected to represent the literary style.
  - Scientific sources: Texts in the scientific style were obtained from conference proceedings and academic collections.
  - Synthetic data: In addition, synthetic sentences in the scientific style were generated to further diversify the corpus.

  The collected data were organized according to five major functional styles: colloquial, official, scientific, literary, and publicistic. This approach ensured thematic and stylistic diversity of the corpus and enhanced the effectiveness of model training. Data annotation was performed manually using the BIOES tagging scheme, which enables precise identification of the boundaries and types of named entities. All annotated data were reviewed and validated by Uzbek language experts to ensure accuracy and consistency.
- Atypical Borrowing by Resource-based Firms: Implications for Resource Curse
  Replication package for the paper.
- Raw data for Manuscript
  Raw testing data.
- UzThemeLex Dataset: An Uzbek Thematic Lexicon for Domain Terminology and Weakly Supervised NER
  UzThemeLex is a curated Uzbek-language thematic lexicon dataset designed for domain terminology mining and weakly supervised named entity recognition (NER). The release contains 4,945 unique terminological entries organized into 3 top-level domains (Agronomy, Economics and Business, Law and Governance) and 30 subcategories. Each entry provides the Uzbek term in Latin script, a normalized form for matching, a paraphrased Uzbek definition, domain and subcategory labels, provenance pointers to authoritative sources, and lightweight quality-control signals (heuristic confidence score, review flag, ambiguity flag). Optional fields include aliases and example sentences. The dataset is distributed in multiple formats to support both manual inspection and machine processing. It includes a flat CSV file and a multi-sheet Excel workbook, together with a data dictionary that documents all columns and label sets. For training and pipeline integration, the release also provides JSON/JSONL exports, taxonomy metadata, and ready-to-use pattern files for dictionary-based tagging and weak supervision (e.g., spaCy EntityRuler patterns). A validation script is included to help users verify schema consistency and detect formatting issues (e.g., residual Cyrillic characters and inconsistent apostrophe normalization). UzThemeLex can be used as (i) a domain dictionary for keyword-based classification and information extraction in Uzbek texts and (ii) a gazetteer for generating weak labels to train or fine-tune NER models. The resource is intended to support Uzbek NLP research and applied text analytics in agriculture, economics, and legal/governance domains.
- Long-term population dynamics of a human-associated colonial raptor: case of Lesser Kestrel Falco naumanni
  Breeding biology data used in the article.
- Research on the lag effect of rainfall erosivity in southern China: relevant Data
  This dataset contains the core data for research into the lag effect of rainfall erosivity in southern China. It primarily comprises rainfall erosivity data calculated from daily rainfall amounts, alongside various climatic index data.
- Large Language Models and the Labour Market: Spatial Evidence from Job Ads
  This is the database for the following article: Baranyai, E., Granát, M., Szepesi, M. (2026) Large Language Models and the Labour Market: Spatial Evidence from Job Ads. Despite the rapid rise of large language models (LLMs) and their implications for productivity and employment, little is known about how exposure to LLMs varies within and across countries. Understanding these patterns matters because technology spillovers are often geographically localised, and regional disparities can affect domestic and international labour market flows, long-term growth, social cohesion, and political stability. Using geolocated, task-level data from all job advertisements on Hungary's largest online job portal, we apply an LLM-based mapping approach to estimate job exposure. Extrapolating to the national labour market, we find Hungary's average LLM exposure to be 8%, substantially lower than estimates for the United States. This gap is partly explained by Hungary's higher share of physically intensive occupations and lower prevalence of office-based roles. Job-level exposure rarely exceeds 30%, suggesting that LLMs primarily complement rather than replace tasks. Spatial variation in exposure is driven mainly by industry composition, with higher exposure in urban areas and associations with current demographic characteristics and historical economic development. These findings highlight regions and sectors where productivity gains from LLM adoption may arise and where targeted education and employment policies could support workforce adjustment.
- Lichen diversity, indicator species, and community turnover along a fog oasis–Andean gradient in Peru: a baseline for climate change biomonitoring
  Lichens are sensitive bioindicators of environmental change, yet their diversity patterns remain poorly documented in the hyperarid ecosystems of southern Peru. This study characterizes, for the first time, lichen diversity, community composition, and indicator species along a coastal-Andean altitudinal gradient (683–3,756 m a.s.l.) encompassing fog oases (lomas) and high-Andean shrublands in Moquegua, a region facing increasing pressure from large-scale copper mining and projected climate change impacts. We established 175 sampling units across 35 stations in seven sectors, identifying 53 lichen species belonging to 33 genera, 13 families, 10 orders, and 3 classes (Lecanoromycetes, Candelariomycetes, and Arthoniomycetes). Alpha diversity indices (Shannon, Simpson, Pielou) were compared between ecosystems using Wilcoxon and t-tests, community composition was analyzed through NMDS and PERMANOVA, environmental predictors were evaluated using GLM and LM, and indicator species were identified with the IndVal.g index. Simpson diversity was significantly higher in Andean shrublands than in coastal fog oases (0.83 ± 0.06 vs. 0.75 ± 0.09; p = 0.029), while Pielou's evenness showed highly significant differences (0.90 ± 0.06 vs. 0.77 ± 0.08; p < 0.001). Remarkably, community composition showed complete species turnover between ecosystems (β-diversity = 1), with 28 species exclusive to coastal fog oases and 25 species exclusive to Andean shrublands, a pattern rarely documented in lichenological studies. PERMANOVA confirmed highly significant compositional differences at both ecosystem (R² = 0.280, p = 0.001) and sector (R² = 0.765, p = 0.001) levels. Altitude and climatic zone emerged as the primary environmental predictors, explaining 60.9% of Shannon diversity variance. Thirty-four indicator species were identified with significant ecological fidelity (p ≤ 0.05), including Umbilicaria polyphylla (IndVal = 0.980) for Andean shrublands and Flavopunctelia flaventior (IndVal = 0.707) for coastal fog oases.
- Intercomparison of MgO, Au, Pt pressure standards assessed by NaCl B2 primary pressure scale
  Accurate pressure determination is essential for investigating the behavior of materials under extreme conditions. Despite decades of effort, establishing reliable and mutually consistent absolute pressure scales at multi-Mbar conditions remains a fundamental challenge: separate calibration studies on the same pressure standards, as well as cross-calibrations of multiple pressure standards, show substantial discrepancies, often reaching ~5% at 1 Mbar and ~10% at 3 Mbar. In this study, the elasticity-based primary NaCl B2 pressure scale originally established by Murakami and Takata (2019) is further refined using a modified version of the 3rd-order Birch-Murnaghan equation of state and fully self-consistent thermodynamic parameters. The thermal pressure effect is calculated for the NaCl B2 phase using experimentally measured thermodynamic parameters at 300 K and the Mie‒Grüneisen‒Debye (MGD) model under the quasi-harmonic assumption, allowing the thermal EOS of NaCl B2 to be constructed as a reliable primary pressure scale applicable under simultaneous high-pressure and high-temperature conditions. Then, independent high-pressure molar volume data (V_MgO, V_Au, V_Pt, V_NaCl-B2) from published co-compression experiments are reanalyzed to establish secondary MgO, Au, and Pt EOSs using our primary NaCl B2 scale as reference, both to facilitate intercomparison among these pressure standards/scales and to examine potential causes of their discrepancies. Finally, our results show that experimental pressures can be determined with a precision better than 2% up to 2 Mbar using a series of independently established yet mutually consistent primary NaCl B2, MgO, Au, and Pt pressure scales.
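For readers unfamiliar with the equation of state named in the NaCl B2 pressure-scale entry, the following is a minimal sketch of the standard (unmodified) 3rd-order Birch-Murnaghan isothermal EOS. It is not the modified form used by the dataset authors, and it omits the Mie‒Grüneisen‒Debye thermal pressure term. The function name and the numeric values in the usage note are hypothetical placeholders, not fitted parameters for any material.

```python
def birch_murnaghan_3rd(v: float, v0: float, k0: float, k0_prime: float) -> float:
    """Standard 3rd-order Birch-Murnaghan pressure at volume v.

    v0       : zero-pressure volume (same units as v)
    k0       : zero-pressure bulk modulus (result has the same units as k0)
    k0_prime : pressure derivative of the bulk modulus at zero pressure
    """
    if v <= 0 or v0 <= 0:
        raise ValueError("volumes must be positive")
    x = (v0 / v) ** (1.0 / 3.0)  # linear compression ratio
    # P = (3/2) K0 [x^7 - x^5] {1 + (3/4)(K0' - 4)(x^2 - 1)}
    return 1.5 * k0 * (x**7 - x**5) * (1.0 + 0.75 * (k0_prime - 4.0) * (x**2 - 1.0))
```

Calling `birch_murnaghan_3rd(0.8, 1.0, 100.0, 4.5)` with these placeholder inputs gives the pressure at 20% volume compression in the units of `k0`. Setting `k0_prime = 4` makes the correction factor equal to one, which recovers the 2nd-order form.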
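The alpha diversity indices named in the lichen diversity entry (Shannon, Simpson, Pielou) can be sketched from a vector of per-species abundances as shown below. This is an illustrative sketch only. It assumes the Gini-Simpson form (1 minus the sum of squared proportions) and natural logarithms, neither of which the entry specifies, and the function names are hypothetical.

```python
import math
from typing import Sequence


def _proportions(counts: Sequence[float]) -> list:
    """Relative abundances of the species that are actually present."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("at least one species must have a positive count")
    return [c / total for c in counts if c > 0]


def shannon(counts: Sequence[float]) -> float:
    """Shannon index H' = -sum(p_i * ln p_i)."""
    return -sum(p * math.log(p) for p in _proportions(counts))


def simpson(counts: Sequence[float]) -> float:
    """Gini-Simpson index 1 - sum(p_i^2) (assumed form)."""
    return 1.0 - sum(p * p for p in _proportions(counts))


def pielou(counts: Sequence[float]) -> float:
    """Pielou's evenness J = H' / ln(S), with S the number of species present."""
    richness = len(_proportions(counts))
    if richness < 2:
        return 0.0  # evenness is undefined for a single species; 0 by convention here
    return shannon(counts) / math.log(richness)
```

For a perfectly even community of four species, for example `[10, 10, 10, 10]`, `simpson` returns 0.75 and `pielou` returns a value of 1 up to floating-point rounding.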
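The BIOES tagging scheme mentioned in the UzNER-5Style Corpus entry marks each token as Beginning, Inside, or End of a multi-token entity, as a Single-token entity, or as Outside any entity. The sketch below converts entity spans into BIOES tags. The function name, the span format, and the `PER`/`LOC` labels are assumptions made for illustration; the corpus's actual label set and file format are documented with the dataset, not here.

```python
from typing import List, Sequence, Tuple

# A span is (start_index, end_index_exclusive, label); this format is hypothetical.
Span = Tuple[int, int, str]


def spans_to_bioes(tokens: Sequence[str], spans: Sequence[Span]) -> List[str]:
    """Return one BIOES tag per token for non-overlapping entity spans."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        if not 0 <= start < end <= len(tokens):
            raise ValueError(f"span {(start, end, label)} is out of range")
        if end - start == 1:
            tags[start] = f"S-{label}"      # single-token entity
        else:
            tags[start] = f"B-{label}"      # first token
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"      # interior tokens
            tags[end - 1] = f"E-{label}"    # last token
    return tags


# Illustrative sentence, not taken from the corpus.
example_tokens = ["Abdulla", "Qodiriy", "Toshkentda", "tug'ilgan"]
example_tags = spans_to_bioes(example_tokens, [(0, 2, "PER"), (2, 3, "LOC")])
# example_tags == ["B-PER", "E-PER", "S-LOC", "O"]
```

Compared with plain BIO tagging, the explicit `E-` and `S-` tags tell a model where an entity ends, which is the boundary precision the entry refers to.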