
Journal of Biomedical Informatics

ISSN: 1532-0464


Datasets associated with articles published in Journal of Biomedical Informatics

8 results
  • Data for: Identifying the essential nodes in network pharmacology based on multilayer network combined with random walk algorithm
    Data related to the herbal formula XiaoChaiHu Decoction, including chemical compounds, proteins, metabolic pathways, and diseases.
    • Dataset
  • Data for: Research on Chinese Medical Named Entity Recognition Based on Collaborative Cooperation of Multiple Neural Network Models
    Code and data for this research paper.
    • Dataset
  • Data for: Concept Embedding to Measure Semantic Relatedness for Biomedical Information Ontologies
    First, we extended the definition information of the UMLS Concept Unique Identifier (CUI) terms using the Wikipedia database to improve the coverage of the similarity model. Second, we adopted document embedding for the vector representations of the CUI terms. We used UMLS 2015AB as the data source.
    • Dataset
  • DrugSemantics Gold Standard
    The DrugSemantics gold standard consists of 5 Summaries of Product Characteristics (SPCs) written in Spanish. The SPCs were retrieved from the Medicines Online Information Center (CIMA), which belongs to the Spanish Agency for Medicines and Health Products (AEMPS). The corpus is annotated with 10 named entity (NE) types related to pharmacotherapeutic care: Chemical Composition, Disease, Drug, Excipient, Food, Medicament, Pharmaceutical Form, Route, Therapeutic Action, and Unit of Measurement. It contains 2,241 NEs, 780 sentences, and 226,729 tokens. The zip file is organized as follows: each SPC is in a separate folder containing one XML file with the annotated document in GATE standoff format. DrugSemantics was designed for developing and testing Spanish NE recognition tools in the pharmacotherapeutic domain.
    • Dataset
  • NLP4RARE-NER
    This dataset contains the code for the implementation and experiments of the approaches proposed in the article: Isabel Segura Bedmar, David Camino Perdones, and Sara Guerrero Aspizua (2022). Exploring deep learning methods for recognizing rare diseases and their clinical manifestations from texts. BMC Bioinformatics 23, 263. ISSN: 1471-2105. DOI: https://doi.org/10.1186/s12859-022-04810-y
    These approaches, based on natural language processing (NLP) and deep learning techniques, target the automatic recognition of rare diseases and their clinical manifestations in biomedical texts.
    The code is also available in a public repository: https://github.com/isegura/NLP4RARE-NER
    • Dataset
  • Utility and Privacy in Generating Synthetic Social Science Data
    Modern computational social science aims to gain insights into the human experience by using diverse data sources and methods. However, because both qualitative and quantitative data about individuals are sensitive, privacy concerns and data protection requirements limit how fully social science data can be used. One promising way to address this challenge is to generate synthetic social science data that are structurally and statistically similar to the real data. Such synthetic data can be useful in the exploratory research phase to determine whether the real data can answer specific research questions. In this work, we collaborated with NRO, CBS, and SURF (OSSC) to generate realistic, privacy-preserving synthetic data from CBS cognitive student data. We extended the basic Generative Adversarial Network (GAN) model with four components: transformation, sampling, conditioning, and network training with differential privacy. The generator was designed to capture the relations between variables in the real data and reproduce the same relations in the synthetic data. To encourage the generator to create diverse and representative synthetic data, we apply the Wasserstein distance with gradient penalty and group the training samples in the discriminator. Finally, we provide a privacy guarantee through a differential privacy approach that injects Gaussian noise into the penalty gradients during training. Under a given differential privacy threshold, the synthetic data will not leak sensitive information from the source data. We evaluated the quality of the synthetic data by comparing analysis results on real and synthetic data, and assessed the privacy risk using information disclosure measures and attacker models.
    We found that stronger privacy protection reduces the quality of the synthetic data in terms of similarity to the original data, making it less "useful" as a direct proxy for those data. This work therefore also discusses the trade-off between data utility and privacy when generating and using synthetic data in practical social science applications.
    • Slides
  • HemOnc CC BY subset
    This dataset is a subset of the full HemOnc ontology centered on the Component concept and its relationships and hierarchies. It also includes all Conditions from the HemOnc ontology. Components are mapped to RxNorm codes and Conditions are mapped to NCIT, ICD-O-3, and SEER Site recodes. HemOnc follows the OMOP Common Data Model format and specifications.
    • Dataset
  • HemOnc knowledgebase
    HemOnc is derived from content on the HemOnc.org website, as well as additional content accessed through the RxNorm API, the PubMed API, and locally maintained curation. Older versions are available through the HemOnc Dataverse; this is the most up-to-date version of the OMOP tables, the ancillary tables with "production" status, the data dictionary and overview, and the "what's new" file.
    • Dataset
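The Concept Embedding entry compares CUI terms through their vector representations. The listing does not state which relatedness measure was applied to those vectors; the following is a minimal sketch assuming cosine similarity (a common choice for embeddings, not confirmed by the source). The keys `CUI_A`, `CUI_B`, `CUI_C` and their vectors are hypothetical placeholders, not real UMLS identifiers or trained embeddings.

```python
import math

def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (range -1 to 1)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings; real vectors would come from a document-embedding
# model trained on the (Wikipedia-extended) definitions of each CUI term.
embeddings = {
    "CUI_A": [0.9, 0.1, 0.3],
    "CUI_B": [0.8, 0.2, 0.4],
    "CUI_C": [-0.5, 0.9, 0.0],
}

def relatedness(cui_a: str, cui_b: str) -> float:
    """Relatedness of two terms, assumed here to be the cosine of their embeddings."""
    return cosine_similarity(embeddings[cui_a], embeddings[cui_b])
```

With these toy vectors, `CUI_A` scores as more related to `CUI_B` than to `CUI_C`; using real embeddings only changes the contents of the `embeddings` dictionary.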
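The DrugSemantics entry stores each annotated SPC as one XML file in GATE standoff format. A minimal sketch of reading such a file with Python's standard library, assuming GATE's XML serialization: `<Node>` elements inside `<TextWithNodes>` carry an `id` equal to a character offset, and each `<Annotation>` points at those offsets through its `StartNode` and `EndNode` attributes. The function name `extract_entities`, the sample document, and its entity type labels are illustrative and not taken from the corpus.

```python
import xml.etree.ElementTree as ET

def extract_entities(root: ET.Element) -> list[tuple[str, str, int, int]]:
    """Return (type, surface text, start, end) for every annotation in a GATE XML document."""
    twn = root.find("TextWithNodes")
    if twn is None:
        raise ValueError("no <TextWithNodes> element found")
    # The plain text is the text before the first <Node> plus the tail after each <Node>.
    text = (twn.text or "") + "".join(node.tail or "" for node in twn.findall("Node"))
    entities = []
    for ann_set in root.findall("AnnotationSet"):
        for ann in ann_set.findall("Annotation"):
            start = int(ann.get("StartNode"))
            end = int(ann.get("EndNode"))
            entities.append((ann.get("Type"), text[start:end], start, end))
    return entities

# Illustrative document: node ids are character offsets into the rebuilt text.
SAMPLE = (
    '<GateDocument version="3">'
    '<TextWithNodes><Node id="0"/>Paracetamol<Node id="11"/> 500 mg '
    '<Node id="19"/>comprimidos<Node id="30"/></TextWithNodes>'
    '<AnnotationSet>'
    '<Annotation Id="1" Type="Drug" StartNode="0" EndNode="11"></Annotation>'
    '<Annotation Id="2" Type="PharmaceuticalForm" StartNode="19" EndNode="30"></Annotation>'
    '</AnnotationSet>'
    '</GateDocument>'
)

entities = extract_entities(ET.fromstring(SAMPLE))
```

For a file on disk, `ET.parse(path).getroot()` yields the same kind of root element. This sketch ignores annotation features (`<Feature>` children) and annotation-set names.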
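The synthetic social science data entry provides privacy protection by injecting Gaussian noise into gradients during training. A minimal sketch of that general idea, assuming a DP-SGD-style Gaussian mechanism: each per-example gradient is clipped to an L2 norm of at most `max_norm`, the clipped gradients are summed, and Gaussian noise with standard deviation `noise_multiplier * max_norm` is added before averaging. The function names and parameters are hypothetical. The sketch neither reproduces the authors' GAN nor computes the resulting privacy budget.

```python
import math
import random

def clip_by_l2_norm(grad: list[float], max_norm: float) -> list[float]:
    """Scale a gradient down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def privatize_gradients(
    per_example_grads: list[list[float]],
    max_norm: float,
    noise_multiplier: float,
    rng: random.Random,
) -> list[float]:
    """Clip each example's gradient, sum, add Gaussian noise, then average."""
    if not per_example_grads:
        raise ValueError("need at least one per-example gradient")
    total = [0.0] * len(per_example_grads[0])
    for grad in per_example_grads:
        # Clipping bounds any single example's influence on the update.
        clipped = clip_by_l2_norm(grad, max_norm)
        total = [t + c for t, c in zip(total, clipped)]
    # Noise is calibrated to the clipping bound, not to the raw gradients.
    sigma = noise_multiplier * max_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [n / len(per_example_grads) for n in noisy]
```

A larger `noise_multiplier` gives stronger protection but noisier updates, which mirrors the utility-privacy trade-off reported in the entry.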