Journal of Biomedical Informatics

ISSN: 1532-0464

Datasets associated with articles published in Journal of Biomedical Informatics

5 results
  • Data for: Identifying the essential nodes in network pharmacology based on multilayer network combined with random walk algorithm
    Data related to herb XiaoChaiHu Decoction, including chemical compounds, proteins, metabolic pathways, diseases.
    • Dataset
  • Data for: Research on Chinese Medical Named Entity Recognition Based on Collaborative Cooperation of Multiple Neural Network Models
    Code and data for this research paper.
    • Dataset
  • Data for: Concept Embedding to Measure Semantic Relatedness for Biomedical Information Ontologies
    First, we extended the definition information of the CUI terms using the Wikipedia database to improve the coverage of the similarity model. Second, we adopted document embedding for vector representations of the CUI terms. We used UMLS2015AB for the data.
    • Dataset
  • DrugSemantics Gold Standard
    The DrugSemantics gold standard consists of 5 Summaries of Product Characteristics (SPCs) written in Spanish. The SPCs were retrieved from the Medicines Online Information Center (CIMA), which belongs to the Spanish Agency for Medicines and Health Products (AEMPS). The corpus is annotated with 10 Named Entity (NE) types related to pharmacotherapeutic care, namely: Chemical Composition, Disease, Drug, Excipient, Food, Medicament, Pharmaceutical Form, Route, Therapeutic Action and Unit of Measurement. It contains 2241 NEs, 780 sentences and 226,729 tokens. The zip file is organized as follows: each SPC is in a separate folder containing one XML file with the annotated document in GATE Standoff format. DrugSemantics was designed for developing and testing Spanish NE recognition tools in the pharmacotherapeutic domain.
    • Dataset
  • Utility and Privacy in Generating Synthetic Social Science Data
    Modern computational social science aims to gain insights into the human experience by utilizing various data sources and methods. However, the sensitivity of both qualitative and quantitative data concerning individuals, arising from privacy concerns and data protection issues, hinders the full use of social science data. One promising approach to tackle this challenge is to generate synthetic social science data that is structurally and statistically similar to the real data. The synthetic data can be used in the exploratory research phase to determine the usability of the real data for answering specific research questions. In this work, we collaborated with NRO, CBS, and SURF (OSSC) to generate realistic and privacy-preserving synthetic data using cognitive student data from CBS. We extended the basic Generative Adversarial Network model with four components: transformation, sampling, conditioning, and network training with differential privacy. The generator was designed to capture the relations between variables in the real data and simulate the same relations in the synthetic data. To motivate the generator to create diverse and representative synthetic data, we applied the Wasserstein distance with gradient penalty and then grouped the training samples in the discriminator. Finally, we provide a privacy guarantee through a differential privacy approach that injects Gaussian noise into the penalty gradients in the training process. Under a certain differential privacy threshold, the synthetic data will not leak sensitive information originating in the source data. We evaluated the quality of the synthetic data by comparing the analysis results on real and synthetic data and assessed the privacy risk using information disclosure measures and attacker models.
    We found that stronger protection of privacy reduces the quality of the synthetic data in terms of similarity to the original data, so the synthetic data become less "useful" as a direct proxy for those data. Therefore, this work also discusses the trade-off between data utility and privacy in generating and using synthetic data in practical applications in social science.
    • Slides
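
The last entry above mentions a privacy guarantee obtained by injecting Gaussian noise into gradients during training. As a rough illustration of that general idea only, the sketch below clips a gradient vector to a maximum L2 norm and then adds Gaussian noise scaled to that norm. This is a generic clip-and-noise step, not the authors' implementation: the function names, `max_norm`, and `noise_multiplier` are all hypothetical, and the listing does not state how the noise scale or privacy budget was actually chosen.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Scale grad down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm or norm == 0.0:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


def privatize_gradient(grad, max_norm, noise_multiplier, rng):
    """Clip grad, then add zero-mean Gaussian noise to each coordinate.

    The noise standard deviation is noise_multiplier * max_norm
    (an assumed convention for this sketch, not taken from the source).
    """
    clipped = clip_gradient(grad, max_norm)
    sigma = noise_multiplier * max_norm
    return [g + rng.gauss(0.0, sigma) for g in clipped]


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only so the demo is repeatable
    print(privatize_gradient([3.0, 4.0], max_norm=1.0, noise_multiplier=0.5, rng=rng))
```

A larger `noise_multiplier` gives stronger protection but a noisier gradient, which is the utility and privacy trade-off the entry describes.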