Synthetic dataset of PLGA and liposome nanocarrier formulations for brain-cancer-relevant drug delivery, release, blood-brain-barrier transport, and paired cell-viability proxies

Published: 20 April 2026 | Version 1 | DOI: 10.17632/jvfb9mjzws.1
Contributors:
Syauqi Abrori

Description

This is a fully synthetic (simulated) dataset package designed to support materials-informatics and comparative formulation analysis of PLGA nanoparticles and liposomes for brain-cancer-relevant drug delivery. It does not contain patient data, animal data, clinical records, or published experimental rows.

The package contains 6,000 unique virtual formulations in a master table and three linked long-format tables: time-resolved release profiles (360,000 rows), blood-brain-barrier (BBB) transport proxies (54,000 rows), and paired tumor/non-tumor cell-assay proxies (432,000 rows), totaling approximately 846,000 assay-like rows. Variables include composition descriptors, preparation routes, physicochemical properties, targeting features, encapsulation efficiency, drug loading, stability, biodegradation proxy, serum stability proxy, integrated BBB transport score, cellular uptake score, biocompatibility score, tumor-directed cytotoxicity proxy, off-target toxicity proxy, and derived multi-criteria performance scores.

The data were generated through a transparent 15-step workflow combining domain-informed hierarchical priors, latent heterogeneity, batch effects, replicate variation, bounded noise, and post-generation validation. The package is distributed with its full Python generator, JSON configuration, codebook, validation report, and three reproducible figures.

The dataset is intended for methodological reuse, surrogate modeling, benchmarking, multi-criteria optimization, machine-learning workflow development, comparative formulation analytics, and teaching of reproducible nanomedicine data workflows. It is NOT intended to support clinical, regulatory, or efficacy claims.

This dataset accompanies a manuscript submitted to Data in Brief (Elsevier) and extends the scientific themes synthesized in the related review article by Makalew & Abrori, OpenNano 21 (2025) 100225.
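The row counts quoted above can be sanity-checked with a few lines of Python. The sketch below is illustrative only and is not part of the distributed package: the table labels (`release_profiles`, `bbb_transport`, `cell_assays`) are hypothetical names chosen for this example, and the rows-per-formulation figures assume a balanced design in which every formulation contributes the same number of rows to each linked table, which the description does not state explicitly. The `linkage_report` helper takes plain sequences of key values, so it can be fed from whatever identifier column metadata_codebook.csv documents.

```python
from collections import Counter

# Row counts quoted in the dataset description.
N_FORMULATIONS = 6_000
LONG_TABLE_ROWS = {
    # Keys are hypothetical labels, not the actual CSV file names.
    "release_profiles": 360_000,
    "bbb_transport": 54_000,
    "cell_assays": 432_000,
}


def rows_per_formulation(n_rows, n_formulations=N_FORMULATIONS):
    """Average rows per formulation (exact only if the design is balanced)."""
    return n_rows / n_formulations


def linkage_report(master_ids, long_ids):
    """Summarise how a long-format table links back to the master table.

    master_ids: key values from the master table (one per formulation).
    long_ids:   key values from a long-format table (one per row).
    """
    master = set(master_ids)
    counts = Counter(long_ids)
    return {
        # Keys used in the long table but absent from the master table.
        "orphans": sorted(set(counts) - master),
        # Master formulations with no rows in the long table.
        "unlinked": sorted(master - set(counts)),
        # Number of long-table rows observed per key.
        "rows_per_id": dict(counts),
    }


total_rows = sum(LONG_TABLE_ROWS.values())
per_formulation = {
    name: rows_per_formulation(n) for name, n in LONG_TABLE_ROWS.items()
}
```

Under the balanced-design assumption this gives 60 release rows, 9 BBB-transport rows, and 72 cell-assay rows per formulation, and the three linked tables sum to 846,000 rows. How those per-formulation counts split into time points, replicates, or cell lines is defined in the codebook and protocol, not here.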

Steps to reproduce

1. Download all files in this dataset.
2. Install Python 3.9+ with numpy and pandas.
3. Run python generate_synthetic_data.py to regenerate the four main CSV tables from the fixed seed (20260419) specified in generation_config.json.
4. Run python generate_manuscript_figures.py to regenerate the three figures from the CSV tables.
5. Consult data_generation_protocol.md for the 15-step generation workflow and validation_report.md for quality-control checks.
6. Consult metadata_codebook.csv for the variable dictionary.
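After regenerating the tables, one way to confirm that the run reproduced the published files is to compare checksums of the downloaded and regenerated CSVs. The following is a minimal sketch using only the Python standard library; the function names and the two-directory layout are assumptions made for this example and are not part of the package. Byte-for-byte equality is a strict test: differences in line endings or in floating-point formatting across numpy/pandas versions could cause a mismatch even when the seed is identical, so a failed comparison is a prompt to inspect the files rather than proof of an error.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_csv_dirs(reference_dir, regenerated_dir):
    """Compare every *.csv in reference_dir with its regenerated copy.

    Returns a dict mapping file name to True when a file of the same name
    exists in regenerated_dir and is byte-identical, otherwise False.
    Both directory arguments are hypothetical locations chosen by the user.
    """
    results = {}
    for ref in sorted(Path(reference_dir).glob("*.csv")):
        candidate = Path(regenerated_dir) / ref.name
        results[ref.name] = (
            candidate.is_file() and sha256_of(ref) == sha256_of(candidate)
        )
    return results
```

For example, `compare_csv_dirs("downloaded", "regenerated")` would return one entry per downloaded CSV, where the two directory names stand in for wherever the original and regenerated files were saved.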

Categories

Drug Delivery, Nanomedicine
