Datasets Comparison
Version 1
60,000 SCAPS-1D samples spanning 15 electron transport layer (ETL) / hole transport layer (HTL) combinations (TiO₂, SnO₂, ZnO × CuI, Cu₂O, Spiro-OMeTAD, P3HT, PTAA) under 900 lux cool-white and warm-white LEDs
Description
This dataset supports the study "Machine-learning-driven Pareto screening of transport layers for indoor CsPbI₂Br
perovskite solar cells with zero-shot cross-light-source calibration." All device simulations were performed with
SCAPS-1D v3.3.11 under two indoor LED spectra at 900 lux, cool-white (0.309 mW/cm²) and warm-white (0.278 mW/cm²),
and cover 15 electron-transport-layer/hole-transport-layer (ETL/HTL) combinations formed from three ETLs (TiO₂, SnO₂,
ZnO) and five HTLs (CuI, Cu₂O, PTAA, P3HT, Spiro-OMeTAD). The machine-learning models predict four photovoltaic
targets: power conversion efficiency (PCE), open-circuit voltage (Voc), short-circuit current density (Jsc), and fill
factor (FF).
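The four targets are not independent: PCE follows from the other three through PCE = Voc · Jsc · FF / P_in, where P_in is the incident power density of the light source. The sketch below uses this relation as a consistency check, assuming Voc in V, Jsc in mA/cm², and FF as a fraction (the CSV column units are not stated here); the helper name `pce_percent` is hypothetical.

```python
# Incident power densities at 900 lux, taken from the dataset description (mW/cm^2).
P_IN_MW_CM2 = {"cool-white": 0.309, "warm-white": 0.278}


def pce_percent(voc_v: float, jsc_ma_cm2: float, ff: float, spectrum: str) -> float:
    """Return PCE in percent from Voc, Jsc and FF under the given LED spectrum.

    Assumed units (hypothetical, check against the CSV headers):
    Voc in V, Jsc in mA/cm^2, FF as a fraction in [0, 1].
    """
    p_max = voc_v * jsc_ma_cm2 * ff  # V * mA/cm^2 = mW/cm^2
    return 100.0 * p_max / P_IN_MW_CM2[spectrum]


# Illustrative inputs only, not values from the dataset.
for spectrum in P_IN_MW_CM2:
    print(spectrum, round(pce_percent(1.0, 0.1, 0.8, spectrum), 2))
```

With identical device outputs, the warm-white PCE comes out higher because its P_in is lower, which is worth keeping in mind when comparing PCE values across the two CSV files.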
The repository is organized into five folders (data, def, code, models, and results). The data folder contains two CSV files (train_warm_0278.csv and
train_cold_0309.csv, approximately 30,000 samples each) holding the simulated device parameters and corresponding
photovoltaic outputs for each light source. The def folder provides the 15 SCAPS-1D device definition files (.def)
used to generate the simulations, one per ETL/HTL combination. The code folder contains four Python scripts that
reproduce all results in the paper: train_xgboost.py (Step 1) trains single-task XGBoost regressors with SHAP
analysis; train_fttransformer.py (Step 2) trains the multi-task Feature Tokenizer + Transformer (FT-Transformer) with
attention-weight analysis; ch4_2_spectral_analysis.py (Step 3) evaluates cross-light-source generalization across
three training tiers (domain-aware, domain-blind, and source-only) with six calibration strategies; and
ch4_3_material_analysis.py (Step 4) performs Monte Carlo robustness analysis, defect-tolerance ranking, and
Pareto-frontier optimization over all 15 material combinations. The scripts must be run in order; Steps 3 and 4 can
alternatively load the pre-trained weights provided in the models folder to skip retraining. The results folder
contains all figures and CSV tables generated by the four scripts, organized by chapter section.
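A minimal loading sketch, assuming both CSV files share the same column layout (the column names are not listed here). It tags each row with its light source and shows one possible reading of the three training tiers used in Step 3: domain-aware pools both sources and exposes the light source as a feature, domain-blind pools them without that feature, and source-only keeps a single source. This interpretation, the `is_cold` indicator, and the function names are assumptions for illustration, not the behavior of ch4_2_spectral_analysis.py.

```python
from pathlib import Path

import pandas as pd

# File names come from the dataset description; the "warm"/"cold" labels are
# an illustrative choice, not a column stored in the CSV files.
SOURCE_FILES = {"warm": "train_warm_0278.csv", "cold": "train_cold_0309.csv"}


def load_both_sources(data_dir):
    """Read both CSV files and tag every row with its light source."""
    frames = []
    for source, fname in SOURCE_FILES.items():
        df = pd.read_csv(Path(data_dir) / fname)
        df["light_source"] = source
        frames.append(df)
    return pd.concat(frames, ignore_index=True)


def training_frame(df, tier, source="warm"):
    """Hypothetical interpretation of the three training tiers."""
    if tier == "domain-aware":
        out = df.copy()
        # Expose the light source to the model as a binary feature.
        out["is_cold"] = (out["light_source"] == "cold").astype(int)
    elif tier == "domain-blind":
        # Pool both sources but hide which one each row came from.
        out = df.copy()
    elif tier == "source-only":
        # Keep rows from a single light source only.
        out = df[df["light_source"] == source].copy()
    else:
        raise ValueError(f"unknown tier: {tier}")
    return out.drop(columns="light_source")


if __name__ == "__main__":
    data = load_both_sources("data")
    for tier in ("domain-aware", "domain-blind", "source-only"):
        print(tier, training_frame(data, tier).shape)
```

Dropping the `light_source` tag after building each frame keeps the bookkeeping column from leaking into the feature set of the domain-blind tier.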
Categories
Solar Cell, Machine Learning
Licence
Creative Commons Attribution 4.0 International
Version 2
60,000 SCAPS-1D samples spanning 15 electron transport layer (ETL) / hole transport layer (HTL) combinations (TiO₂, SnO₂, ZnO × CuI, Cu₂O, Spiro-OMeTAD, P3HT, PTAA) under 900 lux cool-white and warm-white LEDs
Description
This dataset supports the study "Machine-learning-driven Pareto screening of transport layers for indoor CsPbI₂Br
perovskite solar cells with zero-shot cross-light-source calibration." All device simulations were performed with
SCAPS-1D v3.3.11 under two indoor LED spectra at 900 lux, cool-white (0.309 mW/cm²) and warm-white (0.278 mW/cm²),
and cover 15 electron-transport-layer/hole-transport-layer (ETL/HTL) combinations formed from three ETLs (TiO₂, SnO₂,
ZnO) and five HTLs (CuI, Cu₂O, PTAA, P3HT, Spiro-OMeTAD). The machine-learning models predict four photovoltaic
targets: power conversion efficiency (PCE), open-circuit voltage (Voc), short-circuit current density (Jsc), and fill
factor (FF).
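Both spectra deliver 900 lux yet different power densities because illuminance E_v and irradiance E_e are linked through the luminous efficacy of radiation K of each spectrum, E_v = K E_e. Dividing the stated values (with 0.309 mW/cm² = 3.09 W/m² and 0.278 mW/cm² = 2.78 W/m²) gives the effective efficacies implied by the description; these numbers are derived here and are not reported with the dataset.

```latex
\begin{aligned}
K &= \frac{E_v}{E_e}, \\
K_{\text{cool}} &= \frac{900\ \text{lm}\,\text{m}^{-2}}{3.09\ \text{W}\,\text{m}^{-2}} \approx 291\ \text{lm}\,\text{W}^{-1}, \\
K_{\text{warm}} &= \frac{900\ \text{lm}\,\text{m}^{-2}}{2.78\ \text{W}\,\text{m}^{-2}} \approx 324\ \text{lm}\,\text{W}^{-1}.
\end{aligned}
```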
The repository is organized into five folders (data, def, code, models, and results). The data folder contains two CSV files (train_warm_0278.csv and
train_cold_0309.csv, approximately 30,000 samples each) holding the simulated device parameters and corresponding
photovoltaic outputs for each light source. The def folder provides the 15 SCAPS-1D device definition files (.def)
used to generate the simulations, one per ETL/HTL combination. The code folder contains four Python scripts that
reproduce all results in the paper: train_xgboost.py (Step 1) trains single-task XGBoost regressors with SHAP
analysis; train_fttransformer.py (Step 2) trains the multi-task Feature Tokenizer + Transformer (FT-Transformer) with
attention-weight analysis; ch4_2_spectral_analysis.py (Step 3) evaluates cross-light-source generalization across
three training tiers (domain-aware, domain-blind, and source-only) with six calibration strategies; and
ch4_3_material_analysis.py (Step 4) performs Monte Carlo robustness analysis, defect-tolerance ranking, and
Pareto-frontier optimization over all 15 material combinations. The scripts must be run in order; Steps 3 and 4 can
alternatively load the pre-trained weights provided in the models folder to skip retraining. The results folder
contains all figures and CSV tables generated by the four scripts, organized by chapter section.
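Step 4 combines Monte Carlo robustness analysis with Pareto-frontier optimization. The perturbation model, parameter ranges, and objectives used in ch4_3_material_analysis.py are not described here, so the following is a generic sketch under stated assumptions: each device parameter is perturbed by independent multiplicative Gaussian noise, a trained model is exposed as a callable `predict`, and each candidate is scored by the mean and standard deviation of its predicted PCE. The names `monte_carlo_summary`, `pareto_front`, and `toy_predict` are hypothetical.

```python
import numpy as np


def monte_carlo_summary(predict, nominal, rel_sigma, n_samples=2000, seed=0):
    """Mean and standard deviation of predicted PCE under random parameter perturbations.

    Assumption: each parameter is multiplied by independent Gaussian noise
    N(1, rel_sigma). `predict` maps an (n_samples, n_params) array to
    n_samples PCE values (e.g. a wrapped regressor).
    """
    rng = np.random.default_rng(seed)
    nominal = np.asarray(nominal, dtype=float)
    factors = rng.normal(1.0, rel_sigma, size=(n_samples, nominal.size))
    pce = np.asarray(predict(nominal * factors), dtype=float)
    return float(pce.mean()), float(pce.std())


def pareto_front(mean_pce, std_pce):
    """Indices of candidates not dominated when maximizing mean PCE and minimizing its spread."""
    m = np.asarray(mean_pce, dtype=float)
    s = np.asarray(std_pce, dtype=float)
    front = []
    for i in range(m.size):
        # Another candidate dominates i if it is no worse on both objectives
        # and strictly better on at least one.
        no_worse = (m >= m[i]) & (s <= s[i])
        strictly_better = (m > m[i]) | (s < s[i])
        if not np.any(no_worse & strictly_better):
            front.append(i)
    return front


if __name__ == "__main__":
    # Toy surrogate standing in for a trained model (purely illustrative):
    # PCE rises with the first parameter and falls with the second.
    def toy_predict(x):
        return 20.0 + 5.0 * x[:, 0] - 3.0 * x[:, 1]

    candidates = {"A": [1.0, 0.5], "B": [1.2, 1.0], "C": [0.8, 0.9]}
    stats = {name: monte_carlo_summary(toy_predict, p, rel_sigma=0.1) for name, p in candidates.items()}
    names = list(stats)
    front = pareto_front([stats[n][0] for n in names], [stats[n][1] for n in names])
    print("Pareto-optimal:", [names[i] for i in front])
```

Treating the spread as a second objective keeps combinations that trade a little mean PCE for lower sensitivity to parameter variation, which a single ranked list would discard. The quadratic dominance check is adequate for 15 candidates.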
Categories
Solar Cell, Machine Learning
Licence
Creative Commons Attribution 4.0 International