60,000 SCAPS-1D samples spanning 15 electron transport layer (ETL) and hole transport layer (HTL) combinations (TiO₂, SnO₂, ZnO × CuI, Cu₂O, Spiro-OMeTAD, P3HT, PTAA) under 900 lux cool-white and warm-white LEDs
Description
This dataset supports the study "Machine-learning-driven Pareto screening of transport layers for indoor CsPbI₂Br perovskite solar cells with zero-shot cross-light-source calibration." All device simulations were performed with SCAPS-1D v3.3.11 under two indoor LED spectra at 900 lux: cool-white (0.309 mW/cm²) and warm-white (0.278 mW/cm²). They cover 15 electron-transport-layer/hole-transport-layer (ETL/HTL) combinations formed by three ETLs (TiO₂, SnO₂, ZnO) and five HTLs (CuI, Cu₂O, PTAA, P3HT, Spiro-OMeTAD). The four photovoltaic targets predicted by the machine learning models are power conversion efficiency (PCE), open-circuit voltage (Voc), short-circuit current density (Jsc), and fill factor (FF).
The repository is organized into the following folders. The data folder contains two CSV files, train_warm_0278.csv and train_cold_0309.csv (approximately 30,000 samples each), holding the simulated device parameters and corresponding photovoltaic outputs for the warm-white and cool-white sources, respectively. The def folder provides the 15 SCAPS-1D device definition files (.def) used to generate the simulations, one per ETL/HTL combination.
The code folder contains four Python scripts that reproduce all results in the paper: train_xgboost.py (Step 1) trains single-task XGBoost regressors with SHAP analysis; train_fttransformer.py (Step 2) trains the multi-task Feature Tokenizer + Transformer (FT-Transformer) with attention-weight analysis; ch4_2_spectral_analysis.py (Step 3) evaluates cross-light-source generalization across three training tiers (domain-aware, domain-blind, and source-only) with six calibration strategies; and ch4_3_material_analysis.py (Step 4) performs Monte Carlo robustness analysis, defect-tolerance ranking, and Pareto-frontier optimization over all 15 material combinations. The scripts must be run in order. To skip retraining, Steps 3 and 4 can instead load the pre-trained weights provided in the models folder.
The results folder contains all figures and CSV tables generated by the four scripts, organized by chapter section.
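As a quick orientation to the quantities described above, the following minimal Python sketch enumerates the 15 ETL/HTL combinations and applies the standard relation PCE = Voc × Jsc × FF / P_in using the two stated irradiances. It is not taken from the repository scripts: the function names, the dictionary keys "cool" and "warm", and the unit conventions (Voc in V, Jsc in mA/cm², FF in percent) are assumptions made for illustration, and the units used in the CSV files may differ.

```python
from itertools import product

# Materials and irradiances as listed in the description above.
ETLS = ["TiO2", "SnO2", "ZnO"]
HTLS = ["CuI", "Cu2O", "PTAA", "P3HT", "Spiro-OMeTAD"]

# Incident power density at 900 lux, in mW/cm^2.
# The keys "cool" and "warm" are hypothetical labels, not dataset identifiers.
P_IN_MW_CM2 = {"cool": 0.309, "warm": 0.278}


def etl_htl_combinations():
    """Return every (ETL, HTL) pair: 3 ETLs x 5 HTLs."""
    return list(product(ETLS, HTLS))


def pce_percent(voc_v, jsc_ma_cm2, ff_percent, light):
    """Compute PCE (%) = Voc * Jsc * FF / P_in.

    Assumed units: Voc in V, Jsc in mA/cm^2, FF in percent.
    V * mA/cm^2 gives mW/cm^2, matching the units of P_in.
    """
    p_out = voc_v * jsc_ma_cm2 * (ff_percent / 100.0)  # mW/cm^2
    return 100.0 * p_out / P_IN_MW_CM2[light]


if __name__ == "__main__":
    combos = etl_htl_combinations()
    print(f"{len(combos)} ETL/HTL combinations")
    # Hypothetical operating point, for illustration only (not from the dataset).
    print(f"PCE = {pce_percent(1.0, 0.1, 80.0, 'cool'):.2f} %")
```

Because the two sources differ in incident power at the same illuminance, the same electrical output maps to a different PCE under each spectrum, so a check of this kind should use the irradiance matching the CSV file being inspected.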