Synthetic and Realistic Laser–Tissue Interaction Data for Machine-Learning Applications

Published: 10 June 2025 | Version 1 | DOI: 10.17632/ys4t55m57x.1
Contributor:
Ahmed Al-Dulaimi

Description

This dataset provides two complementary CSV files containing simulated laser–tissue interaction measurements, designed to support machine-learning research in biomedical optics and computational modeling.

1. **laser_tissue_data.csv**
   - Samples: 1,000,000
   - Features:
     - wavelength (nm)
     - pulse_duration (ns)
     - energy_density (J/cm²)
     - absorption_coeff (cm⁻¹)
     - thermal_conductivity (W/(m·K))
     - fluence (J/cm²/ns): derived as `energy_density / pulse_duration`
     - beam_profile: simulated optical absorption, derived as `exp(-absorption_coeff * wavelength / 1000)`
     - success (0/1): binary target, equal to 1 when the pulse is shorter than the thermal diffusion time `absorption_coeff**2 / (4 * thermal_conductivity)` and energy_density exceeds 5 J/cm²

   The five measured variables are sampled from uniform (wavelength, thermal_conductivity), log-normal (pulse_duration, absorption_coeff), and normal (energy_density) distributions to represent idealized measurements of laser–tissue interactions under controlled conditions.

2. **realistic_laser_tissue_data.csv**
   - Samples: 1,000,000
   - Same feature set as above, with two added realism factors:
     - Measurement noise: zero-mean Gaussian noise, with a standard deviation set by the configurable noise_level = 0.1 and scaled per variable, added to each of the five measured variables; fluence and beam_profile are then recomputed from the noisy values
     - Label noise: flip_prob = 0.05, randomly inverting 5% of the success outcomes

   The success labels are assigned from the noise-free values before measurement noise is added. This version simulates sensor variance and occasional mislabeling, making it suitable for testing the robustness of classification and regression models.

-------------------------------------------------

Use Cases & Applications

- Train and benchmark classification models (e.g., logistic regression, random forests, neural networks) on binary outcome prediction.
- Explore feature-engineering strategies (e.g., interaction terms, normalization) for optical parameters.
- Evaluate the impact of measurement and label noise on model performance and generalizability.
- Develop synthetic-data augmentation pipelines for scarce or ethically constrained biomedical datasets.
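
The labelling rule and the label-flip step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the generation script itself: the helper names `success_label` and `flip_labels` are hypothetical, and the sketch takes an explicit `numpy.random.Generator` instead of the script's global `np.random.seed(42)`, so it does not reproduce the published rows.

```python
import numpy as np


def success_label(pulse_duration, energy_density, absorption_coeff, thermal_conductivity):
    """Hypothetical helper: recompute the 0/1 success target from the measured variables.

    Mirrors the rule in generate_laser_data: success = 1 when the pulse is shorter
    than absorption_coeff**2 / (4 * thermal_conductivity) and energy_density > 5.
    """
    pulse_duration = np.asarray(pulse_duration, dtype=float)
    energy_density = np.asarray(energy_density, dtype=float)
    absorption_coeff = np.asarray(absorption_coeff, dtype=float)
    thermal_conductivity = np.asarray(thermal_conductivity, dtype=float)

    thermal_diffusion_time = absorption_coeff ** 2 / (4 * thermal_conductivity)
    return ((pulse_duration < thermal_diffusion_time) & (energy_density > 5)).astype(int)


def flip_labels(labels, flip_prob, rng):
    """Hypothetical helper: invert exactly int(flip_prob * n) labels, chosen without replacement."""
    flipped = np.array(labels, dtype=int, copy=True)
    n = flipped.size
    idx = rng.choice(n, int(flip_prob * n), replace=False)
    flipped[idx] = 1 - flipped[idx]
    return flipped


# Worked example: absorption_coeff = 7 and thermal_conductivity = 2 give a
# thermal diffusion time of 49 / 8 = 6.125.
labels = success_label(
    pulse_duration=[2.0, 2.0, 8.0],    # third pulse is longer than 6.125
    energy_density=[10.0, 4.0, 10.0],  # second sample is below the 5 J/cm² threshold
    absorption_coeff=[7.0, 7.0, 7.0],
    thermal_conductivity=[2.0, 2.0, 2.0],
)
print(labels)  # [1 0 0]

rng = np.random.default_rng(0)
noisy = flip_labels(np.zeros(1000, dtype=int), flip_prob=0.05, rng=rng)
print(noisy.sum())  # 50
```

Because the realistic file's labels are fixed from the noise-free values before noise is added, applying a rule like `success_label` to its noisy columns will not reproduce `success` exactly, even apart from the 5% flips.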

Steps to reproduce

1. **Set Up a Python Virtual Environment**

   For example, run `python -m venv .venv`, then activate it with `source .venv/bin/activate` (macOS/Linux) or `.venv\Scripts\activate` (Windows).

2. **Install Dependencies**

```bash
pip install numpy pandas
```

3. **Create the Data-Generation Script**

   Save the following as `generate_laser_data.py` in your project folder:

```python
import numpy as np
import pandas as pd

# ––– Set a fixed seed for reproducibility –––
np.random.seed(42)


def generate_laser_data(n_samples=10000):
    """Generate the idealized (noise-free) laser–tissue dataset."""
    wavelength = np.random.uniform(500, 1100, n_samples)         # nm
    pulse_duration = np.random.lognormal(1, 0.5, n_samples)      # ns
    energy_density = np.random.normal(10, 3, n_samples)          # J/cm²
    absorption_coeff = np.random.lognormal(2, 0.2, n_samples)    # cm⁻¹
    thermal_conductivity = np.random.uniform(0.1, 5, n_samples)  # W/(m·K)

    # Derived features
    fluence = energy_density / pulse_duration
    beam_profile = np.exp(-absorption_coeff * wavelength / 1000)

    # Target: pulse shorter than the thermal diffusion time and energy density above 5 J/cm²
    thermal_diffusion_time = (absorption_coeff ** 2) / (4 * thermal_conductivity)
    success = ((pulse_duration < thermal_diffusion_time) & (energy_density > 5)).astype(int)

    return pd.DataFrame({
        'wavelength': wavelength,
        'pulse_duration': pulse_duration,
        'energy_density': energy_density,
        'absorption_coeff': absorption_coeff,
        'thermal_conductivity': thermal_conductivity,
        'fluence': fluence,
        'beam_profile': beam_profile,
        'success': success,
    })


def generate_laser_data_realistic(n_samples=10000, noise_level=0.1, flip_prob=0.05):
    """Generate a noisy variant: label flips plus Gaussian measurement noise."""
    # Base data; when called after generate_laser_data(), these rows are fresh draws
    # from the global RNG, not a noisy copy of the ideal file
    df = generate_laser_data(n_samples)

    # Introduce label flips (labels were computed from the noise-free features)
    flip_indices = np.random.choice(n_samples, int(flip_prob * n_samples), replace=False)
    df.loc[flip_indices, 'success'] = 1 - df.loc[flip_indices, 'success']

    # Add measurement noise (standard deviation scaled per variable)
    df['wavelength'] += np.random.normal(0, noise_level * 100, n_samples)
    df['pulse_duration'] += np.random.normal(0, noise_level * 1, n_samples)
    df['energy_density'] += np.random.normal(0, noise_level * 10, n_samples)
    df['absorption_coeff'] += np.random.normal(0, noise_level, n_samples)
    df['thermal_conductivity'] += np.random.normal(0, noise_level * 2, n_samples)

    # Recompute derived features from the noisy measurements
    df['fluence'] = df['energy_density'] / df['pulse_duration']
    df['beam_profile'] = np.exp(-df['absorption_coeff'] * df['wavelength'] / 1000)
    return df


if __name__ == "__main__":
    # The Description lists 1,000,000 samples per file; the function defaults are 10,000
    N_SAMPLES = 1_000_000

    # Generate & save ideal dataset
    df1 = generate_laser_data(N_SAMPLES)
    df1.to_csv('laser_tissue_data.csv', index=False)

    # Generate & save noisy dataset
    df2 = generate_laser_data_realistic(N_SAMPLES)
    df2.to_csv('realistic_laser_tissue_data.csv', index=False)
```

4. **Run the Script**

```bash
python generate_laser_data.py
```

   After running, you’ll have:

   * `laser_tissue_data.csv`
   * `realistic_laser_tissue_data.csv`
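
Once the script has run, the two CSV files can be checked for internal consistency. The sketch below is illustrative only: the function name `check_laser_frame` is hypothetical, and it assumes the column order written by the script above. It verifies the column layout, that `success` is binary, and that the derived columns match their formulas, then reports how often the stored labels disagree with the success rule recomputed from the stored features.

```python
import numpy as np
import pandas as pd

# Column order written by generate_laser_data / generate_laser_data_realistic (assumed).
EXPECTED_COLUMNS = [
    "wavelength", "pulse_duration", "energy_density", "absorption_coeff",
    "thermal_conductivity", "fluence", "beam_profile", "success",
]


def check_laser_frame(df: pd.DataFrame) -> dict:
    """Hypothetical consistency check for one generated file; returns summary statistics."""
    if list(df.columns) != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {list(df.columns)}")
    if not set(df["success"].unique()) <= {0, 1}:
        raise ValueError("success must contain only 0 and 1")

    # Both generators (re)compute the derived columns from the stored features,
    # so they should agree up to floating-point rounding.
    fluence = df["energy_density"] / df["pulse_duration"]
    beam_profile = np.exp(-df["absorption_coeff"] * df["wavelength"] / 1000)
    if not np.allclose(df["fluence"], fluence, equal_nan=True):
        raise ValueError("fluence does not match energy_density / pulse_duration")
    if not np.allclose(df["beam_profile"], beam_profile, equal_nan=True):
        raise ValueError("beam_profile does not match exp(-absorption_coeff * wavelength / 1000)")

    # Fraction of rows whose label disagrees with the rule recomputed from the stored features.
    diffusion_time = df["absorption_coeff"] ** 2 / (4 * df["thermal_conductivity"])
    rule = ((df["pulse_duration"] < diffusion_time) & (df["energy_density"] > 5)).astype(int)
    return {
        "rows": len(df),
        "positive_rate": float(df["success"].mean()),
        "rule_mismatch": float((rule != df["success"]).mean()),
    }
```

For example, call `check_laser_frame(pd.read_csv("laser_tissue_data.csv", float_precision="round_trip"))`; the `round_trip` parser reads floats back exactly as written, which matters for values near the thresholds. On the ideal file `rule_mismatch` should be 0.0, since its labels come straight from the rule; on the realistic file a nonzero value is expected, because labels were flipped and measurement noise was then added to the features.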

Categories

Optics, Machine Learning, Laser, Classification System, Regression Model

Licence