mmWave Skin Cancer Imaging Dataset
Description
This dataset comprises synthetic millimeter-wave (mmWave) electromagnetic simulations of skin cancer imaging, generated with the Meep finite-difference time-domain (FDTD) solver. A multiple-input multiple-output (MIMO) antenna configuration captures spatially diverse scattering responses, which is intended to improve sensitivity to tumor presence and depth. The simulations model wave–tissue interactions across heterogeneous skin layers and vary the dielectric properties, tumor sizes, depths, and spatial distributions. Each sample contains multi-channel time-domain electromagnetic field responses, one channel per transmit–receive antenna pair, together with ground-truth tumor annotations. The dataset is intended to support the development of signal processing and deep learning methods for tumor detection and localization, particularly for deep or weakly scattering lesions where conventional approaches face limitations.
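As an illustration of the sample structure described above, the following minimal sketch shows one way a sample could be held in memory. The class and field names, the 4 x 4 antenna count, and the 512 time steps are hypothetical placeholders, not the dataset's actual file format or dimensions, and the signal values are random stand-ins rather than simulated fields.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class TumorLabel:
    # Hypothetical field names; the real annotation schema may differ.
    lateral_mm: float
    depth_mm: float
    radius_mm: float


@dataclass
class Sample:
    # Assumed layout: (n_tx, n_rx, n_time), one time series per antenna pair.
    signals: np.ndarray
    label: TumorLabel

    def channel(self, tx: int, rx: int) -> np.ndarray:
        """Time-domain response recorded at receiver rx while tx was excited."""
        return self.signals[tx, rx]

    def flat_channels(self) -> np.ndarray:
        """Stack all transmit-receive pairs as rows: (n_tx * n_rx, n_time)."""
        n_tx, n_rx, n_time = self.signals.shape
        return self.signals.reshape(n_tx * n_rx, n_time)


# Placeholder data only: random numbers standing in for simulated fields.
rng = np.random.default_rng(0)
sample = Sample(
    signals=rng.standard_normal((4, 4, 512)),
    label=TumorLabel(lateral_mm=1.5, depth_mm=2.0, radius_mm=0.8),
)
flat = sample.flat_channels()
```

In this layout, row `tx * n_rx + rx` of the flattened array is the same time series as `sample.channel(tx, rx)`, which is a convenient form for feeding multi-channel input to a model.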
Steps to reproduce
The dataset was generated using the Meep finite-difference time-domain (FDTD) solver. A multi-layer skin model was constructed to represent the epidermis, dermis, and subcutaneous tissue, with each layer assigned frequency-dependent dielectric properties taken from the literature. Tumor inclusions were modeled as localized dielectric perturbations with randomized sizes, depths, and lateral positions.

A multiple-input multiple-output (MIMO) antenna configuration was implemented, consisting of multiple transmit and receive elements arranged around the region of interest. The transmitters were excited one at a time with a broadband Gaussian pulse in the millimeter-wave frequency range, while all receivers recorded the resulting time-domain electric field responses. Perfectly matched layer (PML) boundary conditions were applied to suppress artificial reflections from the edges of the computational domain.

For each simulation instance, the tumor geometry, position, dielectric contrast, and antenna pair combinations were randomized to ensure dataset diversity. The recorded signals were stored as multi-channel time-domain data, one channel per transmit–receive pair, along with ground-truth labels specifying tumor location and size. All simulations used the same fixed spatial and temporal discretization to ensure numerical stability and reproducibility. The complete simulation scripts, parameter configurations, and random seed initialization are provided to enable exact replication of the dataset.
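The seeded randomization and sequential transmit–receive enumeration described above can be sketched as follows. This is not the provided simulation script: the function names, parameter ranges, and the 4 x 4 antenna count are hypothetical, chosen only to show how a fixed seed makes the sampled configurations repeatable.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TumorParams:
    # Hypothetical parameter names and units.
    radius_mm: float
    depth_mm: float
    lateral_mm: float
    contrast: float  # relative permittivity contrast against surrounding tissue


def sample_tumor(rng: random.Random) -> TumorParams:
    """Draw one tumor configuration; the ranges are illustrative assumptions."""
    return TumorParams(
        radius_mm=rng.uniform(0.5, 2.0),
        depth_mm=rng.uniform(0.2, 3.0),
        lateral_mm=rng.uniform(-5.0, 5.0),
        contrast=rng.uniform(1.1, 1.5),
    )


def generate_configs(seed: int, n: int) -> list:
    """A dedicated seeded generator yields the same sequence on every run."""
    rng = random.Random(seed)
    return [sample_tumor(rng) for _ in range(n)]


def tx_rx_pairs(n_tx: int, n_rx: int) -> list:
    """Each transmitter fires in turn while every receiver records."""
    return [(tx, rx) for tx in range(n_tx) for rx in range(n_rx)]


configs_a = generate_configs(seed=42, n=5)
configs_b = generate_configs(seed=42, n=5)
pairs = tx_rx_pairs(n_tx=4, n_rx=4)
```

Using a local `random.Random(seed)` instance instead of the module-level generator keeps the sequence independent of any other code that draws random numbers, which matters when exact replication is the goal.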