MVS-wDF: A Diffusion Model-Based Synthetically Augmented High-Resolution Satellite Imagery Dataset for Maritime Vessel Detection (PART 2)

Published: 22 July 2025 | Version 2 | DOI: 10.17632/f9dz2syjd6.2
Contributors:
Alper SANLI

Description

realistic_single_vessel part_2

Files

Steps to reproduce

The MVS-wDF dataset was produced with a controlled, repeatable pipeline for generating synthetic maritime vessel images with diffusion models. First, 1,000 base satellite images were collected from Google Earth Pro between 2020 and 2025, covering key maritime regions including the Suez Canal, the Bosphorus Strait, the Panama Canal, South China Sea ports, and the Mediterranean Sea. Images were selected manually to ensure geographic diversity, seasonal variation, and a range of vessel types, and were captured at altitudes between 250 and 1,000 meters. All base images were exported in PNG format to preserve quality and to serve as conditioning references for synthetic generation.

Synthetic augmentation used the Stable Diffusion 3.5 Large model, implemented with PyTorch 2.x and the diffusers library on an NVIDIA RTX 3090 GPU with CUDA acceleration. A variational autoencoder (VAE) encoded each base image into a latent representation, enabling semantic manipulation during diffusion. Text conditioning used TripletCLIP, a triple-encoder structure with global CLIP, local CLIP, and T5-XXL components for enhanced semantic control. Positive prompts such as "high-resolution satellite image of a ship, realistic, detailed" were combined with negative prompts such as "cartoon, drawing, sketch" to guide generation toward photorealistic maritime vessel imagery.

Images were generated with the DPM++ sampler over 35 denoising iterations, with a guidance scale of 5.5 and a shift parameter of 20.5 to balance image quality against semantic alignment. The final denoised latent representations were decoded back into pixel space with the VAE decoder, yielding synthetic images at 1920×1080 resolution for single-vessel scenes and at variable sizes, with aspect ratios preserved, for multi-vessel scenes.
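To give a feel for what a shift value as large as 20.5 does over 35 steps, the sketch below applies the timestep-shift remapping commonly used with Stable Diffusion 3 style flow-matching schedulers. It is an illustration only: it assumes that the shift parameter described above is this kind of timestep shift, it uses a uniform grid of noise levels rather than the exact grid of any particular sampler, and the function names are hypothetical rather than taken from the authors' workflow.

```python
# Illustrative sketch: assumes the "shift" in the text is an SD3-style timestep shift.
NUM_STEPS = 35   # denoising iterations stated in the text
SHIFT = 20.5     # shift parameter stated in the text


def shift_sigma(sigma: float, shift: float) -> float:
    """Remap a noise level in [0, 1]; shift > 1 pushes it toward 1 (more noise)."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


def shifted_schedule(num_steps: int, shift: float) -> list[float]:
    """Apply the shift to a uniform grid of noise levels from 1.0 down to 1/num_steps.

    The uniform grid is an assumption made for illustration.
    """
    uniform = [1.0 - i / num_steps for i in range(num_steps)]
    return [shift_sigma(s, shift) for s in uniform]


if __name__ == "__main__":
    plain = shifted_schedule(NUM_STEPS, 1.0)      # shift of 1.0 leaves the grid unchanged
    shifted = shifted_schedule(NUM_STEPS, SHIFT)
    for step in (0, 17, 34):
        print(f"step {step:2d}: unshifted={plain[step]:.3f} shifted={shifted[step]:.3f}")
```

Under these assumptions, the noise level at the last of the 35 steps is still about 0.38 instead of about 0.03, so most of the step budget is spent at high noise levels, where global scene structure is decided.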
Each output was stored in PNG format under a systematic folder and naming convention to facilitate reproducibility. The entire workflow, from data collection to synthetic generation, was orchestrated through a computational graph architecture that allows parameterized control of each step, providing both reproducibility and controlled stochastic variation. Configuration files and seed management protocols were used to keep results consistent across experimental runs.
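One way such seed management and naming could work is sketched below. The dataset description does not specify the actual scheme, so everything here is hypothetical: the base seed, the `derive_seed` and `output_name` helpers, the file name pattern, and the example base image name are assumptions made for illustration. Only the sampler settings (35 steps, guidance scale 5.5, shift 20.5) and the folder name `realistic_single_vessel` come from the description.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RunRecord:
    """Per-image record that a configuration file could store (hypothetical layout)."""
    base_image: str
    variant: int
    seed: int
    steps: int = 35               # stated in the text
    guidance_scale: float = 5.5   # stated in the text
    shift: float = 20.5           # stated in the text


def derive_seed(base_seed: int, base_image: str, variant: int) -> int:
    """Derive a repeatable 32-bit seed from a run-level seed, image name, and variant index."""
    key = f"{base_seed}:{base_image}:{variant}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big")


def output_name(scene: str, base_image: str, variant: int, seed: int) -> str:
    """Build an output path that encodes the scene folder, source image, variant, and seed."""
    stem = base_image.rsplit(".", 1)[0]
    return f"{scene}/{stem}_v{variant:03d}_seed{seed}.png"


if __name__ == "__main__":
    base_seed = 1234  # hypothetical run-level seed
    for variant in range(3):
        seed = derive_seed(base_seed, "suez_0001.png", variant)
        record = RunRecord("suez_0001.png", variant, seed)
        path = output_name("realistic_single_vessel", record.base_image, variant, seed)
        print(path, json.dumps(asdict(record)))
```

Deriving each seed from a hash of the run-level seed and the image identity keeps every output reproducible on its own, while changing the variant index gives the controlled stochastic variation mentioned above.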

Institutions

Milli Savunma Universitesi

Categories

Remote Sensing, Synthetic Image, Diffusion, Artificial Vessel

Licence