Monocular Depth Estimation Dataset
Description
This dataset comprises synthesized RGB images paired with precise ground-truth depth maps in three distinct and realistic environments. The first environment (Urban, or Env-1) replicates a city with buildings and cars. The second environment (Downtown, or Env-2) is populated with a diverse array of elements such as vegetation, benches, wood, metal, trash bins, flower pots, trees, statues, and light posts. The third environment (Pillar World, or Env-3) features an expansive grid of equidistantly spaced pillars. There are 10,161 pairs of RGB images with corresponding ground-truth depths for Env-1, 7,473 for Env-2, and 4,167 for Env-3.

The dataset encompasses various weather conditions: snow, fog, and snow-fog for Env-1; falling maple leaves for Env-2; and no weather effects for Env-3. It also captures variations in illumination, lighting, colors, textures, shapes, object sizes, and viewing perspectives.

The RGB images are saved in .png format with dimensions of 256×144×3, while the depth maps are stored in .npy format with a shape of 256×144×1. The .npy format is NumPy's standard binary file format; each file contains float values representing the depth at each pixel. The dataset was gathered for monocular depth estimation and revolves around the autonomous navigation of a quadrotor equipped with only a monocular camera.
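For illustration, a minimal Python sketch for loading one RGB/depth pair follows. The file names rgb_0001.png and depth_0001.npy are hypothetical placeholders, since the dataset's exact naming scheme is not described above, and the in-memory axis order may differ from the stated width×height notation:

import numpy as np
from PIL import Image

# Hypothetical file names; substitute the dataset's actual naming scheme.
rgb = np.asarray(Image.open("rgb_0001.png"))  # PIL reports (height, width, channels),
                                              # i.e. (144, 256, 3) for a 256x144 image
depth = np.load("depth_0001.npy")             # float array, one depth value per pixel

print(rgb.shape, rgb.dtype)                   # e.g. (144, 256, 3) uint8
print(depth.shape, depth.dtype)               # per the description, 256x144x1 floats
print(depth.min(), depth.max())               # depth range in the simulator's units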
Steps to reproduce
The images and their corresponding depth maps were obtained manually through the Computer Vision API of the AirSim simulator, which uses Unreal Engine 4.27 for graphics rendering.
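As a rough sketch of this capture step (not the authors' actual collection script), the following Python snippet requests one RGB frame and one float depth map through AirSim's Python API. The camera name "0" is an assumption, AirSim must be running in ComputerVision mode per its settings.json, and the DepthPlanar image type is named DepthPlanner in older AirSim releases:

import numpy as np
import airsim

# Connect to a running AirSim instance (assumes ComputerVision mode in settings.json).
client = airsim.VehicleClient()
client.confirmConnection()

# Request one uncompressed RGB frame and one float depth map in a single call.
responses = client.simGetImages([
    airsim.ImageRequest("0", airsim.ImageType.Scene, False, False),       # RGB, raw bytes
    airsim.ImageRequest("0", airsim.ImageType.DepthPlanar, True, False),  # depth, floats
])
rgb_resp, depth_resp = responses

# Recent AirSim releases return uncompressed Scene images as 3-channel BGR bytes;
# older releases may return 4 channels, in which case the reshape needs adjusting.
rgb = np.frombuffer(rgb_resp.image_data_uint8, dtype=np.uint8)
rgb = rgb.reshape(rgb_resp.height, rgb_resp.width, 3)[..., ::-1]  # BGR -> RGB

# Depth arrives as a flat list of floats (one distance value per pixel).
depth = np.array(depth_resp.image_data_float, dtype=np.float32)
depth = depth.reshape(depth_resp.height, depth_resp.width, 1)

np.save("depth_0001.npy", depth)  # hypothetical output name, matching the .npy format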