RailEnV-PASMVS: a dataset for multi-view stereopsis training and reconstruction applications

Published: 16-11-2020 | Version 1 | DOI: 10.17632/xrwb9m37gd.1
Contributors:
André Broekman,
Petrus Johannes Gräbe

Description

A Perfectly Accurate, Synthetic dataset featuring a virtual railway EnVironment for Multi-View Stereopsis (RailEnV-PASMVS). The dataset consists of 40 scenes and 79,800 renderings, together with ground truth depth maps, camera extrinsic and intrinsic parameters, and binary segmentation masks of all the track components and the surrounding environment. Every scene is rendered from a set of 3 cameras, each positioned relative to the track for optimal 3D reconstruction of the rail profile. The set of cameras is translated across the 100 m length of tangent track to yield a total of 1,995 cameras per scene. Photorealistic lighting of each of the 40 scenes is achieved using different high-definition, high dynamic range (HDR) environmental textures. Additional variation is introduced in the form of varying camera focal lengths, random noise added to the camera location and rotation parameters, and material variation of the rail profile. The power spectral density (PSD) sourced from representative track geometry data is used to generate random, unique vertical alignment data for the rail profile of every scene (an illustrative sketch of this type of synthesis follows the file descriptions below).

File descriptions:

+ RailEnV-PASMVS.blend (227 MB) - Blender file (developed using Blender version 2.8.1) used to generate the dataset. The Blender file packs only one of the HDR environmental textures as an example, along with all the other asset textures.
+ RailEnV-PASMVS_sample.png (28 MB) - A visual collage of 30 scenes, illustrating the variability introduced by different models, illumination, material properties and camera focal lengths.
+ geometry.zip (2 MB) - Geometry CSV files used for all the scenes. The Bezier curve defines the geometry of the rail profile (10 mm interval).
+ sampleset_19.zip (2.5 GB) - A single RailEnV-PASMVS scene sample for easy viewing and evaluation. For the full dataset, please use the link provided below.

GitHub page to access the full RailEnV-PASMVS dataset (hosted on Google Drive): https://github.com/andrebroekman/RailEnV-PASMVS
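The PSD-based vertical alignment synthesis is performed inside the Blender pipeline and is not reproduced in this repository description; purely as an illustration, the sketch below shows one common way of generating a random profile from a one-sided PSD (the spectral representation, or sum-of-cosines, method). The function name, its arguments and the power-law PSD in the usage example are hypothetical and do not form part of the dataset files.

    import numpy as np

    def profile_from_psd(frequencies, psd, length=100.0, spacing=0.01, seed=None):
        # Synthesise a random vertical alignment from a one-sided PSD using the
        # spectral representation (sum-of-cosines) method.
        # frequencies: spatial frequencies [cycles/m] at which the PSD is sampled
        # psd: PSD ordinates [m^2/(cycles/m)]
        rng = np.random.default_rng(seed)
        x = np.arange(0.0, length, spacing)       # 100 m of track at a 10 mm interval
        df = np.gradient(frequencies)             # frequency bin widths
        amplitudes = np.sqrt(2.0 * psd * df)      # harmonic amplitudes from the PSD
        phases = rng.uniform(0.0, 2.0 * np.pi, len(frequencies))  # random phases
        z = np.zeros_like(x)
        for A, f, p in zip(amplitudes, frequencies, phases):
            z += A * np.cos(2.0 * np.pi * f * x + p)
        return x, z

    # Usage (illustrative PSD only):
    freqs = np.linspace(0.1, 2.0, 200)            # spatial frequencies [cycles/m]
    psd = 1e-6 / freqs**3                         # hypothetical power-law PSD
    x, z = profile_from_psd(freqs, psd)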

Steps to reproduce

The open source Blender software suite (https://www.blender.org/) was used to generate the dataset, with the entire pipeline developed using the exposed Python API. The camera trajectory is kept fixed for all 40 scenes, except for small perturbations introduced in the form of random noise to increase the camera variation. The camera intrinsic information was initially exported as a single CSV file (scene.csv) for every scene, from which the camera information files were generated; this includes the focal length (focalLengthmm), image sensor dimensions (pixelDimensionX, pixelDimensionY), position coordinate vector (vectC) and rotation vector (vectR). The STL model files, as provided in this data repository, were exported directly from Blender, so that the geometry/scenes can be reproduced.

The data processing below is written for a Python implementation, transforming the information from Blender's coordinate system into universal rotation (R_world2cv) and translation (T_world2cv) matrices. The following packages are required:

    import numpy as np
    from scipy.spatial.transform import Rotation as R

The intrinsic matrix K is constructed using the following formulation:

    focalLengthPixel = focalLengthmm * pixelDimensionX / sensorWidthmm
    K = [[focalLengthPixel, 0, dimX/2],
         [0, focalLengthPixel, dimY/2],
         [0, 0, 1]]

The rotation vector as provided by Blender was first transformed to a rotation matrix:

    r = R.from_euler('xyz', vectR, degrees=True)
    matR = r.as_matrix()

Transposing the rotation matrix yields the matrix from the WORLD to the BLENDER coordinate system:

    R_world2bcam = np.transpose(matR)

The matrix describing the transformation from BLENDER to CV/STANDARD coordinates is:

    R_bcam2cv = np.array([[1, 0, 0],
                          [0, -1, 0],
                          [0, 0, -1]])

Thus the rotation from WORLD to CV/STANDARD coordinates is:

    R_world2cv = R_bcam2cv.dot(R_world2bcam)

The camera coordinate vector requires a similar transformation, first from WORLD to BLENDER camera coordinates and then to CV/STANDARD coordinates:

    T_world2bcam = -1 * R_world2bcam.dot(vectC)
    T_world2cv = R_bcam2cv.dot(T_world2bcam)

The resulting R_world2cv and T_world2cv matrices are written to the camera information files using exactly the same format as that of BlendedMVS developed by Dr. Yao (https://github.com/YoYo000/BlendedMVS). The original rotation and translation information can be recovered by following the process in reverse. Note that additional steps were required to convert from Blender's coordinate system to that of OpenCV; this ensures universal compatibility in the way that the camera intrinsic and extrinsic information is provided. Equivalent GPS information is provided (gps.csv), whereby the local coordinate frame is transformed into GPS coordinates centred around the Engineering 4.0 campus, University of Pretoria, South Africa. This information is embedded within the JPG files as EXIF data.
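For reference, the individual steps above can be collected into a single helper; the following is a minimal sketch only, where the function name is illustrative and dimX/dimY correspond to pixelDimensionX/pixelDimensionY as exported in scene.csv.

    import numpy as np
    from scipy.spatial.transform import Rotation as R

    def blender_to_cv(vectR, vectC, focalLengthmm, sensorWidthmm, dimX, dimY):
        # Intrinsic matrix K; the principal point is assumed to lie at the image centre
        focalLengthPixel = focalLengthmm * dimX / sensorWidthmm
        K = np.array([[focalLengthPixel, 0, dimX / 2],
                      [0, focalLengthPixel, dimY / 2],
                      [0, 0, 1]])

        # Rotation: WORLD -> BLENDER camera coordinates
        matR = R.from_euler('xyz', vectR, degrees=True).as_matrix()
        R_world2bcam = matR.T

        # BLENDER camera -> CV/STANDARD coordinates
        R_bcam2cv = np.array([[1, 0, 0],
                              [0, -1, 0],
                              [0, 0, -1]])
        R_world2cv = R_bcam2cv.dot(R_world2bcam)

        # Translation: WORLD -> CV/STANDARD coordinates
        T_world2bcam = -1 * R_world2bcam.dot(np.asarray(vectC, dtype=float))
        T_world2cv = R_bcam2cv.dot(T_world2bcam)
        return K, R_world2cv, T_world2cv

    # Example usage with one row from scene.csv (values shown are placeholders):
    K, Rmat, T = blender_to_cv(vectR=[60.0, 0.0, 45.0], vectC=[1.2, -3.4, 1.8],
                               focalLengthmm=35.0, sensorWidthmm=36.0,
                               dimX=1920, dimY=1080)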