RailEnV-PASMVS: a dataset for multi-view stereopsis training and reconstruction applications

Published: 04-05-2021 | Version 3 | DOI: 10.17632/xrwb9m37gd.3
André Broekman,
Petrus Johannes Gräbe


A Perfectly Accurate, Synthetic dataset featuring a virtual railway EnVironment for Multi-View Stereopsis (RailEnV-PASMVS) is presented, consisting of 40 scenes and 79,800 renderings, together with ground truth depth maps, extrinsic and intrinsic camera parameters, and binary segmentation masks of all the track components and surrounding environment. Every scene is rendered from a set of 3 cameras, each positioned relative to the track for optimal 3D reconstruction of the rail profile. The set of cameras is translated across the 100 m length of tangent (straight) track to yield a total of 1,995 camera views per scene. Photorealistic lighting of each of the 40 scenes is achieved with high-definition, high dynamic range (HDR) environmental textures. Additional variation is introduced in the form of different camera focal lengths, random noise applied to the camera location and rotation parameters, and shader modifications of the rail profile. Representative track geometry data is used to generate random, unique vertical alignment data for the rail profile of every scene. This primary, synthetic dataset is augmented by a smaller collection of 124 manually annotated photographs for improved segmentation performance. The specular rail profile represents the most challenging component for MVS reconstruction algorithms, pipelines and neural network architectures, increasing the ambiguity and complexity of the data distribution. RailEnV-PASMVS is an application-specific dataset for railway engineering that complements existing general-purpose computer vision datasets, providing the precision required for novel research applications in transportation engineering.

File descriptions:
+ RailEnV-PASMVS.blend (227 MB) - Blender file (developed using Blender version 2.81) used to generate the dataset. The Blender file packs only one of the HDR environmental textures to use as an example, along with all the other asset textures.
+ RailEnV-PASMVS_sample.png (28 MB) - A visual collage of 30 scenes, illustrating the variability introduced by using different models, illumination, material properties and camera focal lengths.
+ geometry.zip (2 MB) - Geometry CSV files used for all the scenes. The Bezier curve defines the geometry of the rail profile (10 mm interval).
+ sampleset_19.zip (2.5 GB) - A single RailEnV-PASMVS scene sample for easy viewing and evaluation. For the full dataset, please use the link provided below.
+ PhysicalDataset.zip (1.8 GB) - A smaller, secondary dataset of 124 manually annotated photographs of railway environments; only the rail profiles are annotated.

GitHub page to access the full RailEnV-PASMVS dataset (hosted on Google Drive): https://github.com/andrebroekman/RailEnV-PASMVS
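The geometry CSV files sample the rail profile's Bezier curve at 10 mm intervals. A minimal sketch of loading such a file follows; the actual column layout inside geometry.zip is not specified here, so the assumed one-(x, y, z)-vertex-per-row format and the inline sample values are illustrative assumptions only.

```python
import csv
import io

# Hypothetical sample: one (x, y, z) vertex per row, vertices spaced at
# 10 mm (0.01 m) intervals along the rail profile. The real CSV layout
# in geometry.zip may differ.
sample = "0.00,0.000,0.0000\n0.01,0.000,0.0001\n0.02,0.000,0.0003\n"

points = [tuple(map(float, row)) for row in csv.reader(io.StringIO(sample))]
spacing = points[1][0] - points[0][0]  # expected 10 mm = 0.01 m
```

For real use, replace the in-memory sample with an open file handle for the scene's geometry CSV.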


Steps to reproduce

The open-source Blender software suite (https://www.blender.org/) was used to generate the dataset, with the entire pipeline developed using the exposed Python API. The camera trajectory is kept fixed for all 40 scenes, except for the small random perturbations introduced to increase camera variation. The camera parameters were initially exported as a single CSV file (scene.csv) for every scene, from which the camera information files were generated; this includes the focal length (focalLengthmm), image sensor dimensions (pixelDimensionX, pixelDimensionY), position coordinate vector (vectC) and rotation vector (vectR). The STL model files, as provided in this data repository, were exported directly from Blender, such that the geometry/scenes can be reproduced. The data processing below is written for a Python implementation, transforming the information from Blender's coordinate system into universal rotation (R_world2cv) and translation (T_world2cv) matrices.
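A parsing sketch for scene.csv is given below. The column names for the focal length and sensor dimensions are taken from the description above, but how the vectC and vectR components are laid out in the CSV is not specified here; the flattened cx/cy/cz and rx/ry/rz column names, and the sample row, are assumptions for illustration.

```python
import csv
import io

# Hypothetical scene.csv layout: one row per camera view, with the
# position (vectC) and rotation (vectR) vectors flattened into separate
# columns. Column names cx..rz are assumed, not confirmed by the dataset.
sample = (
    "focalLengthmm,pixelDimensionX,pixelDimensionY,cx,cy,cz,rx,ry,rz\n"
    "35.0,1920,1080,1.2,-0.5,1.8,90.0,0.0,45.0\n"
)

cameras = []
for row in csv.DictReader(io.StringIO(sample)):
    cameras.append({
        "focalLengthmm": float(row["focalLengthmm"]),
        "pixelDimensionX": int(row["pixelDimensionX"]),
        "pixelDimensionY": int(row["pixelDimensionY"]),
        "vectC": [float(row[k]) for k in ("cx", "cy", "cz")],
        "vectR": [float(row[k]) for k in ("rx", "ry", "rz")],
    })
```

Each resulting dictionary carries exactly the fields consumed by the coordinate transformation described in the next steps.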
The following packages are required:

    import numpy as np
    from scipy.spatial.transform import Rotation as R

The intrinsic matrix K is constructed using the following formulation:

    focalLengthPixel = focalLengthmm * pixelDimensionX / sensorWidthmm
    K = [[focalLengthPixel, 0, dimX/2],
         [0, focalLengthPixel, dimY/2],
         [0, 0, 1]]

The rotation vector as provided by Blender is first transformed to a rotation matrix:

    r = R.from_euler('xyz', vectR, degrees=True)
    matR = r.as_matrix()

The rotation matrix is transposed to obtain the transformation from the WORLD to the BLENDER coordinate system:

    R_world2bcam = np.transpose(matR)

The matrix describing the transformation from BLENDER to CV/STANDARD coordinates is:

    R_bcam2cv = np.array([[1, 0, 0], [0, -1, 0], [0, 0, -1]])

Thus the transformation from WORLD to CV/STANDARD coordinates is:

    R_world2cv = R_bcam2cv.dot(R_world2bcam)

The camera coordinate vector requires a similar transformation moving from BLENDER to WORLD coordinates:

    T_world2bcam = -1 * R_world2bcam.dot(vectC)
    T_world2cv = R_bcam2cv.dot(T_world2bcam)

The resulting R_world2cv and T_world2cv matrices are written to the camera information file using exactly the same format as that of BlendedMVS developed by Dr. Yao (https://github.com/YoYo000/BlendedMVS). The original rotation and translation information can be recovered by following the process in reverse. Note that additional steps were required to convert from Blender's unique coordinate system to that of OpenCV; this ensures universal compatibility in the way that the camera intrinsic and extrinsic information is provided. Equivalent GPS information is provided (gps.csv), whereby the local coordinate frame is transformed into equivalent GPS coordinates, centered around the Engineering 4.0 campus, University of Pretoria, South Africa. This information is embedded within the JPG files as EXIF data.
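The steps above can be collected into a single helper function, shown here as a self-contained sketch; the function name and the example values in the comments are illustrative, and sensorWidthmm must be taken from the Blender camera settings.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R


def blender_to_cv(vectR, vectC, focalLengthmm, sensorWidthmm, dimX, dimY):
    """Assemble K, R_world2cv and T_world2cv from Blender camera data.

    vectR is the Euler rotation vector in degrees, vectC the camera
    position, both in Blender's coordinate system. Helper name and
    argument order are illustrative only.
    """
    # Intrinsic matrix: focal length converted from mm to pixels.
    focalLengthPixel = focalLengthmm * dimX / sensorWidthmm
    K = np.array([[focalLengthPixel, 0, dimX / 2],
                  [0, focalLengthPixel, dimY / 2],
                  [0, 0, 1]])

    # Rotation vector -> rotation matrix, then WORLD -> BLENDER camera.
    matR = R.from_euler('xyz', vectR, degrees=True).as_matrix()
    R_world2bcam = np.transpose(matR)

    # BLENDER camera axes -> OpenCV/STANDARD camera axes.
    R_bcam2cv = np.array([[1, 0, 0],
                          [0, -1, 0],
                          [0, 0, -1]])
    R_world2cv = R_bcam2cv.dot(R_world2bcam)

    # Equivalent transformation for the translation vector.
    T_world2bcam = -1 * R_world2bcam.dot(vectC)
    T_world2cv = R_bcam2cv.dot(T_world2bcam)
    return K, R_world2cv, T_world2cv
```

For a camera at the origin with zero rotation, R_world2cv reduces to R_bcam2cv and T_world2cv to the zero vector, which is a quick sanity check before writing the BlendedMVS-format camera files.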