Dataset for: Learning Direct Optimization for Scene Understanding
Description: The dataset consists of a large number of realistic synthetic images showing several objects on a table-top, drawn from three classes: staplers, mugs and bananas. The images are rendered under a variety of lighting conditions, viewpoints and object configurations. In addition, the dataset includes a set of manually captured and annotated real images containing objects of the same classes. In total there are over 22000 realistic synthetic images, which can be used for training and testing, and 135 annotated real images for testing. All datasets include object annotations and their masks. Image resolution is 256 x 256.

Synthetic datasets: these include all the latent variables of the 3D scene (the scene graph). The synthetic scenes were rendered using the Blender software: www.blender.org. For each object, the associated latent variables are its position, scaling factor, azimuthal rotation, shape (1-of-K encoding) and colour (RGB). The ground plane has a random RGB colour. The camera is placed at a random height above the origin and looks down at a random angle of elevation. The illumination model is uniform lighting plus a directional source, specified by the strength, azimuth and elevation of the source.

Real dataset: for each object we annotated its class, instance mask and contact point using the LabelMe software.
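The latent variables of a synthetic scene can be pictured as a small nested record: per-object variables, plus global ground-plane, camera and lighting variables. The following is a minimal sketch of such a record in Python, not the dataset's actual file format. All class and field names (ObjectLatents, SceneLatents, DirectionalLight, to_vector, one_hot) are hypothetical. The description does not state the dimensionality of the object position or the units of the angles, so this sketch assumes an (x, y) position on the table-top plane and angles in radians. The value of K for the shape encoding is not given either; K = 3 below is purely illustrative.

```python
from dataclasses import dataclass
from typing import List, Tuple

RGB = Tuple[float, float, float]


def one_hot(index: int, k: int) -> List[int]:
    """1-of-K encoding of a shape index (K is an assumed parameter)."""
    if not 0 <= index < k:
        raise ValueError(f"index {index} out of range for K={k}")
    return [1 if i == index else 0 for i in range(k)]


@dataclass
class ObjectLatents:
    position: Tuple[float, float]  # assumed (x, y) on the table-top plane
    scale: float                   # scaling factor
    azimuth: float                 # azimuthal rotation, assumed radians
    shape: List[int]               # 1-of-K shape encoding
    colour: RGB                    # object colour (RGB)

    def to_vector(self) -> List[float]:
        """Flatten this object's latents into one list of floats."""
        return [*self.position, self.scale, self.azimuth,
                *map(float, self.shape), *self.colour]


@dataclass
class DirectionalLight:
    strength: float
    azimuth: float
    elevation: float


@dataclass
class SceneLatents:
    objects: List[ObjectLatents]
    ground_colour: RGB        # random RGB colour of the ground plane
    camera_height: float      # height of the camera above the origin
    camera_elevation: float   # elevation angle of the downward view
    # The uniform lighting component is not described as parameterised,
    # so only the directional source is stored here.
    light: DirectionalLight


# Illustrative usage with made-up values.
mug = ObjectLatents(position=(0.1, -0.2), scale=1.0, azimuth=0.5,
                    shape=one_hot(1, 3), colour=(0.8, 0.1, 0.1))
scene = SceneLatents(objects=[mug], ground_colour=(0.5, 0.5, 0.5),
                     camera_height=1.5, camera_elevation=0.6,
                     light=DirectionalLight(1.0, 0.3, 0.9))
print(len(mug.to_vector()))  # 10
```

Flattening each object into a fixed-length vector is one convenient choice when the latents are used as regression targets; the dataset itself may store them differently.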