Annotated Objects for Visual Reasoning Dataset

Published: 24 February 2025 | Version 1 | DOI: 10.17632/bn5cbjts6j.1

Description

The AOVR-Dataset is a synthetic 3D dataset designed to facilitate research in visual reasoning and object detection. The dataset includes various 3D objects placed in different containers, each annotated with bounding boxes and natural language descriptions. The 3D models were created using Blender, and the captions were generated with a Large Language Model (LLM).

Steps to reproduce

The AOVR-Dataset was created using Blender for 3D scene generation and a Large Language Model (LLM) for natural language descriptions. Objects (cylinders, cubes, tori, cones, and spheres) were assigned randomized attributes: colors (e.g., blue, red, yellow), materials (metal, rubber), and sizes (small, big). They were then placed in various containers (e.g., shelf, table, crate, box). Bounding boxes were extracted using Blender’s Python API by converting each object's 3D coordinates into 2D annotations. Captions were generated by an LLM, which received metadata about each scene and produced structured descriptions. The dataset can be reproduced with Blender for rendering, Python for automation, and an LLM for text generation.
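
The 3D-to-2D bounding box step can be illustrated with a minimal sketch. It is not the dataset authors' script: it assumes a simple pinhole camera with points already expressed in camera space (x right, y up, z forward), and the names `project_point`, `bbox_2d`, `cube_corners`, and `focal_px` are hypothetical. It projects the eight corners of an object's 3D bounding box and takes the enclosing rectangle in pixel coordinates.

```python
from typing import Iterable, List, Tuple

Point3D = Tuple[float, float, float]
Box2D = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def project_point(p: Point3D, focal_px: float, width: int, height: int) -> Tuple[float, float]:
    """Project a camera-space point to pixel coordinates (assumed pinhole model)."""
    x, y, z = p
    if z <= 0:
        raise ValueError("point is behind the camera")
    u = width / 2 + focal_px * x / z   # image x grows to the right
    v = height / 2 - focal_px * y / z  # image y grows downwards
    return u, v


def bbox_2d(corners: Iterable[Point3D], focal_px: float, width: int, height: int) -> Box2D:
    """Return the 2D box enclosing the projected corners, clamped to the image."""
    pts = [project_point(c, focal_px, width, height) for c in corners]
    us = [u for u, _ in pts]
    vs = [v for _, v in pts]
    x_min = max(0.0, min(us))
    y_min = max(0.0, min(vs))
    x_max = min(float(width), max(us))
    y_max = min(float(height), max(vs))
    return x_min, y_min, x_max, y_max


def cube_corners(center: Point3D, half: float) -> List[Point3D]:
    """Eight corners of an axis-aligned cube (stand-in for an object's 3D bounding box)."""
    cx, cy, cz = center
    return [
        (cx + dx * half, cy + dy * half, cz + dz * half)
        for dx in (-1, 1)
        for dy in (-1, 1)
        for dz in (-1, 1)
    ]


# Example: a cube of half-size 1 centered 4 units in front of the camera.
box = bbox_2d(cube_corners((0.0, 0.0, 4.0), 1.0), focal_px=100.0, width=200, height=200)
```

Inside Blender, the same idea would more likely rely on the camera and scene settings through the Python API, for example via `bpy_extras.object_utils.world_to_camera_view`, which returns normalized camera-view coordinates; whether the dataset used that helper is not stated in this record.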

Institutions

Universidade Federal do Rio Grande do Norte

Categories

Object Detection, Multimodality