2D geometric shapes dataset

Published: 13 April 2020 | Version 1 | DOI: 10.17632/wzr2yv7r53.1


This dataset consists of nine types of 2D geometric shapes, each drawn on a 200x200 RGB image. During generation, the perimeter and the position of each shape are selected randomly and independently for each image, and the rotation angle of each shape is selected randomly within the interval from -180° to 180°. The background colour of each image and the fill colour of each shape are likewise selected randomly and independently. The published dataset comprises 9 classes, each representing one type of geometric shape (Triangle, Square, Pentagon, Hexagon, Heptagon, Octagon, Nonagon, Circle and Star), and each class contains 10,000 generated images. A GitHub URL to the generator source code is also included; the generator can be reused to produce a dataset of any desired size. The dataset aims to provide clean data for classification and clustering. Because it is generated synthetically, it can be used to study the behaviour of machine learning models independently of the nature of the data, and without the noise or data leakage that may be present in other datasets. Moreover, the choice of 2D geometric shapes makes it possible to understand, and to know in advance, the number of patterns contained in each class.
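
The sampling procedure described above can be illustrated with a minimal Python sketch. This is not the published generator: the function names, the radius range, and the use of the circumscribed-circle radius as a stand-in for the randomly chosen perimeter are all assumptions made for illustration. Only the seven regular polygons are covered; Circle and Star would need separate handling.

```python
import math
import random

IMAGE_SIZE = 200  # images are 200x200, as stated in the description

# Number of sides for each regular-polygon class named in the description.
POLYGON_SIDES = {
    "Triangle": 3, "Square": 4, "Pentagon": 5, "Hexagon": 6,
    "Heptagon": 7, "Octagon": 8, "Nonagon": 9,
}


def regular_polygon(n_sides, center, radius, rotation_deg):
    """Return the vertices of a regular polygon as a list of (x, y) tuples."""
    cx, cy = center
    offset = math.radians(rotation_deg)
    step = 2 * math.pi / n_sides
    return [
        (cx + radius * math.cos(offset + k * step),
         cy + radius * math.sin(offset + k * step))
        for k in range(n_sides)
    ]


def random_shape_params(rng, min_radius=20.0, max_radius=80.0):
    """Draw size, position, rotation and colours independently.

    The radius bounds are hypothetical; the description only says the
    perimeter is selected randomly.
    """
    radius = rng.uniform(min_radius, max_radius)
    # Keep the circumscribed circle fully inside the image.
    cx = rng.uniform(radius, IMAGE_SIZE - radius)
    cy = rng.uniform(radius, IMAGE_SIZE - radius)
    return {
        "radius": radius,
        "center": (cx, cy),
        "rotation_deg": rng.uniform(-180.0, 180.0),
        "background": tuple(rng.randrange(256) for _ in range(3)),
        "fill": tuple(rng.randrange(256) for _ in range(3)),
    }


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    params = random_shape_params(rng)
    vertices = regular_polygon(
        POLYGON_SIDES["Hexagon"],
        params["center"],
        params["radius"],
        params["rotation_deg"],
    )
    print(params)
    print(vertices)
```

The vertex list could then be passed to any 2D drawing routine to fill the polygon on a background of the sampled colour. Whether the published generator prevents the fill and background colours from coinciding is not stated in the description, so the sketch does not enforce it.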



Geometry, Machine Learning, Image Classification