Published: 31 October 2023 | Version 2 | DOI: 10.17632/2khchjbgzr.2
Juliana Marian Arrais, Sylvio Mantelli


The Clouds-1500 dataset is an extension of the original Clouds-1000 dataset, which can be accessed through this link: It comprises a collection of sky images taken between March 2021 and January 2023. The images were captured by ground-based cameras pointed towards the horizon, in the north and south directions, at the facilities of the Federal University of Santa Catarina and at the solarimetric station of the Photovoltaic Energy Laboratory, located in the Sapiens Technological Park. This dataset is part of the Machine Learning Methods Project for Nowcasting for Solar Energy, conducted by the Laboratory of Image Processing and Computer Graphics (LAPIX) of the National Institute of Digital Convergence (INCoD).

The images were annotated manually by a team comprising computer scientists, meteorologists, and an experienced sky observer from the Brazilian Air Force Base on Santa Catarina Island. Annotations were made using the polygon tool on the Supervisely platform to mark the clouds visible in each image.

This dataset was created to help machine learning algorithms identify clouds in images taken from ground-level locations using ordinary cameras. It employs a practical cloud height-based classification system that categorizes clouds into four groups: Cirriforms, Cumuliforms, Stratiforms, and Stratocumuliforms. Additionally, the annotations include a category representing background objects such as trees and buildings. This classification system is aimed at enhancing nowcasting in the solar energy sector by predicting the potential solar radiation absorption by clouds covering solar energy facilities. The decision to use these four groups stems from the need to forecast solar radiation interference efficiently, and from the fact that earlier attempts at finer classification led to inadequate learning outcomes with neural networks. This subpar performance is likely due to the inherent similarities among the actual cloud classes within each superclass and to the vague and transparent characteristics of clouds, which make precise classification challenging.

The dataset underwent several validation steps to ensure its quality and reliability. First, three team members inspected a randomly chosen subset of images to assess the consistency and quality of the manual annotations. In a second, semi-automated step, the dataset was split into training and validation sets. A semantic segmentation convolutional neural network (PPLite B2) was trained on this split and used to pinpoint the 100 images with the lowest classification scores. A meteorologist from the team then manually reviewed and corrected these images, as they were the most likely to contain annotation errors. This procedure helped to detect and amend remaining errors, enhancing the overall quality and reliability of the dataset.
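The semi-automated review step can be illustrated with a minimal sketch. It assumes that each image is scored by the mean intersection-over-union (IoU) between the predicted and the annotated mask; the description only says "classification scores", so the metric, the integer class ids, and the function names mean_iou and lowest_scoring are hypothetical choices made for illustration, not the authors' code.

```python
import numpy as np

# Class names follow the dataset description; the integer ids are an assumption.
CLASSES = ["background", "cirriform", "cumuliform", "stratiform", "stratocumuliform"]


def mean_iou(pred, truth, num_classes=len(CLASSES)):
    """Mean IoU over the classes that appear in at least one of the two masks."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, truth == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both masks, so it does not affect the score
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious)) if ious else 1.0


def lowest_scoring(scores, k=100):
    """Return the names of the k images with the lowest scores, worst first."""
    return sorted(scores, key=scores.get)[:k]


# Toy usage with 2 x 2 masks; real masks would have the image resolution.
truth = np.array([[0, 1], [1, 0]])
pred = np.array([[0, 1], [1, 1]])
scores = {"img_a.jpg": mean_iou(truth, truth), "img_b.jpg": mean_iou(pred, truth)}
to_review = lowest_scoring(scores, k=1)  # the image a reviewer would check first
```

Ranking by a per-image score makes the manual review budget explicit: with k=100 the reviewer sees only the images where the model and the annotation disagree most.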


Steps to reproduce

Nimbus Gazer uses motionEye version 0.41 and Motion version 4.2.2. The system is set to the GMT time zone, and all camera LEDs are disabled in the boot configuration file to prevent them from blinking. Image capture starts at 08:00 GMT and stops at 22:00 GMT; because our research lab is at GMT-3, this corresponds to 05:00 to 19:00 local time. This interval was chosen to capture only images with at least some level of sunlight. Installing the OS requires at least 32 GB of free memory. Motion is configured at its lowest available frame rate, 1 frame per minute, to match the time resolution of the sensor data from our lab, so one image is captured every minute. Images are captured at 2592 x 1944 resolution and stored in a local directory (by default /Nuvens/camtest/) before being uploaded to the cloud. The upload uses the built-in option for uploading to a Google Drive directory. For more details see the related link below.
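As a quick check of the capture schedule described above, the following sketch lists the expected capture timestamps for one day and converts them to lab time. The function name capture_times and the choice to treat the 22:00 stop time as exclusive are assumptions made for illustration; the actual motionEye scheduling may differ at the boundary.

```python
from datetime import date, datetime, timedelta, timezone

GMT = timezone.utc
LOCAL = timezone(timedelta(hours=-3))  # lab time zone stated in the text (GMT-3)


def capture_times(day, start_hour=8, stop_hour=22, interval_min=1):
    """GMT timestamps of one day's captures (start inclusive, stop exclusive)."""
    t = datetime(day.year, day.month, day.day, start_hour, tzinfo=GMT)
    stop = datetime(day.year, day.month, day.day, stop_hour, tzinfo=GMT)
    times = []
    while t < stop:
        times.append(t)
        t += timedelta(minutes=interval_min)
    return times


# Example day chosen arbitrarily within the collection period.
times = capture_times(date(2022, 1, 15))
images_per_day = len(times)  # 14 hours x 60 images per hour
first_local = times[0].astimezone(LOCAL).strftime("%H:%M")
last_local = times[-1].astimezone(LOCAL).strftime("%H:%M")
```

Under these assumptions each camera yields 840 images per day, with the first at 05:00 and the last at 18:59 local time, which is useful when checking a downloaded day for missing frames.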


Universidade Federal de Santa Catarina


Computer Vision, Image Processing, Photovoltaics, Machine Learning, Computer Imaging, Renewable Energy, Digital Imaging