Published: 10 November 2023 | Version 2 | DOI: 10.17632/bvts42fpfb.2


Advances in artificial intelligence and computer vision have made automation systems popular in a variety of fields. One of these is Human Activity Recognition (HAR), which has a number of novel and practical applications. Because HAR is a complex task, many issues and challenges remain to be addressed. One of the most challenging research areas within HAR is identifying toddler activities from various vision input devices. To enhance the safety and wellness of toddlers, their behavior has to be studied separately. This requires a comprehensive understanding of existing toddler activity systems, the datasets available for experiments, the challenges involved, and potential societal applications. Toddler activity recognition is a specialized subfield of computer vision and machine learning that focuses on automatically identifying and classifying activities performed by toddlers or young children. Vision-based toddler activity recognition plays a crucial role in monitoring developmental milestones, detecting anomalies, and supporting research in child development. Current human activity recognition primarily emphasizes adult activities. Toddler activities, while sharing similarities with those of adults, vary significantly in how they are performed, so there is ample opportunity to explore them as a separate research domain.
This paper presents an image dataset for recognizing toddler activities. The dataset consists of 2,389 raw images distributed across 11 activity classes. Each class contains over 195 images, giving a well-balanced distribution across the classes. The activity categories are crying, eating, drinking, smiling, sleeping, yawning, W-sitting, playing on a swing, playing with stacking rings, toddlers on a staircase, and toddlers with a knife.
The dataset was compiled by merging images obtained from Google with manually collected photographs. As part of the model building process, the raw images were preprocessed with auto orientation, contrast stretching, and resizing to 320x320 pixels (fit with white edges). To enhance the diversity of the original images, image augmentation was applied: horizontal flipping, grayscale applied to 10% of the images, brightness adjustments ranging from -25% to +25%, blurring of up to 1 pixel, and noise added to up to 3% of the pixels. The resulting training dataset consists of 12,331 images and was used for model construction. The authors used the Roboflow AutoML model, achieving a validation accuracy of 93.2%.
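The preprocessing and augmentation settings described above can be sketched in plain Python. This is an illustration only, not the Roboflow implementation. The function names are hypothetical, the 50% flip probability is an assumption (the description does not state one), and the centered padding is one plausible reading of the white-edge resize.

```python
import random

TARGET = 320  # output side length in pixels, as stated in the description


def fit_with_white_edges(width, height, target=TARGET):
    """Return (new_w, new_h, pad_left, pad_top) for an aspect-preserving fit.

    The image is scaled to fit inside a target x target canvas; the leftover
    area would be filled with white. Centering the image is an assumption.
    """
    scale = min(target / width, target / height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    pad_left = (target - new_w) // 2
    pad_top = (target - new_h) // 2
    return new_w, new_h, pad_left, pad_top


def contrast_stretch(pixels):
    """Linearly rescale 8-bit intensities so they span the full 0..255 range."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        # A flat image has no contrast to stretch; return it unchanged.
        return list(pixels)
    return [round((p - lo) * 255 / (hi - lo)) for p in pixels]


def sample_augmentation(rng):
    """Draw one set of augmentation parameters within the stated ranges."""
    return {
        "horizontal_flip": rng.random() < 0.5,     # assumption: 50% chance
        "grayscale": rng.random() < 0.10,          # grayscale on 10% of images
        "brightness": rng.uniform(-0.25, 0.25),    # -25% to +25%
        "blur_px": rng.uniform(0.0, 1.0),          # up to 1 pixel
        "noise_fraction": rng.uniform(0.0, 0.03),  # up to 3% of pixels
    }
```

For example, `fit_with_white_edges(640, 480)` scales a landscape image to 320x240 and leaves 40 pixels of white above and below it, and `sample_augmentation(random.Random(0))` draws one reproducible parameter set.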


Steps to reproduce

Once you have downloaded the ZIP file, unzip it to extract the dataset, which is then ready to use. You can check the Roboflow AutoML model's output at the following URL: https://universe.roboflow.com/kuvempu/vtar11/dataset/7.
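As a minimal sketch of the extraction step, the following Python helper unzips the archive and counts the images per folder. It assumes the archive stores one folder per activity class, which this page does not confirm. The function name and the list of image extensions are hypothetical.

```python
import zipfile
from collections import Counter
from pathlib import Path

# Assumed image extensions; adjust to match the files in the archive.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def extract_and_count(zip_path, out_dir):
    """Unzip the archive and count image files by their parent folder name."""
    out_dir = Path(out_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out_dir)

    counts = Counter()
    for path in out_dir.rglob("*"):
        # Only files with a known image extension are counted.
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            counts[path.parent.name] += 1
    return counts
```

Calling `extract_and_count("dataset.zip", "vtar11")`, where both names are placeholders, returns a `Counter` keyed by folder name. If the folder-per-class assumption holds, its totals can be compared with the 2,389 raw images and 11 classes stated in the description.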


Maharaja Institute of Technology, University of Mysore


Computer Vision, Image Processing, Pattern Recognition