CAUCAFall is a database for human fall recognition. Its distinguishing feature is that it was recorded in an uncontrolled home environment, with occlusions, changes in lighting (natural, artificial, and night), varied participant clothing, movement in the background, different textures on the floor and room, varied fall angles, and different distances from the camera to the fall. The participants differ in age, weight, height, and even dominant leg. These conditions contribute to real progress in fall recognition research. In addition, the proposed database is the only one that contains segmentation labels for each of its images, which makes it possible to implement human fall recognition methods with YOLO detectors.
Subjects simulated 5 types of falls and 5 types of activities of daily living (ADLs). The falls are forward falls, backward falls, falls to the left, falls to the right, and falls arising from sitting. The ADLs are walking, jumping, picking up an object, sitting, and kneeling. Frames that recorded human falls were labeled as "fall" and frames that recorded ADLs were labeled as "no-fall". The data are organized into 10 main directories, one per subject. Each directory contains 10 folders, one per activity performed by the participant, and each folder holds a video of the action in .avi format, the frames of the action in .png format, and the segmentation label of each frame in .txt format.
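The directory layout described above (subject directories, activity folders, and per-frame .png and .txt files) can be traversed with a short script. The following is only a minimal sketch: the function name index_dataset is hypothetical, the actual directory names are not assumed, and the pairing of each image with a label file of the same base name is an assumption that should be checked against the downloaded data.

```python
from pathlib import Path


def index_dataset(root):
    """Collect (subject, activity, image_path, label_path) tuples.

    Assumes the layout root/<subject>/<activity>/<frame>.png and that a
    frame's label, if present, is a .txt file with the same base name
    (an assumption, not confirmed by the dataset description).
    label_path is None when no matching .txt file exists.
    """
    samples = []
    root = Path(root)
    # First level: one directory per subject.
    for subject_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Second level: one folder per activity (fall or ADL).
        for activity_dir in sorted(p for p in subject_dir.iterdir() if p.is_dir()):
            # Frames are the .png files; the .avi video is ignored here.
            for image_path in sorted(activity_dir.glob("*.png")):
                label_path = image_path.with_suffix(".txt")
                samples.append((
                    subject_dir.name,
                    activity_dir.name,
                    image_path,
                    label_path if label_path.exists() else None,
                ))
    return samples
```

Calling index_dataset on the folder where the database was extracted returns one tuple per frame, which can then be grouped by subject to build subject-wise training and test splits.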
Steps to reproduce
The data were obtained with a HIKVISION IR camera fixed in the upper corner of the stage wall, a position that covers a larger field of view for monitoring the user's activity. The camera was connected to a HIKVISION DVR with a built-in 1 TB hard disk for video storage and processing. The DVR was programmed to detect motion, so recording starts at the exact moment the individual enters the scene. The camera captures video at 23 fps at a resolution of 1080 × 960 pixels under changing illumination, i.e., natural, low, or even no light. The labels that mark each image as "fall" or "non-fall" were written manually with a text editor. It should be noted that the database was recorded in an uncontrolled environment, with occlusions, changes in lighting (natural, artificial, and night), varied participant clothing, movement in the background, different textures on the floor and room, varied fall angles, and different distances from the camera to the fall, and with participants of different age, weight, height, and even dominant leg.
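Because the labels are plain-text files intended for YOLO detectors, a label line can be read with a few lines of code. The sketch below assumes the widely used YOLO text format, in which each line holds a class index followed by a box centre, width, and height normalized to the image size. Whether the files in this database follow exactly that format, and which class index corresponds to "fall", are assumptions to verify against the data. The function name parse_yolo_label is hypothetical, and the default frame size is the 1080 × 960 resolution stated above.

```python
def parse_yolo_label(line, img_w=1080, img_h=960):
    """Convert one 'class xc yc w h' label line to a pixel bounding box.

    Assumes the common YOLO text format with coordinates normalized to
    [0, 1]; the dataset description does not confirm this layout.
    Returns (class_id, (x_min, y_min, x_max, y_max)) in pixels.
    """
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    xc, yc, w, h = (float(v) for v in parts[1:])
    # Scale the normalized centre/size representation to pixel corners.
    x_min = (xc - w / 2) * img_w
    y_min = (yc - h / 2) * img_h
    x_max = (xc + w / 2) * img_w
    y_max = (yc + h / 2) * img_h
    return class_id, (x_min, y_min, x_max, y_max)
```

For example, a line reading "1 0.5 0.5 0.5 0.5" would describe a box centred in the frame that spans half of its width and half of its height.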