MoCA: Multi-view Cooking Actions Dataset

Published: 11 October 2022 | Version 1 | DOI: 10.17632/95n9scp9zc.1
Elena Nicora,


MoCA is a bi-modal dataset with Motion Capture (MoCap) data and video sequences acquired from multiple views. The focus is on upper-body actions in a cooking scenario. A specific goal is to investigate view-invariant action properties in both biological and artificial systems; in this sense, the dataset may be of interest to multiple research communities in the cognitive and computational domains. The dataset includes 20 cooking actions involving either one or both arms of the volunteer, some of them performed with tools that may require different forces. Three viewpoints were considered for the acquisitions: lateral, egocentric, and frontal. For each action, a training and a test sequence are available, each containing, on average, 25 repetitions of the action. Acquisitions of more structured activities are also included, in which the actions are performed in sequence toward a final, more complex goal. An annotation is available, which segments single action instances in terms of time instants in the MoCap reference frame. A function then maps these time instants to the corresponding frames in the video sequences. In addition, functionalities to load, segment, and visualize the data are provided in Python and Matlab.
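As an illustration of the time-to-frame mapping mentioned above, here is a minimal Python sketch. It is not the function shipped with the dataset. The names `mocap_time_to_video_frame` and `segment_to_frame_range` are hypothetical, and the sketch assumes that annotated instants are expressed in seconds, that the video has a constant frame rate, and that the two streams differ at most by a fixed time offset. The actual units, frame rates, and synchronization should be taken from the dataset's own documentation.

```python
def mocap_time_to_video_frame(t_mocap, video_fps, offset_s=0.0, n_frames=None):
    """Map a MoCap time instant to a 0-based video frame index.

    Assumptions (not taken from the dataset): t_mocap and offset_s are in
    seconds, video_fps is constant, and offset_s is the MoCap time at which
    video frame 0 was captured.
    """
    if video_fps <= 0:
        raise ValueError("video_fps must be positive")
    # Shift into the video time base, then convert seconds to frames.
    frame = int(round((t_mocap - offset_s) * video_fps))
    # Clamp to the valid frame range of the video.
    frame = max(frame, 0)
    if n_frames is not None:
        frame = min(frame, n_frames - 1)
    return frame


def segment_to_frame_range(t_start, t_end, video_fps, offset_s=0.0, n_frames=None):
    """Map one annotated action instance (start/end instants) to video frames."""
    if t_end < t_start:
        raise ValueError("t_end must not precede t_start")
    start = mocap_time_to_video_frame(t_start, video_fps, offset_s, n_frames)
    end = mocap_time_to_video_frame(t_end, video_fps, offset_s, n_frames)
    return start, end
```

For example, under these assumptions an action instance annotated between 2.0 s and 4.0 s maps to frames 60 through 120 of a 30 fps video with zero offset.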



Università degli Studi di Genova, Istituto Italiano di Tecnologia


Computer Vision, Motor Control, Motion Capture, Movement Analysis, Action Recognition