University of Washington Indoor Object Manipulation (UW IOM) Dataset
The University of Washington Indoor Object Manipulation (UW IOM) dataset comprises videos, with corresponding skeletal tracking information, of twenty participants aged 18 to 25, fifteen male and five female. The videos are recorded using a Kinect Sensor for Xbox One at an average rate of twelve frames per second. Each participant carries out the same set of tasks: picking up six objects (three identical empty boxes and three identical rods) from three different vertical racks, placing them on a table, putting them back on the racks from which they were taken, and then walking out of the scene carrying the box from the middle rack. The boxes are manipulated with both hands, while the rods are manipulated with only one hand. The tasks are repeated in the same sequence three times, so that every video is approximately three minutes long. We categorize the actions into seventeen labels, where each label follows a four-tier hierarchy. The first tier indicates whether the box or the rod is manipulated, the second tier denotes human motion (walk, stand, or bend), the third tier captures the type of object manipulation, if applicable (reach, pick-up, place, or hold), and the fourth tier represents the relative height of the surface where the manipulation takes place (low, medium, or high).
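The four-tier label hierarchy described above can be sketched in Python as follows. This is a minimal illustration, not the dataset's actual annotation format: the underscore-delimited label string, the "none" placeholder for an inapplicable tier, and the names ActionLabel, parse_label, and approx_frame_count are all assumptions made for this example. Only the tier vocabularies, the twelve frames per second rate, and the three-minute duration come from the description.

```python
from dataclasses import dataclass
from typing import Optional

# Tier vocabularies taken from the dataset description.
OBJECTS = ("box", "rod")
MOTIONS = ("walk", "stand", "bend")
MANIPULATIONS = ("reach", "pick-up", "place", "hold")
HEIGHTS = ("low", "medium", "high")


@dataclass(frozen=True)
class ActionLabel:
    """One action label split into its four tiers (hypothetical container)."""
    obj: str                     # tier 1: object being manipulated
    motion: str                  # tier 2: human motion
    manipulation: Optional[str]  # tier 3: manipulation type, if applicable
    height: Optional[str]        # tier 4: relative surface height, if applicable


def _optional_tier(value: str, allowed: tuple, name: str) -> Optional[str]:
    """Map the assumed 'none' placeholder to None, otherwise validate."""
    if value == "none":
        return None
    if value not in allowed:
        raise ValueError(f"unknown {name}: {value!r}")
    return value


def parse_label(text: str) -> ActionLabel:
    """Parse an ASSUMED 'object_motion_manipulation_height' label string."""
    parts = text.strip().lower().split("_")
    if len(parts) != 4:
        raise ValueError(f"expected four tiers, got {len(parts)}: {text!r}")
    obj, motion, manipulation, height = parts
    if obj not in OBJECTS:
        raise ValueError(f"unknown object: {obj!r}")
    if motion not in MOTIONS:
        raise ValueError(f"unknown motion: {motion!r}")
    return ActionLabel(
        obj=obj,
        motion=motion,
        manipulation=_optional_tier(manipulation, MANIPULATIONS, "manipulation"),
        height=_optional_tier(height, HEIGHTS, "height"),
    )


def approx_frame_count(duration_s: float, fps: float = 12.0) -> int:
    """Rough frame count for a video, using the stated average frame rate."""
    return round(duration_s * fps)


if __name__ == "__main__":
    print(parse_label("box_bend_pick-up_low"))
    print(approx_frame_count(3 * 60))  # a three-minute video
```

The sketch only checks that each tier draws from its stated vocabulary. It does not enumerate the seventeen labels actually used, since they are not listed here and cover only a subset of the possible tier combinations. The frame count is an estimate, because both the frame rate and the video duration are given as averages or approximations.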