A Multi-Domain Video Moment Retrieval Dataset

Published: 5 January 2026 | Version 1 | DOI: 10.17632/sn4g76n5xx.1

Description

Description: This dataset contains 200 annotated video clips collected for Video Moment Retrieval (VMR) tasks. The videos are divided into four categories (Surveillance, Road Accidents, Football, and Cooking), with 50 videos in each category. It is designed to support the retrieval of precise video moments based on natural language queries.

Dataset Content: Videos are collected from online sources and manually trimmed to retain relevant content. Each video is annotated with temporal boundaries and textual descriptions. The dataset includes:

  • Surveillance (50 videos)
  • Road Accidents (50 videos)
  • Football (50 videos)
  • Cooking (50 videos)

Purpose: This dataset is designed to facilitate research in Video Moment Retrieval by enabling accurate localization of relevant temporal segments in videos using natural language queries. It aims to support the development of robust cross-modal models that effectively align visual events with textual descriptions, contributing to applications such as surveillance analysis, sports event detection, accident understanding, and instructional video retrieval.
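
To make "temporal boundaries" and "accurate localization" concrete, the sketch below shows one way an annotated moment could be represented and how a predicted segment might be scored against it using temporal intersection-over-union (tIoU), a measure commonly used in moment retrieval work. The Moment class, its field names, and the use of seconds as the time unit are assumptions for illustration; the dataset description does not specify its schema or an evaluation metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Moment:
    """One annotated moment (hypothetical schema, times assumed in seconds)."""
    video_id: str
    start: float
    end: float
    description: str


def temporal_iou(pred_start, pred_end, gt_start, gt_end):
    """Overlap of two time intervals divided by the span they jointly cover."""
    intersection = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    union = (pred_end - pred_start) + (gt_end - gt_start) - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical annotation and a model prediction for the same query.
truth = Moment("cooking_01", 5.0, 15.0, "The cook chops the onions.")
score = temporal_iou(0.0, 10.0, truth.start, truth.end)
print(f"tIoU = {score:.3f}")
```

A prediction is typically counted as correct when its tIoU with the annotated moment exceeds a chosen threshold such as 0.5; the threshold is a choice left to the evaluation protocol, not something this dataset prescribes.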

Steps to reproduce

Videos were collected from online streaming platforms and YouTube based on the selected categories: Surveillance, Road Accidents, Football, and Cooking. Irrelevant segments were removed using video editing software to retain only meaningful content. A subtitle annotation tool was used to mark temporal boundaries and provide descriptive captions for each significant moment. The annotated data were then structured into CSV files containing video IDs, start–end times, and textual descriptions. Finally, a Python script was used to merge all information into a single unified dataset, ensuring consistency and readiness for Video Moment Retrieval tasks.
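
The final merge step could look roughly like the sketch below. It is not the authors' script: the column names (video_id, start, end, description), the added category column, the one-CSV-per-category layout, and the assumption that times are stored as seconds are all hypothetical, since the actual file layout is not described here.

```python
import csv

# Assumed column names; the real CSV headers are not given in the source.
FIELDS = ["video_id", "start", "end", "description"]


def merge_annotations(category_files, out_path):
    """Merge per-category CSVs into one CSV with an added 'category' column.

    category_files maps a category name to the path of its annotation CSV.
    Returns the merged rows as a list of dicts.
    """
    rows = []
    for category, path in category_files.items():
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                start, end = float(row["start"]), float(row["end"])
                # Basic consistency check: a moment must have positive length.
                if end <= start:
                    raise ValueError(
                        f"{path}: invalid interval for {row['video_id']}"
                    )
                rows.append({
                    "category": category,
                    "video_id": row["video_id"],
                    "start": start,
                    "end": end,
                    "description": row["description"].strip(),
                })

    # Stable, predictable ordering in the unified file.
    rows.sort(key=lambda r: (r["category"], r["video_id"], r["start"]))

    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["category"] + FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

A call such as merge_annotations({"Cooking": "cooking.csv", "Football": "football.csv"}, "merged.csv") would then produce a single file; the file names here are placeholders. Rejecting intervals whose end does not come after their start is one simple way to enforce the consistency the merge step is meant to ensure.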

Institutions

  • Daffodil International University
  • United International University

Categories

Accident, Theft, Closed-Circuit Television Camera, Cooking, Soccer

Licence