Multi-language Video Subtitle Dataset

Published: 29 November 2021 | Version 2 | DOI: 10.17632/gj8d88h2g3.2

Description

The video subtitle images were collected from 24 videos shared on Facebook and YouTube. The subtitle text is in Thai and English and comprises Thai characters, Roman characters, Thai numerals, Arabic numerals, and special characters, for 157 characters in total. In the data-preprocessing step, we converted all 24 videos to images and obtained 2,700 images containing subtitle text. Each image is 1280x720 pixels and is stored in JPG format. We then generated the ground truth from 4,224 subtitle images using the labelImg program and assigned a label to each subtitle image. Note that the number before each label is the order of the subtitle text image.
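
To illustrate the labelling convention described above (an order number followed by the label), here is a minimal Python sketch. The description does not state the file layout or the delimiter, so the whitespace separator, the function name parse_label_line, and the SubtitleLabel type are all assumptions made for illustration and should be checked against the actual label files.

```python
from typing import NamedTuple


class SubtitleLabel(NamedTuple):
    order: int  # order of the subtitle text image
    text: str   # subtitle transcription (Thai and/or English)


def parse_label_line(line: str) -> SubtitleLabel:
    """Split one label line into its leading order number and the label text.

    ASSUMPTION: the number and the label are separated by whitespace.
    The dataset description does not specify the delimiter.
    """
    parts = line.strip().split(maxsplit=1)
    if len(parts) != 2:
        raise ValueError(f"expected '<number> <label>', got: {line!r}")
    number, text = parts
    try:
        order = int(number)
    except ValueError as exc:
        raise ValueError(f"label line does not start with a number: {line!r}") from exc
    return SubtitleLabel(order=order, text=text)


# Hypothetical label line, not taken from the dataset.
example = parse_label_line("12 สวัสดี Hello")
```

Splitting only on the first run of whitespace keeps any spaces inside the subtitle text intact, which matters for mixed Thai and English labels.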

Institutions

Mahasarakham University

Categories

Word Recognition, Convolutional Neural Network, Long Short-Term Memory Network
