MODI-HHDoc: Historical MODI Script Handwritten Document Dataset

Published: 3 April 2023| Version 1 | DOI: 10.17632/sg337vf6wn.1
Contributors:
Manisha Deshmukh,

Description

Around from 12th century MODI script was used to write Indian languages as Marathi, Hindi, and Gujarati etc. It was used as administrative script from 17th century to mid of 19th century in Maharashtra state (India). At present, MODI script users are diminishing away, and countable persons can understand the MODI script. The preserved archaic historical MODI handwritten documents contained important and rare cultural, historic, and administrative kind of information which is usable in present-days. The significant information related to current era is preserved in the thousands of the archaic handwritten MODI documents at official and public sectors. MODI-HHDoc Dataset is a collection of three thousand three hundred and fifty handwritten ancient MODI document images. This dataset can be used to develop the handwritten ancient MODI document digitization, recognition, transcription, and transliteration system to gain the information written in MODI script. This dataset is collected in such way that the system should be robust enough to adapt all the variations in approach. • In any resultant publications of research that uses the dataset, due credits will be provided to one or more following publications:  Deshmukh M. S., Patil, M. P., & Kolhe S. R. (2018). A hybrid text line segmentation approach for the ancient handwritten unconstrained freestyle Modi script documents. The Imaging Science Journal, 66(7), 433-442.  Deshmukh M. S., Patil M. P., & Kolhe S. R. (2017). The divide-and-conquer based algorithm to detect and correct the skew angle in the old age historical handwritten Modi Lipi documents. Int J Comput Sci Appl, 14(2), 47-63.  Deshmukh M. S., & Kolhe S. R. (2021). A modified approach for the segmentation of unconstrained cursive Modi touching characters cluster. In Recent Trends in Image Processing and Pattern Recognition: Third International Conference, RTIP2R 2020, Aurangabad, India, January 3–4, 2020, Revised Selected Papers, Part I 3 (pp. 431-444). Springer Singapore.  Deshmukh M. S., Patil M. P., & Kolhe S. R. (2017, September). A dynamic statistical nonparametric cleaning and enhancement system for highly degraded ancient handwritten Modi Lipi documents. In 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI) (pp. 1545-1551). IEEE.  Deshmukh M. S., & Kolhe S. R. (2022). A New Approach for Unified Characters Cluster Segmentation of Ancient Handwritten Modi Documents. In Computer Vision and Robotics: Proceedings of CVR 2021 (pp. 511-526). Singapore: Springer Singapore.  Deshmukh M. S., & Kolhe S. R. (2021, March). Unsupervised Page Area Detection Approach for the Unconstrained Chronic Handwritten Modi Document Images. In 2021 International Conference on Emerging Smart Computing and Informatics (ESCI) (pp. 130-135). IEEE. 

Files

Steps to reproduce

MODI-HHDoc dataset is archived in a zip file. MODI document images are stored in the twenty four folders named as 1, 2, 3….and the document image files are stored as page1.jpg, page2.jpg …. User can change the folder and image file name as per his/her convenience. Particulars of these folders are described in the description section of the dataset. Researcher can use this dataset for different purposes like to create MODI character annotation, document image preprocessing, segmentation, classification etc. The users of the MODI-HHDoc Dataset must agree that: • Use of the data set is restricted to research purpose only. • No redistribution of the dataset is allowed. MODI-HHDoc dataset contains three thousand and ten original MODI documents which are collected from different libraries or archives as given below. • Rajwade Sanshodhan Mandir, Dhule • Shri Samartha Vagdevta Mandir, Dhule • History Department, KBCNMU, Jalgaon • Dr. Pushkar Shasri (History scholar), Shahada, • History scholar (Kolhapur university, Kolhapur) etc. All MODI document images are of different sizes and of JPG format. These document images are captured through scanner or camera having different resolution.

Categories

Artificial Intelligence, Computer Vision, Image Processing, Machine Learning

Licence