BengaliPrintDB: A Repository of Machine-Printed Bengali Documents

Published: 21 May 2024 | Version 1 | DOI: 10.17632/5d9dtxkpmw.1
Contributors:
Sriparna Banerjee

Description

Distinguishing handwritten from machine-printed documents is an important step in OCR applications because the two types call for different processing methods. Handwritten text demands specialized recognition algorithms, such as neural networks with LSTM layers, that can cope with complex and varied writing styles. Machine-printed text, in contrast, can often be handled by simpler algorithms such as template matching. Pre-processing differs as well: handwritten documents require style normalization and handling of cursive writing, while machine-printed documents mainly need binarization and noise reduction. Feature extraction for handwritten text captures loops and slant, whereas for machine-printed text it emphasizes geometric properties. Training data selection also differs: handwritten OCR models need diverse datasets, while machine-printed OCR models can be trained on uniform fonts. Because selective processing based on document type improves efficiency and adaptive learning strategies improve overall OCR performance, differentiating handwritten from machine-printed documents is essential for optimizing OCR accuracy and efficiency.
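The binarization step mentioned above for machine-printed pages can be illustrated with a minimal sketch. It uses Otsu's global threshold, which is a common choice but is an assumption here: the dataset description does not name a specific binarization method. The function names `otsu_threshold` and `binarize` are hypothetical, and the input is assumed to be an 8-bit grayscale image stored as a list of rows.

```python
def otsu_threshold(pixels):
    """Return an Otsu threshold t for 8-bit gray levels (0..255).

    Pixels with value <= t form one class (dark, assumed ink) and
    pixels with value > t form the other (bright, assumed paper).
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels with value <= t
    sum0 = 0    # sum of gray levels of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue        # dark class still empty
        w1 = total - w0
        if w1 == 0:
            break           # bright class empty from here on
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(image):
    """Map a grayscale image (list of rows) to 1 = ink, 0 = background."""
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[1 if p <= t else 0 for p in row] for row in image]


# Hypothetical 2x3 scan: dark values are printed strokes, bright values are paper.
page = [[20, 30, 200],
        [210, 25, 220]]
print(binarize(page))
```

A global threshold of this kind suits clean, evenly lit printed pages. Scans with uneven illumination or heavy noise would usually need a local (adaptive) threshold and a denoising pass, neither of which is shown here.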

Institutions

Jadavpur University Faculty of Engineering and Technology

Categories

Document Analysis, Optical Character Recognition, Bengali Language

Licence