BengaliPrintDB: A Repository of Machine-Printed Bengali Documents

Name: BengaliPrintDB: A Repository of Machine-Printed Bengali Documents
Creator: Bidisha Samanta
Published: 2024-05-21T15:10:59.958Z
Keywords: Document Analysis, Optical Character Recognition, Bengali Language

Samanta, Bidisha; Banerjee, Sriparna; Sinha Chaudhuri, Sheli

doi:10.17632/5d9dtxkpmw.1

Published: 21 May 2024 | Version 1 | DOI: 10.17632/5d9dtxkpmw.1

Contributors:

Bidisha Samanta, Sriparna Banerjee, Sheli Sinha Chaudhuri

Description

Distinguishing between handwritten and machine-printed documents is vital in OCR applications because the two require different processing methods. Handwritten text demands specialized recognition algorithms, such as neural networks with LSTM layers, that can cope with complex and varied writing styles. Machine-printed text, in contrast, can be handled by simpler algorithms such as template matching. Pre-processing also differs: for handwritten documents it involves normalizing writing styles and handling cursive writing, while for machine-printed documents it focuses on tasks such as binarization and noise reduction. Feature extraction for handwritten text captures properties such as loops and slant, whereas for machine-printed text it emphasizes geometric properties. Training data selection differs as well, with diverse datasets for handwritten OCR models and uniform fonts for machine-printed OCR models. Because selective processing based on document type improves efficiency, and adaptive learning strategies improve overall OCR performance, differentiating handwritten from machine-printed documents is essential: tailoring techniques to each document type optimizes both OCR accuracy and efficiency.
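The description names binarization and noise reduction as the main pre-processing steps for machine-printed pages, and selective processing by document type as a source of efficiency. The following is a minimal sketch of that routing, assuming a grayscale page given as a list of pixel rows and a document-type label already produced by a separate classifier. Otsu's global threshold and a 3x3 median filter are stand-in choices for binarization and noise reduction; the dataset description does not say which methods were used. The names `preprocess`, `PIPELINES`, `median_filter3` and the label strings are hypothetical, and the handwritten branch is only a placeholder.

```python
from typing import Callable, Dict, List

# Grayscale page: a list of pixel rows, each value in 0..255.
Image = List[List[int]]


def otsu_threshold(img: Image) -> int:
    """Global threshold maximizing between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in img:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    sum_lo = 0.0    # intensity sum of pixels <= t
    weight_lo = 0   # number of pixels <= t
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_lo += hist[t]
        sum_lo += t * hist[t]
        if weight_lo == 0:
            continue
        weight_hi = total - weight_lo
        if weight_hi == 0:
            break  # all pixels are <= t, so no upper class remains
        mean_lo = sum_lo / weight_lo
        mean_hi = (sum_all - sum_lo) / weight_hi
        var_between = weight_lo * weight_hi * (mean_lo - mean_hi) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(img: Image) -> Image:
    """Map pixels above the Otsu threshold to white (255) and the rest to black (0)."""
    t = otsu_threshold(img)
    return [[255 if p > t else 0 for p in row] for row in img]


def median_filter3(img: Image) -> Image:
    """3x3 median filter; at the borders the window is clipped to the image."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            window = sorted(
                img[yy][xx]
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2))
            )
            out[y][x] = window[len(window) // 2]
    return out


# Hypothetical dispatch table: one pre-processing pipeline per document type.
PIPELINES: Dict[str, Callable[[Image], Image]] = {
    # Machine-printed: reduce noise first, then binarize.
    "machine_printed": lambda img: binarize(median_filter3(img)),
    # Placeholder: handwriting-specific steps (style/slant normalization,
    # cursive handling) are not sketched here; the page is returned as a copy.
    "handwritten": lambda img: [row[:] for row in img],
}


def preprocess(img: Image, doc_type: str) -> Image:
    """Route a page to the pipeline for its (externally predicted) document type."""
    return PIPELINES[doc_type](img)
```

For example, a 6x6 page with a two-pixel-wide dark stroke (value 30) on a light background (value 200) and one isolated dark speck in a corner comes out of the `"machine_printed"` pipeline with the speck removed and the stroke mapped to 0 on a 255 background. Reducing noise before binarizing is one reasonable ordering, not something the dataset description prescribes.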

Institutions

Jadavpur University Faculty of Engineering and Technology