Forged Handwritten Document Database

Published: 8 February 2023 | Version 1 | DOI: 10.17632/5bmyz97y7f.1


Dataset Description: This dataset contains forged handwritten document images. It covers ten classes, one of unaltered originals and nine of forgery operations: (i) Normal, (ii) Copy-Paste, (iii) Insertion, (iv) Copy-Paste + Insertion, (v) Copy-Paste + Noise, (vi) Copy-Paste + Blur, (vii) Noise, (viii) Blur, (ix) Insertion + Noise, and (x) Insertion + Blur. Each class contains 50 images, so the ten classes considered for experimentation hold 500 images in total (50 original and 450 forged).

The original handwritten document images were collected from the class assignment notebooks of students at Rani Channamma University, Belagavi, Karnataka, India. The documents were scanned with a LaserJet M1136 MFP scanner at 200 DPI, and the forgery operations were then performed on the scanned images. The operations combine copy-paste, insertion, noise, and blur, with noise and blur added to both the original and the forged images.

To alter text, the authors used Adobe Photoshop, a readily available tool, to create forgeries at the word level. Noise and blur were added with Gaussian Noise and Gaussian Blur operations. In a copy-paste operation, part of an image is copied from the same document, or target words are pasted into a different document. In an insertion operation, the eraser tool from the menu is used to delete a portion of a word, and different characters are pasted in using the insert option, resulting in a forged document. The primary difference between the two is that an insertion operation changes a portion of an image, whereas a copy-paste operation changes the entire image.
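The Gaussian noise and Gaussian blur operations mentioned above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' procedure (the forgeries in this dataset were produced in Adobe Photoshop); the function names, the `sigma` values, and the representation of a grayscale image as a list of rows with values 0 to 255 are all assumptions made for illustration.

```python
import math
import random


def add_gaussian_noise(img, sigma=10.0, seed=None):
    """Add zero-mean Gaussian noise to a grayscale image (list of rows, 0-255)."""
    rng = random.Random(seed)
    return [
        [min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in row]
        for row in img
    ]


def gaussian_kernel(sigma, radius=None):
    """Return a normalised 1-D Gaussian kernel of length 2 * radius + 1."""
    if radius is None:
        radius = max(1, int(3 * sigma))  # assumed cut-off at about 3 sigma
    weights = [
        math.exp(-(x * x) / (2.0 * sigma * sigma))
        for x in range(-radius, radius + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def _convolve_rows(img, kernel):
    """Convolve every row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    width = len(img[0])
    out = []
    for row in img:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                xi = min(width - 1, max(0, x + k - radius))
                acc += w * row[xi]
            new_row.append(acc)
        out.append(new_row)
    return out


def _transpose(img):
    return [list(col) for col in zip(*img)]


def gaussian_blur(img, sigma=1.0):
    """Separable Gaussian blur: a horizontal pass followed by a vertical pass."""
    kernel = gaussian_kernel(sigma)
    horizontal = _convolve_rows(img, kernel)
    vertical = _transpose(_convolve_rows(_transpose(horizontal), kernel))
    return [[min(255, max(0, round(p))) for p in row] for row in vertical]
```

A combined class such as Copy-Paste + Noise could be approximated by applying `add_gaussian_noise` to a region of an already altered image. The strength of noise and blur used in the dataset is not stated, so any `sigma` value chosen here is a guess.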
When multiple operations are performed on a word, one operation acts on a small portion of the word and the other operation acts on other portions of the word. This applies to Copy-Paste + Noise, Insertion + Noise, Copy-Paste + Blur, Insertion + Blur, and Copy-Paste + Insertion. The dataset can be used for research in the area of fake image identification.

Instructions:
1. There are ten different sub-folders corresponding to the ten different classes of forgery operation.
2. Each sub-folder contains the handwritten document images of the corresponding class.
3. All original scanned and forged handwritten document images are in .jpg format.
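The folder layout described in the instructions can be indexed with a short sketch like the one below. The sub-folder names are not given in this description, so the name `Normal` for the folder of originals, the `root` path passed in, and the helper names `index_dataset` and `summarize` are hypothetical.

```python
from pathlib import Path


def index_dataset(root, original_folder="Normal"):
    """Map each class sub-folder to its sorted .jpg files and a forged flag.

    `original_folder` is an assumed name for the class of unaltered images.
    """
    index = {}
    for sub in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        images = sorted(f for f in sub.iterdir() if f.suffix.lower() == ".jpg")
        index[sub.name] = {
            "images": images,
            "forged": sub.name != original_folder,
        }
    return index


def summarize(index):
    """Return (original_count, forged_count) over the whole index."""
    original = sum(len(v["images"]) for v in index.values() if not v["forged"])
    forged = sum(len(v["images"]) for v in index.values() if v["forged"])
    return original, forged
```

If the extracted archive matches the description (ten sub-folders of 50 images each), `summarize(index_dataset(root))` should report 50 original and 450 forged images. A different result would suggest that the folder holding the originals has a name other than the one assumed here.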



Rani Channamma University


Computer Science, Computer Forensics, Forensic Analysis, Forgery of Document