ITC-MNP: A diverse dataset for image file fragment classification
The dataset contains image file fragments, each 4096 bytes long, drawn from five formats (JPG, BMP, GIF, PNG, and TIFF), each produced with different conversion settings. The source images fall into three content categories: Nature, People, and Medical. In total, the dataset contains 501,000 fragments. The fragments include file headers and incomplete end-of-file fragments; the latter are completed with random bytes, mimicking the way operating systems handle data whose size is not a multiple of the sector size. This simulates real-world scenarios in which fragments are recovered from a hard drive. The Results section of the accompanying supporting document demonstrates the effectiveness of our approach to collecting this dataset. The fragments are stored in *.dat files and can be accessed with the Frag_extractor MATLAB script.
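
The padding of incomplete end-of-file fragments described above can be illustrated with a minimal Python sketch. This is not the authors' collection code and it does not read the *.dat files; the function name split_into_fragments and the use of os.urandom as the source of random bytes are assumptions made for illustration only.

```python
import os

FRAGMENT_SIZE = 4096  # fragment length stated in the dataset description


def split_into_fragments(data: bytes, size: int = FRAGMENT_SIZE) -> list:
    """Split a file's bytes into fixed-size fragments.

    Hypothetical helper: the last fragment, if shorter than `size`,
    is completed with random bytes so every fragment has equal length.
    """
    fragments = []
    for offset in range(0, len(data), size):
        chunk = data[offset:offset + size]
        if len(chunk) < size:
            # Incomplete end-of-file fragment: fill the remainder
            # with random bytes (assumed source: os.urandom).
            chunk += os.urandom(size - len(chunk))
        fragments.append(chunk)
    return fragments
```

For a 10,243-byte input this yields three fragments: the first holds the file header bytes, and the third keeps the 2,051 remaining file bytes followed by 2,045 random bytes.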