Samples of electronic invoices

Published: 1 June 2021 | Version 2 | DOI: 10.17632/tnj49gpmtz.2
Marek Kozłowski,
Paweł Weichbroth


Electronic invoices are a product of the information age, and their use in today's market keeps growing. We studied real electronic invoices from around the world and, based on them, settled on a suitable placement of the information on the page. Every detail was generated programmatically with Python programs. Billing information is deliberately minimal, to avoid or lower the chance of fraud detection. The products were collected by scraping popular online marketplaces and then grouped into categories that imitate the purchasing habits of a persona. The intended reuse of the dataset is as input to machine-learning fraud-detection algorithms or to data-extraction mechanisms.

The dataset contains 1000 auto-generated invoices of each of the following kinds:
- invoices with valid information;
- invoices with valid information and a colored IBAN background, where the RGB background color varies from (255,255,240) to (255,255,254);
- invoices with valid information and modified spacing between IBAN characters, where the charspace coefficient varies from 0.001 to 1.
For both modifications, the range runs from changes that the human eye can detect to changes that it cannot.

Nomenclature: invoice_<invoice_id>(_charspace_<coefficient_numerator>)(_color_B_<blue_color_value>).pdf
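The file-name nomenclature can be used to sort the samples into their three kinds. The following is a minimal Python sketch and not part of the published dataset. It assumes that `<invoice_id>` contains no underscores and keeps the charspace numerator as text, because its exact encoding is not specified. It also checks the blue value against the stated 240 to 254 range. The names `parse_invoice_name` and `InvoiceName` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Documented nomenclature:
#   invoice_<invoice_id>(_charspace_<coefficient_numerator>)(_color_B_<blue_color_value>).pdf
# Assumption (not stated in the dataset description): <invoice_id> has no underscores.
_NAME_RE = re.compile(
    r"^invoice_(?P<invoice_id>[^_]+)"
    r"(?:_charspace_(?P<charspace>[^_]+?))?"
    r"(?:_color_B_(?P<blue>\d+))?"
    r"\.pdf$"
)

# Stated background range: RGB (255,255,240) to (255,255,254), so only B varies.
BLUE_MIN, BLUE_MAX = 240, 254


@dataclass(frozen=True)
class InvoiceName:
    invoice_id: str
    charspace_numerator: Optional[str]  # kept as text; its encoding is not specified
    blue: Optional[int]

    @property
    def variant(self) -> str:
        """Classify the sample by which optional suffixes are present."""
        if self.charspace_numerator is not None and self.blue is not None:
            return "charspace+color"  # allowed by the pattern, not described as a set
        if self.charspace_numerator is not None:
            return "charspace"
        if self.blue is not None:
            return "color"
        return "valid"


def parse_invoice_name(filename: str) -> Optional[InvoiceName]:
    """Return the parsed fields, or None if the name does not match the pattern."""
    m = _NAME_RE.match(filename)
    if m is None:
        return None
    blue = int(m["blue"]) if m["blue"] is not None else None
    if blue is not None and not BLUE_MIN <= blue <= BLUE_MAX:
        raise ValueError(f"blue value {blue} outside {BLUE_MIN}..{BLUE_MAX}")
    return InvoiceName(m["invoice_id"], m["charspace"], blue)
```

For example, `parse_invoice_name("invoice_17_color_B_245.pdf").variant` returns `"color"`. The file names here are illustrative and are not taken from the dataset. Mapping the function over a directory listing would group the files by kind.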



Politechnika Gdanska


Machine Learning, Fraud