Public procurement cartels: A large-sample testing of screens using machine learning – Dataset

Published: 12 August 2025 | Version 2 | DOI: 10.17632/f3y4nrn3s6.2
Contributors:

Description

This release provides the combined contract-level public procurement dataset used for data processing and analysis. The dataset combines public procurement contract-level data with verified cartel cases from seven European countries (Bulgaria, France, Hungary, Latvia, Portugal, Spain, and Sweden), covering the period from 2004 to 2021. The data were collected from official government publication portals and open data repositories, primarily opentender.eu, and harmonized into a consistent format to enable cross-country comparisons despite differing original data structures.

Each record contains the cartel detection indicators used in model training. Key variables include tender and contract identifiers (persistent_id, tender_id, lot_id), anonymized buyer and supplier IDs (buyer_id, bidder_id), anonymized product classifications based on 2-digit CPV codes, and a range of cartel risk screens tested in the analysis.

A critical feature of the dataset is the integration of confirmed cartel case information, sourced from competition authorities’ court rulings and official reports. Cartel cases are linked to procurement contracts through company names and cartel activity periods. Because exact identification of rigged contracts remains challenging, every contract awarded to a cartel-involved firm during its documented collusion period is labeled as a cartel contract to facilitate analysis of cartel behavior.

The final dataset comprises 73 confirmed cartel cases and over 15,000 contracts awarded to cartel members. It includes multiple risk indicators capturing pricing irregularities and bidding patterns consistent with collusion, aggregated at both contract and company-year levels. These indicators support machine learning models that distinguish between collusive and competitive procurement activity.

This dataset is a resource for researchers, policymakers, and competition authorities focused on detecting anti-competitive practices in public procurement. Its standardized structure and continuous data coverage allow for ongoing application in cartel screening and market monitoring in support of competition authorities. For a comprehensive data description, please see: https://www.govtransparency.eu/wp-content/uploads/2023/04/Fazekas-et-al_PP-cartel-detection_GTI-WP_2023.pdf
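
The labeling rule described above (match on company name, then check whether the award falls inside the documented collusion period) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the field names bidder_name and award_date, the CartelCase record, the name normalization, and the sample rows are all hypothetical. Only persistent_id appears in the dataset description, and the published data carry anonymized bidder IDs rather than names. The second function merely shows what grouping at the company-year level looks like; it is not one of the risk screens tested in the analysis.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CartelCase:
    """One firm's documented collusion period (hypothetical record layout)."""
    company_name: str
    start: date
    end: date


def normalize(name: str) -> str:
    """Crude name harmonization (an assumption): lowercase, collapse whitespace."""
    return " ".join(name.lower().split())


def label_contracts(contracts, cases):
    """Flag contracts awarded to a cartel firm during its collusion period."""
    periods = {}
    for case in cases:
        key = normalize(case.company_name)
        periods.setdefault(key, []).append((case.start, case.end))

    labeled = []
    for contract in contracts:
        spans = periods.get(normalize(contract["bidder_name"]), [])
        in_period = any(s <= contract["award_date"] <= e for s, e in spans)
        labeled.append({**contract, "cartel": in_period})
    return labeled


def company_year_share(labeled):
    """Share of flagged contracts per (company, award year)."""
    counts = {}
    for contract in labeled:
        key = (normalize(contract["bidder_name"]), contract["award_date"].year)
        total, flagged = counts.get(key, (0, 0))
        counts[key] = (total + 1, flagged + int(contract["cartel"]))
    return {key: flagged / total for key, (total, flagged) in counts.items()}


# Invented sample rows, for illustration only.
contracts = [
    {"persistent_id": "c1", "bidder_name": "Acme  Roads Ltd", "award_date": date(2010, 5, 1)},
    {"persistent_id": "c2", "bidder_name": "acme roads ltd", "award_date": date(2014, 3, 1)},
    {"persistent_id": "c3", "bidder_name": "Other Co", "award_date": date(2010, 6, 1)},
]
cases = [CartelCase("ACME Roads Ltd", date(2009, 1, 1), date(2011, 12, 31))]

labeled = label_contracts(contracts, cases)
shares = company_year_share(labeled)
```

In this sketch, a contract won by a cartel member outside its collusion period (c2) stays unflagged, which mirrors the description's point that the label marks awards made during documented collusion rather than proven rigged contracts.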

Files

Institutions

  • Central European University

Categories

Machine Learning, Competition, Public Procurement

Licence