International Collegiate Programming Contest data 2012-2018

Published: 24-07-2019 | Version 1 | DOI: 10.17632/5k7xtf582g.1
Rick de Boer, Cassio de Campos


The data consists of a collection of ICPC programming competition results. It contains information about teams and their scores for the problems that were posed to them and that they solved or tried to solve. The dataset covers the ICPC competitions of Europe, Latin America, the South Pacific, the World Finals, and some of the North American competitions from 2012 through 2018. The data is divided by topic so that the desired information can easily be extracted and combined.

In this context, an Entry represents a team that entered a single competition, together with all of its information. This includes the team's final rank for that competition and its final score, which consists of the number of problems solved and the total time taken. The total time is the time elapsed from the beginning of the contest until the first accepted submission of a problem, accumulated over all solved problems, plus a penalty for every additional attempt. Note that, in contrast to the team information normally present in the public ICPC data, team names are included in this dataset, but they cannot be assumed to be identical to the official ones, as the data from the original sources often differed from the official ICPC data.

A Competition stores some meta-information, such as its region, the years in which it was held (i.e. the years that are present in the data), and the size of the problem set for each year. Finally, a Solution represents all input from a single team for a single problem, where 'attempts' is the number of times the team tried to solve the problem and the time is the total time it took to solve it. This means that if only a number of attempts is present and no time, the team did not solve that problem, and if no Solution exists for a combination of team and problem, that team made no attempts on that problem. All five entities described are separate files in the dataset, which can be combined using the corresponding identifiers.
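The scoring rule and the solved/unsolved convention described above can be illustrated with a minimal Python sketch. It is not the authors' code. The field names (team_id, problem_id, attempts, time) are hypothetical and may not match the column names in the files. The sketch also assumes that times are given in minutes, that the stored time excludes penalties, and that the penalty is the usual ICPC value of 20 minutes per rejected attempt on a solved problem; the description above does not state any of these.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

# Assumption: customary ICPC penalty; the dataset description gives no value.
PENALTY_MINUTES = 20


@dataclass
class Solution:
    """One team's input for one problem (hypothetical field names)."""
    team_id: str
    problem_id: str
    attempts: int
    time: Optional[int]  # minutes to first accepted submission; None if unsolved


def is_solved(solution: Solution) -> bool:
    # Attempts without a time mean the team tried but did not solve the problem.
    return solution.time is not None


def final_score(solutions: Iterable[Solution]) -> Tuple[int, int]:
    """Return (problems solved, total time) for one team's solutions."""
    solved = [s for s in solutions if is_solved(s)]
    # Only solved problems contribute time; each attempt beyond the
    # accepted one adds a penalty.
    total_time = sum(s.time + PENALTY_MINUTES * (s.attempts - 1) for s in solved)
    return len(solved), total_time


example = [
    Solution("t1", "A", attempts=1, time=30),
    Solution("t1", "B", attempts=3, time=100),   # two rejected attempts
    Solution("t1", "C", attempts=2, time=None),  # attempted, never solved
]
print(final_score(example))  # (2, 170)
```

If the time stored in the dataset already includes the penalty, the penalty term should be dropped; comparing a recomputed score with the final score stored in the Entry data is one way to check which convention applies.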


Steps to reproduce

Scoreboards were collected from several sources and processed into raw CSV files. These files were semi-automatically converted into a common format, with university and country abbreviations written uniformly. Missing information was then filled in manually.