Data extracted from GitHub repositories (training and test data-sets)

Published: 1 August 2019 | Version 3 | DOI: 10.17632/gt3f4jnbvn.3
Contributor:
Youcef Bouziane

Description

This dataset contains the SQL tables of the training and test datasets used in our experiments. The tables hold the preprocessed textual data (in the form of tokens) extracted from each training and test project. In addition to the preprocessed text, the dataset contains metadata about the projects, GitHub topics, and GitHub collections. Each GitHub project is identified by the pair of fields “Owner” and “Name”. The descriptions of the table fields are attached to their respective data descriptions.
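Because a project is identified by “Owner” and “Name” together, a lookup or join across tables needs both fields; the name alone may not be unique. The following is a minimal sketch of that idea using Python's built-in sqlite3 module and an in-memory database. The table names `projects` and `tokens`, the `topic` and `token` columns, and the sample rows are all hypothetical and are not taken from the dataset.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two hypothetical tables keyed by (Owner, Name)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE projects (
            Owner TEXT NOT NULL,
            Name  TEXT NOT NULL,
            topic TEXT,                -- hypothetical metadata column
            PRIMARY KEY (Owner, Name)  -- both fields together identify a project
        );
        CREATE TABLE tokens (
            Owner TEXT NOT NULL,
            Name  TEXT NOT NULL,
            token TEXT NOT NULL        -- hypothetical column: one preprocessed token
        );
        """
    )
    # Made-up rows: two projects share a Name but differ in Owner.
    conn.executemany(
        "INSERT INTO projects VALUES (?, ?, ?)",
        [("alice", "parser", "compilers"), ("bob", "parser", "nlp")],
    )
    conn.executemany(
        "INSERT INTO tokens VALUES (?, ?, ?)",
        [
            ("alice", "parser", "grammar"),
            ("alice", "parser", "lexer"),
            ("bob", "parser", "sentence"),
        ],
    )
    conn.commit()
    return conn


def tokens_for_project(conn, owner, name):
    """Return the tokens of one project, matching on both Owner and Name."""
    rows = conn.execute(
        """
        SELECT t.token
        FROM tokens AS t
        JOIN projects AS p
          ON t.Owner = p.Owner AND t.Name = p.Name
        WHERE p.Owner = ? AND p.Name = ?
        ORDER BY t.rowid
        """,
        (owner, name),
    ).fetchall()
    return [row[0] for row in rows]
```

In this sketch, joining on “Name” alone would mix the tokens of the two made-up projects, which is why both fields appear in the join condition.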

Files

Steps to reproduce

The tables are provided in SQL. To recreate them, import these files using an SQL server.
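The import step could be scripted along the following lines. This is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module instead of a full SQL server, and it assumes the files use the `.sql` extension and contain standard SQL statements that SQLite accepts. The SQL dialect of the files is not stated here, so files written for another server may use dialect-specific syntax and should be loaded with that server's own import tool instead. The function name `import_sql_files` is hypothetical.

```python
import sqlite3
from pathlib import Path


def import_sql_files(sql_dir, db_path):
    """Run every .sql file in sql_dir against the SQLite database at db_path.

    Returns the sorted names of the tables present after the import.
    Assumes the files contain SQL that SQLite can execute.
    """
    conn = sqlite3.connect(db_path)
    try:
        # Process files in a stable, alphabetical order.
        for sql_file in sorted(Path(sql_dir).glob("*.sql")):
            conn.executescript(sql_file.read_text(encoding="utf-8"))
        conn.commit()
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [row[0] for row in rows]
    finally:
        conn.close()
```

A call such as `import_sql_files("downloads", "dataset.db")` (both paths are placeholders) would load the files and list the resulting tables, which can then be checked against the field descriptions.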

Categories

Data Mining

Licence