Software code quality and source code metrics dataset

Name: Software code quality and source code metrics dataset
Creator: Sayed Mohsin Reza
Published: 2021-02-17T11:02:32.891Z
Keywords: Machine Learning, Software Design, Software Quality Assurance

Reza, Sayed Mohsin; Badreddin, Omar; Rahad, Khandoker; Mahmud, Saif Uddin

doi:10.17632/77p6rzb73n.2


Published: 17 February 2021 | Version 2 | DOI: 10.17632/77p6rzb73n.2

Contributors:

Sayed Mohsin Reza, Omar Badreddin, Khandoker Rahad, Saif Uddin Mahmud

Description

The dataset contains code quality and source code metrics for 60 versions across 10 different repositories. The data is extracted at three levels: (1) class, (2) method, and (3) package. The dataset was created by analyzing 9,420,246 lines of code and 173,237 classes. It contains one quality_attributes folder and three associated files: repositories.csv, versions.csv, and attribute-details.csv. The file repositories.csv contains general information about each repository (repository name, repository URL, number of commits, stars, forks, etc.) that helps in understanding its size, popularity, and maintainability. The file versions.csv contains general information about each version (unique version ID, number of classes, packages, external classes, external packages, and version repository link), giving an overview of the versions and of how each repository grows over time. The file attribute-details.csv contains detailed information (attribute name, attribute short form, category, and description) about the extracted static analysis metrics and code quality attributes. The short form is used in the dataset itself as a unique identifier for the values reported for packages, classes, and methods.
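As a minimal sketch, the three CSV files described above could be inspected with Python's standard csv module. The file locations are an assumption (the files are taken to sit in the current directory), and no column names are assumed: the hypothetical helper load_table simply reports whatever headers each file actually contains.

```python
import csv
from pathlib import Path


def load_table(path):
    """Read a CSV file and return (header names, list of row dicts)."""
    # utf-8-sig reads plain UTF-8 and also tolerates a leading byte-order mark.
    with open(path, newline="", encoding="utf-8-sig") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        return list(reader.fieldnames or []), rows


if __name__ == "__main__":
    # Assumed location: the three files sit in the current directory.
    for name in ("repositories.csv", "versions.csv", "attribute-details.csv"):
        path = Path(name)
        if not path.exists():
            print(f"{name}: not found")
            continue
        columns, rows = load_table(path)
        print(f"{name}: {len(rows)} rows, columns = {columns}")
```

Printing the headers first is a cautious way to confirm the actual column names before relying on any of them in further analysis.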

Files

Steps to reproduce

To reproduce this dataset: (1) Visit the version link listed in versions.csv. (2) Download the version from the link. (3) Analyze each version with the CODEMR tool. (4) Export the analysis results as data.
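Steps (1) and (2) might be scripted roughly as follows. This is only a sketch under stated assumptions: the name of the column holding the version link is not given here, so it is passed in as the parameter link_column, and the helper names version_links and download_all are hypothetical. The links may also point to web pages rather than directly downloadable archives, in which case the download step would need adjusting. Steps (3) and (4) are carried out in CODEMR and are not covered.

```python
import csv
import urllib.request
from pathlib import Path


def version_links(versions_csv, link_column):
    """Return the non-empty values of link_column from versions.csv."""
    with open(versions_csv, newline="", encoding="utf-8-sig") as handle:
        reader = csv.DictReader(handle)
        if link_column not in (reader.fieldnames or []):
            raise KeyError(f"{link_column!r} not in {reader.fieldnames}")
        links = []
        for row in reader:
            value = (row.get(link_column) or "").strip()
            if value:
                links.append(value)
        return links


def download_all(links, target_dir):
    """Fetch each link into target_dir (the file naming is arbitrary)."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    for index, url in enumerate(links):
        # Assumption: the URL serves the version's content directly.
        urllib.request.urlretrieve(url, target / f"version_{index}")
```

A call such as `download_all(version_links("versions.csv", "<link column name>"), "downloads")` would then fetch every listed version, with the placeholder replaced by the real column name from the file.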

Institutions

University of Texas at El Paso
University of Texas at El Paso College of Engineering
