Datasets for Paper "Towards Understanding Bugs in An Open Source Cloud Management Stack: An Empirical Study of OpenStack Software Bugs"

Published: 15 February 2019 | Version 1 | DOI: 10.17632/tmg8pnjmdj.1
Chen Feng, Wei Zheng, Tingting Yu, Xibing Yang, Xiaoxue Wu


1. Folder "1-48255 original data" contains the original XML files obtained from the OpenStack bug repository: 48,255 bug reports with a severity level of Critical, High, or Medium and a status of Complete or Fixed. The folder holds more than 40,000 files, and GitHub only allows uploading up to 100 files at a time. We therefore uploaded bug607068 as a sample and packed the remaining data into two zip files. All of these data were obtained with a data crawling tool, using the following steps:
   1) Create a new task.
   2) Add a web page to open and enter its address.
   3) Set the OpenStack bug page to loop so that it flips to the next page automatically after each page is obtained.
   4) Create a group of elements so that the data can be crawled one by one.
   5) Open each element in the group and crawl the required data.
   6) Repeat Steps 3, 4, and 5 until the task is complete.
2. Folder "2-32547 bugs" contains XML files that have been preprocessed using basic text parsing and a discourse matching method. It includes the 32,547 bug reports retained after preprocessing. These were also uploaded as two zip files plus a sample file.
3. Folder "3-Preprogress" contains the preprocessing code.
4. Folder "4-800 bugs for analysis" contains 800 bug reports that were randomly selected for analysis. The folder contains one XML file as a sample and a zip file with the remaining data.
5. Folder "5-Benchmarks and data analysis results" contains three Excel files:
   - Benchmarks: the analyzed bugs in different dimensions; each of its five sheets corresponds to one statistical approach.
   - Deployment of components: information on the deployment of components, obtained from the OpenStack bug repository.
   - Fixing information of duration and comments: fixing duration and comment information for each bug, with one sheet per severity level, all captured from the original data using a data crawling tool.
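The crawling steps for Folder 1 amount to a paginate-then-visit loop. The sketch below expresses that loop in Python as an illustration only: the original data were collected with a point-and-click crawling tool, and the callables `fetch_listing` and `fetch_bug` are hypothetical stand-ins for its "open page" actions, not part of any real API. They are passed in as arguments so the loop runs offline against fake pages.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical signatures standing in for the crawling tool's actions:
# fetch_listing(page_no) -> (bug links on that page, whether a next page exists)
# fetch_bug(link) -> the fields crawled from one bug report
ListingFetcher = Callable[[int], Tuple[List[str], bool]]
BugFetcher = Callable[[str], Dict[str, str]]


def crawl(fetch_listing: ListingFetcher, fetch_bug: BugFetcher) -> List[Dict[str, str]]:
    """Page through a bug listing and crawl every bug linked from it."""
    records: List[Dict[str, str]] = []
    page = 1
    while True:
        # Step 3: load the current listing page (the loop "flips" pages).
        links, has_next = fetch_listing(page)
        # Steps 4-5: treat the links on the page as a group and open each one.
        for link in links:
            records.append(fetch_bug(link))
        # Step 6: repeat until the last page has been processed.
        if not has_next:
            break
        page += 1
    return records


if __name__ == "__main__":
    # Offline stand-in data: two listing pages holding three bugs in total.
    pages = {1: (["bug/1", "bug/2"], True), 2: (["bug/3"], False)}
    result = crawl(lambda p: pages[p], lambda url: {"url": url})
    print(len(result))  # 3
```

Injecting the fetchers keeps the pagination logic separate from any particular HTTP client or HTML parser, which a real crawler would plug in.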

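The selection criteria for Folder 1 (severity Critical, High, or Medium; status Complete or Fixed) and the random draw of 800 reports for Folder 4 can be illustrated with the minimal Python sketch below. It does not reproduce the text parsing and discourse matching in Folder 3. The `<severity>` and `<status>` tag names are assumptions about the XML layout, and the fixed seed is a hypothetical choice for reproducibility; the actual files and sampling procedure may differ.

```python
import random
import xml.etree.ElementTree as ET
from typing import Iterable, List

# Selection criteria stated in the dataset description.
SEVERITIES = {"Critical", "High", "Medium"}
STATUSES = {"Complete", "Fixed"}


def is_selected(xml_text: str) -> bool:
    """Return True if a bug report matches the severity and status criteria.

    Assumes hypothetical <severity> and <status> child elements of the root;
    the real tag names in the dataset's XML files may differ.
    """
    root = ET.fromstring(xml_text)
    severity = (root.findtext("severity") or "").strip()
    status = (root.findtext("status") or "").strip()
    return severity in SEVERITIES and status in STATUSES


def sample_reports(reports: Iterable[str], k: int, seed: int = 0) -> List[str]:
    """Draw k reports uniformly at random without replacement (seeded)."""
    pool = list(reports)
    return random.Random(seed).sample(pool, k)


if __name__ == "__main__":
    docs = [
        "<bug><severity>High</severity><status>Fixed</status></bug>",
        "<bug><severity>Low</severity><status>Fixed</status></bug>",
        "<bug><severity>Critical</severity><status>Complete</status></bug>",
    ]
    kept = [d for d in docs if is_selected(d)]
    print(len(kept))  # 2
    print(len(sample_reports(kept, 1)))  # 1
```

Using a dedicated `random.Random(seed)` instance, rather than the module-level generator, makes the drawn sample repeatable without affecting other code that uses `random`.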

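The "Fixing information of duration and comments" file groups per-bug fixing data by severity, one sheet per level. A fixing duration is naturally the elapsed time between when a bug was reported and when it was fixed. The Python sketch below computes that duration and summarizes it per severity. It assumes a hypothetical record layout of (severity, reported timestamp, fixed timestamp) in ISO-8601 form; the spreadsheet's real columns and the paper's exact duration definition may differ.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median
from typing import Dict, List, Tuple

# Hypothetical record layout: (severity, reported_at, fixed_at) with ISO-8601
# timestamps; the actual spreadsheet columns may differ.
Record = Tuple[str, str, str]


def fix_duration_days(reported_at: str, fixed_at: str) -> float:
    """Elapsed time between report and fix, in fractional days."""
    delta = datetime.fromisoformat(fixed_at) - datetime.fromisoformat(reported_at)
    return delta.total_seconds() / 86400


def median_duration_by_severity(records: List[Record]) -> Dict[str, float]:
    """Group durations by severity (one group per sheet) and take the median."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for severity, reported_at, fixed_at in records:
        groups[severity].append(fix_duration_days(reported_at, fixed_at))
    return {sev: median(values) for sev, values in groups.items()}


if __name__ == "__main__":
    rows = [
        ("High", "2015-01-01T00:00:00", "2015-01-03T00:00:00"),
        ("High", "2015-01-01T00:00:00", "2015-01-05T00:00:00"),
        ("Medium", "2015-02-01T00:00:00", "2015-02-01T12:00:00"),
    ]
    print(median_duration_by_severity(rows))  # {'High': 3.0, 'Medium': 0.5}
```

The median is used here because fixing times tend to be skewed by a few long-lived bugs; a mean could be substituted if that is the statistic of interest.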

Empirical Study of Software Engineering