Data files for Mahtab: Phase-wise Acceleration of Regression Testing for C

Published: 28-08-2019 | Version 2 | DOI: 10.17632/7fvwj88jvm.2
Shouvick Mondal


Software regression testing consists of offline, online, and execution phases, which are executed sequentially. The offline phase involves code instrumentation and test-coverage collection. The online phase then performs program differencing, test-suite selection, and prioritization. Finally, the selected test-cases are executed against the new version of the software to re-validate it. Regression testing is time-consuming and is often on the critical path of a project. To improve the turn-around time of the software development cycle, our goal is to reduce regression testing time across all phases using multi-core parallelization. This poses several challenges that stem from I/O, dependence on third-party libraries, and inherently sequential components in the overall testing process. We propose parallelization test-windows to partition test-cases effectively across threads. To measure the benefit of prioritization coupled with multi-threaded execution, we propose a new metric, EPSilon, which rewards failure observation frequency in the timeline of test execution. To measure the rate of code-change coverage due to regression test prioritization, we introduce ECC, a variant of the widely used APFD (Average Percentage of Faults Detected) metric. We illustrate the effectiveness of our approach using programs from the popular Software-artifact Infrastructure Repository (SIR) and five real-world projects from GitHub. For SIR programs, parallel regression testing achieves an end-to-end geometric mean speedup of 4.72× compared to sequential regression test selection (RTS), and 2.44× compared to RetestAll. We achieve a geometric mean boost (EBF) of 1.6× in the effectiveness (EPSilon) of test prioritization, using up to 16 threads. For the GitHub projects used in our study, we observed an end-to-end speedup of 3.90× compared to sequential RTS, and an EBF of 1.43×, using up to 32 threads.
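
The abstract reports geometric mean speedups and describes ECC as a variant of APFD, without giving formulas. As a point of reference only, the sketch below computes the standard APFD, 1 - (TF_1 + ... + TF_m)/(n*m) + 1/(2n), and a geometric mean of per-program speedups. It is not the definition of ECC or EPSilon, which are specified in the paper; the function names and the sample numbers are hypothetical and are not taken from the dataset.

```python
import math


def apfd(first_fail_positions, num_tests):
    """Standard APFD for one prioritized test order.

    first_fail_positions: for each fault, the 1-based position in the
        order of the first test-case that reveals it (TF_i).
    num_tests: total number of test-cases in the order (n).
    """
    m = len(first_fail_positions)
    n = num_tests
    if m == 0 or n <= 0:
        raise ValueError("need at least one fault and one test-case")
    return 1.0 - sum(first_fail_positions) / (n * m) + 1.0 / (2 * n)


def geometric_mean(values):
    """Geometric mean of positive ratios, computed via logarithms."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be non-empty and positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    # Made-up inputs for illustration; not results from the paper.
    print(apfd([1, 2, 3], num_tests=5))
    print(geometric_mean([2.0, 8.0]))
```

A geometric mean suits this kind of summary because speedups are ratios: one unusually large speedup skews it less than it would skew an arithmetic mean.
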
All experiments were performed on a system with a 20-core (40 threads with hyper-threading) Intel Xeon CPU E5-2640 v4 clocked at 2.40 GHz, with 64 GB of RAM, running CentOS Linux release 7.5.1804 (Core). The Mahtab framework was compiled using g++ 5.3.1, while the benchmark programs were compiled using the clang frontend of LLVM. This dataset includes raw log files (plain text) and processed spreadsheets (.ods) corresponding to the experimental data in Mahtab: Phase-wise Acceleration of Regression Testing for C. The plain-text files contain raw data from program execution. The spreadsheets contain data summarized from the text files, corresponding to the plots and tables presented throughout the paper. All plots are also included in the associated .ods files. The root directory contains the following tarballs:
-- mahtab_data.tar.gz (data used in experiments).
-- mahtab_tool.tar.gz (source code for our software tool).
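
To see what a tarball contains before unpacking it, a short script along the following lines may help. It is a minimal sketch that assumes the archives are ordinary gzip-compressed tar files, as the .tar.gz extension suggests, and that they sit in the current directory. The helper name is hypothetical, and nothing is assumed about the layout inside the archives.

```python
import os
import tarfile


def list_archive(path):
    """Return the member names of a gzip-compressed tarball."""
    with tarfile.open(path, "r:gz") as tar:
        return tar.getnames()


if __name__ == "__main__":
    # File names as given in the dataset description.
    for name in ("mahtab_data.tar.gz", "mahtab_tool.tar.gz"):
        if os.path.exists(name):
            members = list_archive(name)
            print(name, "contains", len(members), "entries")
        else:
            print(name, "not found in the current directory")
```
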