Binary-Classification Performance Evaluation Reporting Survey Data with the Findings
This data, prepared for our manuscript, provides comprehensive findings on binary-classification performance evaluation reporting in 78 academic studies published over 7 years (2012–2018). Each study models one or more machine-learning-based Android malware detection classifiers and reports their performance in terms of metrics such as accuracy or F1. The data shows that performance evaluation reporting in the literature is neither uniform nor well-defined: the performance metrics chosen, the number of metrics reported, and the combinations of reported metrics are highly diverse. Analyzing the reported performances with the proposed performance indicator, the data also shows that the studies use the Accuracy metric in a misleading manner. The survey selection methodology is also described in the data.
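
As a rough illustration of why accuracy on its own can mislead in a binary malware-detection setting, the Python sketch below compares accuracy and F1 on one confusion matrix. The counts are hypothetical and invented for this example; they are not taken from the surveyed studies, and the calculation is not the performance indicator proposed in the manuscript. The function names are likewise illustrative.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for the positive (malware) class."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical, made-up test set: 1000 apps, of which only 50 are malware.
# The imagined classifier catches 10 malware samples and misses 40,
# while wrongly flagging 5 of the 950 benign apps.
tp, fn = 10, 40
fp, tn = 5, 945

acc = accuracy(tp, fp, tn, fn)
f1 = f1_score(tp, fp, fn)

print(f"accuracy = {acc:.3f}")
print(f"F1       = {f1:.3f}")
```

With these assumed counts, accuracy is high because the benign majority dominates the total, while F1 stays low because most malware samples are missed. This is one reason a study reporting accuracy alone can be hard to compare with a study reporting F1.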