Predicting the Type of Auditor Opinion: Statistics, Machine Learning, or a Combination of the Two?

Published: 5 November 2019 | Version 3 | DOI: 10.17632/mmcczp3g3y.3
Contributor: Nemanja Stanisic

Description

The data from 13,561 complete sets of annual financial statements for 4,701 companies are combined with the data from the corresponding audit reports, forming an unbalanced panel data set. The client companies included in the sample represent a supermajority of the medium- and large-sized companies registered in the Republic of Serbia. The audit firm name and the type of audit opinion were hand-collected from the audit reports issued by 64 audit firms (the Big 4 plus 60 other audit firms), which, again, represent a supermajority of all the audit firms registered in the country. To the best of our knowledge, this is the largest data set used in the literature devoted to predicting the type of audit opinion.

In the total sample of 13,561 audit opinions, the four main types of audit opinion occur with the following frequencies: adverse opinion (71), disclaimer of opinion (644), qualified opinion (3,706), and unqualified opinion (9,140).

Feel free to use the data set for research purposes or to reproduce the results presented in the article. For a detailed description of the variables and their descriptive statistics, please read the article: Stanišić, N., Radojević, T., Stanić, N. (2019). Predicting the Type of Auditor Opinion: Statistics, Machine Learning, or a Combination of the Two? The European Journal of Applied Economics, 16(2), 1-58. doi:10.5937/EJAE16-21832, available at: http://journal.singidunum.ac.rs/paper/predicting-the-type-of-auditor-opinion-statistics-machine-learning-or-a-combination-of-the-two.html

When referring to the data set in publications, please cite the data as follows: Stanisic, Nemanja (2019), “Predicting the Type of Auditor Opinion: Statistics, Machine Learning, or a Combination of the Two?”, Mendeley Data, V1, doi: 10.17632/mmcczp3g3y.1. Also, consider citing the related research paper. These data are used in a research study and may not be redistributed or used for commercial purposes.

If you have any questions, please feel free to contact me at nstanisic@singidunum.ac.rs.
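The frequencies reported in the description can be checked with a few lines of arithmetic, which also show how imbalanced the four opinion classes are. The following is a minimal Python sketch that uses only the counts stated in the description. The names `opinion_counts` and `class_shares` are illustrative assumptions and are not taken from the data files or from the authors' R code.

```python
# Counts of audit opinion types as reported in the data set description.
opinion_counts = {
    "adverse": 71,
    "disclaimer": 644,
    "qualified": 3706,
    "unqualified": 9140,
}

# Totals as reported in the data set description.
TOTAL_STATEMENTS = 13_561
TOTAL_COMPANIES = 4_701


def class_shares(counts):
    """Return each class's share of the total (hypothetical helper)."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


# The four class counts should add up to the reported sample size.
assert sum(opinion_counts.values()) == TOTAL_STATEMENTS

shares = class_shares(opinion_counts)

# Average number of annual statements per company in the unbalanced panel.
avg_statements_per_company = TOTAL_STATEMENTS / TOTAL_COMPANIES

if __name__ == "__main__":
    for name, share in shares.items():
        print(f"{name:12s} {share:7.2%}")
    print(f"statements per company: {avg_statements_per_company:.2f}")
```

Because unqualified opinions make up roughly two thirds of the sample and adverse opinions well under one percent, anyone reusing the data for classification may want to account for class imbalance when choosing evaluation metrics.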

Files

Steps to reproduce

Open the R code file and run the code. Bear in mind that some chunks of code take a long time to execute.