Data and code of: Value proposition of predictive discarding in semiconductor manufacturing
Data and code related to the work: van Kollenburg, G.H., Holenderski, M. and Meratnia, N. 2022. Value proposition of predictive discarding in semiconductor manufacturing. Production Planning & Control. https://doi.org/10.1080/09537287.2022.2103471

The Secom dataset was originally published at https://archive.ics.uci.edu/ml/datasets/SECOM by Michael McCann and Adrian Johnston, who were not involved in the current study. The file attached here, uci-secom.csv, has the outcome (pass/fail) appended to the process variables.
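As a rough illustration of what "appended" means for the file layout, the sketch below separates the process variables from the outcome. It assumes the outcome is the last column of uci-secom.csv and that the first row is a header; neither is confirmed here, and the function name split_features_outcome is hypothetical. Values are kept as strings because the encoding of the pass/fail outcome is not stated in this description.

```python
import csv


def split_features_outcome(csv_path):
    """Split a CSV into header, feature rows, and outcomes.

    Assumption (not confirmed by the dataset description): the first row
    is a header and the outcome is the LAST column of every data row.
    """
    features, outcomes = [], []
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        for row in reader:
            if not row:  # skip blank lines
                continue
            features.append(row[:-1])  # everything except the last column
            outcomes.append(row[-1])   # last column, kept as a string
    return header, features, outcomes
```

Calling split_features_outcome("data/uci-secom.csv") would then return the header plus two lists of equal length, one entry per row.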
Steps to reproduce
For Tables 1 and 2: Open SensitivityAnalysis.ipynb and follow the instructions.

For Tables 3 and 4: Make three sub-folders: data, src, and results. Put uci-secom.csv in the folder 'data' and put all Jupyter notebooks in the folder 'src'. In the folder 'results', make three sub-folders named 'knn', 'gb', and 'blocks'.

Running SecomPD.ipynb calls SecomPreProc to preprocess the data and runs all analyses required to reproduce Table 3 in the paper. The analysis cross-validates all models for all 19 artificial process stages, unless the endpoint of the for loop is changed from 19 to, for example, 5. To calculate the benefits on each test set, run SecomResultsHandling.ipynb; this provides the values found in Table 3.

Table 4 is reproduced by running ReanalysisKNN.ipynb to do the Monte Carlo study and then KNNResultsHandling.ipynb to obtain the values reported in Table 4. The Monte Carlo validation for GB is done by running ReanalysisGB.ipynb and then GBResultsHandling.ipynb. This Monte Carlo validation indicated that the first results were mere chance findings.
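The folder layout for Tables 3 and 4 can also be created with a few lines of Python instead of by hand. This is a minimal sketch, not part of the published code: the function name make_layout and the choice of the current directory as the default root are assumptions. Copying uci-secom.csv into 'data' and the notebooks into 'src' still has to be done separately.

```python
from pathlib import Path


def make_layout(root="."):
    """Create data/, src/, results/ and the three results sub-folders.

    The folder names come from the steps above; the function name and
    the default root directory are illustrative assumptions.
    """
    root = Path(root)
    created = []
    for rel in ("data", "src", "results/knn", "results/gb", "results/blocks"):
        path = root / rel
        # parents=True also creates results/; exist_ok=True makes reruns safe
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    for folder in make_layout():
        print(folder)
```

Running the script twice is harmless, since existing folders are left untouched.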