Data and code of: Value proposition of predictive discarding in semiconductor manufacturing

Published: 17 August 2022 | Version 2 | DOI: 10.17632/bnthfntfrm.2


Data and code related to the paper: van Kollenburg, G.H., Holenderski, M. and Meratnia, N. 2022. Value proposition of predictive discarding in semiconductor manufacturing. Production Planning & Control. The SECOM dataset was originally published in the UCI Machine Learning Repository by Michael McCann and Adrian Johnston, who were not involved in the current study. In the file attached here, uci-secom.csv, the outcome (pass/fail) has been appended to the process variables.
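Because the outcome is appended to the process variables in uci-secom.csv, a script only needs to split off the last column. The following is a minimal standard-library sketch. It assumes a header row and the pass/fail outcome as the final column. The names `load_secom` and `skip` are hypothetical and do not come from the published notebooks. `skip` handles an optional leading timestamp column, which may or may not be present in this file.

```python
import csv
import io
from typing import List, Optional, TextIO, Tuple

def _to_float(value: str) -> Optional[float]:
    """Convert one cell to float; empty cells and NaN markers become None."""
    v = value.strip()
    return None if v in ("", "NaN", "nan") else float(v)

def load_secom(f: TextIO, skip: int = 0) -> Tuple[List[str], List[List[Optional[float]]], List[str]]:
    """Split a SECOM-style CSV into variable names, process variables and outcomes.

    ASSUMPTIONS (hypothetical layout): the first row is a header, the pass/fail
    outcome is the last column, and `skip` leading columns (e.g. a timestamp)
    are not process variables.
    """
    reader = csv.reader(f)
    header = next(reader)
    features: List[List[Optional[float]]] = []
    outcomes: List[str] = []
    for row in reader:
        if not row:  # ignore blank lines
            continue
        features.append([_to_float(v) for v in row[skip:-1]])
        outcomes.append(row[-1].strip())
    return header[skip:-1], features, outcomes

# In-memory example; for the real file use open("data/uci-secom.csv", newline="").
example = io.StringIO("v1,v2,Pass/Fail\n1.5,,-1\n2,NaN,1\n")
names, X, y = load_secom(example)
```

Missing measurements are returned as `None` so that the imputation strategy stays an explicit later choice.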


Steps to reproduce

For Tables 1 and 2: open SensitivityAnalysis.ipynb and follow the instructions.

For Tables 3 and 4, first set up the folder structure. Create three sub-folders named 'data', 'src', and 'results'. Put uci-secom.csv in 'data' and all Jupyter notebooks in 'src'. Inside 'results', create three sub-folders named 'knn', 'gb', and 'blocks'.

Running SecomPD.ipynb calls SecomPreProc to preprocess the data and then runs all analyses required to reproduce Table 3 in the paper. By default the analysis cross-validates all models for all 19 artificial process stages. To run fewer stages, change the endpoint of the for loop from 19 to a smaller value, for example 5. To calculate the benefits on each test set, run SecomResultsHandling.ipynb, which provides the values found in Table 3.

Table 4 is reproduced by running ReanalysisKNN.ipynb to perform the Monte Carlo study and then KNNResultsHandling.ipynb to obtain the values reported in Table 4. The Monte Carlo validation for GB is done by running ReanalysisGB.ipynb and then GBResultsHandling.ipynb. This Monte Carlo study indicated that the initial results were mere chance findings.
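The folder setup for Tables 3 and 4 can be scripted. The following is a minimal sketch using `pathlib`. The helper names `make_layout` and `missing_inputs` are hypothetical. Only the folder names and the two file names come from the steps above. The sketch creates the layout and reports which required inputs are still absent before SecomPD.ipynb is run.

```python
from pathlib import Path
from typing import List

# Sub-folders required by the notebooks, relative to the project root.
REQUIRED_DIRS = ("data", "src", "results/knn", "results/gb", "results/blocks")

def make_layout(root: Path) -> None:
    """Create data/, src/ and results/{knn,gb,blocks}/ under root (safe to rerun)."""
    for sub in REQUIRED_DIRS:
        (root / sub).mkdir(parents=True, exist_ok=True)

def missing_inputs(root: Path) -> List[str]:
    """Return required paths (folders, data file, main notebook) that do not exist yet."""
    required = [*REQUIRED_DIRS, "data/uci-secom.csv", "src/SecomPD.ipynb"]
    return [p for p in required if not (root / p).exists()]
```

For example, `make_layout(Path("."))` followed by `missing_inputs(Path("."))` lists what still has to be copied in. Any remaining notebooks must also be placed in 'src', but the sketch only checks for SecomPD.ipynb.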

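Monte Carlo validation, as the term is commonly used, repeats model evaluation over many random train/test splits. Its purpose is to show whether one favourable result could have arisen by chance. The following is a generic sketch of repeated stratified random splitting. It is not the code in ReanalysisKNN.ipynb or ReanalysisGB.ipynb. The stratification, the 30% test fraction, and the `evaluate` callback (which fits a model and returns a score) are all assumptions.

```python
import random
from typing import Callable, Dict, List, Sequence

def monte_carlo_scores(
    labels: Sequence[str],
    evaluate: Callable[[List[int], List[int]], float],
    n_repeats: int = 100,
    test_fraction: float = 0.3,
    seed: int = 0,
) -> List[float]:
    """Score a model on many random stratified train/test splits.

    `evaluate(train_idx, test_idx)` is a hypothetical user-supplied callback that
    fits a model on the training rows and returns a score on the test rows.
    """
    rng = random.Random(seed)  # fixed seed keeps the splits reproducible
    # Group row indices by class so every split keeps the pass/fail ratio.
    by_class: Dict[str, List[int]] = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    scores: List[float] = []
    for _ in range(n_repeats):
        train: List[int] = []
        test: List[int] = []
        for idx in by_class.values():
            shuffled = idx[:]
            rng.shuffle(shuffled)
            n_test = round(len(shuffled) * test_fraction)
            test.extend(shuffled[:n_test])
            train.extend(shuffled[n_test:])
        scores.append(evaluate(sorted(train), sorted(test)))
    return scores
```

A single reported score can then be compared against the spread of `scores`, for example using `statistics.mean` and `statistics.stdev`. If the single score falls well within that spread, it is not distinguishable from chance.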

Technische Universiteit Eindhoven


Machine Learning, Quality Control, Manufacturing Process Control, Semiconductor Industry, Model Validation