Data for: Conditional probability estimation based classification with class label missing at random

Published: 30-11-2019 | Version 1 | DOI: 10.17632/xbjccc3jc3.1
Qihua Wang,
Ying Sheng


This is the breast cancer dataset studied by \citet{Cummings1986Tamoxifen}, which comes from a clinical trial conducted to evaluate tamoxifen as a treatment for stage \uppercase\expandafter{\romannumeral2} breast cancer among elderly women. In this dataset, 78 patients died during the clinical trial: 43 died from breast cancer, 17 died from other known causes, and the remaining 18 died from unknown causes. There are therefore two classes for the cause of death: breast cancer (class 1) and other known causes (class 0). Accordingly, 43 samples belong to class 1, 17 samples belong to class 0, and the class labels of the remaining 18 samples are missing. Following \citet{chen2018reweighted}, we choose the observed survival time of a patient as the predictor and assume that the class label is missing at random (MAR).
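
The MAR assumption can be written out in symbols. The notation here is introduced only for illustration and is not part of the dataset: let $X$ denote the observed survival time, $Y \in \{0, 1\}$ the class label, and $\delta$ the indicator that $Y$ is observed, so that $\delta = 1$ for the $43 + 17 = 60$ labelled patients and $\delta = 0$ for the 18 patients whose cause of death is unknown. Under MAR,
\[
  P(\delta = 1 \mid X, Y) = P(\delta = 1 \mid X),
\]
that is, given the survival time, whether a label is missing does not depend on the label itself.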