Dataset_tyhroid
Description
This is an anonymized, retrospective clinicopathological dataset of thyroid disease patients, collected at a tertiary referral hospital in Indonesia. The main data file contains 360 patient records and 12 variables: Age, Gender, Smoking, Hx Smoking, Hx Radiotherapy, Thyroid Function, Adenopathy, Pathology, Risk, Stage, Response, and Recurred. The records were curated from hospital-based patient records and organized into a structured CSV file for reuse in clinical research, epidemiological studies, and machine learning applications related to thyroid disease recurrence and clinicopathological profiling. All directly identifying patient information was removed before sharing, and categorical variables were harmonized into consistent labels to improve interpretability and reuse. The dataset may be useful to researchers in endocrinology, oncology, pathology, and biomedical informatics, particularly those interested in recurrence-related analyses of structured clinical data from underrepresented populations.
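As a quick check before reuse, the file layout described above can be verified with a short loader. This is a minimal sketch using only the Python standard library. It assumes the CSV header spells the 12 variable names exactly as they are listed in this description, which may not hold for the deposited file. The function name `load_records` is hypothetical.

```python
import csv

# Variable names as listed in the description; the exact header spelling
# in the deposited CSV is an assumption.
EXPECTED_COLUMNS = [
    "Age", "Gender", "Smoking", "Hx Smoking", "Hx Radiotherapy",
    "Thyroid Function", "Adenopathy", "Pathology", "Risk", "Stage",
    "Response", "Recurred",
]


def load_records(handle):
    """Read CSV rows as dicts and confirm all 12 described variables are present."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)
```

To use it, open the downloaded file with `open(path, newline="", encoding="utf-8")` and pass the handle to `load_records`. If the file matches the description, the returned list should hold 360 records.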
Steps to reproduce
1. Identify eligible thyroid disease patient records from the hospital-based retrospective data source within the study period.
2. Exclude duplicate records and records that could not be safely anonymized or lacked the minimum required clinicopathological information.
3. Extract the following structured variables from each eligible record: Age, Gender, Smoking, Hx Smoking, Hx Radiotherapy, Thyroid Function, Adenopathy, Pathology, Risk, Stage, Response, and Recurred.
4. Remove all direct identifiers, including patient names, medical record numbers, pathology report numbers, physician identifiers, and other traceable administrative information.
5. Harmonize categorical values into consistent labels (for example: Yes/No, Male/Female, and standardized pathology and stage categories).
6. Compile the cleaned variables into a single CSV file.
7. Perform a final quality-control review to identify invalid values, inconsistent categories, and duplicate rows before repository deposition.

The shared file can be reused directly for descriptive analyses, recurrence-related studies, or as input for downstream statistical and machine learning workflows.
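The final quality-control review described above could be approximated with a short script like the one below. This is an illustrative sketch, not the procedure the data curators actually ran. Only the Yes/No and Male/Female labels are named in the text. Which columns use Yes/No, the plausible age range of 1 to 120, and the function name `quality_report` are all assumptions made for the example.

```python
from collections import Counter

# Assumed label sets. Only Yes/No and Male/Female appear in the text;
# the mapping of Yes/No to these particular columns is an assumption.
ALLOWED = {
    "Gender": {"Male", "Female"},
    "Smoking": {"Yes", "No"},
    "Hx Smoking": {"Yes", "No"},
    "Hx Radiotherapy": {"Yes", "No"},
    "Recurred": {"Yes", "No"},
}


def quality_report(records):
    """Flag duplicate rows, implausible ages, and unexpected category labels."""
    # A row counts as a duplicate when every field matches another row.
    counts = Counter(tuple(sorted(row.items())) for row in records)
    duplicates = [dict(key) for key, n in counts.items() if n > 1]

    invalid_age = []
    bad_labels = []
    for index, row in enumerate(records):
        age = (row.get("Age") or "").strip()
        # Assumed plausible range: whole years from 1 to 120.
        if not age.isdecimal() or not 0 < int(age) <= 120:
            invalid_age.append(index)
        for column, allowed in ALLOWED.items():
            value = (row.get(column) or "").strip()
            if value not in allowed:
                bad_labels.append((index, column, value))

    return {
        "duplicates": duplicates,
        "invalid_age": invalid_age,
        "bad_labels": bad_labels,
    }
```

The function takes a list of row dictionaries, such as those produced by `csv.DictReader`. It only reports findings and does not delete anything, because identical rows in an anonymized file with 12 coarse variables may belong to different patients and need manual review.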