Simulated data from a Cox's model.

Published: 12 May 2021 | Version 1 | DOI: 10.17632/657hs9v8yf.1
Contributor: Vitara Pungpapong

Description

This dataset contains simulated data from a Cox proportional hazards model with sample size n = 250 and number of predictors p = 1,000. Survival times were simulated from a Cox model with a Weibull baseline hazard function (shape parameter 10, scale parameter 1). Censoring times were generated randomly to achieve a censoring rate of 50%.

Case 1: Markov chain

The locations of the non-zero coefficients were generated from a Markov chain with the following probabilities: P(beta_1 = 0) = 0.50, P(beta_{j+1} = 0 | beta_j = 0) = 0.99, P(beta_{j+1} = 0 | beta_j ≠ 0) = 0.50. The locations of the non-zero coefficients were held fixed across all 100 datasets, but the effect sizes of those non-zero coefficients were randomly drawn from Uniform(0.5, 5). The covariates were generated from an AR(1) process with rho = 0, 0.5, and 0.9.

Case 2: Network simulation

Gene expression data within an assumed network were simulated. The network consisted of ten disjoint pathways, each containing 100 genes, for 1,000 genes in total. Ten regulated genes were assumed in each pathway. The gene expression values were generated from a standard normal distribution. For the regulated genes in the same pathway, the expression values were generated from a normal distribution with a correlation of rho = 0.7 among the ten regulated genes. The non-zero coefficients were drawn from Uniform(0.5, 5).
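
The description above can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not the contributor's code. All function names are hypothetical. It assumes the Weibull baseline is parameterized so that the cumulative baseline hazard is H0(t) = (t / scale)^shape, that survival times are obtained by inverse-transform sampling, that "AR(1)" means unit-variance covariates with correlation rho^|i-j|, and that the regulated genes are the first ten genes of each pathway. The description does not state the censoring mechanism; the sketch draws censoring times from the same conditional distribution as the event times, which is only one simple way to obtain a censoring rate near 50%.

```python
import numpy as np

def markov_support(p, rng):
    """Mask of non-zero coefficient positions from the two-state Markov chain."""
    nonzero = np.empty(p, dtype=bool)
    nonzero[0] = rng.random() >= 0.50            # P(beta_1 = 0) = 0.50
    for j in range(1, p):
        # P(beta_{j+1} = 0 | previous non-zero) = 0.50, (| previous zero) = 0.99
        p_zero = 0.50 if nonzero[j - 1] else 0.99
        nonzero[j] = rng.random() >= p_zero
    return nonzero

def ar1_covariates(n, p, rho, rng):
    """Assumed AR(1) design: unit variance, corr(x_i, x_j) = rho**|i - j|."""
    z = rng.standard_normal((n, p))
    x = np.empty((n, p))
    x[:, 0] = z[:, 0]
    for j in range(1, p):
        x[:, j] = rho * x[:, j - 1] + np.sqrt(1.0 - rho**2) * z[:, j]
    return x

def network_covariates(n, rng, n_pathways=10, genes_per_pathway=100,
                       n_regulated=10, rho=0.7):
    """Case 2 design; regulated genes assumed to be the first ones per pathway."""
    x = rng.standard_normal((n, n_pathways * genes_per_pathway))
    for k in range(n_pathways):
        start = k * genes_per_pathway
        shared = rng.standard_normal((n, 1))     # common factor per pathway
        block = x[:, start:start + n_regulated]
        # Shared-factor construction gives pairwise correlation rho.
        x[:, start:start + n_regulated] = (np.sqrt(rho) * shared
                                           + np.sqrt(1.0 - rho) * block)
    return x

def weibull_cox_times(eta, shape, scale, rng):
    """Inverse-transform draw from S(t | x) = exp(-(t / scale)**shape * exp(eta))."""
    e = rng.exponential(size=eta.shape[0])       # e = -log(U), U ~ Uniform(0, 1)
    return scale * np.exp((np.log(e) - eta) / shape)   # log space avoids overflow

def simulate_case1(n=250, p=1000, rho=0.5, seed=0):
    rng = np.random.default_rng(seed)
    support = markov_support(p, rng)
    beta = np.where(support, rng.uniform(0.5, 5.0, size=p), 0.0)
    x = ar1_covariates(n, p, rho, rng)
    eta = x @ beta                               # linear predictor
    t = weibull_cox_times(eta, 10.0, 1.0, rng)   # event times
    c = weibull_cox_times(eta, 10.0, 1.0, rng)   # ASSUMED censoring mechanism
    time = np.minimum(t, c)
    event = (t <= c).astype(int)                 # 1 = event observed, 0 = censored
    return x, time, event, beta
```

Calling simulate_case1(rho=0.9) returns the design matrix, observed times, event indicators, and the true coefficient vector for one dataset. To mimic the fixed support across 100 datasets, the support mask would be drawn once and reused while the effect sizes, covariates, and times are redrawn.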

Categories

Statistics, Bioinformatics

Licence