A demo of power estimation by simulation for a cluster randomized trial with a time-to-event outcome -

A colleague reached out for help designing a cluster randomized trial to evaluate a clinical decision support tool for primary care physicians (PCPs), which aims to improve care for high-risk patients. The outcome will be a time-to-event measure, collected at the patient level. The unit of randomization will be the PCP, and one of the key design issues is settling on the number to randomize. Surprisingly, I’ve never been involved with a study that required a clustered survival analysis. So, this particular sample size calculation is new for me, which led to the development of simulations that I can share with you. (There are some analytic solutions to this problem, but there doesn’t seem to a consensus about the best approach to use.)

Overview

In tackling this problem, there were four key elements that I needed to work out before actually conducting the power simulations. First, I needed to determine the hypothetical survival curve in the context of a single (control) arm and simulate data to confirm that I could replicate the desired curve. Second, I wanted to generate cluster-level variation so that I could assess the implications of the variance assumptions (still in a single-arm context). Third, I generated two intervention arms (without any clustering) to assess effect size assumptions. And lastly, I generated a full data set that included clustering, randomization, and censoring, and then fit a model that would be the basis for the power analysis to ensure that everything was working as expected. Once this was all completed, I was confident that I could move on to generating estimates of power under a range of sample size and variability assumptions. I apologize in advance for a post that is a bit long, but the agenda is quite packed and includes a lot of code.

Defining shape of survival curve

Defining the shape of the survival curve is made relatively easy using the function survGetParams in the simstudy package. All we need to do is specify some (at least two) coordinates along the curve and the function will return the parameters for the mean and shape of a Weibull function that best fit the points. These parameters are used in the data generation process. In this case, the study’s investigators provided me with a couple of points, indicating that approximately 10% of the sample would have an event by day 30, and half would have an event at day 365. Since the study is following patients at most for 365 days, we will consider anything beyond that to be censored (more on censoring later).

To get things started, here are the libraries needed for all the code that follows:

library(simstudy)
library(data.table)
library(survival)
library(survminer)
library(GGally)
library(coxme)
library(parallel)

Now, we get the parameters that define the survival curve:

points <- list(c(30, 0.90), c(365, .50))
r <- survGetParams(points)
r

## [1] -4.8  1.3

To simulate data from this curve, the time-to-event variable tte is defined using these parameters generating by survGetParams. The observed time is the minimum of one year and the time-to-event, and we create an event indicator if the time-to-even is less than one year.

defs <- defSurv(varname = "tte", formula = r[1], shape = r[2])

defa <- defDataAdd(varname = "time", formula = "min(365, tte)", dist = "nonrandom")
defa <- defDataAdd(defa, "event", "1*(tte <= 365)", dist = "nonrandom")

Generating the data is quite simple in this case:

set.seed(589823)

dd <- genData(1000)
dd <- genSurv(dd, defs, digits = 0)
dd <- addColumns(defa, dd)

dd

##         id  tte time event
##    1:    1  234  234     1
##    2:    2 1190  365     0
##    3:    3  395  365     0
##    4:    4  178  178     1
##    5:    5   72   72     1
##   ---                     
##  996:  996  818  365     0
##  997:  997  446  365     0
##  998:  998  118  118     1
##  999:  999  232  232     1
## 1000: 1000 5308  365     0

The plots below show the source function determined by the parameters on the left and the actual data generated on the right. It appears that the generated data matches the data generation process:

splot <- survParamPlot(r[1], r[2], points = points, n = 1000, limits = c(0, 365) )

fit <- survfit(Surv(time, event) ~ 1, data = dd)

j <- ggsurv(fit, CI = FALSE, surv.col = "#ed7c67", size.est = 0.8) + 
  theme(panel.grid = element_blank(),
        axis.text = element_text(size = 7.5),
        axis.title = element_text(size = 8, face = "bold"),
        plot.title = element_blank()) +
  ylim(0, 1) +
  xlab("time") + ylab("probability of survival")

ggarrange(splot, j, ncol = 2, nrow = 1)

Evaluating cluster variation

Cluster variation in the context of survival curves implies that there is a cluster-specific survival curve. This variation is induced with a random effect in in the data generation process. In this case, I am assuming a normally distributed random effect with mean 0 and some variance (distributions other than a normal distribution can be used). The variance assumption is a key one (which will ultimately impact the estimates of power), and I explore that a bit more in the second part of this section.

Visualizing cluster variation

The data generation process is a tad more involved than above, though not much more. We need to generate clusters and their random effect first, before adding the individuals. tte is now a function of the distribution parameters as well as the cluster random effect b. We are still using a single arm and assuming that everyone is followed for one year. In the first simulation, we set the random effect variance \(b = 0.1\).

defc <- defData(varname = "b", formula = 0, variance = 0.1)

defs <- defSurv(varname = "tte", formula = "r[1] + b", shape = r[2])

defa <- defDataAdd(varname = "time", formula = "min(365, tte)", dist = "nonrandom")
defa <- defDataAdd(defa, "event", "1*(tte <= 365)", dist = "nonrandom")

dc <- genData(20, defc, id = "pcp")
dd <- genCluster(dc, "pcp", numIndsVar = 1000, "id")
dd <- genSurv(dd, defs, digits = 0)
dd.10 <- addColumns(defa, dd)
dd.10

##        pcp    b    id  tte time event
##     1:   1 0.25     1  788  365     0
##     2:   1 0.25     2   93   93     1
##     3:   1 0.25     3  723  365     0
##     4:   1 0.25     4  151  151     1
##     5:   1 0.25     5 1367  365     0
##    ---                               
## 19996:  20 0.22 19996  424  365     0
## 19997:  20 0.22 19997  207  207     1
## 19998:  20 0.22 19998   70   70     1
## 19999:  20 0.22 19999  669  365     0
## 20000:  20 0.22 20000  994  365     0

The following plot shows two sets of survival curves, each based on different levels of variation, 0.10 on the left, and 0.01 on the right. Each curve represents a different cluster. With this plot, we get a clear visualization of how variance assumption of the random effect impacts the variation of the survival curves:

Variation of the probability of an event across clusters

The plot of the survival curves is only one way to consider the impact of cluster variation. Another option is to look at the binary event outcome under the assumption of no censoring. I like to evaluate the variation in the probability of an event across the clusters, particularly by looking at the range of probabilities, or considering the coefficient of variation, which is \(\sigma / \mu\).

To show how this is done, I am generating a data set with a very large number of clusters (2000) and a large cluster size (500), and then calculating the probability of an event for each cluster:

defc <- defData(varname = "b", formula = 0, variance = 0.100)

dc <- genData(2000, defc, id = "pcp")
dd <- genCluster(dc, "pcp", numIndsVar = 500, "id")
dd <- genSurv(dd, defs, digits = 0)
dd <- addColumns(defa, dd)

ds <- dd[, .(p = mean(event)), keyby = pcp]
ds

##        pcp    p
##    1:    1 0.41
##    2:    2 0.43
##    3:    3 0.36
##    4:    4 0.51
##    5:    5 0.50
##   ---          
## 1996: 1996 0.82
## 1997: 1997 0.53
## 1998: 1998 0.88
## 1999: 1999 0.42
## 2000: 2000 0.63

Here is the distribution of observed cluster-level proportions:

Here are the mean probability, the standard deviation of probabilities, the coefficient of variation for the probabilities, and the 95% interval of the probabilities when the random effect variance in the survival generation process is 0.10:

ds[, .(mu = mean(p), s = sd(p), cv = sd(p)/mean(p))]

##     mu    s   cv
## 1: 0.5 0.11 0.22

ds[, .(quantile(p, probs = c(0.025, .975)))]

##      V1
## 1: 0.31
## 2: 0.72

To compare across a range variance assumptions, I’ve generated ten data sets and plotted the results below. If you hover over the points, you will get the CV estimate. This could be helpful in helping collaborators decide what levels of variance is appropriate to focus on in the final power estimation and sample size determination.

In this case, we’ve decided that a coefficient of variation is not likely to exceed 0.17 (with a corresponding 95% interval of proportions ranging from 35% to 66%), so we’ll consider that when evaluating power.

Evaluating the effect size

Next, we generate data that includes treatment assignment (but excludes cluster variability and censoring before one year). The treatment effect is expressed as a log hazard ratio, which in this case 0.4 (equal to a hazard ratio of just about 1.5).

The data generation starts with treatment assignment, adds the time-to-event survival data, and then adds the one-year censoring data, as before:

defa <- defData(varname = "rx", formula = "1;1", dist = "trtAssign")

defs <- defSurv(varname = "tte", formula = "r[1] + 0.4 * rx", shape = r[2])

defe <- defDataAdd(varname = "time", formula = "min(365, tte)", dist = "nonrandom")
defe <- defDataAdd(defe, "event", "1*(tte <= 365)", dist = "nonrandom")

dd <- genData(1000, defa)
dd <- genSurv(dd, defs, digits = 0)
dd <- addColumns(defe, dd)

dd

##         id rx  tte time event
##    1:    1  0 1902  365     0
##    2:    2  0  286  286     1
##    3:    3  0  480  365     0
##    4:    4  0   32   32     1
##    5:    5  1   12   12     1
##   ---                        
##  996:  996  0  663  365     0
##  997:  997  1  962  365     0
##  998:  998  1   19   19     1
##  999:  999  1  502  365     0
## 1000: 1000  0   85   85     1

The plot of the survival curves by treatment arms provides a visualization of the treatment effect:

Again, since we have no censoring, we can estimate the probability of an event within 365 days for each arm:

dd[, .(p = mean(event)), keyby = rx]

##         rx    p
## 1: control 0.51
## 2:      rx 0.63

An effect size of 0.4 on the log hazard scale translates to an odds ratio of about 1.64, a risk ratio of 1.24, and risk difference of 12 percentage points. In the absence of any external data on the potential effect size, we can use an effect size that is minimally clinically meaningful based on any or all these effect size measurements.

Complete data generation and model estimation

With the pieces in place, we are ready to put it all together and add censoring to the mix to finalize the full data generating process. We will fit a mixed-effects Cox proportional hazards model to see if we can recover the parameters that we have used to generate the data, and, if that goes well, we will be ready to estimate power.

We start in defc by defining the cluster-level random effect variation and treatment assignment design (in this case 1 to 1, treatment to control). We add a censoring process in defa. This assumes that we will be enrolling patients for six months, spread out across this time period. The study will last exactly one year, so every patient will be followed for at least six months, and only some will be followed for a full year (i.e., those who join the study on the first day).

Finally, defs defines the data generation process for the survival outcome, which we’ve seen above, though now we have both a treatment effect and a random effect, in addition to the baseline parameters in the vector r.

defc <- defData(varname = "b", formula = 0, variance = 0.05)
defc <- defData(defc, varname = "rx", formula = "1;1", dist = "trtAssign")

defa <- defDataAdd(varname = "start_day", formula = "1;182", dist = "uniformInt")
defa <- defDataAdd(defa, varname = "censor", 
  formula = "365 - start_day ", dist = "nonrandom")

defs <- defSurv(varname = "tte", formula = "r[1] + 0.4 * rx + b", shape = r[2])

The data generation is the same as before, though there is an additional censoring process, which is done with the function addCompRisk:

dc <- genData(500, defc, id = "pcp")
dd <- genCluster(dc, "pcp", numIndsVar = 200, "id")
dd <- addColumns(defa, dd)
dd <- genSurv(dd, defs, digits = 0)
dd <- addCompRisk(dd, events = c("tte", "censor"), 
  timeName = "time", censorName = "censor", keepEvents = TRUE)

dd

##         pcp       b rx     id start_day censor  tte time event   type
##      1:   1 -0.0158  1      1        65    300  307  300     0 censor
##      2:   1 -0.0158  1      2       113    252  907  252     0 censor
##      3:   1 -0.0158  1      3       121    244  110  110     1    tte
##      4:   1 -0.0158  1      4        60    305  170  170     1    tte
##      5:   1 -0.0158  1      5        95    270 2673  270     0 censor
##     ---                                                              
##  99996: 500 -0.0015  0  99996       160    205  291  205     0 censor
##  99997: 500 -0.0015  0  99997        79    286  193  193     1    tte
##  99998: 500 -0.0015  0  99998        64    301   28   28     1    tte
##  99999: 500 -0.0015  0  99999       153    212  459  212     0 censor
## 100000: 500 -0.0015  0 100000        19    346  298  298     1    tte

Since we have generated a rather large data set, we should be able to recover the parameters pretty closely if we are using the correct model. We are going to fit a mixed effects survival model (also known as a frailty model) to see how well we did.

fit_coxme <- coxme(Surv(time, event) ~ rx + (1 | pcp), data = dd)
summary(fit_coxme)

## Cox mixed-effects model fit by maximum likelihood
##   Data: dd
##   events, n = 49955, 1e+05
##   Iterations= 11 48 
##                   NULL Integrated  Fitted
## Log-likelihood -555429    -553708 -553058
## 
##                   Chisq  df p  AIC  BIC
## Integrated loglik  3442   2 0 3438 3421
##  Penalized loglik  4743 414 0 3914  259
## 
## Model:  Surv(time, event) ~ rx + (1 | pcp) 
## Fixed coefficients
##    coef exp(coef) se(coef)  z p
## rx 0.39       1.5    0.022 18 0
## 
## Random effects
##  Group Variable  Std Dev Variance
##  pcp   Intercept 0.22    0.05

Pretty good! The estimated HR of 0.39 (95% CI: 0.35 - 0.43) is on target (we used 0.40 in the data generation process), and the estimated variance for the PCP random effect was 0.05, also on the mark. I’d say we are ready to proceed to the final step.

Power estimation

To conduct the power estimation, I’ve essentially wrapped the data generation and model estimation code in a collection of functions that can be called repeatedly to generate multiple data sets and model estimates The goal is to calculate the proportion of data sets with a statistically significant result for a particular set of assumptions (i.e., the estimate of power for the assumed effect size, variation, and sample sizes). I’ve provided the code below in the addendum in case you haven’t grown weary of all this detail. I described a general framework for using simulation to estimate sample size/power, and I’m largely following that process here.

I’ve written a little function scenario_list (which I’m now thinking I should add to simstudy) to create different parameter combinations that will determine the power estimation. In this case, the parameters I am interested in are the number of PCPs that should be randomized and the variance assumption. The number of patients per PCP (cluster size) is also important to vary, but for illustration purposes here I am keeping it constant.

Here is the simplified scenario list with four possible combinations:

scenario_list <- function(...) {
  argmat <- expand.grid(...)
  return(asplit(argmat, MARGIN = 1))
}

npcps <- c(20, 30)
npats <- c(15)
s2 <- c(0.03, 0.04)

scenarios <- scenario_list(npcps = npcps, npats = npats, s2 = s2)

scenarios

## [[1]]
## npcps npats    s2 
## 20.00 15.00  0.03 
## 
## [[2]]
## npcps npats    s2 
## 30.00 15.00  0.03 
## 
## [[3]]
## npcps npats    s2 
## 20.00 15.00  0.04 
## 
## [[4]]
## npcps npats    s2 
## 30.00 15.00  0.04

I use the mclapply function in the parallel package to generate three iterations for each scenario:

model.ests <- mclapply(scenarios, function(a) s_scenarios(a, nrep = 3))

model.ests

## [[1]]
##    npcps npats   s2 est_s re.var_s   p_s
## 1:    20    15 0.03  0.41   0.1596 0.100
## 2:    20    15 0.03  0.32   0.0306 0.065
## 3:    20    15 0.03  0.32   0.0004 0.041
## 
## [[2]]
##    npcps npats   s2 est_s re.var_s     p_s
## 1:    30    15 0.03  0.13    0.017 3.5e-01
## 2:    30    15 0.03  0.62    0.029 7.3e-05
## 3:    30    15 0.03  0.45    0.040 3.8e-03
## 
## [[3]]
##    npcps npats   s2 est_s re.var_s    p_s
## 1:    20    15 0.04  0.55  7.9e-02 0.0084
## 2:    20    15 0.04  0.07  4.0e-04 0.6700
## 3:    20    15 0.04  0.24  8.2e-05 0.1500
## 
## [[4]]
##    npcps npats   s2 est_s re.var_s     p_s
## 1:    30    15 0.04  0.42  8.3e-05 2.7e-03
## 2:    30    15 0.04  0.57  4.0e-04 2.6e-05
## 3:    30    15 0.04  0.36  1.0e-01 4.2e-02

In the actual power calculations, which are reported below, I used 60 scenarios defined by these data generation parameters:

npcps <- c(18, 24, 30, 36)
npats <- c(15, 18, 21)
s2 <- c(0.01, 0.02, 0.03, 0.04, 0.05)

For each of these scenarios, I generated 5000 data sets and estimated models for each (i.e., a total of 300,000 data sets and model fits). For each of the 60 scenarios, I estimated the proportion of the 5000 model fits that yielded a p-value < 0.05 for the estimated log hazard ratio. I had the benefit of using a high performance computer, because running this on my laptop would have taken well over 10 hours (only about 10 minutes on the HPC).

At the end, we have a plot of “power curves” that shows estimated power for each of the scenarios. If we assume that we can expect at least 18 patients per PCP and that the between-PCP variance will be around 0,03 or 0.04, we should be OK randomizing 30 PCPs (15 in each arm), though it might more prudent to go with 36, just to be safe:

Addendum

Here is the code I used to generate the data for the power curve plot. It is based on the framework I mentioned earlier. There is one extra function here, extract_coxme_table, which I pulled from stackoverflow, because there is currently no obvious way to extract data from the coxme model fit.

extract_coxme_table <- function (mod) {
  beta <- mod$coefficients 
  nvar <- length(beta)
  nfrail <- nrow(mod$var) - nvar
  se <- sqrt(diag(mod$var)[nfrail + 1:nvar])
  z <- round(beta/se, 2)
  p <- signif(1 - pchisq((beta/se)^2, 1), 2)
  table=data.table(beta = beta, se = se, z = z, p = p)
  return(table)
}

s_def <- function() {
  
  defc <- defData(varname = "b", formula = 0, variance = "..s2")
  defc <- defData(defc, varname = "rx", formula = "1;1", dist = "trtAssign")
  
  defa <- defDataAdd(varname = "start_day", formula = "1;182", dist = "uniformInt")
  defa <- defDataAdd(defa, varname = "censor", 
                     formula = "365 - start_day ", dist = "nonrandom")
  
  defs <- defSurv(varname = "tte", formula = "-4.815 + 0.4 * rx + b", shape = 1.326)
  
  defa2 <- defDataAdd(varname = "event6", 
                      formula = "1*(tte <= 182)", dist = "nonrandom")
  
  return(list(defc = defc, defa = defa, defs = defs, defa2 = defa2))
  
}

s_generate <- function(argsvec, list_of_defs) {
  
  list2env(list_of_defs, envir = environment())
  list2env(as.list(argsvec), envir = environment())
  
  dc <- genData(npcps, defc, id = "pcp")
  dd <- genCluster(dc, "pcp", npats, "id")
  dd <- addColumns(defa, dd)
  dd <- genSurv(dd, defs, digits = 0)
  dx <- addCompRisk(dd, events = c("tte", "censor"), 
                    timeName = "time", censorName = "censor", keepEvents = TRUE)
  dx <- addColumns(defa2, dx)
  
  dx[]
  
}

s_replicate <- function(argsvec, list_of_defs) {
  
  dx <- s_generate(argsvec, list_of_defs)
  
  coxfitm <-coxme(Surv(time, event) ~ rx + (1 | pcp), data = dx)
  
  list2env(as.list(argsvec), envir = environment())
  
  return(data.table(
    npcps = npcps,
    npats = npats,
    s2 = s2,
    est_s = fixef(coxfitm), 
    re.var_s = VarCorr(coxfitm)$pcp,
    p_s = extract_coxme_table(coxfitm)$p
  ))
  
}

s_scenarios <- function(argsvec, nreps) {
  
  list_of_defs <- s_def()
  
  rbindlist(
    parallel::mclapply(
      X = 1 : nreps, 
      FUN = function(x) s_replicate(argsvec, list_of_defs), 
      mc.cores = 4)
  )
  
}

R Cluster randomized trials survival analysis