About two years ago, someone inquired whether simstudy had the functionality to generate data from a logistic model with a specific AUC. It did not, but now it does, thanks to a paper by Peter Austin that describes a nice algorithm to accomplish this. The paper actually describes a series of related algorithms for generating coefficients that target specific prevalence rates, risk ratios, and risk differences, in addition to the AUC. simstudy has a new function logisticCoefs that implements all of these. (The Austin paper also describes an additional algorithm focused on survival outcome data and hazard ratios, but that has not been implemented in simstudy). This post describes the the new function and provides some simple examples.
A demo of power estimation by simulation for a cluster randomized trial with a time-to-event outcome
A colleague reached out for help designing a cluster randomized trial to evaluate a clinical decision support tool for primary care physicians (PCPs), which aims to improve care for high-risk patients. The outcome will be a time-to-event measure, collected at the patient level. The unit of randomization will be the PCP, and one of the key design issues is settling on the number to randomize. Surprisingly, I’ve never been involved with a study that required a clustered survival analysis. So, this particular sample size calculation is new for me, which led to the development of simulations that I can share with you. (There are some analytic solutions to this problem, but there doesn’t seem to a consensus about the best approach to use.)
[Read More]Generating variable cluster sizes to assess power in cluster randomized trials
In recent discussions with a number of collaborators at the NIA IMPACT Collaboratory about setting the sample size for a proposed cluster randomized trial, the question of variable cluster sizes has come up a number of times. Given a fixed overall sample size, it is generally better (in terms of statistical power) if the sample is equally distributed across the different clusters; highly variable cluster sizes increase the standard errors of effect size estimates and reduce the ability to determine if an intervention or treatment is effective.
[Read More]Implementing a one-step GEE algorithm for very large cluster sizes in R
Very large data sets can present estimation problems for some statistical models, particularly ones that cannot avoid matrix inversion. For example, generalized estimating equations (GEE) models that are used when individual observations are correlated within groups can have severe computation challenges when the cluster sizes get too large. GEE are often used when repeated measures for an individual are collected over time; the individual is considered the cluster in this analysis. Estimation in this case is not really an issue because the cluster sizes are typically relatively small. However, if there are groups of individuals, we also need to account for correlation. Unfortunately, if these group/cluster sizes are too large - perhaps bigger than 1000 - traditional GEE estimation techniques just may not be feasible.
[Read More]simstudy 0.6.0 released: more flexible correlation patterns
The new version (0.6.0) of simstudy is available for download from CRAN. In addition to some important bug fixes, I’ve added new functionality that should make data generation with correlated data a little more flexible. In the previous post, I described enhancements to the function genCorMat. As part of this release announcement, I’m describing blockExchangeMat and blockDecayMat, two new functions that can be used to generate correlation matrices when there is a temporal element to the data generation.
Flexible correlation generation: an update to genCorMat in simstudy
I’ve been slowly working on some updates to simstudy, focusing mostly on the functionality to generate correlation matrices (which can be used to simulate correlated data). Here, I’m briefly describing the function genCorMat, which has been updated to facilitate the generation of correlation matrices for clusters of different sizes and with potentially different correlation coefficients.
I’ll briefly describe what the existing function can currently do, and then give an idea about what the enhancements will provide.
[Read More]A GAM for time trends in a stepped-wedge trial with a binary outcome
In a previous post, I described some ways one might go about analyzing data from a stepped-wedge, cluster-randomized trial using a generalized additive model (a GAM), focusing on continuous outcomes. I have spent the past few weeks developing a similar model for a binary outcome, and have started to explore model comparison and methods to evaluate goodness-of-fit. The following describes some of my thought process.
Data generation
The data generation process I am using here follows along pretty closely with the earlier post, except, of course, the outcome has changed from continuous to binary. In this example, I’ve increased the correlation for between-period effects because it doesn’t seem like outcomes would change substantially from period to period, particularly if the time periods themselves are relatively short. The correlation still decays over time.
[Read More]Modeling the secular trend in a stepped-wedge design
Recently I started a discussion about modeling secular trends using flexible models in the context of cluster randomized trials. I’ve been motivated by a trial I am involved with that is using a stepped-wedge study design. The initial post focused on more standard parallel designs; here, I want to extend the discussion explicitly to the stepped-wedge design.
The stepped-wedge design
Stepped-wedge designs are a special class of cluster randomized trial where each cluster is observed in both treatment arms (as opposed to the classic parallel design where only some of the clusters receive the treatment). In what is essentially a cross-over design, each cluster transitions in a single direction from control (or pre-intervention) to intervention. I’ve written about this in a number of different contexts (for example, with respect to power analysis, complicated ICC patterns, using Bayesian models for estimation, open cohorts, and baseline measurements to improve efficiency).
[Read More]Generating clustered data with marginal correlations
A student is working on a project to derive an analytic solution to the problem of sample size determination in the context of cluster randomized trials and repeated individual-level measurement (something I’ve thought a little bit about before). Though the goal is an analytic solution, we do want confirmation with simulation. So, I was a little disheartened to discover that the routines I’d developed in simstudy for this were not quite up to the task. I’ve had to quickly fix that, and the updates are available in the development version of simstudy, which can be downloaded using devtools::install_github(“kgoldfeld/simstudy”). While some of the changes are under the hood, I have added a new function, genBlockMat, which I’ll describe here.
Modeling the secular trend in a cluster randomized trial using very flexible models
A key challenge - maybe the key challenge - of a stepped wedge clinical trial design is the threat of confounding by time. This is a cross-over design where the unit of randomization is a group or cluster, where each cluster begins in the control state and transitions to the intervention. It is the transition point that is randomized. Since outcomes could be changing over time regardless of the intervention, it is important to model the time trends when conducting the efficacy analysis. The question is how we choose to model time, and I am going to suggest that we might want to use a very flexible model, such as a cubic spline or a generalized additive model (GAM).
[Read More]