A three-arm trial using two-step randomization

Clinical decision support (CDS) tools are software systems that give health care professionals guidance about diagnostic and treatment options while they are providing care to a patient. I’m currently involved in designing a trial that compares a standard CDS tool with an enhanced version (CDS+). The main goal is to directly compare patient-level outcomes for those exposed to the different versions of the CDS. However, we might also be interested in comparing the basic CDS with a control arm, which suggests some type of three-arm trial.
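
To make the two-step idea concrete, here is a minimal sketch using simstudy’s trtAssign: first randomize patients between control and any CDS, then randomize the CDS group between the basic and enhanced versions. The sample size and the 1:1 ratios at each step are placeholders, not the trial’s actual design:

```r
library(simstudy)
library(data.table)

set.seed(123)

# step 1: 600 hypothetical patients, randomized 1:1 to control vs. any CDS
dd <- genData(600)
dd <- trtAssign(dd, nTrt = 2, grpName = "anyCDS")

# step 2: within the CDS group, randomize 1:1 to basic CDS vs. CDS+
dcds <- trtAssign(dd[anyCDS == 1], nTrt = 2, grpName = "plus")
dd <- merge(dd, dcds[, .(id, plus)], by = "id", all.x = TRUE)

dd[, arm := fifelse(anyCDS == 0, "control",
             fifelse(plus == 0, "CDS", "CDS+"))]
dd[, .N, keyby = arm]   # expect 300 control, 150 CDS, 150 CDS+
```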

[Read More]

Creating a nice looking Table 1 with standardized mean differences

I’m in the middle of a perfect storm, winding down three randomized clinical trials (RCTs), with patient recruitment long finished and data collection all wrapped up. This means a lot of data analysis, presentation prep, and paper writing (and not so much blogging). One common (and not so glamorous) thread cutting across all of these RCTs is the need to generate a Table 1, the comparison of baseline characteristics that convinces readers that randomization worked its magic (i.e., that study groups are indeed “comparable”). My primary goal here is to provide some R code to automate the generation of this table, but not before highlighting some issues related to checking for balance and pointing you to a couple of really interesting papers.
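
As a preview of one key ingredient, the standardized mean difference for a continuous covariate is just the difference in group means scaled by the pooled standard deviation. A hand-rolled version (the table-building code relies on packages to do this for you) might look like:

```r
# SMD for a continuous covariate: difference in group means divided by
# the pooled standard deviation, sqrt((s_1^2 + s_0^2) / 2)
smd <- function(x, grp) {
  m <- tapply(x, grp, mean)
  v <- tapply(x, grp, var)
  unname((m[2] - m[1]) / sqrt((v[1] + v[2]) / 2))
}

set.seed(7)
rx <- rbinom(500, 1, 0.5)             # hypothetical 1:1 assignment
age <- rnorm(500, 60 + 0.5 * rx, 10)  # hypothetical baseline covariate
smd(age, rx)
```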

[Read More]

Finding logistic models to generate data with desired risk ratio, risk difference and AUC profiles

About two years ago, someone inquired whether simstudy had the functionality to generate data from a logistic model with a specific AUC. It did not, but now it does, thanks to a paper by Peter Austin that describes a nice algorithm to accomplish this. The paper actually describes a series of related algorithms for generating coefficients that target specific prevalence rates, risk ratios, and risk differences, in addition to the AUC. simstudy has a new function, logisticCoefs, that implements all of these. (The Austin paper also describes an additional algorithm focused on survival outcome data and hazard ratios, but that has not been implemented in simstudy.) This post describes the new function and provides some simple examples.
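
To give a flavor of what these algorithms are doing, the prevalence version can be sketched in a few lines of base R. This is just the core idea, not simstudy’s implementation:

```r
set.seed(11)

# fix the covariate coefficients, then solve for the intercept b0 that
# yields the target marginal probability of the outcome
n <- 1e5
x1 <- rnorm(n)               # hypothetical covariates
x2 <- rbinom(n, 1, 0.4)
lp <- 0.20 * x1 + 0.35 * x2  # fixed (hypothetical) log-odds coefficients

target <- 0.30
b0 <- uniroot(function(b) mean(plogis(b + lp)) - target, c(-10, 10))$root

mean(plogis(b0 + lp))        # ~ 0.30, as targeted
```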

[Read More]

A demo of power estimation by simulation for a cluster randomized trial with a time-to-event outcome

A colleague reached out for help designing a cluster randomized trial to evaluate a clinical decision support tool for primary care physicians (PCPs), which aims to improve care for high-risk patients. The outcome will be a time-to-event measure, collected at the patient level. The unit of randomization will be the PCP, and one of the key design issues is settling on the number of PCPs to randomize. Surprisingly, I’ve never been involved with a study that required a clustered survival analysis. So, this particular sample size calculation is new for me, which led to the development of simulations that I can share with you. (There are some analytic solutions to this problem, but there doesn’t seem to be a consensus about the best approach to use.)
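
Here is the skeleton of the kind of simulation I have in mind: generate clustered survival times with a PCP-level frailty, fit a Cox model with robust standard errors, and estimate power as the proportion of replications with p < 0.05. Every parameter value below is a placeholder:

```r
library(survival)

# simulate one trial: PCP-level randomization, exponential survival times
# with a PCP-level frailty; all parameter values are hypothetical
sim_trial <- function(npcp = 40, npat = 50, delta = log(0.75), sd_pcp = 0.3) {
  rx_pcp <- sample(rep(0:1, npcp / 2))           # balanced PCP assignment
  rx <- rep(rx_pcp, each = npat)
  b <- rep(rnorm(npcp, 0, sd_pcp), each = npat)  # PCP random effect
  pcp <- rep(1:npcp, each = npat)
  time <- rexp(npcp * npat, rate = 0.10 * exp(delta * rx + b))
  event <- as.integer(time <= 24)                # administrative censoring
  time <- pmin(time, 24)
  fit <- coxph(Surv(time, event) ~ rx + cluster(pcp))  # robust (sandwich) SEs
  summary(fit)$coefficients["rx", "Pr(>|z|)"]
}

set.seed(2023)
mean(replicate(500, sim_trial()) < 0.05)  # estimated power
```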

[Read More]

Generating variable cluster sizes to assess power in cluster randomized trials

In recent discussions with a number of collaborators at the NIA IMPACT Collaboratory about setting the sample size for a proposed cluster randomized trial, the question of variable cluster sizes has come up repeatedly. Given a fixed overall sample size, it is generally better (in terms of statistical power) if the sample is equally distributed across the different clusters; highly variable cluster sizes increase the standard errors of effect size estimates and reduce the ability to determine whether an intervention or treatment is effective.
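
For concreteness, one way to generate variable cluster sizes with simstudy is to draw each cluster’s size from a count distribution and then expand to the individual level; the number of clusters and the mean size here are arbitrary:

```r
library(simstudy)

set.seed(135)

# 20 clusters with variable sizes (mean 50); making the size distribution
# more dispersed is what erodes power relative to equal-sized clusters
defc <- defData(varname = "m", formula = 50, dist = "noZeroPoisson")
dc <- genData(20, defc, id = "site")

# expand to one record per individual within each cluster
dd <- genCluster(dc, cLevelVar = "site", numIndsVar = "m", level1ID = "id")
dd[, .N, keyby = site]
```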

[Read More]

Implementing a one-step GEE algorithm for very large cluster sizes in R

Very large data sets can present estimation problems for some statistical models, particularly those that cannot avoid matrix inversion. For example, generalized estimating equations (GEE) models, which are used when individual observations are correlated within groups, can run into severe computational challenges when the cluster sizes get too large. GEE models are often used when repeated measures for an individual are collected over time; the individual is the cluster in this analysis, and estimation is not really an issue because the cluster sizes are typically small. However, if there are groups of individuals, we also need to account for correlation at the group level. Unfortunately, if these group/cluster sizes are too large - perhaps bigger than 1000 - traditional GEE estimation techniques may simply not be feasible.
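
One fact that makes estimation with huge clusters plausible (my illustration here, not code from the post): with an exchangeable working correlation, the inverse of the n x n correlation matrix is available in closed form, so it never needs to be computed numerically.

```r
# an exchangeable working correlation R = (1 - rho) * I + rho * J has the
# closed-form inverse (1 / (1 - rho)) * (I - rho / (1 + (n - 1) * rho) * J),
# so R can be "inverted" without any numerical matrix inversion
n <- 1000
rho <- 0.05

J <- matrix(1, n, n)
R <- (1 - rho) * diag(n) + rho * J
Rinv <- (1 / (1 - rho)) * (diag(n) - (rho / (1 + (n - 1) * rho)) * J)

max(abs(Rinv %*% R - diag(n)))  # ~ 0, verifying the identity
```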

[Read More]

simstudy 0.6.0 released: more flexible correlation patterns

The new version (0.6.0) of simstudy is available for download from CRAN. In addition to some important bug fixes, I’ve added new functionality that should make generating correlated data a little more flexible. In the previous post, I described enhancements to the function genCorMat. As part of this release announcement, I’m describing blockExchangeMat and blockDecayMat, two new functions that can be used to generate correlation matrices when there is a temporal element to the data generation.
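
As a quick taste, a call along the following lines should produce a block correlation matrix for a cross-sectional design with decaying between-period correlation. The argument names reflect my reading of the 0.6.0 documentation, so treat them as assumptions:

```r
library(simstudy)

# cross-sectional design: 3 individuals per period over 4 periods, with
# within-period correlation 0.3 that decays across periods at rate 0.8
# (argument names per my reading of the 0.6.0 documentation)
R <- blockDecayMat(ninds = 3, nperiods = 4, rho_w = 0.3, r = 0.8)
round(R[1:6, 1:6], 3)
```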

[Read More]

Flexible correlation generation: an update to genCorMat in simstudy

I’ve been slowly working on some updates to simstudy, focusing mostly on the functionality to generate correlation matrices (which can be used to simulate correlated data). Here, I’m briefly describing the function genCorMat, which has been updated to facilitate the generation of correlation matrices for clusters of different sizes and with potentially different correlation coefficients.

I’ll briefly describe what the existing function can do, and then give an idea of what the enhancements provide.
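
To anticipate the punchline, the enhanced interface should support calls along these lines, returning a separate matrix for each cluster; the argument pattern shown is my understanding of the update, not a definitive reference:

```r
library(simstudy)

# clusters of sizes 2, 3, and 4, each with its own exchangeable ("cs")
# correlation; the result is a list of three correlation matrices
# (this call pattern reflects my understanding of the updated interface)
RM <- genCorMat(nvars = c(2, 3, 4), rho = c(0.6, 0.7, 0.8),
                corstr = "cs", nclusters = 3)
RM[[3]]
```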

[Read More]

A GAM for time trends in a stepped-wedge trial with a binary outcome

In a previous post, I described some ways one might go about analyzing data from a stepped-wedge, cluster-randomized trial using a generalized additive model (a GAM), focusing on continuous outcomes. I have spent the past few weeks developing a similar model for a binary outcome, and have started to explore model comparison and methods to evaluate goodness-of-fit. The following describes some of my thought process.

Data generation

The data generation process I am using here follows pretty closely along with the earlier post, except, of course, that the outcome has changed from continuous to binary. In this example, I’ve increased the between-period correlation, because it doesn’t seem like outcomes would change substantially from period to period, particularly if the time periods themselves are relatively short. The correlation still decays over time.
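
For context on where this ends up, a binary-outcome GAM with a smooth secular trend and a site-level random effect can be fit with mgcv along the following lines. The data generated below are a simplified stand-in for the process described in the post, with all parameter values invented for illustration:

```r
library(mgcv)
library(data.table)

set.seed(28)

# simplified stand-in for the stepped-wedge data (not the post's actual
# generation process): 24 sites in 8 waves, 12 periods, 30 per site-period
dsite <- data.table(site = 1:24, b = rnorm(24, 0, 0.3),
                    start = rep(3:10, each = 3))
dd <- CJ(site = 1:24, period = 1:12, ind = 1:30)
dd <- merge(dd, dsite, by = "site")
dd[, rx := as.integer(period >= start)]
dd[, y := rbinom(.N, 1, plogis(-1 + 0.5 * rx + 0.3 * sin(period / 2) + b))]
dd[, fsite := factor(site)]

# binary-outcome GAM: smooth secular trend plus a site-level random effect
fit <- gam(y ~ rx + s(period, k = 6) + s(fsite, bs = "re"),
           data = dd, family = binomial, method = "REML")
summary(fit)
```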

[Read More]

Modeling the secular trend in a stepped-wedge design

Recently I started a discussion about modeling secular trends using flexible models in the context of cluster randomized trials. I’ve been motivated by a trial I am involved with that is using a stepped-wedge study design. The initial post focused on more standard parallel designs; here, I want to extend the discussion explicitly to the stepped-wedge design.

The stepped-wedge design

Stepped-wedge designs are a special class of cluster randomized trial where each cluster is observed in both treatment arms (as opposed to the classic parallel design where only some of the clusters receive the treatment). In what is essentially a cross-over design, each cluster transitions in a single direction from control (or pre-intervention) to intervention. I’ve written about this in a number of different contexts (for example, with respect to power analysis, complicated ICC patterns, using Bayesian models for estimation, open cohorts, and baseline measurements to improve efficiency).
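
To make the structure concrete, here is a minimal sketch of a wedge schedule generated with simstudy; the numbers of clusters, waves, and periods are arbitrary, and the startTrt and rx fields shown reflect my understanding of what trtStepWedge returns:

```r
library(simstudy)

# schedule only: 12 clusters in 4 waves of 3, crossing over one wave per
# period starting at period 2 (trtStepWedge builds the wedge design)
dc <- genData(12, id = "cluster")
dp <- addPeriods(dc, nPeriods = 7, idvars = "cluster")
dp <- trtStepWedge(dp, "cluster", nWaves = 4, lenWaves = 1, startPer = 2)
dp[period == 4, .(cluster, startTrt, rx)]  # treatment status in period 4
```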

[Read More]