This site is a compendium of R code meant to highlight the various uses of simulation to aid in the understanding of probability, statistics, and study design. I frequently draw on examples using my R package simstudy. Occasionally, I opine on other topics related to causal inference, evidence, and research more generally.

To impute or not: the case of an RCT with baseline and follow-up measurements

Under normal conditions, conducting a randomized clinical trial is challenging. Throw in a pandemic and things like site selection, patient recruitment and patient follow-up can be particularly vexing. In any study, subjects need to be retained long enough so that outcomes can be measured; during a period when there are so many potential disruptions, this can become quite difficult. This issue of loss to follow-up recently came up during a conversation among a group of researchers who were troubleshooting challenges they are all experiencing in their ongoing trials. [Read More]

Simulating time-to-event outcomes with non-proportional hazards

As I mentioned last time, I am working on an update of simstudy that will make generating survival/time-to-event data a bit more flexible. I previously presented the functionality related to competing risks, and this time I’ll describe generating survival data that has time-dependent hazard ratios. (As I mentioned last time, if you want to try this at home, you will need the development version of simstudy that you can install using devtools::install_github(“kgoldfeld/simstudy”). [Read More]

Adding competing risks in survival data generation

I am working on an update of simstudy that will make generating survival/time-to-event data a bit more flexible. There are two biggish enhancements. The first facilitates generation of competing events, and the second allows for the possibility of generating survival data that has time-dependent hazard ratios. This post focuses on the first enhancement, and a follow up will provide examples of the second. (If you want to try this at home, you will need the development version of simstudy, which you can install using devtools::install_github(“kgoldfeld/simstudy”). [Read More]

Follow-up: simstudy function for generating parameters for survival distribution

In the previous post I described how to determine the parameter values for generating a Weibull survival curve that reflects a desired distribution defined by two points along the curve. I went ahead and implemented these ideas in the development version of simstudy 0.4.0.9000, expanding the idea to allow for any number of points rather than just two. This post provides a brief overview of the approach, the code, and a simple example using the parameters to generate simulated data. [Read More]

Simulating survival outcomes: setting the parameters for the desired distribution

The package simstudy has some functions that facilitate generating survival data using an underlying Weibull distribution. Originally, I added this to the package because I thought it would be interesting to try to do, and I figured it would be useful for me someday (and hopefully some others, as well). Well, now I am working on a project that involves evaluating at least two survival-type processes that are occurring simultaneously. [Read More]

simstudy update: ordinal data generation that violates proportionality

Version 0.4.0 of simstudy is now available on CRAN and GitHub. This update includes two enhancements (and at least one major bug fix). genOrdCat now includes an argument to generate ordinal data without an assumption of cumulative proportional odds. And two new functions defRepeat and defRepeatAdd make it a bit easier to define multiple variables that share the same distribution assumptions. Ordinal data In simstudy, it is relatively easy to specify multinomial distributions that characterize categorical data. [Read More]
R  simstudy 

Including uncertainty when comparing response rates across clusters

Since this is a holiday weekend here in the US, I thought I would write up something relatively short and simple since I am supposed to be relaxing. A few weeks ago, someone presented me with some data that showed response rates to a survey that was conducted at about 30 different locations. The team that collected the data was interested in understanding if there were some sites that had response rates that might have been too low. [Read More]

Skeptical Bayesian priors might help minimize skepticism about subgroup analyses

Over the past couple of years, I have been working with an amazing group of investigators as part of the CONTAIN trial to study whether COVID-19 convalescent plasma (CCP) can improve the clinical status of patients hospitalized with COVID-19 and requiring noninvasive supplemental oxygen. This was a multi-site study in the US that randomized 941 patients to either CCP or a saline solution placebo. The overall findings suggest that CCP did not benefit the patients who received it, but if you drill down a little deeper, the story may be more complicated than that. [Read More]

Controlling Type I error in RCTs with interim looks: a Bayesian perspective

Recently, a colleague submitted a paper describing the results of a Bayesian adaptive trial where the research team estimated the probability of effectiveness at various points during the trial. This trial was designed to stop as soon as the probability of effectiveness exceeded a pre-specified threshold. The journal rejected the paper on the grounds that these repeated interim looks inflated the Type I error rate, and increased the chances that any conclusions drawn from the study could have been misleading. [Read More]

Exploring design effects of stepped wedge designs with baseline measurements

In the previous post, I described an incipient effort that I am undertaking with two colleagues, Monica Taljaard and Fan Li, to better understand the implications for collecting baseline measurements on sample size requirements for stepped wedge cluster randomized trials. (The three of us are on the Design and Statistics Core of the NIA IMPACT Collaboratory.) In that post, I conducted a series of simulations that illustrated the design effects in parallel cluster randomized trials derived analytically in a paper by Teerenstra et al. [Read More]