ouR data generation

Simulating survival outcomes: setting the parameters for the desired distribution

Posted on February 8, 2022

The package simstudy has some functions that facilitate generating survival data using an underlying Weibull distribution. Originally, I added this to the package because I thought it would be interesting to try to do, and I figured it would be useful for me someday (and hopefully some others, as well). Well, now I am working on a project that involves evaluating at least two survival-type processes that are occurring simultaneously. To get a handle on the analytic models we might use, I’ve started to try to simulate a simplified version of the data that we have.
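To make the idea of "setting the parameters for the desired distribution" concrete, here is a minimal base R sketch of generating Weibull survival times by inverse-transform sampling. It is not the code simstudy uses: the helper `gen_weibull_times` and the `shape` and `scale` values are hypothetical, and the parameterization follows `stats::rweibull`, which may differ from the one simstudy adopts.

```r
# Inverse-transform sampling from a Weibull distribution (base R only).
# Parameterization assumed here: S(t) = exp(-(t / scale)^shape), as in stats::rweibull.
set.seed(2022)

gen_weibull_times <- function(n, shape, scale) {
  u <- runif(n)                    # U ~ Uniform(0, 1)
  scale * (-log(u))^(1 / shape)    # solve S(t) = u for t
}

shape <- 1.5   # hypothetical values, chosen only for illustration
scale <- 10

times <- gen_weibull_times(10000, shape, scale)

# Under this parameterization the median is scale * log(2)^(1 / shape)
theo_median <- scale * log(2)^(1 / shape)
c(simulated = median(times), theoretical = theo_median)
```

Comparing a simulated summary (here the median) with its closed-form value is one simple way to check that the chosen parameters give the distribution you intended.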

[Read More]

R simulation survival analysis

simstudy update: ordinal data generation that violates proportionality

Posted on January 25, 2022

Version 0.4.0 of simstudy is now available on CRAN and GitHub. This update includes two enhancements (and at least one major bug fix). genOrdCat now includes an argument to generate ordinal data without an assumption of cumulative proportional odds. And two new functions, defRepeat and defRepeatAdd, make it a bit easier to define multiple variables that share the same distribution assumptions.

Ordinal data

In simstudy, it is relatively easy to specify multinomial distributions that characterize categorical data. Order becomes relevant when the categories take on meanings related to strength of opinion or agreement (as in a Likert-type response) or frequency. A motivating example could be when a response variable takes on four possible values: (1) strongly disagree, (2) disagree, (3) agree, (4) strongly agree. There is a natural order to the response possibilities.
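For readers who want to see what a cumulative proportional odds assumption looks like in practice, here is a minimal base R sketch for a response with four ordered categories. It does not use genOrdCat and is not simstudy's implementation: the helper `gen_ordinal`, the baseline probabilities, and the sign convention (the log-odds of P(Y <= k) are shifted by `-z`) are all assumptions made for illustration.

```r
# Ordinal outcomes under cumulative proportional odds (base R only).
set.seed(2022)

base_probs <- c(0.35, 0.25, 0.20, 0.20)   # hypothetical probabilities, 4 ordered categories

gen_ordinal <- function(z, base_probs) {
  # Thresholds on the logit scale for P(Y <= k), k = 1, ..., K - 1
  alpha <- qlogis(cumsum(base_probs)[-length(base_probs)])
  # Applying the same shift z to every threshold is the proportional odds assumption
  cum_p <- plogis(alpha - z)
  # Convert cumulative probabilities back to category probabilities
  p <- diff(c(0, cum_p, 1))
  sample(seq_along(base_probs), size = 1, prob = p)
}

z <- rep(c(0, 1), each = 5000)   # 0 = reference group, 1 = shifted group
y <- vapply(z, gen_ordinal, integer(1), base_probs = base_probs)

# Observed category proportions by group
round(prop.table(table(z, y), margin = 1), 2)
```

With this sign convention, the group with `z = 1` should tend toward the higher categories, while the reference group should roughly reproduce `base_probs`. Generating data that violates proportionality would require letting the shift differ across thresholds, which this sketch does not do.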

[Read More]

R simstudy

Including uncertainty when comparing response rates across clusters

Posted on January 18, 2022

Since this is a holiday weekend here in the US, I thought I would write up something relatively short and simple since I am supposed to be relaxing. A few weeks ago, someone presented me with some data that showed response rates to a survey that was conducted at about 30 different locations. The team that collected the data was interested in understanding if there were some sites that had response rates that might have been too low. To determine this, they generated a plot that looked something like this:

[Read More]

R Bayesian model

Skeptical Bayesian priors might help minimize skepticism about subgroup analyses

Posted on January 4, 2022

Over the past couple of years, I have been working with an amazing group of investigators as part of the CONTAIN trial to study whether COVID-19 convalescent plasma (CCP) can improve the clinical status of patients hospitalized with COVID-19 and requiring noninvasive supplemental oxygen. This was a multi-site study in the US that randomized 941 patients to either CCP or a saline solution placebo. The overall findings suggest that CCP did not benefit the patients who received it, but if you drill down a little deeper, the story may be more complicated than that.

[Read More]

R Bayesian model

Controlling Type I error in RCTs with interim looks: a Bayesian perspective

Posted on December 21, 2021

Recently, a colleague submitted a paper describing the results of a Bayesian adaptive trial where the research team estimated the probability of effectiveness at various points during the trial. This trial was designed to stop as soon as the probability of effectiveness exceeded a pre-specified threshold. The journal rejected the paper on the grounds that these repeated interim looks inflated the Type I error rate, and increased the chances that any conclusions drawn from the study could have been misleading. Was this a reasonable position for the journal editors to take?

[Read More]

R Bayesian model

Exploring design effects of stepped wedge designs with baseline measurements

Posted on December 7, 2021

In the previous post, I described an incipient effort that I am undertaking with two colleagues, Monica Taljaard and Fan Li, to better understand the implications of collecting baseline measurements for sample size requirements in stepped wedge cluster randomized trials. (The three of us are on the Design and Statistics Core of the NIA IMPACT Collaboratory.) In that post, I conducted a series of simulations that illustrated the design effects in parallel cluster randomized trials derived analytically in a paper by Teerenstra et al. In this post, I am extending those simulations to stepped wedge trials; the hope is that the design effects can be formally derived at some point soon.

[Read More]

R Cluster randomized trials

The design effect of a cluster randomized trial with baseline measurements

Posted on November 23, 2021

Is it possible to reduce the sample size requirements of a stepped wedge cluster randomized trial simply by collecting baseline information? In a trial with randomization at the individual level, it is generally the case that if we are able to measure an outcome for subjects at two time periods, first at baseline and then at follow-up, we can reduce the overall sample size. But does this extend to (a) cluster randomized trials generally, and to (b) stepped wedge designs more specifically?

[Read More]

R Cluster randomized trials

simstudy update: adding flexibility to data generation

Posted on November 9, 2021

A new version of simstudy (0.3.0) is now available on CRAN and on the package website. Along with some less exciting bug fixes, we have added capabilities to a few existing features: double-dot variable reference, treatment assignment, and categorical data definition. These simple additions should make the data generation process a little smoother and more flexible.

Using non-scalar double-dot variable reference

Double-dot notation was introduced in the last version of simstudy to allow data definitions to be more dynamic. Previously, a double-dot variable could only be a scalar value; with the current version, double-dot notation is also array-friendly.

[Read More]

R simstudy

Sample size requirements for a Bayesian factorial study design

Posted on October 26, 2021

How do you determine sample size when the goal of a study is not to conduct a null hypothesis test but to provide an estimate of multiple effect sizes? I needed to get a handle on this for a recent grant submission, which I’ve been writing about over the past month, here and here. (I provide a little more context for all of this in those earlier posts.) The statistical inference in the study will be based on the estimated posterior distributions from a Bayesian model, so it seems like we’d like those distributions to be as informative as possible. We need to set the sample size large enough to reduce the dispersion of those distributions to a helpful level.

[Read More]

R Bayesian model Stan

A Bayesian analysis of a factorial design focusing on effect size estimates

Posted on October 12, 2021

Factorial study designs present a number of analytic challenges, not least of which is how to best understand whether simultaneously applying multiple interventions is beneficial. Last time I presented a possible approach that focuses on estimating the variance of effect size estimates using a Bayesian model. The scenario I used there focused on a hypothetical study evaluating two interventions with four different levels each. This time around, I am considering a proposed study to reduce emergency department (ED) use for patients living with dementia that I am actually involved with. This study would have three different interventions, but only two levels for each (i.e., yes or no), for a total of 8 arms. In this case, the model I proposed previously does not seem like it would work well; the posterior distributions based on the variance-based model turn out to be bimodal in shape, making it quite difficult to interpret the findings. So, I decided to turn the focus away from variance and emphasize the effect size estimates for each arm compared to control.

[Read More]

R Bayesian model