This site is a compendium of R code meant to highlight the various uses of simulation to aid in the understanding of probability, statistics, and study design. I frequently draw on examples using my R package simstudy. Occasionally, I opine on other topics related to causal inference, evidence, and research more generally.

simstudy just got a little more dynamic: version 0.2.1

simstudy version 0.2.1 has just been submitted to CRAN. Along with this release, the big news is that I’ve been joined by Jacob Wujciak-Jens as a co-author of the package. He initially reached out to me from Germany with some suggestions for improvements, we had a little back and forth, and now here we are. He has substantially reworked the underbelly of simstudy, making the package much easier to maintain, and positioning it for much easier extension. And he implemented an entire system of formalized tests using testthat and hedgehog; that was always my intention, but I never had the wherewithal to pull it off, and Jacob has done that. But, most importantly, it is much more fun to collaborate on this project than to toil away on my own.

[Read More]
R 

Permuted block randomization using simstudy

Along with preparing power analyses and statistical analysis plans (SAPs), generating study randomization lists is something a practicing biostatistician is occasionally asked to do. While not a particularly interesting activity, it offers the opportunity to tackle a small programming challenge. The title is a little misleading because you should probably skip all this and just use the blockrand package if you want to generate randomization schemes; don’t try to reinvent the wheel. But, I can’t resist. Since I was recently asked to generate such a list, I’ve been wondering how hard it would be to accomplish this using simstudy. There are already built-in functions for simulating stratified randomization schemes, so maybe it could be a good solution. The key element that is missing from simstudy, of course, is the permuted block setup.

[Read More]
R 

Generating probabilities for ordinal categorical data

Over the past couple of months, I’ve been describing various aspects of the simulations that we’ve been doing to get ready for a meta-analysis of convalescent plasma treatment for hospitalized patients with COVID-19, most recently here. As I continue to do that, I want to provide motivation and code for a small but important part of the data generating process, which involves creating probabilities for ordinal categorical outcomes using a Dirichlet distribution.

[Read More]
R 

Diagnosing and dealing with degenerate estimation in a Bayesian meta-analysis

The federal government recently granted emergency approval for the use of antibody rich blood plasma when treating hospitalized COVID-19 patients. This announcement is unfortunate, because we really don’t know if this promising treatment works. The best way to determine this, of course, is to conduct an experiment, though this approval makes this more challenging to do; with the general availability of convalescent plasma (CP), there may be resistance from patients and providers against participating in a randomized trial. The emergency approval sends the incorrect message that the treatment is definitively effective. Why would a patient take the risk of receiving a placebo when they have almost guaranteed access to the therapy?

[Read More]

Generating data from a truncated distribution

A researcher reached out to me the other day to see if the simstudy package provides a quick and easy way to generate data from a truncated distribution. Other than the noZeroPoisson distribution option (which is a very specific truncated distribution), there is no way to do this directly. You can always generate data from the full distribution and toss out the observations that fall outside of the truncation range, but this is not exactly efficient, and in practice can get a little messy. I’ve actually had it in the back of my mind to add something like this to simstudy, but have hesitated because it might mean changing (or at least adding to) the defData table structure.

[Read More]
R 

A hurdle model for COVID-19 infections in nursing homes

Late last year, I added a mixture distribution to the simstudy package, largely motivated to accommodate zero-inflated Poisson or negative binomial distributions. (I really thought I had added this two years ago - but time is moving so slowly these days.) These distributions are useful when modeling count data, but we anticipate observing more than the expected frequency of zeros that would arise from a non-inflated (i.e. “regular”) Poisson or negative binomial distribution.

[Read More]
R 

A Bayesian model for a simulated meta-analysis

This is essentially an addendum to the previous post where I simulated data from multiple RCTs to explore an analytic method to pool data across different studies. In that post, I used the nlme package to conduct a meta-analysis based on individual level data of 12 studies. Here, I am presenting an alternative hierarchical modeling approach that uses the Bayesian package rstan.

Create the data set

We’ll use the exact same data generating process as described in some detail in the previous post.

[Read More]

Simulating multiple RCTs to simulate a meta-analysis

I am currently involved with an RCT that is struggling to recruit eligible patients (by no means an unusual problem), increasing the risk that findings might be inconclusive. A possible solution to this conundrum is to find similar, ongoing trials with the aim of pooling data in a single analysis, to conduct a meta-analysis of sorts.

In an ideal world, this theoretical collection of sites would have joined forces to develop a single study protocol, but often there is no structure or funding mechanism to make that happen. However, this group of studies may be similar enough - based on the target patient population, study inclusion and exclusion criteria, therapy protocols, comparison or control condition, randomization scheme, and outcome measurement - that it might be reasonable to estimate a single treatment effect and some measure of uncertainty.

[Read More]
R 

Consider a permutation test for a small pilot study

Recently I wrote about the challenges of trying to learn too much from a small pilot study, even if it is a randomized controlled trial. There are limitations on how much you can learn about a treatment effect given the small sample size and relatively high variability of the estimate. However, the temptation for researchers is usually just too great; it is only natural to want to see if there is any kind of signal of an intervention effect, even though the pilot study is focused on questions of feasibility and acceptability.

[Read More]
R 

When proportional odds is a poor assumption, collapsing categories is probably not going to save you

Continuing the discussion on cumulative odds models I started last time, I want to investigate a solution I always assumed would help mitigate a failure to meet the proportional odds assumption. I’ve believed if there is a large number of categories and the relative cumulative odds between two groups don’t appear proportional across all categorical levels, then a reasonable approach is to reduce the number of categories. In other words, fewer categories translates to proportional odds. I’m not sure what led me to this conclusion, but in this post I’ve created some simulations that seem to throw cold water on that idea.

[Read More]
R