This site is a compendium of R code meant to highlight the various uses of simulation to aid in the understanding of probability, statistics, and study design. I will frequently draw on examples using my R package simstudy. Occasionally, I will opine on other topics related to causal inference, evidence, and research more generally.

Is non-inferiority on par with superiority?

It is grant season around here (actually, it is pretty much always grant season), which means another series of problems to tackle. Even with the most straightforward study designs, there is almost always some interesting twist, or an approach that presents a subtle issue or two. In this case, the investigator wants to compare two interventions, but doesn’t feel the need to show that one is better than the other. He just wants to see if the newer intervention is not inferior to the more established intervention. [Read More]
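To make the distinction concrete, here is a minimal sketch of a non-inferiority comparison. The margin, means, and sample sizes are all assumptions for illustration, not values from the actual grant:

```r
# Non-inferiority sketch: declare the new intervention non-inferior if the
# lower bound of the 95% CI for (new - established) lies above -Delta.
# All parameter values below are assumed for illustration.

set.seed(1234)

Delta <- 2                               # non-inferiority margin (assumed)
y_old <- rnorm(500, mean = 20, sd = 5)   # established intervention
y_new <- rnorm(500, mean = 19.5, sd = 5) # new intervention, slightly worse

ci <- t.test(y_new, y_old)$conf.int
ci[1] > -Delta                           # TRUE if non-inferiority can be declared
```

The point is that the newer intervention can be a little worse and still pass, which is exactly what separates this design from a superiority test.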

How efficient are multifactorial experiments?

I recently described why we might want to conduct a multi-factorial experiment, and I alluded to the fact that this approach can be quite efficient. It is efficient in the sense that it is possible to simultaneously test the impact of multiple interventions using an overall sample size no larger than the one that would be required to test a single intervention in a more traditional RCT. I demonstrate that here, first with a continuous outcome and then with a binary outcome. [Read More]
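As a rough illustration of the efficiency argument, here is a sketch (with assumed effect sizes) of a 2x2 factorial with a continuous outcome, where both main effects are estimated from the same 400 participants that a two-arm trial would devote to a single intervention:

```r
# 2x2 factorial sketch: 400 participants split evenly across four cells.
# Each main effect is estimated using all 400 observations, so neither
# intervention "pays" for the other. Effect sizes here are assumed.

set.seed(135)

n <- 400
dd <- expand.grid(a = c(0, 1), b = c(0, 1))[rep(1:4, each = n / 4), ]
dd$y <- 2 + 1.0 * dd$a + 1.5 * dd$b + rnorm(n, 0, 4)

summary(lm(y ~ a + b, data = dd))$coef  # both effects from one sample
```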

Testing multiple interventions in a single experiment

A reader recently inquired about functions in simstudy that could generate data for a balanced multi-factorial design. I had to report that nothing like that existed. A few weeks later, a colleague of mine asked if I could help estimate the appropriate sample size for a study that plans to use a multi-factorial design to choose among a set of interventions to improve rates of smoking cessation. In the course of exploring this, I realized it would be super helpful if the function suggested by the reader actually existed. [Read More]
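For a sense of what such a function needs to do, here is a bare-bones version in base R. The function name and factor names are hypothetical; this is a sketch, not the simstudy implementation:

```r
# Hypothetical helper: generate a balanced design in which every
# combination of factor levels is replicated the same number of times.

genFacData <- function(each, ...) {
  grid <- expand.grid(...)                             # all combinations
  dd <- grid[rep(seq_len(nrow(grid)), each = each), , drop = FALSE]
  dd$id <- seq_len(nrow(dd))
  dd
}

dd <- genFacData(each = 2, tx1 = c(0, 1), tx2 = c(0, 1), tx3 = c(0, 1))
table(dd$tx1, dd$tx2)  # balanced: equal counts in every cell
```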

Exploring the underlying theory of the chi-square test through simulation - part 2

In the last post, I tried to provide a little insight into the chi-square test. In particular, I used simulation to demonstrate the relationship between the Poisson distribution of counts and the chi-squared distribution. The key point in that post was the role conditioning plays in that relationship by reducing variance. To motivate some of the key issues, I talked a bit about recycling. I asked you to imagine a set of bins placed in different locations to collect glass bottles. [Read More]
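The unconditional half of that relationship is easy to check directly. In this sketch (the number of bins and the Poisson rate are assumed values), the standardized sum of independent Poisson counts behaves like a chi-squared random variable with k degrees of freedom; conditioning on the total count, as discussed in the post, is what removes a degree of freedom:

```r
# k independent Poisson counts with known rate lambda: the standardized
# sum of squares is approximately chi-squared with k degrees of freedom.

set.seed(2018)

k <- 5
lambda <- 40
X <- matrix(rpois(10000 * k, lambda), ncol = k)
stat <- rowSums((X - lambda)^2 / lambda)

round(c(mean = mean(stat), var = var(stat)), 1)  # roughly k and 2k
```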

Exploring the underlying theory of the chi-square test through simulation - part 1

Kids today are so sophisticated (at least they are in New York City, where I live). While I didn’t hear about the chi-square test of independence until my first stint in graduate school, they’re already talking about it in high school. When my kids came home and started talking about it, I did what I usually do when they come home asking about a new statistical concept. I opened up R and started generating some data. [Read More]

Another reason to be careful about what you control for

Modeling data without any underlying causal theory can sometimes lead you down the wrong path, particularly if you are interested in understanding the way things work rather than making predictions. A while back, I described what can go wrong when you control for a mediator when you are interested in the effect of an exposure on an outcome. Here, I describe the potential biases that are introduced when you inadvertently control for a variable that turns out to be a collider. [Read More]
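A tiny simulation (with assumed effect sizes) makes the collider problem visible: x and y are generated independently, but both cause w, and adjusting for w manufactures an association that is not there:

```r
# Collider sketch: x and y are independent, but both cause w.
set.seed(7)

n <- 5000
x <- rnorm(n)
y <- rnorm(n)            # y does not depend on x at all
w <- x + y + rnorm(n)    # w is a collider on the path x -> w <- y

coef(lm(y ~ x))          # x coefficient near 0, correctly
coef(lm(y ~ x + w))      # x coefficient pushed well away from 0
```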

“I have to randomize by cluster. Is it OK if I only have 6 sites?”

The answer is probably no, because there is a not-so-low chance (perhaps considerably higher than 5%) you will draw the wrong conclusions from the study. I have heard variations on this question not so infrequently, so I thought it would be useful (of course) to do a few quick simulations to see what happens when we try to conduct a study under these conditions. (Another question I get every so often, after a study has failed to find an effect: “can we get a post-hoc estimate of the power? [Read More]
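To give a flavor of those simulations, here is a rough sketch (all variance parameters are assumed): six sites randomized three and three under a true null treatment effect, analyzed naively at the individual level. The rejection rate lands well above the nominal 5%:

```r
# Six sites, no true treatment effect, naive individual-level analysis.
set.seed(30)

reject <- replicate(1000, {
  site_eff <- rnorm(6, 0, 1)                          # between-site variation
  dd <- data.frame(site = rep(1:6, each = 50))
  dd$rx <- as.integer(dd$site %in% sample(1:6, 3))    # randomize by site
  dd$y <- site_eff[dd$site] + rnorm(nrow(dd), 0, 2)   # null treatment effect
  summary(lm(y ~ rx, data = dd))$coef["rx", 4] < 0.05
})

mean(reject)  # Type I error rate well above 0.05
```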

Have you ever asked yourself, “how should I approach the classic pre-post analysis?”

Well, maybe you haven’t, but this seems to come up all the time. An investigator wants to assess the effect of an intervention on an outcome. Study participants are randomized either to receive the intervention (could be a new drug, new protocol, behavioral intervention, whatever) or treatment as usual. For each participant, the outcome measure is recorded at baseline - this is the pre in pre/post analysis. The intervention is delivered (or not, in the case of the control group), some time passes, and the outcome is measured a second time. [Read More]
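The post works through the usual options; as a preview, here is a hedged sketch (the effect size and the baseline dependence are assumed) of the three standard models side by side:

```r
# Three common approaches to a pre-post design, same simulated data.
set.seed(11)

n <- 200
rx <- rep(0:1, each = n / 2)
pre <- rnorm(n, 10, 3)
post <- 0.6 * pre + 1.5 * rx + rnorm(n, 0, 2)   # true effect = 1.5

coef(lm(post ~ rx))            # follow-up score only
coef(lm(I(post - pre) ~ rx))   # change score
coef(lm(post ~ rx + pre))      # ANCOVA, typically the most efficient
```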

Importance sampling adds an interesting twist to Monte Carlo simulation

I’m contemplating the idea of teaching a course on simulation next fall, so I have been exploring various topics that I might include. (If anyone has great ideas, either because you have taught such a course or taken one, definitely drop me a note.) Monte Carlo (MC) simulation is an obvious one. I like the idea of talking about importance sampling, because it sheds light on the idea that not all MC simulations are created equal. [Read More]
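As a taste of why importance sampling is interesting, here is a small sketch estimating a normal tail probability (my choice of example, not necessarily the one from the post): naive MC wastes nearly every draw, while sampling from a shifted proposal and reweighting gives a far more stable estimate:

```r
# Estimating P(Z > 3) for a standard normal Z.
set.seed(42)

n <- 10000

# Naive Monte Carlo: almost every draw lands far from the tail of interest.
z <- rnorm(n)
mean(z > 3)

# Importance sampling: draw near the tail, reweight by the density ratio.
x <- rnorm(n, mean = 3)
w <- dnorm(x) / dnorm(x, mean = 3)
mean((x > 3) * w)               # compare with pnorm(3, lower.tail = FALSE)
```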

Simulating a cost-effectiveness analysis to highlight new functions for generating correlated data

My dissertation work (which I completed only recently, in 2012, even though I am not exactly young - a whole story on its own) focused on inverse probability weighting methods to estimate a causal cost-effectiveness model. I don’t really do any cost-effectiveness analysis (CEA) anymore, but it came up very recently when some folks in the Netherlands contacted me about using simstudy to generate correlated (and clustered) data to compare different approaches to estimating cost-effectiveness. [Read More]
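For readers who just want to see the correlated-data piece, a minimal example using simstudy’s genCorData follows; the means, standard deviations, and correlation are placeholders, not values from the CEA project:

```r
# Two correlated outcomes, say cost and effectiveness, with a compound
# symmetry correlation structure. All parameter values are placeholders.

library(simstudy)

set.seed(282)

dd <- genCorData(1000, mu = c(100, 5), sigma = c(20, 1),
                 rho = 0.3, corstr = "cs",
                 cnames = c("cost", "effect"))

cor(dd$cost, dd$effect)  # close to the specified rho of 0.3
```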