“I have to randomize by cluster. Is it OK if I only have 6 sites?”

The answer is probably no, because there is a not-so-low chance (perhaps considerably higher than 5%) you will draw the wrong conclusions from the study. I have heard variations on this question not so infrequently, so I thought it would be useful (of course) to do a few quick simulations to see what happens when we try to conduct a study under these conditions. (Another question I get every so often, after a study has failed to find an effect: “can we get a post-hoc estimate of the power? [Read More]
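To give a sense of what such a quick simulation might look like, here is a minimal sketch (the number of participants per site, the effect size, and the variance components are assumptions for illustration, not values from the post) that generates a 6-site cluster-randomized data set with simstudy and fits a random-intercept model with lme4:

```r
library(simstudy)
library(lme4)

# site-level definitions: a random site effect and a fixed number of participants per site
defSite <- defData(varname = "siteEff", formula = 0, variance = 0.5, dist = "normal")
defSite <- defData(defSite, varname = "nPerSite", formula = 50, dist = "nonrandom")

# individual-level outcome depends on the cluster-level treatment and the site effect
defInd <- defDataAdd(varname = "y", formula = "1 + 0.5 * rx + siteEff",
                     variance = 2, dist = "normal")

set.seed(6)

dSite <- genData(6, defSite, id = "site")            # only 6 sites
dSite <- trtAssign(dSite, nTrt = 2, grpName = "rx")  # randomize by cluster
dInd <- genCluster(dSite, cLevelVar = "site", numIndsVar = "nPerSite", level1ID = "id")
dInd <- addColumns(defInd, dInd)

# random intercept for site accounts for the clustering
summary(lmer(y ~ rx + (1 | site), data = dInd))
```

Repeating this over many seeds is what lets you check how often the analysis reaches the wrong conclusion with so few clusters.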

Have you ever asked yourself, “How should I approach the classic pre-post analysis?”

Well, maybe you haven’t, but this seems to come up all the time. An investigator wants to assess the effect of an intervention on an outcome. Study participants are randomized either to receive the intervention (could be a new drug, new protocol, behavioral intervention, whatever) or treatment as usual. For each participant, the outcome measure is recorded at baseline - this is the pre in pre/post analysis. The intervention is delivered (or not, in the case of the control group), some time passes, and the outcome is measured a second time. [Read More]
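As a rough sketch of the setup (the baseline mean, treatment effect, and variances are made-up values for illustration): generate a baseline measure, randomize, generate a follow-up measure that depends on baseline and treatment, and then compare a follow-up-only comparison with an ANCOVA-style model that adjusts for the baseline measure.

```r
library(simstudy)

# baseline (pre) measurement
defPre <- defData(varname = "y0", formula = 10, variance = 4, dist = "normal")

# follow-up (post) depends on baseline and treatment; parameters are illustrative only
defPost <- defDataAdd(varname = "y1", formula = "y0 + 1.5 * rx",
                      variance = 4, dist = "normal")

set.seed(123)

dd <- genData(200, defPre)                     # generate the baseline measure
dd <- trtAssign(dd, nTrt = 2, grpName = "rx")  # randomize to intervention or usual care
dd <- addColumns(defPost, dd)                  # generate the follow-up measure

# two common analyses: follow-up only vs. ANCOVA adjusting for baseline
summary(lm(y1 ~ rx, data = dd))
summary(lm(y1 ~ rx + y0, data = dd))
```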

Importance sampling adds an interesting twist to Monte Carlo simulation

I’m contemplating the idea of teaching a course on simulation next fall, so I have been exploring various topics that I might include. (If anyone has great ideas either because you have taught such a course or taken one, definitely drop me a note.) Monte Carlo (MC) simulation is an obvious one. I like the idea of talking about importance sampling, because it sheds light on the idea that not all MC simulations are created equally. [Read More]
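Here is a generic illustration of the point (not code from the post): estimating the small tail probability P(Z > 4) for a standard normal, where naive Monte Carlo wastes nearly every draw, while importance sampling with a shifted-normal proposal concentrates draws where they matter.

```r
set.seed(1)
n <- 1e5
truth <- pnorm(4, lower.tail = FALSE)     # about 3.2e-05

# naive Monte Carlo: almost no draws land beyond 4, so the estimate is very noisy
z <- rnorm(n)
naive <- mean(z > 4)

# importance sampling: draw from a proposal centered at 4 and reweight by the density ratio
x <- rnorm(n, mean = 4)
w <- dnorm(x) / dnorm(x, mean = 4)        # likelihood ratio weights
is_est <- mean((x > 4) * w)

c(true = truth, naive = naive, importance = is_est)
```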

Simulating a cost-effectiveness analysis to highlight new functions for generating correlated data

My dissertation work (which I only recently completed - in 2012 - even though I am not exactly young, a whole story on its own) focused on inverse probability weighting methods to estimate a causal cost-effectiveness model. I don’t really do any cost-effectiveness analysis (CEA) anymore, but it came up very recently when some folks in the Netherlands contacted me about using simstudy to generate correlated (and clustered) data to compare different approaches to estimating cost-effectiveness. [Read More]
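As a small sketch of the kind of correlated data generation involved (the means, standard deviations, and correlation below are illustrative assumptions, not values from the actual CEA work), genCorData can produce correlated cost and effectiveness measures:

```r
library(simstudy)

set.seed(2012)

# correlated cost and QALY-type outcome; parameters are made up for illustration
dd <- genCorData(1000, mu = c(20000, 0.7), sigma = c(5000, 0.1),
                 rho = 0.3, corstr = "cs", cnames = c("cost", "qaly"))

cor(dd$cost, dd$qaly)   # should be close to the specified rho of 0.3
```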

When there's a fork in the road, take it. Or, taking a look at marginal structural models.

I am going to cut right to the chase, since this is the third of three posts related to confounding and weighting, and it’s kind of a long one. (If you want to catch up, the first two are here and here.) My aim with these three posts is to provide a basic explanation of the marginal structural model (MSM) and how we should interpret the estimates. This is obviously a very rich topic with a vast literature, so if you remain interested in the topic, I recommend checking out this (as yet unpublished) textbook by Hernán & Robins for starters. [Read More]

When you use inverse probability weighting for estimation, what are the weights actually doing?

Towards the end of Part 1 of this short series on confounding, IPW, and (hopefully) marginal structural models, I talked a little bit about the fact that inverse probability weighting (IPW) can provide unbiased estimates of marginal causal effects in the context of confounding just as more traditional regression models like OLS can. I used an example based on a normally distributed outcome. Now, that example wasn’t super interesting, because in the case of a linear model with homogeneous treatment effects (i. [Read More]
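To see concretely what the weights are doing, here is a generic sketch with a single binary confounder (the data generating parameters are assumptions, not the post's example): fit a propensity model, form inverse probability weights, and compare the weighted and unweighted estimates.

```r
set.seed(42)
n <- 5000

# binary confounder L affects both exposure A and outcome Y
L <- rbinom(n, 1, 0.4)
A <- rbinom(n, 1, plogis(-0.5 + 1.2 * L))
Y <- rnorm(n, mean = 2 + 1.0 * A + 1.5 * L, sd = 1)

# propensity score model and inverse probability weights
ps <- predict(glm(A ~ L, family = binomial), type = "response")
ipw <- A / ps + (1 - A) / (1 - ps)

coef(lm(Y ~ A))                  # unweighted: biased by confounding
coef(lm(Y ~ A, weights = ipw))   # IPW: close to the true marginal effect of 1.0
```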

Characterizing the variance for clustered data that are Gamma distributed

Way back when I was studying algebra and wrestling with one word problem after another (I think now they call them story problems), I complained to my father. He laughed and told me to get used to it. “Life is one big word problem,” is how he put it. Well, maybe one could say any statistical analysis is really just some form of multilevel data analysis, whether we treat it that way or not. [Read More]

Visualizing how confounding biases estimates of population-wide (or marginal) average causal effects

When we are trying to assess the effect of an exposure or intervention on an outcome, confounding is an ever-present threat to our ability to draw the proper conclusions. My goal (starting here and continuing in upcoming posts) is to think a bit about how to characterize confounding in a way that makes it possible to literally see why improperly estimating intervention effects might lead to bias. Confounding, potential outcomes, and causal effects: Typically, we think of a confounder as a factor that influences both exposure and outcome. [Read More]
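A tiny potential-outcomes simulation makes the bias easy to see (the parameter values are assumptions chosen only so the bias is visible): the confounder pushes the exposed and unexposed groups apart even though the true causal effect is fixed.

```r
set.seed(7)
n <- 100000

# confounder L influences both exposure A and the outcome
L <- rbinom(n, 1, 0.5)
A <- rbinom(n, 1, plogis(-1 + 2 * L))

# potential outcomes with and without exposure: the true causal effect is 1
Y0 <- rnorm(n, mean = 1 + 2 * L, sd = 1)
Y1 <- Y0 + 1
Y  <- ifelse(A == 1, Y1, Y0)

mean(Y1) - mean(Y0)                  # marginal (population-wide) causal effect: ~1
mean(Y[A == 1]) - mean(Y[A == 0])    # crude observed difference: biased upward by L
```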

A simstudy update provides an excuse to generate and display Likert-type data

I just updated simstudy to version 0.1.7. It is available on CRAN. To mark the occasion, I wanted to highlight a new function, genOrdCat, which puts into practice some code that I presented a little while back as part of a discussion of ordinal logistic regression. The new function was motivated by a reader/researcher who came across my blog while wrestling with a simulation study. After a little back and forth about how to generate ordinal categorical data, I ended up with a function that might be useful. [Read More]
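A minimal example of how genOrdCat can be called (the covariate, base probabilities, and category name below are made up for illustration; see the package documentation for the full set of arguments):

```r
library(simstudy)

# a continuous covariate that shifts the log-odds of the ordinal response
def <- defData(varname = "z", formula = 0, variance = 1, dist = "normal")

set.seed(17)
dd <- genData(500, def)

# Likert-type response with four categories; baseprobs give the reference probabilities
dd <- genOrdCat(dd, adjVar = "z", baseprobs = c(0.3, 0.3, 0.25, 0.15), catVar = "likert")

table(dd$likert)
```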

Thinking about different ways to analyze sub-groups in an RCT

Here’s the scenario: we have an intervention that we think will improve outcomes for a particular population. Furthermore, there are two sub-groups (let’s say defined by which of two medical conditions each person in the population has) and we are interested in knowing if the intervention effect is different for each sub-group. And here’s the question: what is the ideal way to set up a study so that we can assess (1) the intervention effects on the group as a whole, but also (2) the sub-group specific intervention effects? [Read More]
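One way to frame the modeling side of the question is sketched below (the sub-group prevalence and effect sizes are assumed values for illustration): simulate sub-group-specific treatment effects, randomize within sub-group, and compare an overall-effect model with one that includes a treatment-by-sub-group interaction.

```r
library(simstudy)

# sub-group membership (one of two conditions), then an outcome with a sub-group-specific effect
defG <- defData(varname = "grp", formula = 0.5, dist = "binary")
defY <- defDataAdd(varname = "y",
                   formula = "2 + 0.5 * rx + 1 * grp + 0.75 * rx * grp",
                   variance = 4, dist = "normal")

set.seed(21)
dd <- genData(400, defG)
dd <- trtAssign(dd, nTrt = 2, strata = "grp", grpName = "rx")  # randomize within each sub-group
dd <- addColumns(defY, dd)

# overall intervention effect vs. sub-group-specific effects via an interaction
summary(lm(y ~ rx, data = dd))
summary(lm(y ~ rx * grp, data = dd))
```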