Well, maybe you haven’t, but this seems to come up all the time. An investigator wants to assess the effect of an intervention on a outcome. Study participants are randomized either to receive the intervention (could be a new drug, new protocol, behavioral intervention, whatever) or treatment as usual. For each participant, the outcome measure is recorded at baseline - this is the *pre* in pre/post analysis. The intervention is delivered (or not, in the case of the control group), some time passes, and the outcome is measured a second time. This is our *post*. The question is, how should we analyze this study to draw conclusions about the intervention’s effect on the outcome?

There are at least three possible ways to approach this. (1) Ignore the *pre* outcome measure and just compare the average *post* scores of the two groups. (2) Calculate a *change* score for each individual (\(\Delta_i = post_i - pre_i\)), and compare the average \(\Delta\)’s for each group. Or (3), use a more sophisticated regression model to estimate the intervention effect while *controlling* for the *pre* or baseline measure of the outcome. Here are three models associated with each approach (\(T_i\) is 1 if the individual \(i\) received the treatment, 0 if not, and \(\epsilon_i\) is an error term):

I’ve explored various scenarios (i.e. different data generating assumptions) to see if it matters which approach we use. (Of course it does.)

### When the effect differs by baseline measurement

In a slight variation of the previous scenario, the *effect* of the intervention itself is a now function of the baseline score. Those who score higher will benefit less from the intervention - they simply have less room to improve. In this case, the adjusted model appears slightly inferior to the change model, while the unadjusted *post-only* model is still relatively low powered.

```
defPO <- updateDef(defPO, changevar = "eff",
newformula = "1.9 - 1.9 * pre0/15")
```

```
presults[, .(postonly = mean(p1 <= 0.05),
change = mean(p2 <= 0.05),
adjusted = mean(p3 <= 0.025 | p3x <= 0.025))]
```

```
## postonly change adjusted
## 1: 0.425 0.878 0.863
```

The *adjusted* model has less power than the *change* model, because I used a reduced \(\alpha\)-level for the hypothesis test of the *adjusted* models. I am testing for interaction first, then if that fails, for main effects, so I need to adjust for multiple comparisons. (I have another post that shows why this might be a good thing to do.) I have used a Bonferroni adjustment, which can be a more conservative test. I still prefer the *adjusted* model, because it provides more insight into the underlying process than the *change* model.

### Treatment assignment depends on baseline measurement

Now, slightly off-topic. So far, we’ve been talking about situations where treatment assignment is randomized. What happens in a scenario where those with higher baseline scores are more likely to receive the intervention? Well, if we don’t adjust for the baseline score, we will have unmeasured confounding. A comparison of follow-up scores in the two groups will be biased towards the intervention group if the baseline scores are correlated with follow-up scores - as we see visually with a scenario in which the effect size is set to 0. Also notice that the p-values for the unadjusted model are consistently below 0.05 - we are almost always drawing the wrong conclusion if we use this model. On the other hand, the error rate for the adjusted model is close to 0.05, what we would expect.

```
defPO <- updateDef(defPO, changevar = "eff",
newformula = 0)
dt <- genData(1000, defPO)
dt <- trtObserve(dt, "-4.5 + 0.5 * pre0", logit.link = TRUE)
dt <- addColumns(defObs, dt)
```

```
## postonly change adjusted
## 1: 0.872 0.095 0.046
```

I haven’t proved anything here, but these simulations suggest that we should certainly think twice about using an unadjusted model if we happen to have baseline measurements. And it seems like you are likely to maximize power (and maybe minimize bias) if you compare follow-up scores while adjusting for baseline scores rather than analyzing change in scores by group.