This site is a compendium of R code meant to highlight the various uses of simulation to aid in the understanding of probability, statistics, and study design. I will frequently draw on examples using my R package simstudy. Occasionally, I will opine on other topics related to causal inference, evidence, and research more generally.

## A simstudy update provides an excuse to generate and display Likert-type data

I just updated simstudy to version 0.1.7. It is available on CRAN. To mark the occasion, I wanted to highlight a new function, genOrdCat, which puts into practice some code that I presented a little while back as part of a discussion of ordinal logistic regression. The new function was motivated by a reader/researcher who came across my blog while wrestling with a simulation study. After a little back and forth about how to generate ordinal categorical data, I ended up with a function that might be useful. [Read More]

## Thinking about different ways to analyze sub-groups in an RCT

Here’s the scenario: we have an intervention that we think will improve outcomes for a particular population. Furthermore, there are two sub-groups (let’s say defined by which of two medical conditions each person in the population has) and we are interested in knowing if the intervention effect is different for each sub-group. And here’s the question: what is the ideal way to set up a study so that we can assess (1) the intervention effects on the group as a whole, but also (2) the sub-group specific intervention effects? [Read More]

## Who knew likelihood functions could be so pretty?

I just released a new iteration of simstudy (version 0.1.6), which fixes a bug or two and adds several spline related routines (available on CRAN). The previous post focused on using spline curves to generate data, so I won’t repeat myself here. And, apropos of nothing really - I thought I’d take the opportunity to do a simple simulation to briefly explore the likelihood function. It turns out if we generate lots of them, it can be pretty, and maybe provide a little insight. [Read More]

## Can we use B-splines to generate non-linear data?

I’m exploring the idea of adding a function or set of functions to the simstudy package that would make it possible to easily generate non-linear data. One way to do this would be using B-splines. Typically, one uses splines to fit a curve to data, but I thought it might be useful to switch things around a bit to use the underlying splines to generate data. This would facilitate exploring models where we know the assumption of linearity is violated. [Read More]

## A minor update to simstudy provides an excuse to talk a bit about the negative binomial and Poisson distributions

I just updated simstudy to version 0.1.5 (available on CRAN) so that it now includes several new distributions - exponential, discrete uniform, and negative binomial. As part of the release, I thought I’d explore the negative binomial just a bit, particularly as it relates to the Poisson distribution. The Poisson distribution is a discrete (integer) distribution of outcomes of non-negative values that is often used to describe count outcomes. It is characterized by a mean (or rate) and its variance equals its mean. [Read More]

## CACE closed: EM opens up exclusion restriction (among other things)

This is the third, and probably last, of a series of posts touching on the estimation of complier average causal effects (CACE) and latent variable modeling techniques using an expectation-maximization (EM) algorithm . What follows is a simplistic way to implement an EM algorithm in R to do principal strata estimation of CACE. The EM algorithm In this approach, we assume that individuals fall into one of three possible groups - never-takers, always-takers, and compliers - but we cannot see who is who (except in a couple of cases). [Read More]

## A simstudy update provides an excuse to talk a little bit about latent class regression and the EM algorithm

I was just going to make a quick announcement to let folks know that I’ve updated the simstudy package to version 0.1.4 (now available on CRAN) to include functions that allow conversion of columns to factors, creation of dummy variables, and most importantly, specification of outcomes that are more flexibly conditional on previously defined variables. But, as I was coming up with an example that might illustrate the added conditional functionality, I found myself playing with package flexmix, which uses an Expectation-Maximization (EM) algorithm to estimate latent classes and fit regression models. [Read More]

## Complier average causal effect? Exploring what we learn from an RCT with participants who don't do what they are told

Inspired by a free online course titled Complier Average Causal Effects (CACE) Analysis and taught by Booil Jo and Elizabeth Stuart (through Johns Hopkins University), I’ve decided to explore the topic a little bit. My goal here isn’t to explain CACE analysis in extensive detail (you should definitely go take the course for that), but to describe the problem generally and then (of course) simulate some data. A plot of the simulated data gives a sense of what we are estimating and assuming. [Read More]

## Further considerations of a hidden process underlying categorical responses

In my previous post, I described a continuous data generating process that can be used to generate discrete, categorical outcomes. In that post, I focused largely on binary outcomes and simple logistic regression just because things are always easier to follow when there are fewer moving parts. Here, I am going to focus on a situation where we have multiple outcomes, but with a slight twist - these groups of interest can be interpreted in an ordered way. [Read More]

## A hidden process behind binary or other categorical outcomes?

I was thinking a lot about proportional-odds cumulative logit models last fall while designing a study to evaluate an intervention’s effect on meat consumption. After a fairly extensive pilot study, we had determined that participants can have quite a difficult time recalling precise quantities of meat consumption, so we were forced to move to a categorical response. (This was somewhat unfortunate, because we would not have continuous or even count outcomes, and as a result, might not be able to pick up small changes in behavior. [Read More]