This site is a compendium of R code meant to highlight the various uses of simulation to aid in the understanding of probability, statistics, and study design. I frequently draw on examples using my R package simstudy. Occasionally, I opine on other topics related to causal inference, evidence, and research more generally.

Randomization tests make fewer assumptions and seem pretty intuitive

I’m preparing a lecture on simulation for a statistical modeling class, and I plan on describing a couple of cases where simulation is intrinsic to the analytic method rather than as a tool for exploration and planning. MCMC methods used for Bayesian estimation, bootstrapping, and randomization tests all come to mind.

Randomization tests are particularly interesting as an approach to conducting hypothesis tests, because they allow us to avoid making unrealistic assumptions. I’ve written about this before under the rubric of a permutation test. The example I use here is a little a different; truth be told, the real reason I’m sharing is that I came up with a nice little animation to illustrate a simple randomization process. So, even if I decide not to include it in the lecture, at least you’ve seen it.

[Read More]
R 

Visualizing the treatment effect with an ordinal outcome

If it’s true that many readers of a journal article focus on the abstract, figures and tables while skimming the rest, it is particularly important tell your story with a well conceived graphic or two. Along with a group of collaborators, I am trying to figure out the best way to represent an ordered categorical outcome from an RCT. In this case, there are a lot of categories, so the images can get confusing. I’m sharing a few of the possibilities that I’ve tried so far, including the code.

[Read More]
R 

How useful is it to show uncertainty in a plot comparing proportions?

I recently created a simple plot for a paper describing a pilot study of an intervention targeting depression. This small study was largely conducted to assess the feasibility and acceptability of implementing an existing intervention in a new population. The primary outcome measure that was collected was the proportion of patients in each study arm who remained depressed following the intervention. The plot of the study results that we included in the paper looked something like this:

[Read More]
R 

Finding answers faster for COVID-19: an application of Bayesian predictive probabilities

As we evaluate therapies for COVID-19 to help improve outcomes during the pandemic, researchers need to be able to make recommendations as quickly as possible. There really is no time to lose. The Data & Safety Monitoring Board (DSMB) of COMPILE, a prospective individual patient data meta-analysis, recognizes this. They are regularly monitoring the data to determine if there is a sufficiently strong signal to indicate effectiveness of convalescent plasma (CP) for hospitalized patients not on ventilation.

[Read More]

Coming soon: effortlessly generate ordinal data without assuming proportional odds

I’m starting off 2021 with my 99th post ever to introduce a new feature that will be incorporated into simstudy soon to make it a bit easier to generate ordinal data without requiring an assumption of proportional odds. I should wait until this feature has been incorporated into the development version, but I want to put it out there in case any one has any further suggestions. In any case, having this out in plain view will motivate me to get back to work on the package.

[Read More]
R 

Constrained randomization to evaulate the vaccine rollout in nursing homes

On an incredibly heartening note, two COVID-19 vaccines have been approved for use in the US and other countries around the world. More are possibly on the way. The big challenge, at least here in the United States, is to convince people that these vaccines are safe and effective; we need people to get vaccinated as soon as they are able to slow the spread of this disease. I for one will not hesitate for a moment to get a shot when I have the opportunity, though I don’t think biostatisticians are too high on the priority list.

[Read More]
R 

A Bayesian implementation of a latent threshold model

In the previous post, I described a latent threshold model that might be helpful if we want to dichotomize a continuous predictor but we don’t know the appropriate cut-off point. This was motivated by a need to identify a threshold of antibody levels present in convalescent plasma that is currently being tested as a therapy for hospitalized patients with COVID in a number of RCTs, including those that are particpating in the ongoing COMPILE meta-analysis.

[Read More]

A latent threshold model to dichotomize a continuous predictor

This is the context. In the convalescent plasma pooled individual patient level meta-analysis we are conducting as part of the COMPILE study, there is great interest in understanding the impact of antibody levels on outcomes. (I’ve described various aspects of the analysis in previous posts, most recently here). In other words, not all convalescent plasma is equal.

If we had a clear measure of antibodies, we could model the relationship of these levels with the outcome of interest, such as health status as captured by the WHO 11-point scale or mortality, and call it a day. Unfortunately, at the moment, there is no single measure across the RCTs included in the meta-analysis (though that may change). Until now, the RCTs have used a range of measurement “platforms” (or technologies), which may measure different components of the convalescent plasma using different scales. Given these inconsistencies, it is challenging to build a straightforward model that simply estimates the relationship between antibody levels and clinical outcomes.

[Read More]
R 

Exploring the properties of a Bayesian model using high performance computing

An obvious downside to estimating Bayesian models is that it can take a considerable amount of time merely to fit a model. And if you need to estimate the same model repeatedly, that considerable amount becomes a prohibitive amount. In this post, which is part of a series (last one here) where I’ve been describing various aspects of the Bayesian analyses we plan to conduct for the COMPILE meta-analysis of convalescent plasma RCTs, I’ll present a somewhat elaborate model to illustrate how we have addressed these computing challenges to explore the properties of these models.

[Read More]

A refined brute force method to inform simulation of ordinal response data

Francisco, a researcher from Spain, reached out to me with a challenge. He is interested in exploring various models that estimate correlation across multiple responses to survey questions. This is the context:

  • He doesn’t have access to actual data, so to explore analytic methods he needs to simulate responses.
  • It would be ideal if the simulated data reflect the properties of real-world responses, some of which can be gleaned from the literature.
  • The studies he’s found report only means and standard deviations of the ordinal data, along with the correlation matrices, but not probability distributions of the responses.
  • He’s considering simstudy for his simulations, but the function genOrdCat requires a set of probabilities for each response measure; it doesn’t seem like simstudy will be helpful here.

Ultimately, we needed to figure out if we can we use the empirical means and standard deviations to derive probabilities that will yield those same means and standard deviations when the data are simulated. I thought about this for a bit, and came up with a bit of a work-around; the approach seems to work decently and doesn’t require any outrageous assumptions.

[Read More]
R