R -

Estimating treatment effects (and ICCs) for stepped-wedge designs

Posted on July 16, 2019

In the last two posts, I introduced the notion of time-varying intra-cluster correlations in the context of stepped-wedge study designs. (See here and here). Though I generated lots of data for those posts, I didn’t fit any models to see if I could recover the estimates and any underlying assumptions. That’s what I am doing now.

My focus here is on the simplest case, where the ICC’s are constant over time and between time. Typically, I would just use a mixed-effects model to estimate the treatment effect and account for variability across clusters, which is easily done in R using the lme4 package; if the outcome is continuous the function lmer is appropriate. I thought, however, it would also be interesting to use the rstan package to fit a Bayesian hierarchical model.

[Read More]

R

More on those stepped-wedge design assumptions: varying intra-cluster correlations over time

Posted on July 9, 2019

In my last post, I wrote about within- and between-period intra-cluster correlations in the context of stepped-wedge cluster randomized study designs. These are quite important to understand when figuring out sample size requirements (and models for analysis, which I’ll be writing about soon.) Here, I’m extending the constant ICC assumption I presented last time around by introducing some complexity into the correlation structure. Much of the code I am using can be found in last week’s post, so if anything seems a little unclear, hop over here.

[Read More]

R

Planning a stepped-wedge trial? Make sure you know what you're assuming about intra-cluster correlations ...

Posted on June 25, 2019

A few weeks ago, I was at the annual meeting of the NIH Collaboratory, which is an innovative collection of collaboratory cores, demonstration projects, and NIH Institutes and Centers that is developing new models for implementing and supporting large-scale health services research. A study I am involved with - Primary Palliative Care for Emergency Medicine - is one of the demonstration projects in this collaboratory.

The second day of this meeting included four panels devoted to the design and analysis of embedded pragmatic clinical trials, and focused on the challenges of conducting rigorous research in the real-world context of a health delivery system. The keynote address that started off the day was presented by David Murray of NIH, who talked about the challenges and limitations of cluster randomized trials. (I’ve written before on issues related to clustered randomized trials, including here.)

[Read More]

R

Don't get too excited - it might just be regression to the mean

Posted on June 11, 2019

It is always exciting to find an interesting pattern in the data that seems to point to some important difference or relationship. A while ago, one of my colleagues shared a figure with me that looked something like this:

It looks like something is going on. On average low scorers in the first period increased a bit in the second period, and high scorers decreased a bit. Something is going on, but nothing specific to the data in question; it is just probability working its magic.

[Read More]

R

simstudy update - stepped-wedge design treatment assignment

Posted on May 28, 2019

simstudy has just been updated (version 0.1.13 on CRAN), and includes one interesting addition (and a couple of bug fixes). I am working on a post (or two) about intra-cluster correlations (ICCs) and stepped-wedge study designs (which I’ve written about before), and I was getting tired of going through the convoluted process of generating data from a time-dependent treatment assignment process. So, I wrote a new function, trtStepWedge, that should simplify things.

[Read More]

R

Generating and modeling over-dispersed binomial data

Posted on May 14, 2019

A couple of weeks ago, I was inspired by a study to write about a classic design issue that arises in cluster randomized trials: should we focus on the number of clusters or the size of those clusters? This trial, which is concerned with preventing opioid use disorder for at-risk patients in primary care clinics, has also motivated this second post, which concerns another important issue - over-dispersion.

A count outcome

In this study, one of the primary outcomes is the number of days of opioid use over a six-month follow-up period (to be recorded monthly by patient-report and aggregated for the six-month measure). While one might get away with assuming that the outcome is continuous, it really is not; it is a count outcome, and the possible range is 0 to 180. There are two related questions here - what model will be used to analyze the data once the study is complete? And, how should we generate simulated data to estimate the power of the study?

[Read More]

R

What matters more in a cluster randomized trial: number or size?

Posted on April 30, 2019

I am involved with a trial of an intervention designed to prevent full-blown opioid use disorder for patients who may have an incipient opioid use problem. Given the nature of the intervention, it was clear the only feasible way to conduct this particular study is to randomize at the physician rather than the patient level.

There was a concern that the number of patients eligible for the study might be limited, so that each physician might only have a handful of patients able to participate, if that many. A question arose as to whether we can make up for this limitation by increasing the number of physicians who participate? That is, what is the trade-off between number of clusters and cluster size?

[Read More]

R

Musings on missing data

Posted on April 2, 2019

I’ve been meaning to share an analysis I recently did to estimate the strength of the relationship between a young child’s ability to recognize emotions in others (e.g. teachers and fellow students) and her longer term academic success. The study itself is quite interesting (hopefully it will be published sometime soon), but I really wanted to write about it here as it involved the challenging problem of missing data in the context of heterogeneous effects (different across sub-groups) and clustering (by schools).

[Read More]

R

A case where prospective matching may limit bias in a randomized trial

Posted on March 12, 2019

Analysis is important, but study design is paramount. I am involved with the Diabetes Research, Education, and Action for Minorities (DREAM) Initiative, which is, among other things, estimating the effect of a group-based therapy program on weight loss for patients who have been identified as pre-diabetic (which means they have elevated HbA1c levels). The original plan was to randomize patients at a clinic to treatment or control, and then follow up with those assigned to the treatment group to see if they wanted to participate. The primary outcome is going to be measured using medical records, so those randomized to control (which basically means nothing special happens to them) will not need to interact with the researchers in any way.

[Read More]

R

A example in causal inference designed to frustrate: an estimate pretty much guaranteed to be biased

Posted on February 26, 2019

I am putting together a brief lecture introducing causal inference for graduate students studying biostatistics. As part of this lecture, I thought it would be helpful to spend a little time describing directed acyclic graphs (DAGs), since they are an extremely helpful tool for communicating assumptions about the causal relationships underlying a researcher’s data.

The strength of DAGs is that they help us think how these underlying relationships in the data might lead to biases in causal effect estimation, and suggest ways to estimate causal effects that eliminate these biases. (For a real introduction to DAGs, you could take a look at this paper by Greenland, Pearl, and Robins or better yet take a look at Part I of this book on causal inference by Hernán and Robins.)

[Read More]

R