ouR data generation

Generating clustered data with marginal correlations

Posted on November 22, 2022

A student is working on a project to derive an analytic solution to the problem of sample size determination in the context of cluster randomized trials and repeated individual-level measurement (something I’ve thought a little bit about before). Though the goal is an analytic solution, we do want confirmation with simulation. So, I was a little disheartened to discover that the routines I’d developed in simstudy for this were not quite up to the task. I’ve had to quickly fix that, and the updates are available in the development version of simstudy, which can be downloaded using devtools::install_github(“kgoldfeld/simstudy”). While some of the changes are under the hood, I have added a new function, genBlockMat, which I’ll describe here.

[Read More]

R

Modeling the secular trend in a cluster randomized trial using very flexible models

Posted on November 1, 2022

A key challenge - maybe the key challenge - of a stepped wedge clinical trial design is the threat of confounding by time. This is a cross-over design where the unit of randomization is a group or cluster, where each cluster begins in the control state and transitions to the intervention. It is the transition point that is randomized. Since outcomes could be changing over time regardless of the intervention, it is important to model the time trends when conducting the efficacy analysis. The question is how we choose to model time, and I am going to suggest that we might want to use a very flexible model, such as a cubic spline or a generalized additive model (GAM).

[Read More]

R Cluster randomized trials

Presenting results for multinomial logistic regression: a marginal approach using propensity scores

Posted on September 20, 2022

Multinomial logistic regression modeling can provide an understanding of the factors influencing an unordered, categorical outcome. For example, if we are interested in identifying individual-level characteristics associated with political parties in the United States (Democratic, Republican, Libertarian, Green), a multinomial model would be a reasonable approach to for estimating the strength of the associations. In the case of a randomized trial or epidemiological study, we might be primarily interested in the effect of a specific intervention or exposure while controlling for other covariates. Unfortunately, interpreting results from a multinomial logistic model can be a bit of a challenge, particularly when there is a large number of possible responses and covariates.

[Read More]

R

Flexible simulation in simstudy with customized distribution functions

Posted on August 30, 2022

Really, the only problem with the simstudy package (😄) is that there is a hard limit to the possible probability distributions that are available (the current count is 15 - see here for a complete description). However, it turns out that there is more flexibility than first meets the eye, and we can easily accommodate a limitless number as long as you are willing to provide some extra code.

I am going to illustrate this with two examples, first by implementing a truncated normal distribution, and second by implementing the flexible non-linear data generating algorithm that I described last time.

[Read More]

R simulation

Simulating data from a non-linear function by specifying a handful of points

Posted on August 9, 2022

Trying to simulate data with non-linear relationships can be frustrating, since there is not always an obvious mathematical expression that will give you the shape you are looking for. I’ve come up with a relatively simple solution for somewhat complex scenarios that only requires the specification of a few points that lie on or near the desired curve. (Clearly, if the relationships are straightforward, such as relationships that can easily be represented by quadratic or cubic polynomials, there is no need to go through all this trouble.) The translation from the set of points to the desired function and finally to the simulated data is done by leveraging generalized additive modelling (GAM) methods, and is described here.

[Read More]

R simulation GAM

simstudy updated to version 0.5.0

Posted on July 20, 2022

A new version of simstudy is available on CRAN. There are two major enhancements and several new features. In the “major” category, I would include (1) changes to survival data generation that accommodate hazard ratios that can change over time, as well as competing risks, and (2) the addition of functions to allow users to sample from existing data sets with replacement to generate “synthetic” data will real life distribution properties. Other less monumental, but important, changes were made: updates to functions genFormula and genMarkov, and two added utility functions, survGetParams and survParamPlot. (I did describe the survival data generation functions in two recent posts, here and here.)

[Read More]

R simstudy simulation

To impute or not: the case of an RCT with baseline and follow-up measurements

Posted on April 12, 2022

Under normal conditions, conducting a randomized clinical trial is challenging. Throw in a pandemic and things like site selection, patient recruitment and patient follow-up can be particularly vexing. In any study, subjects need to be retained long enough so that outcomes can be measured; during a period when there are so many potential disruptions, this can become quite difficult. This issue of loss to follow-up recently came up during a conversation among a group of researchers who were troubleshooting challenges they are all experiencing in their ongoing trials. While everyone agreed that missing outcome data is a significant issue, there was less agreement on how to handle this analytically when estimating treatment effects.

[Read More]

R Missing data

Simulating time-to-event outcomes with non-proportional hazards

Posted on March 29, 2022

As I mentioned last time, I am working on an update of simstudy that will make generating survival/time-to-event data a bit more flexible. I previously presented the functionality related to competing risks, and this time I’ll describe generating survival data that has time-dependent hazard ratios. (As I mentioned last time, if you want to try this at home, you will need the development version of simstudy that you can install using devtools::install_github(“kgoldfeld/simstudy”).)

[Read More]

R survival analysis

Adding competing risks in survival data generation

Posted on March 15, 2022

I am working on an update of simstudy that will make generating survival/time-to-event data a bit more flexible. There are two biggish enhancements. The first facilitates generation of competing events, and the second allows for the possibility of generating survival data that has time-dependent hazard ratios. This post focuses on the first enhancement, and a follow up will provide examples of the second. (If you want to try this at home, you will need the development version of simstudy, which you can install using devtools::install_github(“kgoldfeld/simstudy”).)

[Read More]

R survival analysis

Follow-up: simstudy function for generating parameters for survival distribution

Posted on February 22, 2022

In the previous post I described how to determine the parameter values for generating a Weibull survival curve that reflects a desired distribution defined by two points along the curve. I went ahead and implemented these ideas in the development version of simstudy 0.4.0.9000, expanding the idea to allow for any number of points rather than just two. This post provides a brief overview of the approach, the code, and a simple example using the parameters to generate simulated data.

[Read More]

R survival analysis