Simulation -

An IV study design to estimate an effect size when randomization is not ethical

Posted on September 3, 2024

An investigator I frequently consult with seeks to estimate the effect of a palliative care treatment protocol for patients nearing end-stage disease, compared to a more standard, though potentially overly burdensome, therapeutic approach. Ideally, we would conduct a two-arm randomized clinical trial (RCT) to create comparable groups and obtain an unbiased estimate of the intervention effect. However, in this case, it may be considered unethical to randomize patients to a non-standard protocol.

[Read More]

R instrumental variable simulation causal inference

Generating binary data by specifying the relative risk, with simulations

Posted on July 2, 2024

The most traditional approach for analyzing binary outcome data is logistic regression, where the estimated parameters are interpreted as log odds ratios or, if exponentiated, as odds ratios (ORs). No one other than statisticians (and maybe not even statisticians) finds the odds ratio to be a very intuitive statistic, and many feel that a risk difference or risk ratio/relative risks (RRs) are much more interpretable. Indeed, there seems to be a strong belief that readers will, more often than not, interpret odds ratios as risk ratios. This turns out to be reasonable when an event is rare. However, when the event is more prevalent, the odds ratio will diverge from the risk ratio. (Here is a paper that discusses some of these issues in greater depth, in case you came here looking for more.)

[Read More]

R simulation logistic regression poisson regression simstudy

Flexible simulation in simstudy with customized distribution functions

Posted on August 30, 2022

Really, the only problem with the simstudy package (😄) is that there is a hard limit to the possible probability distributions that are available (the current count is 15 - see here for a complete description). However, it turns out that there is more flexibility than first meets the eye, and we can easily accommodate a limitless number as long as you are willing to provide some extra code.

I am going to illustrate this with two examples, first by implementing a truncated normal distribution, and second by implementing the flexible non-linear data generating algorithm that I described last time.

[Read More]

R simulation

Simulating data from a non-linear function by specifying a handful of points

Posted on August 9, 2022

Trying to simulate data with non-linear relationships can be frustrating, since there is not always an obvious mathematical expression that will give you the shape you are looking for. I’ve come up with a relatively simple solution for somewhat complex scenarios that only requires the specification of a few points that lie on or near the desired curve. (Clearly, if the relationships are straightforward, such as relationships that can easily be represented by quadratic or cubic polynomials, there is no need to go through all this trouble.) The translation from the set of points to the desired function and finally to the simulated data is done by leveraging generalized additive modelling (GAM) methods, and is described here.

[Read More]

R simulation GAM

simstudy updated to version 0.5.0

Posted on July 20, 2022

A new version of simstudy is available on CRAN. There are two major enhancements and several new features. In the “major” category, I would include (1) changes to survival data generation that accommodate hazard ratios that can change over time, as well as competing risks, and (2) the addition of functions to allow users to sample from existing data sets with replacement to generate “synthetic” data will real life distribution properties. Other less monumental, but important, changes were made: updates to functions genFormula and genMarkov, and two added utility functions, survGetParams and survParamPlot. (I did describe the survival data generation functions in two recent posts, here and here.)

[Read More]

R simstudy simulation

Simulating survival outcomes: setting the parameters for the desired distribution

Posted on February 8, 2022

The package simstudy has some functions that facilitate generating survival data using an underlying Weibull distribution. Originally, I added this to the package because I thought it would be interesting to try to do, and I figured it would be useful for me someday (and hopefully some others, as well). Well, now I am working on a project that involves evaluating at least two survival-type processes that are occurring simultaneously. To get a handle on the analytic models we might use, I’ve started to try to simulate a simplified version of the data that we have.

[Read More]

R simulation survival analysis