The most traditional approach for analyzing binary outcome data is logistic regression, where the estimated parameters are interpreted as log odds ratios or, if exponentiated, as odds ratios (ORs). No one other than statisticians (and maybe not even statisticians) finds the odds ratio to be a very intuitive statistic, and many feel that a risk difference or risk ratio/relative risks (RRs) are much more interpretable. Indeed, there seems to be a strong belief that readers will, more often than not, interpret odds ratios as risk ratios. This turns out to be reasonable when an event is rare. However, when the event is more prevalent, the odds ratio will diverge from the risk ratio. (Here is a paper that discusses some of these issues in greater depth, in case you came here looking for more.)
[Read More]Finding logistic models to generate data with desired risk ratio, risk difference and AUC profiles
About two years ago, someone inquired whether simstudy
had the functionality to generate data from a logistic model with a specific AUC. It did not, but now it does, thanks to a paper by Peter Austin that describes a nice algorithm to accomplish this. The paper actually describes a series of related algorithms for generating coefficients that target specific prevalence rates, risk ratios, and risk differences, in addition to the AUC. simstudy
has a new function logisticCoefs
that implements all of these. (The Austin paper also describes an additional algorithm focused on survival outcome data and hazard ratios, but that has not been implemented in simstudy
). This post describes the the new function and provides some simple examples.