On an incredibly heartening note, two COVID-19 vaccines have been approved for use in the US and other countries around the world, and more are possibly on the way. The big challenge, at least here in the United States, is to convince people that these vaccines are safe and effective; we need people to get vaccinated as soon as they are able in order to slow the spread of this disease. I for one will not hesitate for a moment to get a shot when I have the opportunity, though I don’t think biostatisticians are too high on the priority list.
A Bayesian implementation of a latent threshold model
In the previous post, I described a latent threshold model that might be helpful if we want to dichotomize a continuous predictor but we don’t know the appropriate cut-off point. This was motivated by a need to identify a threshold of antibody levels in convalescent plasma, which is currently being tested as a therapy for hospitalized patients with COVID in a number of RCTs, including those participating in the ongoing COMPILE meta-analysis.
A latent threshold model to dichotomize a continuous predictor
This is the context. In the convalescent plasma pooled individual patient level meta-analysis we are conducting as part of the COMPILE study, there is great interest in understanding the impact of antibody levels on outcomes. (I’ve described various aspects of the analysis in previous posts, most recently here). In other words, not all convalescent plasma is equal.
If we had a clear measure of antibodies, we could model the relationship of these levels with the outcome of interest, such as health status as captured by the WHO 11-point scale or mortality, and call it a day. Unfortunately, at the moment, there is no single measure across the RCTs included in the meta-analysis (though that may change). Until now, the RCTs have used a range of measurement “platforms” (or technologies), which may measure different components of the convalescent plasma using different scales. Given these inconsistencies, it is challenging to build a straightforward model that simply estimates the relationship between antibody levels and clinical outcomes.
Exploring the properties of a Bayesian model using high performance computing
An obvious downside to estimating Bayesian models is that it can take a considerable amount of time merely to fit a model. And if you need to estimate the same model repeatedly, that considerable amount becomes a prohibitive amount. In this post, which is part of a series (last one here) where I’ve been describing various aspects of the Bayesian analyses we plan to conduct for the COMPILE meta-analysis of convalescent plasma RCTs, I’ll present a somewhat elaborate model to illustrate how we have addressed these computing challenges to explore the properties of these models.
A refined brute force method to inform simulation of ordinal response data
Francisco, a researcher from Spain, reached out to me with a challenge. He is interested in exploring various models that estimate correlation across multiple responses to survey questions. This is the context:
- He doesn’t have access to actual data, so to explore analytic methods he needs to simulate responses.
- It would be ideal if the simulated data reflect the properties of real-world responses, some of which can be gleaned from the literature.
- The studies he’s found report only means and standard deviations of the ordinal data, along with the correlation matrices, but not probability distributions of the responses.
- He’s considering simstudy for his simulations, but the function genOrdCat requires a set of probabilities for each response measure; it doesn’t seem like simstudy will be helpful here.
Ultimately, we needed to figure out whether we could use the empirical means and standard deviations to derive probabilities that would yield those same means and standard deviations when the data are simulated. I thought about this for a bit and came up with a work-around; the approach seems to work decently and doesn’t require any outrageous assumptions.
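As a purely illustrative sketch of the kind of brute-force search this might involve (not necessarily the refined approach described in the post), one could draw many candidate probability vectors for an assumed 5-point response scale, compute the mean and standard deviation each one implies, and keep the candidate closest to the reported targets. The target values below are hypothetical:

```r
set.seed(271)

target_mean <- 3.2   # hypothetical values reported in the literature
target_sd <- 1.1
levels <- 1:5        # assumed 5-point response scale

n_candidates <- 100000

# draw candidate probability vectors from a flat Dirichlet (normalized gamma draws)
x <- matrix(rgamma(n_candidates * length(levels), shape = 1), ncol = length(levels))
p <- x / rowSums(x)

# mean and variance of the ordinal response implied by each candidate
m <- as.vector(p %*% levels)
v <- as.vector(p %*% levels^2) - m^2

# keep the candidate whose implied mean and sd are closest to the targets
loss <- (m - target_mean)^2 + (sqrt(v) - target_sd)^2
p_best <- p[which.min(loss), ]

round(p_best, 3)
```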
simstudy just got a little more dynamic: version 0.2.1
simstudy version 0.2.1 has just been submitted to CRAN. Along with this release, the big news is that I’ve been joined by Jacob Wujciak-Jens as a co-author of the package. He initially reached out to me from Germany with some suggestions for improvements, we had a little back and forth, and now here we are. He has substantially reworked the underbelly of simstudy, making the package much easier to maintain, and positioning it for much easier extension. And he implemented an entire system of formalized tests using testthat and hedgehog; that was always my intention, but I never had the wherewithal to pull it off, and Jacob has done that. But, most importantly, it is much more fun to collaborate on this project than to toil away on my own.
Permuted block randomization using simstudy
Along with preparing power analyses and statistical analysis plans (SAPs), generating study randomization lists is something a practicing biostatistician is occasionally asked to do. While not a particularly interesting activity, it offers the opportunity to tackle a small programming challenge. The title is a little misleading because you should probably skip all this and just use the blockrand package if you want to generate randomization schemes; don’t try to reinvent the wheel. But, I can’t resist. Since I was recently asked to generate such a list, I’ve been wondering how hard it would be to accomplish this using simstudy. There are already built-in functions for simulating stratified randomization schemes, so maybe it could be a good solution. The key element that is missing from simstudy, of course, is the permuted block setup.
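As a rough illustration of the permuted block idea itself (a minimal base R sketch, not the blockrand or simstudy implementation), the arms are repeated in equal numbers within each block and then shuffled block by block; the block size and number of blocks here are arbitrary:

```r
set.seed(135)

n_blocks <- 6
block_size <- 4
arms <- c("control", "treatment")

# within each block, repeat the arms in equal numbers and permute the order
rand_list <- unlist(lapply(seq_len(n_blocks), function(b) {
  sample(rep(arms, each = block_size / length(arms)))
}))

data.frame(
  id = seq_along(rand_list),
  block = rep(seq_len(n_blocks), each = block_size),
  rx = rand_list
)
```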
Generating probabilities for ordinal categorical data
Over the past couple of months, I’ve been describing various aspects of the simulations that we’ve been doing to get ready for a meta-analysis of convalescent plasma treatment for hospitalized patients with COVID-19, most recently here. As I continue to do that, I want to provide motivation and code for a small but important part of the data generating process, which involves creating probabilities for ordinal categorical outcomes using a Dirichlet distribution.
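To give a flavor of what that looks like, here is a minimal base R sketch that draws probability vectors from a Dirichlet distribution by normalizing gamma draws; the number of categories and the concentration parameters below are purely illustrative and are not taken from the COMPILE analysis:

```r
set.seed(726)

n_sets <- 8                    # number of probability vectors to generate
alpha <- c(2, 4, 6, 5, 2, 1)   # illustrative concentration parameters, one per category

# Dirichlet draws via normalized gamma variables: each row sums to 1
g <- matrix(
  rgamma(n_sets * length(alpha), shape = rep(alpha, each = n_sets)),
  nrow = n_sets
)
probs <- g / rowSums(g)

round(probs, 2)
```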
Diagnosing and dealing with degenerate estimation in a Bayesian meta-analysis
The federal government recently granted emergency approval for the use of antibody rich blood plasma when treating hospitalized COVID-19 patients. This announcement is unfortunate, because we really don’t know if this promising treatment works. The best way to determine this, of course, is to conduct an experiment, though the approval makes that more challenging to do; with the general availability of convalescent plasma (CP), there may be resistance from patients and providers to participating in a randomized trial. The emergency approval sends the incorrect message that the treatment is definitively effective. Why would a patient take the risk of receiving a placebo when they have almost guaranteed access to the therapy?
Generating data from a truncated distribution
A researcher reached out to me the other day to see if the simstudy package provides a quick and easy way to generate data from a truncated distribution. Other than the noZeroPoisson distribution option (which is a very specific truncated distribution), there is no way to do this directly. You can always generate data from the full distribution and toss out the observations that fall outside of the truncation range, but this is not exactly efficient, and in practice can get a little messy. I’ve actually had it in the back of my mind to add something like this to simstudy, but have hesitated because it might mean changing (or at least adding to) the defData table structure.
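For what it’s worth, one standard alternative to the generate-and-toss approach is inverse-CDF sampling, which only produces values inside the truncation bounds. Here is a minimal sketch for a truncated normal; the helper name, bounds, and parameters are my own illustrative choices, and this is not a simstudy feature:

```r
set.seed(482)

rtruncnorm_sketch <- function(n, mean = 0, sd = 1, lower = -Inf, upper = Inf) {
  # map the truncation bounds to the probability scale, draw uniforms
  # within that range, and invert back to the normal scale
  u <- runif(n, pnorm(lower, mean, sd), pnorm(upper, mean, sd))
  qnorm(u, mean, sd)
}

x <- rtruncnorm_sketch(1000, mean = 10, sd = 3, lower = 5, upper = 15)
range(x)   # all values fall within [5, 15]
```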