Susan Athey is interviewed in JAMA:
How an Economist’s Application of Machine Learning to Target Nudges Applies to Precision Medicine, by Roy Perlis and Virginia Hunt. JAMA. Published online May 16, 2025. doi:10.1001/jama.2025.4497
"A recent study by economist Susan Athey, PhD, and her colleagues may shed light on how best to target treatments using machine learning. The investigation, published in the Journal of Econometrics, focused on the effectiveness of text and email reminders, or nudges, sent to students about renewing their federal financial aid. The researchers compared causal targeting, which was based on estimates of which treatments would produce the highest effects, and predictive targeting, which was based on both low and high predicted probability of financial aid renewal.
"In the end, the study found hybrid models of the 2 methods proved most effective. However, the result that may be most surprising to Athey was that targeting students at the highest risk of nonrenewal was actually less effective.
...
"Dr Athey:When I
first started working on this, I was like, “Oh, there’s going to be a
gold mine. I’m going to go back and reanalyze all of these experiments
that have already been run, and we’re going to be doing new scientific
discoveries every day.” It didn’t quite work out that way. We had some
big successes, but there has been a lot of lack of success.
What are the cases where this doesn’t work? Machine learning is using the data to learn about these treatment effects. You have to do a lot of sample splitting. There’s always a cost to using the data to discover the model. You can do it without sample splitting, but then you have to adjust your P values. There’s no free lunch. If you have a very small dataset, you probably know what the most important factors are. You might be better off prespecifying those and just doing your subgroup analysis. If [there are] hundreds of observations, it’s just unlikely. These techniques are too data hungry to work.
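To make the sample-splitting cost concrete, here is a small simulated sketch (my own, not Athey's estimator): discover a candidate subgroup on one half of the data, then test the treatment effect only on the held-out half, so the P value is not contaminated by the search.

# Hedged sketch of "honest" sample splitting for subgroup discovery.
# Everything here is simulated and hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 4_000
x = rng.normal(size=n)                           # a single covariate
treat = rng.integers(0, 2, size=n)               # randomized assignment
y = 0.2 * treat * (x > 0) + rng.normal(size=n)   # true effect only when x > 0

half = np.zeros(n, dtype=bool)
half[: n // 2] = True                            # discovery half vs. confirmation half

def subgroup_effect(in_half, in_group):
    """Difference in mean outcome, treated minus control, within a subgroup."""
    m = in_half & in_group
    return y[m & (treat == 1)].mean() - y[m & (treat == 0)].mean()

# Discovery: pick whichever candidate subgroup looks better on half 1.
candidates = {"x > 0": x > 0, "x <= 0": x <= 0}
best = max(candidates, key=lambda name: subgroup_effect(half, candidates[name]))

# Confirmation: test that subgroup's effect only on the held-out half 2.
m = ~half & candidates[best]
t, p = stats.ttest_ind(y[m & (treat == 1)], y[m & (treat == 0)])
print(f"subgroup found on half 1: {best}; held-out test: t = {t:.2f}, p = {p:.4f}")

The cost Athey mentions is visible here: only half the observations contribute to the honest test, which is exactly why small datasets are better served by prespecified subgroups.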
Generally, you need thousands of people in the experiment. Then more than that, the statistical power needed to get treatment effect heterogeneity is large. And even treatment effect heterogeneity is easier—trying to get differential targeting is another thing. Imagine you have 3 drugs. It’s hard enough to say that something works relative to nothing. If you’re trying to say that one drug works better than another drug where both work, that’s hard. Usually you need really large, expensive trials to do that.
Then you add on top of that that I want to say, “This drug is better for these people, and this other drug is better for these other people.” You need 10 times as much data as you would for the basic “is there a treatment effect at all?” Now, of course, sometimes there’s a genetic thing: this drug literally doesn’t work or it has this terrible side effect for some people. That will pop out of the data.
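A back-of-the-envelope power calculation (with illustrative effect sizes I chose, not figures from the interview) shows where that order-of-magnitude intuition comes from: a subgroup-difference contrast involves four cell means instead of two, and the interaction is usually smaller than the average effect.

# Rough power arithmetic, purely illustrative numbers.
# Group size so that a contrast of k cell means with effect size delta is
# detectable at 5% two-sided alpha and 80% power: n ~= k * (z_a/2 + z_b)^2 * sigma^2 / delta^2.
from scipy.stats import norm

z = norm.ppf(1 - 0.05 / 2) + norm.ppf(0.80)   # ~= 2.80
sigma = 1.0                                    # outcome SD (assumed)

def n_per_group(delta, n_groups=2):
    return n_groups * z**2 * sigma**2 / delta**2

main_effect = 0.20   # assumed average treatment effect, in SD units
interaction = 0.10   # assumed difference in effect between two subgroups

print("average effect (2 cells):      ", round(n_per_group(main_effect, 2)), "per cell")
print("subgroup difference (4 cells): ", round(n_per_group(interaction, 4)), "per cell")

With these assumed numbers the total sample needed for the subgroup difference is more than ten times the total needed for the average effect, which is the flavor of the "10 times as much data" remark.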
For more subtle effects, you do need larger studies. That’s really been the main impediment. And as an economist, it’s like, why are all these things just barely powered? Why are there so many clinical studies with a t-statistic of 2? Of course, people did the power calculations, and they had some data already when they planned the experiments. If you have more data, maybe you add another treatment arm or something else. You don’t actually overpower an experiment. In my own research, I’ve ended up running my own experiments that are designed to get heterogeneity. I’ve also had a lot of luck when there’s very big administrative datasets, and there’s a really good natural experiment. Then you have lots of data. But former clinical trials are selected to not be good because the researcher themself didn’t overpower their own experiment. That’s why this isn’t so useful.
But nonetheless, that’s not to say it’s not out there. Like in any discovery, if it’s going to save lives and money, it’s worth doing. It’s just that there’s not a whole bunch of low-hanging fruit. There’s no dollars lying on the sidewalk."