Translating randomized controlled trials into policy action
“A randomized experiment is performed, a statistically significant comparison is found, and then story time begins, and continues and continues—as if the rigor from the randomized experiment somehow suffuses through the entire analysis.”
From a short paper by ANDREW GELMAN, who adds his analysis to the debate on the use of RCTs in policy development. Link.
The paper that Gelman is commenting on, by ANGUS DEATON and NANCY CARTWRIGHT, tackles misunderstandings and misuses of the form across disciplines:
“RCTs are both under- and over-sold. Oversold because extrapolating or generalizing RCT results requires a great deal of additional information that cannot come from RCTs; under-sold, because RCTs can serve many more purposes than predicting that results obtained in a trial population will hold elsewhere…
The gold standard or ‘truth’ view does harm when it undermines the obligation of science to reconcile RCTs results with other evidence in a process of cumulative understanding.”
- In a post from last year, Gelman provides further context for Deaton and Cartwright’s arguments: “The implication is not that observational studies are OK, but rather that real-world complexity… should be included in a policy analysis, even if a RCT is part of the story. Don’t expect the (real) virtues of a randomized trial to extend to the interpretation of the results.” Link. Gelman quotes extensively from Christopher Hennessy’s work. A paper of his from 2017: “Evidence from randomization is contaminated by ex post endogeneity if it is used to set policy endogenously in the future.” Link.
- Deaton, in a 2010 paper, critiqued overreliance on RCTs and quasi-random experiments in development: “I welcome recent trends in development experimentation away from the evaluation of projects and toward the evaluation of theoretical mechanisms.” Link. Also from 2010, Cartwright co-authored a paper on the limits of RCTs for predicting program effectiveness in health and social care. Link.
- A recent paper examines the path from proof-of-concept studies to scalable policy interventions, evaluating constraints and applications. Link.
- Similarly, at VoxDev, a case study from Bangladesh on scaling up from RCT results: “How do we go from results from a tightly controlled experiment to an intervention that can actually improve the lives of millions of people?” Link.
- For more on RCTs and development, an upcoming lecture by Lant Pritchett at the Development Research Institute at NYU begins with the provocative title: “The Debate about RCTs in Development is over. We won. They lost.” Link to the event. ht Sidhya
BURN IT DOWN
Modeling the macroeconomic effects of abolishing student loan debt
“The reason why such policies of cancelling (or, less ambitiously, refinancing) student debt have been controversial in the past, at least among higher education scholars, is not their expense, nor their purported effect on the macroeconomy—whether positive or negative. Similar policy proposals have faced two main critiques: that they are inequitable and that student debt is not actually burdening the economy, because the education it buys increases earnings for those borrowers. Both of those critiques are much less true than they are commonly believed to be.”
The Roosevelt Institute’s MARSHALL STEINBAUM on a brand new paper he co-authored that proposes and models the abolition of student debt. The paper articulates and addresses significant dimensions of the student debt crisis, including its effects on the racial wealth gap:
“Student debt is intimately bound up with the route to financial stability for racial minorities.… The implication is that while higher education is commonly believed to be the route to economic and social mobility, especially by policy-makers, the racialized pattern of the student debt crisis demonstrates how structural barriers to opportunity stand in the way of individual efforts.… The racial wealth gap (the ratio of the median wealth of white households in that age range) is approximately 12:1 in 2016, whereas in the absence of student debt, that ratio is 5:1.”
And the income distribution of debtors:
“Student debt used to be a mark of the relatively-rich: something that was necessary only for well-paid professionals who spend a long time obtaining a lucrative graduate degree that likely pays off in the form of a high and stable salary over a lifetime. What the worsening labor market, credentialization rat race, and withdrawal of state support for public higher education has done is shift the distribution of people with a positive student loan balance toward the poor, or at least, the much-less-rich.”
- In a Twitter thread, Steinbaum situates the findings alongside his other research areas (which we’ve linked to previously), tying the student loan crisis to monopsony power and persistent myths about the labor market: “The expansion of the federal student loan program… did not solve the labor market’s problem, because the labor market’s problem was never a skills gap.” Link.
New research on correcting fake news
In 2010, Brendan Nyhan and Jason Reifler published an influential paper suggesting that news-story corrections are frequently ineffective:
“We conducted four experiments in which subjects read mock news articles that included either a misleading claim from a politician, or a misleading claim and a correction. Results indicate that corrections frequently fail to reduce misperceptions among the targeted ideological group. We also document several instances of a ‘backfire effect’ in which corrections actually increase misperceptions among the group in question.”
Link to the paper.
A short new study is at odds with the 2010 results:
“To be sure, there was some evidence of differential response to corrections by ideology. Furthermore, uncorrected subjects were credulous of the claims made by the fake stories. Yet, for no issue was a correction met with factual backfire (Nyhan and Reifler, 2010; Wood and Porter, nd). As with non-fake stories, corrections led to large gains in factually accurate beliefs across the ideological spectrum. While fake news may have had a significant impact on the 2016 election, upon seeing a correction, Americans are willing to disregard fanciful accounts and hew to the truth.”
Full paper by ETHAN PORTER, THOMAS J. WOOD, and DAVID KIRBY here.
- In Slate, Daniel Engber writes at length about fake news, echo chambers, and the way the media has perpetuated some faulty ideas. Link.
- Brendan Nyhan responds in a Twitter thread (and models gracious acknowledgment of replication issues): “Our finding may be an outlier… I now use larger samples and preregister all my experimental studies to try to improve the quality of my research and encourage others to do the same. We are learning a lot about what does and doesn’t work to counter misinformation. I’m encouraged that corrections seem to be effective under some conditions, but that raises another question: if that is true, why are misperceptions so durable even when high-profile corrective info is available? That’s the puzzle we need to grapple with as a field & a society.” Link.
Dr. Pande argues that we should not be concerned about the growing trend of replacing human decision-makers with “black box” systems based on artificial intelligence. While we often have only limited understanding of why these systems make particular decisions, he maintains, the same thing can be said of humans.
Pande’s point is both important and undeniable, but we have many reasons to remain concerned. Here are just a few:
(1) It’s true that we often have only limited access to the reasons why humans make the decisions they make. However, it clearly does not follow that we cannot (a) gain any real insight into how particular decisions were made or (b) criticize particular decisions as lacking an acceptable justification. For example, if a doctor makes a treatment decision that seems to violate accepted guidelines, we can ask that doctor to explain her reasoning and check that reasoning against the relevant guidelines and what we know about the information available to her at the time. We can also spot problematic patterns of decision-making and ask the decision-maker in question to provide an acceptable explanation for them. To the extent that we don’t yet know how to subject some AI-based systems to similar scrutiny, that is clearly a problem.
(2) In cases where we discover that human decision-makers are making decisions in an ethically problematic way, we first try to figure out if they are following professional/organizational guidelines correctly. If they are not, and following the guidelines more carefully would fix the problem, we can replace them with someone else or try to get them to do a better job. If they are following existing guidelines correctly, or are failing to do so for reasons beyond their control, we try to replace the guidelines with ones that are more likely to yield acceptable results. (Changes in pain prescription protocols in order to reduce race- and gender-based biases are a good example of this.) Moreover, these interventions often work! If we have little insight into how an automated system is making decisions, then we may not be able to engage in this sort of auditing/corrective intervention process. That is a major cause for concern.
(3) When these systems are designed, they are typically optimized to achieve a very specific set of goals that simply does not include many other things we care about (e.g. disparate impact on members of vulnerable/marginalized social groups). It’s not hard to see why: for one thing, single-criterion optimization is hard enough; multi-criteria optimization is more difficult still. This is one way in which contemporary AI-based systems are more likely to get things wrong than well-intentioned and well-informed teams of human experts: at least the human teams will make efforts to ensure that other ethically important desiderata are factored into the decision-making process, insofar as time and resource constraints allow. If AI-based systems do not take into account morally important considerations that their human counterparts would, then we should expect them to be much less reliable at reaching ethically acceptable decisions.
- Developing metrics to assess sustainable investment: “Our consortium… is designing a next generation of traceable indicators to quantify external context and impact of investments and place these into a decision-making framework useful to investors.” Link.
- A new study suggests that the prospects for removing carbon from the atmosphere (via restoring the soil and forests, or carbon capture and storage) are much more limited than models have assumed. “Putting a hypothetical technology into a computer model of future scenarios is rather different than researching, developing, constructing and operating such a technology at the planetary scale required to compensate for inadequate mitigation.” Link.
- How the APA oversold the effects of violent video games, in order to sell itself as the solution. Link.
- Making an algorithm to enable human-computer cooperation: “Since Alan Turing envisioned artificial intelligence, technical progress has often been measured by the ability to defeat humans in zero-sum encounters (e.g., Chess, Poker, or Go). Less attention has been given to scenarios in which human–machine cooperation is beneficial but non-trivial, such as scenarios in which human and machine preferences are neither fully aligned nor fully in conflict. Cooperation does not require sheer computational power, but instead is facilitated by intuition, cultural norms, emotions, signals, and pre-evolved dispositions.” Link.
- Stratechery with some speculations on last week’s Bezos-Buffett-Dimon health insurance announcement. Link.
- Farhad Manjoo in the NYTimes: “…the online ad machine is also a vast, opaque and dizzyingly complex contraption with underappreciated capacity for misuse — one that collects and constantly profiles data about our behavior, creates incentives to monetize our most private desires and frequently unleashes loopholes that the shadiest of people are only too happy to exploit.” Link. Tangentially related, a report on a new brand of social-media-driven online retailers. Link. ht reader Patrick S.
- A paper on the effects of the prices of scientific works: “we show that each 10 percent decline in price was associated with a 43 percent increase in citations. Lower prices increased citations by helping to distribute BRP books across US libraries, including less affluent institutions. Results are confirmed by two alternative measures of scientific output: new PhDs and US patents that use knowledge in BRP books.” Link.
- Related, on Sci-Hub and its creator, Alexandra Elbakyan: “To Open Access activists like Elbakyan and Suber, since most research is publicly funded, paywall journals have essentially made most science a twice-paid product, bought first by taxpayers and secondly by scientists.” Link. ht Ankit
Each week we highlight research from a graduate student, postdoc, or early-career professor. Send us recommendations: firstname.lastname@example.org