Thanks for this, Michael. It's really valuable to have someone carefully digging into these results. After reading Stevenson and Wolfers I'd sort of dismissed the paradox. This updated me against that view and has me more worried again.
I think I have more credence on the possibility that people's scales are shifting over time than you do. In particular, questions like the Cantril ladder ask people to think about a 10/10 as the "best possible life". But with growth, it's plausible to me that the best possible life is getting better over time. Perhaps people are interpreting that as best possible (attainable) life, rather than as the cosmically-absolute best possible life. And someone living the best possible (attainable) life in 2022 can go to space, travel the world, eat every kind of food, and access every possible entertaining movie and game ever made. None of these was possible in 1922, even for people living their best possible lives.
To account for this, people would have to be shifting their scales over time. Or, it is plausible to me that my 10/10 is different than my grandparents', and in an objective sense my 10/10 is better than my grandparents'.
I think it's possible there's too much promotion on the EA Forum these days. There are lots of posts announcing new organizations, hiring rounds, events, or opportunities. These are useful but not that informative, and they take up space on the frontpage. I'd rather see more posts about research, cause prioritization, critiques and redteams, and analysis. Perhaps promotional posts should be collected into a megathread, the way we do with hiring.
In general it feels like the signal-to-noise ratio on the frontpage is lower now than it was a year ago, though I could be wrong. One metric might be number of comments - right now, 5/12 posts I see on the frontpage have 0 comments, and 11/12 have 10 comments or fewer.
One thought I had while reading this was just: you run slower during a marathon, but marathons are still really hard.
Maybe this comment conflates working more than average with giving "everything ... including their soul and weekends"?
It's tricky because different people perhaps need to hear different things here. I'd like to have a culture where it's possible for people to work normal hours in EA jobs. But I also know people who work more than average because they care deeply about their work and are ambitious, without seeming (to me at least) to be on the verge of crisis.
wars happen much more quickly now (I’m not sure why - maybe because planes are faster than walking?)
I think advances in strategy, automation, logistics, and transportation have a lot to do with this! And I do think there's a general lesson there - everything has been speeding up, so we should generally expect collapses today to happen faster than they happened in the past.
Nice work Ollie, this is very thought-provoking. It got me thinking a lot more about plausible reference classes for human extinction.
As I've mentioned to you, I think individual species extinctions are a better reference class than mass extinction events. It's a shame you couldn't find a good source that summarizes how quickly species declines tend to happen. Individual species extinctions must happen faster than mass extinction events, since species collapses all occur within extinction events. And I strongly suspect if we had data on them, we'd see that species tend to go extinct much faster than extinction events. There's selection bias at work, but I can recall seeing graphs of, e.g., global whale, elephant, and rhinoceros populations that show precipitous declines following an exogenous catastrophe (usually the introduction of humans, or the invention of a new technology like whaling ships).
Your discussion of civilizational decline timelines, on the other hand, does seem directly relevant. It would be great to see a database that tracks the duration of civilizational declines categorized by cause (where possible), to see if we can find more specific reference classes based on different risks!
I'm not sure it's actually the case that interventions in temporary emergencies are "very likely" more cost-effective. Emergencies often lead to an influx of funds that local organizations struggle to absorb, and it's difficult to allocate funds efficiently. This GiveWell blog on the topic is somewhat dated, but I think the main points still stand.
I think this comment demonstrates the importance of quantifying probabilities. e.g. you write:
Could agriculture cope with projected warming? Possibly, maybe probably. Can it do so while supply chains, global power relations and financial systems are disrupted or in crisis? That's a much harder prospect.
I can imagine either kinda agreeing with this comment, or completely disagreeing, depending on how we're each defining "possibly", "probably", and "much harder".
For what it's worth, I also think it's probable that agriculture will cope with projected warming. In fact, I think it's extremely likely that, even conditional on geopolitical disruptions, the effects of technological change will swamp any negative effects of warming. To operationalize, I'd say something like: there's a 90% chance that global agricultural productivity will be higher in 50 years than it is today.
If you like the location you're currently in, it seems pretty worth it to try to hang out with other people in your current community first. Join a sports team or games club or something. If you're worried about incentives, then ask a friend for accountability. Say you'll pay them $20 if you don't actually go to the event and ask them to follow up on it.
I'm a bit worried you're underestimating how difficult it would be to move to an entirely different continent on your own. Life as an expat can be expensive and alienating.
I don't think a good name for this exists, and I don't think we need one. It's usually better to talk about the specific cause areas than to try and lump all of them together as not-longtermism.
As you mention, there are lots of different reasons one might choose not to identify as a longtermist, including both moral and practical considerations.
But more importantly, I just don't think that longtermist vs not-longtermist is sufficiently important to justify grouping all the other causes into one group.
Trying to find a word for all the clusters other than longtermism is like trying to find a word that describes all cats that aren't black, but isn't "not-black cats".
One way of thinking about these EA schools of thought is as clusters of causes in a multi-dimensional space. One of the dimensions along which these causes vary is longtermism vs. not-longtermism. But there are many other dimensions, including animal-focused vs. people-focused, high-certainty vs low-certainty, etc. Not-longtermist causes all vary along these dimensions, too. Finding a simple label for a category that includes animal welfare, poverty alleviation, metascience, YIMBYism, mental health, and community building is going to be weird and hard.
It's because there are so many other dimensions that we can end up with people working on AI safety and people working on chicken welfare in the same movement. I think that's cool. I really like that EA space has enough dimensions that a really diverse set of causes can all count as EA. Focusing so much on the longtermism vs. not-longtermism dimension under-emphasizes this.
I downvoted this post because it doesn't present any evidence to back up its claims. Frankly I also found the tone off-putting ("vultures"? really?) and the structure confusing.
I also think it underestimates the extent to which the following things are noticeable to grant evaluators. I reckon they'll usually be able to tell when applicants (1) don't really understand or care about x-risks, (2) don't really understand or care about EA, (3) are lying about what they'll spend the money on, or (4) have a theory of change that doesn't make sense. Of course grant applicants tailor their application to what they think the funder cares about. But it's hard to fake it, especially when questioned.
Also, something like the Atlas Fellowship is not "easy money". Applicants will be competing against extremely talented and impressive people from all over the world. I don't think the "bar" for getting funding for EA projects has fallen as much as this post, and some of the comments on this post, seem to assume.
I agree with this. I think there's multiple ways to generate predictions and couldn't cover everything in one post. So while here I used broad historical trends, I think that considerations specific to US-China, US-Russia, and China-India relations should also influence our predictions. I discuss a few of those considerations on pp. 59-62 of my full report for Founders Pledge and hope to at least get a post on US-China relations out within the next 2-3 months.
One quick hot take: I think Allison greatly overestimates the proportion of power transitions that end in conflict. It's not actually true that "incumbent hegemons rarely let others catch up to them without a fight" (emphasis mine). So, while I haven't run the numbers yet, I'll be somewhat surprised if my forecast of a US-China war ends up being higher than ~1 in 3 this century, and very surprised if it's >50%. (Metaculus has it at 15% by 2035).
Really enjoyed the way forecasts were integrated into the essay. Seems like a really useful approach!
I broadly agree that ending the trade war would be good. I'm not sure it's as easy to mitigate the political downsides as you suggest, though. I think it's quite unlikely that "these political costs could be reduced by communicating to the public the evidence showing tariffs are ineffective". Mostly because it's difficult to explain such a complicated issue on which people's intuitions point the other way. But also because it would be a political act and you'd have half the politicians in the country spreading the opposite message.
One longer-term scenario I'd have some credence in is: if Biden were to follow through on this action, I'd expect it to have a negative effect on his chances for re-election (maybe make it 1-5% less likely?), and any increase in the chance of Biden losing the next election could be worse for US-China relations than the gain from ending the trade war (something like 50% confidence).
I'm also not sure it's true that "other US-China issues are more complex and have less room for meaningful shifts". This seems to neglect the fact that the US and China have mostly managed to continue cooperating on climate change negotiations even though relations on the whole have remained frosty. I'd be a fan of trying to find other issues of common ground, even if they're less important than bilateral trade or territorial issues. For example, perhaps they could coordinate on space governance, clean tech investment, arms control, and maybe foreign aid?
I think cooperation on issues of lesser importance can be helpful as they allow countries the chance to show they can agree and uphold agreements, build trust, build personal ties between elites and diplomats, and reduce misunderstandings and misconceptions of the other side's intentions.
Thanks for writing this - I think it's accessible, informative, and interesting, which is difficult to pull off when writing about research methods!
I think it's telling that all the examples of the effectiveness of RCTs in this article come from clinical trials. However, you don't limit yourself to this domain in the headline or summary of the article (e.g. "How would we know about the effects of a new idea, treatment or policy?").
Our World in Data is often used by people (including myself) to gather development data. So I think it could be worth adding a caveat that many of the strengths you discuss in the article don't apply to RCTs conducted on social programs or policies. For example, it's difficult or impossible to have double-blinding or a placebo group; it's difficult to randomize effectively due to spillover effects; it's harder to get a large sample size when you're studying effects on villages or countries; and generalization is far more difficult (a drug that works for a Brazilian is likely to work for an Indonesian, but a policy that works in Brazil is unlikely to have the same effect in Indonesia).
It proved hard to get this version published; the apparent subjectivity of the costs, the inclusion of economic methods in an epidemiology paper, and the specific choice of preference elicitation methods, etc, all exposed a large "attack surface" for reviewers. In the end, we just removed the cost-benefit analysis.
Clearly, internal documents of at least some governments will have estimated these costs. But in almost all cases these were not made public. Even then: as far as we know, only economic costs were counted in these private analyses; it is still rare to see estimates of the large direct disutility of lockdown.
I don't know exactly which papers you're referring to, but it's plausible to me that the cost-benefit analysis would be similarly valuable to the rest of the content in the paper. So it really sucks to just lose it.
Did you end up publishing those calculations elsewhere (e.g. as a blog post complement to the paper, or in a non-peer-reviewed version of the article)? Do you have any thoughts on whether, when, and how we should try to help people escape the peer review game and just publish useful things outside of journals?
[Epistemic status: Writing off-the-cuff about issues I haven't thought about in a while - would welcome pushback and feedback]
Thanks for this post, I found it thought-provoking! I'm happy to see insightful global development content like this on the Forum.
My views after reading your post are:
You're probably right that it doesn't make sense for all studies to be benchmarking their intervention against cash transfers;
I still think there are good reasons for practitioners to think hard about whether their programs do more good than budget-equivalent cash transfers would;
Your post raises issues that challenge the usefulness of RCTs in general, not just RCTs that compare interventions to cash transfers.
Why I like cash benchmarking
That’s the role that a cash arm plays: rather than just check if a program is better than doing nothing at all (comparing to a control), we index it against a simple intervention that we know works well: cash.
The reason I find a cash benchmark useful feels a bit different than this. IMO the purpose of cash benchmarking is to compare a program to a practical counterfactual: just giving the money to beneficiaries instead of funding a more complicated program. It feels intuitive to me that it's bad to fund a development program that ends up helping people less than just giving people the cash directly instead. So the key thing is not that 'we know cash works well' - it's that giving cash away instead is almost always a feasible alternative to whatever development program one is funding.
That still feels pretty compelling to me. I previously worked in development and was often annoyed, and sometimes furious, about the waste and bureaucratic bs we had to put up with to run simple interventions. Cash benchmarking to me is meant to test whether the beneficiaries would be better off if, instead of hiring another consultant or buying more equipment, we had just given them the money.
Problems with RCTs
I am most familiar with our own program but I expect this applies to many other international development programs too: your medicine/training/infrastructure/etc program will very likely deliver benefits over a different timeline to cash, making a direct RCT comparison dependent more on survey timing than intervention efficacy.
This is a really good point. In combination with the graph you posted, I'm not sure I've seen it laid out so clearly previously. But it seems like you've raised an issue with not just cash benchmarking, but with our ability to use RCTs to usefully measure program effects at all.
In your graph, you point out that the timing of your follow-up survey will affect your estimate of the gap between the effects of your intervention and the effects of a cash benchmark. But we'd have the same issue if we wanted to compare the effects of your interventions to all the other interventions we could possibly fund or deliver. And if we want to maximize impact, we should be considering all these different possibilities.
More worryingly: what we really care about is not the gap between the effects at a given point in time. What we care about is the difference between the integrals of those curves. The difference in total impact (divided by program cost).
But, as you say, surveys are expensive and difficult. It's rare to even have one follow-up survey, much less a sufficient number of surveys to construct the shape of the benefits curve.
It seems to me people mostly muddle through and ignore that this is an issue. But the people who really care fill in the blanks with assumptions. GiveWell, for example, makes a lot of assumptions about the benefits-over-time of the interventions they compare. To their eternal credit you can see these in their public cost-effectiveness model. They make an assumption about how much of the transfer is invested; they make an assumption about how much that investment returns over time; they make an assumption about how many years that investment lasts; etc. etc. And they do similar things for the other interventions they consider.
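To make this concrete, here's a toy sketch of what comparing the integrals of benefit curves looks like in practice. The function and all the parameter values below are made up for illustration - they mimic the *kind* of assumptions GiveWell makes (share invested, annual return, duration), not their actual numbers:

```python
# Toy model: total discounted benefits of a $1,000 program, rather than
# the effect measured at a single follow-up survey.

def total_benefit(transfer, invested_share, annual_return, years, discount=0.04):
    """Rough present value of a program's benefits (illustrative only).

    Assumes part of the transfer is consumed immediately and the rest
    generates a return stream for a fixed number of years.
    """
    consumed = transfer * (1 - invested_share)
    invested = transfer * invested_share
    stream = sum(
        invested * annual_return / (1 + discount) ** t
        for t in range(1, years + 1)
    )
    return consumed + stream

# Two hypothetical programs with the same cost but different benefit shapes:
cash = total_benefit(1000, invested_share=0.4, annual_return=0.10, years=10)
# a program whose benefits arrive more slowly but last longer (toy numbers)
program = total_benefit(1000, invested_share=0.8, annual_return=0.08, years=15)
print(round(cash), round(program))
```

With these toy numbers the two come out close, and nudging the assumed duration or return flips the ranking entirely - which is exactly the assumption-sensitivity problem: the comparison depends on parameters no single follow-up survey can pin down.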
All of this, though, is updating me further against RCTs really providing that much practical value for practitioners or funders. Estimating the true benefits of even the most highly-scrutinized interventions requires making a lot of assumptions. I'm a fan of doing this. I think we should accept the uncertainty we face and make decisions that seem good in expectation. But once we've accepted that, I start to question why we're messing around with RCTs at all.
Thanks for this, it's really helpful! I find it very plausible that "generalist forecasters are the most accurate source for predictions on ~any question" has become too much of a community shibboleth. This is a useful correction.
Given how widely the "forecasters are better than experts!" meme has spread, point 3a seems particularly important to me (emphasis mine):
A common misconception is that superforecasters outperformed intelligence analysts by 30% [...] The forecaster prediction market performed about as well as the intelligence analyst prediction market [...] [85% confidence]
I would have found a couple more discussion paragraphs helpful. As written, it's difficult for me to tell which studies you think are most influential in shaping the conclusions you lay out in the summary paragraph at the beginning of the post. The "Summary" section of the post isn't actually summarizing the rest of the post; instead, that's just where your discussion and conclusions are being presented.
I'm excited to potentially see more critical analysis of the forecasting literature! Plus ideas for new studies that can help identify the conditions under which forecasters are most accurate/helpful.
I'm (pleasantly) surprised by the number of entries! But as a result the Forum seems pretty far from optimal as a platform for this discussion. Would be helpful to have a way to filter by focus area, for example.
Comment by Stephen Clare on [deleted post]
I agree with your first point here. Looks like various nations have already committed military aid on the order of $2B to Ukraine, plus quite a lot of in-kind donations of military equipment. I'm very unsure about how elastic the supply of military equipment is at the current margin. Is it really the case that there are military supplies available that Ukraine would purchase but for lack of funds? That would surprise me.
It reminds me a bit of the early Covid days when everyone wanted to purchase PPE, but supply was bottlenecked, so donations increased prices and changed the distribution of who received the available supply.
I haven't looked into specific nuclear orgs so am pretty uncertain about this, but suspect there are probably good funding opportunities in this space.
To speculate on why no funders have stepped into the breach, though:
MacArthur could have good reason to change their priorities. Nuclear work may just be super intractable. Maybe we can still make much more progress on other issues.
MacArthur has funded 88 other organizations working on nuclear issues in addition to NTI. EAs are aware of NTI because orgs like Open Phil have supported their bio work previously, but it would be good to look at the other orgs that MacArthur funded too to see who else is out there. With 89 orgs to choose from, it's plausible that NTI is not the best funding opp at the margin. But working out which funding opportunities would be most valuable at the margin is a lot of work.
MacArthur represents about 45% of total funding in the space. That's a lot, but I'd expect the remaining 55% to be shifted around a bit and hopefully cover the most marginally-valuable opportunities.
To respond to some of your specific points:
I'm unsure how relevant the "EA has a lot of money right now" point is. There's lots of stuff to fund, and saving can still be good because (1) we may learn a lot more about good stuff to fund in the coming years and decades and (2) the fields we're pretty sure are good to fund are still growing, and it might be worth saving our money so we can grant more to those fields in the future.
There's a war going on now, but I'm pretty sure there's nothing NTI can do to reduce nuclear risk right now. The question is whether we think total risk from nukes in the medium-to-long term has increased. Or these issues might become more tractable to work on as they're more salient now. This might make funding the work of NTI and similar orgs more attractive. But it's complicated.
Not sure I understand the point about "hiding it" - are you asking if there are plans to fund this stuff that funders just aren't discussing yet?
Again, I'm on the whole sympathetic to your view. I'm not sure how many EAs should be thinking about and funding nuclear/conflict issues, but the answer, IMO, is not 0. But I do also think there are good reasons not to rush into the space, and it's not obviously wrong that no one has stepped up to fund NTI.
To get a sense of the amount of funding we're talking about: members of the Peace and Security Funders Group, which I'm pretty sure accounts for a majority of the funders in the area (including MacArthur), grant about $70M-$80M per year for nuclear issues. MacArthur has given a total of $124M in this area since 2014. So, their estimate that MacArthur represents 40-50% of the total funding in the area seems too high.
I'm a bit disappointed my question here wasn't answered. It would have been good to have a sense of what we could look at funding if someone wanted to cover some of the MacArthur shortfall, without investing ~$40M per year into a space in which we don't have deep expertise.
This sounds interesting, though I feel slightly confused. I can see why socialism would be a useful thing to know about, but not why it's so much more interesting and useful than, e.g., neoliberalism. I'd also be pretty interested to hear more about how it relates to EA's historical and cultural influences. I guess you're right that I don't even know what the right questions to ask about this are.
If this work is as important as you say here then it seems like a lot of value is being left on the table. Seems like it would be really helpful if you could write out a few bullet points of what needs to be done to get to that stage and how others might be able to help, then reach out to EA Funds or someone else with a proposal.
Good points, thanks! I agree the wording in the main post there could be more careful. In deemphasizing the size of the effect there, I was reacting to claims along the lines of "US-China conflict is unlikely because their economic interdependence makes it too costly". I still think that that's not a particularly strong consideration for reasons discussed in the main post. But you're right that I'm probably responding to a strawman, and that serious takes are more nuanced than that.
Fair enough! I think something Braumoeller is trying to get at with his definition of intensity is like: if I were a citizen of one of the nations involved in a war, how likely is it that I would be killed? If you end up dividing by year, then you're measuring how likely is it that I would be killed per year of warfare. But what I would really care about is the total risk over the duration of the war.
Ah, great catch. It's the third-bloodiest war in the time period Braumoeller considers, i.e. 1816-2007. That's super different, so thanks! I've edited the main text.
On intensity - Braumoeller thinks dividing by year can actually mask the intensity of bloody, prolonged conflicts (pp. 39-41 of Only The Dead). For example, there were fewer battle deaths per year in the Vietnam War than in the Korean War, but the Vietnam War was much bloodier overall (~50% more battle deaths):
By any rational accounting, Vietnam was the more intense war. But the more modest annual death totals in Vietnam produce the illusion of a downward trend in battle deaths. That’s because, relative to the Korean War, the Vietnam War produced a much steadier death toll, and it produced it over a longer period. Korea looks incredibly deadly, and Vietnam seems less so, solely because the Korean War was short and intense while the war in Vietnam was long and drawn out
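To make the arithmetic behind that passage concrete, here's a tiny sketch. The figures are toy numbers chosen only to match the qualitative pattern Braumoeller describes (Vietnam: roughly 50% more total battle deaths, spread over a much longer period) - they are not actual casualty counts:

```python
# Illustrative numbers only - not real casualty figures.
wars = {
    "Korea":   {"battle_deaths": 1_000_000, "years": 3},
    "Vietnam": {"battle_deaths": 1_500_000, "years": 15},
}

for name, w in wars.items():
    per_year = w["battle_deaths"] / w["years"]
    print(f"{name}: {w['battle_deaths']:,} total, {per_year:,.0f} per year")
```

With numbers like these, Korea looks several times deadlier on a per-year basis even though Vietnam was deadlier in total - which is exactly how dividing by year can mask the intensity of a long, bloody war.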
Thanks, this is really helpful. I think a hidden assumption in my head was that the hingey time is put on hold while civilization recovers, but now I see that that's pretty questionable.
I also share your feeling that, for fuzzy reasons, a world with 'lesser catastrophes' is significantly worse in the longterm than a world without them. I'm still trying to bring those reasons into focus, though, and think this could be a really interesting direction for future research.
Thanks, this is a great comment! I'm going to edit the main post to reflect some of this.
Do (1) a second catastrophe and (2) failure of civilization to recover exhaust the possibilities for "indirect paths"? I've thought about this less than the other points in my main post, but I think I disagree that these are as worrying as the direct path. I think it's possible they're of the same order of magnitude, but they seem less likely in expectation than the direct pathway from war to existential risk via extinction.
First, catastrophes in general are just very unlikely, and I think the 'period of vulnerability' following a war would probably be surprisingly short (on the order of 100 years rather than thousands). Post-WWII recovery in Europe took place over the course of a few years. The US funded some of this recovery via the Marshall Plan, but the investment wasn't that big (probably <5% of national income). There's also a paper that found no difference in economic development, just 27 years after the Vietnam War, between areas that were heavily bombed by the US and areas that weren't.
A war 10-30 times more severe than WWII would obviously take longer to recover from, but I still think we're talking about decades or centuries rather than millennia for civilization to stabilize somewhere (albeit at a much diminished population).
Second, I find it hard to think of specific reasons why we would expect long-term civilizational stagnation. I think a catastrophic war could wipe out most of the world population, but still leave several million people alive. New Zealand alone has 5M people, for example. Humanity has previously survived much smaller population bottlenecks. Conditional on there being survivors, it also seems likely to me that they survive in at least several different places (various islands and isolated parts of the world, for example). That gives us multiple chances for some population to get it together and restart economic growth, population growth, and scientific advancement.
I'd be interested to hear more about why you think the "less direct paths should be seen as more worrying than the fairly direct paths".
"The Marshall Plan's accounting reflects that aid accounted for about 3% of the combined national income of the recipient countries between 1948 and 1951" (from Wikipedia; I haven't chased down the original source, so caveat emptor)
"U.S. bombing does not have a robust negative impact on poverty rates, consumption levels, infrastructure, literacy or population density through 2002. This finding suggests that local recovery from war damage can be rapid under certain conditions, although further work is needed to establish the generality of the finding in other settings." (Miguel & Roland, abstract, https://eml.berkeley.edu/~groland/pubs/vietnam-bombs_19oct05.pdf)
This does seem useful. At least one similar survey does exist for other fields: the TRIP survey for international relations scholars. I've found this somewhat useful for my research, though often the questions in IR seem less specific than the questions asked of economists.
I think there's also a coordination problem here. A lot of people care a little bit about this, but it's hardly anyone's top priority, so there have been basically no serious, committed, focused campaigns to actually create and promote specific policies.
I think it's because you're making strong claims without presenting any supporting evidence. I don't know what reading lists you're referring to; I have doubts about not asking questions being an 'unspoken condition' about getting access to funding; and I have no idea what you're conspiratorially alluding to regarding 'quasi-censorship' and 'emotional blackmail'.
In the discussion section of your EAG talk, you and Carl Robichaud talked briefly about the implications of the MacArthur Foundation phasing out its Nuclear Challenges portfolio, likely leaving many of the organizations working in this space with a large funding gap. If MacArthur planned to reduce its nuclear grantmaking by 90% instead of ending it completely, what high-priority interventions would you recommend they continue to fund?
In your talk at EAG you said that you think the risk of nuclear war today is "high and rising". You also estimate the annual probability of a catastrophic nuclear event is about 0.5%. I wanted to first say kudos for quantifying your beliefs in this way. It's so helpful for communicating clearly about these risks. I have two related questions:
(1) Could you please say more about the main considerations, metrics, and/or data you use to inform this estimate?
(2) How quickly do you think the risk is rising? I'm curious whether you think the annual risk is likely to increase by some tenths of a percentage point, or by factors of 2 or more.
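For context on what different answers would imply, annual risks compound. Here's a quick sketch of the arithmetic (my own, and it assumes - unrealistically - a constant, independent annual risk):

```python
# Convert a constant annual probability into a cumulative probability
# over a number of years (assumes independence across years).
def cumulative_risk(annual_p, years):
    return 1 - (1 - annual_p) ** years

print(cumulative_risk(0.005, 10))  # next decade
print(cumulative_risk(0.005, 78))  # rest of the century
```

At a constant 0.5% per year, that's roughly a 5% chance over the next decade and roughly a third over the rest of the century - so even modest increases in the annual rate compound into large differences at the century scale.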
I agree that a good number of people around EA trend towards sadness (or maybe "pits of despair"). It's plausible to me that the proportion of the community in this group is somewhat higher than average, but I'm not sure about that. If that is the case, though, then my guess is that some selection effects, rampant Imposter Syndrome, and the weight of always thinking about ways the world is messed up are more important causes than social norms.
I have to say, I actually chuckled when I read "don’t ever indulge in Epicurean style" listed as an iron-clad EA norm. That, uhh, doesn't match my experience.
I'm interested in reading critiques of StrongMinds' research, but downvoted this comment because I didn't find it very helpful or constructive. Would you mind saying a bit more about why you think their standards are low, and the evidence that led you to believe they are "making up" numbers?