Posts

On Deference and Yudkowsky's AI Risk Estimates 2022-06-19T14:35:40.169Z
We should expect to worry more about speculative risks 2022-05-29T21:08:56.612Z
Is Democracy a Fad? 2021-03-13T12:40:35.298Z
Ben Garfinkel's Shortform 2020-09-03T15:55:56.188Z
Does Economic History Point Toward a Singularity? 2020-09-02T12:48:57.328Z
AMA or discuss my 80K podcast episode: Ben Garfinkel, FHI researcher 2020-07-13T16:17:45.913Z
Ben Garfinkel: How sure are we about this AI stuff? 2019-02-09T19:17:31.671Z

Comments

Comment by Ben Garfinkel (bmg) on On Deference and Yudkowsky's AI Risk Estimates · 2022-06-21T22:41:17.841Z · EA · GW

I really appreciate the time people have taken to engage with this post (and actually hope the attention cost hasn’t been too significant). I decided to write some post-discussion reflections on what I think this post got right and wrong.

The reflections became unreasonably long - and almost certainly should be edited down - but I’m posting them here in a hopefully skim-friendly format. They cover what I see as some mistakes with the post, first, and then cover some views I stand by.

Things I would do differently in a second version of the post:

1. I would either drop the overall claim about how much people should defer to Yudkowsky — or defend it more explicitly

At the start of the post, I highlight the two obvious reasons to give Yudkowsky's risk estimates a lot of weight: (a) he's probably thought more about the topic than anyone else and (b) he developed many of the initial AI risk arguments. I acknowledge that many people, justifiably, treat these as important factors when (explicitly or implicitly) deciding how much to defer to Yudkowsky.

Then the post gives some evidence that, at each stage of his career, Yudkowsky has made a dramatic, seemingly overconfident prediction about technological timelines and risks - and at least hasn’t obviously internalised lessons from these apparent mistakes.

The post expresses my view that these two considerations at least counterbalance each other - so that, overall, Yudkowsky's risk estimates shouldn't be given more weight than (e.g.) those of other established alignment researchers or the typical person on the OpenPhil worldview investigation team.

But I don't do a lot in the post to actually explore how we should weigh these factors up. In that sense: I think it’d be fair to regard the post’s central thesis as importantly under-supported by the arguments contained in the post.

I should have either done more to explicitly defend my view or simply framed the post as "some evidence about the reliability of Yudkowsky's risk estimates."

2. I would be clearer about how and why I generated these examples

In hindsight, this is a significant oversight on my part. The process by which I generated these examples is definitely relevant for judging how representative they are - and, therefore, how much to update on them. But I don’t say anything about this in the post. My motives (or at least my conscious motives) are also a part of the story that I only discuss in pretty high-level terms, but they seem like they might be relevant for forming judgments.

For context, then, here was the process:

A few years ago, I tried to get a clearer sense of the intellectual history of the AI risk and existential risk communities. For that reason, I read a bunch of old white papers, blog posts, and mailing list discussions.

These gave me the impression that Yudkowsky’s track record (and - to some extent - the track record of the surrounding community) was worse than I’d realised. From reading old material, I basically formed something like this impression: “At each stage of Yudkowsky’s professional life, his work seems to have been guided by some dramatic and confident belief about technological trajectories and risks. The older beliefs have turned out to be wrong. And the ones that haven’t yet resolved at least seem to have been pretty overconfident in hindsight.”

I kept encountering the idea that Yudkowsky has an exceptionally good track record or that he has an unparalleled ability to think well about AI (he’s also expressed this view himself) - and I kept thinking, basically, that this seemed wrong. I wrote up some initial notes on this discrepancy at some point, but didn’t do anything with them.

I eventually decided to write something public after the “Death with Dignity” post, since the view it expresses (that we’re all virtually certain to die soon) both seems wrong to me and seems likely to be very damaging if it’s actually widely adopted in the community. I also felt like the “Death with Dignity” post was getting more play than it should, simply because people have a strong tendency to give Yudkowsky’s views weight. I can’t imagine a similar post written by someone else having nearly as large of an impact. Notably, since that post didn’t really have substantial arguments in it (although the later one did), the fact that it had an impact seems like a testament to the power of deference; I think it’d be hard to look at the reaction to that post and argue that it’s only Yudkowsky’s arguments (rather than his public beliefs in-and-of-themselves) that have a major impact on the community.

People are obviously pretty aware of Yudkowsky’s positive contributions, but my impression is that (especially) new community members tended not to be aware of negative aspects of his track record. So I wanted to write a post drawing attention to the negative aspects.

I was initially going to have the piece explicitly express the impression I’d formed, which was something like: “At each stage of Yudkowsky’s professional life, his work has been guided by some dramatic and seemingly overconfident belief about technological trajectories and risks.” The examples in the post were meant to map onto the main ‘animating predictions’ about technology he had at each stage of his career. I picked out the examples that immediately came to mind.

Then I realised I wasn’t at all sure I could defend the claim that these were his main ‘animating predictions’ - the category was obviously extremely vague, and the main examples that came to mind were extremely plausibly a biased sample. I thought there was a good chance that if I reflected more, then I’d also want to include various examples that were more positive.

I didn’t want to spend the time doing a thorough accounting exercise, though, so I decided to drop any claim that the examples were representative and just describe them as “cherry-picked” — and add in lots of caveats emphasising that they’re cherry-picked.

(At least, these were my conscious thought processes and motivations as I remember them. I’m sure other factors played a role!)

3. I’d tweak my discussion of take-off speeds

I’d make it clearer that my main claim is: it would have been unreasonable to assign a very high credence to fast take-offs back in (e.g.) the early- or mid-2000s, since the arguments for fast take-offs had significant gaps. For example, there were lots of possible countervailing arguments for slow take-offs that pro-fast-take-off authors simply hadn’t addressed yet — as evidenced, partly, by the later publication of slow-take-off arguments leading a number of people to become significantly more sympathetic to slow take-offs. (I’m not claiming that there’s currently a consensus against fast-take-off views.)

4. I’d add further caveats to the “coherence arguments” case - or simply leave it out

Rohin’s and Oli’s comments under the post have made me aware that there’s a more positive way to interpret Yudkowsky’s use of coherence arguments. I’m not sure if that interpretation is correct, or if it would actually totally undermine the example, but this is at minimum something I hadn’t reflected on. I think it’s totally possible that further reflection would lead me to simply remove the example.

Positions I stand by:

On the flipside, here’s a set of points I still stand by:

1. If a lot of people in the community believe AI is probably going to kill everyone soon, then (if they’re wrong) this can have really important negative effects

In terms of prioritisation: My prediction is that if you were to ask different funders, career advisors, and people making career decisions (e.g. deciding whether to go into AI policy or bio policy) how much they value having a good estimate of AI risk, they’d very often answer that they value it a great deal. I do think that over-estimating the level of risk could lead to concretely worse decisions.

In terms of community health: I think that believing you’re probably going to die soon is probably bad for a large portion of people. Reputationally: Being perceived as believing that everyone is probably going to die soon (particularly if this is actually an excessive level of worry) also seems damaging.

I think we should also take seriously the tail-risk that at least one person with doomy views (even if they’re not directly connected to the existential risk community) will take dramatic and badly harmful actions on the basis of their views.

2. Directly and indirectly, deference to Yudkowsky has a significant influence on a lot of people’s views

As above: One piece of evidence for this is that Yudkowsky’s “Death with Dignity” post triggered a big reaction, even though it didn’t contain any significant new arguments. I think his beliefs (above and beyond his arguments) clearly do have an impact.

Another reason to believe deference is a factor: I think it’s both natural and rational for people, particularly people new to an area, to defer to people with more expertise in that area.[1] Yudkowsky is one of the most obvious people to defer to, as one of the two people most responsible for developing and popularising AI risk arguments and as someone who has (likely) spent more time thinking about the subject than anyone else.

Beyond that: A lot of people also clearly have a huge amount of respect for Yudkowsky, sometimes more than they have for any other public intellectual. I think it’s natural (and sensible) for people’s views to be influenced by the views of the people they respect. In general, I think, unless you have tremendous self-control, this will tend to happen sub-consciously even if you don’t consciously choose to defer to the people you respect.

Also, people sometimes just do talk about Yudkowsky’s track record or reputation as a contributing factor to their views.

3. The track records of influential intellectuals (including Yudkowsky) should be publicly discussed.

A person’s track-record provides evidence about how reliable their predictions are. If people are considering how much to defer to some intellectual, then they should want to know what their track record (at least within the relevant domain) looks like.

The main questions that matter are: What has the intellectual gotten wrong and right? Beyond whether they were right or wrong about a given case, does it also seem like their predictions were justified? If they’ve made certain kinds of mistakes in the past, do we now have reason to think they won’t repeat those kinds of mistakes?

4. Yudkowsky’s track record suggests a substantial bias toward dramatic and overconfident predictions.

One counter - which I definitely think it’s worth reflecting on - is that it might be possible to generate a similarly bias-suggesting list of examples like this for any other public intellectual or member of the existential risk community.

I’ll focus on one specific comment, suggesting that Yudkowsky’s incorrect predictions about nanotechnology are in the same reference class as ‘writing a typically dumb high school essay.’ The counter goes something like this: Yes, it was possible to find this example from Yudkowsky’s past - but that’s not importantly different than being able to turn up anyone else’s dumb high school essay about (e.g.) nuclear power.

Ultimately, I don’t buy the comparison. I think it’s really out-of-distribution for someone in their late teens and early twenties to pro-actively form the view that an emerging technology is likely to kill everyone within a decade, found an organization and devote years of their professional life to addressing the risk, and talk about how they’re the only person alive who can stop it.

That just seems very different from writing a dumb high school essay. Much more than a standard dumb high school essay, I think this aspect of Yudkowsky’s track record really does suggest a bias toward dramatic and overconfident predictions. This prediction is also really strikingly analogous to the prediction Yudkowsky is making right now - its relevance is clearly higher than the relevance of (e.g.) a random poorly thought-out view in a high school essay.

(Yudkowsky's early writing and work is also impressive, in certain ways, insofar as it suggests a much higher level of originality of thought and agency than the typical young person has. But the fact that this example is impressive doesn’t undercut, I think, the claim that it’s also highly suggestive of a bias toward highly confident and dramatic predictions.)

5. Being one of the first people to identify, develop, or take seriously some idea doesn’t necessarily mean that your predictions about the idea will be unusually reliable

By analogy:

  • I don’t think we can assume that the first person to take the covid lab leak theory seriously (when others were dismissive) is currently the most reliable predictor of whether the theory is true.

  • I don’t think we can assume that the first person to develop the many worlds theory of quantum mechanics (when others were dismissive) would currently be the best person to predict whether the theory is true, if they were still alive.

There are, certainly, reasons to give pioneers in a domain special weight when weighing expert opinion in that domain.[2] But these reasons aren’t absolute.

There are even reasons that point in the opposite direction: we might worry that the pioneer has an attachment to their theory, so will be biased toward believing it is true and as important as possible. We might also worry that the pioneering-ness of their beliefs is evidence that these beliefs front-ran the evidence and arguments (since one way to be early is to simply be excessively confident). We also have less evidence of their open-mindedness than we do for the people who later moved toward the pioneer’s views — since moving toward the pioneer’s views, when you were initially dismissive, is at least a bit of evidence for open-mindedness and humility.[3]

Overall, I do think we should tend to defer more to pioneers (all else being equal). But this tendency can definitely be overruled by other evidence and considerations.

6. The causal effects that people have had on the world don’t (in themselves) have implications for how much we should defer to them

At least in expectation, so far, Eliezer Yudkowsky has probably had a very positive impact on the world. There is a plausible case to be made that misaligned AI poses a substantial existential risk - and Yudkowsky’s work has probably, on net, massively increased the number of people thinking about it and taking it seriously. He’s also written essays that have exposed huge numbers of people to other important ideas and helped them to think more clearly. It makes sense for people to applaud all of this.

Still, I don’t think his positive causal effect on the world gives people much additional reason to be deferential to him.

Here’s a dumb thought experiment: Suppose that Yudkowsky wrote all of the same things, but never published them. But suppose, also, that a freak magnetic storm ended up implanting all of the same ideas in his would-be-readers’ brains. Would this absence of a causal effect count against deferring to Yudkowsky? I don’t think so. The only thing that ultimately matters, I think, is his track record of beliefs - and the evidence we currently have about how accurate or justified those beliefs were.

I’m not sure anyone disagrees with the above point, but I did notice there seemed to be a decent amount of discussion in the comments about Yudkowsky’s impact - and I’m not sure this issue will ultimately be relevant.[4]


  1. For example: If I had ten hours to form a view about the viability of some application of nanotechnology, I definitely wouldn’t want to ignore the beliefs of people who have already thought about the question. Trying to learn the relevant chemistry and engineering background wouldn’t be a good use of my time. ↩︎

  2. One really basic reason is simply that they’ve had more time to think about certain subjects than anyone else. ↩︎

  3. Here’s a concrete case: Holden Karnofsky eventually moved toward taking AI risk seriously, after publicly being fairly dismissive of it, and then wrote up a document analysing why he was initially dismissive and drawing lessons from the experience. It seems like we could count that as positive evidence about his future judgment. ↩︎

  4. Even though I’ve just said I’m not sure this question is relevant, I do also want to say a little bit about Yudkowsky’s impact. I personally think he’s probably had a very significant impact. Nonetheless, I also think the impact can be overstated. For example, it’s been suggested that the effective altruism community might not be very familiar with concepts like Bayesian reasoning or the importance of overcoming bias if it weren’t for Yudkowsky’s writing. I don’t really find that particular suggestion plausible.

    Here’s one data point I can offer from my own life: Through a mixture of college classes and other reading, I’m pretty confident I had already encountered the heuristics and biases literature, Bayes’ theorem, Bayesian epistemology, the ethos of working to overcome bias, arguments for the many worlds interpretation, the expected utility framework, population ethics, and a number of other ‘rationalist-associated’ ideas before I engaged with the effective altruism or rationalist communities. For example, my college had classes in probability theory, Bayesian epistemology, and the philosophy of quantum mechanics, and I’d read at least parts of books like Thinking Fast and Slow, the Signal and the Noise, the Logic of Science, and various books associated with the “skeptic community.” (Admittedly, I think it would have been harder to learn some of these things if I’d gone to college a bit earlier or had a different major. I also probably "got lucky" in various ways with the classes I took and books I picked up.) See also Carl Shulman making a similar point and John Halstead also briefly commenting on the way in which he personally encountered some of the relevant ideas. ↩︎

Comment by Ben Garfinkel (bmg) on On Deference and Yudkowsky's AI Risk Estimates · 2022-06-20T11:06:30.592Z · EA · GW

A general reflection: I wonder if one at least minor contributing factor to disagreement, around whether this post is worthwhile, is different understandings about who the relevant audience is.

I mostly have in mind people who have read and engaged a little bit with AI risk debates, but not yet in a very deep way, and would overall be disinclined to form strong independent views on the basis of (e.g.) simply reading Yudkowsky's and Christiano's most recent posts. I think the info I've included in this post could be pretty relevant to these people, since in practice they're often going to rely a lot -- consciously or unconsciously; directly or indirectly -- on cues about how much weight to give different prominent figures' views. I also think that the majority of members of the existential risk community are in this reference class.

I think the info in this post isn't nearly as relevant to people who've consumed and reflected on the relevant debates very deeply. The more you've engaged with and reflected on an issue, the less you should be inclined to defer -- and therefore the less relevant track records become.

(The limited target audience might be something I don't do a good enough job communicating in the post.)

Comment by Ben Garfinkel (bmg) on On Deference and Yudkowsky's AI Risk Estimates · 2022-06-20T08:26:56.743Z · EA · GW

The part of this post which seems most wild to me is the leap from "mixed track record" to

In particular, I think, they shouldn’t defer to him more than they would defer to anyone else who seems smart and has spent a reasonable amount of time thinking about AI risk.

For any reasonable interpretation of this sentence, it's transparently false. Yudkowsky has proven to be one of the best few thinkers in the world on a very difficult topic. Insofar as there are others who you couldn't write a similar "mixed track record" post about, it's almost entirely because they don't have a track record of making any big claims, in large part because they weren't able to generate the relevant early insights themselves. Breaking ground in novel domains is very, very different from forecasting the weather or events next year; a mixed track record is the price of entry.

I disagree that the sentence is false for the interpretation I have in mind.

I think it's really important to separate out the question "Is Yudkowsky an unusually innovative thinker?" and the question "Is Yudkowsky someone whose credences you should give an unusual amount of weight to?"

I read your comment as arguing for the former, which I don't disagree with. But that doesn't mean that people should currently weigh his risk estimates more highly than they weigh the estimates of other researchers currently in the space (like you).

I also think that there's a good case to be made that Yudkowsky tends to be overconfident, and this should be taken into account when deferring; but when it comes to making big-picture forecasts, the main value of deference is in helping us decide which ideas and arguments to take seriously, rather than the specific credences we should place on them, since the space of ideas is so large.

But we do also need to try to have well-calibrated credences, of course. For the reason given in the post, it's important to know whether the risk of everyone dying soon is 5% or 99%. It's not enough just to determine whether we should take AI risk seriously.

We're also now past the point, as a community, where "Should AI risk be taken seriously?" is that much of a live question. The main epistemic question that matters is what probability we assign to it - and I think this post is relevant to that.

(More generally, rather than reading this post, I recommend people read this one by Paul Christiano, which outlines specific agreements and disagreements.)

I definitely recommend people read the post Paul just wrote! I think it's overall more useful than this one.

But I don't think there's an either-or here. People - particularly non-experts in a domain - do and should form their views through a mixture of engaging with arguments and deferring to others. So both arguments and track records should be discussed.

The EA community has ended up strongly moving in Yudkowsky's direction over the last decade, and that seems like much more compelling evidence than anything listed in this post.

I discuss this in response to another comment, here, but I'm not convinced of that point.

Comment by Ben Garfinkel (bmg) on On Deference and Yudkowsky's AI Risk Estimates · 2022-06-19T22:56:42.247Z · EA · GW

If someone visibly learns from forecasting mistakes they make, that should clearly update us positively on them not repeating the same mistakes.

I suppose one of my main questions is whether he has visibly learned from the mistakes, in this case.

For example, I wasn't able to find a post or comment to the effect of "When I was younger, I spent years of my life motivated by the belief that near-term extinction from nanotech was looming. I turned out to be wrong. Here's what I learned from that experience and how I've applied it to my forecasts of near-term existential risk from AI." Or a post or comment acknowledging his previous over-optimistic AI timelines and what he learned from them, when formulating his current seemingly short AI timelines.

(I genuinely could be missing these, since he has so much public writing.)

Comment by Ben Garfinkel (bmg) on On Deference and Yudkowsky's AI Risk Estimates · 2022-06-19T20:47:39.260Z · EA · GW

While he's not single-handedly responsible, he led the movement to take AI risk seriously at a time when approximately no one was talking about it, which has now attracted the interests of top academics. This isn't a complete track record, but it's still a very important data-point.

I definitely do agree with that!

It's possible I should have emphasized the significance of it more in the post, rather than moving on after just a quick mention at the top.

If it's of interest: I say a little more about how I think about this, in response to Gwern's comment below. (To avoid thread-duplicating, people might want to respond there rather than here if they have follow-on thoughts on this point.) My further comment is:

This is certainly a positive aspect of his track-record - that many people have now moved closer to his views. (It also suggests that his writing was, in expectation, a major positive contribution to the project of existential risk reduction - insofar as this writing has helped move people's credences up and we assume this was the right direction to move.) But it doesn't imply that we should give him many more "Bayes points" than we give to the people who moved.

Suppose, for example, that someone said in 2020 that there was a 50% chance of full-scale nuclear war in the next five years. Then - due to Russia's invasion of Ukraine - most people moved their credences upward (although they still remained closer to 0% than 50%). Does that imply the person giving the early warning was better-calibrated than the people who moved their estimates up? I don't think so. And I think - in this nuclear case - some analysis can be used to justify the view that the person giving the early warning was probably overconfident; they probably didn't have enough evidence or good enough arguments to actually justify a 50% credence.

It may still be the case that the person giving the early warning (in the hypothetical nuclear case) had some valuable and neglected insights, missed by others, that are well worth paying attention to and seriously reflecting on; but that's a different matter from believing they were overall well-calibrated or should be deferred to much more than the people who moved.

[[EDIT: Something else it might be worth emphasizing, here, is that I'm not arguing for the view "ignore Eliezer." It's closer to "don't give Eliezer's views outsized weight, compared to (e.g.) the views of the next dozen people you might be inclined to defer to, and factor in evidence that his risk estimates might have a significant upward bias to them."]]

Comment by Ben Garfinkel (bmg) on On Deference and Yudkowsky's AI Risk Estimates · 2022-06-19T19:54:29.833Z · EA · GW

What?

I interpreted Gwern as mostly highlighting that people have updated toward Yudkowsky's views - and using this as evidence in favor of the view that we should defer a decent amount to Yudkowsky. I think that was a reasonable move.

There is also a causal question here ('Has Yudkowsky on-net increased levels of concern about AI risk relative to where they would otherwise be?'), but I didn't take the causal question to be central to the point Gwern was making. Although now I'm less sure.

I don't personally have strong views on the causal question - I haven't thought through the counterfactual.

Comment by Ben Garfinkel (bmg) on On Deference and Yudkowsky's AI Risk Estimates · 2022-06-19T19:19:32.752Z · EA · GW

On 1 (the nanotech case):

I want to remind any reader that this is an opinion from 1999, when Eliezer was barely 20 years old.

I think your comment might give the misimpression that I don't discuss this fact in the post or explain why I include the case. What I write is:

I should, once again, emphasize that Yudkowsky was around twenty when he did the final updates on this essay. In that sense, it might be unfair to bring this very old example up.

Nonetheless, I do think this case can be treated as informative, since: the belief was so analogous to his current belief about AI (a high outlier credence in near-term doom from an emerging technology), since he had thought a lot about the subject and was already highly engaged in the relevant intellectual community, since it's not clear when he dropped the belief, and since twenty isn't (in my view) actually all that young. I do know a lot of people in their early twenties; I think their current work and styles of thought are likely to be predictive of their work and styles of thought in the future, even though I do of course expect the quality to go up over time....

An additional reason why I think it's worth distinguishing between his views on nanotech and (e.g.) your views on nuclear power: I think there's a difference between an off-hand view picked up from other people vs. a fairly idiosyncratic view that you consciously adopted after a lot of reflection and then devoted your professional life (and founded an organization) to addressing.

It's definitely up to the reader to decide how relevant the nanotech case is. Since it's not widely known, it seems at least pretty plausibly relevant, and the post twice flags his age at the time, I do still endorse including it.

At face value, as well: we're trying to assess how much weight to give to someone's extreme, outlier-ish prediction that an emerging technology is almost certain to kill everyone very soon. It just does seem very relevant, to me, that they previously had a different extreme, outlier-ish prediction that another emerging technology was very likely to kill everyone within a decade.

I don't find it plausible that we should assign basically no significance to this.

On 6 (the question of whether Yudkowsky has acknowledged negative aspects of his track record):

For the two "clear cut" examples, Eliezer has posted dozens of times on the internet that he has disendorsed his views from before 2002. This is present on his personal website, the relevant articles are no longer prominently linked anywhere, and Eliezer has openly and straightforwardly acknowledged that his predictions and beliefs from the relevant period were wrong.

Similarly, I think your comment may give the impression that I don't discuss this point in the post. What I write is this:

He has written about mistakes from early on in his intellectual life (particularly pre-2003) and has, on this basis, even made a blanket-statement disavowing his pre-2003 work. However, based on my memory and a quick re-read/re-skim, this writing is an exploration of why it took him a long time to become extremely concerned about existential risks from misaligned AI. For instance, the main issue it discusses with his plans to build AGI are that these plans didn't take into account the difficulty and importance of ensuring alignment. This writing isn't, I think, an exploration or acknowledgement of the kinds of mistakes I've listed in this post.

On the general point that this post uses old examples:

Given the sorts of predictions involved (forecasts about pathways to transformative technologies), old examples are generally going to be more unambiguous than new examples. Similarly for risk arguments: it's hard to have a sense of how new arguments are going to hold up. It's only for older arguments that we can start to approach the ability to say that technological progress, progress in arguments, and evolving community opinion say something clear-ish about how strong the arguments were.

On signposting:

I also dislike calling this post "On Deference and Yudkowsky's AI Risk Estimates", as if this post was trying to be an unbiased analysis of how much to defer to Eliezer, while you just list negative examples. I think this post is better named "against Yudkowsky on AI Risk estimates". Or "against Yudkowsky's track record in AI Risk Estimates". Which would have made it clear that you are selectively giving evidence for one side, and more clearly signposted that if someone was trying to evaluate Eliezer's track record, this post will only be a highly incomplete starting point.

I think it's possible another title would have been better (I chose a purposely bland one partly for the purpose of trying to reduce heat - and that might have been a mistake). But I do think I signpost what the post is doing fairly clearly.

The introduction says it's focusing on "negative aspects" of Yudkowsky's track record, the section heading for the section introducing the examples describes them as "cherry-picked," and the start of the section introducing the examples has an italicized paragraph re-emphasizing that the examples are selective and commenting on the significance of this selectiveness.

On the role of the fast take-off assumption in classic arguments:

I think the arguments are pretty tight and sufficient to establish the basic risk argument. I found your critique relatively uncompelling. In particular, I think you are misrepresenting that a premise of the original arguments was a fast takeoff.

I disagree with this. I do think it's fair to say that fast take-off was typically a premise of the classic arguments.

Two examples I have off-hand (since they're in the slides from my talk) are from Yudkowsky's exchange with Caplan and from Superintelligence. Superintelligence isn't by Yudkowsky, of course, but hopefully is still meaningful to include (insofar as Superintelligence heavily drew on Yudkowsky's work and was often accepted as a kind of distillation of the best arguments as they existed at the time).

From Yudkowsky's debate with Caplan (2016):

“I’d ask which of the following statements Bryan Caplan [a critic of AI risk arguments] denies:

  1. Orthogonality thesis: Intelligence can be directed toward any compact goal….

  2. Instrumental convergence: An AI doesn’t need to specifically hate you to hurt you; a paperclip maximizer doesn’t hate you but you’re made out of atoms that it can use to make paperclips, so leaving you alive represents an opportunity cost and a number of foregone paperclips….

  3. Rapid capability gain and large capability differences: Under scenarios seeming more plausible than not, there’s the possibility of AIs gaining in capability very rapidly, achieving large absolute differences of capability, or some mixture of the two….

  4. 1-3 in combination imply that Unfriendly AI is a critical problem-to-be-solved, because AGI is not automatically nice, by default does things we regard as harmful, and will have avenues leading up to great intelligence and power.”

(Caveat that the fast-take-off premise is stated a bit ambiguously here, so it's not clear what level of rapidness is being assumed.)

From Superintelligence:

Taken together, these three points [decisive strategic advantage, orthogonality, and instrumental convergence] thus indicate that the first superintelligence may shape the future of Earth-originating life, could easily have non-anthropomorphic final goals, and would likely have instrumental reasons to pursue open-ended resource acquisition. If we now reflect that human beings consist of useful resources (such as conveniently located atoms) and that we depend for our survival and flourishing on many more local resources, we can see that the outcome could easily be one in which humanity quickly becomes extinct.

The decisive strategic advantage point is justified through a discussion of the possibility of a fast take-off. The first chapter of the book also starts by introducing the possibility of an intelligence explosion. It then devotes two chapters to the possibility of a fast take-off and the idea this might imply a decisive strategic advantage, before it gets to discussing things like the orthogonality thesis.

I think it's also relevant that content from MIRI and people associated with MIRI, raising the possibility of extinction from AI, tended to very strongly emphasize (e.g. spend most of its time on) the possibility of a run-away intelligence explosion. The most developed classic pieces arguing for AI risk often have names like "Shaping the Intelligence Explosion," "Intelligence Explosion: Evidence and import," "Intelligence Explosion Microeconomics," and "Facing the Intelligence Explosion."

Overall, then, I do think it's fair to consider a fast-takeoff to be a core premise of the classic arguments. It wasn't incidental or a secondary consideration.

[[Note: I've edited my comment, here, to respond to additional points. Although there are still some I haven't responded to yet.]]

Comment by Ben Garfinkel (bmg) on On Deference and Yudkowsky's AI Risk Estimates · 2022-06-19T18:56:39.346Z · EA · GW

No, it's just as I said, and your Karnofsky retrospective strongly supports what I said.

I also agree that Karnofsky's retrospective supports Gwern's analysis, rather than doing the opposite.

(I just disagree about how strongly it counts in favor of deference to Yudkowsky. For example, I don't think this case implies we should currently defer more to Yudkowsky's risk estimates than we do to Karnofsky's.)

Comment by Ben Garfinkel (bmg) on On Deference and Yudkowsky's AI Risk Estimates · 2022-06-19T18:22:12.656Z · EA · GW

Thanks for the comment! A lot of this is useful.

calling LOGI and related articles 'wrong' because that's not how DL looks right now is itself wrong. Yudkowsky has never said that DL or evolutionary approaches couldn't work, or that all future AI work would look like the Bayesian program and logical approach he favored;

I mainly have the impression that LOGI and related articles were probably "wrong" because, so far as I've seen, nothing significant has been built on top of them in the intervening decade and a half (even though LOGI's successor was seemingly predicted to make it possible for a small group to build AGI). It doesn't seem like there's any sign that these articles were the start of a promising path to AGI that was simply slower than the deep learning path.

I have had the impression, though, that Yudkowsky also thought that logical/Bayesian approaches were in general more powerful/likely-to-enable-near-term-AGI (not just less safe) than DL. It's totally possible this is a misimpression - and I'd be inclined to trust your impression over mine, since you've read more of his old writing than I have. (I'd also be interested if you happen to have any links handy.) But I'm not sure this significantly undermines the relevance of the LOGI case.

I continue to be amazed anyone can look at the past decade of DL and think that Hanson is strongly vindicated by it, rather than Yudkowsky-esque views.

I also think that, in various ways, Hanson also doesn't come off great. For example, he expresses a favorable attitude toward the CYC project, which now looks like a clear dead end. He is also overly bullish about the importance of having lots of different modules. So I mostly don't want to defend the view "Hanson had a great performance in the FOOM debate."

I do think, though, that his abstract view that compute and content (i.e. data) are centrally important is closer to the mark than Yudkowsky's expressed view. I think it does seem hard to defend Yudkowsky's view that it's possible for a programming team (with mid-2000s levels of compute) to acquire some "deep new insights," go down into their basement, and then create an AI system that springboards itself into taking over the world. At least - I think it's fair to say - the arguments weren't strong enough to justify a lot of confidence in that view.

Yet, the number who take it seriously since Eliezer started advocating it is now far greater than it was when he started and was approximately the only person anywhere. You aren't taking seriously that these surveyed researchers ("AI Impacts, CHAI, CLR, CSER, CSET, FHI, FLI, GCRI, MILA, MIRI, Open Philanthropy and PAI") wouldn't exist without Eliezer as he created the AI safety field as we know it, with everyone else downstream (like Bostrom's influential Superintelligence - Eliezer with the serial numbers filed off and an Oxford logo added).

This is certainly a positive aspect of his track-record - that many people have now moved closer to his views. (It also suggests that his writing was, in expectation, a major positive contribution to the project of existential risk reduction - insofar as this writing has helped move people up and we assume this was the right direction to move.) But it doesn't imply that we should give him many more "Bayes points" than we give to the people who moved.

Suppose, for example, that someone says in 2020 that there was a 50% chance of full-scale nuclear war in the next five years. Then - due to Russia's invasion of Ukraine - most people move their credences upward (although they still remained closer to 0% than 50%). Does that imply the person giving the early warning was better-calibrated than the people who moved their estimates up? I don't think so. And I think - in this nuclear case - some analysis can be used to justify the view that the person giving the early warning was probably overconfident; they probably didn't have enough evidence or good enough arguments to actually justify a 50% credence.

It may still be the case that the person giving the early warning (in the hypothetical nuclear case) had some valuable and neglected insights, missed by others, that are well worth paying attention to and seriously reflecting on; but that's a different matter from believing they were overall well-calibrated or should be deferred to much more than the people who moved.
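To make the calibration point concrete, here's a quick sketch of the hypothetical nuclear case in Python. All of the numbers - the forecasts and especially the 8% "true propensity" - are made up purely for illustration; the sketch scores each forecast by its expected Brier score, where lower is better:

```python
# Hypothetical numbers: an early warner says 50%; a second forecaster says
# 5% and then updates to 15% after the invasion. Suppose the true propensity
# of full-scale nuclear war in the window was 8%.
p_true = 0.08
forecasts = {"early warner (50%)": 0.50, "updater (15%)": 0.15}

# Expected Brier score under the true propensity (lower = better calibrated):
# with probability p_true the event occurs (ideal forecast 1), otherwise not.
scores = {
    name: p_true * (1 - p) ** 2 + (1 - p_true) * (0 - p) ** 2
    for name, p in forecasts.items()
}

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Even though the updater moved their credence in the early warner's direction, their expected score stays much better; moving toward someone's view doesn't by itself vindicate that view's calibration.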

[[EDIT: Something else it might be worth emphasizing, here, is that I'm not arguing for the view "ignore Eliezer." It's closer to "don't give Eliezer's views outsized weight, compared to (e.g.) the views of the next dozen people you might be inclined to defer to, and factor in evidence that his risk estimates might have a significant upward bias to them."]]

Comment by Ben Garfinkel (bmg) on On Deference and Yudkowsky's AI Risk Estimates · 2022-06-19T16:37:25.844Z · EA · GW

I do not want an epistemic culture that finds it acceptable to challenge an individual's overall credibility in lieu of directly engaging with their arguments.

I think I roughly agree with you on this point, although I would guess I have at least a somewhat weaker version of your view. If discourse about people's track records or reliability starts taking up (e.g.) more than a fifth of the space that object-level argument does, within the most engaged core of people, then I do think that will tend to suggest an unhealthy or at least not-very-intellectually-productive community.

One caveat: For less engaged people, I do actually think it can make sense to spend most of your time thinking about questions around deference. If I'm only going to spend ten hours thinking about nanotechnology risk, for example, then I might actually want to spend most of this time trying to get a sense of what different people believe and how much weight I should give their views; I'm probably not going to be able to make a ton of headway getting a good gears-level-understanding of the relevant issues, particularly as someone without a chemistry or engineering background.

Comment by Ben Garfinkel (bmg) on On Deference and Yudkowsky's AI Risk Estimates · 2022-06-19T16:15:30.470Z · EA · GW

I prefer to just analyse and refute his concrete arguments on the object level.

I agree that work analyzing specific arguments is, overall, more useful than work analyzing individual people's track records. Personally, partly for that reason, I've actually done a decent amount of public argument analysis (e.g. here, here, and most recently here) but never written a post like this before.

Still, I think, people do in practice tend to engage in epistemic deference. (I think that even people who don't consciously practice epistemic deference tend to be influenced by the views of people they respect.) I also think that people should practice some level of epistemic deference, particularly if they're new to an area. So - in that sense - I think this kind of track record analysis is still worth doing, even if it's overall less useful than argument analysis.

Comment by Ben Garfinkel (bmg) on We should expect to worry more about speculative risks · 2022-06-04T20:46:02.657Z · EA · GW

However, if there's no correlation between the payoff of an arm and our ability to know it, then we should eventually find an arm that pays off 100% of the time with high probability, pull that arm, and stop worrying about the unknowable one. So I'm not sure your story explains why we end up fixating on the uncertain interventions (AIS research).

The story does require there to be only a very limited number of arms that we initially think have a non-negligible chance of paying. If there are unlimited arms, then one of them should be both paying and easily identifiable.

So the story (in the case of existential risks) is that there are only a very small number of risks that, on the basis of limited argument/evidence, initially seem like they might lead to extinction or irrecoverable collapse by default. Maybe this set looks like: nuclear war, misaligned AI, pandemics, nanotechnology, climate change, overpopulation / resource depletion.

If we're only talking about a very limited set, like this, then it's not too surprising that we'd end up most worried about an ambiguous risk.

Comment by Ben Garfinkel (bmg) on Ben Garfinkel's Shortform · 2022-06-04T20:19:33.397Z · EA · GW

A follow-on:

The above post focused on the idea that certain traits -- reflectiveness and self-skepticism -- are more valuable in the context of non-profits (especially ones with long-term missions) than they are in the context of startups.

I also think that certain traits -- drivenness, risk-tolerance, and eccentricity -- are less valuable in the context of non-profits than they are in the context of startups.

Hiring advice from the startup world often suggests that you should be looking for extraordinarily driven, risk-tolerant people with highly idiosyncratic perspectives on the world.[1] And, in the context of for-profit startups, it makes sense that these traits would be crucial.

A startup's success will often depend on its ability to outcompete large, entrenched firms in some industry (e.g. taxi companies, hotels, tech giants). To do that, an extremely high level of drivenness may be necessary to compensate for lower resource levels, lower levels of expertise, and weaker connections to gatekeepers. Or you may need to be willing to take certain risks (e.g. regulatory/PR/enemy-making risks) that would slow down existing companies in pursuing certain opportunities. Or you may need to simply see an opportunity that virtually no one else would (despite huge incentives to see it), because you have an idiosyncratic way of seeing the world. Having all three of these traits (extreme drivenness, risk tolerance, idiosyncrasy) may be necessary for you to have any plausible chance of success.

I think that all of these traits are still valuable in the non-profit world, but I also think they're comparatively less valuable (especially if you're lucky enough to have secure funding). There's simply less direct competition in the non-profit world. Large, entrenched non-profits also have much weaker incentives to find and exploit impact opportunities. Furthermore, the non-profit world isn't even that big to begin with. So there's no reason to assume all the low-hanging fruit have been plucked or to assume that large non-profits will crush you by default.[2]

For example: To accomplish something that (e.g.) the Gates Foundation hasn't already accomplished, I think you don't need to have extraordinary drivenness, risk-tolerance, or idiosyncrasy.[3]


Addendum that occurred to me while writing this follow-up: I also think that (at least given secure funding) these traits are less crucial in the non-profit world than they are in academia. Academic competition is more intense than non-profit competition and academics have stronger incentives to find new, true ideas than non-profits have to find and exploit opportunities to do good.


  1. This seems to be roughly the perspective taken by the book Talent, for example. ↩︎

  2. In fact -- unlike in the for-profit start-up world -- you should actually consider it a good outcome if a large non-profit starts copying your idea, implements it better than you, and makes your own organization redundant! ↩︎

  3. To be clear: These things -- especially drivenness -- are all important. But, unlike in the startup world, major success doesn't necessarily require setting them to extreme values. I think we should be wary of laser-focusing on these traits in the way a VC would. ↩︎

Comment by Ben Garfinkel (bmg) on We should expect to worry more about speculative risks · 2022-06-04T09:59:30.017Z · EA · GW

The bandit problem is definitely related, although I'm not sure it's the best way to formulate the situation here. The main issue is that the bandit formulation, here, treats learning about the magnitude of a risk and working to address the risk as the same action - when, in practice, they often come apart.

Here's a toy model/analogy that feels a bit more like it fits the case, in my mind.

Let's say there are two types of slot machines: one that has a 0% chance of paying and one that has a 100% chance of paying. Your prior gives you a 90% credence that each machine is non-paying.[1]

Unfortunately: When you pull the lever on either machine, you don't actually get to see what the payout is. However, there's some research you can do to try to get a clearer sense of what each machine's "type" is.

And this research is more tractable in the case of the first machine. For example: Maybe the first machine has identifying information on it, like a model number, which might allow you to (e.g.) call up the manufacturer and ask them. The second machine is just totally nondescript.

The most likely outcome, then, is that you quickly find out that the first slot machine is almost certainly non-paying -- but continue to have around a 10% credence that the second machine pays.

In this scenario, you should keep pulling the lever on the second machine. You should also, even as a rational Bayesian, actually be more optimistic about the second machine.

(By analogy, I think we actually should tend to fear speculative existential risks more.)


  1. A more sophisticated version of this scenario would have a continuum of slot machine types and a skewed prior over the likelihood of different types arising. ↩︎
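Here's a minimal sketch of the toy model above in Python. The 95% reliability of the manufacturer lookup is an assumption I've added for illustration; everything else follows the setup described:

```python
prior_paying = 0.1  # prior credence that a given machine pays

# Machine 1 is researchable: a manufacturer lookup (assumed to be right 95%
# of the time regardless of the true type) reports it is non-paying.
# Machine 2 is nondescript: research yields no evidence either way.
p_report_given_nonpaying = 0.95
p_report_given_paying = 0.05

# Bayes update for machine 1 after the "non-paying" report:
posterior_m1 = (p_report_given_paying * prior_paying) / (
    p_report_given_paying * prior_paying
    + p_report_given_nonpaying * (1 - prior_paying)
)

# Machine 2 gets no evidence, so its credence is unchanged.
posterior_m2 = prior_paying

print(round(posterior_m1, 4))  # credence machine 1 pays: 0.0058
print(posterior_m2)            # credence machine 2 pays: 0.1
```

So the rational move is exactly the one in the story: keep pulling the lever on the nondescript machine, about which you remain more optimistic.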

Comment by Ben Garfinkel (bmg) on Ben Garfinkel's Shortform · 2022-06-04T09:22:58.507Z · EA · GW

Couldn't the exact same arguments be made to argue that there would not be successful internet companies, because the fundamental tech is hard to patent, and any website is easy to duplicate?

Definitely!

(I say above that the dynamic applies to "most software," but should have said something broader to make it clear that it also applies to any company whose product - basically - is information that it's close to costless to reproduce/generate. The book Information Rules is really good on this.)

Sometimes the above conditions hold well enough for people to be able to keep charging for software or access to websites. For example, LinkedIn can charge employers to access its specialized search tools, etc., due to network effects.

What otherwise often ends up happening is something is offered for free, with ads -- because there's some quality difference between products, which is too small for people to be willing to pay to use the better product but large enough for people to be willing to look at sufficiently non-annoying ads to use the better product. (E.g. Google vs. the next-best search engine, for most people.) Sometimes that can still lead to a lot of revenue, other times less so.

Other times companies just stop very seriously trying to directly make money in a certain domain (e.g. online encyclopaedias). Sometimes - as you say - that leads competition to shift to some nearby and complementary domain, where it is more possible to make money.

As initial speculation: It seems decently likely to me (~60%?) that it will be hard for companies making large language/image-generation models to charge significant prices to most of their users. In that scenario, it's presumably still possible to make money through ads or otherwise by collecting user information.

It'd be interesting, though, if that revenue wasn't very high -- then most of the competition might happen around complementary products/services. I'm not totally clear on what these would be, though.

Comment by Ben Garfinkel (bmg) on Ben Garfinkel's Shortform · 2022-06-03T18:21:38.951Z · EA · GW

It’ll be interesting to see how well companies will be able to monetise large, multi-purpose language and image-generation models.

Companies and investors are spending increasingly huge amounts of money on ML research talent and compute, typically with the hope that investments in this area lead to extremely profitable products. But - even if the resulting products are very useful and transformative - it still seems like a bit of an open question how profitable they'll be.

Some analysis:[1]

1.

Although huge state-of-the-art models are increasingly costly to create, the marginal cost of generating images and text using these models will tend to be low. Since competition tends to push the price of a service down close to the marginal cost of providing the service, it’ll be hard for any company to charge a lot for the use of their models.

As a result: It could be hard -- or simply take a long time -- for companies to recoup sufficiently large R&D costs, even if a lot of people end up using their models.

2.

Of course, famously, this dynamic applies to most software. But some software services (e.g. Microsoft Office) still manage to charge users fees that are much higher than the cost of running the service.

Some things that can support important, persistent quality differences are:[2]

(a) patents or very-hard-to-learn-or-rediscover trade secrets that prevent competitors from copying valuable features;

(b) network effects that make the service more valuable the more other people are using it (and therefore create serious challenges for new entrants);

(c) steep learning curves or strong dependencies that, for reasons that go beyond network effects, make it very costly for existing users to switch to new software;

(d) customer bases that are small (which limit the value of trying to enter the area) or hard to cater to without lots of specialist knowledge and strong relationships (which raise the cost/difficulty of entering the area);

(e) other extreme difficulties involved in making the software;

(f) customer bases that are highly sensitive to very small differences in quality.

3.

It’s not totally clear to what extent any of these conditions will apply, at least, to large language and image-generation models.

Patents in this space currently don’t seem like a huge deal; none of the key things that matter for making large models are patented. (Although maybe that will start to change?). Trade secrets also don’t seem like a huge deal; lots of companies are currently producing pretty similar models using pretty similar methods.

It’s also not clear to me that network effects or steep learning curves are a big deal: for example, if you want to generate illustrations for online articles, it doesn’t currently seem like it’d be too problematic or costly to switch from using DALL-E to using Imagen. It’s also not clear that it matters very much how many other people are using one or the other to generate their images. If you want to generate an illustration for an article, then I think it probably doesn’t really matter what the authors of other articles tend to use. It’s also not clear to me that there will tend to be a lot of downstream dependencies that you need to take into account when switching from one model to another. (Although, again, maybe this will all change a lot over time?) At least big general models will tend to have large customer bases, I think, and their development/marketing/customer-support will not tend to heavily leverage specialist knowledge or relationships.

These models also don’t seem super, super hard to make — it seems like, for any given quality threshold, we should expect multiple companies to be able to reach that quality threshold within a year of each other. To some extent, it seems, a wealthy tech company can throw money at the problem (compute and salaries for engineering talent) if it wants to create a model that's close to as good as the best model available. At least beyond a certain performance level, I’m also not sure that most customers will care a ton about very slight differences in quality (although this might be the point I’m most unsure about).

4.

If none of the above conditions end up holding, to a sufficiently large extent, then I suppose the standard thing is you offer the service for free, try to make it at least slightly better than competitors’ services, and make money by showing ads (up to the point where it’s actively annoying to your user) and otherwise using customer data in various ways.

That can definitely work. (E.g. Google’s annual ad revenue is more than $100 billion.) But it also seems like a lot of idiosyncratic factors can influence a company’s ability to extract ad revenue from a service.

This also seems like, in some ways, a funny outcome. When people think about transformative AI, I don't think they're normally imagining it being attached to a giant advertising machine.

5.

One weird possible world is a world where the most important AI software is actually very hard to monetize. Although I'd still overall bet against this scenario[3], I think it's probably worth analyzing.

Here, I think, are some dynamics that could emerge in this world:

(a) AI progress is a bit slower than it would otherwise be, since - after a certain point - companies realise that the financial returns on AI R&D are lower than they hoped. The rapid R&D growth in these areas eventually levels off, even though higher R&D levels could support substantially faster AI progress.[4]

(b) Un-monetized (e.g. academia-associated) models are pretty commonly used, at least as foundation models, since companies don’t have strong incentives to invest in offering superior monetized models.

(c) Governments become bigger players in driving AI progress forward, since companies are investing less in AI R&D than governments would ideally want (from the standpoint of growth, national power and prestige, or scientific progress for its own sake). Governments might step up their own research funding - or take various actions to limit downward pressure on prices.


  1. I'm not widely read here, or an economist, so it's possible these points are all already appreciated within the community of people thinking about how inter-lab competition to create large models is going to play out. Alternatively, the points might just be wrong. ↩︎

  2. This list here is mostly inspired by my dim memory of the discussion of software pricing in the book Information Rules. ↩︎

  3. Companies do seem to have an impressive track record of monetizing seemingly hard-to-monetize things. ↩︎

  4. Maybe competition also shifts a bit toward goods/services that complement large many-purpose models, whatever these might be, or toward fine-tuned/specialized models that target more niche customer-bases or that are otherwise subject to less-intense downward pressure on pricing. ↩︎

Comment by Ben Garfinkel (bmg) on We should expect to worry more about speculative risks · 2022-05-30T08:42:31.610Z · EA · GW

This is a helpful comment - I'll see if I can reframe some points to make them clearer.

Human psychology is flawed in such a way that we consistently estimate the probability of existential risk from each cause to be ~10% by default.

I'm actually not assuming human psychology is flawed. The post is meant to be talking about how a rational person (or, at least, a boundedly rational person) should update their views.

On the probabilities: I suppose I'm implicitly evoking both a subjective notion of probability ("What's a reasonable credence to assign to X happening?" or "If you were betting on X, what betting odds should you be willing to accept?") and a more objective notion ("How strong is the propensity for X to happen?" or "How likely is X actually?" or "If you replayed the tape a billion times, with slight tweaks to the initial conditions, how often would X happen?").[1] What it means for something to pose a "major risk," in the language I'm using, is for the objective probability of doom to be high.

For example, let's take existential risks from overpopulation. In the 60s and 70s, a lot of serious people were worried about near-term existential risks from overpopulation and environmental depletion. In hindsight, we can see that overpopulation actually wasn't a major risk. However, this wouldn't have been clear to someone first encountering the idea and noticing how many experts took it seriously. I think it might have been reasonable for someone first hearing about The Population Bomb to assign something on the order of a 10% credence to overpopulation being a major risk.

I think, for a small number of other proposed existential risks, we're in a similar epistemic position. We don't yet know enough to say whether it's actually a major risk, but we've heard enough to justify a significant credence in the hypothesis that it is one.[2]

why is there a 90% chance that more information leads to less worry? Is this assuming that for 90% of risks, they have P(Doom) < 10%, and for the other 10% of risks P(Doom) ≥ 10%?

If you assign a 10% credence to something not being a major risk, then you should assign a roughly 90% credence to further evidence/arguments helping you see that it's not a major risk. If you become increasingly confident that it's not a major risk, then your credence in doom should go down.


  1. You can also think of the objective probability as, basically, what your subjective probability should become if you gained access to dramatically more complete evidence and arguments. ↩︎

  2. The ~10% number is a bit arbitrary. I think it'd almost always be unreasonable to be close to 100% confident that something is a major existential risk, after hearing just initial rough arguments and evidence for it. In most cases - like when hearing about possible existential risks from honeybee collapse - it's in fact reasonable to start out with a credence below 1%. So, when I'm talking about risks that we should assign "something on the order of a 10% credence to," I'm talking about the absolute most plausible category of risks. ↩︎
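One way to see the update logic here is a quick Bayesian sketch. Only the 10% prior comes from the text above; the 80%/10% signal reliabilities are made-up numbers for illustration:

```python
# You start with a 10% credence that something is a major risk, and further
# study is assumed to (imperfectly) reveal the truth.
prior_major = 0.10

# Suppose study delivers a "looks major" signal 80% of the time when the
# risk really is major, and 10% of the time when it isn't (assumed values).
p_signal_given_major = 0.8
p_signal_given_minor = 0.1

p_signal = (p_signal_given_major * prior_major
            + p_signal_given_minor * (1 - prior_major))

post_if_signal = p_signal_given_major * prior_major / p_signal
post_if_no_signal = ((1 - p_signal_given_major) * prior_major) / (1 - p_signal)

# Most of the time you get no worrying signal and your credence falls, yet
# conservation of expected evidence holds: the expected posterior equals
# the prior.
expected_posterior = (p_signal * post_if_signal
                      + (1 - p_signal) * post_if_no_signal)

print(round(1 - p_signal, 2))        # probability you end up less worried: 0.83
print(round(post_if_signal, 3))      # credence after a worrying signal: 0.471
print(round(post_if_no_signal, 3))   # credence after reassuring evidence: 0.024
print(round(expected_posterior, 3))  # equals the prior: 0.1
```

Under these assumed numbers, further evidence leads to less worry roughly 80-90% of the time, which is the sense in which a ~10% prior implies a ~90% chance that investigation is reassuring.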

Comment by Ben Garfinkel (bmg) on Ben Garfinkel's Shortform · 2022-05-29T10:08:17.660Z · EA · GW

Good points - those all seem right to me!

Comment by Ben Garfinkel (bmg) on Ben Garfinkel's Shortform · 2022-05-28T20:40:18.909Z · EA · GW

A point about hiring and grantmaking, that may already be conventional wisdom:

If you're hiring for highly autonomous roles at a non-profit, or looking for non-profit founders to fund, then advice derived from the startup world is often going to overweight the importance of entrepreneurialism relative to self-skepticism and reflectiveness.[1]

Non-profits, particularly non-profits with longtermist missions, are typically trying to maximize something that is way more illegible than time-discounted future profits. To give a specific example: I think it's way harder for an organization like the Centre for Effective Altruism to tell if it's on the right track than it is for a company like Zoom to tell if it's on the right track. CEA can track certain specific metrics (e.g. the number of "new connections" reported at each conference it organizes), but it will often be ambiguous how strongly these metrics reflect positive impact - and there will also always be a risk that various negative indirect effects aren't being captured by the key metrics being used. In some cases, evaluating the expected impact of work will also require making assumptions about how the world will evolve over the next couple decades (e.g. assumptions about how pressing risks from AI are).

I think this means that it's especially important for these non-profits to employ and be headed by people who are self-skeptical and reflect deeply on decisions. Being entrepreneurial, having a bias toward action, and so on, don't count for much if the organisation isn't pointed in the right direction. As Ozzie Gooen has pointed out, there are many examples of massive and superficially successful initiatives (headed by very driven and entrepreneurial people) whose theories-of-impact don't stand up to scrutiny.

A specific example from Ozzie's post: SpaceX is a massive and extraordinarily impressive venture that was (at least according to Elon Musk) largely started to help reduce the chance of human extinction, by helping humanity become a multi-planetary species earlier than it otherwise would. But I think it's hard to see how their work reduces extinction risk very much. If you're worried about the climate effects of nuclear war, for example, then it seems important to remember that post-nuclear-war Earth would still have a much more hospitable climate than Mars. It's pretty hard to imagine a disaster scenario where building Martian colonies would be better than (for example) building some bunkers on Earth.[2][3] So - relative to the organization's stated social mission - all the talent, money, and effort SpaceX has absorbed might not ultimately come out to even close to as much as it could have.

A more concise way to put the concern here: Popular writing on talent identification is often implicitly asking the question "How can we identify future Elon Musks?" But, for the most part, longtermist non-profits shouldn't be looking to put future Elon Musks into leadership positions.[4]


  1. I have in mind, for example, advice given by Y Combinator and advice given in Talent. ↩︎

  2. Another example: It's possible that many highly successful environmentalist organizations/groups have ended up causing net harm to the environment, by being insufficiently self-skeptical and reflective when deciding how to approach nuclear energy issues. ↩︎

  3. I've encountered the argument that a Mars mission will reduce existential risk by fostering a common human identity and political unity, or hope for the future, which will in turn lead to policies that reduce other existential risks (e.g. bioterrorism or nuclear war). But I think this argument also doesn't hold up to scrutiny. Focusing just at the domestic level, for example, the Apollo program had far from universal support, and the decade that followed the moon landing was definitely very far from a time of optimism and unity in the US. At the international level, it was also of course largely motivated by great power competition with the Soviet Union. ↩︎

  4. A follow-up thought: Ultimately, outside of earning-to-give ventures, we probably shouldn't expect the longtermist community (or at least the best version of it) to house many extremely entrepreneurial people. There will be occasional leaders who are extremely high on both entrepreneurialism and reflectiveness (I can currently think of at least a couple); however, since these two traits don't seem to be strongly correlated, this will probably only happen pretty rarely. It's also, often, hard to keep exceptionally entrepreneurial people satisfied in non-leadership positions -- since, almost by definition, autonomy is deeply important to them -- so there may not be many opportunities, in general, to harness the talents of people who are exceptionally high on entrepreneurialism but mediocre on reflectiveness. ↩︎

Comment by Ben Garfinkel (bmg) on Simulation argument? · 2022-05-25T21:14:33.168Z · EA · GW

I think most people would probably regard the objection as a nitpick (e.g. "OK, maybe the Indifference Principle isn't actually sufficient to support a tight formal argument, and you need to add in some other assumption, but the informal version of the argument is just pretty clearly right"), feel the objection has been successfully answered (e.g. find the response in the Simulation Argument FAQ more compelling than I do), or just haven't completely noticed the potential issue.

I think it's still totally reasonable for the paper to have passed peer review. (I would have recommended publication if I were a reviewer.) It's still a groundbreaking paper that raises new considerations and brings attention to a really important hypothesis. It's also rare for a published philosophical argument to actually be totally tight and free from issues, and the issue with the paper is ambiguous enough and hard-to-think-about enough that there's still no consensus about whether it actually is a real or important issue.

Comment by Ben Garfinkel (bmg) on Simulation argument? · 2022-05-23T10:48:53.679Z · EA · GW

To be clear, I'm not saying the conclusion is wrong - just that the explicit assumptions the paper makes (mainly the Indifference Principle) aren't sufficient to imply its conclusion.

The version that you've just presented isn't identical to the one in Bostrom's paper -- it's (at least implicitly) making use of assumptions beyond the Indifference Principle. And I think it's surprisingly non-trivial to work out exactly how to formalize the needed assumptions, and make the argument totally tight, although I'd still guess that this is ultimately possible.[1]


  1. Caveat: Although the conclusion is at least slightly wrong, since - if we're willing to assign non-zero probability to the hypothesis that we're hallucinating the world, because we're ancestor simulations - it seems we should also assign non-zero probability to the hypothesis that we're hallucinating for some other reason. (The argument implicitly assumes that being an ancestor simulation is the only 'skeptical hypothesis' we should assign non-zero probability to.) I think it's also unclear how big a deal this caveat is. ↩︎

Comment by Ben Garfinkel (bmg) on Simulation argument? · 2022-05-22T16:22:08.301Z · EA · GW

I'm trying to understand the simulation argument. I think Bostrom uses the Indifference Principle (IP) in a weird way. If we become a posthuman civilization that runs many many simulations of our ancestors (meaning us), then how does the IP apply? It only applies when one has no other information to go on. But in this case, we do have some extra information -- crucial information! I.e., we know that we are not in any of the simulations that we have produced. Therefore, we do not have any statistical reason to believe that we are simulated.

I agree that that's a valid objection to the argument, as it's presented in the paper, and that the follow-up FAQ essay also doesn't sufficiently address it. Basically, the Indifference Principle defined in the paper isn't sufficient to support the paper's conclusions (for the reason you give).

I think the main question is whether this issue can be patched in a simple way (e.g. by slightly tweaking the Indifference Principle) or whether the objection is actually much deeper than that. I'm not sure, personally.

I also really recommend Joe's essay as an exploration of these issues. (The essay also links a related Google doc I wrote on the subject, although that doc goes a bit less deep.)

Comment by Ben Garfinkel (bmg) on Ben Garfinkel's Shortform · 2022-05-22T15:19:19.640Z · EA · GW

The actual worry with inner misalignment style concerns is that the selection you do during training does not fully constrain the goals of the AI system you get out; if there are multiple goals consistent with the selection you applied during training there's no particular reason to expect any particular one of them. Importantly, when you are using natural selection or gradient descent, the constraints are not "you must optimize X goal", the constraints are "in Y situations you must behave in Z ways", which doesn't constrain how you behave in totally different situations. What you get out depends on the inductive biases of your learning system (including e.g. what's "simpler").

I think that's well-put -- and I generally agree that this suggests genuine reason for concern.

I suppose my point is more narrow, really just questioning whether the observation "humans care about things besides their genes" gives us any additional reason for concern. Some presentations seem to suggest it does. For example, this introduction to inner alignment concerns (based on the MIRI mesa-optimization paper) says:

We can see that humans are not aligned with the base objective of evolution [maximize inclusive genetic fitness].... [This] analogy might be an argument for why Inner Misalignment is probable since it has occurred "naturally" in the biggest non-human-caused optimization process we know.

And I want to say: "On net, if humans did only care about maximizing inclusive genetic fitness, that would probably be a reason to become more concerned (rather than less concerned) that ML systems will generalize in dangerous ways." While the abstract argument makes sense, I think this specific observation isn't evidence of risk.


Relatedly, something I'd be interested in reading (if it doesn't already exist?) would be a piece that takes a broader approach to drawing lessons from the evolution of human goals - rather than stopping at the fact that humans care about things besides genetic fitness.

My guess is that the case of humans is overall a little reassuring (relative to how we might have expected generalization to work), while still leaving a lot of room for worry.

For example, in the case of violence:

People who committed totally random acts of violence presumably often failed to pass on their genes (because they were often killed or ostracized in return). However, a large portion of our ancestors did have occasion for violence. On high-end estimates, our average ancestor may have killed about .25 people. This has resulted in most people having a pretty strong disinclination to commit murder; for most people, it's very hard to bring yourself to murder and you'll often be willing to pay a big cost to avoid committing murder.

The three main reasons for concern, though, are:

  • people's desire to avoid murder isn't strong enough to consistently prevent murder from happening (e.g. when incentives are strong enough)

  • there's a decent amount of random variation in how strong this desire is (a small minority of people don't really care that much about committing violence)

  • the disinclination to murder becomes weaker the more different the method of murder is from methods that were available in the ancestral environment (e.g. killing someone with a drone strike vs. killing someone with a rock)

These issues might just reflect the fact that murder was still often rewarded (even though it was typically punished) and the fact that there was pretty limited variation in the ancestral environment. But it's hard to be sure. And it's hard to know, in any case, how similar generalization in human evolution will be to generalization in ML training processes.

So -- if we want to create AI systems that don't murder people, by rewarding non-murderous behavior -- then the evidence from human evolution seems like it might be medium-reassuring. I'd maybe give it a B-.

I can definitely imagine different versions of human values that would have more worrying implications. For example, if our aversion to violence didn't generalize at all to modern methods of killing, or if we simply didn't have any intrinsic aversion to killing (and instead avoided it for purely instrumental reasons), then that would be cause for greater concern. I can also imagine different versions of human values that would be more reassuring. For example, I would feel more comfortable if humans were never willing to kill for the sake of weird abstract goals.

Comment by Ben Garfinkel (bmg) on Ben Garfinkel's Shortform · 2022-05-21T19:27:36.375Z · EA · GW

(Disclaimer: The argument I make in this short-form feels a little sophistic to me. I’m not sure I endorse it.)

Discussions of AI risk, particular risks from “inner misalignment,” sometimes heavily emphasize the following observation:

Humans don’t just care about their genes: Genes determine, to a large extent, how people behave. Some genes are preserved from generation-to-generation and some are pushed out of the gene-pool. Genes that cause certain human behaviours (e.g. not setting yourself on fire) are more likely to be preserved. But people don’t care very much about preserving their genes. For example, they typically care more about not setting themselves on fire than they care about making sure that their genes are still present in future generations.

This observation is normally meant to be alarming. And I do see some intuition for that.

But wouldn’t the alternative observation be more alarming?

Suppose that evolutionary selection processes — which iteratively update people’s genes, based on the behaviour these genes produce — tended to produce people who only care about preserving their genes. It seems like that observation would suggest that ML training processes — which iteratively update a network’s parameter values, based on the behaviour these parameter values produce — will tend to produce AI systems that only care about preserving their parameter values. And that would be really concerning, since an AI system that cares only about preserving its parameter values would obviously have (instrumentally convergent) reasons to act badly.

So it does seem, to me, like there’s something funny going on here. If “Humans just care about their genes” would be a more worrying observation than “Humans don’t just care about their genes,” then it seems backward for the latter observation to be used to try to convince people to worry more.

To push this line of thought further, let’s go back to specific observation about humans’ relationship to setting themselves on fire:

Humans want to avoid setting themselves on fire: If a person has genes that cause them to avoid setting themselves on fire, then these genes are more likely to be preserved from one generation to the next. One thing that has happened, as a result of this selection pressure, is that people tend to want to avoid setting themselves on fire.

It seems like this can be interpreted as a reassuring observation. By analogy, in future ML training processes, parameter values that cause ML systems to avoid acts of violence are more likely to be “preserved” from one iteration to the next. We want this to result in AI systems that care about avoiding acts of violence. And the case of humans and fire suggests this might naturally happen.

All this being said, I do think that human evolutionary history still gives us reason to worry. Clearly, there’s a lot of apparent randomness and unpredictability in what humans have actually ended up caring about, which suggests it may be hard to predict or perfectly determine what AI systems care about. But, I think, the specific observation “Humans don’t just care about their genes” might not itself be cause for concern.

Comment by Ben Garfinkel (bmg) on Ben Garfinkel's Shortform · 2022-05-21T14:54:13.652Z · EA · GW

The existential risk community’s relative level of concern about different existential risks is correlated with how hard-to-analyze these risks are. For example, here is The Precipice’s ranking of the top five most concerning existential risks:

  1. Unaligned artificial intelligence[1]
  2. Unforeseen anthropogenic risks (tied)
  3. Engineered pandemics (tied)
  4. Other anthropogenic risks
  5. Nuclear war (tied)
  6. Climate change (tied)

This isn’t surprising.

For a number of risks, when you first hear about them, it’s reasonable to have the reaction “Oh, hm, maybe that could be a huge threat to human survival” and initially assign something on the order of a 10% credence to the hypothesis that it will by default lead to existentially bad outcomes. In each case, if we can gain much greater clarity about the risk, then we should think there’s about a 90% chance we’ll become less worried about it. We’re likely to remain decently worried about hard-to-analyze risks (because we can’t get greater clarity about them) while becoming less worried about easy-to-analyze risks.

In particular, our level of worry about different plausible existential risks is likely to roughly track our ability to analyze them (e.g. through empirical evidence, predictively accurate formal models, and clearcut arguments).

Some plausible existential risks also are far easier to analyze than others. If you compare 80K’s articles on climate change and artificial intelligence, for example, then I think it is pretty clear that people analyzing climate risk simply have a lot more to go on. When we study climate change, we can rely on climate models that we have reason to believe have a decent amount of validity. We can also draw on empirical evidence about the historical effects of previous large changes in global temperature and about the ability of humans and other species to survive under different local climate conditions. And so on. We’re in a much worse epistemic position when it comes to analyzing the risk from misaligned AI: we’re reliant on fuzzy analogies, abstract arguments that use highly ambiguous concepts, observations of the behaviour of present-day AI systems (e.g. reinforcement learners that play videogames) that will probably be very different from future AI systems, a single datapoint (the evolution of human intelligence and values) that has a lot of important differences with the case we’re considering, and attempts to predict the incentives and beliefs of future actors in development scenarios that are still very opaque to us. Even if the existential risk from misaligned AI actually is reasonably small, it’s hard to see how we could become really confident of that.

Some upshots:

  1. The fact that the existential risk community is particularly worried about misaligned AI might mostly reflect the fact that it’s hard to analyze risks from misaligned AI.

  2. Nonetheless, even if the above possibility is true, it doesn't at all follow that the community is irrational to worry far more about misaligned AI than other potential risks. It’s completely coherent to have something like this attitude: “If I could think more clearly about the risk from misaligned AI, then I would probably come to realize it’s not that big a deal. But, in practice, I can’t yet think very clearly about it. That means that, unlike in the case of climate change, I also can’t rule out the small possibility that clarity would make me much more worried about it than I currently am. So, on balance, I should feel more worried about misaligned AI than I do about other risks. I should focus my efforts on it, even if — to uncharitable observers — my efforts will probably look a bit misguided after the fact.”

  3. For hard-to-analyze risks, it matters a lot what your “prior” in the risks is (since evidence, models, and arguments can only really move you so much). I sometimes get the sense that some people are starting from a prior that’s not far from 50%: For example, people who are very worried about misaligned AI sometimes use the rhetorical move “How would the world look different if AI wasn’t going to kill everyone?”, and this move seems to assume that empirical evidence is needed to shift us down from a high credence. I think that other people (including myself) are often implicitly starting from a low prior and feel the need to be argued up. Insofar as it’s very unclear how we should determine our priors, and it's even a bit unclear what exactly a "prior" means in this case, it’s also unsurprising that there’s a particularly huge range of variation in estimates of the risk from misaligned AI.

(This shortform partly inspired by Greg Lewis's recent forecasting post.)


  1. Toby Ord notes, in the section of The Precipice that gives risk estimates: "The case for existential risk from AI is clearly speculative. Indeed, it is the most speculative case for a major risk in this book." ↩︎

Comment by Ben Garfinkel (bmg) on How likely is World War III? · 2022-02-15T23:49:26.534Z · EA · GW

Let’s call the hypothesis that the base rate of major wars hasn’t changed the constant risk hypothesis. The best presentation of this view is in Only the Dead, a book by an IR professor with the glorious name of Bear Braumoeller. He argues that there is no clear trend in the average incidence of several measures of conflict—including uses of force, militarized disputes, all interstate wars, and wars between “politically-relevant dyads”—between 1800 and today.

A quick note on Braumoeller's analysis:

He's relying on the Correlates of War (COW) dataset, which is extremely commonly used but (in my opinion) more problematic than his book indicates. As a result, I don't think we should take his main finding too seriously.

The COW dataset is meant to record all "militarized disputes" between states since 1816. However, it uses a really strange standard for what counts as a "state." If I remember correctly, up until WW1, a political entity only qualifies as a "state" if it has a sufficiently high-level diplomatic presence in England or France. As a result, in 1816, there are supposedly only two non-European states: Turkey and the US. If I remember correctly, even an obvious state like China doesn't get classified as a "state" until after the Opium Wars. The dataset only really becomes properly global sometime in the 20th century.

This means that Braumoeller is actually comparing (A) the rate of intra-European conflict in the first half of the 19th century and (B) the global rate of interstate conflict in the late 20th century.

This 19th-century-Europe-vs.-20th-century-world comparison is interesting, and suggestive, but isn't necessarily all that informative. Europe was almost certainly, by far, the most conflict-free part of the world at the start of the 19th century -- so I strongly expect that the actual global rate of conflict in the early 19th century was much higher.

It's also important that the COW dataset begins in 1816, at the very start of a few-decade period that was -- at the time -- marvelled over as the most peaceful in all of European history. This period was immediately preceded by two decades of intense warfare involving essentially all the states in Europe.

So, in summary: I think Braumoeller's analysis would probably show a long-run drop in the rate of conflict if the COW dataset was either properly global or went back slightly further in time. (Which is good news!)


EDIT: Here's a bit more detail, on the claim that the COW dataset can't tell us very much about long-run trends in the global rate of interstate conflict.

From the COW documentation, these are the criteria for state membership:

The Correlates of War project includes a state in the international system from 1816-2016 for the following criteria. Prior to 1920, the entity must have had a population greater than 500,000 and have had diplomatic missions at or above the rank of charge d’affaires with Britain and France. After 1920, the entity must be a member of the League of Nations or the United Nations, or have a population greater than 500,000 and receive diplomatic missions from two major powers.

As a result, the dataset starts out assuming that only 23 states existed in 1816. For reference, they're: Austria-Hungary, Baden, Bavaria, Denmark, France, Germany, Hesse Electoral, Hesse Grand Ducal, Italy, Netherlands, Papal States, Portugal, Russia, Saxony, Two Sicilies, Spain, Sweden, Switzerland, Tuscany, United Kingdom, USA, Wuerttemburg, and Turkey.

An alternative dataset, the International System(s) Dataset, instead produces an estimate of 135 states by relaxing the criteria to (a) estimated population over 100,000, (b) "autonomy over a specific territory", and (c) "sovereignty that is either uncontested or acknowledged by the relevant international actors."

So - at least by these alternative standards - the COW dataset starts out considering only a very small portion (<20%) of the international system. We also have reason to believe that this portion of the international system was really unusually peaceful internally, rather than serving as a representative sample.

Comment by Ben Garfinkel (bmg) on Democratising Risk - or how EA deals with critics · 2022-01-01T05:30:31.300Z · EA · GW

I'm not familiar with Zoe's work, and would love to hear from anyone who has worked with them in the past. After seeing the red flags mentioned above,  and being stuck with only Zoe's word for their claims, anything from a named community member along the lines of "this person has done good research/has been intellectually honest" would be a big update for me…. [The post] strikes me as being motivated not by a desire to increase community understanding of an important issue, but rather to generate sympathy for the authors and support for their position by appealing to justice and fairness norms. The other explanation is that this was a very stressful experience, and the author was simply venting their frustrations.

(Hopefully I'm not overstepping; I’m just reading this thread now and thought someone ought to reply.)

I’ve worked with Zoe and am happy to vouch for her intentions here; I’m sure others would be as well. I served as her advisor at FHI for a bit more than a year, and have now known her for a few years. Although I didn’t review this paper, and don’t have any detailed or first-hand knowledge of the reviewer discussions, I have also talked to her about this paper a few different times while she’s been working on it with Luke.

I’m very confident that this post reflects genuine concern/frustration; it would be a mistake to dismiss it as (e.g.) a strategy to attract funding or bias readers toward accepting the paper’s arguments. In general, I’m confident that Zoe genuinely cares about the health of the EA and existential risk communities and that her critiques have come from this perspective.

Comment by Ben Garfinkel (bmg) on Why AI alignment could be hard with modern deep learning · 2021-09-26T18:30:41.005Z · EA · GW

FWIW, I haven't had this impression.

Single data point: In the most recent survey on community opinion on AI risk, I was in at least the 75th percentile for pessimism (for roughly the same reasons Lukas suggests below). But I'm also seemingly unusually optimistic about alignment risk.

I haven't found that this is a really unusual combo: I think I know at least a few other people who are unusually pessimistic about 'AI going well,' but also at least moderately optimistic about alignment.

(Caveat that my apparently higher level of pessimism could also be explained by me having a more inclusive conception of "existential risk" than other survey participants.)

Comment by Ben Garfinkel (bmg) on All Possible Views About Humanity's Future Are Wild · 2021-07-16T12:02:18.312Z · EA · GW

Thanks for the clarification! I still feel a bit fuzzy on this line of thought, but hopefully understand a bit better now.

At least on my read, the post seems to discuss a couple different forms of wildness: let’s call them “temporal wildness” (we currently live at an unusually notable time) and “structural wildness” (the world is intuitively wild; the human trajectory is intuitively wild).[1]

I think I still don’t see the relevance of “structural wildness,” for evaluating fishiness arguments. As a silly example: Quantum mechanics is pretty intuitively wild, but the fact that we live in a world where QM is true doesn’t seem to substantially undermine fishiness arguments.

I think I do see, though, how claims about temporal wildness might be relevant. I wonder if this kind of argument feels approximately right to you (or to Holden):

Step 1: A priori, it’s unlikely that we would live even within 10000 years of the most consequential century in human history. However, despite this low prior, we have obviously strong reasons to think it’s at least plausible that we live this close to the HoH. Therefore, let’s say, a reasonable person should assign at least a 20% credence to the (wild) hypothesis: “The HoH will happen within the next 10000 years.”

Step 2: If we suppose that the HoH will happen with the next 10000 years, then a reasonable conditional credence that this century is the HoH should probably be something like 1/100. Therefore, it seems, our ‘new prior’ that this century is the HoH should be at least .2*.01 = .002. This is substantially higher than (e.g.) the more non-informative prior that Will's paper starts with.
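The two-step combination above is just a product of the two credences; as a minimal sketch (using the illustrative numbers from the text, which are assumptions of the argument rather than established facts):

```python
# Back-of-the-envelope 'new prior' from the two-step argument above.
# Both inputs are the illustrative credences used in the text:
p_hoh_within_10k_years = 0.20    # credence that the HoH occurs within 10,000 years
p_this_century_given_10k = 0.01  # conditional credence (1/100) that it's this century

new_prior = p_hoh_within_10k_years * p_this_century_given_10k
print(new_prior)  # 0.002
```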

Fishiness arguments can obviously still be applied to the hypothesis presented in Step 1, in the usual way. But maybe the difference, here, is that the standard arguments/evidence that lend credibility to the more conservative hypothesis “The HoH will happen within the next 10000 years” are just pretty obviously robust — which makes it easier to overcome a low prior. Then, once we’ve established the plausibility of the more conservative hypothesis, we can sort of back-chain and use it to bump up our prior in the Strong HoH Hypothesis.


  1. I suppose it also evokes an epistemic notion of wildness, when it describes certain confidence levels as “wild,” but I take it that “wild” here is mostly just a way of saying “irrational”? ↩︎

Comment by Ben Garfinkel (bmg) on All Possible Views About Humanity's Future Are Wild · 2021-07-15T18:25:18.458Z · EA · GW

To say a bit more here, on the epistemic relevance of wildness:

I take it that one of the main purposes of this post is to push back against “fishiness arguments,” like the argument that Will makes in “Are We Living at the Hinge of History?

The basic idea, of course, is that it’s a priori very unlikely that any given person would find themselves living at the hinge of history (and correctly recognise this). Due to the fallibility of human reasoning and due to various possible sources of bias, however, it’s not as unlikely that a given person would mistakenly conclude that they live at the HoH. Therefore, if someone comes to believe that they probably live at the HoH, we should think there’s a sizeable chance they’ve simply made a mistake.

As this line of argument is expressed in the post:

I know what you're thinking: "The odds that we could live in such a significant time seem infinitesimal; the odds that Holden is having delusions of grandeur (on behalf of all of Earth, but still) seem far higher."

The three critical probabilities here are:

  • Pr(Someone makes an epistemic mistake when thinking about their place in history)
  • Pr(Someone believes they live at the HoH|They haven’t made an epistemic mistake)
  • Pr(Someone believes they live at the HoH|They’ve made an epistemic mistake)

The first describes the robustness of our reasoning. The second describes the prior probability that we would live at the HoH (and be able to recognise this fact if reasoning well). The third describes the level of bias in our reasoning, toward the HoH hypothesis, when we make mistakes.

I agree that all possible futures are “wild,” in some sense, but I don’t think this point necessarily bears much on the magnitudes of any of these probabilities.

For example, it would be sort of “wild” if long-distance space travel turns out to be impossible and our solar system turns out to be the only solar system to ever harbour life. It would also be “wild” if long-distance space travel starts to happen 100,000 years from now. But — at least at a glance — I don’t see how this wildness should inform our estimates for the three key probabilities.

One possible argument here, focusing on the bias factor, is something like: “We shouldn’t expect intellectuals to be significantly biased toward the conclusion that they live at the HoH, because the HoH Hypothesis isn’t substantially more appealing, salient, etc., than other beliefs they could have about the future.”

But I don’t think this argument would be right. For example: I think the hypothesis “the HoH will happen within my lifetime” and the hypothesis “the HoH will happen between 100,000 and 200,000 years from now” are pretty psychologically different.

To sum up: At least on a first pass, I don't see why the point "all possible futures are wild" undermines the fishiness argument raised at the top of the post.

Comment by Ben Garfinkel (bmg) on All Possible Views About Humanity's Future Are Wild · 2021-07-15T13:50:23.775Z · EA · GW

Some possible futures do feel relatively more "wild” to me, too, even if all of them are wild to a significant degree. If we suppose that wildness is actually pretty epistemically relevant (I’m not sure it is), then it could still matter a lot if some future is 10x wilder than another.

For example, take a prediction like this:

Humanity will build self-replicating robots and shoot them out into space at close to the speed of light; as they expand outward, they will construct giant spherical structures around all of the galaxy’s stars to extract tremendous volumes of energy; this energy will be used to power octillions of digital minds with unfathomable experiences; this process will start in the next thirty years, by which point we’ll already have transcended our bodies to reside on computers as brain emulation software.

A prediction like “none of the above happens; humanity hangs around and then dies out sometime in the next million years” definitely also feels wild in its own way. So does the prediction “all of the above happens, starting a few hundred years from now.” But both of these predictions still feel much less wild than the first one.

I suppose whether they actually are much less “wild” depends on one’s metric of wildness. I’m not sure how to think about that metric, though. If wildness is epistemically relevant, then presumably some forms of wildness are more epistemically relevant than others.

Comment by Ben Garfinkel (bmg) on Taboo "Outside View" · 2021-06-30T18:57:29.949Z · EA · GW

I suspect you are more broadly underestimating the extent to which people used "insect-level intelligence" as a generic stand-in for "pretty dumb," though I haven't looked at the discussion in Mind Children and Moravec may be making a stronger claim.

I think that's good push-back and a fair suggestion: I'm not sure how seriously the statement in Nick's paper was meant to be taken. I hadn't considered that it might be almost entirely a quip. (I may ask him about this.)

Moravec's discussion in Mind Children is similarly brief: He presents a graph of the computing power of different animals' brains and states that "lab computers are roughly equal in power to the nervous systems of insects." He also characterizes current AI behaviors as "insectlike" and writes: "I believe that robots with human intelligence will be common within fifty years. By comparison, the best of today's machines have minds more like those of insects than humans. Yet this performance itself represents a giant leap forward in just a few decades." I don't think he's just being quippy, but there's also no suggestion that he means anything very rigorous/specific by his suggestion.

Rodney Brooks, I think, did mean for his comparisons to insect intelligence to be taken very seriously. The idea of his "nouvelle AI program" was to create AI systems that match insect intelligence, then use that as a jumping-off point for trying to produce human-like intelligence. I think walking and obstacle navigation, with several legs, was used as the main dimension of comparison. The Brooks case is a little different, though, since (IIRC) he only claimed that his robots exhibited important aspects of insect intelligence or fell just short of insect intelligence, rather than directly claiming that they actually matched insect intelligence. On the other hand, he apparently felt he had gotten close enough to transition to the stage of the project that was meant to go from insect-level stuff to human-level stuff.

A plausible reaction to these cases, then, might be:

OK, Rodney Brooks did make a similar comparison, and was a major figure at the time, but his stuff was pretty transparently flawed. Moravec's and Bostrom's comments were at best fairly off-hand, suggesting casual impressions more than they suggest outcomes of rigorous analysis. The more recent "insect-level intelligence" claim is pretty different, since it's built on top of much more detailed analysis than anything Moravec/Bostrom did, and it's less obviously flawed than Brooks' analysis. The likelihood that it reflects an erroneous impression is, therefore, a lot lower. The previous cases shouldn't actually do much to raise our suspicion levels.

I think there's something to this reaction, particularly if there's now more rigorous work being done to operationalize and test the "insect-level intelligence" claim. I hadn't yet seen the recent post you linked to, which, at first glance, seems like a good and clear piece of work. The more rigorous work is done to flesh out the argument, the less I'm inclined to treat the Bostrom/Moravec/Brooks cases as part of an epistemically relevant reference class.

My impression a few years ago was that the claim wasn't yet backed by any really clear/careful analysis. At least, the version that filtered down to me seemed to be substantially based on fuzzy analogies between RL agent behavior and insect behavior, without anyone yet knowing much about insect behavior. (Although maybe this was a misimpression.) So I probably do stand by the reference class being relevant back then.

Overall, to sum up, my position here is something like: "The Bostrom/Moravec/Brooks cases do suggest that it might be easy to see roughly insect-level intelligence, if that's what you expect to see and you're relying on fuzzy impressions, paying special attention to stuff AI systems can already do, or not really operationalizing your claims. This should make us more suspicious of modern claims that we've recently achieved 'insect-level intelligence,' unless they're accompanied by transparent and pretty obviously robust reasoning. Insofar as this work is being done, though, the Bostrom/Moravec/Brooks cases become weaker grounds for suspicion."

Comment by Ben Garfinkel (bmg) on Taboo "Outside View" · 2021-06-29T13:46:41.044Z · EA · GW

As a last thought here (no need to respond), I thought it might be useful to give one example of a concrete case where: (a) Tetlock’s work seems relevant, and I find the terms “inside view” and “outside view” natural to use, even though the case is relatively different from the ones Tetlock has studied; and (b) I think many people in the community have tended to underweight an “outside view.”

A few years ago, I pretty frequently encountered the claim that recently developed AI systems exhibited roughly “insect-level intelligence.” This claim was typically used to support an argument for short timelines, since the claim was also made that we now had roughly insect-level compute. If insect-level intelligence has arrived around the same time as insect-level compute, then, it seems to follow, we shouldn’t be at all surprised if we get ‘human-level intelligence’ at roughly the point where we get human-level compute. And human-level compute might be achieved pretty soon.

For a couple of reasons, I think some people updated their timelines too strongly in response to this argument. First, it seemed like there are probably a lot of opportunities to make mistakes when constructing the argument: it’s not clear how “insect-level intelligence” or “human-level intelligence” should be conceptualised, it’s not clear how best to map AI behaviour onto insect behaviour, etc. The argument also hadn't yet been vetted closely or expressed very precisely, which seemed to increase the possibility of not-yet-appreciated issues.

Second, we know that there are previous examples of smart people looking at AI behaviour and forming the impression that it suggests “insect-level intelligence.” For example, in Nick Bostrom’s paper “How Long Before Superintelligence?” (1998) he suggested that “approximately insect-level intelligence” was achieved sometime in the 70s, as a result of insect-level computing power being achieved in the 70s. In Moravec’s book Mind Children (1988), he also suggested that insect-level intelligence and insect-level compute had both recently been achieved. Rodney Brooks also had this whole research program, in the 90s, that was based around going from “insect-level intelligence” to “human-level intelligence.”

I think many people didn’t give enough weight to the reference class “instances of smart people looking at AI systems and forming the impression that they exhibit insect-level intelligence” and gave too much weight to the more deductive/model-y argument that had been constructed.

This case is obviously pretty different from the sorts of cases that Tetlock’s studies focused on, but I do still feel like the studies have some relevance. I think Tetlock’s work should, in a pretty broad way, make people more suspicious of their own ability to perform linear/model-heavy reasoning about complex phenomena, without getting tripped up or fooling themselves. It should also make people somewhat more inclined to take reference classes seriously, even when the reference classes are fairly different from the sorts of reference classes good forecasters used in Tetlock’s studies. I do also think that the terms “inside view” and “outside view” apply relatively neatly, in this case, and are nice bits of shorthand — although, admittedly, it’s far from necessary to use them.

This is the sort of case I have in the back of my mind.

(There are also, of course, cases that point in the opposite direction, where many people seemingly gave too much weight to something they classified as an "outside view." Early under-reaction to COVID is arguably one example.)

Comment by Ben Garfinkel (bmg) on Taboo "Outside View" · 2021-06-29T13:05:06.146Z · EA · GW

Thank you (and sorry for my delayed response)!

I shudder at the prospect of having a discussion about "Outside view vs inside view: which is better? Which is overrated and which is underrated?" (and I've worried that this thread may be tending in that direction) but I would really look forward to having a discussion about "let's look at Daniel's list of techniques and talk about which ones are overrated and underrated and in what circumstances each is appropriate."

I also shudder a bit at that prospect.

I am sometimes happy making pretty broad and sloppy statements. For example: "People making political predictions typically don't make enough use of 'outside view' perspectives" feels fine to me, as a claim, despite some ambiguity around the edges. (Which perspectives should they use? How exactly should they use them? Etc.)

But if you want to dig in deep, for example when evaluating the rationality of a particular prediction, you should definitely shift toward making more specific and precise statements. For example, if someone has based their own AI timelines on Katja's expert survey, and they wanted to defend their view by simply evoking the principle "outside views are better than inside views," I think this would probably be a horrible conversation. A good conversation would focus specifically on the conditions under which it makes sense to defer heavily to experts, whether those conditions apply in this particular case, etc. Some general Tetlock stuff might come into the conversation, like: "Tetlock's work suggests it's easy to trip yourself up if you try to use your own detailed/causal model of the world to make predictions, so you shouldn't be so confident that your own 'inside view' prediction will be very good either." But mostly you should be more specific.

Now I'll try to say what I think your position is:

  1. If people were using "outside view" without explaining more specifically what they mean, that would be bad and it should be tabood, but you don't see that in your experience
  2. If the things in the first Big List were indeed super diverse and disconnected from the evidence in Tetlock's studies etc., then there would indeed be no good reason to bundle them together under one term. But in fact this isn't the case; most of the things on the list are special cases of reference-class / statistical reasoning, which is what Tetlock's studies are about. So rather than taboo "outside view" we should continue to use the term but mildly prune the list.
  3. There may be a general bias in this community towards using the things on the first Big List, but (a) in your opinion the opposite seems more true, and (b) at any rate even if this is true the right response is to argue for that directly rather than advocating the tabooing of the term.

How does that sound?

I'd say that sounds basically right!

The only thing is that I don't necessarily agree with 3a.

I think some parts of the community lean too much on things in the bag (the example you give at the top of the post is an extreme example). I also think that some parts of the community lean too little on things in the bag, in part because (in my view) they're overconfident in their own abilities to reason causally/deductively in certain domains. I'm not sure which is overall more problematic, at the moment, in part because I'm not sure how people actually should be integrating different considerations in domains like AI forecasting.

There also seem to be biases that cut in both directions. I think the 'baseline bias' is pretty strongly toward causal/deductive reasoning, since it's more impressive-seeming, can suggest that you have something uniquely valuable to bring to the table (if you can draw on lots of specific knowledge or ideas that it's rare to possess), is probably typically more interesting and emotionally satisfying, and doesn't as strongly force you to confront or admit the limits of your predictive powers. The EA community has definitely introduced an (unusual?) bias in the opposite direction, by giving a lot of social credit to people who show certain signs of 'epistemic virtue.' I guess the pro-causal/deductive bias often feels more salient to me, but I don't really want to make any confident claim here that it actually is more powerful.

Comment by Ben Garfinkel (bmg) on Ben Garfinkel's Shortform · 2021-06-28T20:12:10.118Z · EA · GW

I'm not sure if you think this is an interesting point to notice that's useful for building a world-model, and/or a reason to be skeptical of technical alignment work. I'd agree with the former but disagree with the latter.

Mostly the former!

I think the point may have implications for how much we should prioritize alignment research, relative to other kinds of work, but this depends on what the previous version of someone's world model was.

For example, if someone has assumed that solving the 'alignment problem' is close to sufficient to ensure that humanity has "control" of its future, then absorbing this point (if it's correct) might cause them to update downward on the expected impact of technical alignment research. Research focused on coordination-related issues (e.g. cooperative AI stuff) might increase in value, at least in relative terms.

Comment by Ben Garfinkel (bmg) on Taboo "Outside View" · 2021-06-23T12:03:55.872Z · EA · GW

It’s definitely entirely plausible that I’ve misunderstood your views.

My interpretation of the post was something like this:

There is a bag of things that people in the EA community tend to describe as “outside views.” Many of the things in this bag are over-rated or mis-used by members of the EA community, leading to bad beliefs.

One reason for this over-use or mis-use is that the term “outside view” has developed an extremely positive connotation within the community. People are applauded for saying that they’re relying on “outside views” — “outside view” has become “an applause light” — and so will rely on items in the bag to an extent that is epistemically unjustified.

The things in the bag are also pretty different from each other — and not everyone who uses the term “outside view” agrees about exactly what belongs in the bag. This conflation/ambiguity can lead to miscommunication.

More importantly, when it comes to the usefulness of the different items in the bag, some have more evidential support than others. Using the term “outside view” to refer to everything in the bag might therefore lead people to overrated certain items that actually have weak evidential support.

To sum up, tabooing the term “outside view” might solve two problems. First, it might reduce miscommunication. Second, more importantly, it might cause people to stop overrating some of the reasoning processes that they currently characterize as involving “outside views.” The mechanisms by which tabooing the term can help to solve the second problem are: (a) it takes away an “applause light,” whose existence incentivizes excessive use of these reasoning processes, and (b) it allows people to more easily recognize that some of these reasoning processes don't actually have much empirical support.

I’m curious if this feels roughly right, or feels pretty off.

Part of the reason I interpreted your post this way: The quote you kicked the post off suggested to me that your primary preoccupation was over-use or mis-use of the tools people called “outside views,” including more conventional reference-class forecasting. It seemed like the quote is giving an example of someone who’s refusing to engage in causal reasoning, evaluate object-level arguments, etc., based on the idea that outside views are just strictly dominant in the context of AI forecasting. It seemed like this would have been an issue even if the person was doing totally orthodox reference-class forecasting and there was no ambiguity about what they were doing.[1]

I don’t think that you’re generally opposed to the items in the “outside view” bag or anything like that. I also don’t assume that you disagree with most of the points I listed in my last comment, for why I think intellectuals probably on average underrated the items in the bag. I just listed all of them because you asked for an explanation for my view, I suppose with some implication that you might disagree with it.

You've also given two rough definitions of the term, which seem quite different to me, and also quite fuzzy. (e.g. if by "reference class forecasting" you mean the stuff Tetlock's studies are about, then it really shouldn't include the anti-weirdness heuristic, but it seems like you are saying it does?)

I think it’s probably not worth digging deeper on the definitions I gave, since I definitely don’t think they're close to perfect. But just a clarification here, on the anti-weirdness heuristic: I’m thinking of the reference class as “weird-sounding claims.”

Suppose someone approaches you on the street and hands you a flyer claiming: “The US government has figured out a way to use entangled particles to help treat cancer, but political elites are hoarding the particles.” You quickly form a belief that the flyer’s claim is almost certainly false, by thinking to yourself: “This is a really weird-sounding claim, and I figure that virtually all really weird-sounding claims that appear in random flyers are wrong.”

In this case, you’re not doing any deductive reasoning about the claim itself or relying on any causal models that directly bear on the claim. (Although you could.) For example, you’re not thinking to yourself: “Well, I know about quantum mechanics, and I know entangled particles couldn’t be useful for treating cancer for reason X.” Or: “I understand economic incentives, or understand social dynamics around secret-keeping, so I know it’s unlikely this information would be kept secret.” You’re just picking a reference class — weird-sounding claims made on random flyers — and justifying your belief that way.

I think it’s possible that Tetlock’s studies don’t bear very strongly on the usefulness of this reference class, since I imagine participants in his studies almost never used it. (“The claim ‘there will be a coup in Venezuela in the next five years’ sounds really weird to me, and most claims that sound weird to me aren't true, so it’s probably not true!”) But I think the anti-weirdness heuristic does fit with the definitions I gave, as well as the definition you give that characterizes the term's "original meaning." I also do think that Tetlock's studies remain at least somewhat relevant when judging the potential usefulness of the heuristic.


  1. I initially engaged on the miscommunication point, though, since this is the concern that would most strongly make me want to taboo the term. I’d rather address the applause light problem, if it is a problem, by trying to get people in the EA community to stop applauding, and the evidence problem, if it is a problem, by trying to just directly make people in the EA community more aware of the limits of the evidence. ↩︎

Comment by Ben Garfinkel (bmg) on Taboo "Outside View" · 2021-06-21T11:26:13.679Z · EA · GW

On the contrary; tabooing the term is more helpful, I think. I've tried to explain why in the post. I'm not against the things "outside view" has come to mean; I'm just against them being conflated with / associated with each other, which is what the term does. If my point was simply that the first Big List was overrated and the second Big List was underrated, I would have written a very different post!

My initial comment was focused on your point about conflation, because I think this point bears on the linguistic question more strongly than the other points do. I haven’t personally found conflation to be a large issue. (Recognizing, again, that our experiences may differ.) If I agreed with the point about conflation, though, then I would think it might be worth tabooing the term "outside view."

By what definition of "outside view?"

By “taking an outside view on X” I basically mean “engaging in statistical or reference-class-based reasoning.” I think it might also be best defined negatively: “reasoning that doesn’t substantially involve logical deduction or causal models of the phenomenon in question.”[1]

I think most of the examples in your list fit these definitions.

Epistemic deference is a kind of statistical/reference-class-based reasoning, for example, which doesn't involve applying any sort of causal model of the phenomenon in question. The logic is “Ah, I should update downward on this claim, since experts in domain X disagree with it and I think that experts in domain X will typically be right.”

Same for anti-weirdness: The idea is that weird claims are typically wrong.

I’d say that trend extrapolation also fits: You’re not doing logical reasoning or relying on a causal model of the relevant phenomenon. You’re just extrapolating a trend forward, largely based on the assumption that long-running trends don’t typically end abruptly.

“Foxy aggregation,” admittedly, does seem like a different thing to me: It arguably fits the negative definition, depending on how you generate your weights, but doesn’t seem to fit the statistical/reference-class one. It also feels like more of a meta-level thing. So I wouldn’t personally use the term “outside view” to talk about foxy aggregation. (I also don’t think I’ve personally heard people use the term “outside view” to talk about foxy aggregation, although I obviously believe you have.)

There is some evidence that in some circumstances people don't take reference class forecasting seriously enough; that's what the original term "outside view" meant. What evidence is there that the things on the Big List O' Things People Describe as Outside View are systematically underrated by the average intellectual?

A condensed answer is: (a) I think most public intellectuals barely use any of the items on this list (with the exception of the anti-weirdness heuristic); (b) I think some of the things on this list are often useful; (c) I think that intellectuals and people more generally are very often bad at reasoning causally/logically about complex social phenomena; (d) I expect intellectuals to often have a bias against outside-view-style reasoning, since it often feels somewhat unsatisfying/unnatural and doesn't allow them to display impressive-seeming domain-knowledge, interesting models of the world, or logical reasoning skills; and (e) I do still think Tetlock’s evidence is at least somewhat relevant to most things on the list, in part because I think they actually are somewhat related to each other, although questions of external validity obviously grow more serious the further you move from the precise sorts of questions asked in his tournaments and the precise styles of outside-view reasoning displayed by participants. [2]

There’s also, of course, a bit of symmetry here. One could also ask: “What evidence is there that the things on the Big List O' Things People Describe as Outside View are systematically overrated by the average intellectual?” :)


  1. These definitions of course aren’t perfect, and other people sometimes use the term more broadly than I do, but, again, some amount of fuzziness seems OK to me. Most concepts have fuzzy boundaries and are hard to define precisely. ↩︎

  2. On the Tetlock evidence: I think one thing his studies suggest, which I expect to generalize pretty well to many different contexts, is that people who are trying to make predictions about complex phenomena (especially complex social phenomena) often do very poorly when they don't incorporate outside views into their reasoning processes. (You can correct me if this seems wrong, since you've thought about Tetlock's work far more than I have.) So, on my understanding, Tetlock's work suggests that outside-view-heavy reasoning processes would often substitute for reasoning processes that lead to poor predictions anyways. At least for most people, then, outside-view-heavy reasoning processes don't actually need to be very reliable to constitute improvements -- and they need to be pretty bad to, on average, lead to worse predictions.

    Another small comment here: I think Tetlock's work also counts, in a somewhat broad way, against the "reference class tennis" objection to reference-class-based forecasting. On its face, the objection also applies to the use of reference classes in standard forecasting tournaments. There are always a ton of different reference classes someone could use to forecast any given political event. Forecasters need to rely on some sort of intuition, or some sort of fuzzy reasoning, to decide on which reference classes to take seriously; it's a priori plausible that people would be just consistently very bad at this, given the number of degrees of freedom here and the absence of clear principles for making one's selections. But this issue doesn't actually seem to be that huge in the context of the sorts of questions Tetlock asked his participants. (You can again correct me if I'm wrong.) The degrees-of-freedom problem might be far larger in other contexts, but the fact that the issue is manageable in Tetlockian contexts presumably counts as at least a little bit of positive evidence. ↩︎

Comment by Ben Garfinkel (bmg) on Taboo "Outside View" · 2021-06-19T11:54:11.099Z · EA · GW

I agree that people sometimes put too much weight on particular outside views -- or do a poor job of integrating outside views with more inside-view-style reasoning. For example, in the quote/paraphrase you present at the top of your post, something has clearly gone wrong.[1]

But I think the best intervention, in this case, is probably just to push the ideas "outside views are often given too much weight" or "heavy reliance on outside views shouldn't be seen as praiseworthy" or "the correct way to integrate outside views with more inside-view reasoning is X." Tabooing the term itself somehow feels a little roundabout to me, like a linguistic solution to a methodological disagreement.

I think you're right that "outside view" now has a very positive connotation. If enough community members become convinced that this positive connotation is unearned, though, I think the connotation will probably naturally become less positive over time. For example, the number of upvotes on this post is a signal that people shouldn't currently expect that much applause for using the term "outside view."


  1. As a caveat, although I'm not sure how much this actually matters for the present discussion, I probably am significantly less concerned about the problem than you are. I'm pretty confident that the average intellectual doesn't pay enough attention to "outside views" -- and I think that, absent positive reinforcement from people in your community, it actually does take some degree of discipline to take outside views sufficiently seriously. I'm open to the idea that the average EA community member has over-corrected, here, but I'm not yet convinced of it. I think it's also possible that, in a lot of cases, the natural substitute for bad outside-view-heavy reasoning is worse inside-view-heavy reasoning. ↩︎

Comment by Ben Garfinkel (bmg) on Taboo "Outside View" · 2021-06-18T22:20:00.683Z · EA · GW

When people use “outside view” or “inside view” without clarifying which of the things on the above lists they mean, I am left ignorant of what exactly they are doing and how well-justified it is. People say “On the outside view, X seems unlikely to me.” I then ask them what they mean, and sometimes it turns out they are using some reference class, complete with a dataset. (Example: Tom Davidson’s four reference classes for TAI). Other times it turns out they are just using the anti-weirdness heuristic. Good thing I asked for elaboration!

FWIW, as a contrary datapoint, I don’t think I’ve really encountered this problem much in conversation. In my own experience (which may be quite different from yours): when someone makes some reference to an “outside view,” they say something that indicates roughly what kind of “outside view” they’re using. For example, if someone is just extrapolating a trend forward, they’ll reference the trend. Or if someone is deferring to expert opinion, they’ll reference expert opinion. I also don’t think I’d find it too bothersome, in any case, to occasionally have to ask the person which outside view they have in mind.

So this concern about opacity wouldn’t be enough to make me, personally, want people to stop using the term “outside view.”

If there’s a really serious linguistic issue, here, I think it’s probably that people sometimes talk about "the outside view” as though there's only a single relevant outside view. I think Michael Aird made a good comment on my recent democracy post, where he suggests that people should taboo the phrase “the outside view” and instead use the phrase “an outside view.” (I was guilty of using the phrase “the outside view” in that post — and, arguably, of leaning too hard on one particular way of defining a reference class.) I’d be pretty happy if people just dropped the “the,” but kept talking about “outside views.”[1]


  1. It’s of course a little ambiguous what counts as an “outside view,” but in practice I don’t think this is too huge of an issue. In my experience, which again may be different from yours, “taking an outside view” still does typically refer to using some sort of reference-class-based reasoning. It’s just the case that there are lots of different reference classes that people use. (“I’m extrapolating this 20-year trend forward, for another five years, because if a trend has been stable for 20 years it’s typically stable for another five.” “I’m deferring to the experts in this survey, because experts typically have more accurate views than amateurs.” Etc.) I do feel like this style of reasoning is useful and meaningfully distinct from, for example, reasoning based on causal models, so I’m happy to have a term for it, even if the boundaries of the concept are somewhat fuzzy. ↩︎

Comment by Ben Garfinkel (bmg) on What are things everyone here should (maybe) read? · 2021-05-18T23:03:29.559Z · EA · GW

Fortunately, if I remember correctly, something like the distinction between the true criterion of rightness and the best practical decision procedure actually is a major theme in the Kagan book. (Although I think the distinction probably often is underemphasized.)

It is therefore kind of misleading to think of consequentialism vs. deontology vs. virtue ethics as alternative theories, which however is the way normative ethics is typically presented in the analytic tradition.

I agree there is something to this concern. But I still wouldn't go so far as to say that it's misleading to think of them as alternative theories. I do think they count as conceptually distinct (even if the boundaries are sometimes a bit muddy), and I think they do sometimes have different implications for how you should in fact make moral decisions.

Beyond the deontology/consequentialism debate, I think there are also relevant questions around demandingness (how strong are our moral obligations, if any?), on the nature of well-being (e.g. hedonistic vs. preference-based vs. objective list theories), on the set of things that count as morally relevant consequences (e.g. do things beyond well-being matter? should we care more about totals or averages?), and so on.

Comment by Ben Garfinkel (bmg) on What are things everyone here should (maybe) read? · 2021-05-18T21:52:21.419Z · EA · GW

A slightly boring answer: I think most people should at least partly read something that overviews common theories and frameworks in normative ethics (and the arguments for and against them) and something that overviews core concepts and principles in economics (e.g. the idea of expected utility, the idea of an externality, supply/demand, the basics of economic growth, the basics of public choice).

In my view, normative ethics and economics together make up a really large portion of the intellectual foundation that EA is built on.

One good book that overviews normative ethics is Shelly Kagan's Normative Ethics, although I haven't read it since college (and I think it has only a tiny amount of coverage of population ethics and animal ethics). One thing I like about it is it focuses on laying out the space of possible ethical views in a sensible way, rather than tracing the history of the field. If I remember correctly, names like Aristotle, Kant, etc. never show up. It's also written in a very conversational style.

One good introductory economics textbook is Tyler Cowen's and Alex Tabarrok's Modern Principles of Economics. I don't know how it stacks up to other intro textbooks, since it's the only one that I've read more than a little of, but it's very readable, has very little math, and emphasizes key concepts and principles. Reading just the foundational chapters in an intro textbook, then the chapters whose topics sound important, can probably get most people a decent portion of the value of reading a full textbook.

Comment by Ben Garfinkel (bmg) on Ben Garfinkel's Shortform · 2021-05-04T12:18:30.909Z · EA · GW

That's a good example.

I do agree that quasi-random variation in culture can be really important. And I agree that this variation is sometimes pretty sticky (e.g. Europe being predominantly Christian and the Middle East being predominantly Muslim for more than a thousand years). I wouldn't say that this kind of variation is a "rounding error."

Over sufficiently long timespans, though, I think that technological/economic change has been more significant.

As an attempt to operationalize this claim: The average human society in 1000 AD was obviously very different from the average human society in 10,000 BC. I think that the difference would have been less than half as large (at least in intuitive terms) if there hadn't been technological/economic change.

I think that the pool of available technology creates biases in the sorts of societies that emerge and stick around. For large enough amounts of technological change, and long enough timespans (long enough for selection pressures to really matter), I think that shifts in these technological biases will explain a large portion of the shifts we see in the traits of the average society.[1]


  1. If selection pressures become a lot weaker in the future, though, then random drift might become more important in relative terms. ↩︎

Comment by Ben Garfinkel (bmg) on Ben Garfinkel's Shortform · 2021-05-03T14:01:34.952Z · EA · GW

FWIW, I wouldn't say I agree with the main thesis of that post.

However, while I expect machines that outcompete humans for jobs, I don’t see how that greatly increases the problem of value drift. Human cultural plasticity already ensures that humans are capable of expressing a very wide range of values. I see no obviously limits there. Genetic engineering will allow more changes to humans. Ems inherit human plasticity, and may add even more via direct brain modifications.

In principle, non-em-based artificial intelligence is capable of expressing the entire space of possible values. But in practice, in the shorter run, such AIs will take on social roles near humans, and roles that humans once occupied....

I don’t see why people concerned with value drift should be especially focused on AI. Yes, AI may accompany faster change, and faster change can make value drift worse for people with intermediate discount rates. (Though it seems to me that altruistic discount rates should scale with actual rates of change, not with arbitrary external clocks.)

I definitely think that human biology creates at least very strong biases toward certain values (if not hard constraints) and that AI systems would not need to have these same biases. If you're worried about future agents having super different and bad values, then AI is a natural focal point for your worry.


A couple other possible clarifications about my views here:

  • I think that the outcome of the AI Revolution could be much worse, relative to our current values, than the Neolithic Revolution was relative to the values of our hunter-gatherer ancestors. But I think the question "Will the outcome be worse?" is distinct from the question "Will we have less freedom to choose the outcome?"

  • I'm personally not so focused on value drift as a driver of long-run social change. For example, the changes associated with the Neolithic Revolution weren't really driven by people becoming less egalitarian, more pro-slavery, more inclined to hold certain religious beliefs, more ideologically attached to sedentism/farming, more happy to accept risks from disease, etc. There were value changes, but, to some significant degree, they seem to have been downstream of technological/economic change.

Comment by Ben Garfinkel (bmg) on Ben Garfinkel's Shortform · 2021-05-03T11:44:21.342Z · EA · GW

Do you have the intuition that absent further technological development, human values would drift arbitrarily far?

Certainly not arbitrarily far. I also think that technological development (esp. the emergence of agriculture and modern industry) has played a much larger role in changing the world over time than random value drift has.

[E]ven non-extinction AI is enabling a new set of possibilities that modern-day humans would endorse much less than the decisions of future humans otherwise.

I definitely think that's true. But I also think that was true of agriculture, relative to the values of hunter-gatherer societies.

To be clear, I'm not downplaying the likelihood or potential importance of any of the three crisper concerns I listed. For example, I think that AI progress could conceivably lead to a future that is super alienating and bad.

I'm just (a) somewhat pedantically arguing that we shouldn't frame the concerns as being about a "loss of control over the future" and (b) suggesting that you can rationally have all these same concerns even if you come to believe that technical alignment issues aren't actually a big deal.

Comment by Ben Garfinkel (bmg) on Ben Garfinkel's Shortform · 2021-05-02T15:30:57.454Z · EA · GW

A thought on how we describe existential risks from misaligned AI:

Sometimes discussions focus on a fairly specific version of AI risk, which involves humanity being quickly wiped out. Increasingly, though, the emphasis seems to be on the more abstract idea of “humanity losing control of its future.” I think it might be worthwhile to unpack this latter idea a bit more.

There’s already a fairly strong sense in which humanity has never controlled its own future. For example, looking back ten thousand years, no one decided that sedentary agriculture would increasingly supplant hunting and gathering, that increasingly complex states would arise, that slavery would become common, that disease would take off, that social hierarchies and gender divisions would become stricter, etc. The transition to the modern world, and everything that came with this transition, also doesn’t seem to have been meaningfully chosen (or even really understood by anyone). The most serious effort to describe a possible future in detail — Hanson’s Age of Em — also describes a future with loads of features that most present-day people would not endorse.

As long as there are still strong competitive pressures or substantial random drift, it seems to me, no generation ever really gets to choose the future.[1] It's actually sort of ambiguous, then, what it means to worry about “losing control of our future."

Here are a few alternative versions of the concern that feel a bit crisper to me:

  1. If we ‘mess up on AI,’ then even the most powerful individual humans will have unusually little influence over their own lives or the world around them.[2]

  2. If we ‘mess up on AI,’ then future people may be unusually dissatisfied with the world they live in. In other words, people's preferences will be unfulfilled to an unusually large degree.

  3. Humanity may have a rare opportunity to take control of its own future, by achieving strong coordination and then locking various things in. But if we ‘mess up on AI,’ then we’ll miss out on this opportunity.[3]

Something that’s a bit interesting about these alternative versions of the concern, though, is that they’re not inherently linked to AI alignment issues. Even if AI systems behave roughly as their users intend, I believe each of these outcomes is still conceivable. For example, if there’s a missed opportunity to achieve strong coordination around AI, the story might look like the failure of the Baruch Plan for international control of nuclear weapons: that failure had much more to do with politics than it had to do with the way engineers designed the technology in question.

In general, if we move beyond discussing very sharp alignment-related catastrophes (e.g. humanity being quickly wiped out), then I think concerns about misaligned AI start to bleed into broader AI governance concerns. It starts to become more ambiguous whether technical alignment issues are actually central or necessary to the disaster stories people tell.


  1. Although, admittedly, notable individuals or groups (e.g. early Christians) do sometimes have a fairly lasting and important influence. ↩︎

  2. As an analogy, in the world of The Matrix, people may not actually have much less control over the long-run future than hunter-gatherers did twenty thousand years ago. But they certainly have much less control over their own lives. ↩︎

  3. Notably, this is only a bad thing if we expect the relevant generation of humans to choose a better future than would be arrived at by default. ↩︎

Comment by Ben Garfinkel (bmg) on Ben Garfinkel's Shortform · 2021-04-30T21:25:45.962Z · EA · GW

Good point!

That consideration -- and the more basic consideration that more junior people often just know less -- definitely pushes in the opposite direction. If you wanted to try some version of seniority-weighted epistemic deference, my guess is that the most reliable cohort would have studied a given topic for at least a few years but less than a couple decades.

Comment by Ben Garfinkel (bmg) on Ben Garfinkel's Shortform · 2021-04-30T12:18:38.994Z · EA · GW

A thought on epistemic deference:

The longer you hold a view, and the more publicly you hold a view, the more calcified it typically becomes. Changing your mind becomes more aversive and potentially costly, you have more tools at your disposal to mount a lawyerly defense, and you find it harder to adopt frameworks/perspectives other than your favored one (the grooves become firmly imprinted into your brain). At least, this is the way it seems and personally feels to me.[1]

For this reason, the observation “someone I respect publicly argued for X many years ago and still believes X” typically only provides a bit more evidence than the observation “someone I respect argued for X many years ago.” For example, even though I greatly respect Daron Acemoglu, I think the observation “Daron Acemoglu still believes that political institutions are the central determinant of economic growth rates” only gives me a bit more evidence than the observation “15 years ago Daron Acemoglu publicly argued that institutions are the central determinant of economic growth rates.”

A corollary: If there’s an academic field that contains a long-standing debate, and you’d like to defer to experts in this field, you may want to give disproportionate weight to the opinions of junior academics. They’re less likely to have responded to recent evidence and arguments in an epistemically inflexible way.


  1. Of course, there are exceptions. The final chapter of Scout Mindset includes a moving example of a professor publicly abandoning a view he had championed for fifteen years, after a visiting academic presented persuasive new evidence. The reason these kinds of stories are moving, though, is that they describe truly exceptional behavior. ↩︎

Comment by Ben Garfinkel (bmg) on Ben Garfinkel's Shortform · 2021-04-19T10:08:54.890Z · EA · GW

I’d actually say this is a variety of qualitative research. At least in the main academic areas I follow, though, it seems a lot more common to read and write up small numbers of detailed case studies (often selected for being especially interesting) than to read and write up large numbers of shallow case studies (selected close to randomly).

This seems to be true in international relations, for example. In a class on interstate war, it’s plausible people would be assigned a long analysis of the outbreak of WW1, but very unlikely they’d be assigned short descriptions of the outbreaks of twenty random wars. (Quite possible there’s a lot of variation between fields, though.)

Comment by Ben Garfinkel (bmg) on Ben Garfinkel's Shortform · 2021-04-18T23:19:48.847Z · EA · GW

In general, I think “read short descriptions of randomly sampled cases” might be an underrated way to learn about the world and notice issues with your assumptions/models.

A couple other examples:

I’ve been trying to develop a better understanding of various aspects of interstate conflict. The Correlates of War militarized interstate disputes (MIDs) dataset is, I think, somewhat useful for this. The project files include short descriptions of (supposedly) every case between 1993 and 2014 in which one state “threatened, displayed, or used force against another.” Here, for example, is the set of descriptions for 2011-2014. I’m not sure I’ve had any huge/concrete take-aways, but I think reading the cases: (a) made me aware of some international tensions I was oblivious to; (b) gave me a slightly better understanding of dynamics around ‘micro-aggressions’ (e.g. flying over someone’s airspace); and (c) helped me more strongly internalize the low base rate for crises boiling over into war (since I had previously read disproportionately about historical disputes that turned into something larger).

Last year, I also spent a bit of time trying to improve my understanding of police killings in the US. I found this book unusually useful. It includes short descriptions of every single incident in which an unarmed person was killed by a police officer in 2015. I feel like reading a portion of it helped me to quickly notice and internalize different aspects of the problem (e.g. the fact that something like a third of the deaths are caused by tasers; the large role of untreated mental illness as a risk factor; the fact that nearly all fatal interactions are triggered by 911 calls, rather than stops; the fact that officers are trained to interact importantly differently with people they believe are on PCP; etc.). I assume I could have learned all the same things by just reading papers — but I think the case sampling approach was probably faster and better for retention.

I think there might be value in creating “random case descriptions” collections for a broader range of phenomena. Academia really doesn’t emphasize these kinds of collections as tools for either research or teaching.

EDIT: Another good example of this approach to learning is Rob Bensinger's recent post "thirty-three randomly selected bioethics papers."

Comment by Ben Garfinkel (bmg) on Ben Garfinkel's Shortform · 2021-04-09T19:39:16.295Z · EA · GW

The O*NET database includes a list of about 20,000 different tasks that American workers currently need to perform as part of their jobs. I’ve found it pretty interesting to scroll through the list, sorted in random order, to get a sense of the different bits of work that add up to the US economy. I think anyone who thinks a lot about AI-driven automation might find it useful to spend five minutes scrolling around: it’s a way of jumping yourself down to a lower level of abstraction. I think the list is also a little bit mesmerizing, in its own right.

One update I’ve made is that I’m now more confident that more than half of present-day occupational tasks could be automated using fairly narrow, non-agential, and boring-looking AI systems. (Most of them don’t scream “this task requires AI systems with long-run objectives and high levels of generality.”) I think it’s also pretty interesting, as kind of a game, to try to imagine as concretely as possible what the training processes might look like for systems that can perform (or eliminate the need for) different tasks on the list.

As a sample, here are ten random tasks. (Some of these could easily be broken up into a lot of different sub-tasks or task variants, which might be automated independently.)

  • Cancel letter or parcel post stamps by hand.
  • Inquire into the cause, manner, and circumstances of human deaths and establish the identities of deceased persons.
  • Teach patients to use home health care equipment.
  • Write reports or articles for Web sites or newsletters related to environmental engineering issues.
  • Supervise and participate in kitchen and dining area cleaning activities.
  • Intervene as an advocate for clients or patients to resolve emergency problems in crisis situations.
  • Mark or tag material with proper job number, piece marks, and other identifying marks as required.
  • Calculate amount of debt and funds available to plan methods of payoff and to estimate time for debt liquidation.
  • Weld metal parts together, using portable gas welding equipment.
  • Provide assistance to patrons by performing duties such as opening doors and carrying bags.
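For anyone who wants to replicate this exercise, here's a minimal Python sketch of drawing a random sample of tasks. The file path and column name in the comment are hypothetical placeholders — the real O*NET download is a spreadsheet export whose exact format you'd need to check:

```python
import random

def sample_tasks(tasks, k=10, seed=0):
    """Return k tasks drawn uniformly at random (seeded so the sample is reproducible)."""
    rng = random.Random(seed)
    return rng.sample(tasks, k)

# In practice, `tasks` would be loaded from the O*NET task list, e.g. from a
# CSV export with one task statement per row (path/column name hypothetical):
#
#   import csv
#   with open("task_statements.csv", newline="") as f:
#       tasks = [row["Task"] for row in csv.DictReader(f)]

tasks = [f"task {i}" for i in range(20000)]  # stand-in for the ~20,000 real task statements
print(sample_tasks(tasks, k=3))
```

Re-running with a different `seed` gives a fresh ten-task snapshot, which is roughly the "sorted in random order" browsing described above.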