On Deference and Yudkowsky's AI Risk Estimates

post by Ben Garfinkel (bmg) · 2022-06-19T14:35:40.169Z · EA · GW · 160 comments

Contents

  Introduction
  Why write this post?
  Yudkowsky’s track record: some cherry-picked examples
    Fairly clearcut examples
      1. Predicting near-term extinction from nanotech
      2. Predicting that his team had a substantial chance of building AGI before 2010
    Somewhat disputable examples
      3. Having high confidence that AI progress would be extremely discontinuous and localized and not require much compute
      4. Treating early AI risk arguments as close to decisive
      5. Treating "coherence arguments" as forceful
    A somewhat meta example
      6. Not acknowledging his mixed track record
163 comments

Note: I mostly wrote this post after Eliezer Yudkowsky’s “Death with Dignity [LW · GW]” essay appeared on LessWrong. Since then, Jotto has written a post [LW · GW] that overlaps a bit with this one, which sparked an extended discussion in the comments. You may want to look at that discussion as well. See also, here [LW(p) · GW(p)], for another relevant discussion thread.

EDIT: See here [EA(p) · GW(p)] for some post-discussion reflections on what I think this post got right and wrong.

Introduction

Most people, when forming their own views on risks from misaligned AI, have some inclination to defer to others who they respect or think of as experts.

This is a reasonable thing to do, especially if you don’t yet know much about AI or haven’t yet spent much time scrutinizing the arguments. If someone you respect has spent years thinking about the subject, and believes the risk of catastrophe is very high, then you probably should take that information into account when forming your own views.

It’s understandable, then, if Eliezer Yudkowsky’s recent writing on AI risk helps to really freak some people out. Yudkowsky has probably spent more time thinking about AI risk than anyone else. Along with Nick Bostrom, he is the person most responsible for developing and popularizing these concerns. Yudkowsky has now begun to publicly express the view that misaligned AI has a virtually 100% chance of killing everyone on Earth - such that all we can hope to do is “die with dignity [LW · GW].”

The purpose of this post is, simply, to argue that people should be wary of deferring too much to Eliezer Yudkowsky when it comes to estimating AI risk.[1] In particular, I think, they shouldn’t defer to him more than they would defer to anyone else who seems smart and has spent a reasonable amount of time thinking about AI risk.[2]

The post highlights what I regard as some negative aspects of Yudkowsky’s track record, when it comes to technological risk forecasting. I think these examples suggest that (a) his track record is at best fairly mixed and (b) he has some tendency toward expressing dramatic views with excessive confidence. As a result, I don’t personally see a strong justification for giving his current confident and dramatic views about AI risk a great deal of weight.[3]

I agree it’s highly worthwhile to read and reflect on Yudkowsky’s arguments. I also agree that potential risks from misaligned AI deserve serious attention - and are even, plausibly, more deserving of attention than any other existential risk.[4] I just don’t think people should make too much of the fact that Yudkowsky believes we’re doomed.

Why write this post?

Before diving in, it may be worth saying a little more about why I hope this post might be useful. (Feel free to skip ahead if you're not interested in this section.)

In brief, it matters what the existential risk community believes about the risk from misaligned AI. I think that excessively high credences in doom can lead to:

My own impression is that, although it's sensible to take potential risks from misaligned AI very seriously, a decent number of people are now more freaked out than they need to be. And I think that excessive deference to some highly visible intellectuals in this space, like Yudkowsky, may be playing an important role - either directly or through deference cascades.[6] I'm especially concerned about new community members, who may be particularly inclined to defer to well-known figures and who may have particularly limited awareness of the diversity of views in this space. I've recently encountered some anecdotes I found worrisome.

Nothing I write in this post implies that people shouldn't freak out, of course, since I'm mostly not engaging with the substance of the relevant arguments (although I have done this elsewhere, for instance here, here, and here). If people are going to freak out about AI risk, then I at least want to help make sure that they’re freaking out for sufficiently good reasons.

Yudkowsky’s track record: some cherry-picked examples

Here, I’ve collected a number of examples of Yudkowsky making (in my view) dramatic and overconfident predictions concerning risks from technology.

Note that this isn’t an attempt to provide a balanced overview of Yudkowsky’s technological predictions over the years. I’m specifically highlighting a number of predictions that I think are underappreciated and suggest a particular kind of bias.

Doing a more comprehensive overview, which doesn’t involve specifically selecting poor predictions, would surely give a more positive impression. Hopefully this biased sample is meaningful enough, however, to support the claim that Yudkowsky’s track record is at least pretty mixed.[7]

Also, a quick caveat: Unfortunately, but understandably, Yudkowsky didn’t have time to review this post and correct any inaccuracies. In various places, I’m summarizing or giving impressions of lengthy pieces I haven’t fully read, or haven't fully read in well over a year, so there's a decent chance that I’ve accidentally mischaracterized some of his views or arguments. Concretely: I think there’s something on the order of a 50% chance I’ll ultimately feel I should correct something below.

Fairly clearcut examples

1. Predicting near-term extinction from nanotech

At least up until 1999, admittedly when he was still only about 20 years old, Yudkowsky argued that transformative nanotechnology would probably emerge suddenly and soon (“no later than 2010”) and result in human extinction by default. My understanding is that this viewpoint was a substantial part of the justification for founding the institute that would become MIRI; the institute was initially focused on building AGI, since developing aligned superintelligence quickly enough was understood to be the only way to manage nanotech risk:

On the nanotechnology side, we possess machines capable of producing arbitrary DNA sequences, and we know how to turn arbitrary DNA sequences into arbitrary proteins (6). We have machines - Atomic Force Probes - that can put single atoms anywhere we like, and which have recently [1999] been demonstrated to be capable of forming atomic bonds. Hundredth-nanometer precision positioning, atomic-scale tweezers... the news just keeps on piling up…. If we had a time machine, 100K of information from the future could specify a protein that built a device that would give us nanotechnology overnight….

If you project on a graph the minimum size of the materials we can manipulate, it reaches the atomic level - nanotechnology - in I forget how many years (the page vanished), but I think around 2035. This, of course, was before the time of the Scanning Tunnelling Microscope and "IBM" spelled out in xenon atoms. For that matter, we now have the artificial atom ("You can make any kind of artificial atom - long, thin atoms and big, round atoms."), which has in a sense obsoleted merely molecular nanotechnology - the surest sign that nanotech is just around the corner. I believe Drexler is now giving the ballpark figure of 2013. My own guess would be no later than 2010…

Above all, I would really, really like the Singularity to arrive before nanotechnology, given the virtual certainty of deliberate misuse - misuse of a purely material (and thus, amoral) ultratechnology, one powerful enough to destroy the planet. We cannot just sit back and wait….

Mitchell Porter calls it "The race between superweapons and superintelligence." Human civilization will continue to change until we either create superintelligence, or wipe ourselves out. Those are the two stable states, the two "attractors". It doesn't matter how long it takes, or how many cycles of nanowar-and-regrowth occur before Transcendence or final extinction. If the system keeps changing, over a thousand years, or a million years, or a billion years, it will eventually wind up in one attractor or the other. But my best guess is that the issue will be settled now.

I should, once again, emphasize that Yudkowsky was around twenty when he did the final updates on this essay. In that sense, it might be unfair to bring this very old example up.

Nonetheless, I do think this case can be treated as informative, since the belief was so analogous to his current belief about AI (a high outlier credence in near-term doom from an emerging technology), since he had thought a lot about the subject and was already highly engaged in the relevant intellectual community, since it's not clear when he dropped the belief, and since twenty isn't (in my view) actually all that young. I do know a lot of people in their early twenties; I think their current work and styles of thought are likely to be predictive of their work and styles of thought in the future, even though I do of course expect the quality to go up over time.

2. Predicting that his team had a substantial chance of building AGI before 2010

In 2001, and possibly later, Yudkowsky apparently believed that his small team would be able to develop a “final stage AI” that would “reach transhumanity sometime between 2005 and 2020, probably around 2008 or 2010.”

In the first half of the 2000s, he produced a fair amount of technical and conceptual work related to this goal. It hasn't ultimately had much clear usefulness for AI development, and, partly on this basis, my impression is that it has not held up well - but that he was very confident in the value of this work at the time.

The key points here are that:

Flare

Although I haven’t evaluated the work, my impression is that Yudkowsky was a key part of a Singularity Institute effort to develop a new programming language to use to create “seed AI.” He (or whoever was writing the description of the project) seems to have been substantially overconfident about its usefulness. From the section of the documentation titled “Foreword: Earth Needs Flare” (2001):

A new programming language has to be really good to survive. A new language needs to represent a quantum leap just to be in the game. Well, we're going to be up-front about this: Flare is really good. There are concepts in Flare that have never been seen before. We expect to be able to solve problems in Flare that cannot realistically be solved in any other language. We expect that people who learn to read Flare will think about programming differently and solve problems in new ways, even if they never write a single line of Flare…. Flare was created under the auspices of the Singularity Institute for Artificial Intelligence, an organization created with the mission of building a computer program far before its time - a true Artificial Intelligence. Flare, the programming language they asked for to help achieve that goal, is not that far out of time, but it's still a special language.

Coding a Transhuman AI

I haven’t read it, to my discredit, but “Coding a Transhuman AI 2.2” is another piece of technical writing by Yudkowsky that one could look at. The document is described as “the first serious attempt to design an AI which has the potential to become smarter than human,” and aims to “describe the principles, paradigms, cognitive architecture, and cognitive components needed to build a complete mind possessed of general intelligence.”

From a skim, I suspect there’s a good chance it hasn’t held up well - since I’m not aware of any promising later work that builds on it and since it doesn’t seem to have been written with the ML paradigm in mind - but I can’t currently give an informed take.

Levels of Organization in General Intelligence

A later piece of work which I also haven’t properly read is “Levels of Organization in General Intelligence.” At least by 2005, going off of Yudkowsky’s post “So You Want to be a Seed AI Programmer,” it seems like he thought a variation of the framework in this paper would make it possible for a very small team at the Singularity Institute to create AGI:

There's a tradeoff between the depth of AI theory, the amount of time it takes to implement the project, the number of people required, and how smart those people need to be. The AI theory we're planning to use - not LOGI, LOGI's successor - will save time and it means that the project may be able to get by with fewer people. But those few people will have to be brilliant…. The theory of AI is a lot easier than the practice, so if you can learn the practice at all, you should be able to pick up the theory on pretty much the first try. The current theory of AI I'm using is considerably deeper than what's currently online in Levels of Organization in General Intelligence - so if you'll be able to master the new theory at all, you shouldn't have had trouble with LOGI. I know people who did comprehend LOGI on the first try; who can complete patterns and jump ahead in explanations and get everything right, who can rapidly fill in gaps from just a few hints, who still don't have the level of ability needed to work on an AI project.

Somewhat disputable examples

I think of the previous two examples as predictions that resolved negatively. I'll now cover a few predictions that we don't yet know are wrong (e.g. predictions about the role of compute in developing AGI), but that I think we now have reason to regard as significantly overconfident.

3. Having high confidence that AI progress would be extremely discontinuous and localized and not require much compute

In his 2008 "FOOM debate" with Robin Hanson, Yudkowsky confidently staked out very extreme positions about what future AI progress would look like - without (in my view) offering strong justifications. The past decade of AI progress has also provided further evidence against the correctness of his core predictions.

A quote from the debate, describing the median development scenario he was imagining at the time:

When we try to visualize how all this is likely to go down, we tend to visualize a scenario that someone else once termed “a brain in a box in a basement.” I love that phrase, so I stole it. In other words, we tend to visualize that there’s this AI programming team, a lot like the sort of wannabe AI programming teams you see nowadays, trying to create artificial general intelligence, like the artificial general intelligence projects you see nowadays. They manage to acquire some new deep insights which, combined with published insights in the general scientific community, let them go down into their basement and work on it for a while and create an AI which is smart enough to reprogram itself, and then you get an intelligence explosion…. (p. 436)

The idea (as I understand it) was that AI progress would have very little impact on the world, then a small team of people with a very small amount of computing power would have some key insight, then they’d write some code for an AI system, then that system would rewrite its own code, and then it would shortly after take over the world.

When pressed by his debate partner, regarding the magnitude of the technological jump he was forecasting, Yudkowsky suggested that economic output could at least plausibly rise by twenty orders-of-magnitude within not much more than a week - once the AI system has developed relevant nanotechnologies (pg. 400).[8] To give a sense of how extreme that is: If you extrapolate twenty-orders-of-magnitude-per-week over the course of a year - although, of course, no one expected this rate to be maintained for anywhere close to a year - it is equivalent to an annual economic growth rate of (10^1000)%.
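As a rough check on the arithmetic behind that extrapolation, here is a minimal Python sketch. It assumes a 52-week year and a constant twenty-orders-of-magnitude-per-week rate; the function and constant names are hypothetical, chosen only for illustration. Under those assumptions the exponent comes out in the same ballpark as the round figure quoted above.

```python
# Back-of-the-envelope check of the extrapolation above.
# Assumptions (for illustration only): 52 weeks per year, and the
# twenty-orders-of-magnitude-per-week rate held constant all year.
ORDERS_PER_WEEK = 20
WEEKS_PER_YEAR = 52


def annual_orders_of_magnitude(orders_per_week: int, weeks: int) -> int:
    """Weekly growth factors multiply, so their base-10 exponents add."""
    return orders_per_week * weeks


def growth_rate_percent_exponent(orders: int) -> int:
    """Exponent of 10 for the growth rate expressed as a percentage.

    A growth factor of 10**orders is a rate of (10**orders - 1) * 100 percent,
    which is approximately 10**(orders + 2) percent when orders is large.
    """
    return orders + 2


orders = annual_orders_of_magnitude(ORDERS_PER_WEEK, WEEKS_PER_YEAR)
print(f"growth factor over one year: 10^{orders}")
print(f"annual growth rate: roughly (10^{growth_rate_percent_exponent(orders)})%")
```

The exact exponent depends on the assumptions (for example, how many weeks are counted), so this is best read as an order-of-magnitude sanity check rather than a precise figure.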

I think it’s pretty clear that this viewpoint was heavily influenced by the reigning AI paradigm at the time, which was closer to traditional programming than machine learning. The emphasis on “coding” (as opposed to training) as the means of improvement, the assumption that large amounts of compute are unnecessary, etc. seem to follow from this. A large part of the debate was Yudkowsky arguing against Hanson, who thought that Yudkowsky was underrating the importance of compute and “content” (i.e. data) as drivers of AI progress. Although Hanson very clearly wasn’t envisioning something like deep learning either[9], his side of the argument seems to fit better with what AI progress has looked like over the past decade. In particular, huge amounts of compute and data have clearly been central to recent AI progress and are currently commonly thought to be central - or, at least, necessary - for future progress.

In my view, the pro-FOOM essays in the debate also just offered very weak justifications for thinking that a small number of insights could allow a small programming team, with a small amount of computing power, to abruptly jump the economic growth rate up by several orders of magnitude. The main reasons that stood out to me, from the debate, are these:[10]

I think that Yudkowsky's prediction - that a small amount of code, run using only a small amount of computing power, was likely to abruptly jump economic output upward by more than a dozen orders-of-magnitude - was extreme enough to require very strong justifications. My view is that his justifications simply weren't that strong. Given the way AI progress has looked over the past decade, his prediction also seems very likely to resolve negatively.[12]

4. Treating early AI risk arguments as close to decisive

In my view, the arguments for AI risk that Yudkowsky had developed by the early 2010s had a lot of very important gaps. They were suggestive of a real risk, but were still far from worked out enough to justify very high credences in extinction from misaligned AI. Nonetheless, Yudkowsky recalls his credence in doom was "around the 50% range [LW · GW]" at the time, and his public writing tended to suggest that he saw the arguments as very tight and decisive.

These slides summarize what I see as gaps in the AI risk argument that appear in Yudkowsky’s essays/papers and in Superintelligence, which presents somewhat fleshed out and tweaked versions of Yudkowsky’s arguments. This podcast episode covers most of the same points. (Note that almost none of the objections I walk through are entirely original to me.)

You can judge for yourself whether these criticisms of his arguments are fair. If they seem unfair to you, then, of course, you should disregard this as an illustration of an overconfident prediction. One additional piece of evidence, though, is that his arguments focused on a fairly specific catastrophe scenario that most researchers [EA · GW] now assign less weight to than they did when they first entered the field.

For instance, the classic arguments used an extremely sudden "AI takeoff" as a central premise. Arguably, fast takeoff was the central premise, since presentations of the risk often began by establishing that there is likely to be a fast take-off (and thus an opportunity for a decisive strategic advantage) and then built the remainder of the argument on top of this foundation. However, many people in the field have now moved away from finding sudden take-off arguments compelling (e.g. for the kinds of reasons discussed here and here).

My point, here, is not necessarily that Yudkowsky was wrong, but rather that he held a much higher credence in existential risk from AI than his arguments justified at the time. The arguments had pretty crucial gaps that still needed to be resolved[13], but, I believe, his public writing tended to suggest that these arguments were tight and sufficient to justify very high credences in doom.

5. Treating "coherence arguments" as forceful

In the mid-2010s, some presentations of AI risk began to lean heavily on “coherence arguments” (i.e. arguments that draw implications from the von Neumann-Morgenstern utility theorem). See, for instance, this introduction to AI risk from 2016, by Yudkowsky, which places a coherence argument front and center as a foundation for the rest of the presentation. I think it's probably fair to guess that the introduction-to-AI-risk talk that Yudkowsky was giving in 2016 contained what he regarded as the strongest concise arguments available.

However, later analysis has suggested that coherence arguments have either no or very limited implications for how we should expect future AI systems to behave. See Rohin Shah’s (I think correct) objection [? · GW] to the use of “coherence arguments” to support AI risk concerns. See also similar objections by Richard Ngo [LW · GW] and Eric Drexler (Section 6.4).

Unfortunately, this is another case where the significance of this example depends on how much validity you assign to a given critique. In my view, the critique is strong. However, I'm unsure what portion of alignment researchers currently agree with me. I do know of at least one prominent researcher who was convinced by it; people also don't seem to make coherence arguments very often anymore, which perhaps suggests that the critiques have gotten traction. However, if you have the time and energy, you should reflect on the critiques for yourself.[14]

If the critique is valid, then this would be another example of Yudkowsky significantly overestimating the strength of an argument for AI risk.

[[EDIT: See here [EA(p) · GW(p)] for a useful clarification by Rohin.]]

A somewhat meta example

6. Not acknowledging his mixed track record

So far as I know, although I certainly haven't read all of his writing, Yudkowsky has never (at least publicly) seemed to take into account the mixed track record outlined above - including the relatively unambiguous misses.

He has written [? · GW] about mistakes from early on in his intellectual life (particularly pre-2003) and has, on this basis, even made a blanket-statement disavowing his pre-2003 work. However, based on my memory and a quick re-read/re-skim, this writing is an exploration of why it took him a long time to become extremely concerned about existential risks from misaligned AI. For instance, the main issue [? · GW] it discusses with his plans to build AGI is that these plans didn't take into account the difficulty and importance of ensuring alignment. This writing isn't, I think, an exploration or acknowledgement of the kinds of mistakes I've listed in this post.

The fact he seemingly hasn’t taken these mistakes into account - and, if anything, tends to write [LW · GW] in a way that suggests he holds a very high opinion of his technological forecasting track record - leads me to trust his current judgments less than I otherwise would.


  1. To be clear, Yudkowsky isn’t asking other people to defer to him. He’s spent a huge amount of time outlining his views (allowing people to evaluate them on their merits) and has often expressed concerns [LW · GW] about excessive epistemic deference. ↩︎

  2. A better, but still far-from-optimal approach to deference might be to give a lot of weight to the "average" view within the pool of smart people who have spent a reasonable amount of time thinking about AI risk. This still isn't great, though, since different people do deserve different amounts of weight, and since there's at least some reason to think that selection effects might bias this pool toward overestimating the level of risk. ↩︎

  3. It might be worth emphasizing that I’m not making any claim about the relative quality of my own track record. ↩︎

  4. To say something concrete about my current views on misalignment risk: I'm currently inclined to assign a low-to-mid-single-digits probability to existential risk from misaligned AI this century, with a lot of volatility in my views. This is of course, in some sense, still extremely high! ↩︎

  5. I think that expressing extremely high credences in existential risk (without sufficiently strong and clear justification) can also lead some people to simply dismiss the concerns. It is often easier to be taken seriously, when talking about strange and extreme things, if you express significant uncertainty. Importantly, I don't think this means that people should ever misrepresent their levels of concern about existential risks; dishonesty seems like a really bad and corrosive policy. Still, this is one extra reason to think that it can be important to avoid overestimating risks. ↩︎

  6. Yudkowsky is obviously a pretty polarizing figure. I'd also say that some people are probably too dismissive of him, for example because they assign too much significance to his lack of traditional credentials. But it also seems clear that many people are inclined to give Yudkowsky's views a great deal of weight. I've even encountered the idea that Yudkowsky is virtually the only person capable of thinking about alignment risk clearly. ↩︎

  7. I think that cherry-picking examples from someone's forecasting track record is normally bad to do, even if you flag that you're engaged in cherry-picking. However, I do think (or at least hope) that it's fair in cases where someone already has a very high level of respect and frequently draws attention to their own successful predictions. ↩︎

  8. I don't mean to suggest that the specific twenty orders-of-magnitude of growth figure was the result of deep reflection or was Yudkowsky's median estimate. Here is the specific quote, in response to Hanson raising the twenty orders-of-magnitude-in-a-week number: "Twenty orders of magnitude in a week doesn’t sound right, unless you’re talking about the tail end after the AI gets nanotechnology. Figure more like some number of years to push the AI up to a critical point, two to six orders of magnitude improvement from there to nanotech, then some more orders of magnitude after that." I think that my general point, that this is a very extreme prediction, stays the same even if we lower the number to ten orders-of-magnitude and assume that there will be a bit of a lag between the 'critical point' and the development of the relevant nanotechnology. ↩︎

  9. As an example of a failed prediction or piece of analysis on the other side of the FOOM debate, Hanson praised the CYC project - which lies far afield of the current deep learning paradigm and now looks like a clear dead end. ↩︎

  10. Yudkowsky also provides a number of arguments in favor of the view that the human mind can be massively improved upon. I think these arguments are mostly right. However, I think, they don't have any very strong implications for the question of whether AI progress will be compute-intensive, sudden, or localized. ↩︎

  11. To probe just the relevance of this one piece of evidence, specifically, let’s suppose that it’s appropriate to use the length of a person’s genome in bits of information as an upper bound on the minimum amount of code required to produce a system that shares their cognitive abilities (excluding code associated with digital environments). This would imply that it is in principle possible to train an ML model that can do anything a given person can do, using something on the order of 10 million lines of code. But even if we accept this hypothesis - which seems quite plausible to me - it doesn’t seem to me like this implies much about the relative contributions of architecture and compute to AI progress or the extent to which progress in architecture design is driven by “deep insights.” For example, why couldn’t it be true that it is possible to develop a human-equivalent system using fewer than 10 million lines of code and also true that computing power (rather than insight) is the main bottleneck to developing such a system? ↩︎

  12. Two caveats regarding my discussion of the FOOM debate:

    First, I should emphasize that, although I think Yudkowsky’s arguments were weak when it came to the central hypothesis being debated, his views were in some other regards more reasonable than his debate partner’s. See here [LW(p) · GW(p)] for comments by Paul Christiano on how well various views Yudkowsky expressed in the FOOM debate have held up.

    Second, it's been a few years since I've read the FOOM debate - and there's a lot in there (the book version of it is 741 pages long) - so I wouldn't be surprised if my high-level characterization of Yudkowsky's arguments is importantly misleading. My characterization here is based on some rough notes I took the last time I read it. ↩︎

  13. For example, it may be possible to construct very strong arguments for AI risk that don't rely on the fast take-off assumption. However, in practice, I think it's fair to say that the classic arguments did rely on this assumption. If the assumption wasn't actually very justified, then, I think, it seems to follow that having a very high credence in AI risk also wasn't justified at the time. ↩︎

  14. Here’s another example [EA(p) · GW(p)] of an argument that’s risen to prominence in the past few years, and plays an important role in some presentations of AI risk, that I now suspect simply might not work. This argument shows up, for example, in Yudkowsky’s recent post “AGI Ruin: A List of Lethalities [EA · GW],” at the top of the section outlining “central difficulties.” ↩︎

160 comments

Comments sorted by top scores.

comment by richard_ngo · 2022-06-20T03:08:17.001Z · EA(p) · GW(p)

I think that a bunch of people are overindexing on Yudkowsky's views; I've nevertheless downvoted this post because it seems like it's making claims that are significantly too strong, based on a methodology that I strongly disendorse. I'd much prefer a version of this post which, rather than essentially saying "pay less attention to Yudkowsky", is more nuanced about how to update based on his previous contributions; I've tried to do that in this comment [LW · GW], for example. (More generally, rather than reading this post, I recommend people read this one by Paul Christiano [LW · GW], which outlines specific agreements and disagreements. Note that the list of agreements there, which I expect that many other alignment researchers also buy into, serves as a significant testament to Yudkowsky's track record.)

The part of this post which seems most wild to me is the leap from "mixed track record" to

In particular, I think, they shouldn’t defer to him more than they would defer to anyone else who seems smart and has spent a reasonable amount of time thinking about AI risk.

For any reasonable interpretation of this sentence, it's transparently false. Yudkowsky has proven to be one of the best few thinkers in the world on a very difficult topic. Insofar as there are others who you couldn't write a similar "mixed track record" post about, it's almost entirely because they don't have a track record of making any big claims, in large part because they weren't able to generate the relevant early insights themselves. Breaking ground in novel domains is very, very different from forecasting the weather or events next year; a mixed track record is the price of entry.

Based on his track record, I would endorse people deferring more towards the general direction of Yudkowsky's views than towards the views of almost anyone else. I also think that there's a good case to be made that Yudkowsky tends to be overconfident, and this should be taken into account when deferring; but when it comes to making big-picture forecasts, the main value of deference is in helping us decide which ideas and arguments to take seriously, rather than the specific credences we should place on them, since the space of ideas is so large. The EA community has ended up strongly moving in Yudkowsky's direction over the last decade, and that seems like much more compelling evidence than anything listed in this post.

Replies from: bmg, rohinmshah, kokotajlod, Telofy
comment by Ben Garfinkel (bmg) · 2022-06-20T08:26:56.743Z · EA(p) · GW(p)

The part of this post which seems most wild to me is the leap from "mixed track record" to

In particular, I think, they shouldn’t defer to him more than they would defer to anyone else who seems smart and has spent a reasonable amount of time thinking about AI risk.

For any reasonable interpretation of this sentence, it's transparently false. Yudkowsky has proven to be one of the best few thinkers in the world on a very difficult topic. Insofar as there are others who you couldn't write a similar "mixed track record" post about, it's almost entirely because they don't have a track record of making any big claims, in large part because they weren't able to generate the relevant early insights themselves. Breaking ground in novel domains is very, very different from forecasting the weather or events next year; a mixed track record is the price of entry.

I disagree that the sentence is false for the interpretation I have in mind.

I think it's really important to separate out the question "Is Yudkowsky an unusually innovative thinker?" and the question "Is Yudkowsky someone whose credences you should give an unusual amount of weight to?"

I read your comment as arguing for the former, which I don't disagree with. But that doesn't mean that people should currently weigh his risk estimates more highly than they weigh the estimates of other researchers currently in the space (like you).

I also think that there's a good case to be made that Yudkowsky tends to be overconfident, and this should be taken into account when deferring; but when it comes to making big-picture forecasts, the main value of deference is in helping us decide which ideas and arguments to take seriously, rather than the specific credences we should place on them, since the space of ideas is so large.

But we do also need to try to have well-calibrated credences, of course. For the reason given in the post, it's important to know whether the risk of everyone dying soon is 5% or 99%. It's not enough just to determine whether we should take AI risk seriously.

We're also now past the point, as a community, where "Should AI risk be taken seriously?" is that much of a live question. The main epistemic question that matters is what probability we assign to it - and I think this post is relevant to that.

(More generally, rather than reading this post, I recommend people read this one by Paul Christiano, which outlines specific agreements and disagreements.)

I definitely recommend people read the post Paul just wrote! I think it's overall more useful than this one.

But I don't think there's an either-or here. People - particularly non-experts in a domain - do and should form their views through a mixture of engaging with arguments and deferring to others. So both arguments and track records should be discussed.

The EA community has ended up strongly moving in Yudkowsky's direction over the last decade, and that seems like much more compelling evidence than anything listed in this post.

I discuss this in response to another comment, here [EA(p) · GW(p)], but I'm not convinced of that point.

Replies from: richard_ngo
comment by richard_ngo · 2022-06-20T19:55:29.842Z · EA(p) · GW(p)

I phrased my reply strongly (e.g. telling people to read the other post instead of this one) because deference epistemology is intrinsically closely linked to status interactions, and you need to be pretty careful in order to make this kind of post not end up being, in effect, a one-dimensional "downweight this person". I don't think this post was anywhere near careful enough to avoid that effect. That seems particularly bad because I think most EAs should significantly upweight Yudkowsky's views if they're doing any kind of reasonable, careful deference, because most EAs significantly underweight how heavy-tailed the production of innovative ideas actually is (e.g. because of hindsight bias, it's hard to realise how much worse than Eliezer we would have been at inventing the arguments for AI risk, and how many dumb things we would have said in his position).

By contrast, I think your post is implicitly using a model where we have a few existing, well-identified questions, and the most important thing is to just get to the best credences on those questions, and we should do so partly by just updating in the direction of experts. But I think this model of deference is rarely relevant; see my reply to Rohin [EA · GW] for more details. Basically, as soon as we move beyond toy models of deference, the "innovative thinking" part becomes crucially important, and the "well-calibrated" part becomes much less so.

One last intuition: different people have different relationships between their personal credences and their all-things-considered credences. Inferring track records in the way you've done here will, in addition to favoring people who are quieter and say fewer useful things, also favor people who speak primarily based on their all-things-considered credences rather than their personal credences. But that leads to a vicious cycle where people are deferring to people who are deferring to people who... And then the people who actually do innovative thinking in public end up getting downweighted to oblivion via cherrypicked examples.

Modesty epistemology delenda est.

comment by Rohin Shah (rohinmshah) · 2022-06-20T16:00:30.657Z · EA(p) · GW(p)

when it comes to making big-picture forecasts, the main value of deference is in helping us decide which ideas and arguments to take seriously, rather than the specific credences we should place on them, since the space of ideas is so large.

This seems like an overly research-centric position.

When your job is to come up with novel relevant stuff in a domain, then I agree that it's mostly about "which ideas and arguments to take seriously" rather than specific credences.

When your job is to make decisions right now, the specific credences matter. Some examples:

  • Any cause prioritization decision, e.g. should funders reallocate nearly all biosecurity money to AI?
  • What should AI-focused community builders provide as starting resources?
  • Should there be an organization dedicated to solving Eliezer's health problems? What should its budget be?
  • Should people try to solve technical AI alignment or try to, idk, create a culture of secrecy within AGI labs?
Replies from: richard_ngo
comment by richard_ngo · 2022-06-20T19:25:36.969Z · EA(p) · GW(p)

I think that there are very few decisions which are both a) that low-dimensional and b) actually sensitive to the relevant range of credences that we're talking about.

Like, suppose you think that Eliezer's credences on his biggest claims are literally 2x higher than they should be, even for claims where he's 90% confident. This is a huge hit in terms of Bayes points; if that's how you determine deference, and you believe he's 2x off, then plausibly that implies you should defer to him less than you do to the median EA. But when it comes to grantmaking, for example, a cost-effectiveness factor of 2x is negligible given the other uncertainties involved - this should very rarely move you from a yes to no, or vice versa. (edit: I should restrict the scope here to grantmaking in complex, high-uncertainty domains like AI alignment).

Then you might say: well, okay, we're not just making binary decisions, we're making complex decisions where we're choosing between lots of different options. But the more complex the decisions you're making, the less you should care about whether somebody's credences on a few key claims are accurate, and the more you should care about whether they're identifying the right types of considerations, even if you want to apply a big discount factor to the specific credences involved.

As a simple example, as soon as you're estimating more than one variable, you typically start caring a lot about whether the errors on your estimates are correlated or uncorrelated. But there are so many different possibilities for ways and reasons that they might be correlated that you can't just update towards experts' credences, you have to actually update towards experts' reasons for those credences, which then puts you in the regime of caring more about whether you've identified the right types of considerations.

Replies from: CarlShulman, rohinmshah
comment by CarlShulman · 2022-06-20T19:39:36.017Z · EA(p) · GW(p)

Like, suppose you think that Eliezer's credences on his biggest claims are literally 2x higher than they should be, even for claims where he's 90% confident. This is a huge hit in terms of Bayes points; if that's how you determine deference, and you believe he's 2x off, then plausibly you should defer to him less than you do to the median EA. But when it comes to grantmaking, for example, a cost-effectiveness factor of 2x is negligible given the other uncertainties involved - this should very rarely move you from a yes to no, or vice versa.

 

Such differences are crucial for many of the most important grant areas IME, because they are areas where you are trading off multiple high-stakes concerns. E.g. in nuclear policy all the strategies on offer have arguments that they might lead to nuclear war or worse war. On AI alignment there are multiple such tradeoffs and people embracing strategies to push the same variable in opposite directions with high stakes on both sides.

Replies from: richard_ngo
comment by richard_ngo · 2022-06-20T20:03:35.217Z · EA(p) · GW(p)

I haven't thought much about nuclear policy, so I can't respond there. But at least in alignment, I expect that pushing on variables where there's less than a 2x difference between the expected positive and negative effects of changing that variable is not a good use of time for altruistically-motivated people.

(By contrast, upweighting or downweighting Eliezer's opinions by a factor of 2 could lead to significant shifts in expected value, especially for people who are highly deferential. The specific thing I think doesn't make much difference is deferring to a version of Eliezer who's 90% confident about something, versus deferring to the same extent to a version of Eliezer who's 45% confident in the same thing.)

My more general point, which doesn't hinge on the specific 2x claim, is that naive conversions between metrics of calibration and deferential weightings are a bad idea, and that a good way to avoid naive conversions is to care a lot more about innovative thinking than calibration when deferring.

comment by Rohin Shah (rohinmshah) · 2022-06-22T08:46:32.631Z · EA(p) · GW(p)

Like, suppose you think that Eliezer's credences on his biggest claims are literally 2x higher than they should be, even for claims where he's 90% confident.

I think differences between Eliezer + my views often make way more than a 2x difference to the bottom line. I'm not sure why you're only considering probabilities on specific claims; when I think of "deferring" I also imagine deferring on estimates of usefulness of various actions, which can much more easily have OOMs of difference.

(Fwiw I also think Eliezer is way more than 2x too high for probabilities on many claims, though I don't think that matters much for my point.)

Taking my examples:

should funders reallocate nearly all biosecurity money to AI?

Since Eliezer thinks something like 99.99% chance of doom from AI, that reduces cost effectiveness of all x-risk-targeted biosecurity work by a factor of 10,000x (since only in 1 in 10,000 worlds does the reduced bio x-risk matter at all), whereas if you have < 50% chance of doom from AI (as I do) then that's a discount factor of < 2x on x-risk-targeted biosecurity work. So that's almost 4 OOMs of difference.
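
To make the arithmetic explicit, here is a minimal sketch of that calculation. The function name `survival_discount` is made up for illustration, and it assumes x-risk-targeted biosecurity work only pays off in worlds where AI doom does not happen.

```python
import math

def survival_discount(p_doom):
    """Factor by which work that only matters in non-doom worlds is discounted."""
    return 1.0 / (1.0 - p_doom)

# 99.99% doom: the work only matters in about 1 in 10,000 worlds.
eliezer_discount = survival_discount(0.9999)

# 50% doom is the upper bound for "< 50%", so the discount is at most 2x.
rohin_discount_bound = survival_discount(0.5)

# Gap between the two discounts, in orders of magnitude.
gap_ooms = math.log10(eliezer_discount / rohin_discount_bound)
print(f"{eliezer_discount:,.0f}x vs <= {rohin_discount_bound:.0f}x, gap of about {gap_ooms:.1f} OOMs")
```

The gap comes out a little under 4 OOMs, and it only grows as the second p(doom) drops further below 50%.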

What should AI-focused community builders provide as starting resources?

Eliezer seems very confident that a lot of existing alignment work is useless. So if you imagine taking a representative set of such papers as starting resources, I'd imagine that Eliezer would be at < 1% on "this will help the person become an effective alignment researcher" whereas I'd be at > 50% (for actual probabilities I'd want a better operationalization), leading to a >50x difference in cost effectiveness.

(And if you compare against the set of readings Eliezer would choose, I'd imagine the difference becomes even greater -- I could imagine we'd each think the other's choice would be net negative.)

Should there be an organization dedicated to solving Eliezer's health problems? What should its budget be?

I don't have a citation but I'm guessing that Eliezer thinks that with more energy and project management skills he could make a significant dent in x-risk (perhaps 10 percentage points), while thinking the rest of the alignment field if fully funded can't make a dent of more than 0.01 percentage points, suggesting that "improve Eliezer's health + project management skills" is 3 OOM more important than "all other alignment work" (saying nothing about tractability, which I don't know enough to evaluate). Whereas I'd have it at, idk, 1-2 OOM less important, for a difference of 4-5 OOMs.

Should people try to solve technical AI alignment or try to, idk, create a culture of secrecy within AGI labs?

This one is harder to make up numbers for but intuitively it seems like there should again be many OOMs of difference, primarily because we differ by many OOMs on "regular EAs trying to solve technical AI alignment" but roughly agree on the value of "culture of secrecy".


I realize I haven't engaged with the abstract points you made. I think I mostly just don't understand them and currently they feel like they have to be wrong given the obvious OOMs of difference in all of the examples I gave. If you still disagree it would be great if you could explain how your abstract points play out in some of my concrete examples.

Replies from: RobBensinger, richard_ngo
comment by RobBensinger · 2022-06-23T04:51:54.692Z · EA(p) · GW(p)

Since Eliezer thinks something like 99.99% chance of doom from AI

I could be wrong, but I'd guess Eliezer's all-things-considered p(doom) is less extreme than that.

Replies from: richard_ngo
comment by richard_ngo · 2022-06-23T05:52:34.179Z · EA(p) · GW(p)

Yeah, I'm gonna ballpark guess he's around 95%? I think the problem is that he cites numbers like 99.99% when talking about the chance of doom "without miracles", which in his parlance means assuming that his claims are never overly pessimistic. Which seems like wildly bad epistemic practice. So then it goes down if you account for that, and then maybe it goes down even further if he adjusts for the possibility that other people are more correct than him overall (although I'm not sure that's a mental move he does at all, or would ever report on if he did).

Replies from: rohinmshah
comment by Rohin Shah (rohinmshah) · 2022-06-23T10:04:50.328Z · EA(p) · GW(p)

Even at 95% you get OOMs of difference by my calculations, though significantly fewer OOMs, so this doesn't seem like the main crux.

comment by richard_ngo · 2022-06-23T05:45:16.250Z · EA(p) · GW(p)

We both agree that you shouldn't defer to Eliezer's literal credences, because we both think he's systematically overconfident. The debate is between two responses to that:

a) Give him less deference weight than the cautious, sober, AI safety people who make few novel claims but are better-calibrated (which is what Ben advocates).

b) Try to adjust for his overconfidence and then give significant deference weight to a version of his worldview that isn't overconfident.

I say you should do the latter, because you should be deferring to coherent worldviews (which are rare) rather than deferring on a question-by-question basis. This becomes more and more true the more complex the decisions you have to make. Even for your (pretty simple) examples, the type of deference you seem to be advocating doesn't make much sense.

For instance:

should funders reallocate nearly all biosecurity money to AI?

It doesn't make sense to defer to Eliezer's estimate of the relative importance of AI without also accounting for his estimate of the relative tractability of funding AI, which I infer he thinks is very low.

Should there be an organization dedicated to solving Eliezer's health problems? What should its budget be?

I'm guessing that Eliezer thinks that with more energy and project management skills he could make a significant dent in x-risk (perhaps 10 percentage points), while thinking the rest of the alignment field if fully funded can't make a dent of more than 0.01 percentage points, suggesting that "improve Eliezer's health + project management skills" is 3 OOM more important than "all other alignment work" (saying nothing about tractability, which I don't know enough to evaluate). Whereas I'd have it at, idk, 1-2 OOM less important, for a difference of 4-5 OOMs.

Again, the problem is that you're deferring on a question-by-question basis, without considering the correlations between different questions - in this case, the likelihood that Eliezer is right, and the value of his work. (Also, the numbers seem pretty wild; maybe a bit uncharitable to ascribe to Eliezer the view that his research would be 3 OOM more valuable than the rest of the field combined? His tone is strong but I don't think he's ever made a claim that big.)

Here's an alternative calculation which takes into account that correlation. I claim that the value of this organization is mainly determined by the likelihood that Eliezer is correct about a few key claims which underlie his research agenda. Suppose he thinks that's 90% likely and I think that's 10% likely. Then if our choices are "defer entirely to Eliezer" or "defer entirely to Richard", there's a 9x difference in funding efficacy. In practice, though, the actual disagreement here is between "defer to Eliezer no more than a median AI safety researcher" and something like "assume Eliezer is, say, 2x overconfident and then give calibrated-Eliezer, say, 30%ish of your deference weight". If we assume for the sake of simplicity that every other AI safety researcher has my worldview, then the practical difference here is something like a 2x difference in this org's efficacy (0.1 vs 0.3*0.9*0.5+0.7*0.1). Which is pretty low!
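
Spelled out as a rough sketch (the function and variable names are illustrative, and the 2x overconfidence discount and 30% deference weight are just the example numbers from the paragraph above):

```python
def blended_credence(weight_on_eliezer, eliezer_credence, overconfidence, baseline_credence):
    """Mix a calibration-adjusted Eliezer credence with everyone else's credence."""
    calibrated = eliezer_credence / overconfidence
    return weight_on_eliezer * calibrated + (1.0 - weight_on_eliezer) * baseline_credence

baseline = 0.1           # "defer to Eliezer no more than a median researcher"
full_deference = 0.9     # "defer entirely to Eliezer"

# 0.3 * (0.9 * 0.5) + 0.7 * 0.1
blended = blended_credence(0.3, 0.9, 2.0, baseline)

extreme_ratio = full_deference / baseline   # the 9x gap between the two extremes
practical_ratio = blended / baseline        # the roughly 2x gap in practice
print(f"extremes: {extreme_ratio:.0f}x, practical: {practical_ratio:.2f}x")
```

The blended credence works out to 0.205, so the practical gap is about 2x rather than 9x.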

Won't go through the other examples but hopefully that conveys the idea. The basic problem here, I think, is that the implicit "deference model" that you and Ben are using doesn't actually work (even for very simple examples like the ones you gave).

Replies from: richard_ngo, rohinmshah
comment by richard_ngo · 2022-06-23T06:24:15.140Z · EA(p) · GW(p)

Musing out loud: I don't know of any complete model of deference which doesn't run into weird issues, like the conclusion that you should never trust yourself. But suppose you have some kind of epistemic parliament where you give your own views some number of votes, and assign the rest of the votes to other people in proportion to how defer-worthy they seem. Then you need to make a bunch of decisions, and your epistemic parliament keeps voting on what will best achieve your (fixed) goals.

If you do naive question-by-question majority voting on each question simultaneously then you can end up with an arbitrarily incoherent policy - i.e. a set of decisions that's inconsistent with each other. And if you make the decisions in some order, with the constraint that they each have to be consistent with all prior decisions, then the ordering of the decisions can become arbitrarily important.
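
A toy illustration of the incoherence, using a made-up three-member parliament (this is the standard "discursive dilemma" setup, not anything specific to this thread): each member is individually consistent, but majority voting on each question separately endorses both premises while rejecting their conjunction.

```python
# Hypothetical members and votes, chosen only to exhibit the problem.
votes = {
    "member_a": {"p": True,  "q": True},
    "member_b": {"p": True,  "q": False},
    "member_c": {"p": False, "q": True},
}

def majority(values):
    """True if strictly more than half of the votes are True."""
    values = list(values)
    return sum(values) > len(values) / 2

# Vote on each question separately.
p_passes = majority(v["p"] for v in votes.values())
q_passes = majority(v["q"] for v in votes.values())

# Vote on the combined decision, where each member applies their own logic.
both_pass = majority(v["p"] and v["q"] for v in votes.values())

# The parliament accepts p, accepts q, and rejects "p and q".
incoherent = (p_passes and q_passes) != both_pass
print(p_passes, q_passes, both_pass, incoherent)
```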

Instead, you want your parliament to negotiate some more coherent joint policy to follow. And I expect that in this joint policy, each worldview gets its way on the questions that are most important to it, and cedes responsibility on the questions that are least important. So Eliezer's worldview doesn't end up reallocating all the biosecurity money, but it does get a share of curriculum time (at least for the most promising potential researchers). But in general how to conduct those negotiations is an unsolved problem (and pretty plausibly unsolvable).

comment by Rohin Shah (rohinmshah) · 2022-06-23T10:41:35.505Z · EA(p) · GW(p)

It doesn't make sense to defer to Eliezer's estimate of the relative importance of AI without also accounting for his estimate of the relative tractability of funding AI, which I infer he thinks is very low.

There's lots of things you can do under Eliezer's worldview that add dignity points, like paying relevant people millions of dollars to spend a week really engaging with the arguments, or trying to get whole-brain emulation before AGI. My understanding is that he doesn't expect those sorts of things to happen.

I claim that the value of this organization is mainly determined by the likelihood that Eliezer is correct about a few key claims which underlie his research agenda. Suppose he thinks that's 90% likely and I think that's 10% likely.

This seems like a crazy way to do cost-effectiveness analyses.

Like, if I were comparing deworming to GiveDirectly, would I be saying "well, the value of deworming is mainly determined by the likelihood that the pro-deworming people are right, which I estimate is 70% but you estimate is 50%, so there's only a 1.4x difference"? Something has clearly gone wrong here.

It also feels like this reasoning implies that no EA action can be > 10x more valuable than any other action that an EA critic thinks is good? Since you assign a 90% chance that the EA is right, and the critic thinks there's a 10% chance of that, so there's only a 9x gap? And then once you do all of your adjustments it's only 2x? Why do we even bother with cause prioritization under this worldview?

I don't have a fleshed out theory of how and when to defer, but I feel pretty confident that even our intuitive pretheoretic deference should not be this sort of thing, and should be the sort of thing that can have orders of magnitude of difference between actions.

(One major thing is that I think you should be comparing between two actions, rather than evaluating an action by itself, which is why I compared to "all other alignment work".)

The debate is between two responses to that:

a) Give him less deference weight than the cautious, sober, AI safety people who make few novel claims but are better-calibrated (which is what Ben advocates).

b) Try to adjust for his overconfidence and then give significant deference weight to a version of his worldview that isn't overconfident.

I don't see why you are not including "c) give significant deference weight to his actual worldview", which is what I'd be inclined to do if I didn't have significant AI expertise myself and so was trying to defer.

(Aside: note that Ben said "they shouldn’t defer to him more than they would defer to anyone else who seems smart and has spent a reasonable amount of time thinking about AI risk", which is slightly different from your rephrasing, but that's a nitpick)

Also, the numbers seem pretty wild; maybe a bit uncharitable to ascribe to Eliezer the view that his research would be 3 OOM more valuable than the rest of the field combined?

¯\_(ツ)_/¯ Both the 10% and 0.01% (= 100% - 99.99%) numbers are ones I've heard reported (though both second-hand, not directly from Eliezer), and it also seems consistent with other things he writes. It seems entirely plausible that people misspoke or misremembered or lied, or that Eliezer was reporting probabilities "excluding miracles" or something else that makes these not the right numbers to use.

I'm not trying to be "charitable" to Eliezer, I'm trying to predict his views accurately (while noting that often people predict views inaccurately by failing to be sufficiently charitable). Usually when I see people say things like "obviously Eliezer meant this more normal, less crazy thing" they seem to be wrong.

Rob thinking that it's not actually 99.99% is in fact an update for me.

Replies from: richard_ngo, Verden
comment by richard_ngo · 2022-06-24T00:10:21.628Z · EA(p) · GW(p)

(One major thing is that I think you should be comparing between two actions, rather than evaluating an action by itself, which is why I compared to "all other alignment work".)

IMO the crux is that I disagree with both of these. Instead I think you should use each worldview to calculate a policy, and then generate some kind of compromise between those policies. My arguments above were aiming to establish that this strategy is not very sensitive to exactly how much you defer to Eliezer, because there just aren't very many good worldviews going around - hence why I assign maybe 15 or 20% (inside view) credence to his worldview (updated from 10% above after reflection). (I think my all-things-considered view is similar, actually, because deference to him cancels out against deference to all the people who think he's totally wrong.)

Again, the difference is in large part determined by whether you think you're in a low-dimensional space (here are our two actions, which one should we take?) versus a high-dimensional space (millions of actions available to us, how do we narrow it down?) In a high-dimensional space the tradeoffs between the best ways to generate utility according to Eliezer's worldview and the best ways to generate utility according to other worldviews become much smaller.

This seems like a crazy way to do cost-effectiveness analyses.

Like, if I were comparing deworming to GiveDirectly, would I be saying "well, the value of deworming is mainly determined by the likelihood that the pro-deworming people are right, which I estimate is 70% but you estimate is 50%, so there's only a 1.4x difference"? Something has clearly gone wrong here.

Within a worldview, you can assign EVs which are orders of magnitude different. But once you do worldview diversification, if a given worldview gets even 1% of my resources, then in some sense I'm acting like that worldview's favored interventions are in a comparable EV ballpark to all the other worldviews' favored interventions. That's a feature not a bug.

It also feels like this reasoning implies that no EA action can be > 10x more valuable than any other action that an EA critic thinks is good? Since you assign a 90% chance that the EA is right, and the critic thinks there's a 10% chance of that, so there's only a 9x gap? And then once you do all of your adjustments it's only 2x? Why do we even bother with cause prioritization under this worldview?

I don't have a fleshed out theory of how and when to defer, but I feel pretty confident that even our intuitive pretheoretic deference should not be this sort of thing, and should be the sort of thing that can have orders of magnitude of difference between actions.

An arbitrary critic typically gets well less than 0.1% of my deference weight on EA topics (otherwise it'd run out fast!) But also see above: because in high-dimensional spaces there are few tradeoffs between different worldviews' favored interventions, changing the weights on different worldviews doesn't typically lead to many OOM changes in how you're acting like you're assigning EVs.

Also, I tend to think of cause prio as trying to integrate multiple worldviews into a single coherent worldview. But with deference you intrinsically can't do that, because the whole point of deference is you don't fully understand their views.

There's lots of things you can do under Eliezer's worldview that add dignity points, like paying relevant people millions of dollars to spend a week really engaging with the arguments, or trying to get whole-brain emulation before AGI. My understanding is that he doesn't expect those sorts of things to happen.

What do you mean "he doesn't expect this sort of thing to happen"? I think I would just straightforwardly endorse doing a bunch of costly things like these that Eliezer's worldview thinks are our best shot, as long as they don't cause much harm according to other worldviews.


I don't see why you are not including "c) give significant deference weight to his actual worldview", which is what I'd be inclined to do if I didn't have significant AI expertise myself and so was trying to defer.

Because neither Ben nor myself was advocating for this.
 

Replies from: rohinmshah, rohinmshah
comment by Rohin Shah (rohinmshah) · 2022-06-24T09:59:56.511Z · EA(p) · GW(p)

Responding to other more minor points:

What do you mean "he doesn't expect this sort of thing to happen"?

I mean that he predicts that these costly actions will not be taken despite seeming good to him.

Because neither Ben nor myself was advocating for this.

I think it's also important to consider Ben's audience. If I were Ben I'd be imagining my main audience to be people who give significant deference weight to Eliezer's actual worldview. If you're going to write a top-level comment arguing against Ben's post it seems pretty important to engage with the kind of deference he's imagining (or argue that no one actually does that kind of deference, or that it's not worth writing to that audience, etc).

(Of course, I could be wrong about who Ben imagines his audience to be.)

comment by Rohin Shah (rohinmshah) · 2022-06-24T09:52:04.979Z · EA(p) · GW(p)

Okay, my new understanding of your view is that you're suggesting that (if one is going to defer) one should:

  1. Identify a panel of people to defer to
  2. Assign them weights based on how good they seem (e.g. track record, quality and novelty of ideas, etc)
  3. Allocate resources to [policies advocated by person X] in proportion to [weight assigned to person X].

I agree that (a) this is a reasonable deference model and (b) under this deference model most of my calculations and questions in this thread don't particularly make sense to think about.

However, I still disagree with the original claim I was disagreeing with:

when it comes to making big-picture forecasts, the main value of deference is in helping us decide which ideas and arguments to take seriously, rather than the specific credences we should place on them, since the space of ideas is so large.

Even in this new deference model, it seems like the specific weights chosen in step 2 are a pretty big deal (which seem like the obvious analogues of "credences", and the sort of thing that Ben's post would influence). If you switch from a weight of 0.5 to a weight of 0.3, that's a reallocation of 20% of your resources, which is pretty large!
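
As a minimal sketch of step 3 and of the weight change just described (the `allocate` helper, the budget of 100 units, and the two-member panel are assumptions for illustration):

```python
def allocate(weights, budget):
    """Split a budget across panel members in proportion to their weights."""
    total = sum(weights.values())
    return {name: budget * w / total for name, w in weights.items()}

budget = 100.0

# Person X starts with half of the deference weight, then drops to 0.3.
before = allocate({"person_x": 0.5, "everyone_else": 0.5}, budget)
after = allocate({"person_x": 0.3, "everyone_else": 0.7}, budget)

# Share of the total budget that moves away from person X's policies.
shift = before["person_x"] - after["person_x"]
print(f"{shift:.0f} of {budget:.0f} units reallocated")
```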

Replies from: richard_ngo
comment by richard_ngo · 2022-06-24T23:03:38.064Z · EA(p) · GW(p)

Yepp, thanks for the clear rephrasing. My original arguments for this view were pretty messy because I didn't have it fully fleshed out in my mind before writing this comment thread, I just had a few underlying intuitions about ways I thought Ben was going wrong.

Upon further reflection I think I'd make two changes to your rephrasing.

First change: in your rephrasing, we assign people weights based on the quality of their beliefs, but then follow their recommended policies. But any given way of measuring the quality of beliefs (in terms of novelty, track record, etc) is only an imperfect proxy for quality of policies. For example, Kurzweil might very presciently predict that compute is the key driver of AI progress, but suppose (for the sake of argument) that the way he does so is by having a worldview in which everything is deterministic, individuals are powerless to affect the future, etc. Then you actually don't want to give many resources to Kurzweil's policies, because Kurzweil might have no idea which policies make any difference.

So I think I want to adjust the rephrasing to say: in principle we should assign people weights based on how well their past recommended policies for someone like you would have worked out, which you can estimate using things like their track record of predictions, novelty of ideas, etc. But notably, the quality of past recommended policies is often not very sensitive to credences! For example, if you think that there's a 50% chance of solving nanotech in a decade, or a 90% chance of solving nanotech in a decade, then you'll probably still recommend working on nanotech (or nanotech safety) either way.

Having said all that, since we only get one rollout, evaluating policies is very high variance. And so looking at other information like reasoning, predictions, credences, etc, helps you distinguish between "good" and "lucky". But fundamentally we should think of these as approximations to policy evaluation, at least if you're assuming that we mostly can't fully evaluate whether their reasons for holding their views are sound.

Second change: what about the case where we don't get to allocate resources, but we have to actually make a set of individual decisions? I think the theoretically correct move here is something like: let policies spend their weight on the domains which they think are most important, and then follow the policy which has spent most weight on that domain.

Some complications:

  • I say "domains" not "decisions" because you don't want to make a series of related decisions which are each decided by a different policy, that seems incoherent (especially if policies are reasoning adversarially about how to undermine each other's actions).
  • More generally, this procedure could in theory be sensitive to bargaining and negotiating dynamics between different policies, and also the structure of the voting system (e.g. which decisions are voted on first, etc). I think we can just resolve to ignore those and do fine, but in principle I expect it gets pretty funky.
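
A minimal sketch of the "spend weight on domains" procedure described above, under simplifying assumptions that are mine rather than the comment's: each policy splits its total weight across domains in proportion to how important it thinks each domain is, and there is no bargaining between policies. The function name, policy names, and domain names are all hypothetical.

```python
from collections import defaultdict

def choose_policy_per_domain(weights, importance):
    """Pick, for each domain, the policy that spent the most weight on it.

    weights:    policy name -> total weight assigned to that policy
    importance: policy name -> {domain: relative importance, per that policy}
    Assumption: a policy spends its weight proportionally to importance.
    """
    spend = defaultdict(dict)
    for policy, total in weights.items():
        scores = importance.get(policy, {})
        norm = sum(scores.values())
        if norm == 0:
            continue  # a fully indifferent policy spends nothing anywhere
        for domain, score in scores.items():
            spend[domain][policy] = total * score / norm
    # Ties are broken arbitrarily (first policy encountered wins).
    return {domain: max(bids, key=bids.get) for domain, bids in spend.items()}

example = choose_policy_per_domain(
    weights={"policy_a": 0.6, "policy_b": 0.4},
    importance={
        "policy_a": {"timelines": 1, "governance": 1},  # cares about both equally
        "policy_b": {"governance": 1},                  # cares only about governance
    },
)
```

Here `policy_a` has more total weight but spreads it over two domains, so `policy_b` ends up deciding the one domain it concentrates on. This toy version ignores the negotiation and vote-ordering effects mentioned in the complications.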

Lastly, two meta-level notes:

  • I feel like I've probably just reformulated some kind of reinforcement learning. Specifically the case where you have a fixed class of policies and no knowledge of how they relate to each other, so you can only learn how much to upweight each policy. And then the best policy is not actually in your hypothesis space, but you can learn a simple meta-policy of when to use each existing policy.
  • It's very ironic that in order to figure out how much to defer to Yudkowsky we need to invent a theory of idealised cooperative decision-making. Since he's probably the person whose thoughts on that I trust most, I guess we should meta-defer to him about what that will look like...
Replies from: rohinmshah
comment by Rohin Shah (rohinmshah) · 2022-06-25T09:07:54.243Z · EA(p) · GW(p)

First change:

In your Kurzweil example I think the issue is not that you assigned weights based on hypothetical-Kurzweil's beliefs, but that hypothetical-Kurzweil is completely indifferent over policies. I think the natural fix is "moral parliament" style decision making where the weights can still come from beliefs but they now apply more to preferences-over-policies. In your example hypothetical-Kurzweil has a lot of weight but never has any preferences-over-policies so doesn't end up influencing your decisions at all.

That being said, I agree that if you can evaluate quality of past recommended policies well, without a ton of noise, that would be a better signal than accuracy of beliefs. This just seems extremely hard to do, especially given the selection bias in who comes to your attention in the first place, and idk how I'd do it for Eliezer in any sane way. (Whereas you get to see people state many more beliefs and so there are a lot more data points that you can evaluate if you look at beliefs.)

But notably, the quality of past recommended policies is often not very sensitive to credences!

I think you're thinking way too much about credences-in-particular. The relevant notion is not "credences", it's that-which-determines-how-much-influence-the-person-has-over-your-actions. In this model of deference the relevant notion is the weights assigned in step 2 (however you calculate them), and the message of Ben's post would be "I think people assign too high a weight to Eliezer", rather than anything about credences. I don't think either Ben or I care particularly much about credences-based-on-deference except inasmuch as they affect your actions.

I do agree that Ben's post looks at credences that Eliezer has given and considers those to be relevant evidence for computing what weight to assign Eliezer. You could take a strong stand against using people's credences or beliefs to compute weights, but that is at least a pretty controversial take (that I personally don't agree with), and it seems different from what you've been arguing so far (except possibly in the parent comment).

Second change:

This change seems fine. Personally I'm pretty happy with a rough heuristic of "here's how I should be splitting my resources across worldviews" and then going off of intuitive "how much does this worldview care about this decision" + intuitive trading between worldviews rather than something more fleshed out and formal but that seems mostly a matter of taste.

Replies from: richard_ngo
comment by richard_ngo · 2022-06-28T01:18:10.873Z · EA(p) · GW(p)

In your Kurzweil example I think the issue is not that you assigned weights based on hypothetical-Kurzweil's beliefs, but that hypothetical-Kurzweil is completely indifferent over policies.

Your procedure is non-robust in the sense that, if Kurzweil transitions from total indifference to thinking that one policy is better by epsilon, he'll throw his full weight behind that policy. Hmm, but then in a parliamentary approach I guess that if there are a few different things he cares epsilon about, then other policies could negotiate to give him influence only over the things they don't care about themselves. Weighting by hypothetical-past-impact still seems a bit more elegant, but maybe it washes out.

(If we want to be really on-policy then I guess the thing which we should be evaluating is whether the person's worldview would have had good consequences when added to our previous mix of worldviews. And one algorithm for this is assigning policies weights by starting off from a state where they don't know anything about the world, then letting them bet on all your knowledge about the past (where the amount they win on bets is determined not just by how correct they are, but also how much they disagree with other policies). But this seems way too complicated to be helpful in practice.)
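
One possible reading of the betting idea above, as a rough sketch rather than the procedure the comment has in mind: treat each worldview as a bettor whose stake is its current weight, and rescale weights by how much probability each worldview put on what actually happened, relative to the weighted consensus. The function and worldview names are hypothetical, and the multiplicative update is my assumption.

```python
def update_weights(weights, predictions, outcome):
    """Rescale worldview weights after one resolved yes/no question.

    weights:     worldview -> current weight (assumed to sum to 1)
    predictions: worldview -> probability it assigned to "yes"
    outcome:     True if the answer was "yes", else False
    """
    likelihood = {
        name: (p if outcome else 1.0 - p) for name, p in predictions.items()
    }
    # The weighted consensus probability of what actually happened.
    consensus = sum(weights[name] * likelihood[name] for name in weights)
    if consensus == 0:
        raise ValueError("every worldview assigned zero probability to the outcome")
    return {name: weights[name] * likelihood[name] / consensus for name in weights}

start = {"worldview_a": 0.5, "worldview_b": 0.5}
after_disagreement = update_weights(
    start, {"worldview_a": 0.9, "worldview_b": 0.5}, outcome=True
)
after_agreement = update_weights(
    start, {"worldview_a": 0.8, "worldview_b": 0.8}, outcome=True
)
```

In this sketch a worldview only gains weight when it was more right than the consensus; when all worldviews give the same probability, being correct changes nothing, which matches the point that winnings depend on disagreement as well as accuracy.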

I agree that if you can evaluate quality of past recommended policies well, without a ton of noise, that would be a better signal than accuracy of beliefs. This just seems extremely hard to do, especially given the selection bias in who comes to your attention in the first place, and idk how I'd do it for Eliezer in any sane way.

I think I'm happy with people spending a bunch of time evaluating accuracy of beliefs, as long as they keep in mind that this is a proxy for quality of recommended policies. Which I claim is an accurate description of what I was doing, and what Ben wasn't: e.g. when I say that credences matter less than coherence of worldviews, that's because the latter is crucial for designing good policies, whereas the former might not be; and when I say that all-things-considered estimates of things like "total risk level" aren't very important, that's because in principle we should be aggregating policies not risk estimates between worldviews.

I also agree that selection bias could be a big problem; again, I think that the best strategy here is something like "do the standard things while remembering what's a proxy for what".

Replies from: rohinmshah
comment by Rohin Shah (rohinmshah) · 2022-06-28T08:45:14.655Z · EA(p) · GW(p)

Meta: This comment (and some previous ones) get a bunch into "what should deference look like", which is interesting, but I'll note that most of this seems unrelated to my original claim, which was just "deference* seems important for people making decisions now, even if it isn't very important in practice for researchers", in contradiction to a sentence in your top-level comment. Do you now agree with that claim?

*Here I mean deference in the sense of how-much-influence-various-experts-have-over-your-actions. I initially called this "credences" because I thought you were imagining a model of deference in which literal credences determined how much influence experts had over your actions.

Your procedure is non-robust in the sense that, if Kurzweil transitions from total indifference to thinking that one policy is better by epsilon, he'll throw his full weight behind that policy.

Agreed, but I'm not too worried about that. It seems like you'll necessarily have some edge cases like this; I'd want to see an argument that the edge cases would be common before I switch to something else.

The chain of approximations could look something like:

  1. The correct thing to do is to consider all actions / policies and execute the one with the highest expected impact.
  2. First approximation: Since there are so many actions / policies, it would take too long to do this well, and so we instead take a shortcut and consider only those actions / policies that more experienced people have thought of, and execute the ones with the highest expected impact. (I'm assuming for now that you're not in the business of coming up with new ideas of things to do.)
  3. Second approximation: Actually it's still pretty hard to evaluate the expected impact of the restricted set of actions / policies, so we'll instead do the ones that the experts say are highest impact. Since the experts disagree, we'll divide our resources amongst them, in accordance with our predictions of which experts have highest expected impact across their portfolios of actions. (This is assuming a large enough pile of resources that it makes sense to diversify due to diminishing marginal returns for any one expert.)
  4. Third approximation: Actually expected impact of an expert's portfolio of actions is still pretty hard to assess, we can save ourselves decision time by choosing weights for the portfolios according to some proxy that's easier to assess.
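
The second and third approximations above amount to splitting a resource pool in proportion to per-expert weights. A minimal sketch, assuming a simple proportional split (the function name, expert names, and numbers are hypothetical):

```python
def split_resources(total, weights):
    """Divide `total` among experts in proportion to their weights."""
    norm = sum(weights.values())
    if norm <= 0:
        raise ValueError("weights must sum to a positive number")
    return {expert: total * w / norm for expert, w in weights.items()}

# Lowering one expert's weight from 0.5 to 0.3 moves 20% of the pool.
before = split_resources(100.0, {"expert_a": 0.5, "expert_b": 0.5})
after = split_resources(100.0, {"expert_a": 0.3, "expert_b": 0.7})
shift = before["expert_a"] - after["expert_a"]
```

The point of the sketch is only that the chosen weights translate directly into allocations, so even a moderate change in weight is a sizeable change in what you actually do.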

It seems like right now we're disagreeing about proxies we could use in the third approximation. It seems to me like proxies should be evaluated based on how closely they track the desired metric (expected future impact) in realistic use cases, which would involve both (1) how closely they align with "expected future impact" in general and (2) how easy they are to evaluate. It seems to me like you're thinking mostly of (1) and not (2) and this seems weird to me; if you were going to ignore (2) you should just choose "expected future impact". Anyway, individual proxies and my thoughts on them:

  1. Beliefs / credences: 5/10 on easy to evaluate (e.g. Ben could write this post). 3/10 on correlation with expected future impact. Doesn't take into account how much impact experts think their policies could have (e.g. the Kurzweil example above).
  2. Coherence: 3/10 on easy to evaluate (seems hard to do this without being an expert in the field). 2/10 on correlation with expected future impact (it's not that hard to have wrong coherent worldviews, see e.g. many pop sci books).
  3. Hypothetical impact of past policies: 1/10 on easy to evaluate (though it depends on the domain). 7/10 on correlation with expected future impact (it's not 9/10 or 10/10 because selection bias seems very hard to account for).

As is almost always the case with proxies, I would usually use an intuitive combination of all the available proxies, because that seems way more robust than relying on any single one. I am not advocating for only relying on beliefs.

Which I claim is an accurate description of what I was doing, and what Ben wasn't

I get the sense that you think I'm trying to defend "this is a good post and has no problems whatsoever"? (If so, that's not what I said.)

Summarizing my main claims about this deference model that you might disagree with:

  1. In practice, an expert's beliefs / credences will be relevant information for deciding what weight to assign them.
  2. Ben's post provides relevant information about Eliezer's beliefs (note this is not taking a stand on other aspects of the post, e.g. the claim about how much people should defer to Eliezer).
  3. The weights assigned to experts are important / valuable to people who need to make decisions now (but they are usually not very important / valuable to researchers).
comment by Verden · 2022-06-23T13:41:19.515Z · EA(p) · GW(p)

Rob thinking that it's not actually 99.99% is in fact an update for me.

This survey [LW(p) · GW(p)] suggests that he was at 96-98% a year ago.

Replies from: RobBensinger
comment by RobBensinger · 2022-06-23T17:58:26.269Z · EA(p) · GW(p)

Why do you think it suggests that? There are two MIRI responses in that range, but responses are anonymous, and most MIRI staff didn't answer the survey.

Replies from: Verden
comment by Verden · 2022-06-23T19:02:28.106Z · EA(p) · GW(p)

I should have clarified that I think (or at least I thought so, prior to your question; kind of confused now) Yudkowsky's answer is probably one of those two MIRI responses. Sorry about that.

I recall you or somebody else at MIRI once wrote something along the lines that most MIRI researchers don't actually believe that p(doom) is extremely high, like >90% doom. Then, in the linked post, there is a comment from someone who marked themselves both as a technical safety and strategy researcher and who gave 0.98, 0.96 on your questions. The style/content of the comment struck me as something Yudkowsky would have written.

Replies from: RobBensinger
comment by RobBensinger · 2022-06-24T03:06:45.151Z · EA(p) · GW(p)

Cool! I figured your reasoning was probably something along those lines, but I wanted to clarify that the survey is anonymous and hear your reasoning. I personally don't know who wrote the response you're talking about, and I'm very uncertain how many researchers at MIRI have 90+% p(doom), since only five MIRI researchers answered the survey (and marked that they're from MIRI).

comment by kokotajlod · 2022-06-20T04:23:03.977Z · EA(p) · GW(p)

Beat me to it & said it better than I could. 

My now-obsolete draft comment was going to say:

It seems to me that between about 2004 and 2014, Yudkowsky was the best person in the world to listen to on the subject of AGI and AI risks. That is, deferring to Yudkowsky would have been a better choice than deferring to literally anyone else in the world. Moreover, after about 2014 Yudkowsky would probably have been in the top 10; if you are going to choose 10 people to split your deference between (which I do not recommend, I recommend thinking for oneself), Yudkowsky should be one of those people and had you dropped Yudkowsky from the list in 2014 you would have missed out on some important stuff. Would you agree with this?

On the positive side, I'd be interested to see a top ten list from you of people you think should be deferred to as much or more than Yudkowsky on matters of AGI and AI risks.*

*What do I mean by this? Idk, here's a partial operationalization: Timelines, takeoff speeds, technical AI alignment, and p(doom).

[ETA: lest people write me off as a Yudkowsky fanboy, I wish to emphasize that I too think people are overindexing on Yudkowsky's views, I too think there are a bunch of people who defer to him too much, I too think he is often overconfident, wrong about various things, etc.]

[ETA: OK, I guess I think Bostrom probably was actually slightly better than Yudkowsky even on 20-year timespan.]

[ETA: I wish to reemphasize, but more strongly, that Yudkowsky seems pretty overconfident not just now but historically. Anyone deferring to him should keep this in mind; maybe directly update towards his credences but don't adopt his credences. E.g. think "we're probably doomed" but not "99% chance of doom". Also, Yudkowsky doesn't seem to be listening to others and understanding their positions well. So his criticisms of other views should be listened to but not deferred to, IMO.]

Replies from: Habryka
comment by Habryka · 2022-06-20T05:45:57.156Z · EA(p) · GW(p)

Didn't you post that comment right here [EA(p) · GW(p)]? 

Replies from: kokotajlod
comment by kokotajlod · 2022-06-20T16:05:46.146Z · EA(p) · GW(p)

Oops! Dunno what happened, I thought it was not yet posted. (I thought I had posted it at first, but then I looked for it and didn't see it & instead saw the unposted draft, but while I was looking for it I saw Richard's post... I guess it must have been some sort of issue with having multiple tabs open. I'll delete the other version.)

comment by Denis Drescher (Telofy) · 2022-06-21T11:29:00.001Z · EA(p) · GW(p)

I've nevertheless downvoted this post because it seems like it's making claims that are significantly too strong, based on a methodology that I strongly disendorse.

 

I agree, and I’m a bit confused that the top-level post does not violate forum rules in its current form. There is a version of the post – rephrased and reframed – that I think would be perfectly fine even though I would still disagree with it.

And I say that as someone who loved Paul’s response to Eliezer’s list [LW · GW]!

Separately, my takeaway from Ben’s 80k interview has been that I think that Eliezer’s take on AI risk is much more truth-tracking than Ben’s. To improve my understanding, I would turn to Paul and ARC’s writings rather than Eliezer and MIRI’s, but Eliezer’s takes are still up there among the most plausible ones in my mind.

I suspect that the motivation for this post comes from a place that I would find epistemically untenable and that bears little semblance to the sophisticated disagreement between Eliezer and Paul. But I’m worried that a reader may come away with the impression that Ben and Paul fall into one camp and Eliezer into another on AI risk when really Paul agrees with Eliezer on many points when it comes to the importance and urgency of AI safety (see the list of agreements at the top of Paul’s post).

Replies from: Stefan_Schubert
comment by Stefan_Schubert · 2022-06-21T11:46:55.359Z · EA(p) · GW(p)

I agree, and I’m a bit confused that the top-level post does not violate forum rules in its current form. 

That seems like a considerable overstatement to me. I think it would be bad if the forum rules said an article like this couldn't be posted.

Replies from: Telofy
comment by Denis Drescher (Telofy) · 2022-06-21T13:01:10.413Z · EA(p) · GW(p)

Maybe, but I find it important to maintain the sort of culture where one can be confidently wrong about something without fear that it’ll cause people to interpret all future arguments only in light of that mistake instead of taking them at face value and evaluating them for their own merit.

The sort of entrepreneurialness that I still feel is somewhat lacking in EA requires committing a lot of time to a speculative idea on the off-chance that it is correct. If it is not, the entrepreneur has wasted a lot of time and usually money. If additionally it has the social cost that they can't try again because people will dismiss them because of that past failure, it makes it just so much less likely still that anyone will try in the first place.

Of course that’s not the status quo. I just really don’t want EA to move in that direction.

Replies from: Stefan_Schubert
comment by Stefan_Schubert · 2022-06-21T13:13:34.995Z · EA(p) · GW(p)

If anything, I think that prohibiting posts like this from being published would have a more detrimental effect on community culture.

Of course, people are welcome to criticise Ben's post - which some in fact do. That's a very different category from prohibition.

Replies from: Telofy
comment by Denis Drescher (Telofy) · 2022-06-21T13:49:28.534Z · EA(p) · GW(p)

Yeah, that sounds perfectly plausible to me.

“A bit confused” wasn’t meant to be any sort of rhetorical pretend understatement or something. I really just felt a slight surprise that caused me to check whether the forum rules contain something about ad hom, and found that they don’t. It may well be the right call on balance. I trust the forum team on that.

comment by Ben Garfinkel (bmg) · 2022-06-21T22:41:17.841Z · EA(p) · GW(p)

I really appreciate the time people have taken to engage with this post (and actually hope the attention cost hasn’t been too significant). I decided to write some post-discussion reflections on what I think this post got right and wrong.

The reflections became unreasonably long - and almost certainly should be edited down - but I’m posting them here in a hopefully skim-friendly format. They cover what I see as some mistakes with the post, first, and then cover some views I stand by.

Things I would do differently in a second version of the post:

1. I would either drop the overall claim about how much people should defer to Yudkowsky — or defend it more explicitly

At the start of the post, I highlight the two obvious reasons to give Yudkowsky's risk estimates a lot of weight: (a) he's probably thought more about the topic than anyone else and (b) he developed many of the initial AI risk arguments. I acknowledge that many people, justifiably, treat these as important factors when (explicitly or implicitly) deciding how much to defer to Yudkowsky.

Then the post gives some evidence that, at each stage of his career, Yudkowsky has made a dramatic, seemingly overconfident prediction about technological timelines and risks - and at least hasn’t obviously internalised lessons from these apparent mistakes.

The post expresses my view that these two considerations at least counterbalance each other - so that, overall, Yudkowsky's risk estimates shouldn't be given more weight than (e.g.) those of other established alignment researchers or the typical person on the OpenPhil worldview investigation team.

But I don't do a lot in the post to actually explore how we should weigh these factors up. In that sense: I think it’d be fair to regard the post’s central thesis as importantly under-supported by the arguments contained in the post.

I should have either done more to explicitly defend my view or simply framed the post as "some evidence about the reliability of Yudkowsky's risk estimates."

2. I would be clearer about how and why I generated these examples

In hindsight, this is a significant oversight on my part. The process by which I generated these examples is definitely relevant for judging how representative they are - and, therefore, how much to update on them. But I don’t say anything about this in the post. My motives (or at least conscious motives) are also part of the story that I only discuss in pretty high-level terms, but seem like they might be relevant for forming judgments.

For context, then, here was the process:

A few years ago, I tried to get a clearer sense of the intellectual history of the AI risk and existential risk communities. For that reason, I read a bunch of old white papers, blog posts, and mailing list discussions.

These gave me the impression that Yudkowsky’s track record (and - to some extent - the track record of the surrounding community) was worse than I’d realised. From reading old material, I basically formed something like this impression: “At each stage of Yudkowsky’s professional life, his work seems to have been guided by some dramatic and confident belief about technological trajectories and risks. The older beliefs have turned out to be wrong. And the ones that haven’t yet resolved at least seem to have been pretty overconfident in hindsight.”

I kept encountering the idea that Yudkowsky has an exceptionally good track record or that he has an unparalleled ability to think well about AI (he’s also expressed this view himself) - and I kept thinking, basically, that this seemed wrong. I wrote up some initial notes on this discrepancy at some point, but didn’t do anything with them.

I eventually decided to write something public after the “Death with Dignity” post, since the view it expresses (that we’re all virtually certain to die soon) both seems wrong to me and very damaging if it’s actually widely adopted in the community. I also felt like the “Death with Dignity” post was getting more play than it should, simply because people have a strong tendency to give Yudkowsky’s views weight. I can’t imagine a similar post written by someone else having nearly as large of an impact. Notably, since that post didn’t really have substantial arguments in it (although the later one did), I think the fact it had an impact is seemingly a testament to the power of deference; I think it’d be hard to look at the reaction to that post and argue that it’s only Yudkowsky’s arguments (rather than his public beliefs in-and-of-themselves) that have a major impact on the community.

People are obviously pretty aware of Yudkowsky’s positive contributions, but my impression is that (especially) new community members tended not to be aware of negative aspects of his track record. So I wanted to write a post drawing attention to the negative aspects.

I was initially going to have the piece explicitly express the impression I’d formed, which was something like: “At each stage of Yudkowsky’s professional life, his work has been guided by some dramatic and seemingly overconfident belief about technological trajectories and risks.” The examples in the post were meant to map onto the main ‘animating predictions’ about technology he had at each stage of his career. I picked out the examples that immediately came to mind.

Then I realised I wasn’t at all sure I could defend the claim that these were his main ‘animating predictions’ - the category was obviously extremely vague, and the main examples that came to mind were extremely plausibly a biased sample. I thought there was a good chance that if I reflected more, then I’d also want to include various examples that were more positive.

I didn’t want to spend the time doing a thorough accounting exercise, though, so I decided to drop any claim that the examples were representative and just describe them as “cherry-picked” — and add in lots of caveats emphasising that they’re cherry-picked.

(At least, these were my conscious thought processes and motivations as I remember them. I’m sure other factors played a role!)

3. I’d tweak my discussion of take-off speeds

I’d make it clearer that my main claim is: it would have been unreasonable to assign a very high credence to fast take-offs back in (e.g.) the early- or mid-2000s, since the arguments for fast take-offs had significant gaps. For example, there were lots of possible countervailing arguments for slow take-offs that pro-fast-take-off authors simply hadn’t addressed yet - as evidenced, partly, by the later publication of slow-take-off arguments leading a number of people to become significantly more sympathetic to slow take-offs. (I’m not claiming that there’s currently a consensus against fast-take-off views.)

4. I’d add further caveats to the “coherence arguments” case - or simply leave it out

Rohin’s and Oli’s comments under the post have made me aware that there’s a more positive way to interpret Yudkowsky’s use of coherence arguments. I’m not sure if that interpretation is correct, or if it would actually totally undermine the example, but this is at minimum something I hadn’t reflected on. I think it’s totally possible that further reflection would lead me to simply remove the example.

Positions I stand by:

On the flipside, here’s a set of points I still stand by:

1. If a lot of people in the community believe AI is probably going to kill everyone soon, then (if they’re wrong) this can have really important negative effects

In terms of prioritisation: My prediction is that if you were to ask different funders, career advisors, and people making career decisions (e.g. deciding whether to go into AI policy or bio policy) how much they value having a good estimate of AI risk, they’ll very often answer that they value it a great deal. I do think that over-estimating the level of risk could lead to concretely worse decisions.

In terms of community health: I think that believing you’re probably going to die soon is probably bad for a large portion of people. Reputationally: Being perceived as believing that everyone is probably going to die soon (particularly if this is actually an excessive level of worry) also seems damaging.

I think we should also take seriously the tail-risk that at least one person with doomy views (even if they’re not directly connected to the existential risk community) will take dramatic and badly harmful actions on the basis of their views.

2. Directly and indirectly, deference to Yudkowsky has a significant influence on a lot of people’s views

As above: One piece of evidence for this is that Yudkowsky’s “Death with Dignity” post triggered a big reaction, even though it didn’t contain any significant new arguments. I think his beliefs (above and beyond his arguments) clearly do have an impact.

Another reason to believe deference is a factor: I think it’s both natural and rational for people, particularly people new to an area, to defer to people with more expertise in that area.[1] Yudkowsky is one of the most obvious people to defer to, as one of the two people most responsible for developing and popularising AI risk arguments and as someone who has (likely) spent more time thinking about the subject than anyone else.

Beyond that: A lot of people also clearly in general have a huge amount of respect for Yudkowsky, sometimes more than they have for any other public intellectual. I think it’s natural (and sensible) for people’s views to be influenced by the views of the people they respect. In general, I think, unless you have tremendous self-control, this will tend to happen sub-consciously even if you don’t consciously choose to defer to the people you respect.

Also, people sometimes just do talk about Yudkowsky’s track record or reputation as a contributing factor to their views.

3. The track records of influential intellectuals (including Yudkowsky) should be publicly discussed.

A person’s track-record provides evidence about how reliable their predictions are. If people are considering how much to defer to some intellectual, then they should want to know what their track record (at least within the relevant domain) looks like.

The main questions that matter are: What has the intellectual gotten wrong and right? Beyond whether they were wrong or right, about a given case, does it also seem like their predictions were justified? If they’ve made certain kinds of mistakes in the past, do we now have reason to think they won’t repeat those kinds of mistakes?

4. Yudkowsky’s track record suggests a substantial bias toward dramatic and overconfident predictions.

One counter - which I definitely think it’s worth reflecting on - is that it might be possible to generate a similarly bias-suggesting list of examples like this for any other public intellectual or member of the existential risk community.

I’ll focus on one specific comment, suggesting that Yudkowsky’s incorrect predictions about nanotechnology are in the same reference class as ‘writing a typically dumb high school essay.’ The counter goes something like this: Yes, it was possible to find this example from Yudkowsky’s past - but that’s not importantly different than being able to turn up anyone else’s dumb high school essay about (e.g.) nuclear power.

Ultimately, I don’t buy the comparison. I think it’s really out-of-distribution for someone in their late teens and early twenties to pro-actively form the view that an emerging technology is likely to kill everyone within a decade, found an organization and devote years of their professional life to address the risk, and talk about how they’re the only person alive who can stop it.

That just seems very different from writing a dumb high school essay. Much more than a standard dumb high school essay, I think this aspect of Yudkowsky’s track record really does suggest a bias toward dramatic and overconfident predictions. This prediction is also really strikingly analogous to the prediction Yudkowsky is making right now - its relevance is clearly higher than the relevance of (e.g.) a random poorly thought-out view in a high school essay.

(Yudkowsky's early writing and work is also impressive, in certain ways, insofar as it suggests a much higher level of originality of thought and agency than the typical young person has. But the fact that this example is impressive doesn’t undercut, I think, the claim that it’s also highly suggestive of a bias toward highly confident and dramatic predictions.)

5. Being one of the first people to identify, develop, or take seriously some idea doesn’t necessarily mean that your predictions about the idea will be unusually reliable

By analogy:

  • I don’t think we can assume that the first person to take the covid lab leak theory seriously (when others were dismissive) is currently the most reliable predictor of whether the theory is true.

  • I don’t think we can assume that the first person to develop the many worlds theory of quantum mechanics (when others were dismissive) would currently be the best person to predict whether the theory is true, if they were still alive.

There are, certainly, reasons to give pioneers in a domain special weight when weighing expert opinion in that domain.[2] But these reasons aren’t absolute.

There are even reasons that point in the opposite direction: we might worry that the pioneer has an attachment to their theory, so will be biased toward believing it is true and as important as possible. We might also worry that the pioneering-ness of their beliefs is evidence that these beliefs front-ran the evidence and arguments (since one way to be early is to simply be excessively confident). We also have less evidence of their open-mindedness than we do for the people who later on moved toward the pioneer’s views - since moving toward the pioneer’s views, when you were initially dismissive, is at least a bit of evidence for open-mindedness and humility.[3]

Overall, I do think we should tend to defer more to pioneers (all else being equal). But this tendency can definitely be overruled by other evidence and considerations.

6. The causal effects that people have had on the world don’t (in themselves) have implications for how much we should defer to them

At least in expectation, so far, Eliezer Yudkowsky has probably had a very positive impact on the world. There is a plausible case to be made that misaligned AI poses a substantial existential risk - and Yudkowsky’s work has probably, on net, massively increased the number of people thinking about it and taking it seriously. He’s also written essays that have exposed huge numbers of people to other important ideas and helped them to think more clearly. It makes sense for people to applaud all of this.

Still, I don’t think his positive causal effect on the world gives people much additional reason to be deferential to him.

Here’s a dumb thought experiment: Suppose that Yudkowsky wrote all of the same things, but never published them. But suppose, also, that a freak magnetic storm ended up implanting all of the same ideas in his would-be-readers’ brains. Would this absence of a causal effect count against deferring to Yudkowsky? I don’t think so. The only thing that ultimately matters, I think, is his track record of beliefs - and the evidence we currently have about how accurate or justified those beliefs were.

I’m not sure anyone disagrees with the above point, but I did notice there seemed to be a decent amount of discussion in the comments about Yudkowsky’s impact - and I’m not sure I think this issue will ultimately be relevant.[4]


  1. For example: If I had ten hours to form a view about the viability of some application of nanotechnology, I definitely wouldn’t want to ignore the beliefs of people who have already thought about the question. Trying to learn the relevant chemistry and engineering background wouldn’t be a good use of my time. ↩︎

  2. One really basic reason is simply that they’ve had more time to think about certain subjects than anyone else. ↩︎

  3. Here’s a concrete case: Holden Karnofsky eventually moved toward taking AI risks seriously, after publicly being fairly dismissive of it, and then wrote up a document analysing why he was initially dismissive and drawing lessons from the experience. It seems like we could count that as positive evidence about his future judgment. ↩︎

  4. Even though I’ve just said I’m not sure this question is relevant, I do also want to say a little bit about Yudkowsky’s impact. I personally think he's probably had a very significant impact. Nonetheless, I also think the impact can be overstated. For example, I think it’s been suggested that the effective altruism community might not be very familiar with concepts like Bayesianism or the importance of overcoming bias if it weren’t for Yudkowsky’s writing. I don’t really find that particular suggestion plausible.

    Here’s one data point I can offer from my own life: Through a mixture of college classes and other reading, I’m pretty confident I had already encountered the heuristics and biases literature, Bayes’ theorem, Bayesian epistemology, the ethos of working to overcome bias, arguments for the many worlds interpretation, the expected utility framework, population ethics, and a number of other ‘rationalist-associated’ ideas before I engaged with the effective altruism or rationalist communities. For example, my college had classes in probability theory, Bayesian epistemology, and the philosophy of quantum mechanics, and I’d read at least parts of books like Thinking Fast and Slow, the Signal and the Noise, the Logic of Science, and various books associated with the “skeptic community.” (Admittedly, I think it would have been harder to learn some of these things if I’d gone to college a bit earlier or had a different major. I also probably "got lucky" in various ways with the classes I took and books I picked up.) See also Carl Shulman making a similar point [LW · GW] and John Halstead also [EA(p) · GW(p)] briefly commenting on the way in which he personally encountered some of the relevant ideas. ↩︎

Replies from: RobBensinger, Habryka, Owen_Cotton-Barratt, richard_ngo, hibukki, Verden, Eddie K, Dr. David Mathers
comment by RobBensinger · 2022-06-23T03:41:55.696Z · EA(p) · GW(p)

I noted some places I agree with your comment here [EA · GW], Ben. (Along with my overall take on the OP.)

Some additional thoughts:

Notably, since that post didn’t really have substantial arguments in it (although the later one did), I think the fact it had an impact is seemingly a testament to the power of deference

The “death with dignity” post came in the wake of Eliezer writing hundreds of thousands of words about why he thinks alignment is hard in the Late 2021 MIRI Conversations (in addition to the many specific views and arguments about alignment difficulty he’s written up in the preceding 15+ years). So it seems wrong to say that everyone was taking it seriously based on deference alone.

The post also has a lot of content beyond “p(doom) is high”. Indeed, I think the post’s focus (and value-add) is mostly in its discussion of rationalization, premature/excessive conditionalizing, and ethical injunctions, not in the bare assertion that p(doom) is high. Eliezer was already saying pretty similar stuff about p(doom) back in September [LW · GW].

I’d make it clearer that my main claim is: it would have been unreasonable to assign a very high credence to fast take-offs back in (e.g.) the early- or mid-2000s, since the arguments for fast take-offs had significant gaps. For example, there were lots of possible countervailing arguments for slow take-offs that pro-fast-take-off authors simply hadn’t addressed yet - as evidenced, partly, by the later publication of slow-take-off arguments leading a number of people to become significantly more sympathetic to slow take-offs.

I disagree; I think that, e.g., noting how powerful and widely applicable general intelligence has historically been, and noting a bunch of standard examples of how human cognition is a total shitshow, is sufficient to have a very high probability on hard takeoff.

I think the people who updated a bunch toward hard takeoff based on the recent debate were making a mistake, and should have already had a similarly high p(hard takeoff) going back to the Foom debate, if not earlier.

Insofar as others disagree, I obviously think it’s a good thing for people to publish arguments like “but ML might be very competitive”, and for people to publicly respond to them. But I don’t think “but ML might be very competitive” and related arguments ought to look compelling at a glance (given the original simple arguments for hard takeoff), so I don’t think someone should need to consider the newer discussion in order to arrive at a confident hard-takeoff view.

(Also, insofar as Paul recently argued for X and Eliezer responded with a valid counter-argument for Y, it doesn’t follow that Eliezer had never considered anything like X or Y in initially reaching his confidence. Eliezer’s stated view is that the new Paul arguments seem obviously invalid and didn’t update him at all when he read them. Your criticism would make more sense here if Eliezer had said “Ah, that’s an important objection I hadn’t considered; but now that I’m thinking about it, I can generate totally new arguments that deal with the objections, and these new counter-arguments seem correct to me.”)

The main questions that matter are: What has the intellectual gotten wrong and right? Beyond whether they were wrong or right, about a given case, does it also seem like their predictions were justified?

At least as important, IMO, is the visible quality of their reasoning and arguments, and their retrodictions.

AGI, moral philosophy, etc. are not topics where we can observe extremely similar causal processes today and test all the key claims and all the key reasoning heuristics with simple experiments. Tossing out ‘argument evaluation’ and ‘how well does this fit what I already know?’ altogether would mean tossing out the majority of our evidence about how much weight to put on people’s views.

Ultimately, I don’t buy the comparison. I think it’s really out-of-distribution for someone in their late teens and early twenties to pro-actively form the view that an emerging technology is likely to kill everyone within a decade, found an organization and devote years of their professional life to address the risk, and talk about how they’re the only person alive who can stop it.

I take the opposite view on this comparison. I agree that this is really unusual, but I think the comparison is unfavorable to the high school students, rather than unfavorable to Eliezer. Having unusual views and then not acting on them in any way is way worse than actually acting on your predictions.

I agree that Eliezer acting on his beliefs to this degree suggests he was confident; but in a side-by-side comparison of a high schooler who’s expressed equal confidence in some other unusual view, but takes no unusual actions as a result, the high schooler is the one I update negatively about.

(This also connects up to my view that EAs generally are way too timid/passive in their EA activity, don’t start enough new things, and (when they do start new things) start too many things based on ‘what EA leadership tells them’ rather than based on their own models of the world. The problem crippling EA right now is not that we're generating and running with too many wildly different, weird, controversial moonshot ideas. The problem is that we're mostly just passively sitting around, over-investing in relatively low-impact meta-level interventions, and/or hoping that the most mainstream already-established ideas will somehow suffice.)

Replies from: Oliver Sourbut, MichaelStJules
comment by Oliver Sourbut · 2022-06-25T09:08:46.820Z · EA(p) · GW(p)

I just wanted to state agreement that it seems a large number of people largely misread Death with Dignity, at least according to what seems to me the most plausible intended message: mainly about the ethical injunctions (which are very important as a finitely-rational and prone-to-rationalisation being), as Yudkowsky has written of in the past [LW · GW].

The additional detail of 'and by the way this is a bad situation and we are doing badly' is basically modal Yudkowsky schtick and I'm somewhat surprised it updated anyone's beliefs (about Yudkowsky's beliefs, and therefore their all-things-considered-including-deference beliefs).

I think if he had been a little more audience-aware he might have written it differently. Then again maybe not, if the net effect is more attention and investment in AI safety - and more recent posts [LW · GW] and comments [LW(p) · GW(p)] suggest he's more willing than before to use certain persuasive techniques to spur action (which seems potentially misguided to me, though understandable).

comment by MichaelStJules · 2022-06-23T18:29:20.700Z · EA(p) · GW(p)

The “death with dignity” post came in the wake of Eliezer writing hundreds of thousands of words about why he thinks alignment is hard in the Late 2021 MIRI Conversations (in addition to the many specific views and arguments about alignment difficulty he’s written up in the preceding 15+ years). So it seems wrong to say that everyone was taking it seriously based on deference alone.

I think "deference alone" is a stronger claim than the one we should worry about. People might read the arguments on either side (or disproportionately Eliezer's arguments), but then defer largely to Eliezer's weighing of arguments because of his status/position, confidence, references to having complicated internal models (that he often doesn't explain or link explanations to), or emotive writing style.

What share of people with views similar to Eliezer's do you expect to have read these conversations? They're very long, not well organized, and have no summaries/takeaways. The format seems pretty bad if you value your time.

I think the AGI Ruin: A List of Lethalities [LW · GW] post was formatted pretty accessibly, but that came after death with dignity.

Also, insofar as Paul recently argued for X and Eliezer responded with a valid counter-argument for Y, it doesn’t follow that Eliezer had never considered anything like X or Y in initially reaching his confidence. Eliezer’s stated view is that the new Paul arguments seem obviously invalid and didn’t update him at all when he read them.

If the new Paul arguments seem obviously invalid, then Eliezer should be able to explain why in such a way that convinces Paul. Has this generally been the case?

comment by Habryka · 2022-06-23T05:14:48.658Z · EA(p) · GW(p)

I appreciate this update! 

Then the post gives some evidence that, at each stage of his career, Yudkowsky has made a dramatic, seemingly overconfident prediction about technological timelines and risks - and at least hasn’t obviously internalised lessons from these apparent mistakes.

I am confused about you bringing in the claim of "at each stage of his career", given that the only two examples you cited that seemed to provide much evidence here were from the same (and very early) stage of his career. Of course, you might have other points of evidence that point in this direction, but I did want to provide some additional pushback on the "at each stage of his career" point, which I think you didn't really provide evidence for. 

I do think finding evidence for each stage of his career would of course be time-consuming, and I understand that you didn't really want to go through all of that, but it seemed good to point out explicitly. 

Ultimately, I don’t buy the comparison. I think it’s really out-of-distribution for someone in their late teens and early twenties to pro-actively form the view that an emerging technology is likely to kill everyone within a decade, found an organization and devote years of their professional life to address the risk, and talk about how they’re the only person alive who can stop it.

FWIW, indeed in my teens I basically did dedicate a good chunk of my time and effort towards privacy efforts, out of concern about US- and UK-based surveillance states. I was in high-school, so making it my full-time effort was a bit hard, though I did help found a hackerspace in my hometown that had a lot of privacy concerns baked into the culture, and I did write a good number of essays on this. I think the key difference between me and Eliezer here is more the fact that Eliezer was home-schooled and had experience doing things on his own, and not some kind of other fact about his relationship to the ideas being very different. 

It's plausible you should update similarly on me, which I think isn't totally insane (I do think I might have, as Luke put it, the "taking ideas seriously gene" [LW · GW], which I would also associate with taking other ideas to their extremes, like religious beliefs). 

comment by Owen Cotton-Barratt (Owen_Cotton-Barratt) · 2022-06-21T23:51:20.449Z · EA(p) · GW(p)

I really appreciated this update. Mostly it checks out to me, but I wanted to push back on this:

Here’s a dumb thought experiment: Suppose that Yudkowsky wrote all of the same things, but never published them. But suppose, also, that a freak magnetic storm ended up implanting all of the same ideas in his would-be-readers’ brains. Would this absence of a causal effect count against deferring to Yudkowsky? I don’t think so. The only thing that ultimately matters, I think, is his track record of beliefs - and the evidence we currently have about how accurate or justified those beliefs were.

It seems to me that a good part of the beliefs I care about assessing are the beliefs about what is important. When someone has a track record of doing things with big positive impact, that's some real evidence that they have truth-tracking beliefs about what's important. In the hypothetical where Yudkowsky never published his work, I don't get the update that he thought these were important things to publish, so he doesn't get credit for being right about that.

Replies from: hibukki
comment by Yonatan Cale (hibukki) · 2022-06-22T13:30:43.695Z · EA(p) · GW(p)

There's also (imperfect) information in "lots of smart people thought about EY's opinions and agree with him" that you don't get from the freak magnetic storm scenario.

comment by richard_ngo · 2022-06-23T06:41:15.748Z · EA(p) · GW(p)

Thanks for writing this update. I think my number one takeaway here is something like: when writing a piece with the aim of changing community dynamics, it's important to be very clear about motivations and context. E.g. I think a version of the piece which said "I think people are overreacting to Death with Dignity, here are my specific models of where Yudkowsky tends to be overconfident, here are the reasons why I think people aren't taking those into account as much as they should" would have been much more useful and much less controversial than the current piece, which (as I interpret it) essentially pushes a general "take Yudkowsky less seriously" meme (and is thereby intrinsically political/statusy).

comment by Yonatan Cale (hibukki) · 2022-06-22T13:38:25.029Z · EA(p) · GW(p)

I'm a bit confused about a specific small part:

tendency toward expressing dramatic views

I imagine that for many people, including me (including you?), once we work on [what we believe to be] preventing the world from ending, we would only move to another job if it was also preventing the world from ending, probably in an even more important way.

 

In other words, I think "working at a 2nd x-risk job and believing it is very important" is mainly predicted by "working at a 1st x-risk job and believing it is very important", much more than by personality traits.

 

This is almost testable, given we have lots of people working on x-risk today and believing it is very important. But maybe you can easily put your finger on what I'm missing?

comment by Verden · 2022-06-22T00:22:47.065Z · EA(p) · GW(p)

I feel like people are missing one fairly important consideration when discussing how much to defer to Yudkowsky, etc. Namely, I've heard multiple times that Nate Soares, the executive director of MIRI, has models of AI risk that are very similar to Yudkowsky's, and their p(doom) are also roughly the same. My limited impression is that Soares is no less smart or otherwise capable than Yudkowsky. So, when having this kind of discussion, focusing on Yudkowsky's track record or whatever, I think it's good to remember that there's another very smart person, who entered AI safety much later than Yudkowsky, and who holds very similar inside views on AI risk.

Replies from: technicalities
comment by Gavin (technicalities) · 2022-06-22T08:01:47.694Z · EA(p) · GW(p)

This isn't much independent evidence I think: seems unlikely that you could become director of MIRI unless you agreed. (I know that there's a lot of internal disagreement at other levels.)

Replies from: Verden
comment by Verden · 2022-06-22T12:16:31.721Z · EA(p) · GW(p)

My point has little to do with him being the director of MIRI per se. 

I suppose I could be wrong about this, but my impression is that Nate Soares is among the top 10 most talented/insightful people with an elaborate inside view and years of research experience in AI alignment. He also seems to agree with Yudkowsky on a whole lot of issues and predicts about the same p(doom) for about the same reasons. And I feel that many people don't give enough thought to the fact that while e.g. Paul Christiano has interacted a lot with Yudkowsky and disagreed with him on many key issues (while agreeing on many others [LW · GW]), there's also Nate Soares, who broadly agrees with Yudkowsky's models that predict very high p(doom). 

Another, more minor point: if someone is bringing up Yudkowsky's track record in the context of his extreme views on AI risk, it seems helpful to talk about Soares' track record as well.

Replies from: Guy Raveh
comment by Guy Raveh · 2022-06-22T21:57:23.372Z · EA(p) · GW(p)

I think this maybe argues against a point not made in the OP. Garfinkel isn't saying "disregard Yudkowsky's views" - rather he's saying "don't give them extra weight just because Yudkowsky's the one saying them".

For example, from his reply to Richard Ngo:

I think it's really important to separate out the question "Is Yudkowsky an unusually innovative thinker?" and the question "Is Yudkowsky someone whose credences you should give an unusual amount of weight to?"

I read your comment as arguing for the former, which I don't disagree with. But that doesn't mean that people should currently weigh his risk estimates more highly than they weigh the estimates of other researchers currently in the space

So at least from Garfinkel's perspective, Yudkowsky and Soares do count as data points, they're just equal in weight to other relevant data points.

(I'm not expressing any of my own, mostly unformed, views here)

Replies from: RobBensinger
comment by RobBensinger · 2022-06-23T01:08:35.785Z · EA(p) · GW(p)

So at least from Garfinkel's perspective, Yudkowsky and Soares do count as data points, they're just equal in weight to other relevant data points.

Ben has said this about Eliezer, but not about Nate, AFAIK.

comment by ekka (Eddie K) · 2022-06-22T00:56:37.521Z · EA(p) · GW(p)

For what it's worth, I found this post and the ensuing comments very illuminating. As a person relatively new to both EA and the arguments about AI risk, I was a little bit confused as to why there was not much push back on the very high confidence beliefs about AI doom within the next 10 years. My assumption had been that there was a lot of deference to EY because of reverence and fealty stemming from his role in getting the AI alignment field started, not to mention the other ways he has shaped people's thinking. I also assumed that his track record on predictions was just ambiguous enough for people not to question his accuracy. Given that I don't give much credence to the idea that prophets/oracles exist, I thought it unlikely that the high confidence in his predictions was warranted, on the grounds that there doesn't seem to be much evidence supporting the accuracy of long range forecasts. I did not think that there were such glaring mispredictions made by EY in the past, so thank you for highlighting them.

comment by Dr. David Mathers · 2022-06-22T10:59:50.964Z · EA(p) · GW(p)

'Here’s one data point I can offer from my own life: Through a mixture of college classes and other reading, I’m pretty confident I had already encountered the heuristics and biases literature, Bayes’ theorem, Bayesian epistemology, the ethos of working to overcome bias, arguments for the many worlds interpretation, the expected utility framework, population ethics, and a number of other ‘rationalist-associated’ ideas before I engaged with the effective altruism or rationalist communities.'

I think some of this is just a result of being a community founded partly by analytic philosophers (though as a philosopher I would say that!).

I think it's normal to encounter some of these ideas in undergrad philosophy programs. At my undergrad back in 2005-09 there was a whole upper-level undergraduate course in decision theory. I don't think that's true everywhere all the time, but I'd be surprised if it was wildly unusual. I can't remember if we covered population ethics in any class, but I do remember discovering Parfit on the Repugnant Conclusion in 2nd-year of undergrad because one of my ethics lecturers said Reasons and Persons was a super-important book. In terms of the Oxford phil scene where the term "effective altruism" was born, the main titled professorship in ethics at that time was held by John Broome, a utilitarianism-sympathetic former economist, who had written famous stuff on expected utility theory. I can't remember if he was the PhD supervisor of anyone important to the founding of EA, but I'd be astounded if some of the phil. people involved in that had not been reading his stuff and talking to him about it. Most of the phil. physics people at Oxford were gung-ho for many worlds, it's not a fringe view in philosophy of physics as far as I know. (Though I think Oxford was kind of a centre for it and there was more dissent elsewhere.) As far as I can tell, Bayesian epistemology in at least some senses of that term is a fairly well-known approach in philosophy of science. Philosophers specializing in epistemology might more often ignore it, but they know it's there. And not all of them ignore it! I'm not an epistemologist, but my doctoral supervisor was, and it's not unusual for his work to refer to Bayesian ideas in modelling stuff about how to evaluate evidence. (I.e. in uhm, defending the fine-tuning argument for the existence of God, which might not be the best use, but still!: https://www.yoaavisaacs.com/uploads/6/9/2/0/69204575/ms_for_fine-tuning_fine-tuning.pdf). (John was my supervisor, not Yoav.) 

A high interest in bias stuff might genuinely be more an Eliezer/LessWrong legacy though. 

Replies from: Pablo_Stafforini, Linch, Guy Raveh
comment by Pablo (Pablo_Stafforini) · 2022-06-22T12:46:46.036Z · EA(p) · GW(p)

the main titled professorship in ethics at that time was held by John Broome, a utilitarianism-sympathetic former economist, who had written famous stuff on expected utility theory. I can't remember if he was the PhD supervisor of anyone important to the founding of EA, but I'd be astounded if some of the phil. people involved in that had not been reading his stuff and talking to him about it.

Indeed, Broome co-supervised the doctoral theses of both Toby Ord and Will MacAskill. And Broome was, in fact, the person who advised Will to get in touch with Toby, before the two had met.

comment by Linch · 2022-06-24T20:14:30.144Z · EA(p) · GW(p)

Speaking for myself, I was interested in a lot of the same things in the LW cluster (Bayes, approaches to uncertainty, human biases, utilitarianism, philosophy, avoiding the news) before I came across LessWrong or EA. The feeling is much more like "I found people who can describe these ideas well" than "oh these are interesting and novel ideas to me." (I had the same realization when I learned about utilitarianism...much more of a feeling that "this is the articulation of clearly correct ideas, believing otherwise seems dumb").

That said, some of the ideas on LW that seemed more original to me (AI risk, logical decision theory stuff, heroic responsibility in an inadequate world), do seem both substantively true and extremely important, and it took me a lot of time to be convinced of this.

(There are also other ideas that I'm less sure about, like cryonics and MW).

comment by Guy Raveh · 2022-06-22T11:43:57.010Z · EA(p) · GW(p)

Veering entirely off-topic here, but how does the many worlds hypothesis tie in with all the rest of the rationality/EA stuff?

Replies from: hibukki
comment by Yonatan Cale (hibukki) · 2022-06-22T13:41:26.843Z · EA(p) · GW(p)

[replying only to you with no context]

EY pointed out the many worlds hypothesis as a thing that even modern science, specifically physics (which is considered a very well functioning science, it's not like social psychology), is missing.

And he used this as an example to get people to stop trusting authority, including modern science, which many people around him seem to trust.

I think this [LW · GW] is a reasonable reference.

Replies from: Guy Raveh
comment by Guy Raveh · 2022-06-22T21:37:40.771Z · EA(p) · GW(p)

Can't say any of that makes sense to me. I have the feeling there's some context I'm totally missing (or he's just wrong about it). I may ask you about this in person at some point :)

comment by Habryka · 2022-06-19T18:40:41.262Z · EA(p) · GW(p)

It seems that half of these examples are from 15+ years ago, from a period for which Eliezer has explicitly disavowed his opinions (and the ones that are not strike me as most likely correct, like treating coherence arguments as forceful and that AI progress is likely to be discontinuous and localized and to require relatively little compute). 

Let's go example-by-example: 

1. Predicting near-term extinction from nanotech

This critique strikes me as about as sensible as digging up someone's old high-school essays and critiquing their stance on communism or the criminal justice system. I want to remind any reader that this is an opinion from 1999, when Eliezer was barely 20 years old. I am confident I can find crazier and worse opinions for every single leadership figure in Effective Altruism, if I am willing to go back to what they thought while they were in high-school. To give some character, here are some things I believed in my early high-school years: 

  • The economy was going to collapse because the U.S. was establishing a global surveillance state
  • Nuclear power plants are extremely dangerous and any one of them is quite likely to explode in a given year
  • We could have easily automated the creation of all art, except for the existence of a vaguely defined social movement that tries to preserve the humanity of art-creation

These are dumb opinions. I am not ashamed of having had them. I was young and trying to orient in the world. I am confident other commenters can add their own opinions they had when they were in high-school. The only thing that makes it possible for someone to critique Eliezer on these opinions is that he was virtuous and wrote them down, sometimes in surprisingly well-argued ways. 

If someone were to dig up an old high-school essay of mine, in particular one that has at the top written "THIS IS NOT ENDORSED BY ME, THIS IS A DUMB OPINION", and used it to argue that I am wrong about important cause prioritization questions, I would feel deeply frustrated and confused.

For context, on Eliezer's personal website it says: 

My parents were early adopters, and I’ve been online since a rather young age. You should regard anything from 2001 or earlier as having been written by a different person who also happens to be named “Eliezer Yudkowsky”. I do not share his opinions.

2. Predicting that his team had a substantial chance of building AGI before 2010

Given that this is only 2 years later, all my same comments apply. But let's also talk a bit about the object-level here. 

This is the quote on which this critique is based: 

Our best guess for the timescale is that our final-stage AI will reach transhumanity sometime between 2005 and 2020, probably around 2008 or 2010.  As always with basic research, this is only a guess, and heavily contingent on funding levels.

This... is not a very confident prediction. This paragraph literally says "only a guess". I agree, if Eliezer said this today, I would definitely dock him some points, but this is again a freshman-aged Eliezer, and it was more than 20 years ago. 

But also, I don't know, predicting AGI by 2020 from the year 2000 doesn't sound that crazy. If we didn't have a whole AI winter, if Moore's law had accelerated a bit instead of slowed down, if more talent had flowed into AI and chip-development, 2020 doesn't seem implausible to me. I think it's still on the aggressive side, given what we know now, but technological forecasting is hard, and the above sounds more like a 70% confidence interval than a 90% confidence interval. 

3. Having high confidence that AI progress would be extremely discontinuous and localized and not require much compute

This opinion strikes me as approximately correct. I still expect highly discontinuous progress, and many other people have argued for this as well. Your analysis that the world looks more like Hanson's world described in the AI foom debate also strikes me as wrong (and e.g. Paul Christiano has also said that Hanson's predictions looked particularly bad in the FOOM debate [LW(p) · GW(p)]). Indeed, I would dock Hanson many more points in that discussion (though, overall, I give both of them a ton of points, since they both recognized the importance of AI-like technologies early, and performed vastly above baseline for technological forecasting, which again, is extremely hard). 

This seems unlikely to be the right place for a full argument on discontinuous progress. However, continuous takeoff is very far from consensus in the AI Alignment field, and this post seems to try to paint it as such, which seems pretty bad to me (especially if it's used in a list with two clearly wrong things, without disclaiming it as such). 

4. Treating early AI risk arguments as close to decisive

You say: 

My point, here, is not necessarily that Yudkowsky was wrong, but rather that he held a much higher credence in existential risk from AI than his arguments justified at the time. The arguments had pretty crucial gaps that still needed to be resolved[14] [EA · GW], but, I believe, his public writing tended to suggest that these arguments were tight and sufficient to justify very high credences in doom.

I think the arguments are pretty tight and sufficient to establish the basic risk argument. I found your critique relatively uncompelling. In particular, I think you are misrepresenting that a premise of the original arguments was a fast takeoff. I can't currently remember any writing that said it was a necessary component of the AI risk arguments that takeoff happens fast, or at least that the distinction between "AI vastly exceeds human intelligence in 1 week vs 4 years" is that crucial to the overall argument, which is as far as I can tell the range that most current opinions in the AI Alignment field fall into (and importantly, I know of almost no one who believes that it could take 20+ years for AI to go from mildly subhuman to vastly superhuman, which does feel like it could maybe change the playing field, but also seems to be a very rarely held opinion). 

Indeed, I think Eliezer was probably underconfident in doom from AI, since I currently assign >50% probability to AI Doom, as do many other people in the AI Alignment field. 

See also Nate's recent comment on some similar critiques to this: https://www.lesswrong.com/posts/8NKu9WES7KeKRWEKK/why-all-the-fuss-about-recursive-self-improvement [LW · GW]

5. Treating "coherence arguments" as forceful

Coherence arguments do indeed strike me as one of the central valid arguments in favor of AI Risk. I think there was a common misunderstanding that did confuse some people, but that misunderstanding was not argued for by Eliezer or other people at MIRI, as far as I can tell (and I've looked into this for 5+ hours as part of discussions with Rohin and Richard). 

The central core of coherence arguments, which is based in arguments of competitiveness and economic efficiency, strikes me as very strong, robustly argued for, and one of the main reasons why AI will be dangerous. The von Neumann-Morgenstern theorem does play a role here, though it's definitely not sufficient to establish a strong case, and Rohin and Richard have successfully argued against that, though I don't think Eliezer has historically argued that the von Neumann-Morgenstern theorem is sufficient to establish an AI-alignment relevant argument on its own (though Dutch-book style arguments are very suggestive for the real structure of the argument).

Edit: Rohin says something similar in a separate comment reply [EA(p) · GW(p)].

6. Not acknowledging his mixed track record

Given my disagreements with the above, I think doing so would be a mistake. But even without that, let's look at the merits of this critique. 

For the two "clear cut" examples, Eliezer has posted dozens of times on the internet that he has disendorsed his views from before 2002. This is present on his personal website, the relevant articles are no longer prominently linked anywhere, and Eliezer has openly and straightforwardly acknowledged that his predictions and beliefs from the relevant period were wrong. 

For the disputed examples, Eliezer still believes all of these arguments (as do I), so it would be disingenuous for Eliezer to "acknowledge his mixed track record" in this domain. You can either argue that he is wrong, or you can argue that he hasn't acknowledged that he has changed his mind and was previously wrong, but you can't both argue that Eliezer is currently wrong in his beliefs, and accuse him of not telling others that he is wrong. I want people to say things they believe. And for the only two cases where you have established that Eliezer has changed his mind, he has extensively acknowledged his track record.

Some comments on the overall post: 

I really dislike this post. I think it provides very little argument, and engages in extremely extensive cherry-picking in a way that does not produce a symmetric credit-allocation (i.e. most people who are likely to update downwards on Yudkowsky on the basis of this post, seem to me to be generically too trusting, and I am confident I can write a more compelling post about any other central figure in Effective Altruism that would likely cause you to update downwards even more). 

I think a good and useful framing on this post could have been "here are 3 points where I disagree with Eliezer on AI Risk" (I don't think it would have been useful under almost any circumstance to bring up the arguments from the year 2000). And then to primarily spend your time arguing about the concrete object-level. Not to start a post that is trying to say that Eliezer is "overconfident in his beliefs about AI" and "miscalibrated", and then to justify that by cherry-picking two examples from when Eliezer was barely no longer a teenager, and three arguments on which there is broad disagreement within the AI Alignment field.

I also dislike calling this post "On Deference and Yudkowsky's AI Risk Estimates", as if this post was trying to be an unbiased analysis of how much to defer to Eliezer, while you just list negative examples. I think this post is better named "against Yudkowsky on AI Risk estimates". Or "against Yudkowsky's track record in AI Risk Estimates". Which would have made it clear that you are selectively giving evidence for one side, and more clearly signposted that if someone was trying to evaluate Eliezer's track record, this post will only be a highly incomplete starting point. 

I have many more thoughts, but I think I've written enough for now. I think I am somewhat unlikely to engage with replies in much depth, because writing this comment has already taken up a lot of my time, and I expect, given the framing of the post, that discussion on it will be unnecessarily conflicty and hard to navigate. 

Replies from: Pablo_Stafforini, bmg, Jan_Kulveit, Linch, Guy Raveh
comment by Pablo (Pablo_Stafforini) · 2022-06-19T19:20:39.175Z · EA(p) · GW(p)

It seems that half of these examples are from 15+ years ago, from a period for which Eliezer has explicitly disavowed his opinions

Just to note that the boldfaced part has no relevance in this context. The post is not attributing these views to present-day Yudkowsky. Rather, it is arguing that Yudkowsky's track record is less flattering than some people appear to believe. You can disavow an opinion that you once held, but this disavowal doesn't erase a bad prediction from your track record.

Replies from: Habryka
comment by Habryka · 2022-06-19T22:02:13.146Z · EA(p) · GW(p)

Hmm, I think that part definitely has relevance. Clearly we would trust Eliezer less if his response to that past writing was "I just got unlucky in my prediction, I still endorse the epistemological principles that gave rise to this prediction, and would make the same prediction, given the same evidence, today". 

If someone visibly learns from forecasting mistakes they make, that should clearly update us positively on them not repeating the same mistakes.

Replies from: bmg, David Johnston
comment by Ben Garfinkel (bmg) · 2022-06-19T22:56:42.247Z · EA(p) · GW(p)

If someone visibly learns from forecasting mistakes they make, that should clearly update us positively on them not repeating the same mistakes.

I suppose one of my main questions is whether he has visibly learned from the mistakes, in this case.

For example, I wasn't able to find a post or comment to the effect of "When I was younger, I spent years of my life motivated by the belief that near-term extinction from nanotech was looming. I turned out to be wrong. Here's what I learned from that experience and how I've applied it to my forecasts of near-term existential risk from AI." Or a post or comment acknowledging his previous over-optimistic AI timelines and what he learned from them, when formulating his current seemingly short AI timelines.

(I genuinely could be missing these, since he has so much public writing.)

Replies from: Habryka
comment by Habryka · 2022-06-20T01:45:14.158Z · EA(p) · GW(p)

Eliezer writes a bit about his early AI timeline and nanotechnology opinions here [EA · GW], though it sure is a somewhat obscure reference that takes a bunch of context to parse:  

Luke Muehlhauser reading a previous draft of this (only sounding much more serious than this, because Luke Muehlhauser):  You know, there was this certain teenaged futurist who made some of his own predictions about AI timelines -

Eliezer:  I'd really rather not argue from that as a case in point.  I dislike people who screw up something themselves, and then argue like nobody else could possibly be more competent than they were.  I dislike even more people who change their mind about something when they turn 22, and then, for the rest of their lives, go around acting like they are now Very Mature Serious Adults who believe the thing that a Very Mature Serious Adult believes, so if you disagree with them about that thing they started believing at age 22, you must just need to wait to grow out of your extended childhood.

Luke Muehlhauser (still being paraphrased):  It seems like it ought to be acknowledged somehow.

Eliezer:  That's fair, yeah, I can see how someone might think it was relevant.  I just dislike how it potentially creates the appearance of trying to slyly sneak in an Argument From Reckless Youth that I regard as not only invalid but also incredibly distasteful.  You don't get to screw up yourself and then use that as an argument about how nobody else can do better.

Humbali:  Uh, what's the actual drama being subtweeted here?

Eliezer:  A certain teenaged futurist, who, for example, said in 1999, "The most realistic estimate for a seed AI transcendence is 2020; nanowar, before 2015."

Humbali:  This young man must surely be possessed of some very deep character defect, which I worry will prove to be of the sort that people almost never truly outgrow except in the rarest cases.  Why, he's not even putting a probability distribution over his mad soothsaying - how blatantly absurd can a person get?

Eliezer:  Dear child ignorant of history, your complaint is far too anachronistic.  This is 1999 we're talking about here; almost nobody is putting probability distributions on things, that element of your later subculture has not yet been introduced.  Eliezer-2002 hasn't been sent a copy of "Judgment Under Uncertainty" by Emil Gilliam.  Eliezer-2006 hasn't put his draft online for "Cognitive biases potentially affecting judgment of global risks".  The Sequences won't start until another year after that.  How would the forerunners of effective altruism in 1999 know about putting probability distributions on forecasts?  I haven't told them to do that yet!  We can give historical personages credit when they seem to somehow end up doing better than their surroundings would suggest; it is unreasonable to hold them to modern standards, or expect them to have finished refining those modern standards by the age of nineteen.

Though there's also a more subtle lesson you could learn, about how this young man turned out to still have a promising future ahead of him; which he retained at least in part by having a deliberate contempt for pretended dignity, allowing him to be plainly and simply wrong in a way that he noticed, without his having twisted himself up to avoid a prospect of embarrassment.  Instead of, for example, his evading such plain falsification by having dignifiedly wide Very Serious probability distributions centered on the same medians produced by the same basically bad thought processes.

But that was too much of a digression, when I tried to write it up; maybe later I'll post something separately.

While also including some other points, I do read it as a pretty straightforward "Yes, I was really wrong. I didn't know about cognitive biases, and I did not know about the virtue of putting probability distributions on things, and I had not thought enough about the art of thinking well. I would not make the same mistakes today.".

Replies from: Guy Raveh
comment by Guy Raveh · 2022-06-20T19:59:08.695Z · EA(p) · GW(p)

How would the forerunners of effective altruism in 1999 know about putting probability distributions on forecasts? I haven't told them to do that yet!

Did Yudkowsky actually write these sentences?

If Yudkowsky thinks, as this suggests, that people in EA think or do things because he tells them to - this alone means it's valuable to question whether people give him the right credibility.

Replies from: Habryka
comment by Habryka · 2022-06-20T20:01:07.141Z · EA(p) · GW(p)

I am not sure about the question. Yeah, this is a quote from the linked post, so he wrote those sections. 

Also, yeah, seems like Eliezer has had a very large effect on whether this community uses things like probability distributions, models things in a bayesian way, makes lots of bets, and pays attention to things like forecasting track records. I don't think he gets to take full credit for those norms, but my guess is he is the single individual who most gets to take credit for those norms.

Replies from: Guy Raveh, Halstead, HaydnBelfield
comment by Guy Raveh · 2022-06-20T20:15:30.096Z · EA(p) · GW(p)

I am not sure about the question.

I wanted to make sure I'm not missing something, since this casts a negative light on him IMO.

There's a difference between saying, for example, "You can't expect me to have done X then - nobody was doing it, and I haven't even written about it yet, nor was I aware of anyone else doing so" - and saying "... nobody was doing it because I haven't told them to."

This isn't about credit. It's about self-perception and social dynamics.

Replies from: Habryka
comment by Habryka · 2022-06-20T20:18:52.439Z · EA(p) · GW(p)

I mean... it is true that Eliezer really did shape the culture in the direction of forecasting and predictions and that kind of stuff. My best guess is that without Eliezer, we wouldn't have a culture of doing those things (and like, the AI Alignment community as is probably wouldn't exist). You might disagree with me and him on this, in which case sure, update in that direction, but I don't think it's a crazy opinion to hold.

Replies from: RyanCarey, Guy Raveh
comment by RyanCarey · 2022-06-20T22:54:03.809Z · EA(p) · GW(p)

My best guess is that without Eliezer, we wouldn't have a culture of [forecasting and predictions]

The timeline doesn't make sense for this version of events at all. Eliezer was uninformed on this topic in 1999, at a time when Robin Hanson had already written about gambling on scientific theories (1990), prediction markets (1996), and other betting-related topics, as you can see from the bibliography of his Futarchy paper (2000).  Before Eliezer wrote his sequences (2006-2009), the Long Now Foundation already had Long Bets (2003), and Tetlock had already written Expert Political Judgment (2005). 

If Eliezer had not written his sequences, forecasting content would have filtered through to the EA community from contacts of Hanson. For instance, through blogging by other GMU economists like Caplan (2009). And of course, through Jason Matheny, who worked at FHI, where Hanson was an affiliate. He ran the ACE project (2010), which led to the science behind Superforecasting, a book that the EA community would certainly have discovered.

Replies from: Habryka
comment by Habryka · 2022-06-20T23:58:20.892Z · EA(p) · GW(p)

Hmm, I think these are good points. My best guess is that I don't think we would have a strong connection to Hanson without Eliezer, though I agree that that kind of credit is harder to allocate (and it gets fuzzy what we even mean by "this community" as we extend into counterfactuals like this). 

I do think the timeline here provides decent evidence in favor of less credit allocation (and I think against the stronger claim "we wouldn't have a culture of [forecasting and predictions] without Eliezer"). My guess is in terms of causing that culture to take hold, Eliezer is probably still the single most-responsible individual, though I do now expect (after having looked into a bunch of comment threads from 1996 to 1999 and seeing many familiar faces show up) that a lot of the culture would show up without Eliezer.

Replies from: Halstead
comment by John G. Halstead (Halstead) · 2022-06-21T09:43:42.634Z · EA(p) · GW(p)

Speaking for myself, Eliezer has played no role in encouraging me to give quantitative probability distributions. For me, that was almost entirely due to people like Tetlock and Bryan Caplan, both of whom I would have encountered regardless of Eliezer. I strongly suspect this is true of lots of people who are in EA but don't identify with the rationalist community.

More generally, I do think that Eliezer and other rationalists overestimate how much influence they have had on wider views in the community. E.g., I have not read the Sequences, and I just don't think they play a big role in the internal story of a lot of EAs. 

Replies from: Halstead, bec_hawk
comment by John G. Halstead (Halstead) · 2022-06-21T10:36:10.211Z · EA(p) · GW(p)

For me, even people like Nate Silver or David McKay, who aren't part of the community, have played a bigger role in encouraging quantification and probabilistic judgment.  

comment by Rebecca (bec_hawk) · 2022-06-24T18:19:55.637Z · EA(p) · GW(p)

This is my impression and experience as well

comment by Guy Raveh · 2022-06-20T20:45:50.830Z · EA(p) · GW(p)

I'll currently take your word for that because I haven't been here nearly as long. I'll mention that some of these contributions I don't necessarily consider positive.

But the point is, is Yudkowsky a (major) contributor to a shared project, or is he a ruler directing others, like his quote suggests? How does he view himself? How do the different communities involved view him?

P.S. I disagree with whoever (strong-)downvoted your comment.

Replies from: hibukki, D0TheMath
comment by Yonatan Cale (hibukki) · 2022-06-21T01:09:31.353Z · EA(p) · GW(p)
  1. Yudkowsky often hopes people will form their own opinions instead of just listening to him; I can find references if you want.
  2. I also think he lately finds it worrying that he's got to be the responsible adult. Easy references: Search for "Eliezer" in List Of Lethalities [LW · GW].
Replies from: Guy Raveh
comment by Guy Raveh · 2022-06-21T12:58:51.222Z · EA(p) · GW(p)

I also think he lately finds it worrying that he's got to be the responsible adult. Easy references: Search for "Eliezer" in List Of Lethalities

I think this strengthens my point, especially given how it is written in the post you linked. Telling people you're the responsible adult, or the only one who notices things, still means telling them you're smarter than them and they should just defer to you.

I'm trying to account for my biases in these comments, but I encourage others to go to that post, search for "Eliezer" as you suggested, and form their own views.

Replies from: RobBensinger
comment by RobBensinger · 2022-06-23T05:30:46.235Z · EA(p) · GW(p)

Telling people you're the responsible adult, or the only one who notices things, still means telling them you're smarter than them and they should just defer to you.

Those are four very different claims. In general, I think it's bad to collapse all (real or claimed) differences in ability into a single status hierarchy, for the reasons stated in Inadequate Equilibria.

Eliezer is claiming that other people are not taking the problem sufficiently seriously, claiming ownership of it, trying to form their own detailed models of the full problem, and applying enough rigor and clarity to make real progress on the problem.

He is specifically not saying "just defer to me", and in fact is saying that he and everyone else is going to die if people rely on deference here. A core claim in AGI Ruin [LW · GW] is that we need more people with "not the ability to read this document and nod along with it, but the ability to spontaneously write it from scratch without anybody else prompting you".

Deferring to Eliezer means that Eliezer is the bottleneck on humanity solving the alignment problem; which means we die. The thing Eliezer claims we need is a larger set of people who arrive at true, deep, novel insights about the problem on their own —without Eliezer even mentioning the insights, much less spending a ton of time trying to persuade anyone of them—and writing them up.

It's true that Eliezer endorses his current stated beliefs; this goes without saying, or he obviously wouldn't have written them down. It doesn't mean that he thinks humanity has any path to survival via deferring to him, or that he thinks he has figured out enough of the core problems (or ever conceivably could do so, on his own) to give humanity a significant chance of surviving. Quoting AGI Ruin:

It's guaranteed that some of my analysis is mistaken, though not necessarily in a hopeful direction.  The ability to do new basic work noticing and fixing those flaws is the same ability as the ability to write this document before I published it[.]

The end of the "death with dignity [LW · GW]" post is also alluding to Eliezer's view that it's pretty useless to figure out what's true merely via deferring to Eliezer.

Replies from: Guy Raveh
comment by Guy Raveh · 2022-06-23T07:01:24.861Z · EA(p) · GW(p)

Thanks, those are some good counterpoints.

comment by D0TheMath · 2022-06-22T14:37:19.550Z · EA(p) · GW(p)

Eliezer is cleanly just a major contributor. If he went off the rails tomorrow, some people would follow him (and the community would be better with those few gone), but the vast majority would say “wtf is that Eliezer fellow doing”. I also don’t think he sees himself as the leader of the community.

Probably Eliezer likes Eliezer more than EA/Rationality likes Eliezer, because Eliezer really likes Eliezer. If I were as smart & good at starting social movements as Eliezer, I’d probably also have an inflated ego, so I don’t take it as too unreasonable of a character flaw.

comment by John G. Halstead (Halstead) · 2022-06-21T10:31:33.613Z · EA(p) · GW(p)

I don't see how he has encouraged people to pay attention to forecasting track records. People who have encouraged that norm make public bets or go on public forecasting platforms and make predictions about questions that can resolve in the short term. Bryan Caplan does this; I think Greg Lewis and David Manheim are superforecasters. 

I thought the upshot of this piece and the Jotto post was that Yudkowsky is in fact very dismissive of people who make public forecasts. "I consider naming particular years to be a cognitively harmful sort of activity; I have refrained from trying to translate my brain's native intuitions about this into probabilities, for fear that my verbalized probabilities will be stupider than my intuitions if I try to put weight on them." This seems like the opposite of encouraging people to pay attention to forecasting; if anything, it dismisses the whole enterprise of forecasting. 

comment by HaydnBelfield · 2022-06-20T20:20:56.713Z · EA(p) · GW(p)

More than Philip Tetlock (author of Superforecasting)?

Does that particular quote from Yudkowsky not strike you as slightly arrogant?

Replies from: Habryka
comment by Habryka · 2022-06-20T20:46:47.586Z · EA(p) · GW(p)

Yes, definitely much more than Philip Tetlock, given that our community had strong norms of forecasting and making bets before Tetlock had done most of his work on the topic (Expert Political Judgment was out, but as far as I can tell was not a major influence on people in the community, though I am not totally confident of that).

Does that particular quote from Yudkowsky not strike you as slightly arrogant?

I am generally strongly against a culture of fake modesty. If I want people to make good decisions, they need to be able to believe things about themselves that might sound arrogant to others. Yes, it sounds arrogant to an external audience, but it also seems true, and it seems like whether it is true should be the dominant fact on whether it is good to say.

comment by David Johnston · 2022-06-20T22:10:02.671Z · EA(p) · GW(p)

FWIW I think "it was 20 years ago" is a good reason not to take these failed predictions too seriously, and "he has disavowed these predictions after seeing they were false" is a bad reason to take them unseriously.

comment by Ben Garfinkel (bmg) · 2022-06-19T19:19:32.752Z · EA(p) · GW(p)

On 1 (the nanotech case):

I want to remind any reader that this is an opinion from 1999, when Eliezer was barely 20 years old.

I think your comment might give the misimpression that I don't discuss this fact in the post or explain why I include the case. What I write is:

I should, once again, emphasize that Yudkowsky was around twenty when he did the final updates on this essay. In that sense, it might be unfair to bring this very old example up.

Nonetheless, I do think this case can be treated as informative, since: the belief was so analogous to his current belief about AI (a high outlier credence in near-term doom from an emerging technology), since he had thought a lot about the subject and was already highly engaged in the relevant intellectual community, since it's not clear when he dropped the belief, and since twenty isn't (in my view) actually all that young. I do know a lot of people in their early twenties; I think their current work and styles of thought are likely to be predictive of their work and styles of thought in the future, even though I do of course expect the quality to go up over time....

An additional reason why I think it's worth distinguishing between his views on nanotech and (e.g.) your views on nuclear power: I think there's a difference between an off-hand view picked up from other people vs. a fairly idiosyncratic view that you consciously adopted after a lot of reflection and that you decide to devote your professional life to and found an organization to address.

It's definitely up to the reader to decide how relevant the nanotech case is. Since it's not widely known, it seems at least pretty plausibly relevant, and the post twice flags his age at the time, I do still endorse including it.

At face value, as well: we're trying to assess how much weight to give to someone's extreme, outlier-ish prediction that an emerging technology is almost certain to kill everyone very soon. It just does seem very relevant, to me, that they previously had a different extreme outlier-ish prediction that another emerging technology was very likely to kill everyone within a decade.

I don't find it plausible that we should assign basically no significance to this.

On 6 (the question of whether Yudkowsky has acknowledged negative aspects of his track record):

For the two "clear cut" examples, Eliezer has posted dozens of times on the internet that he has disendorsed his views from before 2002. This is present on his personal website, the relevant articles are no longer prominently linked anywhere, and Eliezer has openly and straightforwardly acknowledged that his predictions and beliefs from the relevant period were wrong.

Similarly, I think your comment may give the impression that I don't discuss this point in the post. What I write is this:

He has written about mistakes from early on in his intellectual life (particularly pre-2003) and has, on this basis, even made a blanket-statement disavowing his pre-2003 work. However, based on my memory and a quick re-read/re-skim, this writing is an exploration of why it took him a long time to become extremely concerned about existential risks from misaligned AI. For instance, the main issue it discusses with his plans to build AGI is that these plans didn't take into account the difficulty and importance of ensuring alignment. This writing isn't, I think, an exploration or acknowledgement of the kinds of mistakes I've listed in this post.

On the general point that this post uses old examples:

Given the sorts of predictions involved (forecasts about pathways to transformative technologies), old examples are generally going to be more unambiguous than new examples. Similarly for risk arguments: it's hard to have a sense of how new arguments are going to hold up. It's only for older arguments that we can start to approach the ability to say that technological progress, progress in arguments, and evolving community opinion say something clear-ish about how strong the arguments were.

On signposting:

I also dislike calling this post "On Deference and Yudkowsky's AI Risk Estimates", as if this post was trying to be an unbiased analysis of how much to defer to Eliezer, while you just list negative examples. I think this post is better named "against Yudkowsky on AI Risk estimates". Or "against Yudkowsky's track record in AI Risk Estimates". Which would have made it clear that you are selectively giving evidence for one side, and more clearly signposted that if someone was trying to evaluate Eliezer's track record, this post will only be a highly incomplete starting point.

I think it's possible another title would have been better (I chose a purposely bland one partly for the purpose of trying to reduce heat - and that might have been a mistake). But I do think I signpost what the post is doing fairly clearly.

The introduction says it's focusing on "negative aspects" of Yudkowsky's track record, the section heading for the section introducing the examples describes them as "cherry-picked," and the start of the section introducing the examples has an italicized paragraph re-emphasizing that the examples are selective and commenting on the significance of this selectiveness.

On the role of the fast take-off assumption in classic arguments:

I think the arguments are pretty tight and sufficient to establish the basic risk argument. I found your critique relatively uncompelling. In particular, I think you are misrepresenting that a premise of the original arguments was a fast takeoff.

I disagree with this. I do think it's fair to say that fast take-off was typically a premise of the classic arguments.

Two examples I have off-hand (since they're in the slides from my talk) are from Yudkowsky's exchange with Caplan and from Superintelligence. Superintelligence isn't by Yudkowsky, of course, but hopefully is still meaningful to include (insofar as Superintelligence heavily drew on Yudkowsky's work and was often accepted as a kind of distillation of the best arguments as they existed at the time).

From Yudkowsky's debate with Caplan (2016):

“I’d ask which of the following statements Bryan Caplan [a critic of AI risk arguments] denies:

  1. Orthogonality thesis: Intelligence can be directed toward any compact goal….

  2. Instrumental convergence: An AI doesn’t need to specifically hate you to hurt you; a paperclip maximizer doesn’t hate you but you’re made out of atoms that it can use to make paperclips, so leaving you alive represents an opportunity cost and a number of foregone paperclips….

  3. Rapid capability gain and large capability differences: Under scenarios seeming more plausible than not, there’s the possibility of AIs gaining in capability very rapidly, achieving large absolute differences of capability, or some mixture of the two….

  4. 1-3 in combination imply that Unfriendly AI is a critical problem-to-be-solved, because AGI is not automatically nice, by default does things we regard as harmful, and will have avenues leading up to great intelligence and power.”

(Caveat that the fast-take-off premise is stated a bit ambiguously here, so it's not clear what level of rapidness is being assumed.)

From Superintelligence:

Taken together, these three points [decisive strategic advantage, orthogonality, and instrumental convergence] thus indicate that the first superintelligence may shape the future of Earth-originating life, could easily have non-anthropomorphic final goals, and would likely have instrumental reasons to pursue open-ended resource acquisition. If we now reflect that human beings consist of useful resources (such as conveniently located atoms) and that we depend for our survival and flourishing on many more local resources, we can see that the outcome could easily be one in which humanity quickly becomes extinct.

The decisive strategic advantage point is justified through a discussion of the possibility of a fast take-off. The first chapter of the book also starts by introducing the possibility of an intelligence explosion. It then devotes two chapters to the possibility of a fast take-off and the idea this might imply a decisive strategic advantage, before it gets to discussing things like the orthogonality thesis.

I think it's also relevant that content from MIRI and people associated with MIRI, raising the possibility of extinction from AI, tended to very strongly emphasize (e.g. spend most of its time on) the possibility of a run-away intelligence explosion. The most developed classic pieces arguing for AI risk often have names like "Shaping the Intelligence Explosion," "Intelligence Explosion: Evidence and Import," "Intelligence Explosion Microeconomics," and "Facing the Intelligence Explosion."

Overall, then, I do think it's fair to consider a fast-takeoff to be a core premise of the classic arguments. It wasn't incidental or a secondary consideration.

[[Note: I've edited my comment, here, to respond to additional points. Although there are still some I haven't responded to yet.]]

Replies from: Habryka
comment by Habryka · 2022-06-20T01:39:13.680Z · EA(p) · GW(p)

One quick response, since it was easy (might respond more later): 

Overall, then, I do think it's fair to consider a fast-takeoff to be a core premise of the classic arguments. It wasn't incidental or a secondary consideration.

I do think takeoff speeds between 1 week and 10 years are a core premise of the classic arguments. I do think the situation looks very different if we spend 5+ years in the human domain, but I don't think there are many who believe that that is going to happen. 

I don't think the distinction between 1 week and 1 year is that relevant to the core argument for AI Risk, since it seems in either case more than enough cause for likely doom, and that premise seems very likely to be true to me. I do think Eliezer believes things more on the order of 1 week than 1 year, but I don't think the basic argument structure is that different in either case (though I do agree that the 1 year opens us up to some more potential mitigating strategies).

comment by Jan_Kulveit · 2022-06-20T03:37:34.712Z · EA(p) · GW(p)

(i.e. most people who are likely to update downwards on Yudkowsky on the basis of this post, seem to me to be generically too trusting, and I am confident I can write a more compelling post about any other central figure in Effective Altruism that would likely cause you to update downwards even more)


My impression is the post is a somewhat unfortunate attempt to "patch" the situation in which many generically too trusting people updated a lot on AGI Ruin: A List of Lethalities and Death with Dignity and subsequent deference/update cascades.

In my view the deeper problem here is instead of disagreements about model internals, many of these people do some sort of "averaging conclusions" move, based on signals like seniority, karma, vibes, etc. 

Many of these signals are currently wildly off from truth-tracking, so you get attempts to push the conclusion-updates directly. 


 

comment by Linch · 2022-06-20T21:54:58.903Z · EA(p) · GW(p)

This critique strikes me as about as sensible as digging up someone's old high-school essays and critiquing their stance on communism or the criminal justice system. I want to remind any reader that this is an opinion from 1999, when Eliezer was barely 20 years old. I am confident I can find crazier and worse opinions for every single leadership figure in Effective Altruism, if I am willing to go back to what they thought while they were in high-school. To give some character, here are some things I believed in my early high-school years

This is really minor and nitpicky, and I agree with many of your overall points, but I don't think equivocating between "barely 20" and "early high-school" is fair. The former is a normal age to be a third-year university student in the US, and plenty of college-age EAs are taken quite seriously by the rest of us.

Replies from: Habryka
comment by Habryka · 2022-06-21T00:06:53.860Z · EA(p) · GW(p)

Oh, hmm, I think this is just me messing up the differences between the U.S. and German education systems (I was 18 and 19 in high-school, and enrolled in college when I was 20).

I think the first quote on nanotechnology was actually written in 1996 originally (though was maybe updated in 1999). Which would put Eliezer at ~17 years old when he wrote that. 

The second quote was I think written in more like 2000, which would put him more in the early college years, and I agree that it seems good to clarify that. 

Replies from: Linch
comment by Linch · 2022-06-21T16:15:48.860Z · EA(p) · GW(p)

Thank you, this clarification makes sense to me! 

comment by Guy Raveh · 2022-06-20T18:29:37.659Z · EA(p) · GW(p)

I am confident I can write a more compelling post about any other central figure in Effective Altruism that would likely cause you to update downwards even more

If done in a polite and respectful manner, I think this would be a genuinely good idea.

comment by gwern · 2022-06-19T16:56:49.075Z · EA(p) · GW(p)

Not sure why this is on EAF rather than LW or maybe AF, but anyway. I find this interesting to look at because I have been following Eliezer's work since approximately 2003 on SL4, and so I remember this firsthand, as it were. I disagree with several of the evaluations here (but of course agree with several of the others - I found the premise of Flare to be ludicrous at the time, and thankfully, AFAICT, pretty much zero effort went into that vaporware*):

  • calling LOGI and related articles 'wrong' because that's not how DL looks right now is itself wrong. Yudkowsky has never said that DL or evolutionary approaches couldn't work, or that all future AI work would look like the Bayesian program and logical approach he favored; he's said (consistently since at least SL4 that I've observed) that they would be extremely dangerous when they worked, and extremely hard to make safe to the high probability that we need them to when deployed to the real world indefinitely and unboundedly and self-modifyingly, and that rigorous program-proof approaches which can make formal logical guarantees of 100% safety are what are necessary and must deal with the issues and concepts discussed in LOGI. I think this is true: they do look extremely dangerous by default, and we still do not have adequate solutions to problems like "how do we talk about human values in a way which doesn't hardwire them dangerously into a reward function which can't be changed?" This is something actively researched now in RL & AI safety, and which continues to lack any solution you could call even 'decent'. (If you have ever been surprised by any result from causal influence diagrams, then you have inadvertently demonstrated the value of this.) More broadly, we still do not have any good proof or approach that we can feasibly engineer any of that with prosaic alignment approaches, which tend towards the 'patch bugs as you find them' or 'make systems so complex you can't immediately think of how they fail' approach to security that we already knew back then was a miserable failure. Eliezer hasn't been shown to be wrong here.

  • I continue to be amazed anyone can look at the past decade of DL and think that Hanson is strongly vindicated by it, rather than Yudkowsky-esque views. (Take a look at his OB posts on AI the past few years. Hanson is not exactly running victory laps, either on DL, foom, or ems. It would be too harsh to compare him to Gary Marcus... but I've seen at least one person do so anyway.) I would also say that to the extent that Yudkowsky-style research has enjoyed any popularity of late, it's because people have been looking at the old debate and realizing that extremely simple generic architectures written down in a few dozen lines of code, with large capability differences between very similar lines of code, solving many problems in many fields and subsuming entire subfields as simply another minor variant, with large generalizing models (as opposed to the very strong small-models-unique-to-each-individual-problem-solved-case-by-case-by-subject-experts which Hanson & Drexler strongly advocated and which was the ML mainstream at the time) powered by OOMs more compute, steadily increasing in agency, is a short description of Yudkowsky's views on what the runup will look like and how DL now works.

  • "his arguments focused on a fairly specific catastrophe scenario that most researchers now assign less weight to than they did when they first entered the field."

    Yet, the number who take it seriously since Eliezer started advocating it in the 1990s is now far greater than it was when he started and was approximately the only person anywhere. You aren't taking seriously that these surveyed researchers ("AI Impacts, CHAI, CLR, CSER, CSET, FHI, FLI, GCRI, MILA, MIRI, Open Philanthropy and PAI") wouldn't exist without Eliezer as he created the AI safety field as we know it, with everyone else downstream (like Bostrom's influential Superintelligence - Eliezer with the serial numbers filed off and an Oxford logo added). This is missing the forest for a few trees; if you are going to argue that a bit of regression to the mean in extreme beliefs should be taken as some evidence against Eliezer, then you must also count the initial extremity of the beliefs leading to these NGOs doing AI safety & people at them doing AI safety at all as much evidence for Eliezer.† (What a perverse instance of Simpson's paradox.)

    There's also the caveat mentioned there that the reduction may simply be because they have moved up other scenarios like the part 2 scenario where it's not a singleton hard takeoff but a multipolar scenario (a distinction of great comfort, I'm sure), which is a scenario which over the past few years is certainly looking more probable due to how DL scaling and arms races work. (In particular, we've seen some fast followups - because the algorithms are so simple that once you hear the idea described at all, you know most of it.) I didn't take the survey & don't work at the listed NGOs, but I would point out that if I had gone pro sometime in the past decade & taken it, under your interpretation of this statistic, you would conclude "Gwern now thinks Eliezer was wrong". Something to think about, especially if you want to consider observations like "this statistic claims most people are moving away from Eliezer's views, even though when I look at discussions of scaling, research trends, and what startups/NGOs are being founded, it sure looks like the opposite..."

* Flare has been, like Roko's Basilisk, one of those things where the afterlife of it has been vastly greater than the thing itself ever was, and where it gets employed in mutually contradictory ways by critics

† I find it difficult to convey what incredibly hot garbage AI researcher opinions in the '90s were about these topics. And I don't mean the casual projections that AGI would take until 2500 AD or whatever, I mean basics like the orthogonality thesis and instrumental drives. Like 'transhumanism', these are terms used in inverse proportion to how much people need them. Even on SL4, which was the fringiest of the fringe in AI alarmism, you had plenty of people reading and saying, "no, there's no problem here at all, any AI will just automatically be friendly and safe, human moral values aren't fragile or need to be learned, they're just, like, a law of physics and any evolving system will embody our values". If you ever wonder how old people in AI like Kurzweil or Schmidhuber can be so gungho about the prospect of AGI happening and replacing (ie. killing) humanity and why they have zero interest in AI safety/alignment, it's because they think that this is a good thing and our mind-children will just automatically be like us but better and this is evolution. ("Say, doth the dull soil / Quarrel with the proud forests it hath fed, / And feedeth still, more comely than itself?"...) If your response to reading this is, "gwern, do you have a cite for all of that? because no real person could possibly believe such a both deeply naive and also colossally evil strawman", well, perhaps that will convey some sense of the intellectual distance traveled.

Replies from: RyanCarey, bmg, AllAmericanBreakfast, Samuel Shadrach, Charles He
comment by RyanCarey · 2022-06-19T19:27:13.947Z · EA(p) · GW(p)

like Bostrom's influential Superintelligence - Eliezer with the serial numbers filed off and an Oxford logo added

It's not accurate that the key ideas of Superintelligence came to Bostrom from Eliezer, who originated them. Rather, at least some of the main ideas came to Eliezer from Nick. For instance, in one message from Nick to Eliezer on the Extropians mailing list, dated to Dec 6th 1998, inline quotations show Eliezer arguing that it would be good to allow a superintelligent AI system to choose its own morality. Nick responds that it's possible for an AI system to be highly intelligent without being motivated to act morally. In other words, Nick explains to Eliezer an early version of the orthogonality thesis.

Nick was not lagging behind Eliezer on evaluating the ideal timing of a singularity, either - the same thread reveals that they both had some grasp of the issue. Nick said that the fact that 150,000 people die per day must be contextualised against "the total number of sentiences that have died or may come to live", foreshadowing his piece on Astronomical Waste, which would be published five years later. Eliezer said that having waited billions of years, the probability of a success is more important than any delay of hundreds of years.

These are indeed two of the most-important macrostrategy insights relating to AI. A reasonable guess is that a lot of the big ideas in Superintelligence were discovered by Bostrom. Some surely came from Eliezer and his sequences, or from discussions between the two, and I suppose that some came from other utilitarians and extropians.

Replies from: Ben Pace
comment by Ben Pace · 2022-06-19T21:22:34.238Z · EA(p) · GW(p)

I think chapter 4, The Kinetics of an Intelligence Explosion, has a lot of terms and arguments from EY's posts in the FOOM Debate. (I've been surprised by this in the past, thinking Bostrom invented the terms, then finding things like resource overhangs getting explicitly defined in the FOOM Debate.)

comment by Ben Garfinkel (bmg) · 2022-06-19T18:22:12.656Z · EA(p) · GW(p)

Thanks for the comment! A lot of this is useful.

calling LOGI and related articles 'wrong' because that's not how DL looks right now is itself wrong. Yudkowsky has never said that DL or evolutionary approaches couldn't work, or that all future AI work would look like the Bayesian program and logical approach he favored;

I mainly have the impression that LOGI and related articles were probably "wrong" because, so far as I've seen, nothing significant has been built on top of them in the intervening decade-and-a-half (even though LOGI's successor was seemingly predicted to make it possible for a small group to build AGI). It doesn't seem like there's any sign that these articles were the start of a promising path to AGI that was simply slower than the deep learning path.

I have had the impression, though, that Yudkowsky also thought that logical/Bayesian approaches were in general more powerful/likely-to-enable-near-term-AGI (not just less safe) than DL. It's totally possible this is a misimpression - and I'd be inclined to trust your impression over mine, since you've read more of his old writing than I have. (I'd also be interested if you happen to have any links handy.) But I'm not sure this significantly undermines the relevance of the LOGI case.

I continue to be amazed anyone can look at the past decade of DL and think that Hanson is strongly vindicated by it, rather than Yudkowsky-esque views.

I also think that, in various ways, Hanson doesn't come off great. For example, he expresses a favorable attitude toward the CYC project, which now looks like a clear dead end. He is also overly bullish about the importance of having lots of different modules. So I mostly don't want to defend the view "Hanson had a great performance in the FOOM debate."

I do think, though, his abstract view that compute and content (i.e. data) are centrally important is closer to the mark than Yudkowsky's expressed view. I think it does seem hard to defend Yudkowsky's view that it's possible for a programming team (with mid-2000s levels of compute) to acquire some "deep new insights," go down into their basement, and then create an AI system that springboards itself into taking over the world. At least - I think it's fair to say - the arguments weren't strong enough to justify a lot of confidence in that view.

Yet, the number who take it seriously since Eliezer started advocating it is now far greater than it was when he started and was approximately the only person anywhere. You aren't taking seriously that these surveyed researchers ("AI Impacts, CHAI, CLR, CSER, CSET, FHI, FLI, GCRI, MILA, MIRI, Open Philanthropy and PAI") wouldn't exist without Eliezer as he created the AI safety field as we know it, with everyone else downstream (like Bostrom's influential Superintelligence - Eliezer with the serial numbers filed off and an Oxford logo added).

This is certainly a positive aspect of his track-record - that many people have now moved closer to his views. (It also suggests that his writing was, in expectation, a major positive contribution to the project of existential risk reduction - insofar as this writing has helped move people up and we assume this was the right direction to move.) But it doesn't imply that we should give him many more "Bayes points" than we give to the people who moved.

Suppose, for example, that someone says in 2020 that there is a 50% chance of full-scale nuclear war in the next five years. Then - due to Russia's invasion of Ukraine - most people move their credences upward (although they still remain closer to 0% than 50%). Does that imply the person giving the early warning was better-calibrated than the people who moved their estimates up? I don't think so. And I think - in this nuclear case - some analysis can be used to justify the view that the person giving the early warning was probably overconfident; they probably didn't have enough evidence or good enough arguments to actually justify a 50% credence.

It may still be the case that the person giving the early warning (in the hypothetical nuclear case) had some valuable and neglected insights, missed by others, that are well worth paying attention to and seriously reflecting on; but that's a different matter from believing they were overall well-calibrated or should be deferred to much more than the people who moved.

[[EDIT: Something else it might be worth emphasizing, here, is that I'm not arguing for the view "ignore Eliezer." It's closer to "don't give Eliezer's views outsized weight, compared to (e.g.) the views of the next dozen people you might be inclined to defer to, and factor in evidence that his risk estimates might have a significant upward bias to them."]]

comment by AllAmericanBreakfast · 2022-06-20T02:27:18.563Z · EA(p) · GW(p)

I'm going to break a sentence from your comment here into bits for inspection. Also, emphasis and elisions mine.

I would also say that to the extent that Yudkowsky-style research has enjoyed any popularity of late, it's because people have been looking at the old debate and realizing that

  • extremely simple generic architectures written down in a few dozen lines of code
  • with large capability differences between very similar lines of code
  • solving many problems in many fields and subsuming entire subfields as simply another minor variant
  • with large generalizing models...
  • powered by OOMs more compute
  • steadily increasing in agency

is

  • a short description of Yudkowsky's views on what the runup will look like
  • and how DL now works.

We don't have a formalism to describe what "agency" is. We do have several posts trying to define it on the Alignment Forum:

While it might not be the best choice, I'm going to use Gradations of Agency as a definition, because it's more systematic in its presentation.

"Level 3" is described as "Armed with this ability you can learn not just from your own experience, but from the experience of others—you can identify successful others and imitate them."

This doesn't seem like what any ML model does. So we can look at "Level 2," which gives the example "You start off reacting randomly to inputs, but you learn to run from red things and towards green things because when you ran towards red things you got negative reward and when you ran towards green things you got positive reward."

This seems like how all ML works.

So using the "Gradations of Agency" framework, we might view individual ML systems as improving in power and generality within a single level of agency. But they don't appear to be changing levels of agency. They aren't identifying other successful ML models and imitating them.

Gradations of Agency doesn't argue whether or not there is an asymptote of power and generality within each level. Is there a limit to the power and generality possible within level 2, where all ML seems to reside?

This seems to be the crux of the issue. If DL is approaching an asymptote of power and generality below that of AGI as model and data sizes increase, then this cuts directly against Yudkowsky's predictions. On the other hand, if we think that DL can scale to AGI through model and data size increases alone, then that would be right in line with his predictions.

A 10 trillion parameter model now exists, and it's been suggested that a 100 trillion parameter model, which might even be created this year, might be roughly comparable to the power of the human brain.

It's scary to see that we're racing full-on toward a very near-term ML project that might plausibly be AGI. However, if a 100-trillion parameter ML model is not AGI, then we'd have two strikes against Yudkowsky. If neither a small coded model nor a 100-trillion parameter trained model using 2022-era ML results in AGI, then I think we have to take a hard look at his track record on predicting what technology is likely to result in AGI. We also have his "AGI well before 2050" statement from "Beware boasting" to work with, although that's not much help.

On the other hand, I think his assertiveness about the importance of AI safety and risk is appropriate even if he proves wrong about the technology by which AGI will be created.

I would critique the OP, however, for not being sufficiently precise in its critiques of Yudkowsky. As its "fairly clearcut examples," it uses 20+-year-old predictions that Yudkowsky has explicitly disavowed. Then, at the end, it complains that he hasn't "acknowledged his mixed track record." Yet in the post it links, Yudkowsky's quoted as saying:

To be a slightly better Bayesian is to spend your entire life watching others slowly update in excruciatingly predictable directions that you jumped ahead of 6 years earlier so that your remaining life could be a random epistemic walk like a sane person with self-respect.

6 years is not 20 years. It's perfectly consistent to say that a youthful, 20+-years-in-the-past version of you thought wrongly about a topic, but that you've since come to be so much better at making predictions within your field that you're 6 years ahead of Metaculus. We might wish he'd stated these predictions in public and specified what they were. But his failure to do so doesn't make him wrong; it just means he lacks evidence of his superior forecasting ability. These are distinct failure modes.

Overall, I think it's wrong to conflate "Yudkowsky was wrong 20+ years ago in his youth" with "not everyone in AI safety agrees with Yudkowsky" with "Yudkowsky hasn't made many recent, falsifiable near-term public predictions about AI timelines." I think this is a fair critique of the OP, which claims to be interrogating Yudkowsky's "track record."

But I do agree that it's wise for a non-expert to defer to a portfolio of well-chosen experts, rather than the views of the originator of the field alone. While I don't love the argument the OP used to get there, I do agree with the conclusion, which strikes me as just plain common sense.

Replies from: kokotajlod, Charles He
comment by kokotajlod · 2022-06-20T04:38:22.003Z · EA(p) · GW(p)

Re gradations of agency: Level 3 and level 4 seem within reach IMO. IIRC there are already some examples of neural nets being trained to watch other actors in some simulated environment and then imitate them. Also, model-based planning (i.e. level 4) is very much a thing, albeit something that human programmers seem to have to hard-code. I predict that within 5 years there will be systems which are unambiguously in level 3 and level 4, even if they aren't perfect at it (hey, we humans aren't perfect at it either).

comment by Charles He · 2022-06-20T05:58:56.300Z · EA(p) · GW(p)

"Level 3" is described as "Armed with this ability you can learn not just from your own experience, but from the experience of others—you can identify successful others and imitate them." This doesn't seem like what any ML model does.

This sounds like straightforward transfer learning (TL) or fine-tuning, common in 2017.

So you could just write 15 lines of Python which shop between some set of pretrained weights and see how they perform. Often TL is many times (1000x) faster than random weights and only needs a few examples.
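A minimal sketch of the "shopping between pretrained weights" idea, under loud assumptions: the "models" here are just hypothetical hand-written weight vectors for a toy linear classifier, and the names `accuracy`, `pick_best_pretrained`, `candidates` and `few_examples` are made up for illustration. It only shows the selection step (score each candidate on a handful of examples and keep the best), not actual fine-tuning.

```python
def accuracy(weights, examples):
    """Fraction of (features, label) pairs a linear threshold classifier gets right."""
    correct = 0
    for features, label in examples:
        score = sum(w * x for w, x in zip(weights, features))
        prediction = 1 if score > 0 else 0
        correct += prediction == label
    return correct / len(examples)


def pick_best_pretrained(candidates, examples):
    """Score every candidate weight vector on the examples; return (name, accuracy) of the best."""
    scored = {name: accuracy(weights, examples) for name, weights in candidates.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


# Hypothetical "pretrained" weight vectors (assumed for illustration, not real checkpoints).
candidates = {
    "model_a": [1.0, -1.0],    # happens to match the new task
    "model_b": [-1.0, 1.0],    # trained on the opposite task
    "untrained": [0.0, 0.0],   # stand-in for starting from scratch
}

# A few labelled examples from the new task: label is 1 when the first feature is larger.
few_examples = [([2.0, 1.0], 1), ([0.5, 1.5], 0), ([3.0, 0.0], 1), ([0.0, 1.0], 0)]

best_name, best_acc = pick_best_pretrained(candidates, few_examples)
print(best_name, best_acc)
```

In a real setting the candidates would be checkpoints loaded from disk and the scoring would be a validation pass, but the shape of the loop would plausibly be about this simple.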

As speculation: it seems like in one of the agent simulations you can just have agents grab other agents' weights or layers and try them out in a strategic way (when they detect an impasse or new environment or something). There is an analogy to biology where species alternate between asexual vs sexual reproduction, and trading of genetic material occurs during periods of adversity. (This is trivial, I’m sure a second year student has written a lot more.)

This doesn’t seem to fit any sort of agent framework or improve agency though. It just makes you train faster.

Replies from: Charles He
comment by Charles He · 2022-06-20T07:06:55.015Z · EA(p) · GW(p)

Eh, there seems like a connection to interpretability.

For example, if the ML architecture “were modular+categorized or legible to the agents”, they would more quickly and effectively swap weights or models.

So there might be some way where legibility can emerge by selection pressure in an environment where say, agents had limited capacity to store weights or data, and had to constantly and extensively share weights with each other. You could imagine teams of agents surviving and proliferating by a shared architecture that let them pass this data fluently in the form of weights.

To make sure the transmission mechanism itself isn’t crazy baroque you can, like, use some sort of regularization or something.

I’m 90% sure this is a shower thought but like it can’t be worse than “The Great Reflection”.

comment by acylhalide (Samuel Shadrach) · 2022-06-23T07:06:24.705Z · EA(p) · GW(p)

Not sure why this is on EAF rather than LW or maybe AF, but anyway

One obvious answer is that the LW community and mods tend to defer to Yudkowsky more than the EAF community.

(This doesn't argue whether the deference is good or bad, but this difference is a fact about reality I think)

comment by Charles He · 2022-06-19T17:42:37.561Z · EA(p) · GW(p)

Eh.

The above seems voluminous and I believe this is the written output with the goal of defending a person.

I will reluctantly engage directly, instead of just launching into another class of arguments or something or going for a walk (I'm being blocked by moral maze sort of reasons and unseasonable weather).

 

You aren't taking seriously that these surveyed researchers ("AI Impacts, CHAI, CLR, CSER, CSET, FHI, FLI, GCRI, MILA, MIRI, Open Philanthropy and PAI") wouldn't exist without Eliezer as he created the AI safety field as we know it

Yeah, no, it's the exact opposite.

So one dude, who only has a degree in social studies, but seems to write well, wrote this:

https://docs.google.com/document/d/1hKZNRSLm7zubKZmfA7vsXvkIofprQLGUoW43CYXPRrk/edit#

I'm copying a screenshot to show the highlighting isn't mine:

 

This isn't what is written or said, but using other experience unrelated to EA or anyone in it, I'm really sure even a median thought leader would have better convinced the person who wrote this.

  • So they lost 4 years of support (until Superintelligence was written)

Replies from: gwern, Lizka
comment by gwern · 2022-06-19T18:13:57.940Z · EA(p) · GW(p)

The above seems voluminous and I believe this is the written output with the goal of defending a person.

Yes, much like the OP is voluminous and is the written output with the goal of criticizing a person. You're familiar with such writings, as you've written enough criticizing me [EA(p) · GW(p)]. Your point?

Yeah, no, it's the exact opposite.

No, it's just as I said, and your Karnofsky retrospective strongly supports what I said. (I strongly encourage people to go and read it, not just to see what's before and after the part He screenshots, but because it is a good retrospective which is both informative about the history here and an interesting case study of how people change their minds and what Karnofsky has learned.)

Karnofsky started off disagreeing that there is any problem at all in 2007 when he was introduced to MIRI via EA, and merely thought there were some interesting points. Interesting, but certainly not worth sending any money to MIRI or looking for better alternative ways to invest in AI safety. These ideas kept developing, and Karnofsky kept having to engage, steadily moving from 'there is no problem' to intermediate points like 'but we can make tool AIs and not agent AIs' (a period in his evolution I remember well because I wrote criticisms of it), which he eventually abandons. You forgot to screenshot the part where Karnofsky writes that he assumed 'the experts' had lots of great arguments against AI risk and the Yudkowsky paradigm and that was why they just didn't bother talking about it, and then moved to SF and discovered 'oh no', that not only did those not exist, the experts hadn't even begun to think about it. Karnofsky also agrees with many of the points I make about Bostrom's book & intellectual pedigree ("When I'd skimmed Superintelligence (prior to its release), I'd felt that its message was very similar to - though more clearly and carefully stated than - the arguments MIRI had been making without much success." just below where you cut off). And so here we are today, where Karnofsky has not just overseen donations of millions of dollars to MIRI and AI safety NGOs or the recruitment of MIRI staffers like ex-MIRI CEO Muehlhauser, but it remains a major area for OpenPhil (and philanthropies imitating it like FTX). It all leads back to Eliezer. As Karnofsky concludes:

One of the biggest changes is the one discussed above, regarding potential risks from advanced AI. I went from seeing this as a strange obsession of the community to a case of genuine early insight and impact. I felt the community had identified a potentially enormously important cause and played a major role in this cause's coming to be taken more seriously. This development became - in my view - a genuine and major candidate for a "hit", and an example of an idea initially seeming "wacky" and later coming to seem prescient.

Of course, it is far from a settled case: many questions remain about whether this cause is indeed important and whether today's preparations will look worthwhile in retrospect. But my estimate of the cause's likely importance - and, I believe, conventional wisdom among AI researchers in academia and industry - has changed noticeably.

That is, Karnofsky explicitly attributes the widespread changes I am describing to the causal impact of the AI risk community around MIRI & Yudkowsky. He doesn't say it happened regardless or despite them, or that it was already fairly common and unoriginal, or that it was reinvented elsewhere, or that Yudkowsky delayed it on net.

I'm really sure even a median thought leader would have better convinced the person who wrote this.

Hard to be convincing when you don't exist.

Replies from: bmg, Charles He
comment by Ben Garfinkel (bmg) · 2022-06-19T18:56:39.346Z · EA(p) · GW(p)

No, it's just as I said, and your Karnofsky retrospective strongly supports what I said.

I also agree that Karnofsky's retrospective supports Gwern's analysis, rather than doing the opposite.

(I just disagree about how strongly it counts in favor of deference to Yudkowsky. For example, I don't think this case implies we should currently defer more to Yudkowsky's risk estimates than we do to Karnofsky's.)

Replies from: Charles He
comment by Charles He · 2022-06-19T19:35:24.927Z · EA(p) · GW(p)

Ugh. Y'all just made me get into "EA rhetoric" mode:

I also agree that Karnofsky's retrospective supports Gwern's analysis, rather than doing the opposite.

What? 

No. Not only is this not true but this is indulging in a trivial rhetorical maneuver.

 

My comment said that the counterfactual would be better without the involvement of the person mentioned in the OP. I used the retrospective as evidence. 

The retrospective includes at least two points for why the author changed their mind:

  1. The book Superintelligence, which they explicitly said was the biggest event
  2. The author moved to SF and learned about DL, and was informed by speaking to non-rationalist AI researchers, and then decided that LessWrong and MIRI were right.

In response to this, Gwern cites point #2 and asserts that this is causal evidence in favor of the person mentioned in the OP being useful.

Why? How?  

Notice that #2 above doesn't at all rule out that the founders or culture were repellent. In fact, it seems like a lavish and unlikely level of involvement.

Replies from: bmg, Charles He
comment by Ben Garfinkel (bmg) · 2022-06-19T19:54:29.833Z · EA(p) · GW(p)

What?

I interpreted Gwern as mostly highlighting that people have updated towards Yudkowsky's views - and using this as evidence in favor of the view we should defer a decent amount to Yudkowsky. I think that was a reasonable move.

There is also a causal question here ('Has Yudkowsky on-net increased levels of concern about AI risk relative to where they would otherwise be?'), but I didn't take the causal question to be central to the point Gwern was making. Although now I'm less sure.

I don't personally have strong views on the causal question - I haven't thought through the counterfactual.

comment by Charles He · 2022-06-19T19:37:02.581Z · EA(p) · GW(p)

(I strongly encourage people to go and read it, not just to see what's before and after the part He screenshots, but because it is a good retrospective which is both informative about the history here and an interesting case study of how people change their minds and what Karnofsky has learned.)

By the way, I didn't screenshot the pieces that fit my narrative—Gwern's assertion of bad faith is another device being used.

Yes, much like the OP is voluminous and is the written output with the goal of criticizing a person. You're familiar with such writings, as you've written enough criticizing me [EA(p) · GW(p)]. Your point?

Gwern also digs up a previous argument. Not only is that issue entirely unrelated, it's sort of exactly the opposite of the evidence he wants to show: Gwern appeared to threaten, or come close to threatening, to dox someone who spoke out against him.

I commented. However, I did not know anyone involved, including who Gwern was; I was only acting on the content and behaviour I saw, which was outright abusive.
 

There is no expected benefit to doing this. It's literally the most principled thing to act in this way and I would do it again. 

The consequences of that incident, the fact that this person with this behavior and content had this much status, was a large update for me.

  

More subtly and perniciously, Gwern's adverse behavior in this comment chain and the incident mentioned above, is calibrated to the level of "EA rhetoric". Digs like his above can sail through, with the tailwind of support of a subset of this community, a subset that values authority over content and Truth, to a degree much more than it understands. 

In contrast, an outsider, who already has to dance through all the rhetorical devices and elliptical references, has to make a high-effort, unemotional comment to try to make a point. Even or especially if they manage to do this, they can expect to be hit with a wall of text with various hostilities.

 

Like, this is awful. This isn't just bad but it's borderline abusive.

It's wild that this is the level of discourse here.

Because of the amount of reputation, money and ingroupness, this is probably one of the most extreme forms of tribalism that exists.

Do you know how much has been lost?

Replies from: technicalities
comment by Gavin (technicalities) · 2022-06-19T21:03:21.972Z · EA(p) · GW(p)

Charles, consider going for that walk now if you're able to. (Maybe I'm missing it, but the rhetorical moves in this thread seem equally bad, and not very bad at that.)

Replies from: Charles He
comment by Charles He · 2022-06-19T22:21:55.670Z · EA(p) · GW(p)

You are right, I don't think my comments are helping.

comment by Charles He · 2022-06-19T18:23:26.378Z · EA(p) · GW(p)

Like, how can so many standard, stale patterns of internet forum authority, devices and rhetoric be rewarded and replicate in a community explicitly addressing topics like tribalism and "evaporative cooling"? 

comment by Lizka · 2022-06-21T15:17:59.752Z · EA(p) · GW(p)

The moderators feel that some comments in this thread break Forum norms and are discussing what to do about it.

Replies from: Lizka
comment by Lizka · 2022-06-22T15:36:36.933Z · EA(p) · GW(p)

Here are some things we think break Forum norms [EA · GW]: 

  • Rude/hostile language and condescension, especially from Charles He
  • Gwern brings in an external dispute — a thread in which Charles accuses them of doxing an anonymous critic on LessWrong. We think that bringing in external disputes interferes with good discourse; it moves the thread away from discussion of the topic in question, and more towards discussions of individual users’ characters
  • The conversation about the external dispute gets increasingly unproductive

The mentioned thread about doxing also breaks Forum norms in multiple ways. We’ve listed them on that thread. 

The moderators are still considering a further response. We’ll also be discussing with both Gwern and Charles privately.

Replies from: Lizka, RyanCarey
comment by Lizka · 2022-06-23T19:54:34.323Z · EA(p) · GW(p)

The moderation team is issuing Charles a 3-month ban. 

comment by RyanCarey · 2022-06-22T22:49:53.054Z · EA(p) · GW(p)

I honestly don't see such a problem with Gwern calling out Charles' flimsy argument and hypocrisy using an example, be it a part of an external dispute.

On the other hand, I think Charles' uniformly low comment quality should have had him (temporarily) banned long ago (sorry Charles). The material is generally poorly organised, poorly researched, often intentionally provocative, sometimes interspersed with irrelevant images, and high in volume. One gets the impression of an author who holds their reader in contempt.

Replies from: anonymous_ea, Charles He
comment by anonymous_ea · 2022-06-23T01:43:04.847Z · EA(p) · GW(p)

I don't necessarily disagree with the assessment of a temporary ban for "unnecessary rudeness or offensiveness", or "other behaviour that interferes with good discourse", but I disagree that Charles' comment quality is "uniformly" low or that a ban might be merited primarily because of high comment volume and too low quality. There are some real insights and contributions sprinkled in, in my opinion.

For me the unnecessary rudeness or offensiveness and other behavior interfering with discourse comes from things like comments that are technically replies to a particular person but seem like they're mostly intended to win the argument in front of unknown readers, and containing things like rudeness, paranoia, and condescension towards the person they're replying to. I think the doxing accusation, which if I remember correctly actually doxxed the victim much more than Gwern's comment, is part of a similar pattern of engaging poorly with a particular person, partly through an incorrect assessment that the benefits to bystanders will outweigh the costs. I think this sort of behavior stifles conversation and good will. 

I'm not sure a ban is a great solution though. There might be other, less blunt ways of tackling this situation. 

What I would really like to see is a (much) higher lower limit of comment quality from Charles i.e. moving the bar for tolerating rudeness and bad behavior in a comment much higher even though it could be potentially justified in terms of benefits to bystanders or readers. 

Replies from: Charles He
comment by Charles He · 2022-06-23T02:22:51.782Z · EA(p) · GW(p)

This is useful and thoughtful. I will read and will try to update on this (in general life, if not the forum?) Please continue as you wish!

I want to notify you and others, that I don't expect such discussion to materially affect any resulting moderator action, see this comment describing my views on my ban. [EA(p) · GW(p)]

Below that comment, I wrote some general thoughts on EA. It would be great if people considered or debated the ideas there.

comment by Charles He · 2022-06-22T23:07:44.266Z · EA(p) · GW(p)

I don’t disagree with your judgement of banning but I point out there’s no banning for quality—you must be very frustrated with the content.

To get a sense of this, for the specific issue in the dispute, where I suggested the person or institution in question caused a 4-year delay in funding, are you saying it’s an objectively bad read, even limited to just the actual document cited? I don’t see how that is.

Or is this wrong, but requires additional context or knowledge?

Replies from: RyanCarey
comment by RyanCarey · 2022-06-22T23:18:34.254Z · EA(p) · GW(p)

Re the banning idea, I think you could fall afoul of "unnecessary rudeness or offensiveness", or "other behaviour that interferes with good discourse" (too much volume, too low quality). But I'm not the moderator here.

My point is that when you say that Gwern produces verbose content about a person, it seems fine - indeed quite appropriate - for him to point out that you do too. So it seems  a bit rich for that to be a point of concern for moderators.

I'm not taking any stance on the doxxing dispute itself, funding delays, and so on.

Replies from: Charles He
comment by Charles He · 2022-06-22T23:22:39.680Z · EA(p) · GW(p)

I agree with your first paragraph for sure.

comment by Ben Garfinkel (bmg) · 2022-06-20T11:06:30.592Z · EA(p) · GW(p)

A general reflection: I wonder if one at least minor contributing factor to disagreement, around whether this post is worthwhile, is different understandings about who the relevant audience is.

I mostly have in mind people who have read and engaged a little bit with AI risk debates, but not yet in a very deep way, and would overall be disinclined to form strong independent views on the basis of (e.g.) simply reading Yudkowsky's and Christiano's most recent posts. I think the info I've included in this post could be pretty relevant to these people, since in practice they're often going to rely a lot -- consciously or unconsciously; directly or indirectly -- on cues about how much weight to give different prominent figures' views. I also think that the majority of members of the existential risk community are in this reference class.

I think the info in this post isn't nearly as relevant to people who've consumed and reflected on the relevant debates very deeply. The more you've engaged with and reflected on an issue, the less you should be inclined to defer -- and therefore the less relevant track records become.

(The limited target audience might be something I don't do a good enough job communicating in the post.)

Replies from: kokotajlod
comment by kokotajlod · 2022-06-20T16:24:36.304Z · EA(p) · GW(p)

I think that insofar as people are deferring on matters of AGI risk etc., Yudkowsky is in the top 10 people in the world to defer to based on his track record, and arguably top 1. Nobody who has been talking about these topics for 20+ years has a similarly good track record. If you restrict attention to the last 10 years, then Bostrom does and Carl Shulman and maybe some other people too (Gwern?), and if you restrict attention to the last 5 years then arguably about a dozen people have a somewhat better track record than him. 

(To my knowledge. I think I'm probably missing a handful of people who I don't know as much about because their writings aren't as prominent in the stuff I've read, sorry!)

He's like Szilard. Szilard wasn't right about everything (e.g. he predicted there would be a war and the Nazis would win) but he was right about a bunch of things including that there would be a bomb, that this put all of humanity in danger, etc. and importantly he was the first to do so by several years.

I think if I were to write a post cautioning people against deferring to Yudkowsky, I wouldn't talk about his excellent track record but rather about his arrogance, inability to clearly explain his views and argue for them (at least on some important topics, he's clear on others), seeming bias towards pessimism, ridiculously high (and therefore seemingly overconfident) credences in things like p(doom), etc. These are the reasons I would reach for (and do reach for) when arguing against deferring to Yudkowsky.

[ETA: I wish to reemphasize, but more strongly, that Yudkowsky seems pretty overconfident not just now but historically. Anyone deferring to him should keep this in mind; maybe directly update towards his credences but don't adopt his credences. E.g. think "we're probably doomed" but not "99% chance of doom". Also, Yudkowsky doesn't seem to be listening to others and understanding their positions well. So his criticisms of other views should be listened to but not deferred to, IMO.]

comment by Rohin Shah (rohinmshah) · 2022-06-19T17:42:40.658Z · EA(p) · GW(p)

See Rohin Shah’s (I think correct) objection [? · GW] to the use of “coherence arguments” to support AI risk concerns.

Fwiw I'd say this somewhat differently.

I object to a specific way in which one could use coherence arguments to support AI risk: namely, "AI is intelligent --> AI satisfies coherence arguments better than we do --> AI looks as though it is maximizing a utility function from our perspective --> Convergent instrumental subgoals --> Doom".

As far as I know, anyone who has spent ~an hour reading my post and thinking about it basically agrees with that particular narrow point.

This doesn't rule out other ways that one could use coherence arguments to support AI risk, such as "coherence arguments show that achieving stuff can typically be factored into beliefs about the world and goals that you want to achieve; since we'll be building AIs to achieve stuff, it seems likely they'll work by having separated beliefs and goals; if they have bad goals, then we die because of convergent instrumental subgoals". I'm more sympathetic to this argument (though not nearly as much as Eliezer appears to be).

I agree that the intro talk that you link to would likely cause people to think of the first pathway (which I object to) rather than the second pathway. Similar rhetoric caused me to believe the first pathway for a while.

But it also looks like the sort of talk you might give if you were thinking about the second pathway, and then compressed it losing a bunch of nuance, and didn't notice that people might then instead think of the first pathway.

(It's not clear whether any of this changes the upshot of your post. I am mostly trying to preserve nuance so I get fewer people saying "I thought you thought utility functions are fake" which is definitely not what I said or believed.)

comment by Akash · 2022-06-20T02:40:37.407Z · EA(p) · GW(p)

Thank you for writing this, Ben. I think the examples are helpful and I plan to read more about several of them.

With that in mind, I'm confused about how to interpret your post and how much to update on Eliezer. Specifically, I find it pretty hard to assess how much I should update (if at all) given the "cherry-picking" methodology:

Here, I’ve collected a number of examples of Yudkowsky making (in my view) dramatic and overconfident predictions concerning risks from technology.

Note that this isn’t an attempt to provide a balanced overview of Yudkowsky’s technological predictions over the years. I’m specifically highlighting a number of predictions that I think are underappreciated and suggest a particular kind of bias.

If you were to apply this to any EA thought leader (or non-EA thought leader, for that matter), I strongly suspect you'd find a lot of clearcut and disputable examples of them being wrong on important things.

As a toy analogy, imagine that Alice is widely-considered to be extremely moral. I hire an investigator to find as many examples of Alice doing Bad Things as possible. I then publish my list of Bad Things that Alice has done. And I tell people "look-- Alice has done some Bad Things. You all think of her as a really moral person, and you defer to her a lot, but actually, she has done Bad Things!"

And I guess I'm left with a feeling of... OK, but I didn't expect Alice to have never done Bad Things! In fact, maybe I expected Alice to do worse things than the things that were on this list, so I should actually update toward Alice being moral and defer to Alice more.

To make an informed update, I'd want to understand your balanced take. Or I'd want to know some of the following:

  • How much effort did the investigator spend looking for examples of Bad Things?
  • Given my current impression of Alice, how many Bad Things (weighted by badness) would I have expected the investigator to find?
  • How many Good Things did Alice do (weighted by goodness)? 

Final comment: I think this comment might come across as ungrateful-- just want to point out that I appreciate this post, find it useful, and will be more likely to challenge/question my deference as a result of it.

Replies from: Guy Raveh
comment by Guy Raveh · 2022-06-20T18:56:56.039Z · EA(p) · GW(p)

I think the effect should depend on your existing view. If you've always engaged directly with Yudkowsky's arguments and chose the ones that convinced you, there's nothing to learn. If you thought he was a unique genius and always assumed you weren't convinced of things because he understood things you didn't know about, and believed him anyway, maybe it's time to dial it back. If you'd always assumed he's wrong about literally everything, it should be telling for you that OP had to go 15 years back for good examples.

Writing this comment actually helped me understand how to respond to the OP myself.

Replies from: Dr. David Mathers
comment by Dr. David Mathers · 2022-06-21T22:24:58.345Z · EA(p) · GW(p)

'If you'd always assumed he's wrong about literally everything, it should be telling for you that OP had to go 15 years back to get good examples.' How strong this evidence is also depends on whether he has made many resolvable predictions since 15 years ago, right? If he hasn't, it's not very telling. To be clear, I genuinely don't know if he has or hasn't.

Replies from: Guy Raveh
comment by Guy Raveh · 2022-06-21T22:29:11.276Z · EA(p) · GW(p)

Sounds reasonable. Though predictions aren't the only thing one can be demonstrably wrong about.

comment by Dr. David Mathers · 2022-06-21T01:14:39.802Z · EA(p) · GW(p)

Several thoughts:

  1. I'm not sure I can argue for this, but it feels weird and off-putting to me that all this energy is being spent discussing how good a track-record one guy has, especially one guy with a very charismatic and assertive writing-style, and a history of attempting to provide very general guidance for how to think across all topics (though I guess any philosophical theory of rationality does the last thing.) It just feels like a bad sign to me, though that could just be for dubious social reasons.

  2. The question of how much to defer to E.Y. isn't answered just by things like "he has possibly the best track record in the world on this issue." If he's out of step with other experts, and by a long way, we need to have reason to think he outperforms the aggregate of experts before we weight him more than the aggregate and it's entirely normal, I'd have thought, for the aggregate to significantly outperform the single best individual. (I'm not making as strong a claim as that the best individual outperforming the aggregate is super-unusual and unlikely.) Of course if you think he's nearly as good as the aggregate, then you should still move a decent amount in his direction. But even that is quite a strong claim that goes beyond him being in the handful of individuals with the best track record.

  3. It strikes me as a problem that some of the people criticizing this post on the grounds that actually E.Y. has a great track record keep citing "he's been right that there is significant X-risk from A.I., when almost everyone else missed that", for a couple of reasons.

Firstly, this isn't actually a prediction that has been resolved as correct in any kind of unambiguous way. Sure, a lot of very smart people in the EA community now agree. (And I agree the risk is worth assigning EA resources to as well, to be clear.) But we should be wary of substituting the judgment of the community that a prediction looks rational, for a track record of predictions that have actually resolved successfully in my view. (I think the latter is better evidence than the former in most cases.)

Secondly, I feel like E.Y. being right about the importance of A.I.-risk is actually not very surprising, conditional on the key assumption here about E.Y. that Ben is relying on in telling people to be cautious about the probabilities and timelines that E.Y. gives for A.I. doom, but that even given this, IF Ben's assumption is correct it's still a good reason to doubt E.Y.'s p(doom). Suppose, as is being alleged here, someone has a general bias, for whatever reasons, towards the view that doom from some technological source or other is likely and imminent. Does that make it especially surprising that that individual finds an important source of doom most people have missed? Not especially that I can see: sure they will be less rational on the topic perhaps, but a) a bias towards p(doom) being high doesn't necessarily imply being poor at ranking sources of doom-risk by relative importance, and b) there is probably a counter-effect where bias towards doom makes you more likely to find underrated doom-risks, because you spend more time looking. Of course, finding a doom-risk larger than most others that approx. everyone had missed would still be a very impressive achievement. But the question Ben's addressing isn't "is E.Y. a smart person with insights about A.I. risk?" but rather "how much should we update on E.Y.'s views about p(near-term A.I. doom)?" Suppose significant bias towards doom is genuinely evidenced by E.Y.'s earlier nanotech prediction (which to be fair is only 1 data point) and a good record at identifying neglected important doom sources is only weak evidence that E.Y. lacks the bias. Then we'd be right to only update a little towards doom, even if E.Y.'s record on A.I. risk was impressive in some ways.

Replies from: Charles He
comment by Charles He · 2022-06-21T16:37:47.141Z · EA(p) · GW(p)

Some things that aren't said in this post or any comments in here yet:

  • The issue isn't at all about 15-20 year old content, it's about very recent content and events (mostly publicly visible)
  • In addition to this recent, publicly visible content, there are several latent issues or effects that directly affect progress in the relevant cause area
    • To calibrate, this could be slowing things down by 10 times or more, in what is supposed to be the most important cause area in EA and whose effects are supposed to happen very soon
  • Certain comments here do not at all contain all of the relevant content, because laying them out risks damaging an entire cause area.
    • Certain commenters may feel personally restricted from doing so for a variety of complex reasons ("moral mazes") and the content they are presenting is a "second best" option
    • The above interacts poorly with the customs and practices around discourse and criticism
      • These in totality have become sort of an odious and out of space specter, invisible to people who spend a lot of time here
Replies from: Dr. David Mathers
comment by Dr. David Mathers · 2022-06-21T22:07:44.150Z · EA(p) · GW(p)

For all I know, you may be right or not (insofar as I follow what's being insinuated), but whilst I freely admit that I, like anyone who wants to work in EA, have self-interested incentives to not be too critical of Eliezer, there is no specific secret "latent issue" that I personally am aware of and consciously avoiding talking about. Honest.

Replies from: Charles He
comment by Charles He · 2022-06-21T22:17:03.213Z · EA(p) · GW(p)

I am grateful for your considerate comment and your reply. I had no belief or thought about dishonesty.

Maybe I should have added[1]:

  • "this is for onlookers"
  • "this is trying to rationalize/explain why this post exists, that has 234 karma and 156 votes, yet only talks about high school stuff."

I posted my comment because this situation is hurting onlookers and producing bycatch? 

I don't really know what to do here (as a communications thing) and I have incentives not to be involved?

  1. ^

    But this is sort of getting into the elliptical rhetoric and self-referential stuff, that is sort of related to the problem in the first place. 

comment by Guy Raveh · 2022-06-20T18:42:39.836Z · EA(p) · GW(p)

Some off-topic comments, not specific to you or Yudkowsky:

the belief was so analogous to his current belief about AI... since he had thought a lot about the subject and was already highly engaged in the relevant intellectual community

  1. It seems to me (but I could be mistaken) like I see the phrase "has thought a lot about X" fairly often in EA contexts, where it is taken to imply being very well-informed about X. I don't think this is good reasoning. Thinking about something is probably required for understanding it well, but is certainly not enough.

  2. When an idea or theory is very fringe, there's a strong selection effect for people in the relevant intellectual community. This means even their average views are sometimes not good evidence for something. For example, to answer a question about the probability of doom from AI in this century, are alignment researchers a good reference class? They all naturally believe AI is an existential risk to begin with. I'm not sure I have the solution, since "AI researchers in general" isn't a good reference class either - many might have not given any thought to whether AI is dangerous.

Replies from: Eddie K
comment by ekka (Eddie K) · 2022-06-21T23:38:10.898Z · EA(p) · GW(p)

Strong +1 on this. It in fact seems like the more someone thinks about something and takes a public position on it with strong confidence the more incentive they have to stick to the position they have. It's why making explicit forecasts and creating a forecasting track record is so important in countering this tendency. If arguments cannot be resolved by events happening in the real world then there is not much incentive for one to change their mind especially if it's about something speculative and abstract that one can generate arguments for ad infinitum by engaging in more speculation.

On your example: the question of AI existential risk this century seems downstream of the question of the probability of AGI this century, and one can find some potential reference classes for that: AI safety research, general AI research, computer science research, scientific research, technological innovation etc. None of these are perfect reference classes but they are at least something to work with. Contingent on AGI being possible this century, one can form an opinion on how low/high the probability of doom would need to be to warrant concern.

comment by splinter · 2022-06-26T23:41:39.210Z · EA(p) · GW(p)

The negative reactions to this post are disheartening. I have a degree of affectionate fondness for the parodic levels of overthinking that characterize the EA community, but here you really see the downsides of that overthinking concretely. 

Of course it is meaningful that Eliezer Yudkowsky has made a bunch of terrible predictions in the past that closely echo predictions he continues to make in slightly different form today. Of course it is relevant that he has neither owned up to those earlier terrible predictions nor explained how he has learned from those mistakes. Of course we should be more skeptical of similar claims he makes in the future. Of course we should pay more attention to broader consensus or aggregate predictions in the field than to outlier predictions.

This is sensible advice in any complex domain, and saying that we should "evaluate every argument in isolation on its merits" is a type of special pleading or sophistry. Sometimes (often!) the obvious conclusions are the correct ones: even extraordinarily clever people are often wrong; extreme claims that other knowledgeable experts disagree with are often wrong; and people who make extreme claims that prove to be wrong should be strongly discounted when they make further extreme claims.

None of this is to suggest in any way that Yudkowsky should be ignored, or even is necessarily wrong. But if you yourself are not an expert in AI (as most of us aren't), his past bad predictions are highly relevant indicators when assessing his current predictions.

comment by RobBensinger · 2022-06-23T03:26:07.590Z · EA(p) · GW(p)

I work at MIRI, but as usual, this comment is me speaking for myself, and I haven’t heard from Eliezer or anyone else on whether they'd agree with the following.

My general thoughts:

  • The primary things I like about this post are that (1) it focuses on specific points of disagreement, encouraging us to then hash out a bunch of object-level questions; and (2) it might help wake some people from their dream if they hero-worship Eliezer, or if they generally think that leaders in this space can do no wrong.

    • By "hero-worshipping" I mean a cognitive algorithm, not a set of empirical conclusions. I'm generally opposed to faux egalitarianism [LW · GW] and the Modest-Epistemology reasoning discussed in Inadequate Equilibria: if your generalized anti-hero-worship defenses force the conclusion that there just aren't big gaps in skills or knowledge (or that skills and knowledge always correspond to mainstream prestige and authority), then your defenses are ruling out reality a priori. In saying "people need to hero-worship Eliezer less", I'm opposing a certain kind of reasoning process and mindset, not a specific factual belief like "Eliezer is the clearest thinker about AI risk".

      In a sense, I want to promote the idea that the latter is a boring claim, to be evaluated like any other claim about the world; flinching away from it (e.g., because Eliezer is weird and says sci-fi-sounding stuff) and flinching toward it (e.g., because you have a bunch of your identity invested in the idea that the Sequences are awesome and rationalists are great) are both errors of process.
       
  • The main thing I dislike about this post is that it introduces a bunch of not-obviously-false Eliezer-claims — claims that EAs either widely disagree about, or haven’t discussed — and then dives straight into ‘therefore Eliezer has a bad track record'.

    E.g., I disagree that molecular nanotech isn't a big deal (if that's a claim you're making?), that Robin better predicted deep learning than Eliezer did, and that your counter-arguments against Eliezer and Bostrom are generally strong. Certainly I don't think these points have been well-established enough that it makes sense to cite them in the mode 'look at these self-evident ways Yudkowsky got stuff wrong; let us proceed straight to psychoanalysis, without dwelling on the case for why I think he's wrong about this stuff'. At this stage of the debate on those topics, it would be more appropriate to talk in terms of cruxes like 'I think the history of tech shows it's ~always continuous in technological change and impact', so it's clear why you disagree with Eliezer in the first place.
     
  • I generally think that EA’s core bottlenecks right now are related to ‘willingness to be candid and weird enough to make intellectual progress (especially on AI alignment), and to quickly converge on our models of the world’.

    My own models suggest to me that EA’s path to impact is almost entirely as a research community and a community that helps produce other research communities, rather than via ‘changing the culture of the world at large’ or going into politics or what-have-you. In that respect, rigor and skepticism is good, but singling out Eliezer because he’s unusually weird and candid is bad, because it discourages others from expressing weird/novel/minority views and from blurting out their true thought processes. (I recognize that this isn’t the only reason you’re singling Eliezer out, but it’s obviously a contributing factor.)
     
  • I am a big fan of Ben’s follow-up comment [EA(p) · GW(p)]. Especially the part where he outlines the thought process that led to him generating the post’s contents. I think this is an absolutely wonderful thing to include in a variety of posts, or to add in the comment sections for a lot of posts.

· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 

Some specific thoughts on Ben's follow-up comment:

1. I agree with Ben on this: “If a lot of people in the community believe AI is probably going to kill everyone soon, then (if they’re wrong) this can have really important negative effects”.

I think they’re not wrong, and I think the benefits of discussing this openly strongly outweigh the costs. But the negative effects are no less real for that.

(Separately, I think the “death with dignity” post was a suboptimal way to introduce various people to the view that p(doom) is very high. I’m much more confident that we should discuss this at all, than that Eliezer or I or others have been discussing this optimally.)

2. “Directly and indirectly, deference to Yudkowsky has a significant influence on a lot of people’s views.”

Agreed.

Roughly speaking, my own view is:

  • EAs currently do a very high amount of deferring to others (both within EA and outside of EA) on topics like AI, global development, moral philosophy, economics, cause prioritization, organizational norms, personal career development, etc.
  • On the whole, EAs currently do a low amount of model-building and developing their own inside views.
  • EAs should switch to doing a medium amount of deference on topics like the ones I listed, and a very high amount of personal model-building.
    • Note that model-building can be useful even if you think all your conclusions will be strictly worse than the models of some other person you've identified. I'm pretty radical on this topic, and think that nearly all EAs should spend a nontrivial fraction of their time developing their own inside-view models of EA-relevant stuff, in spite of the obvious reasons (like gains from specialization) that this would normally not make sense.
      • Happy to say more about my views here, and I'll probably write a post explaining why I think this.
    • I think the Alignment Research Field Guide [LW · GW], in spite of nominally being about “alignment”, is the best current intro resource for “how should I go about developing my own models on EA stuff?” A lot of the core advice is important and generalizes extremely well, IMO.
  • Insofar as EAs should do deference at all, Eliezer is in the top tier of people it makes sense to defer to.
  • But I’d guess the current amount of Eliezer-deference is way too high, because the current amount of deference overall is way too high. Eliezer should get a relatively high fraction of the deference pie IMO, but the overall pie should shrink a lot.

3. I also agree with Ben on “The track records of influential intellectuals (including Yudkowsky) should be publicly discussed.”

I don’t like the execution of the OP, but I strongly disagree with the people in the comments who have said “let us never publicly talk about individuals’ epistemic track records at all”—both because I think ‘how good is EY’s reasoning’ is a genuine crux for lots of people, and because I think this is a very common topic people think about, both in more pro-Eliezer and in more anti-Eliezer camps.

Discussing cruxes is obviously good, but even if this weren’t a crux for anyone, I’m strongly in favor of EAs doing a lot more “sharing their actual thoughts out loud”, including the more awkward and potentially inflammatory ones. (I’m happy to say more about why I think this.)

I do think it’s worth talking about what the best way is to discuss individuals' epistemic track records, without making EA feel hostile/unpleasant/scary. I think EAs are currently way too timid (on average) about sharing their thoughts, so I worry about any big norm shifts that might make that problem even worse.

But Eliezer’s views are influential enough (and cover a topic, AGI, that is complicated and difficult enough to reason about) that this just seems like an important topic to me (similar to ‘how much should we defer to Paul?’, etc.). I’d rather see crappy discussion of this in the community than zero discussion whatsoever.

· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · 

Some specific thoughts on claims in the OP:

such that all we can hope to do is “die with dignity [LW · GW].”

This is in large part Eliezer's fault for picking such a bad post title, but I should still note that this is a very misleading summary. "Dying with dignity" often refers to giving up on taking any actions to keep yourself alive.

Eliezer's version of "dying with dignity" is exactly the opposite: he's advocating for doing whatever it takes to maximize the probability that humanity survives.

It's true that he thinks we'll probably fail (and I agree), and he thinks we should emotionally reconcile ourselves with that fact (because he thinks this emotional reconciliation will itself increase our probability of surviving!!), but he doesn't advocate giving up.

Quoting the post:

"Q1:  Does 'dying with dignity' in this context mean accepting the certainty of your death, and not childishly regretting that or trying to fight a hopeless battle?

"Don't be ridiculous.  How would that increase the log odds of Earth's survival?"

At least up until 1999, admittedly when he was still only about 20 years old, Yudkowsky argued that transformative nanotechnology would probably emerge suddenly and soon (“no later than 2010”) and result in human extinction by default.

I think the "no later than 2010" prediction is from when Eliezer was 20, but the bulk of the linked essay was written when he was 17. The quotation here is: "As of '95, Drexler was giving the ballpark figure of 2015.  I suspect the timetable has been accelerated a bit since then.  My own guess would be no later than 2010."

The argument for worrying about extinction via molecular nanotech to some non-small degree seems pretty straightforward and correct: molecular nanotech lets you build arbitrary structures, including dangerous ones, and some humans would want to destroy the world given the power to do so.

Eliezer was overconfident about nanotech timelines (though roughly to the same degree as Drexler, the world's main authority on nanotech).

Eliezer may have also been overconfident about nanotech's riskiness, but the specific thing he said when he was 17 is that he considered it important for humanity to achieve AGI "before nanotechnology, given the virtual certainty of deliberate misuse - misuse of a purely material (and thus, amoral) ultratechnology, one powerful enough to destroy the planet".

It's not clear to me whether this is saying that human-extinction-scale misuse from nanotech is 'virtually certain', versus the more moderate claim that some misuse is 'virtually certain' if nanotech sees wide usage (and any misuse is pretty terrifying in EV terms). The latter seems reasonable to me, given how powerful molecular nanotechnology would be.

Eliezer denies [LW · GW] that he has a general tendency toward alarmism:

[Ngo][18:19]
(As a side note, I think that if Eliezer had been around in the 1930s, and you described to him what actually happened with nukes over the next 80 years, he would have called that "insanely optimistic".)

[Yudkowsky][18:21]  
Mmmmmmaybe.  Do note that I tend to be more optimistic than the average human about, say, global warming, or everything in transhumanism outside of AGI. 

Nukes have going for them that, in fact, nobody has an incentive to start a global thermonuclear war.  Eliezer is not in fact pessimistic about everything and views his AGI pessimism as generalizing to very few other things, which are not, in fact, as bad as AGI. 

[Ngo][18:27]  
[...] So yeah, I picture 1930s-Eliezer pointing to technological trends and being like "by default, 30 years after the first nukes are built, you'll be able to build one in your back yard. And governments aren't competent enough to stop that happening."

And I don't think I could have come up with a compelling counterargument back then. 

[Yudkowsky][18:29] 
So, I mean, in fact, I don't prophesize doom from very many trends at all!  It's literally just AGI that is anywhere near that unmanageable!  Many people in EA are more worried about biotech than I am, for example.

It seems fair to note that nanotech is a second example of Eliezer raising alarm bells. But this remains a pretty small number of data points, and in neither of those cases does it actually look unreasonable to worry a fair bit—those are genuinely some of the main ways we could destroy ourselves.

I think 'Eliezer predicted nanotech way too early' is a better data point here, as evidence for 'maybe Eliezer tends to have overly aggressive tech forecasts'.

If Eliezer was deferring to Drexler to some extent, that makes the data a bit less relevant, but 'I was deferring to someone else who was also wrong' is not in fact a general-purpose excuse for getting the wrong answer.

In 2001, and possibly later, Yudkowsky apparently believed that his small team would be able to develop a “final stage AI” that would “reach transhumanity sometime between 2005 and 2020, probably around 2008 or 2010.”

In the first half of the 2000s, he produced a fair amount of technical and conceptual work related to this goal. It hasn't ultimately had much clear usefulness for AI development, and, partly on that basis, my impression is that it has not held up well - but that he was very confident in the value of this work at the time.

That view seems very dumb to me — specifically the belief that SingInst's very first unvetted idea would pan out and result in them building AGI, more so than the timelines per se.

I don't fault 21-year-old Eliezer for trying (except insofar as he was totally wrong about the probability of Unfriendly AI at the time!), because the best way to learn that a weird new path is unviable is often to just take a stab at it. But insofar as 2001-Eliezer thought his very first idea was very likely to work, this seems like a totally fair criticism of the quality of his reasoning at the time.

Looking at the source text, I notice that the actual text is much more hedged than Ben's summary (though it still sounds foreseeably overconfident to me, to the extent I can glean likely implicit probabilities from tone):

[...] The Singularity Institute is fully aware that creating true intelligence will not be easy.  In addition to the enormous power deficit between modern computers and the human brain, there is an even more severe software deficit.  The software of the human brain is the result of millions of years of evolution and contains perhaps tens of thousands of complex functional adaptations.  The human brain itself is not a homogenous lump but a highly modular supersystem; the cerebral cortex is divided into two hemispheres, each containing 52 areas, each area subdivided into a half-dozen distinguishable maps.  Cortical neurons group into minicolumns of perhaps a hundred neurons and macrocolumns of a few hundred minicolumns, with perhaps 1,000 macrocolumns to a cortical map.  Of the 750 megabytes of human DNA, the vast majority is believed to be junk and 98% is identical to chimpanzee DNA, with perhaps 1% being concerned with intelligence - leaving 7.5 megabytes to specify, not the actual wiring of the brain, but the neuroanatomy of areas and maps and pathways, and the initial tiling patterns and learning algorithms for neurons and minicolumns and macrocolumns.

The Singularity Institute seriously intends to build a true general intelligence, possessed of all the key subsystems of human intelligence, plus design features unique to AI.  We do not hold that all the complex features of the human mind are "emergent", or that intelligence is the result of some simple architectural principle, or that general intelligence will appear if we simply add enough data or computing power.  We are willing to do the work required to duplicate the massive complexity of human intelligence; to explore the functionality and behavior of each system and subsystem until we have a complete blueprint for a mind.  For more about our Artificial Intelligence plans, see the document General Intelligence and Seed AI.

Our specific cognitive architecture and development plan forms our basis for answering questions such as "Will transhumans be friendly to humanity?" and "When will the Singularity occur?"  At the Singularity Institute, we believe that the answer to the first question is "Yes" with respect to our proposed AI design - if we didn't believe that, the Singularity Institute would not exist.  Our best guess for the timescale is that our final-stage AI will reach transhumanity sometime between 2005 and 2020, probably around 2008 or 2010.  As always with basic research, this is only a guess, and heavily contingent on funding levels. [...]

 

A later piece of work which I also haven’t properly read is “Levels of Organization in General Intelligence.”

Note that this paper was written much earlier than its publication date. Description from yudkowsky.net: "Book chapter I wrote in 2002 for an edited volume, Artificial General Intelligence, which is now supposed to come out in late 2006. I no longer consider LOGI’s theory useful for building de novo AI. However, it still stands as a decent hypothesis about the evolutionary psychology of human general intelligence."

Although Hanson very clearly wasn’t envisioning something like deep learning either, his side of the argument seems to fit better with what AI progress has looked like over the past decade.

I agree that Eliezer loses Bayes points (e.g., relative to Shane Legg and Dario Amodei) for not predicting the enormous success of deep learning. See also Nate's recent post about this [LW · GW].

I disagree that Robin Hanson scored Bayes points off of Eliezer, on net, from the deep learning revolution, or that Hanson's side of the Foom debate looks good (compared to Eliezer's) with the benefit of hindsight. I side with Gwern [EA(p) · GW(p)] here; I think Robin's predictions and arguments on this topic have been terrible, as a rule.

I think that Yudkowsky's prediction - that a small amount of code, run using only a small amount of computing power, was likely to abruptly jump economic output upward by more than a dozen orders-of-magnitude - was extreme enough to require very strong justifications.

I think Eliezer assigned too high a probability to 'it's easy to find relatively clean, understandable approaches to AGI', and too low a probability to 'it's easy to find relatively messy, brute-forced approaches to AGI'. A consequence of the latter is that he (IMO) underestimated how compute-intensive AGI was likely to be, and overestimated how important recursive self-improvement was likely to be.

I otherwise broadly agree with his picture. E.g.:

  • I expect AGI to represent a large, sharp capabilities jump. (I think this is unlikely to require a bunch of recursive self-improvement.)
  • I think AGI is mainly bottlenecked on software, rather than hardware. (E.g., I think GPT-3 is impressive, but isn't a baby AGI; rather than AGI just being 'current systems but bigger', I expect at least one more key insight lies on the shortest likely path to AGI.)
  • And I expect AGI to be much more efficient than current systems at utilizing small amounts of data. Though (because it's likely to come from a relatively brute-forced, unalignable approach) I still expect it to be more compute-intensive than 2009-Eliezer was imagining.

However, later analysis has suggested that coherence arguments have either no or very limited implications for how we should expect future AI systems to behave.

This seems completely wrong to me. See Katja Grace's Coherence arguments imply a force for goal-directed behavior [AF · GW].

comment by iporphyry (iporophiry) · 2022-06-19T19:21:34.452Z · EA(p) · GW(p)

I like that you admit that your examples are cherry-picked. But I'm actually curious what a non-cherry-picked track record would show. Can people point to Yudkowsky's successes? What did he predict better than other people? What project did MIRI generate that either solved clearly interesting technical problems or got significant publicity in academic/AI circles outside of rationalism/EA? Maybe instead of a comment here this should be a short-form question on the forum.

Replies from: Matthew_Barnett
comment by Matthew_Barnett · 2022-06-19T20:31:48.581Z · EA(p) · GW(p)

I like that you admit that your examples are cherry-picked. But I'm actually curious what a non-cherry-picked track record would show. Can people point to Yudkowsky's successes?

While he's not single-handedly responsible, he led the movement to take AI risk seriously at a time when approximately no one was talking about it, which has now attracted the interest of top academics. This isn't a complete track record, but it's still a very important data-point. It's a bit like if he were the first person to say that we should take nuclear war seriously, and then five years later people are starting to build nuclear bombs and academics realize that nuclear war is very plausible.

Replies from: bmg
comment by Ben Garfinkel (bmg) · 2022-06-19T20:47:39.260Z · EA(p) · GW(p)

While he's not single-handedly responsible, he led the movement to take AI risk seriously at a time when approximately no one was talking about it, which has now attracted the interest of top academics. This isn't a complete track record, but it's still a very important data-point.

I definitely do agree with that!

It's possible I should have emphasized the significance of it more in the post, rather than moving on after just a quick mention at the top.

If it's of interest: I say a little more about how I think about this, in response to Gwern's comment below. (To avoid thread-duplicating, people might want to respond there [EA(p) · GW(p)] rather than here if they have follow-on thoughts on this point.) My further comment is:

This is certainly a positive aspect of his track-record - that many people have now moved closer to his views. (It also suggests that his writing was, in expectation, a major positive contribution to the project of existential risk reduction - insofar as this writing has helped move people up and we assume this was the right direction to move.) But it doesn't imply that we should give many more "Bayes points" to him than we give to the people who moved.

Suppose, for example, that someone says in 2020 that there is a 50% chance of full-scale nuclear war in the next five years. Then - due to Russia's invasion of Ukraine - most people move their credences upward (although they still remain closer to 0% than 50%). Does that imply the person giving the early warning was better-calibrated than the people who moved their estimates up? I don't think so. And I think - in this nuclear case - some analysis can be used to justify the view that the person giving the early warning was probably overconfident; they probably didn't have enough evidence or good enough arguments to actually justify a 50% credence.

It may still be the case that the person giving the early warning (in the hypothetical nuclear case) had some valuable and neglected insights, missed by others, that are well worth paying attention to and seriously reflecting on; but that's a different matter from believing they were overall well-calibrated or should be deferred to much more than the people who moved.

[[EDIT: Something else it might be worth emphasizing, here, is that I'm not arguing for the view "ignore Eliezer." It's closer to "don't give Eliezer's views outsized weight, compared to (e.g.) the views of the next dozen people you might be inclined to defer to, and factor in evidence that his risk estimates might have a significant upward bias to them."]]
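One way to make the hypothetical nuclear case above concrete is with a scoring rule. The sketch below uses the Brier score, which is my choice for illustration (the comment itself does not name a scoring rule), and the 5% figure for the people who updated is an assumed stand-in for "closer to 0% than 50%". It compares the expected score of each forecast across a few assumed true probabilities.

```python
def expected_brier(forecast: float, true_prob: float) -> float:
    """Expected Brier score (lower is better) of a fixed forecast
    when the event actually occurs with probability true_prob."""
    return true_prob * (1 - forecast) ** 2 + (1 - true_prob) * forecast ** 2


early_warner = 0.50    # the 50% credence from the hypothetical
updated_crowd = 0.05   # assumed value for "closer to 0% than 50%"

# The true probabilities below are hypothetical, chosen only to show the pattern.
for true_prob in (0.05, 0.10, 0.30, 0.50):
    warner_score = expected_brier(early_warner, true_prob)
    crowd_score = expected_brier(updated_crowd, true_prob)
    print(f"true p={true_prob:.2f}  early warner={warner_score:.4f}  crowd={crowd_score:.4f}")
```

Under these assumptions, the early warner only comes out ahead if the true probability is closer to 50% than to 5%. The fact that other people moved toward the early warner's number does not, by itself, settle which side of that line the truth is on.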

comment by Lorenzo (Lorenzo Buonanno) · 2022-06-19T19:42:21.484Z · EA(p) · GW(p)

I believe Drexler is now giving the ballpark figure of 2013. My own guess would be no later than 2010…

 

I didn't see the "my own guess" part in the linked document (or the archived version), but it seems to be visible here; it was probably edited between 2001 and 2004. I'm mentioning it in case others are confused after trying to find the quote in context.

comment by elons.musk · 2022-06-25T19:21:53.346Z · EA(p) · GW(p)

Perhaps also relevant, though it isn’t forecasting, is Eliezer’s weak (in my opinion) attempted takedown of Ajeya Cotra’s bioanchors report on AI timelines [LW · GW]. Here’s Eliezer’s bioanchors takedown attempt [LW · GW], here’s Holden Karnofsky’s response [LW · GW] to Eliezer, and here’s Scott Alexander’s response.

Replies from: RobBensinger
comment by RobBensinger · 2022-06-28T20:39:17.932Z · EA(p) · GW(p)

Eliezer's post was less a takedown of the report, and more a takedown of the idea that the report provides a strong basis for expecting AGI in ~2050, or for discriminating scenarios like 'AGI in 2030', 'AGI in 2050', and 'AGI in 2070'.

The report itself was quite hedged, and Holden posted a follow-up clarification emphasizing that “biological anchors” is about bounding, not pinpointing, AI timelines. So it's not clear to me that Eliezer and Ajeya/Holden/etc. even disagree about the core question "do biological anchors provide a strong case for putting a median AGI year in ~2050?", though maybe they disagree on the secondary question of how useful the "bounds" are.

Copying over my high-level view, which I recently wrote on Twitter:

I agree with the basic Eliezer argument in Biology-Inspired AGI Timelines [LW · GW] that the bio-anchors stuff isn't important or useful because AGI is a software problem, and we neither know which specific software insights are needed, nor how long it will take to get to those software insights, nor the relationship between those insights and hardware requirements.

Focusing on things like bio-anchors and hardware trends is streetlight-fallacy reasoning: it's taking the 2% of the territory we do know about and heavily heavily focusing on that 2%, while shrugging our shoulders at the other 98%.

Like, bio-anchors reasoning might help tell you whether to expect AGI this century versus expecting it in a thousand years, but it won't help you discriminate 2030 from 2050 from 2070 at all.

Insofar as we need to think about timelines at all, it's true that we need some sort of prior, at least a very vague one.

The problem with the heuristic 'look under the streetlight and anchor your prior to whatever you found under the streetlight, however marginal' is that the info under the streetlight isn't a random sampling from the space of relevant unknown facts about AGI; it's a very specific and unusual kind of information.

IMO you'd be better off thinking first about that huge space of unknowns and anchoring to far fuzzier and more uncertain guesses about the whole space, rather than fixating on a very specific much-more-minor fact that's easier to gather data about.

E.g., consider five very different a priori hypotheses about 'what insights might be needed for AGI', another five very different hypotheses about 'how might different sorts of software progress relate to hardware requirements', etc.

Think about different world-histories that might occur, and how surprised you'd be by those world-histories.

Think about worlds where things go differently than you're expecting in 2060, and about what those worlds would genuinely retrodict about the present / past.

E.g., I think scenario analysis makes it more obvious that in worlds where AGI is 30 years away, current trends will totally break at some point on that path, radically new techniques will be developed, etc.

Think about how different the field of AI was in 1992 compared to today, or in 1962 compared to 1992.

When you're spending most of your time looking under the streetlight — rather than grappling with how little is known, trying to painstakingly refine your instincts and intuitions about the harder-to-reason-about aspects of the problem, etc. — I think it becomes overly tempting to treat current trendlines as laws of nature that will be true forever (or that at least have a strong default of being true forever), rather than as 'patterns that arose a few years ago and will plausibly continue for a few years more, before being replaced by new patterns and growth curves'.

Cf. https://twitter.com/robbensinger/status/1537585485211545604 

Replies from: RobBensinger
comment by RobBensinger · 2022-06-28T20:42:55.206Z · EA(p) · GW(p)

Commenting on a few minor points from Scott's post, since I meant to write a full reply at some point but haven't had the time:

But also, there are about 10^15 synapses in the brain, each one spikes about once per second, and a synaptic spike probably does about one FLOP of computation. [...] So a human-level AI would also need to do 10^15 floating point operations per second? Unclear.

I'd say 'clearly not, for some possible AI designs'; but maybe it will be true for the first AIs we actually build, shrug.
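For reference, the arithmetic behind the quoted 10^15 figure can be written out as below. This is only a restatement of the three rough inputs in Scott's sentence; the variable names are mine, and each input is an order-of-magnitude guess rather than a measurement.

```python
# Rough inputs, taken from the quoted passage.
synapses = 1e15              # "about 10^15 synapses in the brain"
spikes_per_second = 1.0      # "each one spikes about once per second"
flop_per_spike = 1.0         # "a synaptic spike probably does about one FLOP"

# The estimate is just the product of the three factors.
brain_flop_per_second = synapses * spikes_per_second * flop_per_spike
print(f"{brain_flop_per_second:.0e} FLOP/s")
```

Because the estimate is a plain product, an error of 10x in any single input shifts the result by 10x, and the calculation says nothing about whether a given AI design would need to match it.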

Or you might do what OpenPhil did and just look at a bunch of examples of evolved vs. designed systems and see which are generally better:

Why aren't there examples like 'amount of cargo a bird can carry compared to an airplane', or 'number of digits a human can multiply together in ten seconds compared to a computer'?

Seems like you'll get a skewed number if your brainstorming process steers away from examples like these altogether.

'AI physicist' is less like an artificial heart (trying to exactly replicate the structure of a biological organ functioning within a specific body), more like a calculator (trying to do a certain kind of cognitive work, without any constraint at all to do it in a human-like way).

comment by MichaelDickens · 2022-06-23T19:27:14.286Z · EA(p) · GW(p)

I read this post kind of quickly, so apologies if I'm misunderstanding. It seems to me that this post's claim is basically:

  1. Eliezer wrote some arguments about what he believes about AI safety.
  2. People updated toward Eliezer's beliefs.
  3. Therefore, people defer too much to Eliezer.

I think this is dismissing a different (and much more likely IMO) possibility, which is that Eliezer's arguments were good, and people updated based on the strength of the arguments.

(Even if his recent posts didn't contain novel arguments, the arguments still could have been novel to many readers.)

Replies from: Linch
comment by Linch · 2022-06-23T21:50:29.453Z · EA(p) · GW(p)

I'm a bit confused by both this post and the comments about questions like at what level/timing the deference happens.

Speaking for myself, if an internet rando wrote a random blog post called "AGI Ruin: A List of Lethalities," I probably would not read it. But I did read Yudkowsky's post carefully and thought about it nontrivially, mostly due to his track record and writing ability (rather than e.g. because the title was engaging or because the first paragraph was really well-argued).

comment by VictorSintNicolaas · 2022-06-19T16:00:04.524Z · EA(p) · GW(p)

As someone not active in the field of AI risk, and having always used epistemic deference quite heavily, this feels very helpful. I hope it doesn't end up reducing society's efforts to stop AI from taking over the world some day.

Replies from: julianhazell
comment by JulianHazell (julianhazell) · 2022-06-19T16:26:12.797Z · EA(p) · GW(p)

On the contrary, my best guess is that the “dying with dignity” style dooming is harming the community’s ability to tackle AI risk as effectively as it otherwise could.

comment by Jack Malde (jackmalde) · 2022-06-22T07:23:42.499Z · EA(p) · GW(p)

I'm confused by the fact Eliezer's post was posted on April Fool's day. To what extent does that contribute to conscious exaggeration on his part?

Replies from: Guy Raveh
comment by Guy Raveh · 2022-06-22T08:53:40.527Z · EA(p) · GW(p)

Right? Up to reading this post, I was convinced it was an April Fool's post.

Replies from: RobBensinger
comment by RobBensinger · 2022-06-23T04:37:26.786Z · EA(p) · GW(p)

The post is serious. Details: https://www.lesswrong.com/posts/j9Q8bRmwCgXRYAgcJ/miri-announces-new-death-with-dignity-strategy?commentId=FounAZsg4kFxBDiXs [LW(p) · GW(p)] 

Replies from: Dr. David Mathers
comment by Dr. David Mathers · 2022-06-23T23:31:55.555Z · EA(p) · GW(p)

It seems really bad, from a communications/PR point of view, to write something that was ambiguous in this way. Like, bad enough that it makes me slightly worried that MIRI will commit some kind of big communications error that gets into the newspapers and does big damage to the reputation of EA as a whole.

comment by David Johnston · 2022-06-20T22:46:15.688Z · EA(p) · GW(p)

I agree with many of the comments here that this is overall a bit unfair, and there are good reasons to take Yudkowsky seriously even if you don't automatically accept his self-expressed level of confidence.

My main criticism of Yudkowsky is that he has many innovative/somewhat compelling ideas, but even with many years and a research institution their evolution has been unsatisfying. Many of them are still imprecise, and some of those that are precise(ish) are not satisfactory (e.g. the orthogonality thesis, mesa-optimizers). Furthermore, he still doesn't seem very interested in improving this situation.

comment by 𝕮𝖎𝖓𝖊𝖗𝖆 (Dragon God) · 2022-06-19T15:42:31.669Z · EA(p) · GW(p)

I prefer to just analyse and refute his concrete arguments on the object level.

I'm not a fan of engaging the person of the arguer instead of their arguments.

Granted, I don't practice epistemic deference in regards to AI risk (so I'm not the target audience here), but I'm really not a fan of this kind of post. It rubs me the wrong way.

Challenging someone's overall credibility instead of their concrete arguments feels like bad form and logical rudeness (https://www.lesswrong.com/posts/srge9MCLHSiwzaX6r/logical-rudeness [LW · GW]).

I wish EAs did not engage in such behaviour and especially not with respect to other members of the community.

Replies from: bmg, Dragon God
comment by Ben Garfinkel (bmg) · 2022-06-19T16:15:30.470Z · EA(p) · GW(p)

I prefer to just analyse and refute his concrete arguments on the object level.

I agree that work analyzing specific arguments is, overall, more useful than work analyzing individual people's track records. Personally, partly for that reason, I've actually done a decent amount of public argument analysis (e.g. here, here, and most recently here) but never written a post like this before.

Still, I think, people do in practice tend to engage in epistemic deference. (I think that even people who don't consciously practice epistemic deference tend to be influenced by the views of people they respect.) I also think that people should practice some level of epistemic deference, particularly if they're new to an area. So - in that sense - I think this kind of track record analysis is still worth doing, even if it's overall less useful than argument analysis.

Replies from: Dragon God
comment by 𝕮𝖎𝖓𝖊𝖗𝖆 (Dragon God) · 2022-06-19T16:25:13.590Z · EA(p) · GW(p)

(I hadn't seen this reply when I made my other reply).

What do you think of legitimising behaviour that calls out the credibility of other community members in the future?

I am worried about displacing the concrete object level arguments as the sole domain of engagement: a culture in which arguments cannot be allowed to stand by themselves, in which people have to be concerned about prior credibility, track record and legitimacy when formulating their arguments...

It feels like a worse epistemic culture.

Replies from: therealslimkt
comment by karthik-t (therealslimkt) · 2022-06-19T17:41:13.965Z · EA(p) · GW(p)

Expert opinion has always been a substitute for object level arguments because of deference culture. Nobody has object level arguments for why x-risk in the 21st century is around 1/6: we just think it might be because Toby Ord says so and he is very credible. Is this ideal? No. But we do it because expert priors are the second best alternative when there is no data to base our judgments off of.

Given this, I think criticizing an expert's priors is functionally an object level argument, since the expert's prior is so often used as a substitute for object level analysis.

I agree that a slippery slope endpoint would be bad but I do not think criticizing expert priors takes us there.

comment by 𝕮𝖎𝖓𝖊𝖗𝖆 (Dragon God) · 2022-06-19T16:23:47.568Z · EA(p) · GW(p)

To expand on my complaints in the above comment.

I do not want an epistemic culture that finds it acceptable to challenge an individual's overall credibility in lieu of directly engaging with their arguments.

I think that's unhealthy and contrary to collaborative knowledge growing.

Yudkowsky has laid out his arguments for doom at length. I don't fully agree with those arguments (I believe he's mistaken in 2 - 3 serious and important ways), but he has laid them out, and I can disagree on the object level with him because of that.

Given that the explicit arguments are present, I would prefer posts that engaged with and directly refuted the arguments if you found them flawed in some way.

I don't like this direction of attacking his overall credibility.

Attacking someone's credibility in lieu of their arguments feels like a severe epistemic transgression.

I am not convinced that the community is better for a norm that accepts such epistemic call out posts.

Replies from: bmg, Holly_Elmore, Guy Raveh
comment by Ben Garfinkel (bmg) · 2022-06-19T16:37:25.844Z · EA(p) · GW(p)

I do not want an epistemic culture that finds it acceptable to challenge an individual's overall credibility in lieu of directly engaging with their arguments.

I think I roughly agree with you on this point, although I would guess I have at least a somewhat weaker version of your view. If discourse about people's track records or reliability starts taking up (e.g.) more than a fifth of the space that object-level argument does, within the most engaged core of people, then I do think that will tend to suggest an unhealthy or at least not-very-intellectually-productive community.

One caveat: For less engaged people, I do actually think it can make sense to spend most of your time thinking about questions around deference. If I'm only going to spend ten hours thinking about nanotechnology risk, for example, then I might actually want to spend most of this time trying to get a sense of what different people believe and how much weight I should give their views; I'm probably not going to be able to make a ton of headway getting a good gears-level-understanding of the relevant issues, particularly as someone without a chemistry or engineering background.

comment by Holly_Elmore · 2022-06-19T18:35:09.197Z · EA(p) · GW(p)

I do not want an epistemic culture that finds it acceptable to challenge an individual's overall credibility in lieu of directly engaging with their arguments.

I think it's fair to talk about a person's lifetime performance when we are talking about forecasting. When we don't have the expertise ourselves, all we have to go on is what little we understand and the track records of the experts we defer to. Many people defer to Eliezer so I think it's a service to lay out his track record so that we can know how meaningful his levels of confidence and special insights into this kind of problem are. 

comment by Guy Raveh · 2022-06-20T18:48:33.368Z · EA(p) · GW(p)

I do not want an epistemic culture that finds it acceptable to challenge an individual's overall credibility in lieu of directly engaging with their arguments.

I don't think this is realistic. There is much more important knowledge than one can engage with in a lifetime. The only way of forming views about many things is to somehow decide who to listen to, or at least how to aggregate relevant more strongly based opinions (so, who to count as an expert and who not to and with what weight).

comment by Zach Stein-Perlman (zsp) · 2022-06-19T16:29:47.122Z · EA(p) · GW(p)

Almost all of this seems reasonable. But:

Yudkowsky has previously held short AI timeline views that turned out to be wrong

I don't think we should update based on this, or e.g. on the fact that we didn't go extinct due to nanotechnology, because of anthropics / observer selection. (We should only update based on whether we think the reasons for those beliefs were bad.)

Replies from: Derek Shiller
comment by Derek Shiller · 2022-06-19T17:23:38.031Z · EA(p) · GW(p)

Suppose you've been captured by some terrorists and you're tied up with your friend Eli. There is a device on the other side of the room that you can't quite make out. Your friend Eli says that he can tell (he's 99% sure) it is a bomb and that it is rigged to go off randomly. Every minute, he's confident there's a 50-50 chance it will explode, killing both of you. You wait a minute and it doesn't explode. You wait 10. You wait 12 hours. Nothing. He starts eyeing the light fixture, and says he's pretty sure there's a bomb there too. You believe him?

Replies from: zsp, rhollerith, rhollerith
comment by Zach Stein-Perlman (zsp) · 2022-06-19T17:35:50.562Z · EA(p) · GW(p)

No, my survival for 12 hours is evidence against Eli being correct about the bomb.

So: oops, I think.
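As a rough illustration of how strong that evidence is under ordinary Bayesian conditioning, here is a minimal sketch. It takes the 99% prior and the 50% per-minute explosion chance from the example, assumes a non-bomb never explodes, and deliberately sets aside the observer-selection objection debated elsewhere in this thread. The function name `posterior_bomb` is hypothetical:

```python
from fractions import Fraction


def posterior_bomb(prior, p_explode_per_minute, minutes_survived):
    """P(device is a bomb | no explosion so far), by ordinary Bayes' rule."""
    # Chance a real bomb stays quiet for every one of the minutes observed.
    p_quiet_if_bomb = (1 - p_explode_per_minute) ** minutes_survived
    # Assumption: a device that is not a bomb never explodes.
    p_quiet_if_not_bomb = 1
    numerator = prior * p_quiet_if_bomb
    return numerator / (numerator + (1 - prior) * p_quiet_if_not_bomb)


prior = Fraction(99, 100)  # Eli's stated confidence
half = Fraction(1, 2)      # stated per-minute explosion chance

# 1 minute, 10 minutes, and 12 hours (720 minutes) of survival.
for minutes in (1, 10, 720):
    print(minutes, float(posterior_bomb(prior, half, minutes)))
```

On this treatment one quiet minute barely moves the estimate (99/101), ten minutes already brings it below 10%, and twelve hours leaves it negligible.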

Replies from: zsp
comment by Zach Stein-Perlman (zsp) · 2022-06-20T17:00:05.764Z · EA(p) · GW(p)

I'm still not totally comfortable. I think my confusion arose because I was considering the related question of whether I could use my better knowledge than Eli to win money from bets (in expectation) -- I couldn't, because Eli has no reason to bet on the bomb going off. More generally, Eliezer never had reason to bet (in the sense that he gets epistemic credit if he's right) on nanotech-doom-by-2010, because in the worlds where he's right we're dead. It feels weird to update against Eliezer on the basis of beliefs that he wouldn't have bet on; updating against him doesn't seem to be incentive-compatible... but maybe that's just the sacrifice immanent to the epistemic virtue of publicly sharing your belief in doom.

comment by rhollerith · 2022-06-19T19:43:36.498Z · EA(p) · GW(p)

I am willing to bite your bullet.

I had a comment here explaining my reasoning, but deleted it because I plan to make a post instead.

comment by rhollerith · 2022-06-19T19:03:37.533Z · EA(p) · GW(p)

Your example conflates 2 things: trusting someone else's probability on something and an observer-selection effect. For the purposes of keeping my reply short, please allow me to change your example as follows: "Through some combination of discussion with your friend Eli and your own mental modelling, you become 99% confident that the device is a bomb."

With that change, I will bite your bullet!

In reality a bomb might do something to reveal itself to be a bomb without killing everyone in the room: it might for example start burning or make a very small explosion. But assuming a "perfect" bomb with 100% probability of killing everyone in the room if it does anything bomb-like at all, then observing that time has passed without an explosion is zero evidence against the device's being a bomb.

It is counter-intuitive, for sure, but it is so IMHO.

Since (as we have hypothesized) you really have enough causal knowledge about your situation (the room, the terrorists, the device) to arrive at the 99% probability you stated in your example and since (as we have hypothesized) the explosion of the bomb is certain to kill you and since (as we all know) dying prevents you from updating your beliefs, then it would be a violation of the law of the conservation of expected evidence [? · GW] for you to update your beliefs on observing the passage of a minute without the bomb's exploding.

Specifically, since you know that updating on the passage of time in this example would produce inside you a violation of the law of the conservation of expected evidence, which in turn would make you worse at reasoning, you should inhibit your natural human impulse to update on the passage of time.

ADDED. Observational selection effects are very tricky. I just realized that you might escape from the room before the device explodes, which would allow you to observe the explosion. So, to keep our example maximally illuminating, let us change it again. Specifically, let us add the assumption that you have some way to be 100% certain that you won't escape from the room before the explosion of the bomb.

Of course in reality it is impossible to become 100% certain of anything. An interesting question is if you are only 1 minus epsilon certain you won't escape the room and only 1 minus epsilon2 certain that it will not turn out that the device revealed itself to be a bomb (e.g., by spontaneously lighting on fire, then making a small explosion) without killing you, how to calculate the posterior probability the device is a bomb given that M minutes have gone by since the start of your stay in the room. I do not know how to calculate that, but I am fairly sure that it is humanly possible to learn how to calculate such probabilities (and if someone wants to pay me, I can probably learn it, but it will probably take me months).

Replies from: Derek Shiller
comment by Derek Shiller · 2022-06-19T20:05:00.254Z · EA(p) · GW(p)

then it would be a violation of the law of the conservation of expected evidence for you to update your beliefs on observing the passage of a minute without the bomb's exploding.

Interesting! I would think this sort of case just shows that the law of conservation of expected evidence is wrong, at least for this sort of application. I figure it might depend on how you think about evidence. If you think of the infinite void of non-existence as possibly constituting your evidence (albeit evidence you're not in a position to appreciate, being dead and all), then that principle wouldn't push you toward this sort of anthropic reasoning.

I am curious, what do you make of the following case?

Suppose you're touring Acme Bomb & Replica Bomb Co with your friend Eli. ABRBC makes bombs and perfect replicas of bombs, but they're sticklers for safety so they alternate days for real bombs and replicas. You're not sure which sort of day it is. You get to the point of the tour where they show off the finished product. As they pass around the latest model from the assembly line, Eli drops it, knocking the safety back and letting the bomb (replica?) land squarely on its ignition button. If it were a real bomb, it would kill everyone unless it were one of the 1-in-a-million bombs that's a dud. You hold your breath for a second but nothing happens. Whew. How much do you want to bet that it's a replica day?
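For comparison, the answer that ordinary Bayesian conditioning gives to this case can be sketched as follows. It assumes a 50% prior for a replica day (the days alternate and you don't know which it is), the stated 1-in-a-million dud rate, and that a replica never explodes. It does not apply any anthropic correction, which is the very point under dispute here. The function name `posterior_replica` is hypothetical:

```python
from fractions import Fraction


def posterior_replica(prior_replica, p_dud):
    """P(replica day | nothing happened on impact), by ordinary Bayes' rule."""
    p_quiet_if_replica = 1   # assumption: a replica never explodes
    p_quiet_if_real = p_dud  # a real bomb stays quiet only if it is a dud
    numerator = prior_replica * p_quiet_if_replica
    return numerator / (numerator + (1 - prior_replica) * p_quiet_if_real)


p = posterior_replica(Fraction(1, 2), Fraction(1, 10**6))
print(p, float(p))
```

Under these assumptions the posterior is 1,000,000/1,000,001, so a standard Bayesian would bet heavily on a replica day.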

Replies from: rhollerith
comment by rhollerith · 2022-06-20T03:16:29.110Z · EA(p) · GW(p)

Nice example.

Bombs in reality have some probability of "fizzling", which means making a small explosion that probably won't kill anyone. If we assume that P(fizzle) is zero, then (and I know that most of my readers won't believe this) the observed result of Eli's little accident is zero evidence for or against the bomb's being a replica. So, P(today is a replica day) prior to Eli's accident is .5 (on which I expect you to agree with me) and the posterior probability is likewise .5 IMHO.

I know (because I've written a few comments on observational selection effects before, on lesswrong) that no one is going to agree with me on that. Maybe the long post I have planned will convince some of you.

I would prefer to post on lesswrong rather than on the EA forum, but I also want you to be able to reply to my post, so let me ask you if you mind replying there (which would of course require you to make an account there if you haven't already).

comment by Yonatan Cale (hibukki) · 2022-06-20T21:13:41.244Z · EA(p) · GW(p)

I think posts like this would do better to open with "but consider forming your own opinions rather than relying on experts".

comment by genidma · 2022-06-25T13:31:01.111Z · EA(p) · GW(p)

Tldr

  • Personally, and from my very uneducated vantage point, I question why a superintelligence with a truly universal set of ethics would pose a risk to other lifeforms. But I also do not know how the initial conditions can be architected, if indeed they can be set/architected at all. That could go a number of different ways, depending on whose values are used.
  • What I worry about is what humans (enhanced or not) and cyborgs may choose to do with the bread-crumbs (the leftovers), or the steps taken to get to AGI.

Here is a schematic (link below) that I started meditating on yesterday. I am not sure if it's polite to share, particularly given that I have not taken the time to absorb the post above. But here goes: I am sharing it, as it may (or may not) provide some value to someone, hopefully in a manner that is reasonable. https://qr.ae/pvoVJn

comment by Charles He · 2022-06-19T17:13:47.630Z · EA(p) · GW(p)

The karma on this post is impressive especially since OP could have started this AM UK time but didn’t.

I want to say stuff but it’s not going to help?

comment by kokotajlod · 2022-06-19T23:33:32.669Z · EA(p) · GW(p)

I don't defer much myself on these matters (to anyone) and I don't recommend other people do. In fact I think that if people deferred less and read & thought through the arguments themselves instead, more people in the broad EA community would update closer to Yudkowsky's position than away from it. That's what happened to me.

But that said:

It seems to me that between about 2004 and 2014, Yudkowsky was the best person in the world to listen to on the subject of AGI and AI risks. That is, deferring to Yudkowsky would have been a better choice than deferring to literally anyone else in the world. Moreover, after about 2014 Yudkowsky would probably have been in the top 10; if you are going to choose 10 people to split your deference between (which I do not recommend, I recommend thinking for oneself), Yudkowsky should be one of those people and had you dropped Yudkowsky from the list in 2014 you would have missed out on some important stuff. Would you agree with this?

On the positive side, I'd be interested to see a top ten list from you of people you think should be deferred to as much or more than Yudkowsky on matters of AGI and AI risks.*

*What do I mean by this? Idk, here's a partial operationalization: Timelines, takeoff speeds, technical AI alignment, and p(doom).