Why EAs are skeptical about AI Safety

post by Lukas Trötzmüller (Lukas T) · 2022-07-18T19:01:37.457Z · EA · GW · 31 comments

Contents

  Summary
  Introduction
    Methodology
    How to read these arguments
    Demographics
      Current level of EA involvement
      Experience with AI Safety
      Are you working professionally with AI or ML?
    Personal Remarks
  Part 1: Arguments why existential risk from AGI seems implausible
    Progress will be slow / Previous AI predictions have been wrong
    AGI development is limited by a "missing ingredient"
    AI will not generalize to the real world
    Great intelligence does not translate into great influence
    AGI development is limited by neuroscience
    Many things would need to go wrong
    AGI development is limited by training
    Recursive self-improvement seems implausible
    AGI will be used as a tool with human oversight
    Humans collaborating are stronger than one AGI
    Constraints on Power Use and Resource Acquisition
    It is difficult to affect the physical world without a body
    AGI would be under strict scrutiny, preventing bad outcomes
    Alignment is easy
    Alignment is impossible
    We will have some amount of alignment by default
    Just switch it off
    AGI might have enormous benefits that outweigh the risks
    Civilization will not last long enough to develop AGI
    People are not alarmed
  Part 2: Arguments why AI Safety might be overrated within EA
    The A-Team is already working on this
    We are already on a good path
    Concerns about community, epistemics and ideology
    We might overlook narrow AI risk
    We might push too many people into the AI safety field
    I want more evidence
    The risk is too small
    Small research institutes are unlikely to have an impact
    Some projects have dubious benefits
    Why don't we just ban AI development?
    AI safety research is not communicated clearly enough for me to be convinced
    Investing great resources requires a great justification
    EA might not be the best platform for this
    Long timelines mean we should invest less
    We still have to prioritize
    We need a better Theory of Change
    Rogue Researchers
  Recent Resources on AI Safety Communication & Outreach
  Ideas for future projects
  Looking for Collaborators

TL;DR I interviewed 22 EAs who are skeptical about AI safety. My belief is that there is demand for better communication of AI safety arguments. I have several project ideas aimed in that direction, and am open to meeting potential collaborators (more info at the end of the post).

Summary

I interviewed 22 EAs who are skeptical about existential risk from Artificial General Intelligence (AGI), or believe that it is overrated within EA. This post provides a comprehensive overview of their arguments. It can be used as a reference to design AI safety communication within EA, as a conversation starter, or as the starting point for further research.

Introduction

In casual conversations with EAs over the past months, I found that many are skeptical of the importance of AI safety. Some have arguments that are quite well reasoned. Others bring arguments that have been convincingly refuted somewhere - but they simply never encountered that resource, and stopped thinking about the issue.

It seems to me that the community would benefit from more in-depth discussion between proponents and skeptics of a focus on AI safety. To facilitate that, I conducted interviews with 22 EAs who are skeptical about AGI risk. The interviews revolved around two basic questions:

  1. Do you believe the development of general AI can plausibly lead to human extinction (not including cases where bad actors intentionally use AI as a tool)?
  2. Do you believe AI safety is overrated within EA as a cause area?

Only people who said no to (1) or yes to (2) were interviewed. The goal was to get a very broad overview of their arguments.

Methodology

My goal was to better understand the viewpoints of my interview partners - not to engage in debate or convince anyone. That said, I did bring up counterarguments to their positions where that helped me gain a better understanding.

The results are summarized qualitatively, making sure every argument is covered adequately. No attempt was made to quantify how often each argument occurred.

Most statements are direct quotes, slightly cleaned up. In some cases, when the interviewee spoke verbosely, I suggested a summarized version of their argument, and asked for their approval.

Importantly, the number of bullet points for each argument below does not indicate the prevalence of an argument. Sometimes, all bullet points correspond to a single interviewee and sometimes each bullet point is from a different person. Sub-points indicate direct follow-ups or clarifications from the same person.

Some interviewees brought arguments against their own position. These counterarguments are only mentioned if they are useful to illuminate the main point.

General longtermism arguments without relation to AI Safety were omitted.

How to read these arguments

Some of these arguments hint towards specific ways in which AI safety resources could be improved. Others might seem obviously wrong or internally contradictory, and some might even contain factual errors.

However, I believe all of these arguments are useful data. I would suggest looking behind the argument and figuring out how each point hints at specific ways in which AI safety communication can be improved.

Also, some of the arguments may have made more sense in the original interview than they do here, taken out of context and rearranged into bullet points - the responsibility for that is mine.

Demographics

Interview partners were recruited from the /r/EffectiveAltruism subreddit, the EA Groups Slack, the EA Germany Slack, and the Effective Altruism Facebook group. The invitation text read roughly as follows:

Have you heard about the concept of existential risk from Advanced AI? Do you think the risk is small or negligible, and that advanced AI safety concerns are overblown? I'm doing research into people's beliefs on AI risk. Looking to interview EAs who believe that AI safety gets too much attention and is overblown.

Current level of EA involvement

How much time each week do you spend on EA activities, including your high-impact career, reading, thinking and meeting EAs?

Experience with AI Safety

How much time did you spend, in total, reading / thinking / talking / listening about AI safety?

Are you working professionally with AI or ML?

Personal Remarks

I greatly enjoyed having these conversations. For background: I have spent about 150 hours studying AI safety over the course of my involvement in EA, and I started from a position of believing in substantial existential risk from AGI this century. Only a small number of arguments seemed convincing to me, and these conversations have not meaningfully changed my own position. I have, however, gained a much deeper appreciation of the variety of counterarguments that EAs tend to have.

Part 1: Arguments why existential risk from AGI seems implausible

Progress will be slow / Previous AI predictions have been wrong

AGI development is limited by a "missing ingredient"

AI will not generalize to the real world

Great intelligence does not translate into great influence

AGI development is limited by neuroscience

Many things would need to go wrong

AGI development is limited by training

Recursive self-improvement seems implausible

AGI will be used as a tool with human oversight

Humans collaborating are stronger than one AGI

Constraints on Power Use and Resource Acquisition

It is difficult to affect the physical world without a body

AGI would be under strict scrutiny, preventing bad outcomes

Alignment is easy

Alignment is impossible

We will have some amount of alignment by default

Just switch it off

AGI might have enormous benefits that outweigh the risks

Civilization will not last long enough to develop AGI

People are not alarmed

Part 2: Arguments why AI Safety might be overrated within EA

The A-Team is already working on this

We are already on a good path

Concerns about community, epistemics and ideology

We might overlook narrow AI risk

We might push too many people into the AI safety field

I want more evidence

The risk is too small

Small research institutes are unlikely to have an impact

Some projects have dubious benefits

Why don't we just ban AI development?

AI safety research is not communicated clearly enough for me to be convinced

Investing great resources requires a great justification

EA might not be the best platform for this

Long timelines mean we should invest less

We still have to prioritize

We need a better Theory of Change

Rogue Researchers

Recent Resources on AI Safety Communication & Outreach

Ideas for future projects

Looking for Collaborators

I am looking for ways to improve AI safety outreach and discourse - either by getting involved in an existing project or by launching something new. Send me a message if you're interested in collaborating, would like help with your project, or would just like to bounce ideas around.

31 comments

Comments sorted by top scores.

comment by Roddy MacSween · 2022-07-18T21:53:14.167Z · EA(p) · GW(p)

I think it would be interesting to have various groups (e.g. EAs who are skeptical vs worried about AI risk) rank these arguments and see how their lists of the top ones compare.

comment by Yonatan Cale (hibukki) · 2022-07-18T22:49:55.508Z · EA(p) · GW(p)

Nice quality user research!

Consider adding a TL;DR including your calls to action - looking for collaborators and ideas for future projects, which I think will interest people

comment by Denise_Melchin · 2022-07-19T08:30:04.403Z · EA(p) · GW(p)

Thanks for doing this!

The strength of the arguments is very mixed as you say. If you wanted to find good arguments, I think it might have been better to focus on people with more exposure to the arguments. But knowing more about where a diverse set of EAs is at in terms of persuasion is good too, especially for AI safety community builders.

comment by niplav · 2022-07-19T09:09:55.715Z · EA(p) · GW(p)

This solidifies a conclusion for me: when talking about AI risk, the best/most rigorous resources aren't the ones which are most widely shared/recommended (rigorous resources are e.g. Ajeya Cotra's report on AI timelines, Carlsmith's report on power-seeking AI, Superintelligence by Bostrom or (to a lesser extent) Human Compatible by Russell).

Those might still not be satisfying to skeptics, but are probably more satisfying than "short stories by Eliezer Yudkowsky" (though one can take an alternative angle: skeptics wouldn't bother reading a >100 page report, and I think the complaint that it's all short stories by Yudkowsky comes from the fact that that's what people actually read).

Additionally, there appears to be a perception that AI safety research is limited to MIRI & related organisations, which definitely doesn't reflect the state of the field—but from the outside this multipolarity might be hard to discover (outgroup-ish homogeneity bias strikes again).

comment by ekka (Eddie K) · 2022-07-20T14:48:37.303Z · EA(p) · GW(p)

Personally I find Human Compatible the best resource of the ones you mentioned. If it were just the others I'd be less bought into taking AI risk seriously.

comment by niplav · 2022-07-20T15:05:45.142Z · EA(p) · GW(p)

I agree that it occupies a spot on the layperson-understandability/rigor Pareto-frontier, but that frontier is large and the other things I mentioned are at other points.

comment by ekka (Eddie K) · 2022-07-21T05:09:06.178Z · EA(p) · GW(p)

Indeed. It just felt more grounded in reality to me than the other resources - it may appeal more to us laypeople, while the non-laypeople prefer more speculative and abstract material.

comment by Oliver Sourbut · 2022-07-21T07:59:11.865Z · EA(p) · GW(p)

Seconded/thirded on Human Compatible being near that frontier. I did find its ending 'overly optimistic' in the sense of framing it like 'but lo, there is a solution!', while other similar resources like Superintelligence and especially The Alignment Problem seem more nuanced, presenting uncertain proposals for paths forward not as oven-ready but as preliminary and speculative.

comment by Lukas Trötzmüller (Lukas T) · 2022-07-19T09:57:55.098Z · EA(p) · GW(p)

I'm not quite sure I read the first two paragraphs correctly. Are you saying that Cotra, Carlsmith and Bostrom are the best resources but they are not widely recommended? And people mostly read short posts, like those by Eliezer, and those are accessible but might not have the right angle for skeptics?

comment by niplav · 2022-07-19T12:55:05.325Z · EA(p) · GW(p)

Yes, I think that's a fair assessment of what I was saying.

Maybe I should have said that they're not widely recommended enough on the margin, and that there are surely many other good & rigorous-ish explanations of the problem out there.

I'm also always disappointed when I meet EAs who aren't deep into AI safety but curious, and the only things they have read is the List of Lethalities & the Death with Dignity post :-/ (which are maybe true but definitely not good introductions to the state of the field!)

comment by Pablo (Pablo_Stafforini) · 2022-07-19T14:01:07.950Z · EA(p) · GW(p)

As a friendly suggestion, I think the first paragraph of your original comment would be less confusing if the parenthetical clause immediately followed "the best/most rigorous resources". This would make it clear to the reader that Cotra, Carlsmith, et al are offered as examples of best/most rigorous resources, rather than as examples of resources that are widely shared/recommended.

comment by niplav · 2022-07-20T09:22:50.026Z · EA(p) · GW(p)

Thanks, will edit.

comment by Guy Raveh · 2022-07-19T15:38:35.835Z · EA(p) · GW(p)

There are short stories by Yudkowsky? All I ever encountered were thousands-of-pages-long sequences of blog posts (which I hence did not read, as you suggest).

comment by Yonatan Cale (hibukki) · 2022-07-19T17:27:19.606Z · EA(p) · GW(p)

Lots of it is here.

comment by Lumpyproletariat · 2022-08-03T22:40:53.901Z · EA(p) · GW(p)

If you're unconvinced about AI danger and you tell me what specifically are your cruxes, I might be able to connect you with Yudkowskian short stories that address your concerns. 

The ones which come immediately to mind are:

That Alien Message

Sorting Pebbles Into Correct Heaps

comment by Quadratic Reciprocity · 2022-08-14T19:06:22.569Z · EA(p) · GW(p)

I think I would have found Ajeya's Cold Takes guest post on "Why AI alignment could be hard with modern deep learning" persuasive back when I was skeptical. It is pretty short. I think the reason I didn't find what you call "short stories by Eliezer Yudkowsky" persuasive was that they tended not to use concepts / terms from ML. I guess even stuff like the orthogonality thesis and the instrumental convergence thesis was not that convincing to me on a gut level, even though I didn't disagree with the actual arguments for them, because I had the intuition that whether misaligned AI was a big deal depended on details of how ML actually worked, which I didn't know. To me back then it looked like most people I knew with much more knowledge of ML were not concerned about AI x-risk, so probably it wasn't a big deal.

comment by Marshall (mpt7) · 2022-07-19T10:57:21.362Z · EA(p) · GW(p)

Thanks! I thought this was great. I really like the goals of fostering a more in-depth discussion and understanding skeptics' viewpoints. 

I'm not sure about modeling a follow-up project on Skeptical Science, which is intended (in large part) to rebut misinformation about climate change. There's essentially consensus in the scientific community that human beings are causing climate change, so such a project seems appropriate.

  • Is there an equally high level of expert consensus on the existential risks posed by AI?
  • Have the strongest of the AI safety skeptics' arguments all been thoroughly debunked using evidence, logic, and reason?

If the answer to either of these questions is "no," then maybe more foundational work (in the vein of this interview project) should be done first. I like your idea of using double crux interviews to determine which arguments are the most important.

One other idea would be to invite some prominent skeptics and proponents to synthesize the best of their arguments and debate them, live or in writing, with an emphasis on clear, jargon-free language (maybe such a project already exists?).

comment by Eli Rose (reallyeli) · 2022-07-20T07:04:00.327Z · EA(p) · GW(p)

Is there an equally high level of expert consensus on the existential risks posed by AI?

There isn't. I think a strange but true and important fact about the problem is that it just isn't a field of study in the same way e.g. climate science is — as argued in this Cold Takes post. So it's unclear who the relevant "experts" should be. Technical AI researchers are maybe the best choice, but they're still not a good one; they're in the business of making progress locally, not forecasting what progress will be globally and what effects that will have.

comment by Marshall (mpt7) · 2022-07-20T11:16:08.060Z · EA(p) · GW(p)

Thanks! I agree - AI risk is at a much earlier stage of development as a field. Even as the field develops and experts can be identified, I would not expect a very high degree of consensus. Expert consensus is more achievable for existential risks such as climate science and asteroid impacts that can be mathematically modeled with high historical accuracy - there's less to dispute on empirical / logical grounds. 

A campaign to educate skeptics seems appropriate for a mature field with high consensus, whereas constructively engaging skeptics supports the advancement of a nascent field with low consensus.

comment by Chris Leong (casebash) · 2022-07-20T06:42:58.885Z · EA(p) · GW(p)

One other idea would be to invite some prominent skeptics and proponents to synthesize the best of their arguments and debate them, live or in writing, with an emphasis on clear, jargon-free language (maybe such a project already exists?).

This is a pretty good idea!

comment by Raphaël S (charbel-raphael-segerie) · 2022-07-19T11:06:49.939Z · EA(p) · GW(p)

We could use Kialo, a web app, to map those points and their counterarguments.

comment by Raphaël S (charbel-raphael-segerie) · 2022-07-19T11:09:25.719Z · EA(p) · GW(p)

I can organize a session with my AI safety novice group to build the kialo

comment by Harrison Durland (Harrison D) · 2022-07-20T11:58:50.152Z · EA(p) · GW(p)

I have been suggesting this (and other uses of Kialo) for a while, although perhaps not as frequently or forcefully as I ought to… (I would recommend linking to the site, btw)

comment by jacobpfau · 2022-07-20T19:55:21.728Z · EA(p) · GW(p)

Do you have a sense of which argument(s) were most prevalent and which were most frequently the interviewees crux?

It would also be useful to get a sense of which arguments are only common among those with minimal ML/safety engagement. If basic AI safety engagement reduces the appeal of a certain argument, then there's little need for further work on messaging in that area.

comment by Vaidehi Agarwalla (vaidehi_agarwalla) · 2022-07-19T23:59:21.938Z · EA(p) · GW(p)

Do you think the wording "Have you heard about the concept of existential risk from Advanced AI? Do you think the risk is small or negligible, and that advanced AI safety concerns are overblown? " might have biased your sample in some way? 

E.g. I can imagine people who are very worried about alignment but don't think current approaches are tractable. 

comment by thecommexokid · 2022-07-21T16:37:57.921Z · EA(p) · GW(p)

In case "I can imagine" was literal, let me serve as a proof of concept: I am a person who thinks the risk is high but that there's nothing we can do about it short of a major upheaval of the culture of the entire developed world.

comment by Lukas Trötzmüller (Lukas T) · 2022-07-20T07:28:04.489Z · EA(p) · GW(p)

The sample is biased in many ways: Because of the places where I recruited, interviews that didn't work out because of timezone difference, people who responded too late, etc. I also started recruiting on Reddit and then dropped that in favour of Facebook.

So this should not be treated as a representative sample; rather, it is an attempt to capture a wide variety of arguments.

I did interview some people who are worried about alignment but don't think current approaches are tractable. And quite a few people who are worried about alignment but don't think it should get more resources.

Referring to my two basic questions listed at the top of the post, I had a lot of people say "yes" to (1). So they are worried about alignment. I originally planned to provide statistics on agreement / disagreement on questions 1/2 but it turned out that it's not possible to make a clear distinction between the two questions - most people, when discussing (2) in detail, kept referring back to (1) in complex ways.

comment by Harrison Durland (Harrison D) · 2022-07-20T12:12:50.371Z · EA(p) · GW(p)

Once again, I’ll say that a study which analyzed the persuasion psychology/sociology of “x-risk from AI” (e.g., what lines of argument are most persuasive to what audiences, what’s the “minimal distance / max speed” people are willing to go from “what is AI risk” to “AI risk is persuasive,” how important is expert statements vs. theoretical arguments, what is the role of fiction in magnifying or undermining AI x-risk fears) seems like it would be quite valuable.

Although I’ve never held important roles or tried to persuade important people, in my conversations with peers I have found it difficult to walk the line between “sounding obsessed with AI x-risk” and “under-emphasizing the risk,” because I just don’t have a good sense of how fast I can take someone from being unsure whether AGI/superintelligence is even possible to “AI x-risk is >10% this century.”

comment by levin · 2022-07-20T11:03:20.767Z · EA(p) · GW(p)

Just added a link to the "A-Team is already working on this" section of this post to my "(Even) More EAs Should Try AI Safety Technical Research," where I observe that people who disagree with basically every other claim in this post still don't work on AI safety because of this (flawed) perception.

comment by Locke · 2022-07-21T16:32:38.861Z · EA(p) · GW(p)

Did any of these arguments change your beliefs about AGI? I'd also love to get a definition of General Intelligence, since that seems like an ill-posed concept.

comment by acylhalide (Samuel Shadrach) · 2022-07-28T06:53:01.699Z · EA(p) · GW(p)

Yup agreed this is an open question.

Some stuff I've heard so far:

 - Homo sapiens went from ape-level cognition to massive capability, inventing language and all the vast capabilities it entails, in a very short span of time. See this list of features unique to human language. The fact that we gained all this capability in such a short time means there weren't a large number of "special sauces" that each had to be evolutionarily selected for; one core change captures most of our capability.

 - One way of defining "general" is applying a capability obtained from one task to a different task. For example, if you are good at math and manipulating symbols, you can use this to design a reactor - an entirely different task from doing math well. Animals don't seem to show much of this kind of transfer. ML systems, however, are already showing capacity for transfer learning.

 - Lots of tasks require generality in the sense that learning general capabilities is by far the easiest way to git gud at those tasks. Brute-forcing your way through 10^24 ways not to build a reactor is an impractical way to learn how to build one, whereas a core that allows you to generalise what you already know, such as math, works. It's not enough to know math; you also need to know how to generalise your math capability to designing a reactor.

 - Rob Bensinger has an analogy for this previous point: it would be weird to teach someone to park their car in the first parking spot or the third parking spot in such a way that they completely lack the capability to park in any other spot. Successfully teaching such broad tasks automatically means teaching the agent how to do all sorts of other tasks as well.