Intellectual Diversity in AI Safety

post by KR · 2020-07-22T19:07:24.538Z · 8 comments



There’s an undercurrent in the way I hear people talk about anyone not already under the AI safety umbrella: the implication that outsiders aren’t worth talking to until they understand all the basic premises, where the basic premises are something like “all of Superintelligence and some of Yudkowsky”. If you talk to these AI safety people, they’re generally willing to acknowledge some version of this fairly explicitly.

No one wants to rehash the same arguments a million times. (“So, like Skynet? Killer robots? Come on, you can just unplug it.”) But if everyone has to be more-or-less on board with some mandatory reading as the price of entry, you’re going to get a more homogeneous field than you otherwise could have gotten.

Why do I think that drawing in a wide variety of viewpoints is important?

The less varied the intellectual pedigree of AI safety is, the more likely it is that everyone is making correlated mistakes.

In my opinion, the landscape of AI’s future is dominated by unknown unknowns. We have not yet even thought of all of the ways it could go, let alone which are more likely or how to deal with them.

In part, I think the homogeneity of people’s background worldviews is an effect of how few people, quite recently, drew a fairly large group’s attention to the issue. That is only to their credit (otherwise, there might be no conversation to speak of, homogeneous or otherwise). But if you’re trying to do creative work and come up with as many possibilities as you can, you want intellectual diversity in the people who are thinking about the problem. If everyone’s first exposure to AI safety involved foom (a sudden, runaway intelligence explosion), for instance, they’re going to be thinking very different thoughts from someone who’s never heard of it. Even if they disagree with it, it might color their later intuitions.

It seems to me that AI safety has already allowed weak, confused, or just plain incorrect arguments to stand due to insufficient questioning of shared assumptions. Ben Garfinkel argues in “On Classic Arguments for AI Discontinuities” that the classic arguments fail to adequately distinguish between a sudden jump to AGI and a sudden jump from AGI to superintelligent systems. By arguing for the latter while assuming the former, they overestimate the probability of a catastrophic jump from AGI to superintelligence.

That’s one set of assumptions that someone has put in effort to untangle. I would be very surprised if there weren’t a lot more buried in our fundamental understanding of the issues.


The obvious counter-argument is that most fields do not work like this and seem to be the better for it. No one’s going to take a biologist seriously if they’re running around quoting Lamarck. Deriving your own physics from first principles is the domain of crackpots. In general, discarding the work of previous thinkers wholesale is rarely a good idea.

Why do I think it’s worth trying here? AI safety is a pre-paradigmatic science that is much newer than biology and physics. As it stands, it is also much less grounded in testable facts. A lot of intellectual progress in the basic underpinnings seems to be made when someone says “I thought of a way that AI could go, here’s a blog post about why I think so”. If it’s a good, persuasive-seeming argument, some people integrate it into their worldviews and treat it as a scenario that needs to be prepared for.

Other downsides:


I don’t think all the existing arguments are bad, or that we should jettison everything and start over, or anything so dramatic. The current state of knowledge is the work of a lot of very smart people who have created something very valuable. But I do think it would be helpful to aim for a wider variety of viewpoints.

Some possible actions:

Obviously this is kind of dumb as presented: no one does math by teaching people basic algebra and then going “okay, now rederive modern mathematics”. Still, I suspect there’s a better-thought-out version of this proposal that might have interesting results.



comment by Robert_Wiblin · 2020-07-23T11:14:21.474Z

It seems like lots of active AI safety researchers, even a majority, are aware of Yudkowsky and Bostrom's views but only agree with parts of what they have to say (e.g. Russell, Amodei, Christiano, the teams at DeepMind, OpenAI, etc.).

There may still not be enough intellectual diversity, but having the same perspective as Bostrom or Yudkowsky isn't a filter to involvement.

comment by Geoffrey Irving (irving) · 2020-07-22T23:07:42.265Z

We should also mention Stuart Russell here, since he’s certainly very aware of Bostrom and MIRI but has different views on the details and is very grounded in ML.

comment by Geoffrey Irving (irving) · 2020-07-22T22:14:45.933Z

I started working on AI safety prior to reading Superintelligence, and despite knowing about MIRI et al., since I didn’t like their approach. So I don’t think I agree with your initial premise that the field is as much a monoculture as you suggest.

comment by Buck · 2020-07-22T22:22:42.374Z

I'm curious what your experience was like when you started talking to AI safety people after already coming to some of your own conclusions. E.g., I'm curious whether you think you missed major points that the AI safety people had spotted and that felt obvious in hindsight, or whether there were topics on which you disagreed with the AI safety people and think you turned out right.

comment by Geoffrey Irving (irving) · 2020-07-22T23:00:58.384Z

I think mostly I arrived with a different set of tools and intuitions, in particular a better sense for numerical algorithms (Paul has that too, of course) and thus intuition about how things should work with finite errors and how to build toy models that capture the finite error setting.

I do think a lot of the intuitions built by Bostrom and Yudkowsky are easy to fix into a form that works in the finite error model (though not all of them), so I don’t agree with some of the recent negativity about these classical arguments. That is, some fixing is required to make me like those arguments, but it doesn’t feel like the fixing is particularly hard.

comment by Geoffrey Irving (irving) · 2020-07-22T23:13:20.471Z

In the other direction, I started to think about this stuff in detail at the same time I started working with various other people and definitely learned a ton from them, so there wasn’t a long period where I had developed views but hadn’t spent months talking to Paul.

comment by KR · 2020-07-22T22:39:17.046Z

My impression is that people like you are pretty rare, but all of this is based on subjective impressions and I could be very wrong.

Have you met a lot of other people who came to AI safety from some background other than the Yudkowsky/Superintelligence cluster?

comment by Geoffrey Irving (irving) · 2020-07-22T22:56:27.070Z

Well, part of my job is making new people that qualify, so yes to some extent. This is true both in my current role and in past work at OpenAI (e.g.,