Comment by irving on An update in favor of trying to make tens of billions of dollars · 2021-10-16T12:44:58.635Z · EA · GW

The rest of this comment is interesting, but opening with “Ummm, what?” seems bad, especially since it takes careful reading to know what you are specifically objecting to.

Edit: Thanks for fixing!

Comment by irving on Why aren't you freaking out about OpenAI? At what point would you start? · 2021-10-10T23:08:07.900Z · EA · GW

Unfortunately we may be unlikely to get a statement from a departed safety researcher beyond mine, at least currently.

Comment by irving on Why aren't you freaking out about OpenAI? At what point would you start? · 2021-10-10T18:46:35.339Z · EA · GW

It can’t be up to date, since they recently announced that Helen Toner joined the board, and she’s not listed.

Comment by irving on Why aren't you freaking out about OpenAI? At what point would you start? · 2021-10-10T18:45:56.045Z · EA · GW

Unfortunately, a significant part of the situation is that people with internal experience and a negative impression feel both constrained and conflicted (in the conflict of interest sense) for public statements. This applies to me: I left OpenAI in 2019 for DeepMind (thus the conflicted).

Comment by irving on Why aren't you freaking out about OpenAI? At what point would you start? · 2021-10-10T14:58:50.227Z · EA · GW

Is Holden still on the board?

Comment by irving on Seeking social science students / collaborators interested in AI existential risks · 2021-09-25T13:51:21.192Z · EA · GW

I'm the author of the cited AI safety needs social scientists article (along with Amanda Askell), previously at OpenAI and now at DeepMind.  I currently work with social scientists in several different areas (governance, ethics, psychology, ...), and would be happy to answer questions (though expect delays in replies).

Comment by irving on DeepMind is hiring Long-term Strategy & Governance researchers · 2021-09-13T20:10:23.922Z · EA · GW

I lead some of DeepMind's technical AGI safety work, and wanted to add two supporting notes:

  1. I'm super happy we're growing strategy and governance efforts!
  2. We view strategy and governance questions as coupled to technical safety, and are working to build very close links between research in the two areas so that governance mechanisms and alignment mechanisms can be co-designed.  (This also applies to technical safety and the Ethics Team, among other teams.)

Comment by irving on It takes 5 layers and 1000 artificial neurons to simulate a single biological neuron [Link] · 2021-09-07T22:10:41.361Z · EA · GW

This paper has at least two significant flaws when used to estimate relative complexity for useful purposes.  In the authors' defense, such an estimate wasn't the main motivation of the paper, but the Quanta article is all about estimation and the paper doesn't mention the flaws.

Flaw one: no reversed control
Say we have two parameterized model classes A and B, and ask what size n is necessary for A to approximate B and for B to approximate A.  It is trivial to construct model classes for which n is large in both directions, just because B is a much better algorithm for approximating B than A is, and vice versa.  I'm not sure how much this cuts off the 1000 estimate, but it could easily be 10x.

Brief Twitter thread about this:
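The reversed-control point can be made concrete with a toy construction of my own (purely illustrative, not from the paper): two six-parameter function classes, each of which represents its own kind of target essentially exactly but incurs large error on the other's, so the "how many units of A per unit of B" ratio says more about mismatched bases than about intrinsic complexity.

```python
import numpy as np

x = np.linspace(0, 1, 200)

def fit_error(basis, target):
    # best least-squares fit of target within the span of the basis columns
    coef, *_ = np.linalg.lstsq(basis, target, rcond=None)
    return np.max(np.abs(basis @ coef - target))

n = 6
poly = np.stack([x**k for k in range(n)], axis=1)                           # class A: degree < 6 polynomials
sines = np.stack([np.sin((k + 1) * np.pi * x) for k in range(n)], axis=1)   # class B: first 6 sine modes

ramp = x                       # a natural target for class A (a basis element)
wave = np.sin(6 * np.pi * x)   # a natural target for class B (a basis element)

# Each class represents its own kind of target essentially exactly...
assert fit_error(poly, ramp) < 1e-8
assert fit_error(sines, wave) < 1e-8

# ...but incurs large error on the other class's target:
print(fit_error(poly, wave))   # large: three full oscillations vs. a quintic
print(fit_error(sines, ramp))  # large: every sine vanishes at x = 1, the ramp doesn't
```

The asymmetric version of this comparison (only measuring B-approximates-A) is exactly the missing reversed control.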

Flaw two: no scaling w.r.t. multiple neurons
I don't see any reason to believe the 1000 factor would remain constant as you add more neurons, so that we're approximating many real neurons with many (more) artificial neurons.  In particular, it's easy to construct model classes where the factor decays to 1 as you add more real neurons.  I don't know how strong this effect is, but again there is no discussion or estimation of it in the paper.

Comment by irving on Can you control the past? · 2021-09-02T13:28:54.493Z · EA · GW

Ah, I see: you’re going to lean on the difference between “cause” and “control”. So to be clear: I am claiming that, as an empirical matter, we also can’t control the past, or even “control” the past.

To expand, I’m not using physics priors to argue that physics is causal, so we can’t control the past. I’m using physics and history priors to argue that we exist in the non-prediction case relative to the past, so CDT applies.

Comment by irving on Can you control the past? · 2021-09-02T13:22:15.759Z · EA · GW

By “physics-based” I’m lumping together physics and history a bit, but it’s hard to disentangle them especially when people start talking about multiverses. I generally mean “the combined information of the laws of physics and our knowledge of the past”. The reason I do want to cite physics too, even for the past case of (1), is that if you somehow disagreed about decision theorists in WW1 I’d go to the next part of the argument, which is that under the technology of WW1 we can’t do the necessary predictive control (they couldn’t build deterministic twins back then).

However, it seems like we’re mostly in agreement, and you could consider editing the post to make that more clear. The opening line of your post is “I think that you can “control” events you have no causal interaction with, including events in the past.” Now the claim is “everyone agrees about the relevant physics — and in particular, that you can’t causally influence the past”. These two sentences seem inconsistent, and especially since your piece is long and quite technical, opening with a wrong summary may confuse people.

I realize you can get out of the inconsistency by leaning on the quotes, but it still seems misleading.

Comment by irving on What problems in society are both mathematically challenging, verifiable, and competitively rewarding? · 2021-08-30T14:22:52.303Z · EA · GW

As a high-level comment, it seems bad to structure the world so that the smartest people compete against each other in zero-sum games.  It's definitely the case that zero-sum games are the best way to ensure technical hardness, as the games will by construction be right at the threshold of playability.  But if we do this we're throwing most of the value away in comparison to working on positive-sum games.

Comment by irving on Is volunteer computing an easily accessible way of effective altruism? · 2021-08-28T20:48:37.081Z · EA · GW

Unfortunately, this is unlikely to be an effective use of resources (speaking as someone who worked in high-performance computing for the past 18 years).  The resources you can contribute will be dwarfed by the volume and efficiency of cloud services and supercomputers.  Even then, due to network constraints the only possible tasks will be embarrassingly parallel computations that do not stress network or memory, and very few scientific computing tasks have this form.
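A back-of-envelope sketch of why network constraints bite (round hypothetical numbers of my own, not measurements of any particular machine):

```python
# Hypothetical round numbers for a volunteer machine (assumptions, not measurements):
uplink_bytes_per_s = 10e6 / 8   # 10 Mbit/s home upload link
flops_per_s = 1e12              # ~1 TFLOP/s consumer GPU

# To keep the GPU busy, each byte exchanged over the network must be matched
# by roughly this many floating-point operations of purely local work:
flops_per_byte = flops_per_s / uplink_bytes_per_s
print(f"{flops_per_byte:.0f} flops per byte")  # ~800,000
```

Tightly coupled simulations (fluids, molecular dynamics with domain decomposition, etc.) need orders of magnitude fewer flops per byte of communication than this, which is why only embarrassingly parallel workloads survive on volunteer networks.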

Comment by irving on Can you control the past? · 2021-08-28T13:39:47.484Z · EA · GW

So certainly physics-based priors are a big component, and indeed in some sense are all of it.  That is, I think physics-based priors should give you an immediate answer of "you can't influence the past with high probability", and moreover that once you think through the problems in detail the conclusion will be that you could influence the past if physics were different (including boundary conditions, even if laws remain the same), but that boundary condition priors should still tell us you can't influence the past.  I'm happy to elaborate.

First, I think saying CDT is wrong, full stop, is much less useful than saying that CDT has a limited domain of applicability (using Sean Carroll's terminology from The Big Picture).  Analogously, one shouldn't say that Newtonian physics is wrong, but that it has a limited domain of applicability, and one should be careful to apply it only in that domain.  Of course, you can choose to stick to the "wrong" terminology; the claim is only that this is less useful.

So what's the domain of applicability of CDT?  Roughly, I think the domain is cases where the agent can't be predicted by other agents in the world.  I personally like to call this the "free will" case, but that's my personal definition, so if you don't like that definition we can call it the non-prediction case.  The deterministic twin case violates this, as there is a dimension of decision making where non-prediction fails: each twin can perfectly predict the other's actions conditional on their own actions.  So deterministic twins are outside the domain of applicability of CDT.

A consequence of this view is that whether we are in or out of the domain of applicability of CDT is an empirical question: you can't resolve it from pure theory.  I further claim (without pinning down the definitions very well) that "generic, un-tuned" situations fall into the non-prediction case. This is again an empirical claim, and roughly says that "something needs to happen" to be outside the non-prediction case.  In the deterministic twin case, this "something" is the intentional construction of the twins.  Some detailed claims:

  1. Humanity's past fits the non-prediction case.  For example, it is not the case that "perhaps you and some of the guards are implementing vaguely similar decision procedures at some level" in World War 1, not least because most of decision theory was invented after World War 1.  Again, this is a purely empirical claim: it could have been otherwise, and I'm claiming it wasn't.
  2. The multiverse fits the non-prediction case.  I also believe that once we have a sufficient understanding of cosmology, we will conclude that it is most likely that the multiverse fits the non-prediction case, roughly because the causal linkages behind the multiverse (through quantum branching, inflation, or logical possibilities) are high temperature in some sense.  This is again an empirical prediction about cosmology, though of course it's much harder to check and I'm much less confident in it than for (1).
  3. The world does not entirely fall into the non-prediction case.  As an example, it is perilous when advertisers have too much information and computation asymmetry with users, since that asymmetry can break non-prediction (more here).  A consequence of this is that it's good that people are studying decision theories with larger domains of applicability.
  4. AGI safety v1 can likely be made to fall into the non-prediction case.  This is another highly contingent claim, and requires some action to ensure, namely somehow telling AGIs to avoid breaking the non-prediction case in appropriate senses (and designing them so that this is possible to do). (I expect to get jumped on for this one, but before you believe I'm just ignorant it might be worth asking Paul whether I'm just ignorant.) And I do mean v1; it's quite possible that v2 goes better if we have the option of not telling them this.

I do want to emphasize that as a consequence of (3), (4), uncertainty about (2), and a way tinier amount of uncertainty about (1),  I'm happy people are exploring this space.  But of course I'm also going to place a lower estimate on its importance as a consequence of the above.

Comment by irving on Can you control the past? · 2021-08-27T19:45:51.240Z · EA · GW

I'm not sure anyone else is going to be brave enough to state this directly, so I'll do it:

After reading some of this post (and talking to Paul a bunch and Scott a little), I remain unconfused about whether we can control the past.

Comment by irving on Should EA have a career-focused “Do the most good" pledge? · 2021-07-20T14:21:49.599Z · EA · GW

I think we might just end up in the disaster scenario where you get a bunch of karma. :)

Comment by irving on Should EA have a career-focused “Do the most good" pledge? · 2021-07-20T14:21:08.889Z · EA · GW

If we want to include a hits-based approach to careers, but also respect people not having EA goals as their exclusive life goal, I'd worry that signing this pledge is incompatible with staying in a career that the EA community subsequently decides is ineffective.  This could be true even if, given the information known at the time of career choice, the career looked like terrific expected value.

The actual wording of the pledge seems okay under this metric, as it only promises to "seek out ways to increase the impact of my career", so maybe this is fine as long as the pledge doesn't rise to "switch career" in all cases.

Comment by irving on Should EA have a career-focused “Do the most good" pledge? · 2021-07-20T14:16:27.975Z · EA · GW

Won't this comment get hidden soon?

Comment by irving on High Impact Careers in Formal Verification: Artificial Intelligence · 2021-06-05T21:06:46.373Z · EA · GW

As someone who's worked both in ML for formal verification with security motivations in mind, and (now) directly on AGI alignment, I think most EA-aligned folk who would be good at formal verification will be close enough to being good at direct AGI alignment that it will be higher impact to work directly on AGI alignment.  It's possible this would change in the future if there are a lot more people working on theoretically-motivated prosaic AGI alignment, but I don't think we're there yet.

Comment by irving on Concerns with ACE's Recent Behavior · 2021-04-19T21:45:37.707Z · EA · GW

I think that isn't the right counterfactual, since I got into EA circles despite having only minimal (and net negative) impressions of EA-related forums.  So your claim is narrowly true, but if instead the counterfactual were that my first exposure to EA was the EA Forum, then yes, I think the prominence of this kind of post would have made me substantially less likely to engage.

But fundamentally if we're running either of these counterfactuals I think we're already leaving a bunch of value on the table, as expressed by EricHerboso's post about false dilemmas.

Comment by irving on Concerns with ACE's Recent Behavior · 2021-04-19T08:24:58.970Z · EA · GW

I bounce off posts like this.  Not sure if you'd consider me net positive or not. :)

Comment by irving on Technology Non-Profits I could volunteer for? · 2020-10-21T07:40:15.711Z · EA · GW

Not a non-profit, but since you mention AI and X-risk it's worth mentioning DeepMind, since program managers are core to how research is organized and led here:

Comment by irving on Quantum computing timelines · 2020-09-15T16:27:46.819Z · EA · GW

5% probability by 2039 seems way too confident that it will take a long time: is this intended to be a calibrated estimate, or does the number have a different meaning?

Comment by irving on Assessing the impact of quantum cryptanalysis · 2020-07-23T13:15:40.454Z · EA · GW

Yep, that’s the right interpretation.

In terms of hardware, I don’t know how Chrome did it, but at least on fully capable hardware (mobile CPUs and above) you can often bitslice to make almost any circuit efficient if it has to be evaluated in parallel. So my prior is that quite general things don’t need new hardware if one is sufficiently motivated, and would want to see the detailed reasoning before believing you can’t do it with existing machines.

Comment by irving on Assessing the impact of quantum cryptanalysis · 2020-07-23T08:37:26.045Z · EA · GW

This is a great document! I agree with the conclusions, though there are a couple factors not mentioned which seem important:

On the positive side, Google has already deployed post-quantum schemes as a test, and I believe the test was successful. This was explicitly just a test and not intended as a standardization proposal, but it's good to see that it's practical to layer a post-quantum scheme on top of an existing scheme in a deployed system. I do think if we needed to do this quickly it would happen; the example of Google and Apple working together to get contact tracing working seems relevant.

On the negative side, there may be significant economic costs due to public key schemes deployed "at rest", which are impossible to change after the fact. This includes any encrypted communication that has been stored by an adversary across the time when we switch from pre-quantum to post-quantum, and also includes slow-to-build-up applications like PGP webs of trust which are hard to quickly swap out. I don't think this changes the overall conclusions, since I'd expect the going-forwards cost to be larger, but it's worth mentioning.

Comment by irving on Intellectual Diversity in AI Safety · 2020-07-22T23:13:20.471Z · EA · GW

In the other direction, I started to think about this stuff in detail at the same time I started working with various other people and definitely learned a ton from them, so there wasn’t a long period where I had developed views but hadn’t spent months talking to Paul.

Comment by irving on Intellectual Diversity in AI Safety · 2020-07-22T23:07:42.265Z · EA · GW

We should also mention Stuart Russell here, since he’s certainly very aware of Bostrom and MIRI but has different views on the details and is very grounded in ML.

Comment by irving on Intellectual Diversity in AI Safety · 2020-07-22T23:00:58.384Z · EA · GW

I think mostly I arrived with a different set of tools and intuitions, in particular a better sense for numerical algorithms (Paul has that too, of course) and thus intuition about how things should work with finite errors and how to build toy models that capture the finite error setting.

I do think a lot of the intuitions built by Bostrom and Yudkowsky are easy to fix into a form that works in the finite error model (though not all of it), so I don’t agree with some of the recent negativity about these classical arguments. That is, some fixing is required to make me like those arguments, but it doesn’t feel like the fixing is particularly hard.

Comment by irving on Intellectual Diversity in AI Safety · 2020-07-22T22:56:27.070Z · EA · GW

Well, part of my job is making new people that qualify, so yes to some extent. This is true both in my current role and in past work at OpenAI.

Comment by irving on Intellectual Diversity in AI Safety · 2020-07-22T22:14:45.933Z · EA · GW

I started working on AI safety prior to reading Superintelligence, and despite knowing about MIRI et al., since I didn’t like their approach. So I don’t think I agree with your initial premise that the field is as much a monoculture as you suggest.

Comment by irving on A list of good heuristics that the case for AI X-risk fails · 2020-07-17T16:27:24.983Z · EA · GW

Yes, the mocking is what bothers me. In some sense the wording of the list means that people on both sides of the question could come away feeling justified without a desire for further communication: AGI safety folk since the arguments seem quite bad, and AGI safety skeptics since they will agree that some of these heuristics can be steel-manned into a good form.

Comment by irving on A list of good heuristics that the case for AI X-risk fails · 2020-07-16T11:54:27.291Z · EA · GW

As a meta-comment, I think it's quite unhelpful that some of these "good heuristics" are written as intentional strawmen where the author doesn't believe the assumptions hold. E.g., the author doesn't believe that there are no insiders talking about X-risk. If you're going to write a post about good heuristics, maybe try to make the good heuristic arguments actually good? This kind of post mostly just alienates me from wanting to engage in these discussions, which is a problem given that I'm one of the more senior AGI safety researchers.

Comment by irving on Will protests lead to thousands of coronavirus deaths? · 2020-06-04T15:38:50.668Z · EA · GW

“Quite possible” means I am making a qualitative point about game theory but haven’t done the estimates.

Though if one did want to do estimates, that ratio isn’t enough, as spread is superlinear as a function of the size of a group arrested and put in a single room.
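To illustrate the superlinearity with a toy model (my own hypothetical numbers, not an estimate of actual risk): if every infectious-susceptible pair held together in one room transmits independently with probability p, expected transmissions scale with the number of pairs, i.e. roughly quadratically in group size.

```python
# Toy pairwise-mixing model; p is a hypothetical per-pair transmission probability.
p = 0.02
for n in [10, 20, 40]:
    pairs = n * (n - 1) // 2        # contact pairs in one shared room
    print(n, round(pairs * p, 2))   # 10 -> 0.9, 20 -> 3.8, 40 -> 15.6
```

Doubling the group roughly quadruples expected transmissions, which is why the arrested-group-size detail matters for any serious estimate.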

Comment by irving on Will protests lead to thousands of coronavirus deaths? · 2020-06-04T15:36:12.413Z · EA · GW

Thanks, that’s all reasonable. Though to clarify, the game theory point isn’t about deterring police but about whether to let potential arrests and coronavirus consequences deter the protests themselves.

Comment by irving on Will protests lead to thousands of coronavirus deaths? · 2020-06-03T21:40:04.354Z · EA · GW

It's worth distinguishing between the protests causing spread and arresting protesters causing spread. It's quite possible more spread will be caused by the latter, and calling this spread "caused by the protests" is game theoretically similar to "Why are you hitting yourself?" My guess is that you're not intending to lump those into the same bucket, but it's worth separating them out explicitly given the title.

Comment by irving on Racial Demographics at Longtermist Organizations · 2020-05-01T20:15:41.616Z · EA · GW

One note: DeepMind is outside the set of typical EA orgs, but is very relevant from a longtermist perspective. It fares quite a bit better on this measure, at least in terms of leadership: e.g., everyone above me in the hierarchy is non-white.

Comment by irving on How do you talk about AI safety? · 2020-04-20T10:48:15.572Z · EA · GW

Fixed, thanks!

Comment by irving on How do you talk about AI safety? · 2020-04-19T23:08:11.089Z · EA · GW

This isn't a complete answer, but I think it is useful to have a list of prosaic alignment failures to make the basic issue more concrete. Examples include fairness (bad data leading to inferences that reflect bad values), recommendation systems going awry, etc. I think Catherine Olsson has a long list of these, but I don't know where it is. We should generically expect some sort of amplification of these failures as AI strength increases; it's conceivable the amplification is in the good direction, but at a minimum we shouldn't be confident of that.

If someone is skeptical about AIs getting smart enough that this matters, you can point to the various examples of existing superhuman systems (game playing programs, dog distinguishers that beat experts, medical imaging systems that beat teams of experts, etc.). Narrow superintelligence should already be enough to worry, depending on how such systems are deployed.