Person-affecting intuitions can often be money pumped 2022-07-07T12:23:16.394Z
DeepMind is hiring for the Scalable Alignment and Alignment Teams 2022-05-13T12:19:37.379Z
Shah and Yudkowsky on alignment failures 2022-02-28T19:25:12.896Z
Conversation on technology forecasting and gradualism 2021-12-09T19:00:00.000Z
rohinmshah's Shortform 2021-08-25T15:43:46.964Z
[AN #80]: Why AI risk might be solved without additional intervention from longtermists 2020-01-03T07:52:24.981Z
Summary of Stuart Russell's new book, "Human Compatible" 2019-10-19T19:56:52.174Z
Alignment Newsletter One Year Retrospective 2019-04-10T07:00:34.021Z
Thoughts on the "Meta Trap" 2016-12-20T21:36:39.498Z
EA Berkeley Spring 2016 Retrospective 2016-09-11T06:37:02.183Z
EAGxBerkeley 2016 Retrospective 2016-09-11T06:27:16.316Z


Comment by Rohin Shah (rohinmshah) on Summaries are underrated · 2022-09-04T20:37:27.654Z · EA · GW

You can either interpret low karma as a sign that the karma system is broken or that the summaries aren't sufficiently good. In hindsight I think you're right and I lean more towards the former -- even though people tell me they like my newsletter, it doesn't actually get that much karma.

I thought you thought that karma was a decent measure since you suggested

Putting the summary up as a Forum post and seeing if it gets a certain number of karma

as a way to evaluate how good a summary is.

Comment by Rohin Shah (rohinmshah) on Summaries are underrated · 2022-09-04T08:10:59.760Z · EA · GW

Idk, in my particular case I'd say writing summaries was a major reason that I now have prestige / access to resources.

I think it's probably just hard to write good summaries; many of the summaries posted here don't get very much karma.

Comment by Rohin Shah (rohinmshah) on Summaries are underrated · 2022-09-02T11:52:22.425Z · EA · GW

I'm surprised that "write summaries" isn't one of the proposed concrete solutions. One person can do a lot.

Comment by Rohin Shah (rohinmshah) on The Repugnant Conclusion Isn't · 2022-08-23T18:44:25.124Z · EA · GW

Yeah, I don't think it's clearly unreasonable (though it's not my intuition).

I agree that suicide rates are not particularly strong evidence one way or the other.

Comment by Rohin Shah (rohinmshah) on The Repugnant Conclusion Isn't · 2022-08-23T09:20:43.427Z · EA · GW

I broadly agree that "what does a life barely worth living look like" matters a lot, and you could imagine setting it to be high enough that the repugnant conclusion doesn't look repugnant.

That being said, if you set it too high, there are other counterintuitive conclusions. For example, if you set it higher than the welfare of typical people alive today (as it sounds like you're doing), then you are saying that people alive today have negative terminal value, and (if we ignore instrumental value) it would be better if they didn't exist.

Comment by Rohin Shah (rohinmshah) on A concern about the “evolutionary anchor” of Ajeya Cotra’s report on AI timelines. · 2022-08-22T08:10:17.116Z · EA · GW

So, did I or didn't I come across as unfriendly/hostile?

You didn't to me, but also (a) I know you in person and (b) I'm generally pretty happy to be in forceful arguments and don't interpret them as unfriendly / hostile, while other people plausibly would (see also combat culture). So really I think I'm the wrong person to ask.

So, given that I wanted to do both 1 and 2, would you think it would have been fine if I had just made them as separate comments, instead of mentioning 1 in passing in the thread on 2? Or do you think I really should have picked one to do and not done both?

I think you can do both, if it's clear that you're doing these as two separate things. (Which could be by having two different comments, or by signposting clearly in a single comment.)

In this particular situation I'm objecting to starting with (2), then switching to (1) after a critique without acknowledging that you had updated on (2) and so were going to (1) instead. When I see that behavior from a random Internet commenter I'm like "ah, you are one of the people who rationalizes reasons for beliefs, and so your beliefs do not respond to evidence, I will stop talking with you now". You want to distinguish yourself from the random Internet commenter.

(And if you hadn't updated on (2), then my objection would have been "you are bad at collaborative truth-seeking, you started to engage on one node and then you jumped to a totally different node before you had converged on that one node, you'll never make progress this way".)

Comment by Rohin Shah (rohinmshah) on A concern about the “evolutionary anchor” of Ajeya Cotra’s report on AI timelines. · 2022-08-21T07:53:26.109Z · EA · GW

Did I come across as unfriendly and hostile? I am sorry if so, that was not my intent.

No, that's not what I meant. I'm saying that the conversational moves you're making are not ones that promote collaborative truth-seeking.

Any claim of actual importance usually has a giant tree of arguments that back it up. Any two people are going to disagree on many different nodes within this tree (just because there are so many nodes). In addition, it takes a fair amount of effort just to understand and get to the same page on any one given node.

So, if you want to do collaborative truth-seeking, you need to have the ability to look at one node of the tree in isolation, while setting aside the rest of the nodes.

In general when someone is talking about some particular node (like "evolution anchor for AGI timelines"), I think you have two moves available:

  1. Say "I think the actually relevant node to our disagreement is <other node>"
  2. Engage with the details of that particular node, while trying to "take on" the views of the other person for the other nodes

(As a recent example, the ACX post on underpopulation does move 2 for Sections 1-8 and move 1 for Section 9.)

In particular, the thing not to do is to talk about the particular node, then jump around into other nodes where you have other disagreements, because that's a way to multiply the number of disagreements you have and fail to make any progress on collaborative truth-seeking. Navigating disagreements is hard enough that you really want to keep them as local / limited as possible.

(And if you do that, then other people will learn that they aren't going to learn much from you because the disagreements keep growing rather than progress being made, and so they stop trying to do collaborative truth-seeking with you.)

Of course sometimes you start doing move (2) and then realize that actually you think your partner is correct in their assessment given their views on the other nodes, and so you need to switch to move (1). I think in that situation you should acknowledge that you agree with their assessment given their other views, and then say that you still disagree on the top-level claim because of <other node>.

Comment by Rohin Shah (rohinmshah) on Concrete Advice for Forming Inside Views on AI Safety · 2022-08-18T09:54:58.216Z · EA · GW

Lots of thoughts on this post:

Value of inside views

Inside Views are Overrated [...]

The obvious reason to form inside views is to form truer beliefs

No? The reason to form inside views is that it enables better research, and I'm surprised this mostly doesn't feature in your post. Quoting past-you:

  • Research quality - Doing good research involves having good intuitions and research taste, sometimes called an inside view, about why the research matters and what’s really going on. This conceptual framework guides the many small decisions and trade-offs you make on a daily basis as a researcher
    • I think this is really important, but it’s worth distinguishing this from ‘is this research agenda ultimately useful’. This is still important in eg pure maths research just for doing good research, and there are areas of AI Safety where you can do ‘good research’ without actually reducing the probability of x-risk.

Quoting myself:

There’s a longstanding debate about whether one should defer to some aggregation of experts (an “outside view”), or try to understand the arguments and come to your own conclusion (an “inside view”). This debate mostly focuses on which method tends to arrive at correct conclusions. I am not taking a stance on this debate; I think it’s mostly irrelevant to the problem of doing good research. Research is typically meant to advance the frontiers of human knowledge; this is not the same goal as arriving at correct conclusions. If you want to advance human knowledge, you’re going to need a detailed inside view.

Let’s say that Alice is an expert in AI alignment, and Bob wants to get into the field, and trusts Alice’s judgment. Bob asks Alice what she thinks is most valuable to work on, and she replies, “probably robustness of neural networks”. What might have happened in Alice’s head?

Alice (hopefully) has a detailed internal model of risks from failures of AI alignment, and a sketch of potential solutions that could help avert those risks. Perhaps one cluster of solutions seems particularly valuable to work on. Then, when Bob asks her what work would be valuable, she has to condense all of the information about her solution sketch into a single word or phrase. While “robustness” might be the closest match, it’s certainly not going to convey all of Alice’s information.

What happens if Bob dives straight into a concrete project to improve robustness? I’d expect the project will improve robustness along some axis that is different from what Alice meant, ultimately rendering the improvement useless for alignment. There are just too many constraints and considerations going into Alice’s final judgment that Bob is not aware of.

I think Bob should instead spend some time thinking about how a solution to robustness would mean that AI risk has been meaningfully reduced. Once he has a satisfying answer to that, it makes more sense to start a concrete project on improving robustness. In other words, when doing research, use senior researchers as a tool for deciding what to think about, rather than what to believe.

It’s possible that after all this reflection, Bob concludes that impact regularization is more valuable than robustness. The outside view suggests that Alice is more likely to be correct than Bob, given that she has more experience. If Bob had to bet which of them was correct, he should probably bet on Alice. But that’s not the decision he faces: he has to decide what to work on. His options probably look like:

  1. Work on a concrete project in robustness, which has perhaps a 1% chance of making valuable progress on robustness. The probability of valuable work is low since he does not share Alice’s models about how robustness can help with AI alignment.
  2. Work on a concrete project in impact regularization, which has perhaps a 50% chance of making valuable progress on impact regularization.

It’s probably not the case that progress in robustness is 50x more valuable than progress in impact regularization, and so Bob should go with (2). Hence the advice: build a gearsy, inside-view model of AI risk, and think about that model to find solutions.
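
The comparison above can be sketched as a toy expected-value calculation (the probabilities and the 50x breakeven figure are the illustrative numbers from this example, not real estimates):

```python
# Toy expected-value comparison for Bob's two options (illustrative numbers only).

p_success_robustness = 0.01  # Bob lacks Alice's models, so valuable progress is unlikely
p_success_impact = 0.50      # Bob has his own models of impact regularization

# Measure value in units of "one unit of impact regularization progress",
# and ask how much more valuable robustness progress would need to be
# before option 1 beats option 2.
breakeven_multiplier = p_success_impact / p_success_robustness

print(breakeven_multiplier)  # 50.0 -- robustness progress must be 50x more
                             # valuable before the robustness project wins
```

Since progress in robustness is probably not 50x more valuable than progress in impact regularization, the expected-value comparison favors option 2.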

(Though I should probably edit that section to also mention that Bob could execute on Alice's research agenda, if Alice is around to mentor him; and that would probably be more directly impactful than either of the other two options.)

Other meta thoughts on inside views

  • Relatedly, it's much more important to understand other people's views than to evaluate them - if I can repeat a full, gears-level model of someone's view back to them in a way that they endorse, that's a lot more valuable than figuring out how much I agree or disagree with their various beliefs and conclusions.
    • [...] having several models lets you compare and contrast them, figure out novel predictions, better engage with technical questions, do much better research, etc


I'm having trouble actually visualizing a scenario where Alice understands Bob's views (well enough to make novel predictions that Bob endorses, and say how Bob would update upon seeing various bits of evidence), but Alice is unable to evaluate Bob's view. Do you think this actually happens? Any concrete examples that I can try to visualize?

(Based on later parts of the post maybe you are mostly saying "don't reject an expert's view before you've tried really hard to understand it and make it something that does work", which I roughly agree with.)

Forming a "true" inside view - one where you fully understand something from first principles with zero deferring - is wildly impractical.

Yes, clearly true. I don't think anyone is advocating for this. I would say I have an inside view on bio anchors as a way to predict timelines, but I haven't looked into the data for Moore's Law myself and am deferring to others on that.

People often orient to inside views pretty unhealthily.


What fraction of people who are trying to build inside views do you think have these problems? (Relevant since I often encourage people to do it)

I know some people who do great safety relevant work, despite not having an inside view.

Hmm, I kind of agree in that there are people without inside views who are working on projects that other people with inside views are mentoring them on. I'm not immediately thinking of examples of people without inside views doing independent research that I would call "great safety relevant work".

(Unless perhaps you're counting e.g. people who do work on forecasting AGI, without having an inside view on how AGI leads to x-risk? I would say they have a domain-specific inside view on forecasting AGI.)

Forming inside views will happen naturally, and will happen much better alongside actually trying to do things and contribute to safety - you don't form them by locking yourself in your room for months and meditating on safety!

Idk, I feel like I formed my inside views by locking myself in my room for months and meditating on safety. This did involve reading things other people wrote, and talking with other junior grad students at CHAI who were also orienting to the problem. But I think it did not involve trying to do things and contributing to safety (I did do some of that but I think that was mostly irrelevant to me developing an inside view).

I do agree that if you work on topic X, you will naturally form an inside view on topic X as you get more experience with it. But in AI safety that would look more like "developing a domain-specific inside view on (say) learning from human feedback and its challenges" rather than an overall view on AI x-risk and how to address it. (In fact it seems like the way to get experience with an overall view on AI x-risk and how to address it is to meditate on it, because you can't just run experiments on AGI.)

Inside views lie on a spectrum. You will never form a "true" inside view, but conversely, not having a true inside view doesn't mean you're failing, or that you shouldn't even try. You want to aim to get closer to having an inside view! And making progress here is great and worthy

Strong +1

Aim for domain specific inside views. As an interpretability researcher, it's much more important to me to have an inside view re how to make interpretability progress and how this might interact with AI X-risk, than it is for me to have an inside view on timelines, the worth of conceptual alignment work, etc.

Yes, once you've decided that you're going to be an interpretability researcher, then you should focus on an interpretability-specific inside view. But "what should I work on" is also an important decision, and benefits from a broader inside view on a variety of topics. (I do agree though that it is a pretty reasonable strategy to just pick a domain based on deference and then only build a domain-specific inside view.)

Concrete advice

inside views are about zooming in. Concretely, in this framework, inside views look like starting with some high-level confusing claim, and then breaking it down into sub-claims, breaking those down into sub-claims, etc.

I agree that this is a decent way to measure your inside view -- like, "how big can you make this zooming-in tree before you hit a claim where you have to defer" is a good metric for "how detailed your inside view is".

I'm less clear on whether this is a good way to build an inside view, because a major source of difficulty for this strategy is in coming up with the right decomposition into sub-claims. Especially in the earlier stages of building an inside view, even your first and second levels of decomposition are going to be bad and will change over time. (For example, even for something like "why work on AI safety", Buck and I have different decompositions.) It does seem more useful once you've got a relatively fleshed out inside view, as a way to extend it further -- at this point I can in fact write out a tree of claims and expect that they will stay mostly the same (at the higher levels) after a few years, and so the leaves that I get to probably are good things to investigate.


These seem great and I'd strongly recommend people try them out :)

Comment by Rohin Shah (rohinmshah) on A concern about the “evolutionary anchor” of Ajeya Cotra’s report on AI timelines. · 2022-08-18T08:46:53.723Z · EA · GW

Meta: I feel like the conversation here and with Nuno's reply looks kinda like:

Nuno: People who want to use the evolutionary anchor as an upper bound on timelines should consider that it might be an underestimate, because the environment might be computationally costly.

You: It's not an underestimate: here's a plausible strategy by which you can simulate the environment.

Nuno / me: That strategy does not seem like it clearly supports the upper bound on timelines, for X, Y and Z reasons.

You: The evolution anchor doesn't matter anyway and barely affects timelines.

This seems bad:

  1. If you're going to engage with a subpoint that OP made that was meant to apply in some context (namely, getting an upper bound on timelines), stick within that context (or at least signpost that you're no longer engaging with the OP).
  2. I don't really understand why you bothered to do the analysis if you're not changing the analysis based on critiques that you agree are correct. (If you disagree with the critique then say that instead.)

If I understand you correctly, you are saying that the Evolution Anchor might not decrease in cost with time as fast as the various neural net anchors?

Yes, and in particular, the mechanism is that environment simulation cost might not decrease as fast as machine learning algorithmic efficiency. (Like, the numbers for algorithmic efficiency are anchored on estimates like AI and Efficiency, those estimates seem pretty unlikely to generalize to "environment simulation cost".)

her spreadsheet splits up algorithmic progress into different buckets for each anchor, so the spreadsheet already handles this nuance.

Just because someone could change the numbers to get a different output doesn't mean that the original numbers weren't flawed, or that there's no value in pointing that out?

E.g. suppose I had the following timelines model:

Input: N, the number of years till AGI.

Output: Timeline is 2022 + N.

I publish a report estimating N = 1000, so that my timeline is 3022. If you then come and give a critique saying "actually N should be 10 for a timeline of 2032", presumably I shouldn't say "oh, my spreadsheet already allows you to choose your own value of N, so it handles that nuance".
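
The toy model above can be written out directly (everything here is illustrative, mirroring the example):

```python
def toy_timeline(n_years_until_agi):
    """Toy timelines 'model': the forecast is just the current year plus N.

    The point of the example: the model parameterizes N, but a published
    report still commits to a particular estimate of N, so a critique of
    that estimate is a critique of the report's bottom line -- "you could
    plug in your own N" doesn't answer it.
    """
    return 2022 + n_years_until_agi

print(toy_timeline(1000))  # 3022 -- the report's published estimate
print(toy_timeline(10))    # 2032 -- the critic's estimate
```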

To be clear, my own view is also that the evolution anchor doesn't matter: I put very little weight on it, and the considerations in this post barely affect my timelines.

Comment by Rohin Shah (rohinmshah) on A concern about the “evolutionary anchor” of Ajeya Cotra’s report on AI timelines. · 2022-08-17T17:35:40.266Z · EA · GW

Note that this analysis is going to wildly depend on how progress on "environment simulation efficiency" compares to progress on "algorithmic efficiency". If you think it will be slower then the analysis above doesn't work.

Comment by Rohin Shah (rohinmshah) on EA can sound less weird, if we want it to · 2022-08-15T05:28:08.126Z · EA · GW

Idk, what are you trying to do with your illegible message?

If you're trying to get people to do technical research, then you probably just got them to work on a different version of the problem that isn't the one that actually mattered. You'd probably be better off targeting a smaller number of people with a legible message.

If you're trying to get public support for some specific regulation, then yes by all means go ahead with the illegible message (though I'd probably say the same thing even given longer timelines; you just don't get enough attention to convey the legible message).

TL;DR: Seems to depend on the action / theory of change more than timelines.

Comment by Rohin Shah (rohinmshah) on Existential risk pessimism and the time of perils · 2022-08-13T17:57:52.442Z · EA · GW

Oh idk, I don't think about bio x-risk much. (Though intuitively I feel like the lab leak story seems more likely than the malicious actor story; see "terrorism is not effective" and "terrorism is not about terror"; but that's entirely a guess and not one I'd stand behind.)

Comment by Rohin Shah (rohinmshah) on Existential risk pessimism and the time of perils · 2022-08-13T17:44:27.912Z · EA · GW

But it’s hard to see how we could chop 3−4 orders of magnitude off these threats just by settling Mars. Are we to imagine that [...] a dastardly group of scientists designs and unleashes a pandemic which kills every human living on earth, but cannot manage to transport the pathogen to other planets within our solar system?

Nitpick: I was under the impression that a substantial portion of the "engineered pandemic" x-risk (as estimated in The Precipice) came from accidental lab release of diseases that someone did gain-of-function research on. Space settlement would probably eliminate that x-risk.

(This doesn't really change any of your conclusions though.)

Comment by Rohin Shah (rohinmshah) on By how much should Meta's BlenderBot being really bad cause me to update on how justifiable it is for OpenAI and DeepMind to be making significant progress on AI capabilities? · 2022-08-12T08:05:37.839Z · EA · GW

At the last two EAGs I manned the DeepMind stall; we were promoting alignment roles (though I would answer questions about DeepMind more broadly, including questions about capabilities roles, if people asked them).

Comment by Rohin Shah (rohinmshah) on Why does no one care about AI? · 2022-08-12T07:34:13.462Z · EA · GW

I mostly buy the story in this post.

Comment by Rohin Shah (rohinmshah) on EA can sound less weird, if we want it to · 2022-08-12T06:28:48.632Z · EA · GW

Yeah I'm generally pretty happy with "make EA more legible".

Comment by Rohin Shah (rohinmshah) on EA can sound less weird, if we want it to · 2022-08-11T10:24:46.723Z · EA · GW

I agree that if the listener interprets "make EA sound less weird" as "communicate all of your reasoning accurately such that it leads the listener to have correct beliefs, which will also sound less weird", then that's better than no advice.

I don't think that's how the typical listener will interpret "make EA sound less weird"; I think they would instead come up with surface analogies that sound less weird but don't reflect the underlying mechanisms, which listeners might notice, leading to all the problems you describe.

I definitely don't think we should just say all of our conclusions without giving our reasoning.

(I think we mostly agree on what things are good to do and we're now hung up on this not-that-relevant question of "should we say 'make EA sound less weird'" and we probably should just drop it. I think both of us would be happier with the advice "communicate a nuanced, accurate view of EA beliefs" and that's what we should go with.)

Comment by Rohin Shah (rohinmshah) on EA can sound less weird, if we want it to · 2022-07-28T22:58:38.349Z · EA · GW

I mostly agree with this.

I think it sounds a lot less weird to say "an AI system might be hard to control and because of that, some experts think it could be really dangerous". This doesn't mean the same thing as "an AI might kill us all"

I think it sounds a lot less weird in large part because you aren't saying that the AI system might kill us all. "Really dangerous" could mean all sorts of things, including "the chess-playing robot mistakes a child's finger for a chess piece and accidentally breaks the child's finger". Once you pin it down to "kills all humans" it sounds a lot weirder.

I still do agree with the general point that as you explain more of your reasoning and cover more of the inferential gap, it sounds less weird.

What do you think of a caveated version of the advice like "make EA ideas sound less weird without being misleading"?

I still worry that people will not realize the ways they're being misleading -- I think they'll end up saying true but vague statements that get misinterpreted. (And I worry enough that I feel like I'd still prefer "no advice".)

Comment by Rohin Shah (rohinmshah) on EA can sound less weird, if we want it to · 2022-07-22T14:32:00.617Z · EA · GW

I strongly agree that you want to avoid EA jargon when doing outreach. Ideally you would use the jargon of your audience, though if you're talking to a broad enough audience that just means "plain English".

I disagree that "sounding weird" is the same thing (or even all that correlated with) "using jargon". For example,

When a lion kills and eats a deer in nature, the deer suffers. This is bad, and we should take action to prevent it.

This has no jargon, but still sounds weird to a ton of people. Similarly I think with AI risk the major weird part is the thing where the AI kills all the humans, which doesn't seem to me to depend much on jargon.

(If anything, I've found that with more jargon the ideas actually sound less weird. I think this is probably because the jargon obscures the meaning and so people can replace it with some different less weird meaning and assume you meant that. If you say "a goal-directed AI system may pursue some goal that we don't want, leading to a catastrophic outcome" they can interpret you as saying "I'm worried about AIs mimicking human biases"; that doesn't happen when you say "an AI system may deliberately kill all the humans".)

Comment by Rohin Shah (rohinmshah) on Confused about "making people happy" vs. "making happy people" · 2022-07-17T06:06:28.583Z · EA · GW

This intuition can be money pumped. (Many commenters have said this already tbc.)

Comment by Rohin Shah (rohinmshah) on Person-affecting intuitions can often be money pumped · 2022-07-15T05:55:23.905Z · EA · GW

When I said that there isn't any adversarial action, I really should have said that you are safe and your learning process is under your control. By default I'm imagining a reflection process under which:

  • (a) all of your basic needs are met (e.g. you don't have to worry about starving),
  • (b) you get to veto any particular experience happening to you,
  • (c) you can build tools (or have other people build tools) that help with your reflection, including by building situations where you can have particular experiences, or by creating simulations of yourself that have experiences and can report back,
  • (d) nothing is trying to manipulate or otherwise attack you (unless you specifically asked for the manipulation / attack), whether it is intelligently designed or natural,
  • (e) you don't have any time pressure on finishing the reflection.

To be clear this is pretty stringent -- the current state of affairs where you regularly go around talking to people who try to persuade you of stuff doesn't meet the criteria.

So to restate, your claim is that in the absence of such adversaries, moral reasoning processes will in fact all converge to the same place.

Given conditions of safety and control over the reflection.

It's also not that I think every such process converge to exactly the same place. Rather I'd say that (a) I feel pretty intuitively happy about anything that you get to via such a process, so it seems fine to get any one of them and (b) there is enough convergence that it makes sense to view that as a target which we can approximate or move towards.

Even if we're exposed to wildly different experiences/observations/futures, the only thing that determines whether there's convergence or divergence is whether those experiences contain intelligent adversaries or not.

Part of the reflection process would be to seek out different experiences / observations, so I'm not sure they would be "wildly different".

What precisely about our moral reasoning process make them unlikely to be attacked by "natural" conditions but attackable by an intelligently designed one? [...] Could natural conditions ever play the equivalent of intelligent adversaries?

If they're attacked by natural conditions that violates my requirements too. (I don't think I ever said the adversarial action had to be "intelligently designed" instead of "natural"?)

In this process fundamentally everything that happens to you is meant to be your own choice. It's still possible that you make a mistake, e.g. you send a simulation of yourself to listen to a persuasive argument and then report back, the simulation is persuaded that <bad thing> is great, comes back and persuades you of it as well. (Obviously you've already considered that possibility and taken precautions, but it happens anyway; your precautions weren't sufficient.)  But it at least feels unlikely, e.g. you shouldn't expect to make a mistake (if you did, you should just not do the thing instead).

Comment by Rohin Shah (rohinmshah) on On Deference and Yudkowsky's AI Risk Estimates · 2022-07-14T06:52:10.092Z · EA · GW

Note that this wouldn't actually make a big change for AI alignment, since we don't know how to use more funding.

Funding isn't the only resource:

  • You'd change how you introduce people to alignment (since I'd guess that has a pretty strong causal impact on what worldviews they end up acting on). E.g. if you previously flipped a 10%-weighted coin to decide whether to send them down the Eliezer track or the other track, now you'd flip a 20%-weighted coin, and this straightforwardly leads to different numbers of people working on particular research agendas that the worldviews disagree about. Or if you imagine the community as a whole acting as an agent, you send 20% of the people to MIRI fellowships and the remainder to other fellowships (whereas previously it would be 10%).
  • (More broadly I think there's a ton of stuff you do differently in community building, e.g. do you target people who know ML or people who are good at math?)
  • You'd change what you used political power for. I don't particularly understand what policies Eliezer would advocate for but they seem different, e.g. I think I'm more keen on making sure particular alignment schemes for building AI systems get used and less keen on stopping everyone from doing stuff besides one secrecy-oriented lab that can become a leader.

Experts are coherent within the bounds of conventional study.

Yeah, that's what I mean.

Comment by Rohin Shah (rohinmshah) on Person-affecting intuitions can often be money pumped · 2022-07-13T18:58:56.775Z · EA · GW

I definitely think these processes can be attacked. When I say "what I'd approve of after learning and thinking more" I'm imagining that there isn't any adversarial action during the learning and thinking. If I were forcibly exposed to a persuasive sequence of words, or manipulated / tricked into think that some sequence of words informed of benign facts but were in fact selected to hack my mind, that no longer holds.

Comment by Rohin Shah (rohinmshah) on Person-affecting intuitions can often be money pumped · 2022-07-13T07:44:04.326Z · EA · GW

Nice, I hadn't seen this argument before.

Comment by Rohin Shah (rohinmshah) on Person-affecting intuitions can often be money pumped · 2022-07-11T13:03:02.500Z · EA · GW

If we want to move people away from person-affecting views entirely, we need other arguments.

Fwiw, I wasn't particularly trying to do this. I'm not super happy with any particular view on population ethics and I wouldn't be that surprised if the actual view I settled on after a long reflection was pretty different from anything that exists today, and does incorporate something vaguely like person-affecting intuitions.

I mostly notice that people who have some but not much experience with longtermism are often very aware of the Repugnant Conclusion and other objections to total utilitarianism, and conclude that actually person-affecting intuitions are the right way to go. In at least two cases people seemed to significantly reconsider when I presented this argument. It seems to me like, amongst the population of people who haven't engaged with the population ethics literature, critiques of total utilitarianism are much better known than critiques of person-affecting intuitions. I'm just trying to fix that discrepancy.

Also a minor terminological note, you've called your argument a Dutch book and so have I. But I think it would be more standard to call it a money pump.

Thanks, I've changed this.

Comment by Rohin Shah (rohinmshah) on Person-affecting intuitions can often be money pumped · 2022-07-11T08:04:42.307Z · EA · GW

I kinda suspect no ethics are future-proof in this sense

Which sense do you mean?

I like Holden's description:

I expect some readers will be very motivated by something like "Making ethical decisions that I will later approve of, after I've done more thinking and learning," while others will be more motivated by something like "Making ethical decisions that future generations won't find abhorrent."

Personally I'm thinking more of the former reason than the latter reason. I think "things I'd approve of after more thinking and learning" is reasonably precise as a definition, and seems pretty clearly like a thing that can be approximated.

Comment by Rohin Shah (rohinmshah) on Person-affecting intuitions can often be money pumped · 2022-07-08T21:55:15.584Z · EA · GW

I agree that most philosophical literature on person-affecting views ends up focusing on transitive views that can't be Dutch booked in this particular way (I think precisely because not many people want to defend intransitivity).

I think the typical person-affecting intuitions that people actually have are better captured by the view in my post than by any of these four families of views, and that's the audience to which I'm writing. This wasn't meant to be a serious engagement with the population ethics literature; I've now signposted that more clearly.

EDIT: I just ran these positions (except actualism, because I don't understand how you make decisions with actualism) by someone who isn't familiar with population ethics, and they found all of them intuitively ridiculous. They weren't thrilled with the view I laid out but they did find it more intuitive.

Comment by Rohin Shah (rohinmshah) on Person-affecting intuitions can often be money pumped · 2022-07-08T09:51:47.619Z · EA · GW

I don't think anyone should aim towards a local decision rule as an ideal, though, so there's an important question of whether your Dutch book argument undermines person-affecting views much at all relative to alternatives. Local decision rules will underweight option value, value of information, investments for the future, and basic things we need to do to survive.

I think it's worth separating:

  1. How to evaluate outcomes
  2. How to make decisions under uncertainty
  3. How to make decisions over time

The argument in this post is just about (1). Admittedly I've illustrated it with a sequence of trades (which seems more like (3)) but the underlying principle is just that of transitivity which is squarely within (1). When thinking about (1) I'm often bracketing out (2) and (3), and similarly when I think about (2) or (3) I often ignore (1) by assuming there's some utility function that evaluates outcomes for me. So I'm not saying "you should make decisions using a local rule that ignores things like information value"; I'm more saying "when thinking about (1) it is often a helpful simplifying assumption to consider local rules and see how they perform".

It's plausible that an effective theory will actually need to think about these areas simultaneously -- in particular, I feel somewhat compelled by arguments from (2) that you need to have a bounded mechanism for (1), which is mixing those two areas together. But I think we're still at the stage where it makes sense to think about these things separately, especially for basic arguments when getting up to speed (which is the sort of post I was trying to write).

Comment by Rohin Shah (rohinmshah) on Person-affecting intuitions can often be money pumped · 2022-07-08T09:38:57.733Z · EA · GW

I can't tell what you mean by an objective axiology. It seems to me like you're equivocating between a bunch of definitions:

  1. An axiology is objective if it is universally true / independent of the decision-maker / not reliant on goals / implied by math. (I'm pointing to a cluster of intuitions rather than giving a precise definition.)
  2. An axiology is objective if it provides a decision for every possible situation you could be in. (I would prefer to call this a "complete" axiology, perhaps.)
  3. An axiology is objective if its decisions can be computed by taking each world, summing some welfare function over all the people in that world, and choosing the decision that leads to the world with a higher number. (I would prefer to call this an "aggregative" axiology, perhaps.)

Examples of definition 1:

The search for an objective axiology assumes that there’s a well-defined “impartial perspective” that determines what’s intrinsically good/valuable. [...]

if there was an objective axiology, wouldn’t the people who don’t orient their goals around that axiology be making a mistake?

Examples of definition 2:

Without an objective axiology, the placeholder “do what’s most moral/altruistic” is under-defined. [...]

I think there's an incongruence behind how people think of population ethics in the standard way. (The standard way being something like: look for an objective axiology, something that has "intrinsic value," then figure out how we are to relate to that value/axiology and whether to add extra principles around it.)

Examples of definition 3:

we can note that population ethics has two separate perspectives: that of existing people/beings and that of newly created people/beings. (Without an objective axiology, these perspectives cannot be unified.)

I don't think I'm relying on an objective-axiology-by-definition-1. Any time I say "good" you can think of it as "good according to the decision-maker" rather than "objectively good". I think this doesn't affect any of my arguments.

It is true that I am imagining an objective-axiology-by-definition-2 (which I would perhaps call a "complete axiology"). I don't really see from your comment why this is a problem.

I agree this is "maximally ambitious morality" rather than "minimal morality". Personally, if I were designing "minimal morality" I'd figure out what "maximally ambitious morality" would recommend we design as principles that everyone could agree on and follow, and then implement those. I'm skeptical that if I ran through such a procedure I'd end up choosing person-affecting intuitions (in the sense of "Making People Happy, Not Making Happy People"; I think I'd plausibly instead choose something along the lines of "if you create new people, make sure they have lives well beyond barely worth living"). Other people might differ from me, since they have different goals, but I suspect not.

I agree that if your starting point is "I want to ensure that people's preferences are satisfied" you do not yet have a complete axiology, and in particular there's an ambiguity about how to make decisions about which people to create. If this is your starting point then I think my post is saying "if you resolve this ambiguity in this particular way, you get Dutch booked". I agree that you could avoid the Dutch book by resolving the ambiguity as "I will only create individuals whose preferences I have satisfied as best as I can".

Comment by Rohin Shah (rohinmshah) on Person-affecting intuitions can often be money pumped · 2022-07-08T06:26:03.364Z · EA · GW

Added an FAQ:

Q. This is a very consequentialist take on person-affecting views. Wouldn't a non-consequentialist version (e.g. this comment) make more sense? 

Personally I think of non-consequentialist theories as good heuristics that approximate the hard-to-compute consequentialist answer, and so I often find them irrelevant when thinking about theories applied in idealized thought experiments. If you are instead sympathetic to non-consequentialist theories as being the true answer, then the argument in this post probably shouldn't sway you too much. If you are in a real-world situation where you have person-affecting intuitions, those intuitions are there for a reason and you probably shouldn't completely ignore them until you know that reason.

In your millionaire example, I think the consequentialist explanation is "if people generally treat it as bad when Bob takes action A with mildly good first-order consequences when he could instead have taken action B with much better first-order consequences, that creates an incentive, through anticipated social pressure, for Bob to take action B rather than A when he would otherwise have taken A".

(Notably, this reason doesn't apply in the idealized thought experiment where no one ever observes your decisions and there is no difference between the three worlds other than what was described.)

Comment by Rohin Shah (rohinmshah) on Person-affecting intuitions can often be money pumped · 2022-07-08T05:39:35.420Z · EA · GW

I was imagining a local decision rule that was global in only one respect, i.e. choosing which people to consider based on who would definitely exist regardless of what decision-making happens. But in hindsight I think this is an overly complicated rule that no one is actually thinking about; I'll delete it from the post.

Comment by Rohin Shah (rohinmshah) on Person-affecting intuitions can often be money pumped · 2022-07-08T05:31:12.032Z · EA · GW

I don't think the case for caring about Dutch books is "maybe I'll get Dutch booked in the real world". I like the Future-proof ethics series on why to care about these sorts of theoretical results.

I definitely agree that there are issues with total utilitarianism as well.

Comment by Rohin Shah (rohinmshah) on Person-affecting intuitions can often be money pumped · 2022-07-08T05:26:00.894Z · EA · GW

In practice, I think those with person-affecting views should refuse moves like trade 1 if they "expect" to subsequently make moves like trade 2, because World 1 > World 3.

You can either have a local decision rule that doesn't take into account future actions (and so excludes this sort of reasoning), or you can have a global decision rule that selects an entire policy at once. I was talking about the local kind.

You could have a global decision rule that compares worlds and ignores happy people who will only exist in some of the worlds. In that case I'd refer you to Chapter 4 of On the Overwhelming Importance of Shaping the Far Future.

EDIT: Added as an FAQ.

(Nitpick: Under the view I laid out World 1 is not better than World 3? You're indifferent between the two.)

Comment by Rohin Shah (rohinmshah) on Person-affecting intuitions can often be money pumped · 2022-07-07T20:05:15.784Z · EA · GW

You could extend the result to include incompleteness, intransitivity, dependence on irrelevant alternatives or being in principle Dutch bookable/money pumpable as alternative "bullets" you could bite on top of the 6 conditions.

Yeah, this is what I had in mind.

Comment by Rohin Shah (rohinmshah) on Person-affecting intuitions can often be money pumped · 2022-07-07T20:00:24.683Z · EA · GW

Oh, to be clear, my response to RedStateBlueState's comment was considering a new still-consequentialist view, that wouldn't take trade 3. None of the arguments in this post are meant to apply to e.g. deontological views. I've clarified this in my original response.

Comment by Rohin Shah (rohinmshah) on Person-affecting intuitions can often be money pumped · 2022-07-07T15:04:52.388Z · EA · GW

I totally agree this is a consequentialist critique. I don't think that negates the validity of the critique.

From a non-consequentialist point of view, whether a "no people to lots of happy people" move (like any other move) is good or not depends on other considerations, like the nature of the action, our duties or virtue. I guess what I want to say is that "going from state A to state B"-type thinking is evaluating world states in an outcome-oriented way, and that just seems like the wrong level of analysis for those other philosophies.

Okay, but I still don't know what the view says about x-risk reduction (the example in my previous comment)?

Comment by Rohin Shah (rohinmshah) on Person-affecting intuitions can often be money pumped · 2022-07-07T14:31:37.447Z · EA · GW

Oh, the view here only says that it's fine to prevent a happy person from coming into existence, not that it's fine to kill an already existing person.

Comment by Rohin Shah (rohinmshah) on Person-affecting intuitions can often be money pumped · 2022-07-07T13:55:39.308Z · EA · GW

Responded in the other comment thread.

Comment by Rohin Shah (rohinmshah) on Person-affecting intuitions can often be money pumped · 2022-07-07T13:54:46.450Z · EA · GW

Yeah, you could modify the view I laid out to say that moving from "happy person" to "no person" has a disutility equal in magnitude to the welfare that the happy person would have had. This new view can't be Dutch booked because it never takes trades that decrease total welfare.

My objection to it is that you can't use it for decision-making because it depends on what the "default" is. For example, if you view x-risk reduction as preventing a move from "lots of happy people to no people" this view is super excited about x-risk reduction, but if you view x-risk reduction as a move from "no people to lots of happy people" this view doesn't care.

(You can make a similar objection to the view in the post though it isn't as stark. In my experience, people's intuitions are closer to the view in the post, and they find the Dutch book argument at least moderately convincing.)

Comment by Rohin Shah (rohinmshah) on Strategic Perspectives on Long-term AI Governance: Introduction · 2022-07-03T10:11:15.055Z · EA · GW

I'm curious where the plan "convey an accurate assessment of misalignment risks to everyone, expect that they act sensibly based on that, which leads to low x-risk" fits here.

(I'm not saying I endorse this plan.)

Comment by Rohin Shah (rohinmshah) on On Deference and Yudkowsky's AI Risk Estimates · 2022-06-29T07:50:14.908Z · EA · GW

You were arguing above that the difference between your and Eliezer's views makes much more than a 2x difference;

I was arguing that EV estimates have more than a 2x difference; I think this is pretty irrelevant to the deference model you're suggesting (which I didn't know you were suggesting at the time).

do you now agree that, on my account of deference, a big change in the deference-weight you assign to Eliezer plausibly leads to a much smaller change in your policy from the perspective of other worldviews, because the Eliezer-worldview trades off influence over most parts of the policy for influence over the parts that the Eliezer-worldview thinks are crucial and other policies don't?

No, I don't agree with that. It seems like all the worldviews are going to want resources (money / time) and access to that is ~zero-sum. (All the worldviews want "get more resources" so I'm assuming you're already doing that as much as possible.) The bargaining helps you avoid wasting resources on counterproductive fighting between worldviews, it doesn't change the amount of resources each worldview gets to spend.

Going from allocating 10% of your resources to 20% of your resources to a worldview seems like a big change. It's a big difference if you start with twice as much money / time as you otherwise would have, unless there just happens to be a sharp drop in marginal utility of resources between those two points for some reason.

Maybe you think that there are lots of things one could do that have way more effect than "redirecting 10% of one's resources" and so it's not a big deal? If so can you give examples?

I think calibrated credences are badly-correlated with expected future impact

I agree overconfidence is common and you shouldn't literally calculate a Brier score to figure out who to defer to.

I agree that directionally-correct beliefs are better correlated than calibrated credences.

When I say "evaluate beliefs" I mean "look at stated beliefs and see how reasonable they look overall, taking into account what other people thought when the beliefs were stated" and not "calculate a Brier score"; I think this post is obviously closer to the former than the latter.

I agree that people's other goals make it harder to evaluate what their "true beliefs" are, and that's one of the reasons I say it's only 3/10 correlation.

I think coherence is very well-correlated with expected future impact (like, 5/10), because impact is heavy-tailed and the biggest sources of impact often require strong, coherent views. I don't think it's that hard to evaluate in hindsight, because the more coherent a view is, the more easily it's falsified by history.

Re: correlation, I was implicitly also asking the question "how much does this vary across experts". Across the general population, maybe coherence is 7/10 correlated with expected future impact; across the experts that one would consider deferring to I think it is more like 2/10, because most experts seem pretty coherent (within the domains they're thinking about and trying to influence) and so the differences in impact depend on other factors.

Re: evaluation, it seems way more common to me that there are multiple strong, coherent, conflicting views that all seem compelling (see epistemic learned helplessness), which do not seem to have been easily falsified by history (in a sufficiently obvious manner that everyone agrees which one is false).

This too is in large part because we're looking at experts in particular. I think we're good at selecting for "enough coherence" before we consider someone an expert (if anything I think we do it too much in the "public intellectual" space), and so evaluating coherence well enough to find differences between experts ends up being pretty hard.

I think "hypothetical impact of past policies" is not that hard to evaluate.  E.g. in Eliezer's case the main impact is "people do a bunch of technical alignment work much earlier", which I think we both agree is robustly good.

I feel like looking at any EA org's report on estimation of their own impact makes it seem like "impact of past policies" is really difficult to evaluate?

Eliezer seems like a particularly easy case, where I agree his impact is probably net positive from getting people to do alignment work earlier, but even so I think there's a bunch of questions that I'm uncertain about:

  • How bad is it that some people completely dismiss AI risk because they encountered Eliezer and found it off-putting? (I've explicitly heard something along the lines of "that crazy stuff from Yudkowsky" from multiple ML researchers.)
  • How many people would be working on alignment without Eliezer's work? (Not obviously hugely fewer, Superintelligence plausibly still gets published, Stuart Russell plausibly still goes around giving talks about value alignment and its importance.)
  • To what extent did Eliezer's forceful rhetoric (as opposed to analytic argument) lead people to focus on the wrong problems?

Comment by Rohin Shah (rohinmshah) on Examples of someone admitting an error or changing a key conclusion · 2022-06-28T09:00:07.747Z · EA · GW

Shutting down No Lean Season: 

Comment by Rohin Shah (rohinmshah) on On Deference and Yudkowsky's AI Risk Estimates · 2022-06-28T08:45:14.655Z · EA · GW

Meta: This comment (and some previous ones) gets a bunch into "what should deference look like", which is interesting, but I'll note that most of this seems unrelated to my original claim, which was just "deference* seems important for people making decisions now, even if it isn't very important in practice for researchers", in contradiction to a sentence in your top-level comment. Do you now agree with that claim?

*Here I mean deference in the sense of how-much-influence-various-experts-have-over-your-actions. I initially called this "credences" because I thought you were imagining a model of deference in which literal credences determined how much influence experts had over your actions.

Your procedure is non-robust in the sense that, if Kurzweil transitions from total indifference to thinking that one policy is better by epsilon, he'll throw his full weight behind that policy.

Agreed, but I'm not too worried about that. It seems like you'll necessarily have some edge cases like this; I'd want to see an argument that the edge cases would be common before I switch to something else.

The chain of approximations could look something like:

  1. The correct thing to do is to consider all actions / policies and execute the one with the highest expected impact.
  2. First approximation: Since there are so many actions / policies, it would take too long to do this well, and so we instead take a shortcut and consider only those actions / policies that more experienced people have thought of, and execute the ones with the highest expected impact. (I'm assuming for now that you're not in the business of coming up with new ideas of things to do.)
  3. Second approximation: Actually it's still pretty hard to evaluate the expected impact of the restricted set of actions / policies, so we'll instead do the ones that the experts say are highest impact. Since the experts disagree, we'll divide our resources amongst them, in accordance with our predictions of which experts have highest expected impact across their portfolios of actions. (This is assuming a large enough pile of resources that it makes sense to diversify due to diminishing marginal returns for any one expert.)
  4. Third approximation: Actually expected impact of an expert's portfolio of actions is still pretty hard to assess, we can save ourselves decision time by choosing weights for the portfolios according to some proxy that's easier to assess.

It seems like right now we're disagreeing about proxies we could use in the third approximation. It seems to me like proxies should be evaluated based on how closely they approximate the desired metric (expected future impact) in realistic use cases, which would involve both (1) how closely they align with "expected future impact" in general and (2) how easy they are to evaluate. It seems to me like you're thinking mostly of (1) and not (2), which seems weird to me; if you were going to ignore (2) you should just choose "expected future impact". Anyway, individual proxies and my thoughts on them:

  1. Beliefs / credences: 5/10 on easy to evaluate (e.g. Ben could write this post). 3/10 on correlation with expected future impact. Doesn't take into account how much impact experts think their policies could have (e.g. the Kurzweil example above).
  2. Coherence: 3/10 on easy to evaluate (seems hard to do this without being an expert in the field). 2/10 on correlation with expected future impact (it's not that hard to have wrong coherent worldviews, see e.g. many pop sci books).
  3. Hypothetical impact of past policies: 1/10 on easy to evaluate (though it depends on the domain). 7/10 on correlation with expected future impact (it's not 9/10 or 10/10 because selection bias seems very hard to account for).

As is almost always the case with proxies, I would usually use an intuitive combination of all the available proxies, because that seems way more robust than relying on any single one. I am not advocating for only relying on beliefs.
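One way to picture such an intuitive combination (with invented proxy scores and weightings, purely for illustration) is to weight each proxy by both its correlation with expected future impact and its ease of evaluation, then average an expert's scores under that weighting:

```python
# A toy sketch of combining noisy proxies into a single deference weight.
# All numbers are hypothetical, loosely echoing the ratings given above.

proxies = {
    # name: (correlation with expected future impact, ease of evaluation), both 0-1
    "beliefs":     (0.3, 0.5),
    "coherence":   (0.2, 0.3),
    "past_policy": (0.7, 0.1),
}

def combined_weight(expert_scores):
    """Weight each proxy by correlation * ease, then take the weighted
    average of the expert's scores. Illustrative only, not a real estimator."""
    total = sum(corr * ease for corr, ease in proxies.values())
    return sum(expert_scores[name] * corr * ease / total
               for name, (corr, ease) in proxies.items())
```

The point of the sketch is just that a proxy that correlates well with impact but is very hard to evaluate (like past policy quality) ends up contributing less than its correlation alone would suggest.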

Which I claim is an accurate description of what I was doing, and what Ben wasn't

I get the sense that you think I'm trying to defend "this is a good post and has no problems whatsoever"? (If so, that's not what I said.)

Summarizing my main claims about this deference model that you might disagree with:

  1. In practice, an expert's beliefs / credences will be relevant information into deciding what weight to assign them,
  2. Ben's post provides relevant information about Eliezer's beliefs (note this is not taking a stand on other aspects of the post, e.g. the claim about how much people should defer to Eliezer)
  3. The weights assigned to experts are important / valuable to people who need to make decisions now (but they are usually not very important / valuable to researchers).

Comment by Rohin Shah (rohinmshah) on On Deference and Yudkowsky's AI Risk Estimates · 2022-06-25T09:07:54.243Z · EA · GW

First change:

In your Kurzweil example I think the issue is not that you assigned weights based on hypothetical-Kurzweil's beliefs, but that hypothetical-Kurzweil is completely indifferent over policies. I think the natural fix is "moral parliament" style decision making where the weights can still come from beliefs but they now apply more to preferences-over-policies. In your example hypothetical-Kurzweil has a lot of weight but never has any preferences-over-policies so doesn't end up influencing your decisions at all.

That being said, I agree that if you can evaluate quality of past recommended policies well, without a ton of noise, that would be a better signal than accuracy of beliefs. This just seems extremely hard to do, especially given the selection bias in who comes to your attention in the first place, and idk how I'd do it for Eliezer in any sane way. (Whereas you get to see people state many more beliefs and so there are a lot more data points that you can evaluate if you look at beliefs.)

But notably, the quality of past recommended policies is often not very sensitive to credences!

I think you're thinking way too much about credences-in-particular. The relevant notion is not "credences", it's that-which-determines-how-much-influence-the-person-has-over-your-actions. In this model of deference the relevant notion is the weights assigned in step 2 (however you calculate them), and the message of Ben's post would be "I think people assign too high a weight to Eliezer", rather than anything about credences. I don't think either Ben or I care particularly much about credences-based-on-deference except inasmuch as they affect your actions.

I do agree that Ben's post looks at credences that Eliezer has given and considers those to be relevant evidence for computing what weight to assign Eliezer. You could take a strong stand against using people's credences or beliefs to compute weights, but that is at least a pretty controversial take (that I personally don't agree with), and it seems different from what you've been arguing so far (except possibly in the parent comment).

Second change:

This change seems fine. Personally I'm pretty happy with a rough heuristic of "here's how I should be splitting my resources across worldviews" and then going off of intuitive "how much does this worldview care about this decision" + intuitive trading between worldviews rather than something more fleshed out and formal but that seems mostly a matter of taste.

Comment by Rohin Shah (rohinmshah) on On Deference and Yudkowsky's AI Risk Estimates · 2022-06-24T09:59:56.511Z · EA · GW

Responding to other more minor points:

What do you mean "he doesn't expect this sort of thing to happen"?

I mean that he predicts that these costly actions will not be taken despite seeming good to him.

Because neither Ben nor myself was advocating for this.

I think it's also important to consider Ben's audience. If I were Ben I'd be imagining my main audience to be people who give significant deference weight to Eliezer's actual worldview. If you're going to write a top-level comment arguing against Ben's post it seems pretty important to engage with the kind of deference he's imagining (or argue that no one actually does that kind of deference, or that it's not worth writing to that audience, etc).

(Of course, I could be wrong about who Ben imagines his audience to be.)

Comment by Rohin Shah (rohinmshah) on On Deference and Yudkowsky's AI Risk Estimates · 2022-06-24T09:52:04.979Z · EA · GW

Okay, my new understanding of your view is that you're suggesting that (if one is going to defer) one should:

  1. Identify a panel of people to defer to
  2. Assign them weights based on how good they seem (e.g. track record, quality and novelty of ideas, etc)
  3. Allocate resources to [policies advocated by person X] in proportion to [weight assigned to person X].
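Step 3 of this model can be sketched in a few lines (hypothetical experts, weights, and budget; not anyone's actual numbers):

```python
# A minimal sketch of allocating a fixed resource pool across
# expert-advocated policies in proportion to deference weights.

def allocate(budget, weights):
    """Split a budget proportionally to (unnormalized) deference weights."""
    total = sum(weights.values())
    return {expert: budget * w / total for expert, w in weights.items()}

# Moving an expert's weight from 0.1 to 0.2 (others adjusted to keep the
# total fixed) doubles the resources their favored policies receive.
before = allocate(100, {"eliezer": 0.1, "others": 0.9})
after = allocate(100, {"eliezer": 0.2, "others": 0.8})
```

This is why a change in the weight assigned to a single expert is a big deal under this model: the resources flowing to that expert's preferred policies scale directly with the weight.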

I agree that (a) this is a reasonable deference model and (b) under this deference model most of my calculations and questions in this thread don't particularly make sense to think about.

However, I still disagree with the original claim I was disagreeing with:

when it comes to making big-picture forecasts, the main value of deference is in helping us decide which ideas and arguments to take seriously, rather than the specific credences we should place on them, since the space of ideas is so large.

Even in this new deference model, it seems like the specific weights chosen in step 2 are a pretty big deal (which seem like the obvious analogues of "credences", and the sort of thing that Ben's post would influence). If you switch from a weight of 0.5 to a weight of 0.3, that's a reallocation of 20% of your resources, which is pretty large!

Comment by Rohin Shah (rohinmshah) on On Deference and Yudkowsky's AI Risk Estimates · 2022-06-23T10:41:35.505Z · EA · GW

It doesn't make sense to defer to Eliezer's estimate of the relative importance of AI without also accounting for his estimate of the relative tractability of funding AI, which I infer he thinks is very low.

There's lots of things you can do under Eliezer's worldview that add dignity points, like paying relevant people millions of dollars to spend a week really engaging with the arguments, or trying to get whole-brain emulation before AGI. My understanding is that he doesn't expect those sorts of things to happen.

 I claim that the value of this organization is mainly determined by the likelihood that Eliezer is correct about a few key claims which underlie his research agenda. Suppose he thinks that's 90% likely and I think that's 10% likely.

This seems like a crazy way to do cost-effectiveness analyses.

Like, if I were comparing deworming to GiveDirectly, would I be saying "well, the value of deworming is mainly determined by the likelihood that the pro-deworming people are right, which I estimate is 70% but you estimate is 50%, so there's only a 1.4x difference"? Something has clearly gone wrong here.

It also feels like this reasoning implies that no EA action can be > 10x more valuable than any other action that an EA critic thinks is good? Since you assign a 90% chance that the EA is right, and the critic thinks there's a 10% chance of that, so there's only a 9x gap? And then once you do all of your adjustments it's only 2x? Why do we even bother with cause prioritization under this worldview?

I don't have a fleshed out theory of how and when to defer, but I feel pretty confident that even our intuitive pretheoretic deference should not be this sort of thing, and should be the sort of thing that can have orders of magnitude of difference between actions.

(One major thing is that I think you should be comparing between two actions, rather than evaluating an action by itself, which is why I compared to "all other alignment work".)

The debate is between two responses to that:

a) Give him less deference weight than the cautious, sober, AI safety people who make few novel claims but are better-calibrated (which is what Ben advocates).

b) Try to adjust for his overconfidence and then give significant deference weight to a version of his worldview that isn't overconfident.

I don't see why you are not including "c) give significant deference weight to his actual worldview", which is what I'd be inclined to do if I didn't have significant AI expertise myself and so was trying to defer.

(Aside: note that Ben said "they shouldn’t defer to him more than they would defer to anyone else who seems smart and has spent a reasonable amount of time thinking about AI risk", which is slightly different from your rephrasing, but that's a nitpick)

Also, the numbers seem pretty wild; maybe a bit uncharitable to ascribe to Eliezer the view that his research would be 3 OOM more valuable than the rest of the field combined?

¯\_(ツ)_/¯ Both the 10% and 0.01% (= 100% - 99.99%) numbers are ones I've heard reported (though both second-hand, not directly from Eliezer), and it also seems consistent with other things he writes. It seems entirely plausible that people misspoke or misremembered or lied, or that Eliezer was reporting probabilities "excluding miracles" or something else that makes these not the right numbers to use.

I'm not trying to be "charitable" to Eliezer, I'm trying to predict his views accurately (while noting that often people predict views inaccurately by failing to be sufficiently charitable). Usually when I see people say things like "obviously Eliezer meant this more normal, less crazy thing" they seem to be wrong.

Rob thinking that it's not actually 99.99% is in fact an update for me.

Comment by Rohin Shah (rohinmshah) on On Deference and Yudkowsky's AI Risk Estimates · 2022-06-23T10:04:50.328Z · EA · GW

Even at 95% you get OOMs of difference by my calculations, though significantly fewer OOMs, so this doesn't seem like the main crux.

Comment by Rohin Shah (rohinmshah) on On Deference and Yudkowsky's AI Risk Estimates · 2022-06-22T08:46:32.631Z · EA · GW

Like, suppose you think that Eliezer's credences on his biggest claims are literally 2x higher than they should be, even for claims where he's 90% confident.

I think differences between Eliezer + my views often make way more than a 2x difference to the bottom line. I'm not sure why you're only considering probabilities on specific claims; when I think of "deferring" I also imagine deferring on estimates of usefulness of various actions, which can much more easily have OOMs of difference.

(Fwiw I also think Eliezer is way more than 2x too high for probabilities on many claims, though I don't think that matters much for my point.)

Taking my examples:

should funders reallocate nearly all biosecurity money to AI?

Since Eliezer thinks something like 99.99% chance of doom from AI, that reduces cost effectiveness of all x-risk-targeted biosecurity work by a factor of 10,000x (since only in 1 in 10,000 worlds does the reduced bio x-risk matter at all), whereas if you have a < 50% chance of doom from AI (as I do) then that's a discount factor of < 2x on x-risk-targeted biosecurity work. So that's almost 4 OOMs of difference.
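(A minimal sketch of that discount-factor arithmetic, using the two probabilities above; both are the comment's working numbers, not precise credences:)

```python
import math

# x-risk-targeted biosecurity work only matters in the (1 - p) worlds
# where AI doesn't kill everyone first, so its cost effectiveness is
# discounted by 1 / (1 - p_ai_doom).
def bio_discount(p_ai_doom):
    return 1.0 / (1.0 - p_ai_doom)

eliezer_discount = bio_discount(0.9999)   # ~10,000x
rohin_discount = bio_discount(0.5)        # 2x

gap_ooms = math.log10(eliezer_discount / rohin_discount)  # ~3.7 OOMs
```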

What should AI-focused community builders provide as starting resources?

Eliezer seems very confident that a lot of existing alignment work is useless. So if you imagine taking a representative set of such papers as starting resources, I'd imagine that Eliezer would be at < 1% on "this will help the person become an effective alignment researcher" whereas I'd be at > 50% (for actual probabilities I'd want a better operationalization), leading to a >50x difference in cost effectiveness.

(And if you compare against the set of readings Eliezer would choose, I'd imagine the difference becomes even greater -- I could imagine we'd each think the other's choice would be net negative.)

Should there be an organization dedicated to solving Eliezer's health problems? What should its budget be?

I don't have a citation but I'm guessing that Eliezer thinks that with more energy and project management skills he could make a significant dent in x-risk (perhaps 10 percentage points), while thinking that the rest of the alignment field, if fully funded, can't make a dent of more than 0.01 percentage points, suggesting that "improve Eliezer's health + project management skills" is 3 OOM more important than "all other alignment work" (saying nothing about tractability, which I don't know enough to evaluate). Whereas I'd have it at, idk, 1-2 OOM less important, for a difference of 4-5 OOMs.
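(Spelling out that OOM comparison; the percentage-point figures are my guesses from above, not numbers Eliezer has actually stated:)

```python
import math

# guessed x-risk reduction, in percentage points
eliezer_on_self = 10.0    # from fixing his health + project management
eliezer_on_field = 0.01   # from the rest of the alignment field, fully funded

# ratio of the two, in orders of magnitude
gap_ooms = math.log10(eliezer_on_self / eliezer_on_field)  # 3 OOMs
# my own view puts the ordering 1-2 OOM the other way,
# so the disagreement is 4-5 OOMs in total
```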

Should people try to solve technical AI alignment or try to, idk, create a culture of secrecy within AGI labs?

This one is harder to make up numbers for but intuitively it seems like there should again be many OOMs of difference, primarily because we differ by many OOMs on "regular EAs trying to solve technical AI alignment" but roughly agree on the value of "culture of secrecy".

I realize I haven't engaged with the abstract points you made. I think I mostly just don't understand them and currently they feel like they have to be wrong given the obvious OOMs of difference in all of the examples I gave. If you still disagree it would be great if you could explain how your abstract points play out in some of my concrete examples.

Comment by Rohin Shah (rohinmshah) on On Deference and Yudkowsky's AI Risk Estimates · 2022-06-20T16:00:30.657Z · EA · GW

when it comes to making big-picture forecasts, the main value of deference is in helping us decide which ideas and arguments to take seriously, rather than the specific credences we should place on them, since the space of ideas is so large.

This seems like an overly research-centric position.

When your job is to come up with novel relevant stuff in a domain, then I agree that it's mostly about "which ideas and arguments to take seriously" rather than specific credences.

When your job is to make decisions right now, the specific credences matter. Some examples:

  • Any cause prioritization decision, e.g. should funders reallocate nearly all biosecurity money to AI?
  • What should AI-focused community builders provide as starting resources?
  • Should there be an organization dedicated to solving Eliezer's health problems? What should its budget be?
  • Should people try to solve technical AI alignment or try to, idk, create a culture of secrecy within AGI labs?