The alignment problem from a deep learning perspective 2022-08-11T03:18:36.435Z
Moral strategies at different capability levels 2022-07-27T20:20:55.877Z
Making decisions using multiple worldviews 2022-07-13T19:15:28.870Z
Three intuitions about EA: responsibility, scale, self-improvement 2022-04-15T07:55:07.003Z
Beyond micromarriages 2022-04-01T16:38:30.269Z
Ngo and Yudkowsky on scientific reasoning and pivotal acts 2022-02-21T17:00:01.453Z
Some thoughts on vegetarianism and veganism 2022-02-14T02:34:47.520Z
Examples of pure altruism towards future generations? 2022-01-26T16:42:40.032Z
Ngo's view on alignment difficulty 2021-12-14T19:03:07.377Z
Conversation on technology forecasting and gradualism 2021-12-09T19:00:00.000Z
What are some success stories of grantmakers beating the wider EA community? 2021-12-07T02:07:28.162Z
Ngo and Yudkowsky on AI capability gains 2021-11-19T01:54:56.512Z
Ngo and Yudkowsky on alignment difficulty 2021-11-15T22:47:46.125Z
Is there anyone working full-time on helping EAs address mental health problems? 2021-11-01T03:42:30.027Z
AGI Safety Fundamentals curriculum and application 2021-10-20T21:45:24.814Z
Suggested norms about financial aid for EAG(x) 2021-09-20T15:13:56.284Z
What are your main reservations about identifying as an effective altruist? 2021-03-30T09:55:03.249Z
Some thoughts on risks from narrow, non-agentic AI 2021-01-19T00:07:23.075Z
My evaluations of different domains of Effective Altruism 2021-01-15T23:15:17.010Z
Clarifying the core of Effective Altruism 2021-01-15T23:02:27.500Z
Lessons from my time in Effective Altruism 2021-01-15T21:54:54.565Z
Scope-sensitive ethics: capturing the core intuition motivating utilitarianism 2021-01-15T16:22:14.094Z
What foundational science would help produce clean meat? 2020-11-13T13:55:27.566Z
AGI safety from first principles 2020-10-21T17:42:53.460Z
EA reading list: utilitarianism and consciousness 2020-08-07T19:32:02.050Z
EA reading list: other reading lists 2020-08-04T14:56:28.422Z
EA reading list: miscellaneous 2020-08-04T14:42:44.119Z
EA reading list: futurism and transhumanism 2020-08-04T14:29:52.883Z
EA reading list: Paul Christiano 2020-08-04T13:36:51.331Z
EA reading list: global development and mental health 2020-08-03T11:53:10.890Z
EA reading list: Scott Alexander 2020-08-03T11:46:17.315Z
EA reading list: replaceability and discounting 2020-08-03T10:10:54.968Z
EA reading list: longtermism and existential risks 2020-08-03T09:52:41.256Z
EA reading list: suffering-focused ethics 2020-08-03T09:40:38.142Z
EA reading list: EA motivations and psychology 2020-08-03T09:24:07.430Z
EA reading list: cluelessness and epistemic modesty 2020-08-03T09:23:44.124Z
EA reading list: population ethics, infinite ethics, anthropic ethics 2020-08-03T09:22:15.461Z
EA reading list: moral uncertainty, moral cooperation, and values spreading 2020-08-03T09:21:36.288Z
richard_ngo's Shortform 2020-06-13T10:46:26.847Z
What are the key ongoing debates in EA? 2020-03-08T16:12:34.683Z
Characterising utopia 2020-01-02T00:24:23.248Z
Technical AGI safety research outside AI 2019-10-18T15:02:20.718Z
Does any thorough discussion of moral parliaments exist? 2019-09-06T15:33:02.478Z
How much EA analysis of AI safety as a cause area exists? 2019-09-06T11:15:48.665Z
How do most utilitarians feel about "replacement" thought experiments? 2019-09-06T11:14:20.764Z
Why has poverty worldwide fallen so little in recent decades outside China? 2019-08-07T22:24:11.239Z
Which scientific discovery was most ahead of its time? 2019-05-16T12:28:54.437Z
Why doesn't the EA forum have curated posts or sequences? 2019-03-21T13:52:58.807Z
The career and the community 2019-03-21T12:35:23.073Z
Arguments for moral indefinability 2019-02-08T11:09:25.547Z


Comment by richard_ngo on So, I Want to Be a "Thinkfluencer" · 2022-08-22T21:57:08.645Z · EA · GW

I like your overall ambitions! I want to note a couple of things that seemed incongruous to me/things I'd change about your default plan.

I'm 24 now, so I'm hoping to start my career trajectory at 32 (8 years forms a natural/compelling Schelling point)

This seems like very much the wrong mindset. You're starting this trajectory now. In order to do great intellectual work, you should be aiming directly at the things you want to understand, and the topics you want to make progress on, as early as you can. A better alternative would be taking the mindset that your career will end in 8 years, and thinking about what you'd need to produce great work by that time. (This is deliberately provocative, and shouldn't be taken fully literally, but I think it points in the right direction, especially given that you're aiming to do research, like agent foundations and more general high-level strategic thinking, where the credentials from a PhD that's successful by mainstream standards don't matter very much.)

Pick a new important topic each month (or 2-3 months)

Again, I'd suggest taking quite a different strategy here. In order to do really well at this, I think you don't want the mindset of shallowly exploring other people's work (although of course it's useful to have that as background knowledge). I think you want to have the mindset of identifying the things which seem most important to you, pushing forward the frontier of knowledge on those topics, following threads which arise from doing so, and learning whatever you need as you go along. What it looks like to be successful here is noticing a bunch of ways in which other people seem like they're missing stuff/overlooking things, digging into those, and finding new ways to understand these topics. (That's true even if your only goal is to popularise existing ideas - in order to be able to popularise them really well, you want the level of knowledge such that, if there were big gaps in those ideas, then you'd notice them.) This is related to the previous point: don't spend all this time preparing to do the thing - just do it!

I think that I am unusually positioned to be able to become such a person.

I think that doing well at this research is sufficiently heavy-tailed that it's very hard to reason your way into thinking you'll be great at it in advance. You'll learn far, far more on this point by starting to do the work now, getting a bunch of feedback, and iterating fast.

Good luck!

Comment by richard_ngo on Population Ethics Without Axiology: A Framework · 2022-08-03T19:03:00.451Z · EA · GW

Makes sense, glad we're on the same page!

a more accurate title for my post would be “population ethics without objective axiology.”

Perhaps consider changing it to that, then? Since I'm a subjectivist, I consider all axiologies subjective - and therefore "without axiology" is very different from "without objective axiology".

(I feel like I would have understood that our arguments were consistent either if the title had been different, or if I'd read the post more carefully - but alas, neither condition held.)

I'd also consider that humans are biological creatures with “interests” – a system-1 “monkey brain” with its own needs, separate (or at least separable) from idealized self-identities that the rational, planning part of our brain may come up with. So, if we also want to fulfill these interests/needs, that could be justification for a quasi-hedonistic view or for the type of mixed view that you advocate?

I like this justification for hedonism. I suspect that a version of this is the only justification that will actually hold up in the long term, once we've more thoroughly internalized qualia anti-realism.

Comment by richard_ngo on Population Ethics Without Axiology: A Framework · 2022-08-03T01:37:55.733Z · EA · GW

I like this post; as you note, we've been thinking along very similar lines. But you reach different conclusions than I do - in particular, I disagree that "the ambitious morality of “do the most moral/altruistic thing” is something like preference utilitarianism." In other words, I think most of your arguments about minimal morality are still consistent with having an axiology.

I didn't read your post very carefully, but I think the source of the disagreement is that you're conflating objectivity/subjectivity with respect to the moral actor and objectivity/subjectivity with respect to the moral patient.

More specifically: let's say that I'm a moral actor, and I have some axiology. I might agree that this axiology is not objective: it's just my own idiosyncratic axiology. But it nevertheless might be non-subjective with respect to moral patients, in the sense that my axiology says that some experiences have value regardless of what the people having those experiences want. So I could be a hedonist despite thinking that hedonism isn't the objectively-correct axiology.

This distinction also helps resolve the tension between "there's an objective axiology" and "people are free to choose their own life goals": the objective axiology of what's good for a person might in part depend on what they want.

Having an axiology which says things like "my account of welfare is partly determined by hedonic experiences and partly by preferences and partly by how human-like the agent is" may seem unparsimonious, but I think that's just what it means for humans to have complex values. And then, as you note, we can also follow minimal (cooperation) morality for people who are currently alive, and balance that with maximizing the welfare of people who don't yet exist.

Comment by richard_ngo on On Deference and Yudkowsky's AI Risk Estimates · 2022-07-13T19:41:46.390Z · EA · GW

I've now written up a more complete theory of deference here. I don't expect that it directly resolves these disagreements, but hopefully it's clearer than this thread.

Going from allocating 10% of your resources to 20% of your resources to a worldview seems like a big change.

Note that this wouldn't actually make a big change for AI alignment, since we don't know how to use more funding. It'd make a big change if we were talking about allocating people, but my general heuristic is that I'm most excited about people acting on strong worldviews of their own, and so I think the role of deference there should be much more limited than when it comes to money. (This all falls out of the theory I linked above.)

Across the general population, maybe coherence is 7/10 correlated with expected future impact; across the experts that one would consider deferring to I think it is more like 2/10, because most experts seem pretty coherent (within the domains they're thinking about and trying to influence) and so the differences in impact depend on other factors.

Experts are coherent within the bounds of conventional study. When we try to apply that expertise to related topics that are less conventional (e.g. ML researchers on AGI; or even economists on what the most valuable interventions are) coherence drops very sharply. (I'm reminded of an interview where Tyler Cowen says that the most valuable cause area is banning alcohol, based on some personal intuitions.)

I feel like looking at any EA org's report on estimation of their own impact makes it seem like "impact of past policies" is really difficult to evaluate?

The question is how it compares to estimating past correctness, where we face pretty similar problems. But mostly I think we don't disagree too much on this question - I think epistemic evaluations are gonna be bigger either way, and I'm mostly just advocating for the "think-of-them-as-a-proxy" thing, which you might be doing but very few others are.

Comment by richard_ngo on On Deference and Yudkowsky's AI Risk Estimates · 2022-06-28T21:17:46.452Z · EA · GW

Meta: I'm currently writing up a post with a fully-fleshed-out account of deference. If you'd like to drop this thread and engage with that when it comes out (or drop this thread without engaging with that), feel free; I expect it to be easier to debate when I've described the position I'm defending in more detail.

I'll note that most of this seems unrelated to my original claim, which was just "deference* seems important for people making decisions now, even if it isn't very important in practice for researchers", in contradiction to a sentence on your top-level comment. Do you now agree with that claim?

I always agreed with this claim; my point was that the type of deference which is important for people making decisions now should not be very sensitive to the "specific credences" of the people you're deferring to. You were arguing above that the difference between your and Eliezer's views makes much more than a 2x difference; do you now agree that, on my account of deference, a big change in the deference-weight you assign to Eliezer plausibly leads to a much smaller change in your policy from the perspective of other worldviews, because the Eliezer-worldview trades off influence over most parts of the policy for influence over the parts that the Eliezer-worldview thinks are crucial and other policies don't?

individual proxies and my thoughts on them

This is helpful, thanks. I of course agree that we should consider both correlations with impact and ease of evaluation; I'm talking so much about the former because not noticing this seems like the default mistake that people make when thinking about epistemic modesty. Relatedly, I think my biggest points of disagreement with your list are:

1. I think calibrated credences are badly-correlated with expected future impact, because:
a) Overconfidence is just so common, and top experts are often really miscalibrated even when they have really good models of their field
b) The people who are best at having impact have goals other than sounding calibrated - e.g. convincing people to work with them, fighting social pressure towards conformity, etc. By contrast, the people who are best at being calibrated are likely the ones who are always stating their all-things-considered views, and who therefore may have very poor object-level models. This is particularly worrying when we're trying to infer credences from tone - e.g. it's hard to distinguish the hypotheses "Eliezer's inside views are less calibrated than other people's" and "Eliezer always speaks based on his inside-view credences, whereas other people usually speak based on their all-things-considered credences".
c) I think that "directionally correct beliefs" are much better-correlated, and not that much harder to evaluate, and so credences are especially unhelpful by comparison to those (like, 2/10 before conditioning on directional correctness, and 1/10 after, whereas directional correctness is like 3/10).

2. I think coherence is very well-correlated with expected future impact (like, 5/10), because impact is heavy-tailed and the biggest sources of impact often require strong, coherent views. I don't think it's that hard to evaluate in hindsight, because the more coherent a view is, the more easily it's falsified by history.

3. I think "hypothetical impact of past policies" is not that hard to evaluate.  E.g. in Eliezer's case the main impact is "people do a bunch of technical alignment work much earlier", which I think we both agree is robustly good.

Comment by richard_ngo on On Deference and Yudkowsky's AI Risk Estimates · 2022-06-28T01:18:10.873Z · EA · GW

In your Kurzweil example I think the issue is not that you assigned weights based on hypothetical-Kurzweil's beliefs, but that hypothetical-Kurzweil is completely indifferent over policies.

Your procedure is non-robust in the sense that, if Kurzweil transitions from total indifference to thinking that one policy is better by epsilon, he'll throw his full weight behind that policy. Hmm, but then in a parliamentary approach I guess that if there are a few different things he cares epsilon about, then other policies could negotiate to give him influence only over the things they don't care about themselves. Weighting by hypothetical-past-impact still seems a bit more elegant, but maybe it washes out.

(If we want to be really on-policy then I guess the thing which we should be evaluating is whether the person's worldview would have had good consequences when added to our previous mix of worldviews. And one algorithm for this is assigning policies weights by starting off from a state where they don't know anything about the world, then letting them bet on all your knowledge about the past (where the amount they win on bets is determined not just by how correct they are, but also how much they disagree with other policies). But this seems way too complicated to be helpful in practice.)
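The betting procedure sketched above can be made concrete as a toy Bayesian-mixture update: each worldview assigns probabilities to past events, and its deference weight is updated multiplicatively by how well it predicted them. A worldview only gains weight where it disagrees with the others and turns out to be right, which captures the "amount they win depends on how much they disagree" idea. All names and numbers here are hypothetical.

```python
def update_weights(weights, predictions, outcome):
    """One round of betting on a past binary event.

    weights: {worldview: current weight}
    predictions: {worldview: P(event happens)}
    outcome: whether the event actually happened
    """
    new = {}
    for name, w in weights.items():
        p = predictions[name]
        # Multiply weight by the probability the worldview assigned
        # to what actually happened (Bayesian mixture update).
        new[name] = w * (p if outcome else 1 - p)
    total = sum(new.values())
    return {name: w / total for name, w in new.items()}

# Start from equal weights; A predicted a past event at 0.9, B at 0.5,
# and the event happened, so A gains relative weight.
weights = {"worldview_A": 0.5, "worldview_B": 0.5}
weights = update_weights(
    weights, {"worldview_A": 0.9, "worldview_B": 0.5}, True
)
```

Running many such rounds over everything you know about the past would yield the final deference weights; the complexity the comment gestures at comes from events not being independent, and from policies caring about decisions rather than just predictions.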

I agree that if you can evaluate quality of past recommended policies well, without a ton of noise, that would be a better signal than accuracy of beliefs. This just seems extremely hard to do, especially given the selection bias in who comes to your attention in the first place, and idk how I'd do it for Eliezer in any sane way.

I think I'm happy with people spending a bunch of time evaluating accuracy of beliefs, as long as they keep in mind that this is a proxy for quality of recommended policies. Which I claim is an accurate description of what I was doing, and what Ben wasn't: e.g. when I say that credences matter less than coherence of worldviews, that's because the latter is crucial for designing good policies, whereas the former might not be; and when I say that all-things-considered estimates of things like "total risk level" aren't very important, that's because in principle we should be aggregating policies not risk estimates between worldviews.

I also agree that selection bias could be a big problem; again, I think that the best strategy here is something like "do the standard things while remembering what's a proxy for what".

Comment by richard_ngo on On Deference and Yudkowsky's AI Risk Estimates · 2022-06-24T23:03:38.064Z · EA · GW

Yepp, thanks for the clear rephrasing. My original arguments for this view were pretty messy because I didn't have it fully fleshed out in my mind before writing this comment thread, I just had a few underlying intuitions about ways I thought Ben was going wrong.

Upon further reflection I think I'd make two changes to your rephrasing.

First change: in your rephrasing, we assign people weights based on the quality of their beliefs, but then follow their recommended policies. But any given way of measuring the quality of beliefs (in terms of novelty, track record, etc) is only an imperfect proxy for quality of policies. For example, Kurzweil might very presciently predict that compute is the key driver of AI progress, but suppose (for the sake of argument) that the way he does so is by having a worldview in which everything is deterministic, individuals are powerless to affect the future, etc. Then you actually don't want to give many resources to Kurzweil's policies, because Kurzweil might have no idea which policies make any difference.

So I think I want to adjust the rephrasing to say: in principle we should assign people weights based on how well their past recommended policies for someone like you would have worked out, which you can estimate using things like their track record of predictions, novelty of ideas, etc. But notably, the quality of past recommended policies is often not very sensitive to credences! For example, if you think that there's a 50% chance of solving nanotech in a decade, or a 90% chance of solving nanotech in a decade, then you'll probably still recommend working on nanotech (or nanotech safety) either way.

Having said all that, since we only get one rollout, evaluating policies is very high variance. And so looking at other information like reasoning, predictions, credences, etc, helps you distinguish between "good" and "lucky". But fundamentally we should think of these as approximations to policy evaluation, at least if you're assuming that we mostly can't fully evaluate whether their reasons for holding their views are sound.

Second change: what about the case where we don't get to allocate resources, but we have to actually make a set of individual decisions? I think the theoretically correct move here is something like: let policies spend their weight on the domains which they think are most important, and then follow the policy which has spent most weight on that domain.

Some complications:

  • I say "domains" not "decisions" because you don't want to make a series of related decisions which are each decided by a different policy, that seems incoherent (especially if policies are reasoning adversarially about how to undermine each other's actions).
  • More generally, this procedure could in theory be sensitive to bargaining and negotiating dynamics between different policies, and also the structure of the voting system (e.g. which decisions are voted on first, etc). I think we can just resolve to ignore those and do fine, but in principle I expect it gets pretty funky.
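Ignoring those bargaining complications, the basic "spend your weight on the domains you care most about" move can be sketched as follows: each worldview splits its deference weight across domains in proportion to how important it finds each one, and each domain's decision goes to whichever worldview spent the most there. The worldview names and importance scores are made up for illustration.

```python
def allocate(weights, importances):
    """Assign each decision domain to the policy that spent most on it.

    weights: {policy: deference weight}
    importances: {policy: {domain: importance score}}
    """
    spent = {}  # domain -> {policy: amount of weight spent}
    for policy, w in weights.items():
        total = sum(importances[policy].values())
        for domain, score in importances[policy].items():
            spent.setdefault(domain, {})[policy] = w * score / total
    # Each domain is decided by the biggest spender on it.
    return {domain: max(s, key=s.get) for domain, s in spent.items()}

decisions = allocate(
    {"eliezer_view": 0.2, "mainstream_view": 0.8},
    {
        "eliezer_view": {"curriculum": 9, "biosecurity_funding": 1},
        "mainstream_view": {"curriculum": 2, "biosecurity_funding": 8},
    },
)
```

With these numbers the low-weight worldview spends 0.18 of its weight on curriculum versus the majority's 0.16, so it wins the domain it cares most about while ceding biosecurity funding, matching the intuition in the comments above.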

Lastly, two meta-level notes:

  • I feel like I've probably just reformulated some kind of reinforcement learning. Specifically the case where you have a fixed class of policies and no knowledge of how they relate to each other, so you can only learn how much to upweight each policy. And then the best policy is not actually in your hypothesis space, but you can learn a simple meta-policy of when to use each existing policy.
  • It's very ironic that in order to figure out how much to defer to Yudkowsky we need to invent a theory of idealised cooperative decision-making. Since he's probably the person whose thoughts on that I trust most, I guess we should meta-defer to him about what that will look like...

Comment by richard_ngo on On Deference and Yudkowsky's AI Risk Estimates · 2022-06-24T00:10:21.628Z · EA · GW

(One major thing is that I think you should be comparing between two actions, rather than evaluating an action by itself, which is why I compared to "all other alignment work".)

IMO the crux is that I disagree with both of these. Instead I think you should use each worldview to calculate a policy, and then generate some kind of compromise between those policies. My arguments above were aiming to establish that this strategy is not very sensitive to exactly how much you defer to Eliezer, because there just aren't very many good worldviews going around - hence why I assign maybe 15 or 20% (inside view) credence to his worldview (updated from 10% above after reflection). (I think my all-things-considered view is similar, actually, because deference to him cancels out against deference to all the people who think he's totally wrong.)

Again, the difference is in large part determined by whether you think you're in a low-dimensional space (here are our two actions, which one should we take?) versus a high-dimensional space (millions of actions available to us, how do we narrow it down?) In a high-dimensional space the tradeoffs between the best ways to generate utility according to Eliezer's worldview and the best ways to generate utility according to other worldviews become much smaller.

This seems like a crazy way to do cost-effectiveness analyses.

Like, if I were comparing deworming to GiveDirectly, would I be saying "well, the value of deworming is mainly determined by the likelihood that the pro-deworming people are right, which I estimate is 70% but you estimate is 50%, so there's only a 1.4x difference"? Something has clearly gone wrong here.

Within a worldview, you can assign EVs which are orders of magnitude different. But once you do worldview diversification, if a given worldview gets even 1% of my resources, then in some sense I'm acting like that worldview's favored interventions are in a comparable EV ballpark to all the other worldviews' favored interventions. That's a feature not a bug.

It also feels like this reasoning implies that no EA action can be > 10x more valuable than any other action that an EA critic thinks is good? Since you assign a 90% chance that the EA is right, and the critic thinks there's a 10% chance of that, so there's only a 9x gap? And then once you do all of your adjustments it's only 2x? Why do we even bother with cause prioritization under this worldview?

I don't have a fleshed out theory of how and when to defer, but I feel pretty confident that even our intuitive pretheoretic deference should not be this sort of thing, and should be the sort of thing that can have orders of magnitude of difference between actions.

An arbitrary critic typically gets well less than 0.1% of my deference weight on EA topics (otherwise it'd run out fast!) But also see above: because in high-dimensional spaces there are few tradeoffs between different worldviews' favored interventions, changing the weights on different worldviews doesn't typically lead to many OOM changes in how you're acting like you're assigning EVs.

Also, I tend to think of cause prio as trying to integrate multiple worldviews into a single coherent worldview. But with deference you intrinsically can't do that, because the whole point of deference is you don't fully understand their views.

There's lots of things you can do under Eliezer's worldview that add dignity points, like paying relevant people millions of dollars to spend a week really engaging with the arguments, or trying to get whole-brain emulation before AGI. My understanding is that he doesn't expect those sorts of things to happen.

What do you mean "he doesn't expect this sort of thing to happen"? I think I would just straightforwardly endorse doing a bunch of costly things like these that Eliezer's worldview thinks are our best shot, as long as they don't cause much harm according to other worldviews.

I don't see why you are not including "c) give significant deference weight to his actual worldview", which is what I'd be inclined to do if I didn't have significant AI expertise myself and so was trying to defer.

Because neither Ben nor myself was advocating for this.

Comment by richard_ngo on On Deference and Yudkowsky's AI Risk Estimates · 2022-06-23T06:41:15.748Z · EA · GW

Thanks for writing this update. I think my number one takeaway here is something like: when writing a piece with the aim of changing community dynamics, it's important to be very clear about motivations and context. E.g. I think a version of the piece which said "I think people are overreacting to Death with Dignity, here are my specific models of where Yudkowsky tends to be overconfident, here are the reasons why I think people aren't taking those into account as much as they should" would have been much more useful and much less controversial than the current piece, which (as I interpret it) essentially pushes a general "take Yudkowsky less seriously" meme (and is thereby intrinsically political/statusy).

Comment by richard_ngo on On Deference and Yudkowsky's AI Risk Estimates · 2022-06-23T06:24:15.140Z · EA · GW

Musing out loud: I don't know of any complete model of deference which doesn't run into weird issues, like the conclusion that you should never trust yourself. But suppose you have some kind of epistemic parliament where you give your own views some number of votes, and assign the rest of the votes to other people in proportion to how defer-worthy they seem. Then you need to make a bunch of decisions, and your epistemic parliament keeps voting on what will best achieve your (fixed) goals.

If you do naive question-by-question majority voting on each question simultaneously then you can end up with an arbitrarily incoherent policy - i.e. a set of decisions that's inconsistent with each other. And if you make the decisions in some order, with the constraint that they each have to be consistent with all prior decisions, then the ordering of the decisions can become arbitrarily important.

Instead, you want your parliament to negotiate some more coherent joint policy to follow. And I expect that in this joint policy, each worldview gets its way on the questions that are most important to it, and cedes responsibility on the questions that are least important. So Eliezer's worldview doesn't end up reallocating all the biosecurity money, but it does get a share of curriculum time (at least for the most promising potential researchers). But in general how to conduct those negotiations is an unsolved problem (and pretty plausibly unsolveable).

Comment by richard_ngo on On Deference and Yudkowsky's AI Risk Estimates · 2022-06-23T05:52:34.179Z · EA · GW

Yeah, I'm gonna ballpark guess he's around 95%? I think the problem is that he cites numbers like 99.99% when talking about the chance of doom "without miracles", which in his parlance means assuming that his claims are never overly pessimistic. Which seems like wildly bad epistemic practice. So then it goes down if you account for that, and then maybe it goes down even further if he adjusts for the possibility that other people are more correct than him overall (although I'm not sure that's a mental move he does at all, or would ever report on if he did).

Comment by richard_ngo on On Deference and Yudkowsky's AI Risk Estimates · 2022-06-23T05:45:16.250Z · EA · GW

We both agree that you shouldn't defer to Eliezer's literal credences, because we both think he's systematically overconfident. The debate is between two responses to that:

a) Give him less deference weight than the cautious, sober, AI safety people who make few novel claims but are better-calibrated (which is what Ben advocates).

b) Try to adjust for his overconfidence and then give significant deference weight to a version of his worldview that isn't overconfident.

I say you should do the latter, because you should be deferring to coherent worldviews (which are rare) rather than deferring on a question-by-question basis. This becomes more and more true the more complex the decisions you have to make. Even for your (pretty simple) examples, the type of deference you seem to be advocating doesn't make much sense.

For instance:

should funders reallocate nearly all biosecurity money to AI?

It doesn't make sense to defer to Eliezer's estimate of the relative importance of AI without also accounting for his estimate of the relative tractability of funding AI, which I infer he thinks is very low.

Should there be an organization dedicated to solving Eliezer's health problems? What should its budget be?

I'm guessing that Eliezer thinks that with more energy and project management skills he could make a significant dent in x-risk (perhaps 10 percentage points), while thinking that the rest of the alignment field, if fully funded, can't make a dent of more than 0.01 percentage points, suggesting that "improve Eliezer's health + project management skills" is 3 OOM more important than "all other alignment work" (saying nothing about tractability, which I don't know enough to evaluate). Whereas I'd have it at, idk, 1-2 OOM less important, for a difference of 4-5 OOMs.

Again, the problem is that you're deferring on a question-by-question basis, without considering the correlations between different questions - in this case, the likelihood that Eliezer is right, and the value of his work. (Also, the numbers seem pretty wild; maybe a bit uncharitable to ascribe to Eliezer the view that his research would be 3 OOM more valuable than the rest of the field combined? His tone is strong but I don't think he's ever made a claim that big.)

Here's an alternative calculation which takes into account that correlation. I claim that the value of this organization is mainly determined by the likelihood that Eliezer is correct about a few key claims which underlie his research agenda. Suppose he thinks that's 90% likely and I think that's 10% likely. Then if our choices are "defer entirely to Eliezer" or "defer entirely to Richard", there's a 9x difference in funding efficacy. In practice, though, the actual disagreement here is between "defer to Eliezer no more than a median AI safety researcher" and something like "assume Eliezer is, say, 2x overconfident and then give calibrated-Eliezer, say, 30%ish of your deference weight". If we assume for the sake of simplicity that every other AI safety researcher has my worldview, then the practical difference here is something like a 2x difference in this org's efficacy (0.1 vs 0.3*0.9*0.5+0.7*0.1). Which is pretty low!
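For concreteness, the parenthetical arithmetic above works out as follows (same illustrative numbers as in the text):

```python
# "Defer to Eliezer no more than a median researcher": the org's
# efficacy is just my 10% credence in the key claims.
baseline = 0.1

# "Give calibrated-Eliezer 30% of deference weight": Eliezer's 90%
# credence is halved for assumed 2x overconfidence (0.9 * 0.5),
# weighted at 0.3, with the remaining 0.7 weight on my 10%.
adjusted = 0.3 * (0.9 * 0.5) + 0.7 * 0.1  # = 0.205

ratio = adjusted / baseline  # roughly 2x, i.e. "pretty low"
```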

Won't go through the other examples but hopefully that conveys the idea. The basic problem here, I think, is that the implicit "deference model" that you and Ben are using doesn't actually work (even for very simple examples like the ones you gave).

Comment by richard_ngo on On Deference and Yudkowsky's AI Risk Estimates · 2022-06-20T20:03:35.217Z · EA · GW

I haven't thought much about nuclear policy, so I can't respond there. But at least in alignment, I expect that pushing on variables where there's less than a 2x difference between the expected positive and negative effects of changing that variable is not a good use of time for altruistically-motivated people.

(By contrast, upweighting or downweighting Eliezer's opinions by a factor of 2 could lead to significant shifts in expected value, especially for people who are highly deferential. The specific thing I think doesn't make much difference is deferring to a version of Eliezer who's 90% confident about something, versus deferring to the same extent to a version of Eliezer who's 45% confident in the same thing.)

My more general point, which doesn't hinge on the specific 2x claim, is that naive conversions between metrics of calibration and deferential weightings are a bad idea, and that a good way to avoid naive conversions is to care a lot more about innovative thinking than calibration when deferring.

Comment by richard_ngo on On Deference and Yudkowsky's AI Risk Estimates · 2022-06-20T19:55:29.842Z · EA · GW

I phrased my reply strongly (e.g. telling people to read the other post instead of this one) because deference epistemology is intrinsically closely linked to status interactions, and you need to be pretty careful in order to make this kind of post not end up being, in effect, a one-dimensional "downweight this person". I don't think this post was anywhere near careful enough to avoid that effect. That seems particularly bad because I think most EAs should significantly upweight Yudkowsky's views if they're doing any kind of reasonable, careful deference, because most EAs significantly underweight how heavy-tailed the production of innovative ideas actually is (e.g. because of hindsight bias, it's hard to realise how much worse than Eliezer we would have been at inventing the arguments for AI risk, and how many dumb things we would have said in his position).

By contrast, I think your post is implicitly using a model where we have a few existing, well-identified questions, and the most important thing is to just get to the best credences on those questions, and we should do so partly by just updating in the direction of experts. But I think this model of deference is rarely relevant; see my reply to Rohin for more details. Basically, as soon as we move beyond toy models of deference, the "innovative thinking" part becomes crucially important, and the "well-calibrated" part becomes much less so.

One last intuition: different people have different relationships between their personal credences and their all-things-considered credences. Inferring track records in the way you've done here will, in addition to favoring people who are quieter and say fewer useful things, also favor people who speak primarily based on their all-things-considered credences rather than their personal credences. But that leads to a vicious cycle where people are deferring to people who are deferring to people who... And then the people who actually do innovative thinking in public end up getting downweighted to oblivion via cherrypicked examples.

Modesty epistemology delenda est.

Comment by richard_ngo on On Deference and Yudkowsky's AI Risk Estimates · 2022-06-20T19:25:36.969Z · EA · GW

I think that there are very few decisions which are both a) that low-dimensional and b) actually sensitive to the relevant range of credences that we're talking about.

Like, suppose you think that Eliezer's credences on his biggest claims are literally 2x higher than they should be, even for claims where he's 90% confident. This is a huge hit in terms of Bayes points; if that's how you determine deference, and you believe he's 2x off, then plausibly that implies you should defer to him less than you do to the median EA. But when it comes to grantmaking, for example, a cost-effectiveness factor of 2x is negligible given the other uncertainties involved - this should very rarely move you from a yes to no, or vice versa. (edit: I should restrict the scope here to grantmaking in complex, high-uncertainty domains like AI alignment).
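To quantify the "huge hit in terms of Bayes points" (a sketch using expected log score as the measure, and assuming the 2x-overconfidence story means the well-calibrated credence is 0.45 rather than the stated 0.9):

```python
import math

def expected_log_score(p_true, p_reported):
    """Expected log score of reporting p_reported when the event
    actually occurs with probability p_true."""
    return (p_true * math.log(p_reported)
            + (1 - p_true) * math.log(1 - p_reported))

p_true = 0.45  # assumed "correct" credence, 2x lower than stated

overconfident = expected_log_score(p_true, 0.9)   # reports 90%
calibrated = expected_log_score(p_true, 0.45)     # reports 45%

# The calibrated reporter scores strictly better in expectation;
# the gap is the hit in Bayes points from being 2x overconfident.
print(calibrated - overconfident)
```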

Then you might say: well, okay, we're not just making binary decisions, we're making complex decisions where we're choosing between lots of different options. But the more complex the decisions you're making, the less you should care about whether somebody's credences on a few key claims are accurate, and the more you should care about whether they're identifying the right types of considerations, even if you want to apply a big discount factor to the specific credences involved.

As a simple example, as soon as you're estimating more than one variable, you typically start caring a lot about whether the errors on your estimates are correlated or uncorrelated. But there are so many different possibilities for ways and reasons that they might be correlated that you can't just update towards experts' credences, you have to actually update towards experts' reasons for those credences, which then puts you in the regime of caring more about whether you've identified the right types of considerations.
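A toy version of this point, using the variance of a sum of two noisy estimates (the numbers are arbitrary):

```python
# Variance of a sum of two noisy estimates X and Y:
# Var(X + Y) = Var(X) + Var(Y) + 2 * rho * sd(X) * sd(Y)
def var_of_sum(var_x, var_y, rho):
    return var_x + var_y + 2 * rho * (var_x ** 0.5) * (var_y ** 0.5)

var_x = var_y = 1.0

independent = var_of_sum(var_x, var_y, rho=0.0)  # errors add
correlated = var_of_sum(var_x, var_y, rho=1.0)   # errors compound

# Same marginal uncertainty about each variable, but twice the
# uncertainty about their sum - so knowing the experts' credences
# without knowing their reasons leaves a lot undetermined.
print(independent, correlated)
```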

Comment by richard_ngo on On Deference and Yudkowsky's AI Risk Estimates · 2022-06-20T03:08:17.001Z · EA · GW

EDIT: I've now written up my own account of how we should do epistemic deference in general, which fleshes out more clearly a bunch of the intuitions I outline in this comment thread.

I think that a bunch of people are overindexing on Yudkowsky's views; I've nevertheless downvoted this post because it seems like it's making claims that are significantly too strong, based on a methodology that I strongly disendorse. I'd much prefer a version of this post which, rather than essentially saying "pay less attention to Yudkowsky", is more nuanced about how to update based on his previous contributions; I've tried to do that in this comment, for example. (More generally, rather than reading this post, I recommend people read this one by Paul Christiano, which outlines specific agreements and disagreements. Note that the list of agreements there, which I expect that many other alignment researchers also buy into, serves as a significant testament to Yudkowsky's track record.)

The part of this post which seems most wild to me is the leap from "mixed track record" to

In particular, I think, they shouldn’t defer to him more than they would defer to anyone else who seems smart and has spent a reasonable amount of time thinking about AI risk.

For any reasonable interpretation of this sentence, it's transparently false. Yudkowsky has proven to be one of the best few thinkers in the world on a very difficult topic. Insofar as there are others who you couldn't write a similar "mixed track record" post about, it's almost entirely because they don't have a track record of making any big claims, in large part because they weren't able to generate the relevant early insights themselves. Breaking ground in novel domains is very, very different from forecasting the weather or events next year; a mixed track record is the price of entry.

Based on his track record, I would endorse people deferring more towards the general direction of Yudkowsky's views than towards the views of almost anyone else. I also think that there's a good case to be made that Yudkowsky tends to be overconfident, and this should be taken into account when deferring; but when it comes to making big-picture forecasts, the main value of deference is in helping us decide which ideas and arguments to take seriously, rather than the specific credences we should place on them, since the space of ideas is so large. The EA community has ended up strongly moving in Yudkowsky's direction over the last decade, and that seems like much more compelling evidence than anything listed in this post.

Comment by richard_ngo on Three intuitions about EA: responsibility, scale, self-improvement · 2022-05-13T01:23:50.106Z · EA · GW

Great suggestion! I've added this now.

Comment by richard_ngo on Bad Omens in Current Community Building · 2022-05-12T18:18:20.709Z · EA · GW

Which stuff in particular?

Comment by richard_ngo on How I failed to form views on AI safety · 2022-04-25T23:15:47.044Z · EA · GW

I don't think that my main disagreement with Stuart is about how we'll reach AGI, because critiques of his approach, like this page, don't actually require any assumption that we're in the ML paradigm.

Whether AGI will be built in the ML paradigm or not, I think that CIRL does less than 5%, and probably less than 1%, of the conceptual work of solving alignment; whereas the rocket equation does significantly more than 5% of the conceptual work required to get to the moon. And then in both cases there's lots of engineering work required too. (If AGI will be built in a non-ML paradigm, then getting 5% of the way to solving alignment probably requires actually making claims about whatever the replacement-to-ML paradigm is, which I haven't seen from Stuart.)

But Stuart's presentation of his ideas seems wildly inconsistent with both my position and your position above (e.g. in Human Compatible he seems way more confident in his proposal than would be justified by having gotten even 5% of the way to a solution).

Comment by richard_ngo on How I failed to form views on AI safety · 2022-04-21T01:16:41.441Z · EA · GW

Superintelligence doesn't talk about ML enough to be strongly persuasive given the magnitude of the claims it's making (although it does a reasonable job of conveying core ideas like the instrumental convergence thesis and orthogonality thesis, which are where many skeptics get stuck).

Human Compatible only spends, I think, a couple of pages actually explaining the core of the alignment problem (although it does a good job at debunking some of the particularly bad responses to it). It doesn't do a great job at linking the conventional ML paradigm to the superintelligence paradigm, and I don't think the "assistance games" approach is anywhere near as promising as Russell makes it out to be.

Comment by richard_ngo on How I failed to form views on AI safety · 2022-04-18T18:59:45.750Z · EA · GW

Old: "The techniques discussed this week showcase a tradeoff between power and alignment: behavioural cloning provides the fewest incentives for misbehaviour, but is also hardest to use to go beyond human-level ability. Whereas reward modelling can reward agents for unexpected behaviour that leads to good outcomes (as long as humans can recognise them) - but this also means that those agents might find and be rewarded for manipulative or deceptive actions. Christiano et al. (2017) provide an example of an agent learning to deceive the human evaluator; and Stiennon et al. (2020) provide an example of an agent learning to “deceive” its reward model. Lastly, while IRL could in theory be used even for tasks that humans can’t evaluate, it relies most heavily on assumptions about human rationality in order to align agents."

New: "The techniques discussed this week showcase a tradeoff between power and alignment: behavioural cloning provides the fewest incentives for misbehaviour, but is also hardest to use to go beyond human-level ability. Reward modelling, by contrast, can reward agents for unexpected behaviour that leads to good outcomes - but also rewards agents for manipulative or deceptive actions. (Although deliberate deception is likely beyond the capabilities of current agents, there are examples of simpler behaviours having a similar effect: Christiano et al. (2017) describe an agent learning behaviour which misled the human evaluator; and Stiennon et al. (2020) describe an agent learning behaviour which was misclassified by its reward model.) Lastly, while IRL can potentially be used even for tasks that humans can’t evaluate, the theoretical justification for why this should work relies on implausibly strong assumptions about human rationality."

Comment by richard_ngo on How I failed to form views on AI safety · 2022-04-18T18:59:25.270Z · EA · GW

I really liked this post. I've often felt frustrated by how badly the alignment community has explained the problem, especially to ML practitioners and researchers, and I personally find neither Superintelligence nor Human Compatible very persuasive. For what it's worth, my default hypothesis is that you're unconvinced by the arguments about AI risk in significant part because you are applying an unusually high level of epistemic rigour, which is a skill that seems valuable to continue applying to this topic (including in the case where AI risk isn't important, since that will help us uncover our mistake sooner). I can think of some specific possibilities, and will send you a message about them.

The frustration I mentioned was the main motivation for me designing the AGISF course; I'm now working on follow-up material to hopefully convey the key ideas in a simpler and more streamlined way (e.g. getting rid of the concept of "mesa-optimisers"; clarifying the relationship between "behaviours that are reinforced because they lead to humans being mistaken" and "deliberate deception"; etc). Thanks for noting the "deception" ambiguity in the AGI safety fundamentals curriculum - I've replaced it with a more careful claim (details in reply to this comment).

Comment by richard_ngo on Are there any AI Safety labs that will hire self-taught ML engineers? · 2022-04-13T09:52:34.898Z · EA · GW

"DeepMind allows REs to lead research projects" is consistent with "DeepMind restricts REs more than other places". E.g. OpenAI doesn't even officially distinguish RE from RS positions, whereas DeepMind has different ladders with different expectations for each. And I think the default expectations for REs and RSs are pretty different (although I agree that it's possible for REs to end up doing most of the same things as RSs).

Comment by richard_ngo on Other-centered ethics and Harsanyi's Aggregation Theorem · 2022-02-26T01:44:14.955Z · EA · GW

Situation 1 is better for person X than situation 2 (that is, it gives X higher utility) if and only if person X prefers that situation.

I think you need "prefers that situation for themselves". Otherwise, imagine person X who is a utilitarian - they'll always prefer a better world, but most ways of making the world better don't "benefit X".

Then, unfortunately, we run into the problem that we're unable to define what it means to prefer something "for yourself", because we can no longer use (even idealised) choices between different options.

Comment by richard_ngo on richard_ngo's Shortform · 2022-02-25T18:06:43.670Z · EA · GW

In the same way that covid was a huge opportunity to highlight biorisk, the current Ukraine situation may be a huge opportunity to highlight nuclear risks and possible solutions to them. What would it look like for this to work really well?

Comment by richard_ngo on Agrippa's Shortform · 2022-02-24T17:48:34.335Z · EA · GW

To me the core tension here is: even if, in a direct-impact sense, pure capabilities work is one of the most harmful things you can do (something which I feel fairly uncertain about), it's still also one of the most valuable things you can do, in an upskilling sense. So at least until the point where it's (ballpark) as effective and accessible to upskill in alignment by doing alignment directly rather than by doing capabilities, I think current charitability norms are better than the ostracism norms you propose. (And even after that point, charitability may still be better for talent acquisition, although the tradeoffs are more salient.)

Comment by richard_ngo on Some thoughts on vegetarianism and veganism · 2022-02-14T20:17:06.151Z · EA · GW

Was this meant to be in reply to another comment?

Comment by richard_ngo on Some thoughts on vegetarianism and veganism · 2022-02-14T06:27:46.105Z · EA · GW

Ty, fixed.

Comment by richard_ngo on Should you work in the European Union to do AGI governance? · 2022-02-03T22:05:03.163Z · EA · GW

Intuitive reaction: I think these are all valuable arguments to have explicitly laid out, thanks for doing so. I think they don't quite capture my main intuitions about the value of EU-directed governance work, though; let me try to explain those below.

One intuition draws from the classic distinction between realism and liberalism in international relations. Broadly speaking, I see the EU as being most relevant from a liberal perspective, and much less relevant from a realist one. And although I think both sides offer important perspectives, the dynamics surrounding catastrophic AGI development feel much better described by realism than by liberalism - it feels like realism is the default that things will likely fall back to if the world gets much more chaotic and scary, and there are potential big shifts in the global balance of power.

Second intuition: when it comes to governing AGI, I expect that acting quickly and decisively will be crucial. I can kinda see the US govt. being able to do this (especially by spinning off new agencies, or by presidential power). I have a lot more trouble seeing the EU being able to do this, even in a best-case scenario (does the EU even have the ability in theory, let alone in practice, to empower fast-moving organisations with specific mandates?).

Compared with your arguments, I think these two intuitions are more focused on working backwards from a "theory of victory" to figure out what's useful today (as opposed to working forwards towards gaining more influence). Our overall thinking about theories of victory is still so nascent, though, that it feels like there's a lot of option value in having people go down a bunch of different pathways. Plus I have a few other intuitions in favour of the value of EU-directed governance research. Firstly, I think people often overestimate the predictability of AGI development: a European DeepMind popping up within the next decade or two doesn't seem that much less plausible than the original DeepMind popping up in England, and it might just take a few outlier founders to make that happen. Secondly, separate from progress on AI itself, it does seem plausible that the EU will have significant influence over the chip supply chain going forward (right now most notably via ASML, as you mention).

Overall I do think people with a strong comparative advantage should do EU-governance-related things, I'm just very uncertain how strong the comparative advantage needs to be for that to be one of the best career pathways (although I do know at least a few people whose comparative advantage does seem strong enough for that to be the correct move).

Comment by richard_ngo on Ngo's view on alignment difficulty · 2022-01-22T18:41:28.696Z · EA · GW

Clearly a good move.

Comment by richard_ngo on Comments for shorter Cold Takes pieces · 2022-01-05T12:39:59.617Z · EA · GW

You might be interested in my effort to characterise utopia.

Comment by richard_ngo on AGI Safety Fundamentals curriculum and application · 2021-12-27T17:57:05.788Z · EA · GW

Yeah, I also feel confused about why I didn't have this thought when talking to you about RAISE.

Most proximately, AGI safety fundamentals uses existing materials because its format is based on the other EA university programs; and also because I didn't have time to write (many) new materials for it.

I think the important underlying dynamic here is starting with a specific group of people with a problem, and then making the minimum viable product that solves their problem. In this case, I was explicitly thinking about what would have helped my past self the most.

Perhaps I personally didn't have this thought back in 2019 because I was still in "figure out what's up with AI safety" mode, and so wasn't in a headspace where it was natural to try to convey things to other people.

Comment by richard_ngo on AGI Safety Fundamentals curriculum and application · 2021-12-27T17:49:19.869Z · EA · GW

This post (plus the linked curriculum) is the most up-to-date resource.

There's also this website, but it's basically just a (less-up-to-date) version of the curriculum.

Comment by richard_ngo on Ngo's view on alignment difficulty · 2021-12-16T23:43:09.122Z · EA · GW

Seems like it was just repeated; fixed now.

Comment by richard_ngo on AGI Safety Fundamentals curriculum and application · 2021-11-29T20:14:14.143Z · EA · GW

Update: see here.

Comment by richard_ngo on We need alternatives to Intro EA Fellowships · 2021-11-20T22:23:10.426Z · EA · GW

I’ve advised one person to skip the fellowship and do the readings at an accelerated pace on their own and talk to other organizers about it.

This seems like good advice. In general I think fellowship curricula are pretty great resources regardless of whether you're actually doing the fellowship or not, so one low-effort change could just be to tell people "you can do this fellowship, or if you're really excited about spending much more time on this, you can just speedrun all the readings".

In fact, maybe the best option is for those people to do both. E.g. do all the readings up front, but still have ongoing fellowship sessions over the next 8 weeks to have higher-fidelity communication/make sure they have interpreted the readings in the right way/answer relevant questions.

(epistemic status: not strong opinions, since I don't have much context on student EA groups right now)

Comment by richard_ngo on AGI Safety Fundamentals curriculum and application · 2021-11-06T22:39:23.539Z · EA · GW

Not finalised, but here's a rough reading list which would replace weeks 5-7 for the governance track.

Comment by richard_ngo on AGI Safety Fundamentals curriculum and application · 2021-10-29T23:58:21.924Z · EA · GW

Actually, Joe Carlsmith does it better in Is power-seeking AI an existential risk? So I've swapped that in instead.

Comment by richard_ngo on AGI Safety Fundamentals curriculum and application · 2021-10-21T18:24:43.926Z · EA · GW

This is a great point, and I do think it's an important question for participants to consider; I should switch the last reading for something covering this. The bottleneck is just finding a satisfactory reading - I'm not totally happy with any of the posts covering this, but maybe AGI safety from first principles is the closest to what I want.

Comment by richard_ngo on richard_ngo's Shortform · 2021-10-11T17:39:58.355Z · EA · GW

Disproportionately many of the most agentic and entrepreneurial young EAs I know are community-builders. I think this is because a) EA community-building currently seems neglected compared to other cause areas, but b) there's currently no standard community-building career pathway, so to work on it they had to invent their own jobs.

Hopefully the work these people are doing to change the latter will lead to the resolution of the former.

Comment by richard_ngo on SIA > SSA, part 3: An aside on betting in anthropics · 2021-10-01T15:21:40.536Z · EA · GW

Indeed, the EDT-ish thirder, here, actually ends up betting like a fifth-er. That is, if offered a “win twenty if heads, lose ten if tails” bet upon each waking, she reasons: “1/3rd I’m in a heads world and will win $20. But 2/3rds I’m in a tails world, and am about to take or reject this bet twice, thereby losing $20. Thus, I should reject. To accept, the heads payout would need to be $40 instead.”

To me this seems like a strong argument that we shouldn't separate credences from betting behaviour. If your arguments lead to a "special type of credences" which it's silly for EDT agents to use to bet, then that just indicates that EDT-type reasoning is built into the plausibility of SIA. 

In other words: you talk about contorting one's epistemology in order to bet a particular way, but what's the alternative? If I'm an EDT agent who wants to bet at odds of a third, what is the principled reasoning that leads me to have credence of a half? Seems like that's just SSA again.
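For concreteness, here's the quoted bet spelled out numerically (just the arithmetic from the quoted example, nothing extra):

```python
# An EDT-ish thirder, upon each waking: credence 1/3 in heads.
# If heads, the bet resolves once; if tails, she takes it twice.
def ev_of_accepting(heads_payout, tails_loss):
    return (1/3) * heads_payout + (2/3) * (-2 * tails_loss)

print(ev_of_accepting(20, 10))  # negative, so she rejects the bet
print(ev_of_accepting(40, 10))  # ~0: a $40 heads payout is breakeven

# 40:10 breakeven odds are exactly what an agent with credence 1/5,
# betting once per toss, would accept: 0.2 * 40 - 0.8 * 10 == 0.
```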

In fact, I want to offer an alternative framing of your objections to SSA. You argue that questions like "could I have been a chimpanzee" seem ridiculous. But these are closely analogous to the types of questions that one needs to ask when making decisions according to FDT (e.g. "are the decisions of chimpanzees correlated with my own?") So, if we need to grapple with these questions somehow in order to make decisions, grappling with them via our choice of a reference class doesn't seem like the worst way to do so.

Suppose I am wondering “is there an X-type multiverse?” or “are there a zillion zillion copies of me somewhere in the universe?”. I feel like I’m just asking a question about what’s true, about what kind of world I’m living in — and I’m trying to use anthropics as a guide in figuring it out.

I'm reminded of Yudkowsky's writing about why he isn't prepared to get rid of the concept of "anticipated subjective experience", despite the difficulties it poses from a quantum-mechanical perspective.

Comment by richard_ngo on SIA > SSA, part 1: Learning from the fact that you exist · 2021-10-01T14:18:24.421Z · EA · GW

Can you explain what you mean by "people in your epistemic situation"? Do you intend it to be people who have all the information currently available to you? Or do you sometimes need to abstract away from some information that you have (e.g. specific details about yourself)?

Comment by richard_ngo on Suggested norms about financial aid for EAG(x) · 2021-09-25T09:20:49.813Z · EA · GW

I think this would be better than the current FAQ, but it seems like what you've said above mostly rephrases the ambiguities I highlighted without doing much to resolve them. E.g. the ambiguity of "for anyone who needs them" isn't much alleviated by using phrasing like "money is tight" and "doesn't give them serious pause". I'd take a wild guess that maybe the bottom 30% of westerners (by income) would say that "money is tight" for them, and that the top 5% would say that "spending $400+ doesn't give them serious pause". But it's not clear how the remaining 65% should think about it. (And that category is probably even more than 65% for people from poorer countries.) Hence my proposal in the post that you should pay full price iff you're already donating enough that you're comfortable for EAG to take up $400+ of your donation budget (I'll edit to make this more explicit).

Similarly, it also seems pretty hard to evaluate where the bar should be for "it wouldn't be that good for them to come", since almost everyone will have improved decisions/strengthened motivations/etc from EAG to some degree. Should it be twice as good as a typical weekend in order for people to feel justified in taking a spot? Four times? Ten times? The highlight of the year? This seems very hard to judge. Hence my proposal in the post that the bar you should use is the marginal next person who would be accepted to EAG (while weighing your own acceptance as significant evidence that you should go).

Lastly, your proposal doesn't tell people whether they should think of EAG as "personal" or "altruistic" spending. For people who categorise these differently, their bars for marginal spending on personal versus altruistic things might be very different.

Comment by richard_ngo on Suggested norms about financial aid for EAG(x) · 2021-09-21T08:50:01.967Z · EA · GW

In hindsight I should have elaborated on the "cooperativeness" part more; I've edited the post to do so. The key point is made in this post about how donating only to what seems like the most neglected priority to you is partially a form of free-riding, because it means that others who have different values need to spend their resources on things that you both care about. So in order to have healthier relationships with other altruists, both parties should agree to partially cover shared priorities, even when that is a less effective use of money in the short term.

Now, you might have stronger or weaker intuitions about how important this type of cooperation is. I think my intuition is that we should aim for cooperative norms that are strong enough that we can cooperate even across large value differences. But cooperative norms which are this strong will then weigh heavily in favour of cooperation between altruists with much smaller value differences, like CEA and EAG attendees (especially because CEA and/or big EA funders have thought about this and decided that the benefits of having people pay for their own tickets by default are more important, from their perspective, than downsides like tax inefficiency).

It also seems reasonable to disagree with this; it's something of a judgement call. But I claim that this is the right judgement call to be making.

Comment by richard_ngo on Forecasting transformative AI: what's the burden of proof? · 2021-09-01T14:52:55.155Z · EA · GW

Thanks for the response, that all makes sense. I missed some of the parts where you disambiguated those two concepts; apologies for that. I suspect I still see the disparity between "extraordinarily important century" and "most important century" as greater than you do, though, perhaps because I consider value lock-in this century less likely than you do - I haven't seen particularly persuasive arguments for it in general (as opposed to in specific scenarios, like AGIs with explicit utility functions or the scenario in your digital people post). And relatedly, I'm pretty uncertain about how far away technological completion is - I can imagine transitions to post-human futures in this century which still leave a huge amount of room for progress in subsequent centuries.

I agree that 'extraordinarily important century" and "transformative century" don't have the same emotional impact as "most important century".  I wonder if you could help address this by clarifying that you're talking about "more change this century than since X" (for x = a millennium ago, or since agriculture, or since cavemen, or since we diverged from chimpanzees). "Change" also seems like a slightly more intuitive unit than "importance", especially for non-EAs for whom "importance" is less strongly associated with "our ability to exert influence".

Comment by richard_ngo on Forecasting transformative AI: what's the burden of proof? · 2021-08-19T09:34:47.416Z · EA · GW

I very much like how careful you are in looking at this question of the burden of proof when discussing transformative AI. One thing I'm uncertain about, though: is the "most important century" framing the best one to use when discussing this? It seems to me like "transformative AI is coming this century" and "this century is the most important century" are very different claims which you tend to conflate in this sequence.

One way of thinking about this: suppose that, this century, there's an AI revolution at least as big as the industrial revolution. How many more similarly-sized revolutions are plausible before reaching a stable galactic civilisation? The answer to this question could change our estimate of P(this is the most important century) by an order of magnitude (or perhaps two, if we have good reasons to think that future revolutions will be more important than this century's TAI), but has a relatively small effect on what actions we should take now.

More generally, I think that claims which depend on the specifics of our long-term trajectory after transformative AI are much easier to dismiss as being speculative (especially given how much pushback claims about reaching TAI already receive for being speculative). So I'd much rather people focus on the claim that "AI will be really, really big" than "AI will be bigger than anything else which comes afterwards". But it seems like framing this sequence of posts as the "most important century" sequence pushes towards the latter.

Oh, also, depending on how you define "important", it may be the case that past centuries were more important because they contained the best opportunities to influence TAI - e.g. when the west became dominant, or during WW1 and WW2, or the cold war. Again, that's not very action-guiding, but it does make the "most important century" claim even more speculative.


Comment by richard_ngo on AGI safety from first principles · 2021-06-27T23:01:24.628Z · EA · GW

Ah, I like the multiagent example. So to summarise: I agree that we have some intuitive notion of what cognitive processes we think of as intelligent, and it would be useful to have a definition of intelligence phrased in terms of those. I also agree that Legg's behavioural definition might diverge from our implicit cognitive definition in non-trivial ways.

I guess the reason why I've been pushing back on your point is that I think that possible divergences between the two aren't the main thing going on here. Even if it turned out that the behavioural definition and the cognitive definition ranked all possible agents the same, I think the latter would be much more insightful and much more valuable for helping us think about AGI.

But this is probably not an important disagreement.

Comment by richard_ngo on AGI safety from first principles · 2021-06-27T01:12:39.945Z · EA · GW

Ah, I see. I thought you meant "situations" as in "individual environments", but it seems like you meant "situations" as in "possible ways that all environments could be".

In that case, I think you're right, but I don't consider it a problem. Why might it be the case that adding more compute, or more memory, or something like that, would be net negative across all environments? It seems like either we'd have to define the set of environments in a very gerrymandered way, or else there's something about the change we made that lands us in a valley of bad thinking. In the former case, we should use a wider set of environments; in the latter case, it seems easier to bite the bullet and say "Yeah, turns out that adding more of this usually-valuable trait makes agents less intelligent."

Comment by richard_ngo on AGI safety from first principles · 2021-06-26T05:03:26.113Z · EA · GW

One thing I'm confused about is whether Legg's definition (or your rephrasing) allows for situations where it's in principle possible that being smarter is ex ante worse for an agent (obviously ex post it's possible to follow the correct decision procedure and be unlucky).

There definitely are such cases - e.g. Omega penalises all smart agents. Or environments containing several crucial considerations which you only identify at different levels of intelligence, so that as intelligence increases, your expected success rises and falls rather than improving monotonically.

But in general I agree with your complaint about Legg's definition being defined in behavioural terms, and how it'd be better to have a good definition of intelligence in terms of the cognitive processes involved (e.g. planning, abstraction, etc). I do think that starting off in behaviourist terms was a good move, back when people were much more allergic to talking about AGI/superintelligence. But now that we're past that point, I think we can do better. (I don't think I've written about this yet in much detail, but it's quite high on my list of priorities.)

Comment by richard_ngo on AGI safety from first principles · 2021-06-25T21:28:40.548Z · EA · GW

I intended mine to be a slight rephrasing of Legg and Hutter's definition to make it more accessible to people without RL backgrounds. One thing that's not obvious from the way they use "environments" is that the goal is actually built into the environment via a reward function, so describing each environment as a "task" seems accurate.

A second non-obvious thing is that the body the agent uses is also defined as part of the environment, so that the agent only performs the abstract task of sending instructions to that body. A naive reading of Legg and Hutter's definition might therefore count an agent with a physically stronger body as more intelligent. Adding "cognitive" I think rules this out, while also remaining true to the spirit of the original definition.
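For readers without RL backgrounds, the original definition being rephrased here may be useful context. Legg and Hutter ("Universal Intelligence: A Definition of Machine Intelligence") define an agent's intelligence as its performance summed over all computable reward-bearing environments, weighted by each environment's simplicity:

```latex
\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V_\mu^\pi
```

where $E$ is the set of computable environments, $K(\mu)$ is the Kolmogorov complexity of environment $\mu$, and $V_\mu^\pi$ is the expected total reward policy $\pi$ achieves in $\mu$. Note that both the goal (via the reward function) and the agent's "body" live inside $\mu$, which is the point made above.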

Curious if you still disagree, and if so why - I don't really see what you're pointing at with the Raven's Matrices example.