EA reading list: utilitarianism and consciousness 2020-08-07T19:32:02.050Z · score: 16 (8 votes)
EA reading list: other reading lists 2020-08-04T14:56:28.422Z · score: 13 (5 votes)
EA reading list: miscellaneous 2020-08-04T14:42:44.119Z · score: 27 (9 votes)
EA reading list: futurism and transhumanism 2020-08-04T14:29:52.883Z · score: 19 (8 votes)
EA reading list: Paul Christiano 2020-08-04T13:36:51.331Z · score: 22 (9 votes)
EA reading list: global development and mental health 2020-08-03T11:53:10.890Z · score: 15 (7 votes)
EA reading list: Scott Alexander 2020-08-03T11:46:17.315Z · score: 40 (16 votes)
EA reading list: replaceability and discounting 2020-08-03T10:10:54.968Z · score: 9 (4 votes)
EA reading list: longtermism and existential risks 2020-08-03T09:52:41.256Z · score: 33 (11 votes)
EA reading list: suffering-focused ethics 2020-08-03T09:40:38.142Z · score: 35 (17 votes)
EA reading list: EA motivations and psychology 2020-08-03T09:24:07.430Z · score: 28 (9 votes)
EA reading list: cluelessness and epistemic modesty 2020-08-03T09:23:44.124Z · score: 21 (6 votes)
EA reading list: population ethics, infinite ethics, anthropic ethics 2020-08-03T09:22:15.461Z · score: 25 (7 votes)
EA reading list: moral uncertainty, moral cooperation, and values spreading 2020-08-03T09:21:36.288Z · score: 13 (6 votes)
richard_ngo's Shortform 2020-06-13T10:46:26.847Z · score: 6 (1 votes)
What are the key ongoing debates in EA? 2020-03-08T16:12:34.683Z · score: 68 (36 votes)
Characterising utopia 2020-01-02T00:24:23.248Z · score: 33 (20 votes)
Technical AGI safety research outside AI 2019-10-18T15:02:20.718Z · score: 80 (35 votes)
Does any thorough discussion of moral parliaments exist? 2019-09-06T15:33:02.478Z · score: 36 (14 votes)
How much EA analysis of AI safety as a cause area exists? 2019-09-06T11:15:48.665Z · score: 77 (29 votes)
How do most utilitarians feel about "replacement" thought experiments? 2019-09-06T11:14:20.764Z · score: 20 (16 votes)
Why has poverty worldwide fallen so little in recent decades outside China? 2019-08-07T22:24:11.239Z · score: 23 (10 votes)
Which scientific discovery was most ahead of its time? 2019-05-16T12:28:54.437Z · score: 34 (14 votes)
Why doesn't the EA forum have curated posts or sequences? 2019-03-21T13:52:58.807Z · score: 35 (17 votes)
The career and the community 2019-03-21T12:35:23.073Z · score: 87 (48 votes)
Arguments for moral indefinability 2019-02-08T11:09:25.547Z · score: 31 (12 votes)
Disentangling arguments for the importance of AI safety 2019-01-23T14:58:27.881Z · score: 57 (33 votes)
How democracy ends: a review and reevaluation 2018-11-24T17:41:53.594Z · score: 24 (12 votes)
Some cruxes on impactful alternatives to AI policy work 2018-11-22T13:43:40.684Z · score: 25 (13 votes)


Comment by richard_ngo on EA reading list: population ethics, infinite ethics, anthropic ethics · 2020-08-26T08:37:15.707Z · score: 4 (2 votes) · EA · GW

Huh, seems like you're right. Apologies for the mistake.

Comment by richard_ngo on EA reading list: replaceability and discounting · 2020-08-26T06:55:56.240Z · score: 3 (2 votes) · EA · GW

Done, thanks!

Comment by richard_ngo on The case of the missing cause prioritisation research · 2020-08-23T07:57:25.238Z · score: 21 (7 votes) · EA · GW

Thanks for making this post, I think this sort of discussion is very important.

It seems to me (predictably given the introduction) that far and away the most valuable thing EA has done is the development of and promotion of cause prioritisation as a concept.

I disagree with this. Here's an alternative framing:

  • EA's big ethical ideas are 1) reviving strong, active, personal moral duties, 2) longtermism, 3) some practical implications of welfarism that academic philosophy has largely overlooked (e.g. the moral importance of wild animal suffering, mental health, simulated consciousnesses, etc).
  • I don't think EA has had many big empirical ideas (by which I mean ideas about how the world works, not just ideas involving experimentation and observation). We've adopted some views about AI from rationalists (imo without building on them much so far, although that's changing), some views about futurism from transhumanists, and some views about global development from economists. Of course there's a lot of people in those groups who are also EAs, but it doesn't feel like many of these ideas have been developed "under the banner of EA".

When I think about successes of "traditional" cause prioritisation within EA, I mostly think of things in the former category, e.g. the things I listed above as "practical implications of welfarism". But I think that longtermism in some sense screens off this type of cause prioritisation. For longtermists, surprising applications of ethical principles aren't as valuable, because by default we shouldn't expect them to influence humanity's trajectory, and because we're mainly using a maxipok strategy.

Instead, from a longtermist perspective, I expect that the biggest breakthroughs in cause prioritisation will come from understanding the future better, and identifying levers of large-scale influence that others aren't already fighting over. AI safety would be the canonical example; the post on reducing the influence of malevolent actors is another good example. However, we should expect this to be significantly harder than the types of cause prioritisation I discussed above. Finding new ways to be altruistic is very neglected. But lots of people want to understand and control the future of the world, and it's not clear how distinct doing this selfishly is from doing this altruistically. Also, futurism is really hard.

So I think a sufficient solution to the case of the missing cause prioritisation research is: more EAs are longtermists than before, and longtermist cause prioritisation is much harder than other cause prioritisation, and doesn't play to EA's strengths as much. Although I do think it's possible, and I plan to put up a post on this soon.

Comment by richard_ngo on EA reading list: population ethics, infinite ethics, anthropic ethics · 2020-08-13T20:53:13.139Z · score: 4 (2 votes) · EA · GW

I don't think variable populations are a defining feature of population ethics - do you have a source for that? Sure, they're a feature of the repugnant conclusion, but there are plenty more relevant topics in the field. For example, one question discussed in population ethics is when a more equal population with lower total welfare is better than a less equal population with higher total welfare. And this example motivates differences between utilitarian and egalitarian views. So more generally, I'd say that population ethics is the study of how to compare the moral value of different populations.

Comment by richard_ngo on EA reading list: population ethics, infinite ethics, anthropic ethics · 2020-08-05T15:06:21.696Z · score: 7 (4 votes) · EA · GW

To me they seem basically the same topic: infinite ethics is the subset of population ethics dealing with infinitely large populations. Do you disagree with this characterisation?

More generally, this reading list isn't strictly separated by topic, but sometimes throws together different (thematically related) topics where the reading list for each topic is too small to warrant a separate page. E.g. that's why I've put global development and mental health on the same page (this may change if I get lots of good suggestions for content on either or both).

Comment by richard_ngo on EA reading list: longtermism and existential risks · 2020-08-04T13:16:23.502Z · score: 4 (2 votes) · EA · GW

Yes, good point. Added.

Comment by richard_ngo on Objections to Value-Alignment between Effective Altruists · 2020-07-19T21:34:40.527Z · score: 24 (11 votes) · EA · GW

Yepp, that all makes sense to me. Another thing we can do, that's distinct from changing the overall level of respect, is changing the norms around showing respect. For example, whenever people bring up the fact that person X believes Y, we could encourage them to instead say that person X believes Y because of Z, which makes the appeal to authority easier to argue against.

Comment by richard_ngo on Objections to Value-Alignment between Effective Altruists · 2020-07-17T16:14:08.319Z · score: 50 (28 votes) · EA · GW

The thing I agree with most is the idea that EA is too insular, and that we focus on value alignment too much (compared with excellence). More generally, networking with people outside EA has positive externalities (engaging more people with the movement) whereas networking with people inside EA is more likely to help you personally (since that allows you to get more of EA's resources). So the former is likely undervalued.

I think the "revered for their intellect" thing is evidence of a genuine problem in EA, namely that we pay more attention to intelligence than we should, compared with achievements. However, the mere fact of having very highly-respected individuals doesn't seem unusual; e.g. in other fields that I've been in (machine learning, philosophy) pioneers are treated with awe, and there are plenty of memes about them.

Members write articles about him in apparent awe and possibly jest

Definitely jest.

Comment by richard_ngo on Systemic change, global poverty eradication, and a career plan rethink: am I right? · 2020-07-16T13:06:34.577Z · score: 5 (3 votes) · EA · GW

Point taken, but I think the correlation in China is so much larger than the correlation between African countries (with respect to the things we're most interested in, like the effects of policies) that it's reasonable to look at the data with China excluded when trying to find a long-term trend in the global economy.

Comment by richard_ngo on Systemic change, global poverty eradication, and a career plan rethink: am I right? · 2020-07-15T19:50:13.748Z · score: 2 (1 votes) · EA · GW

Ooops, fixed.

Comment by richard_ngo on Systemic change, global poverty eradication, and a career plan rethink: am I right? · 2020-07-14T16:13:19.554Z · score: 8 (4 votes) · EA · GW

A while back I read this article by Hickel, which was based on the book. See this EA forum post which I made after reading it, and also the comments which I wrote down at the time:

  • The article was much better than I thought it would be based on the first few paragraphs.
  • Still a few dodgy bits though - e.g. it quotes FAO numbers on how many people are hungry, but neglects to mention that this is good progress (quote from FAO): "The proportion of undernourished people in the developing regions has fallen by almost half. One in seven children worldwide are underweight, down from one in four in 1990."
  • I also tried to factcheck the claim that the specific number $1.90 is a bad metric to use, by reading the report he said was a "trenchant critique" of it. There was a lot of stuff about how a single summary statistic can be highly uncertain, but not much about why any other poverty line would be better.
  • Overall I do think it's fairly valuable to point out that there's been almost no movement of people above $7.40 if you exclude China. But the achievement of moving a lot of people above the $1.90 number shouldn't be understated.
  • This graph seems like the most important summary of the claims in the article. So basically, the history of poverty reduction was almost entirely Asia, the future of it (or lack thereof) will be almost entirely about Africa.
  • Note that the graph displays absolute numbers of people. The population of Africa went from 600M in 1990 to 1.2B today to 1.6B in 2030, apparently. So according to this graph there’s been a reduction in percentage of extreme poverty in Africa, and that’s projected to continue, but it’s at a far lower rate than Asia.
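
The arithmetic behind that last point can be sketched in a few lines of Python. The population figures are the ones quoted above (treating "today" as 2020 is an assumption); the headcounts of people in extreme poverty are hypothetical placeholders chosen for illustration, not values read off the graph. The sketch only shows how the absolute number can rise while the percentage falls.

```python
# Population figures as quoted in the comment; "today" is assumed to mean 2020.
AFRICA_POPULATION = {1990: 600e6, 2020: 1.2e9, 2030: 1.6e9}

# HYPOTHETICAL headcounts of people in extreme poverty, for illustration only.
# They are not taken from the graph or from any dataset.
HYPOTHETICAL_POOR_HEADCOUNT = {1990: 280e6, 2020: 430e6, 2030: 480e6}


def poverty_share(headcount: float, population: float) -> float:
    """Return the fraction of the population below the poverty line."""
    return headcount / population


# Share of the population in extreme poverty for each year.
shares = {
    year: poverty_share(HYPOTHETICAL_POOR_HEADCOUNT[year], population)
    for year, population in AFRICA_POPULATION.items()
}

for year in sorted(shares):
    print(f"{year}: headcount {HYPOTHETICAL_POOR_HEADCOUNT[year] / 1e6:.0f}M, share {shares[year]:.0%}")
```

With these placeholder numbers the headcount grows in every period, yet the share still falls, because the population grows faster than the headcount does.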

Comment by richard_ngo on richard_ngo's Shortform · 2020-07-13T08:51:31.667Z · score: 5 (3 votes) · EA · GW

One use case of the EA forum which we may not be focusing on enough:

There are some very influential people who are aware of and somewhat interested in EA. Suppose one of those people checks in on the EA forum every couple of months. Would they be able to find content which is interesting, relevant, and causes them to have a higher opinion of EA? Or if not, what other mechanisms might promote the best EA content to their attention?

The "Forum Favourites" partly plays this role, I guess. Although because it's forum regulars who are most likely to highly upvote posts, I wonder whether there's some divergence between what's most valuable for them and what's most valuable for infrequent browsers.

Comment by richard_ngo on How should we run the EA Forum Prize? · 2020-06-23T13:21:26.424Z · score: 19 (12 votes) · EA · GW

Personally I find the prize disproportionately motivating, in that it increases my desire to write EA forum content to a level beyond what I think I'd endorse if I reflected for longer.

Sorry if this is not very helpful; I imagine it's also not very representative.

Comment by richard_ngo on Should EA Buy Distribution Rights for Foundational Books? · 2020-06-17T12:52:08.595Z · score: 11 (8 votes) · EA · GW

As one data point, the Institute of Economic Affairs (which has had pretty major success in spreading its views) prints out many short books advocating its viewpoints and hands them out at student events. That certainly made me engage with their ideas significantly more, then give the books to my friends, etc. I think they may get economies of scale from having their own printing press, but it might be worth looking into how cheaply you can print out 80-page EA primers for widespread distribution.

Comment by richard_ngo on EA Forum feature suggestion thread · 2020-06-16T18:37:55.551Z · score: 3 (2 votes) · EA · GW

+1 on this, and on curated posts. (As also discussed here).

Comment by richard_ngo on Max_Daniel's Shortform · 2020-06-16T17:11:26.470Z · score: 2 (1 votes) · EA · GW

People tend to underestimate the importance of ideas, because it's hard to imagine what impact they will have without doing the work of coming up with them.

I'm also uncertain how impactful it is to find people who're good at generating ideas, because the best ones will probably become prominent regardless. But regardless of that, it seems to me like you've now agreed with the three points that the influential EA made. Those weren't comparative claims about where to invest marginal resources, but rather the absolute claim that it'd be very beneficial to have more talented people.

Then the additional claim I'd make is: some types of influence are very valuable and can only be gained by people who are sufficiently good at generating ideas. It'd be amazing to have another Stuart Russell, or someone in Steven Pinker's position but more onboard with EA. But they both got there by making pioneering contributions in their respective fields. So when you talk about "accumulating AI-weighted influence", e.g. by persuading leading AI researchers to be EAs, that therefore involves gaining more talented members of EA.

Comment by richard_ngo on [Link] "Will He Go?" book review (Scott Aaronson) · 2020-06-15T21:08:46.441Z · score: 5 (3 votes) · EA · GW

Thanks for sharing the last link, which I think provides useful context (that Open Philanthropy's funder has a history of donating to partisan political campaigns).

Why is this context useful? It feels like the relevance of this post should not be particularly tied to Dustin and Cari's donation choices.

the upshot of this post is effectively an argument that supporting Biden's campaign should be thought of as an EA cause area

Is "X should be thought of as an EA cause area" distinct from "X would be good"? More generally, I'd like the forum to be a place where we can share important ideas without needing to include calls to action.

On the other hand, I also endorse holding political posts to a more stringent standard, so that we don't all get sucked in.

Comment by richard_ngo on Max_Daniel's Shortform · 2020-06-15T16:55:48.401Z · score: 4 (2 votes) · EA · GW

Task X for which the claim seems most true for me is "coming up with novel and important ideas". This seems to be very heavy-tailed, and not very teachable.

I would also expect that, if I poked a bit at these claims, it would usually turn out that X is something like "contribute to this software project at the pace and quality level of our best engineers, w/o requiring any management time" or "convince some investors to give us much more money, but w/o anyone spending any time transferring relevant knowledge".

Neither of these feel like central examples of the type of thing EA needs most. Most of the variance of the impact of the software project will be in how good the idea is; same for most of the variance of the impact of getting funding.

Robin Hanson is someone who's good at generating novel and important ideas. Idk how he got that way, but I suspect it'd be very hard to design a curriculum to recreate that. Do you disagree?

Comment by richard_ngo on richard_ngo's Shortform · 2020-06-15T13:29:41.202Z · score: 5 (3 votes) · EA · GW

Then there's the question of how many fields it's actually important to have good research in. Broadly speaking, my perspective is: we care about the future; the future is going to be influenced by a lot of components; and so it's important to understand as many of those components as we can. Do we need longtermist sociologists? Hell yes! Then we can better understand how value drift might happen, and what to do about it. Longtermist historians to figure out how power structures will work, longtermist artists to inspire people - as many as we can get. Longtermist physicists - Anders can't figure out how to colonise the galaxy by himself.

If you're excited about something that poses a more concrete existential risk, then I'd still advise that as a priority. But my guess is that there's also a lot of low-hanging fruit for would-be futurists in other disciplines.

Comment by richard_ngo on richard_ngo's Shortform · 2020-06-15T13:19:26.934Z · score: 7 (4 votes) · EA · GW

Another related thing that isn't discussed enough is the immense difficulty of actually doing good research, especially in a pre-paradigmatic field. I've personally struggled to transition from engineer mindset, where you're just trying to build a thing that works (and you'll know when it does), to scientist mindset, where you need to understand the complex ways in which many different variables affect your results.

This isn't to say that only geniuses make important advances, though - hard work and persistence go a long way. As a corollary, if you're in a field where hard work doesn't feel like work, then you have a huge advantage. And it's also good for building a healthy EA community if even people who don't manage to have a big impact are still excited about their careers. So that's why I personally place a fairly high emphasis on passion when giving career advice (unless I'm talking to someone with exceptional focus and determination).

Comment by richard_ngo on richard_ngo's Shortform · 2020-06-13T10:46:27.161Z · score: 36 (16 votes) · EA · GW

I'm leaning towards the view that "don't follow your passion" and "try to do really high-leverage intellectual work" are both good pieces of advice in isolation, but that they work badly in combination. I suspect that there are very few people doing world-class research who aren't deeply passionate about it, and also that EA needs world-class research in more fields than it may often seem.

Comment by richard_ngo on Why might one value animals far less than humans? · 2020-06-11T18:28:21.832Z · score: 4 (2 votes) · EA · GW

Would you say the discrepancy between preferences and hedonism is because humans can (and do) achieve much greater highs than nonhuman animals under preferences, but human and nonhuman lows aren't so different?

Something like that. Maybe the key idea here is my ranking of possible lives:

  • Amazing hedonic state + all personal preferences satisfied >> amazing hedonic state.
  • Terrible hedonic state ≈ terrible hedonic state + all personal preferences violated.

In other words, if I imagine myself suffering enough hedonically I don't really care about any other preferences I have about my life any more by comparison. Whereas that isn't true for feelings of bliss.

I imagine things being more symmetrical for animals, I guess because I don't consider their preferences to be as complex or core to their identities.

Comment by richard_ngo on Why might one value animals far less than humans? · 2020-06-08T13:07:22.693Z · score: 8 (5 votes) · EA · GW

Insofar as I value conscious experiences purely by virtue of their valence (i.e. positivity or negativity), I value animals not too much less than humans (discounted to the extent I suspect that they're "less conscious" or "less capable of feeling highly positive states", which I'm still quite uncertain about).

Insofar as I value preference fulfilment in general, I value humans significantly more than animals (because human preferences are stronger and more complex than animals') but not overwhelmingly so, because animals have strong and reasonably consistent preferences too.

Insofar as I value specific types of conscious experiences and preference fulfilment, such as "reciprocated romantic love" or "achieving one's overarching life goals", then I value humans far more than animals (and would probably value posthumans significantly more than humans).

I don't think there are knock-down arguments in favour of any of these approaches, and so I usually try to balance all of these considerations. Broadly speaking, I do this by prioritising hedonic components when I think about preventing disvalue, and by prioritising the other components when I think about creating value.

Comment by richard_ngo on Moral Anti-Realism Sequence #2: Why Realists and Anti-Realists Disagree · 2020-06-06T09:30:32.231Z · score: 3 (2 votes) · EA · GW

Cool, glad we're on the same page. The following is a fairly minor point, but thought it might still be worth clarifying.

"You could switch back and forth between two ways of interpreting the realist's moral claims."

I guess that, while in principle this makes sense, in practice language is defined on a community level, and so it's just asking for confusion to hold this position. In particular, ethics is not cleanly separable from meta-ethics, and so I can't always reinterpret a realist's argument in a pragmatic way without losing something. But if realists use 'morality' to always implicitly mean 'objective morality', then I don't know when they're relying on the 'objective' bit in their arguments. That seems bad.

The alternative is to agree on a "lowest common denominator" definition of morality, and expect people who are relying on its objectiveness or subjectivity to explicitly flag that. As an analogy, imagine that person A thinks we live in a simulation, and person B doesn't, and person B tries to define "cats" so that their definition includes the criterion "physically implemented in the real world, not just in a simulation". In which case person A believes that no cats exist, in that sense.

I think the correct response from A is to say "No, you're making a power grab for common linguistic territory, which I don't accept. We should define 'cats' in a way that doesn't make it a vacuous concept for many members of our epistemic community. So I won't define cats as 'simulated beings' and you won't define them as 'physical beings', and if one of your arguments about cats relies on this distinction, then you should make that explicit."

This post is (as usual) relevant:

I could equivalently describe the above position as: "when your conception of something looks like Network 2, but not everyone agrees, then your definitions should look like Network 1."

Comment by richard_ngo on Moral Anti-Realism Sequence #2: Why Realists and Anti-Realists Disagree · 2020-06-05T20:56:49.632Z · score: 6 (4 votes) · EA · GW

The version of anti-realism I’m arguing for in this sequence is a blend of error theory and non-objectivism. It seems to me that any anti-realist has to endorse error theory (in some sense at least) because realists exist, and it would be uncharitable not to interpret their claims in the realist fashion. However, the non-objectivist perspective seems importantly correct as well

I think we probably have very similar views, but I am less of a fan of error theory. What might it look like to endorse error theory as an anti-realist? Well, as an anti-realist I think that my claims about morality are perfectly reasonable and often true, since I intend them to be speaker-dependent. It's just the moral realists whose claims are in error. So that leads to the bizarre situation where I can have a conversation about object-level morality with a moral realist, and we might even change each other's minds, but throughout the whole conversation I'm evaluating every statement he says as trivially incorrect. This seems untenable.

Even anti-realists can adopt the notion of “moral facts,” provided that we think of them as facts about a non-objective (speaker-dependent) reality, instead of facts about a speaker-independent (objective) one.

Again, I expect we mostly agree here, but the phrase "facts about a non-objective (speaker-dependent) reality" feels potentially confusing to me. Would you consider it equivalent to say that anti-realists can think about moral facts as facts about the implications of certain evaluation criteria? From this perspective, when we make moral claims, we're implicitly endorsing a set of evaluation criteria (making this position somewhere in the middle of cognitivism and non-cognitivism).

I've fleshed out this position a little more in this post on "a pragmatic approach to interpreting moral claims".

Comment by richard_ngo on Some thoughts on deference and inside-view models · 2020-06-01T14:06:54.885Z · score: 9 (6 votes) · EA · GW

My broader point is something like: in a discussion about deference and skepticism, it feels odd to only discuss deference to other EAs. By conflating "EA experts" and "people with good opinions", you're missing an important dimension of variation (specifically, the difference between a community-centred outside view and a broader outside view).

Apologies for phrasing the original comment as a "gotcha" rebuttal rather than trying to distill a more constructive criticism.

Comment by richard_ngo on Some thoughts on deference and inside-view models · 2020-05-29T00:12:01.346Z · score: 17 (6 votes) · EA · GW

I think one clear disanalogy with startups is that eventually startups are judged by reality. Whereas we aren't, because doing good and getting more money are not that strongly correlated. By just eating the risk of being wrong about something, the worst case is not failing, like it is for a startup, but rather sucking up all the resources into the wrong thing.

Also, small point, but I don't think Bayesian decision theory is particularly important for EA.

Anyway, maybe eventually this might be worth considering, but as it is we've done several orders of magnitude too little analysis to start conceding.

Comment by richard_ngo on Some thoughts on deference and inside-view models · 2020-05-28T09:34:32.877Z · score: 26 (13 votes) · EA · GW

"I think that it’s potentially very bad that young EAs don’t practice skeptical independent thinking as much (if this is indeed true)."

I agree that this is potentially very bad, but also perhaps difficult to avoid as EA professionalises, because you start needing more background and technical knowledge to weigh in on ongoing debates. Analogous to what happened in science.

On the other hand, we're literally interested in the whole future, about which we currently know almost nothing. So there must be space for new ideas. I guess the problem is that, while "skeptical thinking" about received wisdom is hard, it's still easier than generative thinking (i.e. coming up with new questions). The problem with EA futurism is not so much that we believe a lot of incorrect statements, but that we haven't yet thought of most of the relevant concepts. So it may be particularly valuable for people who've thought about longtermism a bunch to make public even tentative or wacky ideas, in order to provide more surface area for others to cultivate skeptical thinking and advance the state of our knowledge. (As Buck has in fact done:

Example 1: a while back there was a post on why animal welfare is an important longtermist priority, and iirc Rob Wiblin replied saying something like "But we'll have uploaded by then so it won't be a big deal." I don't think that this argument has been made much in the EA context - which makes it both ripe for skeptical independent thinking, but also much less visible as a hypothesis that it's possible to disagree with.

Example 2: there's just not very much discussion in EA about what actual utopias might look like. Maybe that's because, to utilitarians, it's just hedonium. Or because we're punting it to the long reflection. But this seems like a very important topic to think about! I'm hoping that if this discussion gets kickstarted, there'll be a lot of room for people to disagree and come up with novel ideas. Related: a bunch of claims I've made about utopia.

I'm reminded of Robin Hanson's advice to young EAs: "Study the future. ... Go actually generate scenarios, explore them, tell us what you found. What are the things that could go wrong there? What are the opportunities? What are the uncertainties? ... The world needs more futurists."

See also:

Comment by richard_ngo on Some thoughts on deference and inside-view models · 2020-05-28T08:43:49.197Z · score: 2 (1 votes) · EA · GW

"When you're thinking about real life I often think that it's better to come to conclusions based on weighing a large number of arguments, rather than trying to make one complete calculation of your conclusion"

I'm a little confused about this distinction. The process of weighing a large number of arguments IS a calculation of your conclusion, that's complete insofar as you've weighed all the relevant arguments. Perhaps you mean something like "A complete calculation that mainly relies on only a few premises"? But in this case I'd say the main advantage of the EA mindset is in fact that it makes people more willing to change their careers in response to a few fundamental premises. I think most AI safety researchers, for instance, have (or should have) a few clear cruxes about why they're in the field, whereas most AI researchers don't. Or perhaps you're just warning us not to think that we can make arguments about reality that are as conclusive as mathematical arguments?

Comment by richard_ngo on Some thoughts on deference and inside-view models · 2020-05-28T08:33:55.671Z · score: 33 (22 votes) · EA · GW

"EAs generally have better opinions when they've been around EA longer"

Except on the issues that EAs are systematically wrong about, where they will tend to have worse opinions. Which we won't notice because we also share those opinions. For example, if AMF is actually worse than standard aid programs at reducing global poverty, or if AI risk is actually not a big deal, then time spent in EA is correlated with worse opinions on these topics.

Comment by richard_ngo on Aligning Recommender Systems as Cause Area · 2020-05-26T02:03:57.724Z · score: 11 (4 votes) · EA · GW

I'm not sure about users definitely preferring the existing recommendations to random ones - I actually have been trying to turn off YouTube recommendations because they make me spend more time on YouTube than I want. Meanwhile other recommendation systems send me news that is worse on average than the rest of the news I consume (from different channels). So in some cases at least, we could use a very minimal standard of: a system is aligned if the user is better off because the recommendation system exists at all.

This is a pretty blunt metric, and probably we want something more nuanced, but at least to start off with it'd be interesting to think about how to improve whichever recommender systems are currently not aligned.

Comment by richard_ngo on Critical Review of 'The Precipice': A Reassessment of the Risks of AI and Pandemics · 2020-05-18T23:59:56.948Z · score: 10 (3 votes) · EA · GW

A few more meta points:

  • I'm very surprised that we're six levels deep into a disagreement and still actively confused about each other's arguments. I thought our opinions were much more similar. This suggests that we should schedule a time to talk in person, and/or an adversarial collaboration trying to write a version of the argument that you're thinking of. (The latter might be more efficient than this exchange, while also producing useful public records).
  • Thanks for the thorough + high-quality engagement, I really appreciate it.
  • Due to time constraints I'll just try to hit two key points in this reply (even though I don't think your responses resolved any of the other points for me, which I'm still very surprised by).

If you replace "perfect optimization" with "significantly-better-than-human optimization" in all of my claims, I'd continue to agree with them.

We are already at significantly-better-than-human optimisation, because none of us can take an environment and output a neural network that does well in that environment, but stochastic gradient descent can. We could make SGD many many times better and it still wouldn't produce a malicious superintelligence when trained on CIFAR, because there just isn't any gradient pushing it in the direction of intelligence; it'll train an agent to memorise the dataset far before that. And if the path to tampering is a few dozen steps long, the optimiser won't find it before the heat death of the universe (because the agent has no concept of tampering to work from, all it knows is CIFAR). So when we're talking about not-literally-perfect optimisers, you definitely need more than just amazing optimisation and hard-coded objective functions for trouble to occur - you also need lots of information about the world, maybe a bunch of interaction with it, maybe a curriculum. This is where the meat of the argument is, to me.

I think spreading the argument "if we don't do X, then we are in trouble because of problem Y" seems better. ... The former is easier to understand and more likely to be true / correctly reasoned.

I previously said:

I'm still not sure what the value of a "default assumption" is if it's not predictive, though.

And I still have this confusion. It doesn't matter if the argument is true and easy to understand if it's not action-guiding for anyone. Compare the argument: "if we (=humanity) don't remember to eat food in 2021, then everyone will die". Almost certainly true. Very easy to understand. Totally skips the key issue, which is why we should assign high enough probability to this specific hypothetical to bother worrying about it.

So then I guess your response is something like "But everyone forgetting to eat food is a crazy scenario, whereas the naive extrapolation of the thing we're currently doing is the default scenario". (Also, sorry if this dialogue format is annoying, I found it an easy way to organise my thoughts, but I appreciate that it runs the risk of strawmanning you).

To which I respond: there are many ways of naively extrapolating "the thing we are currently doing". For example, the thing we're currently doing is building AI with a 100% success record at not taking over the world. So my naive extrapolation says we'll definitely be fine. Why should I pay any attention to your naive extrapolation?

I then picture you saying: "I'm not using these extrapolations to make probabilistic predictions, so I don't need to argue that mine is more relevant than yours. I'm merely saying: once our optimisers get really really good, if we give them a hard-coded objective function, things will go badly. Therefore we, as humanity, should do {the set of things which will not lead to really good optimisers training on hard-coded objective functions}."

To which I firstly say: no, I don't buy the claim that once our optimisers get really really good, if we give them a hard-coded objective function, "an existential catastrophe almost certainly happens". For reasons which I described above.

Secondly, even if I do accept your claim, I think I could just point out: "You've defined what we should do in terms of its outcomes, but in an explicitly non-probabilistic way. So if the entire ML community hears your argument, agrees with it, and then commits to doing exactly what they were already doing for the next fifty years, you have no grounds to complain, because you have not actually made any probabilistic claims about whether "exactly what they were already doing for the next fifty years" will lead to catastrophe." So again, why is this argument worth making?

Man, this last point felt really nitpicky, but I don't know how else to convey my intuitive feeling that there's some sort of motte and bailey happening in your argument. Again, let's discuss this higher-bandwidth.

Comment by richard_ngo on Critical Review of 'The Precipice': A Reassessment of the Risks of AI and Pandemics · 2020-05-18T03:45:42.244Z · score: 24 (6 votes) · EA · GW

If you use a perfect optimizer and train in the real world with what you would intuitively call a "certain specification", an existential catastrophe almost certainly happens. Given agreement on this fact, I'm just saying that I want a better argument for safety than "it's fine because we have a less-than-perfect optimizer"

I think this is the central point of disagreement. I agree that perfect optimisers are pathological. But we are not going to train anything that is within light-years of perfect optimisation. Perfect optimisation is a totally different type of thing to what we're doing. This argument feels to me like saying "We shouldn't keep building bigger and bigger bombs because in the limit of size they'll form a black hole and destroy the Earth." It may be true that building sufficiently big bombs will destroy the earth, but the mechanism in the limit of size is not the relevant one, and is only very loosely analogous to the mechanism we're actually worried about. (In the case of AI, to be very explicit, I'm saying that inner misalignment is the thing which might kill us, and that outer misalignment of perfect optimizers is the thing that's only very loosely analogous to it. Outer misalignment of imperfect optimisers is somewhere in the middle).

The rest of this comment is more meta.

The reason I am particularly concerned about spreading arguments related to perfect optimisers is threefold. Firstly because it feels reminiscent of the utility-maximisation arguments made by Yudkowsky - in both cases the arguments are based on theoretical claims which are literally true but in practice irrelevant or vacuous. This is specifically what made the utility-maximisation argument so misleading, and why I don't want another argument of this type to gain traction.

Secondly because I think that five years ago, if you'd asked a top ML researcher why they didn't believe in the existing arguments for AI risk, they'd have said something like:

Well, the utility function thing is a trivial mathematical result. And the argument about paperclips is dumb because the way we train AIs is by giving them rewards when they do things we like, and we're not going to give them arbitrarily high rewards for building arbitrarily many paperclips. What if we write down the wrong specification? Well, we do that in RL but in supervised learning we use human-labeled data, so if there's any issue with written specifications we can use that approach.

I think that these arguments would have been correct rebuttals to the public arguments for AI risk which existed at that time. We may have an object-level disagreement about whether a top ML researcher would actually have said something like this, but I am now strongly inclined to give the benefit of the doubt to mainstream ML researchers when I try to understand their positions. In particular, if I were in their epistemic position, I'm not sure I would make specific arguments for why the "intends" bit will be easy either, because it's just the default hypothesis: we train things, then if they don't do what we want, we train them better.

Thirdly, because I am epistemically paranoid about giving arguments which aren't actually the main reason to believe in a thing. I agree that the post I linked is super speculative, but if someone disproved the core intuitions that the post is based on, that'd make a huge dent in my estimates of AI risk. Whereas I suspect that the same is not really the case for you and the argument you give (although I feel a bit weird asserting things about your beliefs, so I'm happy to concede this point if you disagree). Firstly because (even disregarding my other objections) it doesn't establish that AI safety work needs to be done by someone, it just establishes that AI researchers have to avoid naively extrapolating their current work. Maybe they could extrapolate it in non-naive ways that don't look anything like safety work. "Don't continue on the naively extrapolated path" is often a really low bar, because naive extrapolations can be very dubious (if we naively extrapolate a baby's growth, it'll end up the size of the earth pretty quickly). Secondly because the argument is also true for image classifiers, since under perfect optimisation they could hack their loss functions. Insofar as we're much less worried about them than RL agents, most of the work needed to establish the danger of the latter must be done by some other argument. Thirdly because I do think that counterfactual impact is the important bit, not "AI safety work needs to be done by someone." I don't think there needs to be a robust demonstration that the problem won't be solved by default, but there do need to be some nontrivial arguments. In my scenario, one such argument is that we won't know what effects our labels will have on the agent's learned goals, so there's no easy way to pay more to get more safety. Other arguments that fill this role are appeals to fast takeoff, competitive pressures, etc.
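To put rough numbers on the baby-growth aside: the sketch below assumes "size" means mass, a 3.5 kg newborn, and a constant doubling time of five months (roughly how long birth weight takes to double). All three are illustrative assumptions, not figures from this thread. Only the Earth's mass is a standard value.

```python
import math

# Illustrative assumptions, not measurements from the discussion above.
BIRTH_MASS_KG = 3.5       # assumed newborn mass
DOUBLING_MONTHS = 5.0     # assumed constant doubling time (early-infancy rate)
EARTH_MASS_KG = 5.972e24  # standard value for the Earth's mass

# Number of doublings needed for the baby's mass to reach the Earth's mass.
doublings = math.log2(EARTH_MASS_KG / BIRTH_MASS_KG)

# Time that many doublings would take at the assumed constant rate.
years = doublings * DOUBLING_MONTHS / 12.0

print(f"{doublings:.1f} doublings, about {years:.0f} years")
```

Under these assumptions the naive exponential curve reaches an Earth-mass baby in a few decades, which is the sense in which "don't follow the naive extrapolation" is a low bar.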

I specifically said this was not a prediction for this reason

I didn't read this bit carefully enough, mea culpa. I'm still not sure what the value of a "default assumption" is if it's not predictive, though.

(We in that statement was meant to refer to humanity as a whole.)

I also didn't pick up on the we = humanity thing, sorry. Makes more sense now.

Comment by richard_ngo on Critical Review of 'The Precipice': A Reassessment of the Risks of AI and Pandemics · 2020-05-16T12:03:29.829Z · score: 17 (5 votes) · EA · GW
1. The stated goal of AI research would very likely lead to human extinction

I disagree pretty strongly with this. What does it even mean for a whole field to have a "stated goal"? Who stated it? Russell says in his book that "From the very beginnings of AI, intelligence in machines has been defined in the same way", but then a) doesn't give any citations or references to the definition he uses (I can't find the quoted definition online from before his book); and b) doesn't establish that building "intelligent machines" is the only goal of the field of AI. In fact there are lots of AI researchers concerned with fairness, accountability, transparency, and so on - not just intelligence. Insofar as those researchers aren't concerned about existential risk from AI, it's because they don't think it'll happen, not because they think it's somehow outside their remit.

Now in practice, a lot of AI researcher time is spent trying to make things that better optimise objective functions. But that's because this has been the hardest part so far - specification problems have just not been a big issue in such limited domains (and insofar as they are, that's what all the FATE researchers are working on). So this observed fact doesn't help us distinguish between "everyone in AI thinks that making AIs which intend to do what we want is an integral part of their mission, but that the 'intend' bit will be easy" vs "everyone in AI is just trying to build machines that can achieve hardcoded literal objectives even if it's very difficult to hardcode what we actually want". And without distinguishing them, the "stated goal of AI" has no predictive power (if it even exists).

We'll continue to give certain specifications of what we want

What is a "certain specification"? Is training an AI to follow instructions, giving it strong negative rewards every time it misinterprets us, then telling it to do X, a "certain specification" of X? I just don't think this concept makes sense in modern ML, because it's the optimiser, not the AI, that is given the specification. There may be something to the general idea regardless, but it needs a lot more fleshing out, in a way that I don't think anyone has done.
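A minimal sketch of the "it's the optimiser, not the AI, that is given the specification" point, assuming a toy linear model trained with plain SGD. Every name here (`make_data`, `model`, `loss_grad`, `sgd`) is hypothetical and chosen for illustration only. The hard-coded objective appears only inside the function the optimiser calls; the model itself only ever maps inputs to outputs.

```python
import random

def make_data(n, seed=0):
    # Hypothetical toy dataset: inputs x with noise-free labels y = 3x + 1.
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 3.0 * x + 1.0) for x in xs]

def model(params, x):
    # The "AI": maps an input to an output. It never sees the loss function.
    w, b = params
    return w * x + b

def loss_grad(params, batch):
    # The "specification": gradient of mean squared error, hard-coded here.
    # Only the optimiser calls this function.
    n = len(batch)
    gw = sum(2.0 * (model(params, x) - y) * x for x, y in batch) / n
    gb = sum(2.0 * (model(params, x) - y) for x, y in batch) / n
    return gw, gb

def sgd(params, data, lr=0.1, steps=2000, batch_size=8, seed=1):
    # The optimiser: repeatedly nudges the parameters down the loss gradient.
    rng = random.Random(seed)
    for _ in range(steps):
        batch = rng.sample(data, batch_size)
        gw, gb = loss_grad(params, batch)
        params = (params[0] - lr * gw, params[1] - lr * gb)
    return params

data = make_data(64)
w, b = sgd((0.0, 0.0), data)
print(w, b)  # the learned parameters approach the generating values 3 and 1
```

Whatever the trained parameters end up encoding is a product of the data and the training process; the loss function is not something the model receives as an input or is told to pursue.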

More constructively, I just put this post online. It's far from comprehensive, but it points at what I'm concerned about more specifically than anything else.

Comment by richard_ngo on Critical Review of 'The Precipice': A Reassessment of the Risks of AI and Pandemics · 2020-05-14T21:19:05.792Z · score: 15 (6 votes) · EA · GW
at present they represent deep theoretical limitations of current methods

+1 on disagreeing with this. It's not clear that there's enough deep theory of current methods for them to have deep theoretical limitations :P

More generally, I broadly agree with Rohin, but (as I think we've discussed) find this argument pretty dubious:

Almost every AI system we've created so far (not just deep RL systems) have some predefined, hardcoded, certain specification that the AI is trying to optimize for.
A superintelligent agent pursuing a known specification has convergent instrumental subgoals (the thing that Toby is worried about).
Therefore, if we want superintelligent AI systems that don't have these problems, we need to change how AI is done.

Convergent instrumental subgoals aren't the problem. Large-scale misaligned goals (instrumental or not) are the problem. Whether or not a predefined specification gives rise to those sorts of goals depends on the AI architecture and training process in a complicated way. Once you describe in more detail what it actually means for an AI system to "have some specification", the "certain" bit also stops seeming like a problem.

I'd like to refer to a better argument here, but unfortunately there is no source online that makes the case that AGI will be dangerous in a satisfactory way. I think there are enough pieces floating around in people's heads/private notes to make a compelling argument, but the fact that they haven't been collated publicly is a clear failure of the field.

Comment by richard_ngo on The Case for Impact Purchase | Part 1 · 2020-04-21T16:26:27.377Z · score: 9 (7 votes) · EA · GW

Impact purchases are one way of creating more impact finance. In particular, they can make it worthwhile for non-altruistic financiers to fund altruistic projects. This is particularly beneficial in cases where it's hard for a single altruist to evaluate all the people who want funding.

With regard to (b), the incentives for impact purchasers are roughly similar to the incentives of someone who's announced a prize. In both cases, the payer creates incentives for others to do the work that will lead to payouts.

Comment by richard_ngo on Some thoughts on Toby Ord’s existential risk estimates · 2020-04-07T13:47:39.799Z · score: 26 (13 votes) · EA · GW

What is the significance of the people on the ISS? Are you suggesting that six people could repopulate the human species? And what sort of disaster takes less time than a flight, and only kills people on the ground?

Also, I expect to see small engineered pandemics, but only after effective genetic engineering is widespread. So the fact that we haven't seen any so far is not much evidence.

Comment by richard_ngo on Launching An Introductory Online Textbook on Utilitarianism · 2020-04-01T10:10:01.965Z · score: 24 (12 votes) · EA · GW

I'd be more excited about seeing some coverage of suffering-focused ethics in general, rather than NU specifically. I think NU is a fairly extreme position, but the idea that suffering is the dominant component of the expected utility of the future is both consistent with standard utilitarian positions, and also captures the key point that most EA NU thinkers are making.

Comment by richard_ngo on What are some 1:1 meetings you'd like to arrange, and how can people find you? · 2020-03-18T14:49:49.022Z · score: 8 (6 votes) · EA · GW

Who are you?

I'm Richard. I'm a research engineer on the AI safety team at DeepMind.

What are some things people can talk to you about? (e.g. your areas of experience/expertise)

AI safety, particularly high-level questions about what the problems are and how we should address them. Also machine learning more generally, particularly deep reinforcement learning. Also careers in AI safety.

I've been thinking a lot about futurism in general lately. Longtermism assumes large-scale sci-fi futures, but I don't think there's been much serious investigation into what they might look like, so I'm keen to get better discussion going (this post was an early step in that direction).

What are things you'd like to talk to other people about? (e.g. things you want to learn)

I'm interested in learning about evolutionary biology, especially the evolution of morality. Also the neuroscience of motivation and goals.

I'd be interested in learning more about mainstream philosophical views on agency and desire. I'd also be very interested in collaborating with philosophers who want to do this type of work, directed at improving our understanding of AI safety.

How can people get in touch with you?

Here, or email: ngor [at]

Comment by richard_ngo on AMA: Toby Ord, author of "The Precipice" and co-founder of the EA movement · 2020-03-17T17:26:59.244Z · score: 31 (14 votes) · EA · GW

What would convince you that preventing s-risks is a bigger priority than preventing x-risks?

Suppose that humanity unified to pursue a common goal, and you faced a gamble where that goal would be the most morally valuable goal with probability p, and the most morally disvaluable goal with probability 1-p. Given your current beliefs about those goals, at what value of p would you prefer this gamble over extinction?
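One way to make the gamble concrete, under assumptions that are mine rather than part of the question: treat the answer as an expected-value comparison, normalise extinction to value 0, and write V > 0 for the value of the most morally valuable goal and D > 0 for the magnitude of the most disvaluable one.

```latex
\begin{align*}
\mathbb{E}[\text{gamble}] &= pV - (1-p)D, \qquad \mathbb{E}[\text{extinction}] = 0 \\
pV - (1-p)D > 0 &\iff p > p^{*} = \frac{D}{V + D}
\end{align*}
```

On this framing, a symmetric view (D = V) gives a threshold of p* = 1/2, while a view on which the worst outcome is ten times as bad as the best is good (D = 10V) gives p* = 10/11, roughly 0.91. The question then amounts to asking what ratio D/V your current beliefs imply.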

Comment by richard_ngo on AMA: Toby Ord, author of "The Precipice" and co-founder of the EA movement · 2020-03-17T17:10:59.936Z · score: 8 (2 votes) · EA · GW

We have a lot of philosophers and philosophically-minded people in EA, but only a tiny number of them are working on philosophical issues related to AI safety. Yet from my perspective as an AI safety researcher, it feels like there are some crucial questions which we need good philosophy to answer (many listed here; I'm particularly thinking about philosophy of mind and agency as applied to AI, a la Dennett). How do you think this funnel could be improved?

Comment by richard_ngo on AMA: Toby Ord, author of "The Precipice" and co-founder of the EA movement · 2020-03-17T17:06:14.110Z · score: 10 (7 votes) · EA · GW

If you could convince a dozen of the world's best philosophers (who aren't already doing EA-aligned research) to work on topics of your choice, which questions would you ask them to investigate?

Comment by richard_ngo on AMA: Toby Ord, author of "The Precipice" and co-founder of the EA movement · 2020-03-17T16:12:28.572Z · score: 17 (10 votes) · EA · GW

If you could only convey one idea from your new book to people who are already heavily involved in longtermism, what would it be?

Comment by richard_ngo on What are the key ongoing debates in EA? · 2020-03-15T15:08:51.407Z · score: 45 (13 votes) · EA · GW

Thanks for the list! As a follow-up, I'll try to list places online where such debates have occurred for each entry:


2. Toby Ord has estimates in The Precipice. I assume most discussion occurs on specific risks.

3. Lots of discussion on this; summary here: . Also more recently

4. Best discussion of this is probably here:

5. Most stuff on addresses s-risks. In terms of pushback, Carl Shulman wrote and Toby Ord wrote (although I don't find either compelling). Also a lot of Simon Knutsson's stuff, e.g.

6a. ,

6b. ,


7. Nothing particularly comes to mind, although I assume there's stuff out there.


9. E.g. here, which also links to more discussions:

Comment by richard_ngo on Harsanyi's simple “proof” of utilitarianism · 2020-02-22T16:26:04.963Z · score: 12 (6 votes) · EA · GW
Because we are indifferent between who has the 2 and who has the 0

Perhaps I'm missing something, but where does this claim come from? It doesn't seem to follow from the three starting assumptions.

Comment by richard_ngo on Announcing the 2019-20 Donor Lottery · 2019-12-03T10:13:29.606Z · score: 12 (5 votes) · EA · GW
2018-19: a $100,000 lottery (no winners)

What happens to the money in this case?

Comment by richard_ngo on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-22T15:29:28.310Z · score: 4 (3 votes) · EA · GW
I think that they might have been better off if they'd instead spent their effort trying to become really good at ML in the hope of being better skilled up with the goal of working on AI safety later.

I'm broadly sympathetic to this, but I also want to note that there are some research directions in mainstream ML which do seem significantly more valuable than average. For example, I'm pretty excited about people getting really good at interpretability, so that they have an intuitive understanding of what's actually going on inside our models (particularly RL agents), even if they have no specific plans about how to apply this to safety.

Comment by richard_ngo on AI safety scholarships look worth-funding (if other funding is sane) · 2019-11-20T20:05:27.475Z · score: 3 (5 votes) · EA · GW
Students able to bring funding would be best-equipped to negotiate the best possible supervision from the best possible school with the greatest possible research freedom.

This seems like the key premise, but I'm pretty uncertain about how much freedom this sort of scholarship would actually buy, especially in the US (people who've done PhDs in ML please comment!) My understanding is that it's rare for good candidates to not get funding; and also that, even with funding, it's usually important to work on something your supervisor is excited about, in order to get more support.

In most of the examples you give (with the possible exceptions of the FHI and GPI scholarships) buying research freedom for PhD students doesn't seem to be the main benefit. In particular:

OpenPhil has its fellowship for AI researchers who happen to be highly prestigious

This might be mostly trying to buy prestige for safety.

and has funded a couple of masters students on a one-off basis.
FHI has its... RSP, which funds early-career EAs with slight supervision.
Paul even made grants to independent researchers for a while.

All of these groups are less likely to have other sources of funding compared with PhD students.

Having said all that, it does seem plausible that giving money to safety PhDs is very valuable, in particular via the mechanism of freeing up more of their time (e.g. if they can then afford shorter commutes, outsourcing of time-consuming tasks, etc).

Comment by richard_ngo on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-20T15:19:40.953Z · score: 12 (5 votes) · EA · GW
On a meta note: Different people who work on AI alignment have radically different pictures of what the development of AI will look like, what the alignment problem is, and what solutions might look like.

+1, this is the thing that surprised me most when I got into the field. I think helping increase common knowledge and agreement on the big picture of safety should be a major priority for people in the field (and it's something I'm putting a lot of effort into, so send me an email at if you want to discuss this).

I think the ideas described in the paper Risks from Learned Optimization are extremely important.

Also +1 on this.

Comment by richard_ngo on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-20T15:15:45.001Z · score: 13 (4 votes) · EA · GW
If I thought there was a <30% chance of AGI within 50 years, I'd probably not be working on AI safety.
I expect the world to change pretty radically over the next 100 years.

I find these statements surprising, and would be keen to hear more about this from you. I suppose that the latter goes a long way towards explaining the former. Personally, there are few technologies that I think are likely to radically change the world within the next 100 years (assuming that your definition of radical is similar to mine). Maybe the only ones that would really qualify are bioengineering and nanotech. Even in those fields, though, I expect the pace of change to be fairly slow if AI isn't heavily involved.

(For reference, while I assign more than 30% credence to AGI within 50 years, it's not that much more).