richard_ngo's Shortform 2020-06-13T10:46:26.847Z · score: 6 (1 votes)
What are the key ongoing debates in EA? 2020-03-08T16:12:34.683Z · score: 68 (36 votes)
Characterising utopia 2020-01-02T00:24:23.248Z · score: 30 (17 votes)
Technical AGI safety research outside AI 2019-10-18T15:02:20.718Z · score: 78 (33 votes)
Does any thorough discussion of moral parliaments exist? 2019-09-06T15:33:02.478Z · score: 36 (14 votes)
How much EA analysis of AI safety as a cause area exists? 2019-09-06T11:15:48.665Z · score: 76 (28 votes)
How do most utilitarians feel about "replacement" thought experiments? 2019-09-06T11:14:20.764Z · score: 19 (15 votes)
Why has poverty worldwide fallen so little in recent decades outside China? 2019-08-07T22:24:11.239Z · score: 23 (10 votes)
Which scientific discovery was most ahead of its time? 2019-05-16T12:28:54.437Z · score: 34 (14 votes)
Why doesn't the EA forum have curated posts or sequences? 2019-03-21T13:52:58.807Z · score: 35 (17 votes)
The career and the community 2019-03-21T12:35:23.073Z · score: 83 (45 votes)
Arguments for moral indefinability 2019-02-08T11:09:25.547Z · score: 31 (12 votes)
Disentangling arguments for the importance of AI safety 2019-01-23T14:58:27.881Z · score: 56 (32 votes)
How democracy ends: a review and reevaluation 2018-11-24T17:41:53.594Z · score: 24 (12 votes)
Some cruxes on impactful alternatives to AI policy work 2018-11-22T13:43:40.684Z · score: 25 (13 votes)


Comment by richard_ngo on How should we run the EA Forum Prize? · 2020-06-23T13:21:26.424Z · score: 14 (9 votes) · EA · GW

Personally I find the prize disproportionately motivating, in that it increases my desire to write EA forum content to a level beyond what I think I'd endorse if I reflected for longer.

Sorry if this is not very helpful; I imagine it's also not very representative.

Comment by richard_ngo on Should EA Buy Distribution Rights for Foundational Books? · 2020-06-17T12:52:08.595Z · score: 11 (8 votes) · EA · GW

As one data point, the Institute of Economic Affairs (which has had pretty major success in spreading its views) prints out many short books advocating its viewpoints and hands them out at student events. That certainly made me engage with their ideas significantly more, then give the books to my friends, etc. I think they may get economies of scale from having their own printing press, but it might be worth looking into how cheaply you can print out 80-page EA primers for widespread distribution.

Comment by richard_ngo on EA Forum feature suggestion thread · 2020-06-16T18:37:55.551Z · score: 3 (2 votes) · EA · GW

+1 on this, and on curated posts. (As also discussed here).

Comment by richard_ngo on Max_Daniel's Shortform · 2020-06-16T17:11:26.470Z · score: 2 (1 votes) · EA · GW

People tend to underestimate the importance of ideas, because it's hard to imagine what impact they will have without doing the work of coming up with them.

I'm also uncertain how impactful it is to find people who're good at generating ideas, because the best ones will probably become prominent regardless. But regardless of that, it seems to me like you've now agreed with the three points that the influential EA made. Those weren't comparative claims about where to invest marginal resources, but rather the absolute claim that it'd be very beneficial to have more talented people.

Then the additional claim I'd make is: some types of influence are very valuable and can only be gained by people who are sufficiently good at generating ideas. It'd be amazing to have another Stuart Russell, or someone in Steven Pinker's position but more onboard with EA. But they both got there by making pioneering contributions in their respective fields. So when you talk about "accumulating AI-weighted influence", e.g. by persuading leading AI researchers to be EAs, that therefore involves gaining more talented members of EA.

Comment by richard_ngo on [Link] "Will He Go?" book review (Scott Aaronson) · 2020-06-15T21:08:46.441Z · score: 5 (3 votes) · EA · GW
Thanks for sharing the last link, which I think provides useful context (that Open Philanthropy's funder has a history of donating to partisan political campaigns).

Why is this context useful? It feels like the relevance of this post should not be particularly tied to Dustin and Cari's donation choices.

the upshot of this post is effectively an argument that supporting Biden's campaign should be thought of as an EA cause area

Is "X should be thought of as an EA cause area" distinct from "X would be good"? More generally, I'd like the forum to be a place where we can share important ideas without needing to include calls to action.

On the other hand, I also endorse holding political posts to a more stringent standard, so that we don't all get sucked in.

Comment by richard_ngo on Max_Daniel's Shortform · 2020-06-15T16:55:48.401Z · score: 4 (2 votes) · EA · GW

Task X for which the claim seems most true for me is "coming up with novel and important ideas". This seems to be very heavy-tailed, and not very teachable.

I would also expect that, if I poked a bit at these claims, it would usually turn out that X is something like "contribute to this software project at the pace and quality level of our best engineers, w/o requiring any management time" or "convince some investors to give us much more money, but w/o anyone spending any time transferring relevant knowledge".

Neither of these feel like central examples of the type of thing EA needs most. Most of the variance of the impact of the software project will be in how good the idea is; same for most of the variance of the impact of getting funding.

Robin Hanson is someone who's good at generating novel and important ideas. Idk how he got that way, but I suspect it'd be very hard to design a curriculum to recreate that. Do you disagree?

Comment by richard_ngo on richard_ngo's Shortform · 2020-06-15T13:29:41.202Z · score: 4 (2 votes) · EA · GW

Then there's the question of how many fields it's actually important to have good research in. Broadly speaking, my perspective is: we care about the future; the future is going to be influenced by a lot of components; and so it's important to understand as many of those components as we can. Do we need longtermist sociologists? Hell yes! Then we can better understand how value drift might happen, and what to do about it. Longtermist historians to figure out how power structures will work, longtermist artists to inspire people - as many as we can get. Longtermist physicists - Anders can't figure out how to colonise the galaxy by himself.

If you're excited about something that poses a more concrete existential risk, then I'd still advise that as a priority. But my guess is that there's also a lot of low-hanging fruit for would-be futurists in other disciplines.

Comment by richard_ngo on richard_ngo's Shortform · 2020-06-15T13:19:26.934Z · score: 7 (4 votes) · EA · GW

Another related thing that isn't discussed enough is the immense difficulty of actually doing good research, especially in a pre-paradigmatic field. I've personally struggled to transition from engineer mindset, where you're just trying to build a thing that works (and you'll know when it does), to scientist mindset, where you need to understand the complex ways in which many different variables affect your results.

This isn't to say that only geniuses make important advances, though - hard work and persistence go a long way. As a corollary, if you're in a field where hard work doesn't feel like work, then you have a huge advantage. And it's also good for building a healthy EA community if even people who don't manage to have a big impact are still excited about their careers. So that's why I personally place a fairly high emphasis on passion when giving career advice (unless I'm talking to someone with exceptional focus and determination).

Comment by richard_ngo on richard_ngo's Shortform · 2020-06-13T10:46:27.161Z · score: 29 (14 votes) · EA · GW

I'm leaning towards the view that "don't follow your passion" and "try to do really high-leverage intellectual work" are both good pieces of advice in isolation, but that they work badly in combination. I suspect that there are very few people doing world-class research who aren't deeply passionate about it, and also that EA needs world-class research in more fields than it may often seem.

Comment by richard_ngo on Why might one value animals far less than humans? · 2020-06-11T18:28:21.832Z · score: 4 (2 votes) · EA · GW
Would you say the discrepancy between preferences and hedonism is because humans can (and do) achieve much greater highs than nonhuman animals under preferences, but human and nonhuman lows aren't so different?

Something like that. Maybe the key idea here is my ranking of possible lives:

  • Amazing hedonic state + all personal preferences satisfied >> amazing hedonic state.
  • Terrible hedonic state ≈ terrible hedonic state + all personal preferences violated.

In other words, if I imagine myself suffering enough hedonically I don't really care about any other preferences I have about my life any more by comparison. Whereas that isn't true for feelings of bliss.

I imagine things being more symmetrical for animals, I guess because I don't consider their preferences to be as complex or core to their identities.

Comment by richard_ngo on Why might one value animals far less than humans? · 2020-06-08T13:07:22.693Z · score: 8 (5 votes) · EA · GW

Insofar as I value conscious experiences purely by virtue of their valence (i.e. positivity or negativity), I value animals not too much less than humans (discounted to the extent I suspect that they're "less conscious" or "less capable of feeling highly positive states", which I'm still quite uncertain about).

Insofar as I value preference fulfilment in general, I value humans significantly more than animals (because human preferences are stronger and more complex than animals') but not overwhelmingly so, because animals have strong and reasonably consistent preferences too.

Insofar as I value specific types of conscious experiences and preference fulfilment, such as "reciprocated romantic love" or "achieving one's overarching life goals", then I value humans far more than animals (and would probably value posthumans significantly more than humans).

I don't think there are knock-down arguments in favour of any of these approaches, and so I usually try to balance all of these considerations. Broadly speaking, I do this by prioritising hedonic components when I think about preventing disvalue, and by prioritising the other components when I think about creating value.

Comment by richard_ngo on Moral Anti-Realism Sequence #2: Why Realists and Anti-Realists Disagree · 2020-06-06T09:30:32.231Z · score: 3 (2 votes) · EA · GW

Cool, glad we're on the same page. The following is a fairly minor point, but thought it might still be worth clarifying.

"You could switch back and forth between two ways of interpreting the realist's moral claims."

I guess that, while in principle this makes sense, in practice language is defined on a community level, and so it's just asking for confusion to hold this position. In particular, ethics is not cleanly separable from meta-ethics, and so I can't always reinterpret a realist's argument in a pragmatic way without losing something. But if realists use 'morality' to always implicitly mean 'objective morality', then I don't know when they're relying on the 'objective' bit in their arguments. That seems bad.

The alternative is to agree on a "lowest common denominator" definition of morality, and expect people who are relying on its objectiveness or subjectivity to explicitly flag that. As an analogy, imagine that person A thinks we live in a simulation, and person B doesn't, and person B tries to define "cats" so that their definition includes the criterion "physically implemented in the real world, not just in a simulation". In which case person A believes that no cats exist, in that sense.

I think the correct response from A is to say "No, you're making a power grab for common linguistic territory, which I don't accept. We should define 'cats' in a way that doesn't make it a vacuous concept for many members of our epistemic community. So I won't define cats as 'simulated beings' and you won't define them as 'physical beings', and if one of your arguments about cats relies on this distinction, then you should make that explicit."

This post is (as usual) relevant:

I could equivalently describe the above position as: "when your conception of something looks like Network 2, but not everyone agrees, then your definitions should look like Network 1."

Comment by richard_ngo on Moral Anti-Realism Sequence #2: Why Realists and Anti-Realists Disagree · 2020-06-05T20:56:49.632Z · score: 6 (4 votes) · EA · GW
The version of anti-realism I’m arguing for in this sequence is a blend of error theory and non-objectivism. It seems to me that any anti-realist has to endorse error theory (in some sense at least) because realists exist, and it would be uncharitable not to interpret their claims in the realist fashion. However, the non-objectivist perspective seems importantly correct as well

I think we probably have very similar views, but I am less of a fan of error theory. What might it look like to endorse error theory as an anti-realist? Well, as an anti-realist I think that my claims about morality are perfectly reasonable and often true, since I intend them to be speaker-dependent. It's just the moral realists whose claims are in error. So that leads to the bizarre situation where I can have a conversation about object-level morality with a moral realist, and we might even change each other's minds, but throughout the whole conversation I'm evaluating every statement he says as trivially incorrect. This seems untenable.

Even anti-realists can adopt the notion of “moral facts,” provided that we think of them as facts about a non-objective (speaker-dependent) reality, instead of facts about a speaker-independent (objective) one.

Again, I expect we mostly agree here, but the phrase "facts about a non-objective (speaker-dependent) reality" feels potentially confusing to me. Would you consider it equivalent to say that anti-realists can think about moral facts as facts about the implications of certain evaluation criteria? From this perspective, when we make moral claims, we're implicitly endorsing a set of evaluation criteria (making this position somewhere in the middle of cognitivism and non-cognitivism).

I've fleshed out this position a little more in this post on "a pragmatic approach to interpreting moral claims".

Comment by richard_ngo on Some thoughts on deference and inside-view models · 2020-06-01T14:06:54.885Z · score: 9 (6 votes) · EA · GW

My broader point is something like: in a discussion about deference and skepticism, it feels odd to only discuss deference to other EAs. By conflating "EA experts" and "people with good opinions", you're missing an important dimension of variation (specifically, the difference between a community-centred outside view and a broader outside view).

Apologies for phrasing the original comment as a "gotcha" rebuttal rather than trying to distill a more constructive criticism.

Comment by richard_ngo on Some thoughts on deference and inside-view models · 2020-05-29T00:12:01.346Z · score: 17 (6 votes) · EA · GW

I think one clear disanalogy with startups is that eventually startups are judged by reality. Whereas we aren't, because doing good and getting more money are not that strongly correlated. By just eating the risk of being wrong about something, the worst case is not failing, like it is for a startup, but rather sucking up all the resources into the wrong thing.

Also, small point, but I don't think Bayesian decision theory is particularly important for EA.

Anyway, maybe eventually this might be worth considering, but as it is we've done several orders of magnitude too little analysis to start conceding.

Comment by richard_ngo on Some thoughts on deference and inside-view models · 2020-05-28T09:34:32.877Z · score: 24 (11 votes) · EA · GW

"I think that it’s potentially very bad that young EAs don’t practice skeptical independent thinking as much (if this is indeed true)."

I agree that this is potentially very bad, but also perhaps difficult to avoid as EA professionalises, because you start needing more background and technical knowledge to weigh in on ongoing debates. Analogous to what happened in science.

On the other hand, we're literally interested in the whole future, about which we currently know almost nothing. So there must be space for new ideas. I guess the problem is that, while "skeptical thinking" about received wisdom is hard, it's still easier than generative thinking (i.e. coming up with new questions). The problem with EA futurism is not so much that we believe a lot of incorrect statements, but that we haven't yet thought of most of the relevant concepts. So it may be particularly valuable for people who've thought about longtermism a bunch to make public even tentative or wacky ideas, in order to provide more surface area for others to cultivate skeptical thinking and advance the state of our knowledge. (As Buck has in fact done:

Example 1: a while back there was a post on why animal welfare is an important longtermist priority, and iirc Rob Wiblin replied saying something like "But we'll have uploaded by then so it won't be a big deal." I don't think that this argument has been made much in the EA context - which makes it ripe for skeptical independent thinking, but also much less visible as a hypothesis that it's possible to disagree with.

Example 2: there's just not very much discussion in EA about what actual utopias might look like. Maybe that's because, to utilitarians, it's just hedonium. Or because we're punting it to the long reflection. But this seems like a very important topic to think about! I'm hoping that if this discussion gets kickstarted, there'll be a lot of room for people to disagree and come up with novel ideas. Related: a bunch of claims I've made about utopia.

I'm reminded of Robin Hanson's advice to young EAs: "Study the future. ... Go actually generate scenarios, explore them, tell us what you found. What are the things that could go wrong there? What are the opportunities? What are the uncertainties? ... The world needs more futurists."

See also:

Comment by richard_ngo on Some thoughts on deference and inside-view models · 2020-05-28T08:43:49.197Z · score: 2 (1 votes) · EA · GW

"When you're thinking about real life I often think that it's better to come to conclusions based on weighing a large number of arguments, rather than trying to make one complete calculation of your conclusion"

I'm a little confused about this distinction. The process of weighing a large number of arguments IS a calculation of your conclusion, that's complete insofar as you've weighed all the relevant arguments. Perhaps you mean something like "A complete calculation that mainly relies on only a few premises"? But in this case I'd say the main advantage of the EA mindset is in fact that it makes people more willing to change their careers in response to a few fundamental premises. I think most AI safety researchers, for instance, have (or should have) a few clear cruxes about why they're in the field, whereas most AI researchers don't. Or perhaps you're just warning us not to think that we can make arguments about reality that are as conclusive as mathematical arguments?

Comment by richard_ngo on Some thoughts on deference and inside-view models · 2020-05-28T08:33:55.671Z · score: 30 (20 votes) · EA · GW

"EAs generally have better opinions when they've been around EA longer"

Except on the issues that EAs are systematically wrong about, where they will tend to have worse opinions. Which we won't notice because we also share those opinions. For example, if AMF is actually worse than standard aid programs at reducing global poverty, or if AI risk is actually not a big deal, then time spent in EA is correlated with worse opinions on these topics.

Comment by richard_ngo on Aligning Recommender Systems as Cause Area · 2020-05-26T02:03:57.724Z · score: 11 (4 votes) · EA · GW

I'm not sure about users definitely preferring the existing recommendations to random ones - I actually have been trying to turn off YouTube recommendations because they make me spend more time on YouTube than I want. Meanwhile other recommendation systems send me news that is worse on average than the rest of the news I consume (from different channels). So in some cases at least, we could use a very minimal standard of: a system is aligned if the user is better off because the recommendation system exists at all.

This is a pretty blunt metric, and probably we want something more nuanced, but at least to start off with it'd be interesting to think about how to improve whichever recommender systems are currently not aligned.

Comment by richard_ngo on Critical Review of 'The Precipice': A Reassessment of the Risks of AI and Pandemics · 2020-05-18T23:59:56.948Z · score: 10 (3 votes) · EA · GW

A few more meta points:

  • I'm very surprised that we're six levels deep into a disagreement and still actively confused about each other's arguments. I thought our opinions were much more similar. This suggests that we should schedule a time to talk in person, and/or an adversarial collaboration trying to write a version of the argument that you're thinking of. (The latter might be more efficient than this exchange, while also producing useful public records).
  • Thanks for the thorough + high-quality engagement, I really appreciate it.
  • Due to time constraints I'll just try to hit two key points in this reply (even though I don't think your responses resolved any of the other points for me, which I'm still very surprised by).

If you replace "perfect optimization" with "significantly-better-than-human optimization" in all of my claims, I'd continue to agree with them.

We are already at significantly-better-than-human optimisation, because none of us can take an environment and output a neural network that does well in that environment, but stochastic gradient descent can. We could make SGD many many times better and it still wouldn't produce a malicious superintelligence when trained on CIFAR, because there just isn't any gradient pushing it in the direction of intelligence; it'll train an agent to memorise the dataset far before that. And if the path to tampering is a few dozen steps long, the optimiser won't find it before the heat death of the universe (because the agent has no concept of tampering to work from, all it knows is CIFAR). So when we're talking about not-literally-perfect optimisers, you definitely need more than just amazing optimisation and hard-coded objective functions for trouble to occur - you also need lots of information about the world, maybe a bunch of interaction with it, maybe a curriculum. This is where the meat of the argument is, to me.

I think spreading the argument "if we don't do X, then we are in trouble because of problem Y" seems better. ... The former is easier to understand and more likely to be true / correctly reasoned.

I previously said:

I'm still not sure what the value of a "default assumption" is if it's not predictive, though.

And I still have this confusion. It doesn't matter if the argument is true and easy to understand if it's not action-guiding for anyone. Compare the argument: "if we (=humanity) don't remember to eat food in 2021, then everyone will die". Almost certainly true. Very easy to understand. Totally skips the key issue, which is why we should assign high enough probability to this specific hypothetical to bother worrying about it.

So then I guess your response is something like "But everyone forgetting to eat food is a crazy scenario, whereas the naive extrapolation of the thing we're currently doing is the default scenario". (Also, sorry if this dialogue format is annoying, I found it an easy way to organise my thoughts, but I appreciate that it runs the risk of strawmanning you).

To which I respond: there are many ways of naively extrapolating "the thing we are currently doing". For example, the thing we're currently doing is building AI with a 100% success record at not taking over the world. So my naive extrapolation says we'll definitely be fine. Why should I pay any attention to your naive extrapolation?

I then picture you saying: "I'm not using these extrapolations to make probabilistic predictions, so I don't need to argue that mine is more relevant than yours. I'm merely saying: once our optimisers get really really good, if we give them a hard-coded objective function, things will go badly. Therefore we, as humanity, should do {the set of things which will not lead to really good optimisers training on hard-coded objective functions}."

To which I firstly say: no, I don't buy the claim that once our optimisers get really really good, if we give them a hard-coded objective function, "an existential catastrophe almost certainly happens". For reasons which I described above.

Secondly, even if I do accept your claim, I think I could just point out: "You've defined what we should do in terms of its outcomes, but in an explicitly non-probabilistic way. So if the entire ML community hears your argument, agrees with it, and then commits to doing exactly what they were already doing for the next fifty years, you have no grounds to complain, because you have not actually made any probabilistic claims about whether "exactly what they were already doing for the next fifty years" will lead to catastrophe." So again, why is this argument worth making?

Man, this last point felt really nitpicky, but I don't know how else to convey my intuitive feeling that there's some sort of motte and bailey happening in your argument. Again, let's discuss this higher-bandwidth.

Comment by richard_ngo on Critical Review of 'The Precipice': A Reassessment of the Risks of AI and Pandemics · 2020-05-18T03:45:42.244Z · score: 24 (6 votes) · EA · GW

If you use a perfect optimizer and train in the real world with what you would intuitively call a "certain specification", an existential catastrophe almost certainly happens. Given agreement on this fact, I'm just saying that I want a better argument for safety than "it's fine because we have a less-than-perfect optimizer"

I think this is the central point of disagreement. I agree that perfect optimisers are pathological. But we are not going to train anything that is within light-years of perfect optimisation. Perfect optimisation is a totally different type of thing to what we're doing. This argument feels to me like saying "We shouldn't keep building bigger and bigger bombs because in the limit of size they'll form a black hole and destroy the Earth." It may be true that building sufficiently big bombs will destroy the earth, but the mechanism in the limit of size is not the relevant one, and is only very loosely analogous to the mechanism we're actually worried about. (In the case of AI, to be very explicit, I'm saying that inner misalignment is the thing which might kill us, and that outer misalignment of perfect optimizers is the thing that's only very loosely analogous to it. Outer misalignment of imperfect optimisers is somewhere in the middle).

The rest of this comment is more meta.

The reason I am particularly concerned about spreading arguments related to perfect optimisers is threefold. Firstly because it feels reminiscent of the utility-maximisation arguments made by Yudkowsky - in both cases the arguments are based on theoretical claims which are literally true but in practice irrelevant or vacuous. This is specifically what made the utility-maximisation argument so misleading, and why I don't want another argument of this type to gain traction.

Secondly because I think that five years ago, if you'd asked a top ML researcher why they didn't believe in the existing arguments for AI risk, they'd have said something like:

Well, the utility function thing is a trivial mathematical result. And the argument about paperclips is dumb because the way we train AIs is by giving them rewards when they do things we like, and we're not going to give them arbitrarily high rewards for building arbitrarily many paperclips. What if we write down the wrong specification? Well, we do that in RL but in supervised learning we use human-labeled data, so if there's any issue with written specifications we can use that approach.

I think that these arguments would have been correct rebuttals to the public arguments for AI risk which existed at that time. We may have an object-level disagreement about whether a top ML researcher would actually have said something like this, but I am now strongly inclined to give the benefit of the doubt to mainstream ML researchers when I try to understand their positions. In particular, if I were in their epistemic position, I'm not sure I would make specific arguments for why the "intends" bit will be easy either, because it's just the default hypothesis: we train things, then if they don't do what we want, we train them better.

Thirdly, because I am epistemically paranoid about giving arguments which aren't actually the main reason to believe in a thing. I agree that the post I linked is super speculative, but if someone disproved the core intuitions that the post is based on that'd make a huge dent in my estimates of AI risk. Whereas I suspect that the same is not really the case for you and the argument you give (although I feel a bit weird asserting things about your beliefs, so I'm happy to concede this point if you disagree). Firstly because (even disregarding my other objections) it doesn't establish that AI safety work needs to be done by someone, it just establishes that AI researchers have to avoid naively extrapolating their current work. Maybe they could extrapolate it in non-naive ways that don't look anything like safety work. "Don't continue on the naively extrapolated path" is often a really low bar, because naive extrapolations can be very dubious (if we naively extrapolate a baby's growth, it'll end up the size of the earth pretty quickly). Secondly because the argument is also true for image classifiers, since under perfect optimisation they could hack their loss functions. Insofar as we're much less worried about them than RL agents, most of the work needed to establish the danger of the latter must be done by some other argument. Thirdly because I do think that counterfactual impact is the important bit, not "AI safety work needs to be done by someone." I don't think there needs to be a robust demonstration that the problem won't be solved by default, but there do need to be some nontrivial arguments. In my scenario, one such argument is that we won't know what effects our labels will have on the agent's learned goals, so there's no easy way to pay more to get more safety. Other arguments that fill this role are appeals to fast takeoff, competitive pressures, etc.
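
To make the baby-growth aside concrete, here is a rough back-of-the-envelope sketch. The figures are illustrative assumptions rather than anything from the discussion above: a 3.5 kg newborn, a constant five-month doubling time for body mass, and the Earth's mass (about 5.97e24 kg) standing in for "size". The function name is hypothetical.

```python
import math

# Assumed, illustrative figures (not taken from the original comment):
BIRTH_MASS_KG = 3.5          # roughly a typical newborn
DOUBLING_TIME_MONTHS = 5.0   # early infancy doubling time, naively held constant
EARTH_MASS_KG = 5.97e24      # mass used as a stand-in for "size"


def months_until_mass(target_kg, start_kg=BIRTH_MASS_KG,
                      doubling_months=DOUBLING_TIME_MONTHS):
    """Months for growth at a fixed doubling time to go from start_kg to target_kg."""
    doublings = math.log2(target_kg / start_kg)
    return doublings * doubling_months


if __name__ == "__main__":
    years = months_until_mass(EARTH_MASS_KG) / 12
    print(f"Naive extrapolation reaches Earth's mass after about {years:.0f} years")
```

Under these assumptions the extrapolation needs only about eighty doublings, i.e. a few decades, which is the sense in which "don't continue on the naively extrapolated path" is a low bar.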

I specifically said this was not a prediction for this reason

I didn't read this bit carefully enough, mea culpa. I'm still not sure what the value of a "default assumption" is if it's not predictive, though.

(We in that statement was meant to refer to humanity as a whole.)

I also didn't pick up on the we = humanity thing, sorry. Makes more sense now.

Comment by richard_ngo on Critical Review of 'The Precipice': A Reassessment of the Risks of AI and Pandemics · 2020-05-16T12:03:29.829Z · score: 17 (5 votes) · EA · GW
1. The stated goal of AI research would very likely lead to human extinction

I disagree pretty strongly with this. What does it even mean for a whole field to have a "stated goal"? Who stated it? Russell says in his book that "From the very beginnings of AI, intelligence in machines has been defined in the same way", but then a) doesn't give any citations or references to the definition he uses (I can't find the quoted definition online from before his book); and b) doesn't establish that building "intelligent machines" is the only goal of the field of AI. In fact there are lots of AI researchers concerned with fairness, accountability, transparency, and so on - not just intelligence. Insofar as those researchers aren't concerned about existential risk from AI, it's because they don't think it'll happen, not because they think it's somehow outside their remit.

Now in practice, a lot of AI researcher time is spent trying to make things that better optimise objective functions. But that's because this has been the hardest part so far - specification problems have just not been a big issue in such limited domains (and insofar as they are, that's what all the FATE researchers are working on). So this observed fact doesn't help us distinguish between "everyone in AI thinks that making AIs which intend to do what we want is an integral part of their mission, but that the 'intend' bit will be easy" vs "everyone in AI is just trying to build machines that can achieve hardcoded literal objectives even if it's very difficult to hardcode what we actually want". And without distinguishing them, the "stated goal of AI" has no predictive power (if it even exists).

We'll continue to give certain specifications of what we want

What is a "certain specification"? Is training an AI to follow instructions, giving it strong negative rewards every time it misinterprets us, then telling it to do X, a "certain specification" of X? I just don't think this concept makes sense in modern ML, because it's the optimiser, not the AI, that is given the specification. There may be something to the general idea regardless, but it needs a lot more fleshing out, in a way that I don't think anyone has done.

More constructively, I just put this post online. It's far from comprehensive, but it points at what I'm concerned about more specifically than anything else.

Comment by richard_ngo on Critical Review of 'The Precipice': A Reassessment of the Risks of AI and Pandemics · 2020-05-14T21:19:05.792Z · score: 14 (5 votes) · EA · GW
at present they represent deep theoretical limitations of current methods

+1 on disagreeing with this. It's not clear that there's enough deep theory of current methods for them to have deep theoretical limitations :P

More generally, I broadly agree with Rohin, but (as I think we've discussed) find this argument pretty dubious:

Almost every AI system we've created so far (not just deep RL systems) have some predefined, hardcoded, certain specification that the AI is trying to optimize for.
A superintelligent agent pursuing a known specification has convergent instrumental subgoals (the thing that Toby is worried about).
Therefore, if we want superintelligent AI systems that don't have these problems, we need to change how AI is done.

Convergent instrumental subgoals aren't the problem. Large-scale misaligned goals (instrumental or not) are the problem. Whether or not a predefined specification gives rise to those sorts of goals depends on the AI architecture and training process in a complicated way. Once you describe in more detail what it actually means for an AI system to "have some specification", the "certain" bit also stops seeming like a problem.

I'd like to refer to a better argument here, but unfortunately there is no source online that makes the case that AGI will be dangerous in a satisfactory way. I think there are enough pieces floating around in people's heads/private notes to make a compelling argument, but the fact that they haven't been collated publicly is a clear failure of the field.

Comment by richard_ngo on The Case for Impact Purchase | Part 1 · 2020-04-21T16:26:27.377Z · score: 9 (7 votes) · EA · GW

Impact purchases are one way of creating more impact finance. In particular, they can make it worthwhile for non-altruistic financiers to fund altruistic projects. This is particularly beneficial in cases where it's hard for a single altruist to evaluate all the people who want funding.

With regard to (b), the incentives for impact purchasers are roughly similar to the incentives of someone who's announced a prize. In both cases, the payer creates incentives for others to do the work that will lead to payouts.

Comment by richard_ngo on Some thoughts on Toby Ord’s existential risk estimates · 2020-04-07T13:47:39.799Z · score: 26 (13 votes) · EA · GW

What is the significance of the people on the ISS? Are you suggesting that six people could repopulate the human species? And what sort of disaster takes less time than a flight, and only kills people on the ground?

Also, I expect to see small engineered pandemics, but only after effective genetic engineering is widespread. So the fact that we haven't seen any so far is not much evidence.

Comment by richard_ngo on Launching An Introductory Online Textbook on Utilitarianism · 2020-04-01T10:10:01.965Z · score: 18 (11 votes) · EA · GW

I'd be more excited about seeing some coverage of suffering-focused ethics in general, rather than NU specifically. I think NU is a fairly extreme position, but the idea that suffering is the dominant component of the expected utility of the future is both consistent with standard utilitarian positions, and also captures the key point that most EA NU thinkers are making.

Comment by richard_ngo on What are some 1:1 meetings you'd like to arrange, and how can people find you? · 2020-03-18T14:49:49.022Z · score: 8 (6 votes) · EA · GW

Who are you?

I'm Richard. I'm a research engineer on the AI safety team at DeepMind.

What are some things people can talk to you about? (e.g. your areas of experience/expertise)

AI safety, particularly high-level questions about what the problems are and how we should address them. Also machine learning more generally, particularly deep reinforcement learning. Also careers in AI safety.

I've been thinking a lot about futurism in general lately. Longtermism assumes large-scale sci-fi futures, but I don't think there's been much serious investigation into what they might look like, so I'm keen to get better discussion going (this post was an early step in that direction).

What are things you'd like to talk to other people about? (e.g. things you want to learn)

I'm interested in learning about evolutionary biology, especially the evolution of morality. Also the neuroscience of motivation and goals.

I'd be interested in learning more about mainstream philosophical views on agency and desire. I'd also be very interested in collaborating with philosophers who want to do this type of work, directed at improving our understanding of AI safety.

How can people get in touch with you?

Here, or email: ngor [at]

Comment by richard_ngo on AMA: Toby Ord, author of "The Precipice" and co-founder of the EA movement · 2020-03-17T17:26:59.244Z · score: 31 (14 votes) · EA · GW

What would convince you that preventing s-risks is a bigger priority than preventing x-risks?

Suppose that humanity unified to pursue a common goal, and you faced a gamble where that goal would be the most morally valuable goal with probability p, and the most morally disvaluable goal with probability 1-p. Given your current beliefs about those goals, at what value of p would you prefer this gamble over extinction?
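
One way to make the threshold in this question concrete is a simple expected-value calculation. The symbols are hypothetical and not part of the original question: assume extinction is assigned value $0$, the most valuable goal has value $V_+ > 0$, the most disvaluable goal has value $-V_- < 0$, and the choice is made by maximising expected value.

```latex
\[
  \underbrace{p\,V_+ - (1-p)\,V_-}_{\text{expected value of the gamble}} > 0
  \quad\Longleftrightarrow\quad
  p > \frac{V_-}{V_+ + V_-}
\]
```

Under these assumptions, a symmetric view ($V_- = V_+$) gives a threshold of $p > 1/2$, while a view on which the worst outcome is nine times as bad as the best is good ($V_- = 9V_+$) gives $p > 0.9$. Other decision rules or value assignments would give different answers.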

Comment by richard_ngo on AMA: Toby Ord, author of "The Precipice" and co-founder of the EA movement · 2020-03-17T17:10:59.936Z · score: 8 (2 votes) · EA · GW

We have a lot of philosophers and philosophically-minded people in EA, but only a tiny number of them are working on philosophical issues related to AI safety. Yet from my perspective as an AI safety researcher, it feels like there are some crucial questions which we need good philosophy to answer (many listed here; I'm particularly thinking about philosophy of mind and agency as applied to AI, a la Dennett). How do you think this funnel could be improved?

Comment by richard_ngo on AMA: Toby Ord, author of "The Precipice" and co-founder of the EA movement · 2020-03-17T17:06:14.110Z · score: 10 (7 votes) · EA · GW

If you could convince a dozen of the world's best philosophers (who aren't already doing EA-aligned research) to work on topics of your choice, which questions would you ask them to investigate?

Comment by richard_ngo on AMA: Toby Ord, author of "The Precipice" and co-founder of the EA movement · 2020-03-17T16:12:28.572Z · score: 17 (10 votes) · EA · GW

If you could only convey one idea from your new book to people who are already heavily involved in longtermism, what would it be?

Comment by richard_ngo on What are the key ongoing debates in EA? · 2020-03-15T15:08:51.407Z · score: 40 (12 votes) · EA · GW

Thanks for the list! As a follow-up, I'll try to list places online where such debates have occurred for each entry:


2. Toby Ord has estimates in The Precipice. I assume most discussion occurs on specific risks.

3. Lots of discussion on this; summary here: . Also more recently

4. Best discussion of this is probably here:

5. Most stuff on addresses s-risks. In terms of pushback, Carl Shulman wrote and Toby Ord wrote (although I don't find either compelling). Also a lot of Simon Knutsson's stuff, e.g.

6a. ,

6b. ,


7. Nothing particularly comes to mind, although I assume there's stuff out there.


9. E.g. here, which also links to more discussions:

Comment by richard_ngo on Harsanyi's simple “proof” of utilitarianism · 2020-02-22T16:26:04.963Z · score: 12 (6 votes) · EA · GW
Because we are indifferent between who has the 2 and who has the 0

Perhaps I'm missing something, but where does this claim come from? It doesn't seem to follow from the three starting assumptions.

Comment by richard_ngo on Announcing the 2019-20 Donor Lottery · 2019-12-03T10:13:29.606Z · score: 12 (5 votes) · EA · GW
2018-19: a $100,000 lottery (no winners)

What happens to the money in this case?

Comment by richard_ngo on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-22T15:29:28.310Z · score: 4 (3 votes) · EA · GW
I think that they might have been better off if they'd instead spent their effort trying to become really good at ML in the hope of being better skilled up with the goal of working on AI safety later.

I'm broadly sympathetic to this, but I also want to note that there are some research directions in mainstream ML which do seem significantly more valuable than average. For example, I'm pretty excited about people getting really good at interpretability, so that they have an intuitive understanding of what's actually going on inside our models (particularly RL agents), even if they have no specific plans about how to apply this to safety.

Comment by richard_ngo on AI safety scholarships look worth-funding (if other funding is sane) · 2019-11-20T20:05:27.475Z · score: 3 (5 votes) · EA · GW
Students able to bring funding would be best-equipped to negotiate the best possible supervision from the best possible school with the greatest possible research freedom.

This seems like the key premise, but I'm pretty uncertain about how much freedom this sort of scholarship would actually buy, especially in the US (people who've done PhDs in ML, please comment!). My understanding is that it's rare for good candidates to not get funding; and also that, even with funding, it's usually important to work on something your supervisor is excited about, in order to get more support.

In most of the examples you give (with the possible exceptions of the FHI and GPI scholarships) buying research freedom for PhD students doesn't seem to be the main benefit. In particular:

OpenPhil has its fellowship for AI researchers who happen to be highly prestigious

This might be mostly trying to buy prestige for safety.

and has funded a couple of masters students on a one-off basis.
FHI has its... RSP, which funds early-career EAs with slight supervision.
Paul even made grants to independent researchers for a while.

All of these groups are less likely to have other sources of funding compared with PhD students.

Having said all that, it does seem plausible that giving money to safety PhDs is very valuable, in particular via the mechanism of freeing up more of their time (e.g. if they can then afford shorter commutes, outsourcing of time-consuming tasks, etc).

Comment by richard_ngo on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-20T15:19:40.953Z · score: 12 (5 votes) · EA · GW
On a meta note: Different people who work on AI alignment have radically different pictures of what the development of AI will look like, what the alignment problem is, and what solutions might look like.

+1, this is the thing that surprised me most when I got into the field. I think helping increase common knowledge and agreement on the big picture of safety should be a major priority for people in the field (and it's something I'm putting a lot of effort into, so send me an email at if you want to discuss this).

I think the ideas described in the paper Risks from Learned Optimization are extremely important.

Also +1 on this.

Comment by richard_ngo on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-20T15:15:45.001Z · score: 13 (4 votes) · EA · GW
If I thought there was a <30% chance of AGI within 50 years, I'd probably not be working on AI safety.
I expect the world to change pretty radically over the next 100 years.

I find these statements surprising, and would be keen to hear more about this from you. I suppose that the latter goes a long way towards explaining the former. Personally, there are few technologies that I think are likely to radically change the world within the next 100 years (assuming that your definition of radical is similar to mine). Maybe the only ones that would really qualify are bioengineering and nanotech. Even in those fields, though, I expect the pace of change to be fairly slow if AI isn't heavily involved.

(For reference, while I assign more than 30% credence to AGI within 50 years, it's not that much more).

Comment by richard_ngo on A conversation with Rohin Shah · 2019-11-13T01:51:43.888Z · score: 12 (6 votes) · EA · GW

For reference, here's the post on realism about rationality that Rohin mentioned several times.

Comment by richard_ngo on EA Hotel Fundraiser 5: Out of runway! · 2019-10-25T15:24:12.705Z · score: 31 (23 votes) · EA · GW

I'm planning to donate to the EA hotel. Given that it isn't a registered charity, I'm interested in doing donation swaps with EAs in countries where charitable donations aren't tax deductible (like Sweden) so that I can get tax deductions on my donations. Reach out or comment here if interested.

Comment by richard_ngo on Seeking EA experts interested in the evolutionary psychology of existential risks · 2019-10-24T09:11:06.677Z · score: 2 (2 votes) · EA · GW

Any of the authors of this paper:

Comment by richard_ngo on Only a few people decide about funding for community builders world-wide · 2019-10-23T01:56:45.558Z · score: 14 (10 votes) · EA · GW

This homogeneity might well be bad - in particular by excluding valuable but less standard types of community building. If so, this problem would be mitigated by having more funding sources.

Comment by richard_ngo on Ineffective Altruism: Are there ideologies which generally cause there adherents to have worse impacts? · 2019-10-17T14:13:08.826Z · score: 14 (6 votes) · EA · GW

Agreed - in fact, maybe a better question is whether there are any ideologies where strong adherence doesn't lead you to make poor decisions.

Comment by richard_ngo on EA Handbook 3.0: What content should I include? · 2019-10-01T11:53:25.435Z · score: 13 (4 votes) · EA · GW

Here's my (in-progress) collation of important EA resources, organised by topic. Contributions welcome :)

Comment by richard_ngo on How do most utilitarians feel about "replacement" thought experiments? · 2019-09-13T09:52:03.451Z · score: 1 (1 votes) · EA · GW

Using those two different types of "should" makes your proposed sentence ("It seems that (at least) the humans who are utilitarians should commit mass suicide in order to bring the new beings into existence, because that's what utilitarianism implies is the right action in that situation.") unnecessarily confusing, for a couple of reasons.

1. Most moral anti-realists don't use "epistemic should" when talking about morality. Instead, I claim, they use my definition of moral should: "X should do Y means that I endorse/prefer some moral theory T and T endorses X doing Y". (We can test this by asking anti-realists who don't subscribe to negative utilitarianism whether a negative utilitarian should destroy the universe - I predict they will either say "no" or argue that the question is ambiguous.) And so introducing "epistemic should" makes moral talk more difficult.

2. Moral realists who are utilitarians and use "moral should" would agree with your proposed sentence, and moral anti-realists who aren't utilitarians and use "epistemic should" would also agree with your sentence, but for two totally different reasons. This makes follow-up discussions much more difficult.

How about "Utilitarianism endorses humans voluntarily replacing themselves with these new beings." That gets rid of (most of) the contractarianism. I don't think there's any clean, elegant phrasing which then rules out the moral uncertainty in a way that's satisfactory to both realists and anti-realists, unfortunately - because realists and anti-realists disagree on whether, if you prefer/endorse a theory, that makes it rational for you to act on that theory. (In other words, I don't know whether moral realists have terminology which distinguishes between people who act on false theories that they currently endorse, versus people who act on false theories they currently don't endorse).

Comment by richard_ngo on How do most utilitarians feel about "replacement" thought experiments? · 2019-09-11T11:50:33.241Z · score: 4 (2 votes) · EA · GW

I originally wrote a different response to Wei's comment, but it wasn't direct enough. I'm copying the first part here since it may be helpful in explaining what I mean by "moral preferences" vs "personal preferences":

Each person has a range of preferences, which it's often convenient to break down into "moral preferences" and "personal preferences". This isn't always a clear distinction, but the main differences:

1. Moral preferences are much more universalisable and less person-specific (e.g. "I prefer that people aren't killed" vs "I prefer that I'm not killed").

2. Moral preferences are associated with a meta-preference that everyone has the same moral preferences. This is why we feel so strongly that we need to find a shared moral "truth". Fortunately, most people are in agreement in our societies on the most basic moral questions.

3. Moral preferences are associated with a meta-preference that they are consistent, simple, and actionable. This is why we feel so strongly that we need to find coherent moral theories rather than just following our intuitions.

4. Moral preferences are usually phrased as "X is right/wrong" and "people should do right and not do wrong" rather than "I prefer X". This often misleads people into thinking that their moral preferences are just pointers to some aspect of reality, the "objective moral truth", which is what people "objectively should do".

When we reflect on our moral preferences and try to make them more consistent and actionable, we often end up condensing our initial moral preferences (aka moral intuitions) into moral theories like utilitarianism. Note that we could do this for other preferences as well (e.g. "my theory of food is that I prefer things which have more salt than sugar") but because I don't have strong meta-preferences about my food preferences, I don't bother doing so.

The relationship between moral preferences and personal preferences can be quite complicated. People act on both, but often have a meta-preference to pay more attention to their moral preferences than they currently do. I'd count someone as a utilitarian if they have moral preferences that favour utilitarianism, and these are a non-negligible component of their overall preferences.

Comment by richard_ngo on How do most utilitarians feel about "replacement" thought experiments? · 2019-09-11T11:44:19.167Z · score: 1 (1 votes) · EA · GW

My first objection is that you're using a different form of "should" than what is standard. My preferred interpretation of "X should do Y" is that it's equivalent to "I endorse some moral theory T and T endorses X doing Y". (Or "according to utilitarianism, X should do Y" is more simply equivalent to "utilitarianism endorses X doing Y"). In this case, "should" feels like it's saying something morally normative.

Whereas you seem to be using "should" as in "a person who has a preference X should act on X". In this case, should feels like it's saying something epistemically normative. You may think these are the same thing, but I don't, and either way it's confusing to build that assumption into our language. I'd prefer to replace this latter meaning of "should" with "it is rational to". So then we get:

"it is rational for humans who are utilitarians to commit mass suicide in order to bring the new beings into existence, because that's what utilitarianism implies is the right action."

My second objection is that this is only the case if "being a utilitarian" is equivalent to "having only one preference, which is to follow utilitarianism". In practice people have both moral preferences and also personal preferences. I'd still count someone as being a utilitarian if they follow their personal preferences instead of their moral preferences some (or even most) of the time. So then it's not clear whether it's rational for a human who is a utilitarian to commit suicide in this case; it depends on the contents of their personal preferences.

I think we avoid all of this mess just by saying "Utilitarianism endorses replacing existing humans with these new beings." This is, as I mentioned earlier, a similar claim to "ZFC implies that 1 + 1 = 2", and it allows people to have fruitful discussions without agreeing on whether they should endorse utilitarianism. I'd also be happy with Simon's version above: "Utilitarianism seems to imply that humans should...", although I think it's slightly less precise than mine, because it introduces an unnecessary "should" that some people might take to be a meta-level claim rather than merely a claim about the content of the theory of utilitarianism (this is a minor quibble though. Analogously: "ZFC implies that 1 + 1 = 2 is true").

Anyway, we have pretty different meta-ethical views, and I'm not sure how much we're going to converge, but I will say that from my perspective, your conflation of epistemic and moral normativity (as I described earlier) is a key component of why your position seems confusing to me.

Comment by richard_ngo on How much EA analysis of AI safety as a cause area exists? · 2019-09-10T09:35:35.079Z · score: 2 (2 votes) · EA · GW
Are you aware of any surveys or any other evidence supporting this? (I'd accept "most people in AI safety that I know started working in it because EA investigative work convinced them that AI safety matters" or something of that nature.)

I'm endorsing this, and I'm confused about which part you're skeptical about. Is it the "many EAs" bit? Obviously the word "many" is pretty fuzzy, and I don't intend it to be a strong claim. Mentally the numbers I'm thinking of are something like >50 people or >25% of committed (or "core", whatever that means) EAs. Don't have a survey to back that up though. Oh, I guess I'm also including people currently studying ML with the intention of doing safety. Will edit to add that.

Why are you trying to answer this, instead of "How should I update, given the results of all available investigations into AI safety as a cause area?"

There are other questions that I would like answers to, not related to AI safety, and if I trusted EA consensus, then that would make the process much easier.

For this question then, it seems that Paul Christiano also needs to be discounted (and possibly others as well but I'm not as familiar with them).

Indeed, I agree.

Comment by richard_ngo on How do most utilitarians feel about "replacement" thought experiments? · 2019-09-09T13:46:29.079Z · score: 3 (3 votes) · EA · GW

Okay, thanks. So I guess the thing I'm curious about now is: what heuristics do you have for deciding when to prioritise contractarian intuitions over consequentialist intuitions, or vice versa? In extreme cases where one side feels very strongly about it (like this one) that's relatively easy, but any thoughts on how to extend those to more nuanced dilemmas?

Comment by richard_ngo on How do most utilitarians feel about "replacement" thought experiments? · 2019-09-09T12:55:09.324Z · score: 9 (5 votes) · EA · GW

I think that "utilitarianism seems to imply that humans who are utilitarians should..." is a type error regardless of whether you're a realist or an anti-realist, in the same way as "the ZFC axioms imply that humans who accept those axioms should believe 1+1=2". That's not what the ZFC axioms imply - actually, they just imply that 1+1 = 2, and it's our meta-theory of mathematics which determines how you respond to this fact. Similarly, utilitarianism is a theory which, given some actions (or maybe states of the world, or maybe policies) returns a metric for how "right" or "good" they are. And then how we relate to that theory depends on our meta-ethics.

Given how confusing talking about morality is, I think it's important to be able to separate the object-level moral theories from meta-ethical theories in this way. (For more along these lines, see my post here).
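
The "type error" point can be sketched in code. This is only an illustration under stated assumptions: the names `Action`, `Agent`, `utilitarianism` and `overall_preference`, the numeric scores, and the weighted-sum rule are all hypothetical stand-ins, not a claim about how moral theories or agents actually work. The object-level theory maps actions to a score and says nothing about agents; how an agent relates to that score is a separate function with a different signature.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Action:
    description: str
    total_welfare: float  # hypothetical stand-in for what the theory measures


# Object-level moral theory: Action -> score. No agent appears in its type.
MoralTheory = Callable[[Action], float]


def utilitarianism(action: Action) -> float:
    """Score an action by the welfare it produces (toy version)."""
    return action.total_welfare


@dataclass(frozen=True)
class Agent:
    name: str
    endorsed_theory: MoralTheory
    moral_weight: float  # assumed share of overall preferences that is moral (0 to 1)
    personal_preference: Callable[[Action], float]


def overall_preference(agent: Agent, action: Action) -> float:
    """Meta-level step: combine the theory's verdict with personal preferences.

    The weighted sum is an assumption made for illustration only.
    """
    moral = agent.endorsed_theory(action)
    personal = agent.personal_preference(action)
    return agent.moral_weight * moral + (1 - agent.moral_weight) * personal


replace = Action("replace humans with the new beings", total_welfare=100.0)
status_quo = Action("keep existing humans", total_welfare=60.0)

personal_scores = {replace.description: -100.0, status_quo.description: 50.0}
alice = Agent(
    name="Alice",
    endorsed_theory=utilitarianism,
    moral_weight=0.2,
    personal_preference=lambda a: personal_scores[a.description],
)

# The theory ranks the actions; what Alice prefers overall is a separate question.
print(utilitarianism(replace) > utilitarianism(status_quo))
print(overall_preference(alice, status_quo) > overall_preference(alice, replace))
```

In this toy setup, "utilitarianism endorses replacement" is a statement about the output of `utilitarianism`, while any statement about what Alice will do, or what it is rational for her to do, has to go through `overall_preference` or some other meta-level rule.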