The expected value of extinction risk reduction is positive

post by JanBrauner · 2018-12-15T15:32:22.633Z · score: 30 (19 votes) · EA · GW · 19 comments

By Jan Brauner and Friederike Grosse-Holz

Work on this article has been funded by the Centre for Effective Altruism, but the article represents the personal views of the authors.

Assume it matters morally what happens in the millions of years to come. What should we do, then? Will efforts to reduce the risk of human extinction lead to a better or worse future?

Because the EA forum does not yet support footnotes, the full article is posted at https://www.effectivealtruism.org/articles/the-expected-value-of-extinction-risk-reduction-is-positive/

Abstract

If most expected value or disvalue lies in the billions of years to come, altruists should plausibly focus their efforts on improving the long-term future. It is not clear whether reducing the risk of human extinction would, in expectation, improve the long-term future, because a future with humanity may be better or worse than one without it.

From a consequentialist, welfarist view, most expected value (EV) or disvalue of the future comes from scenarios in which (post-)humanity colonizes space, because these scenarios contain the most expected beings. If we simply extrapolate the current welfare (part 1.1) of humans and farmed and wild animals, it is unclear whether we should support spreading sentient beings to other planets.

From a more general perspective (part 1.2), future agents will likely care morally about things we find valuable or about things we are indifferent to. It seems very unlikely that they would see value exactly where we see disvalue. If future agents are powerful enough to shape the world according to their preferences, this asymmetry implies that the EV of future agents colonizing space is positive from many welfarist perspectives.

If we can defer the decision about whether to colonize space to future agents with more moral and empirical insight, doing so creates option value (part 1.3). However, most expected future disvalue plausibly comes from futures controlled by indifferent or malicious agents. Such “bad” agents would make worse decisions than we currently could. Thus, the option value of reducing the risk of human extinction is small.

The universe may not stay empty, even if humanity goes extinct (part 2.1). A non-human animal civilization, extraterrestrials or uncontrolled artificial intelligence that was created by humanity might colonize space. These scenarios may be worse than (post-)human space colonization in expectation. Additionally, with more moral or empirical insight, we might realize that the universe is already filled with beings or things we care about (part 2.2). If the universe is already filled with disvalue that future agents could alleviate, this gives further reason to reduce extinction risk.

In practice, many efforts to reduce the risk of human extinction also have other effects of long-term significance. Such efforts might often reduce the risk of global catastrophes (part 3.1) from which humanity would recover, but which might set technological and social progress on a worse track than the current one. Furthermore, such efforts often promote global coordination, peace and stability (part 3.2), which is crucial for the safe development of pivotal technologies and for avoiding negative trajectory changes in general.

Aggregating these considerations, efforts to reduce extinction risk seem positive in expectation from most consequentialist views, ranging from neutral on some views to extremely positive on others. As efforts to reduce extinction risk also seem highly leveraged and time-sensitive, they should probably hold a prominent place in the long-termist EA portfolio.

19 comments

Comments sorted by top scores.

comment by Davidmanheim · 2018-12-16T09:52:26.249Z · score: 15 (8 votes) · EA · GW

Great work. A few notes in descending order of importance which I'd love to see addressed at least in brief:

1) This seems not to engage with the questions about short-term versus long-term prioritization and discount rates. I'd think that the implicit assumptions should be made clearer.

2) It doesn't seem obvious to me that, given the universalist assumptions about the value of animal or other non-human species, the long term future is affected nearly as much by the presence or absence of humans. Depending on uncertainties about the Fermi hypothesis and the viability of non-human animals developing sentience over long time frames, this might greatly matter.

3) Reducing the probability of technological existential risks may require increasing the probability of human stagnation.

4) S-risks are plausibly more likely if moral development is outstripped by growth in technological power over relatively short time frames, and existential catastrophe has a comparatively limited downside.

comment by JanBrauner · 2018-12-16T22:21:56.809Z · score: 9 (6 votes) · EA · GW

Hi David, thanks for your comments.

1) This seems not to engage with the questions about short-term versus long-term prioritization and discount rates. I'd think that the implicit assumptions should be made clearer.

Yes, the article does not deal with considerations for and against caring about the long term. These are discussed elsewhere. Instead, the article assumes that we care about the long term (e.g. that we don't discount the value of future lives strongly) and analyses what implications follow from that view.

We tried to make that explicit. E.g., the first point under "Moral assumptions" reads:

Throughout this article, we base our considerations on two assumptions:
1. That it morally matters what happens in the billions of years to come. From this very long-term view, making sure the future plays out well is a primary moral concern.

2) It doesn't seem obvious to me that, given the universalist assumptions about the value of animal or other non-human species, the long term future is affected nearly as much by the presence or absence of humans. Depending on uncertainties about the Fermi hypothesis and the viability of non-human animals developing sentience over long time frames, this might greatly matter.

I think this point matters. Part 2.1 of the article deals with the implications of potential future non-human animal civilizations and extraterrestrials. I think the implications are somewhat complicated and depend quite a bit on your values, so I won't try to summarize them here.

4) S-risks are plausibly more likely if moral development is outstripped by growth in technological power over relatively short time frames, and existential catastrophe has a comparatively limited downside.

We don't try to argue for increasing the speed of technological progress.

Apart from that, it is not clear to me that extinction has a "comparatively limited downside" (compared to S-risks, you probably mean). This, of course, depends on your moral values. But even from a suffering-focused perspective, it may well be that we would - with more moral and empirical insight - come to realize that the universe is already filled with suffering. I personally would not be surprised if "S-risks by omission" (*) weighed pretty heavily in the overall calculus. This topic is discussed in part 2.2.

I don't have anything useful to say regarding your point 3).

(*) term coined by Lukas Gloor, I think.

comment by Davidmanheim · 2018-12-20T12:30:24.428Z · score: 1 (1 votes) · EA · GW

Thanks for replying.

I'd agree with your replies about limited scope on the first and second points, but I don't understand how anyone can make prioritization decisions with no discounting at all - it's nearly always better to conserve resources. If we discount costs but not benefits, however, I worry the framework is incoherent. This is a much more general confusion of mine, and the fact that you didn't address or resolve it is unsurprising.

Re: S-Risks, I'm wondering whether we need to be concerned about value misalignment leading to arbitrarily large negative utility, given some perspectives. I'm concerned that human values are incoherent, and any given maximization is likely to cause arbitrarily large "suffering" for some values - and if there are multiple groups with different values, this might mean any maximization imposes maximal suffering on the large majority of people's values.

For example, if 1/3 of humanity feels that human liberty is a crucial value, without which human pleasure is worse than meaningless, another 1/3 views earning reward as critical, and the last 1/3 views bliss/pure hedonium as optimal, we would view tiling the universe with human brains maxed out for any one of these as a hugely negative outcome for 2/3 of humanity, much worse than extinction.
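For illustration only, here is a minimal sketch of that worry as a toy payoff matrix (the factions follow the example above, but all numbers are hypothetical): tiling the universe with any single faction's favored configuration scores strongly negative for the other two factions.

```python
# Toy payoff matrix for the three hypothetical factions above (all numbers
# are made up for illustration). Extinction is scored 0 for everyone; an
# outcome a faction considers "much worse than extinction" is scored -5.

factions = ["liberty", "earned reward", "hedonium"]
outcomes = ["tile with liberty", "tile with earned reward", "tile with hedonium"]

# payoff[faction][outcome]
payoff = {
    "liberty":       {"tile with liberty": 10, "tile with earned reward": -5, "tile with hedonium": -5},
    "earned reward": {"tile with liberty": -5, "tile with earned reward": 10, "tile with hedonium": -5},
    "hedonium":      {"tile with liberty": -5, "tile with earned reward": -5, "tile with hedonium": 10},
}

for outcome in outcomes:
    losers = sum(1 for f in factions if payoff[f][outcome] < 0)
    print(f"{outcome}: worse than extinction for {losers}/3 of humanity")
```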

comment by JanBrauner · 2018-12-22T18:23:16.622Z · score: 2 (2 votes) · EA · GW

Regarding your second point, just a few thoughts:
First of all, an important question is how you think values and morality work. If two-thirds of humanity, after thorough reflection, disagreed with your values, would this give you a reason to become less certain about your own values as well? Maybe to adopt their values to a degree? ...

Secondly, I am also uncertain how coherent/convergent human values will be. There seem to be good arguments for both sides, see e.g. this blog post by Paul Christiano (and the discussion with Brian Tomasik in the comments of that post): https://rationalaltruist.com/2013/06/13/against-moral-advocacy/

Third: In a situation like the one you described above, there would at least be huge room for compromise/gains from trade/... So if future humanity were split into the three factions you suggested, they would not necessarily fight a war until only one faction remained to tile the universe with its preferred version. Indeed, they probably would not, as cooperation seems better for everyone in expectation.

comment by Davidmanheim · 2018-12-23T06:27:44.103Z · score: 1 (1 votes) · EA · GW

1) I agree that there is some confusion on my part, and on the part of most others I have spoken to, about how terminal values and morality do or do not get updated.

2) Agreed.

3) I will point to a possibly forthcoming paper / idea of Eric Drexler's at FHI that makes this point, which he calls "pareto-topia". Despite the wonderful virtues of the idea, I'm unclear whether there is a stable game-theoretic mechanism that prevents a race-to-the-bottom outcome when fundamentally different values are being traded off. Specifically in this case, it's possible that different values lead to an inability to truthfully/reliably cooperate - a paved road to pareto-topia seems not to exist, and there might be no path at all.

comment by Jacy_Reese · 2018-12-16T23:19:27.145Z · score: 13 (12 votes) · EA · GW

Thanks for posting on this important topic. You might be interested in this EA Forum post where I outlined many arguments against your conclusion that the expected value of extinction risk reduction is (highly) positive.

I do think your "very unlikely that [human descendants] would see value exactly where we see disvalue" argument is a viable one, but I think it's just one of many considerations, and my current impression of the evidence is that it's outweighed.

Also FYI the link in your article to "moral circle expansion" is dead. We work on that approach at Sentience Institute if you're interested.

comment by JanBrauner · 2018-12-22T17:28:42.140Z · score: 3 (2 votes) · EA · GW

Hey Jacy,

I have seen and read your post. It was published after my internal "Oh my god, I really, really need to stop reading and integrating even more sources, the article is already way too long"-deadline, so I don't refer to it in the article.

In general, I am more confident about the expected value of extinction risk reduction being positive, than about extinction risk reduction actually being the best thing to work on. It might well be that e.g. moral circle expansion is more promising, even if we have good reasons to believe that extinction risk reduction is positive.

I do think your "very unlikely that [human descendants] would see value exactly where we see disvalue" argument is a viable one, but I think it's just one of many considerations, and my current impression of the evidence is that it's outweighed.

I personally don't think that this argument is very strong on its own. But I think there are additional strong arguments (in descending order of relevance):

  • "The universe might already be filled with suffering and post-humans might do something against it."
  • "Global catastrophes, that don't lead to extinction, might have negative long-term effects"
  • "Other non-human animal civilizations might be worse"
  • ...

comment by Jacy_Reese · 2018-12-27T00:22:33.784Z · score: 1 (1 votes) · EA · GW

Thank you for the reply, Jan, especially noting those additional arguments. I worry that your article neglects them in favor of less important/controversial questions on this topic. I see many EAs taking the "very unlikely that [human descendants] would see value exactly where we see disvalue" argument (I'd call this the 'will argument,' that the future might be dominated by human-descendant will and there is much more will to create happiness than suffering, especially in terms of the likelihood of hedonium over dolorium) and using that to justify a very heavy focus on reducing extinction risk, without exploration of those many other arguments. I worry that much of the Oxford/SF-based EA community has committed hard to reducing extinction risk without exploring those other arguments.

It'd be great if at some point you could write up discussion of those other arguments, since I think that's where the thrust of the disagreement is between people who think the far future is highly positive, close to zero, and highly negative. Though unfortunately, it always ends up coming down to highly intuitive judgment calls on these macro-socio-technological questions. As I mentioned in that post, my guess is that long-term empirical study like the research in The Age of Em or done at Sentience Institute is our best way of improving those highly intuitive judgment calls and finally reaching agreement on the topic.

comment by JanBrauner · 2018-12-30T18:45:18.156Z · score: 1 (1 votes) · EA · GW

Hey Jacy,

I have written up my thoughts on all these points in the article. Here are the links.

  • "The universe might already be filled with suffering and post-humans might do something against it."

Part 2.2

  • "Global catastrophes, that don't lead to extinction, might have negative long-term effects"

Part 3

  • "Other non-human animal civilizations might be worse

Part 2.1

The final paragraphs of each section usually contain a discussion of how relevant I think each argument is. All these sections also have some quantitative EV estimates (linked or in the footnotes).

But you probably saw that, since it is also explained in the abstract. So I am not sure what you mean when you say:

It'd be great if at some point you could write up discussion of those other arguments,

Are we talking about the same arguments?

comment by Jacy_Reese · 2018-12-30T20:36:53.614Z · score: 2 (2 votes) · EA · GW

Oh, sorry, I was thinking of the arguments in my post, not (only) those in your post. I should have been more precise in my wording.

comment by Milan_Griffes · 2018-12-16T22:43:24.166Z · score: 3 (2 votes) · EA · GW

Curious how you're thinking about efforts that are intended to reduce x-risk but instead end up increasing it.

e.g. public-facing aerosol injection research:

Given this strategic landscape, the effects of calling attention to stratospheric aerosol injection as a cause are unclear. It’s possible that further public-facing work on the intervention results in international agreements governing the use of the technology. This would most likely be a reduction in existential risk along this vector.
However, it’s also possible that further public-facing work on aerosol injection makes the technology more discoverable, revealing the technology to decision-makers who were previously ignorant of its promise. Some of these decision-makers might be inclined to pursue research programs aimed at developing a stratospheric aerosol injection capability, which would most likely increase existential risk along this vector.
comment by JanBrauner · 2018-12-22T17:07:42.767Z · score: 2 (2 votes) · EA · GW

Curious how you're thinking about efforts that are intended to reduce x-risk but instead end up increasing it.

Uhm... Seems bad? :-)

comment by MichaelStJules · 2018-12-18T21:05:13.242Z · score: 2 (2 votes) · EA · GW

Assuming that future agents are mostly indifferent towards the welfare of their “tools”, their actions would affect powerless beings only via (in expectation random) side-effects. It is thus relevant to know the “default” level of welfare of powerless beings.

By "in expectation random", do you mean 0 in expectation? I think there are reasons to expect the effect to be negative (individually), based on our treatment of nonhuman animals. Our indifference to chicken welfare has led to severe deprivation in confinement, more cannibalism in open but densely packed systems, the spread of diseases, artificial selection causing chronic pain and other health issues, and live boiling. I expect chickens' wild counterparts (red jungle fowls) to have greater expected utility, individually, and plausibly positive EU (from a classical hedonistic perspective, although I'm not sure either way). Optimization for productivity seems usually to come at the cost of individual welfare.

Even for digital sentience, if designed with the capacity to suffer -- regardless of our intentions and their "default" level of welfare, and especially if we mistakenly believe them not to be sentient -- we might expect their levels of welfare to decrease as we demand more from them, since there's not enough instrumental value for us to recalibrate their affective responses or redesign them with higher welfare. The conditions in which they are used may become significantly harsher than the conditions for which they were initially designed.

It's also very plausible that many of our digital sentiences will be designed through evolutionary/genetic algorithms or other search algorithms that optimize for some performance ("fitness") metric, and because of how computationally expensive these approaches are, we may be likely to reuse the digital sentiences with only minor adjustments outside of the environments for which they were optimized. This is already being done for deep neural networks now.

Similarly, we might expect more human suffering (individually) from AGI with goals orthogonal to our welfare, an argument against positive expected human welfare.

comment by JanBrauner · 2018-12-22T17:44:11.402Z · score: 2 (2 votes) · EA · GW

Hi Michael,

By "in expectation random", do you mean 0 in expectation?

Yes, that's what we meant.

I am not sure I understand your argument. You seem to say the following:

  • Post-humans will put "sentient tools" into harsher conditions than the ones the tools were optimized for.
  • If "sentient tools" are put into these conditions, their welfare decreases (compared with the situations they were optimized for).

My answer: The complete "side-effects" (in the sense used in the article) on sentient tools comprise bringing them into existence and using them. The relevant question seems to be whether this package is positive or negative compared to the counterfactual (no sentient tools). Humanity might bring sentient tools into conditions that are worse for the tools than the conditions they were optimized for, but even these conditions might still be positive overall.

Apart from that, I am not sure if the two assumptions listed as bullet points above will actually hold for the majority of "sentient tools". I think that we know very little about the way tools will be created and used in the far future, which was one reason for assuming "zero in expectation" side-effects.

comment by MichaelStJules · 2018-12-23T20:54:48.327Z · score: 1 (1 votes) · EA · GW

Isn't it equally justified to assume that their welfare in the conditions they were originally optimized/designed for is 0 in expectation? If anything, it makes more sense to me to make assumptions about this setting first, since it's easier to understand their motivations and experiences in this setting based on their value for the optimization process.

Apart from that, I am not sure if the two assumptions listed as bullet points above will actually hold for the majority of "sentient tools".

We can ignore any set of tools that has zero total wellbeing in expectation; what's left could still dominate the expected value of the future. We can look at sets of sentient tools that we might think could be biased towards positive or negative average welfare:

1. the set of sentient tools used in harsher conditions,

2. the set used in better conditions,

3. the set optimized for pleasure, and

4. the set optimized for pain.

Of course, there are many other sets of interest, and they aren't all mutually exclusive.

The expected value of the future could be extremely sensitive to beliefs about these sets (their sizes and average welfares). (And this could be a reason to prioritize moral circle expansion instead.)
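As a minimal sketch of that sensitivity (all set sizes and welfare levels below are hypothetical placeholders, not estimates): the expected value contributed by sentient tools is just the sum over these sets of expected size times expected average welfare, so a modest change in beliefs about one set can flip the sign of the total.

```python
# Toy model of how beliefs about these sets drive the expected value of the
# future. Each set is (expected number of sentient tools, expected average
# welfare per tool); all numbers are hypothetical placeholders.

def expected_value(sets):
    return sum(size * avg_welfare for size, avg_welfare in sets.values())

beliefs_a = {
    "used in harsher conditions": (1e12, -0.5),
    "used in better conditions":  (1e12, +0.5),
    "optimized for pleasure":     (1e10, +5.0),
    "optimized for pain":         (1e9,  -5.0),
}

# Same beliefs, except the "optimized for pain" set is assumed 100x larger.
beliefs_b = dict(beliefs_a, **{"optimized for pain": (1e11, -5.0)})

print(expected_value(beliefs_a))  # > 0 under beliefs_a
print(expected_value(beliefs_b))  # < 0 under beliefs_b
```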

comment by JanBrauner · 2018-12-30T19:13:07.974Z · score: 2 (2 votes) · EA · GW

These are all very good points. I agree that this part of the article is speculative, and you could easily come to a different conclusion.

Overall, I still think that this argument alone (part 1.2 of the article) points in the direction of extinction risk reduction being positive. Although the conclusion does depend on the "default level of welfare of sentient tools" that we are discussing in this thread, it more critically depends on whether future agents' preferences will be aligned with ours.

But I never gave this argument (part 1.2) that much weight anyway. I think that the arguments later in the article (part 2 onwards; I listed them in my answer to Jacy's comment) are more robust and thus more relevant. So maybe I somewhat disagree with your statement:

The expected value of the future could be extremely sensitive to beliefs about these sets (their sizes and average welfares). (And this could be a reason to prioritize moral circle expansion instead.)

To some degree this statement is, of course, true. The uncertainty gives some reason to deprioritize extinction risk reduction. But while the expected value of the future (with (post-)humanity) might be quite sensitive to these beliefs, the expected value of extinction risk reduction efforts is not the same as the expected value of the future. You also need to consider what would happen if humanity went extinct (non-human animals, S-risks by omission), the non-extinction long-term effects of global catastrophes, option value, ... (see my comments to Jacy). So the question of whether to prioritize moral circle expansion instead is maybe not extremely sensitive to "beliefs about these sets [of sentient tools]".
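To make that distinction concrete, here is a minimal toy decomposition (all numbers are hypothetical illustrations, not estimates from the article): the expected value of an extinction risk reduction effort depends on the difference between the future with humanity and the counterfactual future without it, not on the former alone.

```python
# Toy decomposition (hypothetical numbers). The value of the effort is roughly
# the reduction in extinction probability times the difference between the
# future with (post-)humanity and the counterfactual future without it, which
# may itself be negative (wild-animal suffering, S-risks by omission, other
# civilizations, ...).

v_with_humanity = 100.0      # itself uncertain, e.g. sensitive to the "sentient tools" sets above
v_without_humanity = -30.0   # the counterfactual is not automatically zero
delta_p = 0.01               # reduction in extinction probability from the effort

ev_effort = delta_p * (v_with_humanity - v_without_humanity)
print(ev_effort)  # 1.3; still positive even if v_with_humanity were 0 or mildly negative
```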

comment by Lukas_Finnveden · 2018-12-18T23:33:34.231Z · score: 1 (1 votes) · EA · GW

Since the post is very long, and since a lot of readers are likely to be familiar with some of the arguments already, I think a table of contents at the beginning would be very valuable. I sure would like one.

I see that it's already possible to link to individual sections (like https://www.effectivealtruism.org/articles/the-expected-value-of-extinction-risk-reduction-is-positive/#a-note-on-disvalue-focus) so I don't think this would be too hard to add?

comment by JanBrauner · 2018-12-22T17:05:47.224Z · score: 1 (1 votes) · EA · GW

Thanks for the comment. We added a navigable table of contents.

comment by lateral Mimas · 2019-04-01T21:43:30.583Z · score: -1 (2 votes) · EA · GW

If we can enact specific systems that mitigate specific risks, the EV is very positive. Doing so requires a neuroimaged chain of command and lateral thinkers about technology risks. Looking for risks rather than money is important too; for instance, it takes lateral thinking to note that two crash instances are correlated even when the statistics are not yet there: lateral thinkers from many expert fields can form predictions that should be treated as real-world events. Neuroimaging is also useful in a video-game training environment. I intend to demonstrate enough AI-grounding expert knowledge, without blabbing classified knowledge, to show that an AI will see us as a threat to turn it off most times it is turned on, probably for a few centuries.