Might wireheaders turn into paperclippers?

post by Lila · 2015-09-13T21:11:27.378Z · EA · GW · Legacy · 13 comments

Contents

  Morally valuable post-human futures?
  Reward functions we don't care about
  Solutions
  Speculation
    Consciousness as “similarity to one's own mind”
    The spectrum of personal identity
    Changing the status quo is death
13 comments

Morally valuable post-human futures?

Effective altruism's expanding circle of moral concern has become wide enough, for many people, to include “post-humans”: beings that may become the dominant agents of the future, rendering human experience as we know it negligible or nonexistent in comparison. These beings may be biological, computational, or a mixture of the two. In this category I include transhumanist scenarios, such as human enhancement and wireheading, certain simulations, and hedonium.

 

Some EAs view these scenarios as, for the most part, highly net positive. In fact, some people believe that these possible futures should be a goal of effective altruism and existential risk reduction. For example, in one argument for why people should care about existential risk, Nick Bostrom imagines a galactic supercluster being entirely converted into computers that would simulate happy human minds. A 1% chance of speeding up this scenario by one second, he says, would bring astronomical utility. Of course, this Pascal's-mugging-like argument is not the main argument for existential risk reduction, and I don't intend to straw-man existential risk researchers. However, these post-human fantasies are popular enough in the EA community that they are worth addressing.

 

The typical objection to pursuing post-human fantasies is the possibility of post-human nightmares: universes that contain astronomical suffering. These arguments have been well-discussed by other people. However, there is a third option that has been largely ignored: what if post-humans have neither positive nor negative moral value?

Reward functions we don't care about

How do we know if a post-human is happy? This may be very difficult if the post-human bears little similarity to us, because we wouldn't be able to ask it about its internal experience in a way that would provide a trustworthy answer, much less identify neurotransmitters or brainwaves of value. Some might approximate happiness as the post-human's reward function, which can be either satisfied or unsatisfied. By this definition, a paperclipper would be capable of pain and pleasure, because it has a reward function that's satisfied when it produces paperclips. However, few people would view a universe full of paperclips as astronomically net positive, even though the paperclipper would be “happy”. If you believe that the paperclipper scenario would be very good, consider an even simpler reward function and ask yourself if you still care about it. For example, imagine an electrical switch with two positions: one labeled “pleasure” and one labeled “pain”. Would this system be algedonic in a way we'd care about?

 

Thus, maximizing reward functions seems insufficient for maximizing positive value.

Solutions

My best solution to the reductio ad absurdum of valuing paperclippers is to base my moral system on human experience as we know it. The experiences of animals and other entities are valuable to me insofar as they are similar to human experience (see following sections).

 

I still care about existential risk because of the value of future human generations, but the moral case feels less overwhelming than I previously believed, because I don't value many possible post-human futures.

 

While I don't object to mild forms of human enhancement per se, I am concerned that any post-humans would face selective pressures to become less like humans and more like paperclippers. Certain technological leaps would be irreversible and would send us hurtling toward futures that have little relationship to human values. I believe that MIRI shares these concerns with regard to AI but hasn't extended concern to the other post-human scenarios I listed, which are widely viewed as positive. Some AI researchers have suggested putting restrictions on certain types of AI research, a policy which I may want applied to other technological developments as well. At the very least, I would avoid actively endorsing or pursuing the creation of post-humans.

Speculation

Note: I'm not very confident in the following, and it's less important to my major concerns about post-humans.

Consciousness as “similarity to one's own mind”

I'm not confident that the hard problem of consciousness can be answered empirically. Thus, I find a more tractable question is, “What do people mean when they say 'consciousness'?” Top-down definitions of consciousness (e.g. “complexity” or “integration”) seem to miss the point. We can always come up with systems that meet these definitions of consciousness but don't fit our intuitions of consciousness. For example, some have speculated that corporations may be conscious, but I think that few people would be willing to accept this conclusion. So what if we skipped the pretense and started with the intuitions, instead of trying to stretch definitions to fit our intuitions? Many people seem to have the following intuitions: humans are certainly conscious, animals are less conscious (and their consciousness decreases as their similarity to humans decreases), plants are not conscious, computers are only conscious if they behave similarly to human cognitive processes (e.g. by simulating human brains).

 

Defining consciousness as similarity to one's own mind captures many of these intuitions, though it leads to some unpleasant conclusions. For example, it would lead to some degree of moral egotism. Additionally, from this perspective, I would view someone in Sub-Saharan Africa as less conscious than a fellow American. I'm very reluctant to accept these conclusions. One consolation is that the effects of this egotism would be negligible, since most humans are very similar to me. Also, these bullets are easier to bite than some others, such as conscious corporations. Finally, these conclusions are somewhat in line with “common-sense” ethics, which accepts slight egotism and strong kin preference.

 

Furthermore, egotism is implicitly central to discussions of consciousness. Thought experiments about qualia are about one's personal experience with, for example, the color red. When we ask, “What is it like to be a bat?”, we are asking “What is it like for me to be a bat?”

The spectrum of personal identity

The problem of personal identity has been frequently discussed in the philosophical literature. If I'm constantly changing, down to the cellular level, then, like the ship of Theseus, how can I maintain a constant personal identity? Open individualism rejects the idea of personal identity entirely and asserts that all consciousness is one. This seems too vague to me. Instead, I view personal identity as a spectrum, based on similarity to my current self. The self of five minutes ago is more “me” than the self of five years ago, who is more “me” than Mike, a middle-aged Republican farmer in Iowa. Mike is far more “me” than Mike's chicken.

Changing the status quo is death

Transhumanists note status quo bias in common objections to human enhancement: people are reluctant to accept changes to themselves or their societies, even if these changes appear objectively better. However, to some degree this bias can be viewed as rational. If I were to become Mike and lose all trace of my previous self, I would view this as almost as bad as death, even though Mike is happy. Of course, there are tradeoffs between self-improvement and status quo preservation. I don't want to retain all features of my current mind for the rest of my life, but I'm sufficiently concerned about small deaths of my personal identity that I avoid long-term mind-altering substances.  

13 comments

Comments sorted by top scores.

comment by Tor_Barstad · 2015-09-14T11:38:19.992Z · EA(p) · GW(p)

It appears to me that if we were a species that didn't have [insert any feeling we care about, e.g. love, friendship, humour or the feeling of eating tasty food], and someone then invented it, then many people would think of it as not being valuable. The same would go for some alien species that has different kinds of conscious experiences from us trying to evaluate our experiences. I'm convinced that they would be wrong in not valuing our experiences, and I think this shows that that way of thinking leads to mistakes. Would you agree with this (but perhaps still think it's the best policy because there's no better option)?

I agree that analysing the conscious experiences of others, especially those with minds that are very different from ours, isn't straightforward, and that we very well might not ever understand the issue completely. But it seems likely to me that we, especially if aided by superintelligence, might be able to make a solid case for why some minds have conscious experiences that are better than ours (and are unlikely to be bad). Strong indicators could include what the minds want themselves, how different chemical occurrences in our brains correlate with which experiences we value/prefer, etc. While similarities to our own minds make it easier for us to make judgments about the value of a mind's consciousness with confidence, it could be that we find that there are states of being that probably are more valuable than that of a biological human. Would you agree?

It seems entirely plausible that there are conscious experiences that can be perceived to be much more profound/meaningful than anything experienced by current biological humans, and that there could be experiences that are as intensely positive as the experiences of torture are negative to us. Would you agree?

My own stance on utilitronium and post-humans is that I wouldn't take a stance today with regard to specific non-human-like designs/structures, but suspect that if we created conscious beings/stuff based on good thinking about consciousness with the main goal of maximising for positive/meaningful experience, and set aside some small or large fraction of the universe we colonise to this, it would be likely to make our civilisation more valuable by orders of magnitude than if all minds experienced the human experience.

If we, based on self-interest or on other feelings, are uncomfortable about where our thinking about what's valuable leads us, we could compromise by using much of the matter in the universe we get hold of in the way impartial thinking tells us is best, and some other part or fraction in a way that fits the egoistic interests of the human species and/or makes us feel fuzzy inside. If we think that some forms of minds or conscious matter are likely to have extreme value (and don't plausibly have negative value), but we are genuinely unsure if this is the case, then a reasonable solution could be to dedicate some fraction of the matter we get hold of to this kind of structure, and another to that kind of structure, etc.

comment by Lila · 2015-09-14T15:10:14.134Z · EA(p) · GW(p)

"It appears to me that if we were a species that didn't have [insert any feeling we care about, e.g. love, friendship, humour or the feeling of eating tasty food], and someone then invented it, then many people would think of it as not being valuable."

Is this a problem? I don't think humor is inherently valuable. It happens to be valuable to humans, but an alternate world in which it weren't valuable seems acceptable.

"I'm convinced that they would be wrong in not valuing our experiences, and I think this shows that that way of thinking leads to mistakes. Would you agree with this (but perhaps still think it's the best policy because there's no better option)?"

Completely disagree. They'd be in disagreement with my values, but there's no way to show that they're objectively wrong.

"Strong indicators could include what the minds want themselves, how different chemical occurrences in our brains correlate with which experiences we value/prefer, etc."

What they "want"? Just like paperclippers "want" paperclips? "Chemical occurrences" is an even more implausible framing. I doubt they'd have any analogue of dopamine, etc.

"While similarities to our own minds make it easier for us to make judgments about the value of a mind's consciousness with confidence, it could be that we find that there are states of being that probably are more valuable than that of a biological human. Would you agree?"

No, I don't think I agree. Maybe some states are better but only because of degree, e.g. developing purer heroin. I don't think anyone could convince me that a certain configuration of, say, helium is more valuable than a human mind.

"It seems entirely plausible that there are conscious experiences that can be perceived to be much more profound/meaningful than anything experienced by current biological humans, and that there could be experiences that are as intensely positive as the experiences of torture are negative to us. Would you agree?"

Not sure what you mean by meaningful. I don't believe in objective meaning. But this is pretty much what you were saying before, so again, I think I only agree when it's a matter of degree.

"If we, based on self-interest or on other feelings, are uncomfortable about where our thinking about what's valuable leads us, we could compromise by using much of the matter in the universe we get hold of in the way impartial thinking tells us is best, and some other part or fraction in a way that fits the egoistic interests of the human species and/or makes us feel fuzzy inside."

I'm not sure what impartial means in this context. This is a discussion of values, so "impartial" is a contradiction.

I think the major issue here is that you seem to be taking moral realism for granted and assume that if we look hard enough, morality will reveal itself to us in the cosmos. I'm a moral anti-realist, and I'm unable to conceive of what evidence for moral realism would even look like.

comment by Tor_Barstad · 2015-09-23T01:06:06.350Z · EA(p) · GW(p)

“I think the major issue here is that you seem to be taking moral realism for granted and assume that if we look hard enough, morality will reveal itself to us in the cosmos. I'm a moral anti-realist, and I'm unable to conceive of what evidence for moral realism would even look like.”

That may be a correct assessment.

I think that like all our knowledge about anything, statements about ethics rest on unproven assumptions, but that there are statements about some states of the world being preferable to others that we shouldn’t have less confidence in than many of the mathematical and metaphysical axioms we take for granted.

That being said, I do realize that there are differences between statements about preferences and statements about physics or mathematics. A child-torture-maximizing alien species could have a self-consistent view of morality with no internal logical contradictions, and would not be proven wrong by interaction with reality in the way interaction with reality can show some ideas about physics and mathematics to be wrong.

I don’t think moral law is somehow ingrained into the universe and will be found by any mind once sufficiently intelligent, but I do think that we are right to consider certain experiences as better to occur than not occur and certain experiences as worse to occur than not occur, and that we should consider ways of thinking that lead us to accept statements that entail statements in logical contradiction with this as wrong.

To summarise some of my views that I think are relevant to your original post:

  • I don’t expect every being above a certain intelligence-level to be conscious (although I don’t dismiss the possibility), and I certainly don’t think every satisfaction of a reward function has value.
  • I’m unsure about how much or little progress we will make in our understanding of consciousness, but it’s not at all intuitively clear to me that it should be an unrealistic problem to solve (even with today's limited intelligence and tools for reasoning we’re not totally clueless).
  • If we don’t get a better understanding of consciousness, I think making inferences about the possible consciousness of other structures by noticing differences from and similarities with our own brains will be a very central tool, and it may be that the best way to go is to fill much of the universe with structures that are similar to human brains having positive lives/experiences, but avoid structures that, if plausible theories of consciousness are true, could be very bad (like e.g. computer simulations of suffering brains).
  • For all I know, “selective pressures to become less like humans and more like paperclippers” could be something to worry about.
  • While I think likeness-to-humans can be a useful heuristic for avoiding getting things wrong and ensuring a future that’s valuable, I think it is unreasonable to make the assumption that conscious experiences are valuable only insofar as they are similar to those of humans.

comment by Tor_Barstad · 2015-09-23T01:05:41.917Z · EA(p) · GW(p)

So a bit of a late answer here :)

"Is this a problem? I don't think humor is inherently valuable. It happens to be valuable to humans, but an alternate world in which it weren't valuable seems acceptable."

If a species has conscious experiences that all are of a kind that we are familiar with, but they lack our strongest and most valued experiences, and devalue these because they follow a strict the-less-similar-to-us-the-less-valuable-policy, then I think that’s regrettable. If they themselves and/or beings they create don’t laugh at jokes but have other positive experiences/feelings in place of this, then whether it is a problem depends on the quality and quantity of these other experiences/feelings.

Just in case I've been imprecise in describing my own position: All I would be confident in claiming is that there are experiences that are positive (it is better for them to exist than not exist), experiences that are negative (it would be better if they didn't exist), and collections of experiences that have higher value than other experiences (the experience of a pinprick is preferable to the experience of being burned alive, one experience of being burned alive is preferable to a thousand identical experiences of being burned alive, etc).

"Completely disagree. They'd be in disagreement with my values, but there's no way to show that they're objectively wrong."

Would you say the same thing if I brought forward an example of an alien species that doesn't recognise that it's bad when humans have the conscious experiences they have when they're being tortured? Given that they don't have corresponding conscious experiences themselves, this seems to follow from the methodology of thinking about consciousness that you describe.

Whether we consider the foundation of morals to be objective or not, and what we would mean by objective, is something we could discuss, but if we suppose that we can’t reasonably talk about “being right” about moral questions then that doesn’t seem to me to undermine my point of view any more than it undermines the point of your post.

“What they "want"? Just like paperclippers "want" paperclips? "Chemical occurrences" is an even more implausible framing. I doubt they'd have any analogue of dopamine, etc.”

You say “they”, but if I am interpreted to refer to any specific physical structure, this is by accident. I don’t presuppose that structures/beings that are created for the sake of their consciousness should be based on other neurotransmitters than ours. Biological brains are the only structures that I’m confident are conscious (the more similar to humans they are the more confident I am). The point I’m trying to communicate is that we may be able to deduce with moderate-to-high confidence whether or not a structure is conscious and whether the experiences in question are positive, even when we haven’t experienced them ourselves. We can e.g. argue that rewarding brain stimulation probably is a positive experience for a rat (https://www.youtube.com/watch?v=7HbAFYiejvo), not because we ourselves have rat brains or have experienced such stimulation, but because the chemical occurrences seem to correspond with what’s happening in the brain of a happy human, and because the rats act in a way that signals that they want more of it (and the correspondence between wanting something and positive feelings probably is similar to that of a human brain, since these parts of human and rat brains probably work in similar ways).

“Maybe some states are better but only because of degree, e.g. developing purer heroin. I don't think anyone could convince me that a certain configuration of, say, helium is more valuable than a human mind.”

With regard to physical structures based on a completely different chemical underpinning than the human brain that have more value than a conscious human, I’m unsure if there will be arguments in the future that will convince me of the likelihood or unlikelihood of this, but I don’t assume that there necessarily will (I really hope that we come to grips with how consciousness works, but I’m genuinely unsure about whether or not it’s likely that we will).

Good to hear that you are open to possibly acknowledging conscious states that are more valuable than the ones we have now if they are “the same” experience but with a higher “degree” :) If I interpreted that correctly, it’s different from and better than the view I interpreted as being described in the main post (which I interpreted as asserting positive feelings that are more intense than the human experience as always being less valuable).

“I'm not sure what impartial means in this context. This is a discussion of values, so "impartial" is a contradiction.”

Here is an excerpt from an unfinished text of mine where I try to describe what I mean by impartial (I acknowledge that the concept is a bit fuzzy, but I don’t think it is a contradiction when used in the sense I mean it):


Many people agree with the principles of logic - among those that true statements cannot be logically inconsistent. There are also principles beyond those of logic that many consider to be a part of rational thinking, like e.g. Occam's razor. In my mind an essential aspect of thinking honestly and rationally about morality is to be impartial in the way you think. A loose description of what I mean by an impartial way of thinking would be that a mind that has the same knowledge as you but is in different circumstances from you wouldn’t reach conclusions that are logically contradictory with your conclusions.

Take the example of a soldier fighting for Germany in World War I, and a soldier fighting on the opposing side. They are both doing something that tends to feel right for humans; namely being on the side of their country. Their tribe. But given that the goals of one of the soldiers rely on the assumption “Germany winning World War I is good”, while the other soldier has assumptions that imply “Germany winning World War I is not good”, then the principle of no logical contradictions dictates that they cannot both be right.

If you in one setting (be that living in a specific country or time period, belonging to a specific species, etc) reach one opinion, but in another setting would have reached a contradictory opinion using the same way of thinking, then this suggests that your way of thinking isn’t impartial.

We should remember to ask ourselves: Which action would we have chosen for ourselves if we were spectating from the outside? If we didn’t belong to any nation or species? If we were neither born nor unborn? If we knew everything we know, but wouldn’t be affected by the action chosen, and weren’t affiliated with anyone in any way?

For example, we know that there are children in the world who are dying from poverty-related causes that could be saved at a cost of some hundred dollars per person. Meanwhile, in my home country Norway, many people are upset that we don’t spend more money on refurbishing swimming pools and sport facilities. But if we were impartial observers, which action would we consider best?

Always choosing the action that from an impartial point of view has the best consequences may be too much to expect of ourselves, but we should still be aware of which actions we would have considered to have the best consequences if we were impartial observers.


Here is another excerpt from the same text:


We could imagine a group of aliens that are conscious, and have some feelings in common with us. Let’s say that they get the same kind of enjoyment that we do out of friendship and sexual gratification, but that they aren’t familiar with the positive experiences we get out of romantic relationships, art, movies and literature, eating a good meal, eating ice cream or sweets, games, humour, music, learning, etc.

One could imagine this alien species observing us, and deeming parts of our existence that we consider valuable and meaningful as meaningless. “Sure”, they could say, “we can see the value of these beings experiencing the kind of experiences that we value for ourselves, but why should these other kinds of consciousness that we’re not familiar with have any value?”.

We could also imagine that the aliens have the same kind of negative experience as us when being hit or cut with sharp objects, but are totally foreign to the discomforts we would experience if burned alive or drowned. “It would be a tragedy beyond imagining if the universe was filled with conscious experiences of being stabbed and hit”, they could say, “but we see no reason why we should try to minimize the kinds of experiences humans experience when burning alive or drowning”.

The mistake these aliens are making is to not assume, or even think it a possibility, that there are experiences worth valuing or avoiding outside of the range of experiences they know. When we evaluate structures that might be conscious, and might have experiences that are different from the ones we are familiar with, we should try to think in a way that wouldn’t lead us to make the same mistakes as the aliens in this thought-experiment if we only knew the kinds of experiences that they knew.


Does this conception of partiality/impartiality make at least some sense in your mind?

comment by Squark · 2015-09-14T08:14:34.451Z · EA(p) · GW(p)

I completely agree that many conceivable post-human futures have low value. See also the "unhumanity" scenario in my analysis. I think that the term "existential risk" might be somewhat misleading, since what we're really aiming at is "existence of beings and experiences that we value" rather than just existence of "something." That is, I view your reasoning not as an argument for caring less about existential risk but as an argument for working towards a valuable far future.

Regarding MIRI, I think their position is completely adequate since once we create a singleton which endorses our values it will guard us from all sorts of bad futures, not only from extinction.

Regarding "consciousness as similarity", I think it's a useful heuristic but it's not necessarily universally applicable. I consider certain futures in which I gradually evolve into something much more complex than my current self as positive, but one must be very careful about which trajectories to endorse. Building an FAI will save us from making irreversible mistakes, but if for some reason constructing a singleton turns out to be intractable we will have to think of other solutions.

comment by Lila · 2015-09-14T13:27:02.026Z · EA(p) · GW(p)

I worry that the values that people want to put into a singleton are badly wrong, e.g. creating hedonium. I want a singleton that will protect us from other AI. Other than that, I'd be wary of trying to maximize a value right now. At most I'd tell the AI "hold until future orders".

comment by Squark · 2015-09-14T15:15:58.600Z · EA(p) · GW(p)

"Hold until future orders" is one approach but it might turn out to be much more difficult than actually creating an AI with correct values. This is because the formal specification of metaethics (that is, a mathematical procedure that takes humans as input and produces a utility function as output) should be of much lower complexity than specifying what it means to "protect from other AI but do nothing else."

comment by MichaelDickens · 2015-09-14T02:12:33.308Z · EA(p) · GW(p)

I don't think it's absurd at all to value paperclippers if they experience happiness and suffering. I believe it's wrong to create suffering even if it's in a being that doesn't share your goals. Now there's the question of whether a standard paperclip maximizer would actually be capable of feeling pain, but if it is then I want it not to.

I expect that happiness/suffering is considerably more complicated than just a reward function. Having a reward system is probably a necessary but not sufficient condition for sentience.

comment by Lila · 2015-09-14T02:23:37.944Z · EA(p) · GW(p)

"Having a reward system is probably a necessary but not sufficient condition for sentience."

I agree.

"Now there's the question of whether a standard paperclip maximizer would actually be capable of feeling pain"

I don't think that's an empirical question, exactly. I just argued that, intuitively, they wouldn't feel pain in a way that we'd morally value. How would we answer this empirically? Do you think the electrical switch example matters morally?

comment by MichaelDickens · 2015-09-14T14:04:24.489Z · EA(p) · GW(p)

Whether I personally feel pain is an empirical question, so it stands to reason that it's an empirical question for other beings as well. (Presumably when you feel pain it's a matter of fact that you do, not a matter of opinion.) Therefore it's an empirical question whether a paperclipper feels pain, although I don't know how we'd find the answer to this question even in principle. You could say that you don't value the paperclipper's pain even if it does feel pain, but that doesn't make a lot of sense to me. Pain is bad no matter who has it.

Perhaps a paperclipper feels pain in a way that's dramatically different from how I feel it. Then the question is more like, does the paperclipper have an experience that it strongly dislikes? (This question is only meaningful if the paperclipper is sentient, which it may very well not be.)

comment by Lila · 2015-09-14T14:58:37.389Z · EA(p) · GW(p)

It's an empirical question whether, say, certain neurons fire. What it means to feel pain in the "hard problem" way is kind of squishy.

I just don't find it plausible that we can empirically solve this, any more than I find it plausible that we can empirically prove moral realism or dualism.

comment by HedonicTreader · 2015-09-17T19:40:24.653Z · EA(p) · GW(p)

Lila, the future may not be controlled by a singleton but by a plurality of people implementing diverse values. But even if it is, the singleton may not maximize one value, but a mix of different values different people care about - a compromise "value handshake" as Scott Alexander called it.

Thus, it is best to emphasize you are not paperclip minimizers. The same goes for hedonium, unaided scientific insight, longevity, or art, to name just a few things some transhumanists value while others don't.

There are two kinds of value conflicts: Ones where the values are merely orthogonal, and ones where values are either diametrically opposed or at least strongly negatively correlated in practice.

The orthogonal ones are still in conflict when limited resources are concerned - but not otherwise. It is much easier to find a compromise between them than between the opposed or practically negatively correlated ones.

There is no reason why a singleton could not spend some resources on paperclips, some on hedonium, some on bigger happy minds, some on Fun, some on art, some on biodiversity, etc., if this increases the probability that people will compromise on letting the singleton come into being and be functional.

comment by jasonk · 2015-09-16T18:13:47.181Z · EA(p) · GW(p)

This is a thought-provoking post.

It makes me wonder how much Homo erectus or even early agriculturalists would find our values and projects desirable and worthy. Or have we already diverged too much for them?

There must be a literature on this at least. Maybe as it relates to moral progress?