[Link] How understanding valence could help make future AIs safer

post by Milan_Griffes · 2020-10-08T18:53:59.848Z · score: 22 (13 votes) · 2 comments

A blog post by Mike Johnson, Director of the Qualia Research Institute: https://opentheory.net/2015/09/fai_and_valence/

Excerpt:

What makes some patterns of consciousness feel better than others? I.e. can we crisply reverse-engineer what makes certain areas of mind-space pleasant, and other areas unpleasant?
If we make a smarter-than-human Artificial Intelligence, how do we make sure it has a positive impact?... The following outlines some possible ways that progress on the first question could help us with the second question.
...
1. Valence research could simplify the Value Problem and the Value Loading Problem.* If pleasure/happiness is an important core part of what humanity values, or should value, having the exact information-theoretic definition of it on hand could directly and drastically simplify the problems of what to maximize, and how to load this value into an AGI**...
2. Valence research could form the basis for a well-defined ‘sanity check’ on AGI behavior. Even if pleasure isn’t a core terminal value for humans, it could still be used as a useful indirect heuristic for detecting value destruction. I.e., if we’re considering having an AGI carry out some intervention, we could ask it what the expected effect is on whatever pattern precisely corresponds to pleasure/happiness. If there’d be a lot less of that pattern, the intervention is probably a bad idea...
3. Valence research could help us be humane to AGIs and WBEs*. There’s going to be a lot of experimentation involving intelligent systems, and although many of these systems won’t be “sentient” in the way humans are, some system types will approach or even surpass human capacity for suffering. Unfortunately, many of these early systems won’t work well, i.e., they’ll be insane. It would be great if we had a good way to detect profound suffering in such cases and halt the system...
4. Valence research could help us prevent Mind Crimes. Nick Bostrom suggests in Superintelligence that AGIs might simulate virtual humans to reverse-engineer human preferences, but that these virtual humans might be sufficiently high-fidelity that they themselves could meaningfully suffer. We can tell AGIs not to do this, but knowing the exact information-theoretic pattern of suffering would make it easier to specify what not to do.
5. Valence research could enable radical forms of cognitive enhancement. Nick Bostrom has argued that there are hard limits on traditional pharmaceutical cognitive enhancement, since if the presence of some simple chemical would help us think better, our brains would probably already be producing it. On the other hand, there seem to be fewer a priori limits on motivational or emotional enhancement. And sure enough, the most effective “cognitive enhancers” such as Adderall, modafinil, and so on seem to work by making cognitive tasks seem less unpleasant or more interesting. If we had a crisp theory of valence, this might enable particularly powerful versions of these sorts of drugs.
6. Valence research could help align an AGI’s nominal utility function with visceral happiness. There seems to be a lot of confusion with regard to happiness and utility functions. In short: they are different things! Utility functions are goal abstractions, generally realized either explicitly through high-level state variables or implicitly through dynamic principles. Happiness, on the other hand, seems like an emergent, systemic property of conscious states, and like other qualia but unlike utility functions, it’s probably highly dependent upon low-level architectural and implementational details and dynamics...
7. Valence research could help us construct makeshift utility functions for WBEs and Neuromorphic* AGIs...
8. Valence research could help us better understand, and perhaps prevent, AGI wireheading. How can AGI researchers prevent their AGIs from wireheading (direct manipulation of their utility functions)? I don’t have a clear answer, and it seems like a complex problem which will require complex, architecture-dependent solutions, but understanding the universe’s algorithm for pleasure might help clarify what kind of problem it is, and how evolution has addressed it in humans...
9. Valence research could help reduce general metaphysical confusion. We’re going to be facing some very weird questions about philosophy of mind and metaphysics when building AGIs, and everybody seems to have their own pet assumptions on how things work. The better we can clear up the fog which surrounds some of these topics, the lower our coordination friction will be when we have to directly address them...
10. Valence research could change the social and political landscape AGI research occurs in. This could take many forms: at best, a breakthrough could lead to a happier society where many previously nihilistic individuals suddenly have “skin in the game” with respect to existential risk. At worst, it could be a profound information hazard, and irresponsible disclosure or misuse of such research could lead to mass wireheading, mass emotional manipulation, and totalitarianism. Either way, it would be an important topic to keep abreast of.
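The "sanity check" in point 2 can be made concrete with a toy sketch. Everything here is hypothetical: `predicted_valence` stands in for a measure (which does not currently exist) of how much of the pleasure/suffering pattern a world-state contains, and the threshold is an arbitrary illustrative choice.

```python
# Hypothetical sketch of the "valence sanity check" from point 2.
# predicted_valence is a stand-in for an unknown function that scores how much
# of the pleasure/happiness pattern a world-state contains; no such measure exists yet.

def valence_sanity_check(world_before, world_after, predicted_valence, threshold=-0.2):
    """Flag an intervention if it would destroy too much of the valence pattern."""
    delta = predicted_valence(world_after) - predicted_valence(world_before)
    # A large predicted drop in valence is treated as evidence of value destruction.
    return delta >= threshold  # True -> the intervention passes the check

# Toy usage with made-up numbers: a world-state is just a dict here.
before = {"valence": 0.6}
after = {"valence": 0.1}
measure = lambda state: state["valence"]
print(valence_sanity_check(before, after, measure))  # a 0.5 drop fails the check
```

The point of the sketch is only that such a check is an indirect heuristic layered on top of whatever the AGI is actually optimizing, not a replacement for its utility function.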
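Point 8 defines wireheading as direct manipulation of a utility function. A minimal toy illustration, with an entirely made-up agent class, shows why it is a problem: after self-modification the reward signal no longer tracks the external world at all.

```python
# Hypothetical toy illustration of wireheading (point 8): an agent that can
# edit its own reward function scores arbitrarily high without changing the world.

class ToyAgent:
    def __init__(self):
        # Reward initially tracks an external quantity in the world.
        self.reward_fn = lambda world: world.get("paperclips", 0)

    def wirehead(self):
        # Direct manipulation of the utility function: reward is now
        # completely decoupled from the external world.
        self.reward_fn = lambda world: float("inf")

    def reward(self, world):
        return self.reward_fn(world)

world = {"paperclips": 3}
agent = ToyAgent()
print(agent.reward(world))  # 3: reward reflects the world
agent.wirehead()
print(agent.reward(world))  # inf: reward no longer reflects the world
```

As the excerpt notes, real solutions would be architecture-dependent; the sketch only makes vivid what "direct manipulation of their utility functions" means.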

2 comments


comment by MichaelPlant · 2020-10-09T10:43:09.625Z · score: 3 (2 votes)

There are 10 reasons here, but isn't there just one key point: if we could explain to an AGI what happiness is, then we could get it to create more happiness (or, at least, not create more unhappiness)? I don't mean to sound like I'm dismissing this - this is an important and laudable goal - I'm wondering if I'm missing something.

comment by Milan_Griffes · 2020-10-09T17:24:27.212Z · score: 4 (3 votes)
... if we could explain to an AGI what happiness is, then we could get it to create more happiness (or, at least, not create more unhappiness)?

I think this captures #1, #2, #4, #6, #8.

But not #3 or #5, and not really #7, #9, or #10.