if we don't know how to control alignment, there's no reason to think there won't someday be significantly non-aligned ones, and we should plan for that contingency.
I at least approximately agree with that statement.
I think there'd still be some reasons to think there won't someday be significantly non-aligned AIs. For example, a general argument like: "People really really want to not get killed or subjugated or deprived of things they care about, and typically also want that for other people to some extent, so they'll work hard to prevent things that would cause those bad things. And they've often (though not always) succeeded in the past."
(Some discussions of this sort of argument can be found in the section "Should we expect people to handle AI safety and governance issues adequately without longtermist intervention?" in Crucial questions.)
But I don't think those arguments make significantly non-aligned AIs implausible, let alone impossible. (Those are both vague words. I could maybe operationalise that as something like a 0.1-50% chance remaining.) And I think that that's all that's required (on this front) in order for the rest of your ideas in this post to be relevant.
In any case, both that quoted statement of yours and my tweaked version of it seem very different from the claim "if we don't currently know how to align/control AIs, it's inevitable there'll eventually be significantly non-aligned AIs someday"?

michaela on EA reading list: other reading lists
Yeah, I wasn't being totally clear with respect to what I was really thinking in that context. I was thinking "from the point of view of people who have just been devastated by some not-exactly superintelligent but still pretty smart AI that wasn't adequately controlled, people who want to make that never happen again, what would they assume is the prudent approach to whether there will be more non-aligned AI someday?", figuring that they would think "Assume that if there are more, it is inevitable that there will be some non-aligned ones at some point". The logic being that if we don't know how to control alignment, there's no reason to think there won't someday be significantly non-aligned ones, and we should plan for that contingency.

jacobpfau on My Meta-Ethics and Possible Implications for EA
Here's another way of explaining where I'm coming from. The meaning of our words is set by ostensive definition plus our inductive priors. E.g. when defining red and purple we agree upon some prototypical cases of red and purple by perhaps pointing at red and saying 'red'. Then upon seeing maroon for the first time, we call it red because our brains process maroon in a similar way to how they process red. (Incidentally, the first part -- pointing at red -- is also only meaningful because we share inductive priors around pointing and object boundaries.) Of course in some lucky cases, e.g. 'water', 'one', etc., a scientific or formal definition appears coextensive with the ostensive definition and so is preferred for some purposes.
As another example take durian. Imagine you are trying to explain what the word tasty means and so you feed someone some things that are tasty to you e.g. candy and durian. Unfortunately people have very different reactions to durian, so it would not be a good idea to use durian to try to define 'tasty'. In fact, if all the human race ate was durian, we could not use the word tasty in the same way. In a world with only one food and in which people randomly liked or disliked that food, a word similar to 'tasty' would describe people (and their reactions) not the food itself.
Returning to moral language, we almost uniformly agree about the experience of tripping and skinning your knee. This lets moral language get off the ground, and puts us in our world as opposed to the 'durian only moral world'. There are some examples of phenomena over which we disagree: perhaps inegalitarian processes are one. Imagine a wealthy individual decides to donate her money to the townspeople, but distributes her wealth based on an apparently arbitrary 10 second interview with each townsperson. Perhaps some people react negatively, feeling displeasure and disgust when hearing about this behavior, whereas others see this behavior as just as good as if she had uniformly distributed the wealth. This connects with what I was saying above:
Sometimes there remains disagreement, and I think you could explain this by saying our use of moral language has two levels: the individual and the community. In enough cases to achieve shared reference, the community agrees (because their simulations match up adequately) but in many, perhaps most, cases there is no consensus.
I privilege uses of moral language as applied to experiences and in particular pain/pleasure because these are the central cases over which there is agreement, and from which the other uses of moral language flow. There's considerable variance in our inductive priors, and so perhaps for some people the most natural way to extend uses of moral language from its ostensive childhood basis includes inegalitarian processes. Nevertheless inegalitarian processes cannot be seen as the basis for moral language. That would be like claiming the experience of eating durian can be used to define 'tasty'. I do agree that injunctions may perhaps be the first use we learn of 'bad', but the use of 'bad' as part of moral language necessarily connects with its use in referring to pain and pleasure, otherwise it would be indistinguishable from expressions of desire/threats on the part of the speaker.

michaelstjules on The problem with person-affecting views
Also, in my view, a symmetric total view applied to preference consequentialism is the worst way to do preference consequentialism (well, other than obviously absurd approaches). I think a negative view/antifrustrationism or some mixture with a "preference-affecting view" is more plausible.
The reason I think this is that, rather than satisfying your existing preferences, it can be better to create new preferences in you and satisfy them, against your wishes. This undermines the appeal of autonomy and subjectivity that preference consequentialism had in the first place. If, on the other hand, new preferences don't add positive value, then they can't compensate for the violation of preferences, including the violation of preferences to not have your preferences manipulated in certain ways.

michaelstjules on The problem with person-affecting views
I think giving up IIA seems more plausible if you allow that value might be essentially comparative, and not something you can just measure in a given universe in isolation. Arrow's impossibility theorem can also be avoided by giving it up. And standard intuitions when facing the repugnant conclusion itself (and hence similar impossibility theorems) seem best captured by an argument incompatible with IIA, i.e. whether or not it's permissible to add the extra people depends on whether or not the more equal distribution of low welfare is an option.
It seems like most consequentialists assume IIA without even making this explicit, and I have yet to see a good argument for IIA. At least with transitivity, there are Dutch books/money pump arguments to show that you can be exploited if you reject it. Maybe there was some decisive argument in the past that led to consensus on IIA and no one talks about it anymore, except when they want to reject it?
Another option to avoid the very repugnant conclusion but not the repugnant conclusion is to give (weak or strong) lexical priority to very bad lives or intense suffering. Center for Reducing Suffering has a few articles on lexicality. I've written a bit about how lexicality could look mathematically here without effectively ignoring everything that isn't lexically dominating, and there's also rank-discounted utilitarianism: see point 2 in this comment, this thread, or papers on "rank-discounted utilitarianism".
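For a concrete illustration (my own sketch, not necessarily the exact formulation in those links), one common form of rank-discounted utilitarianism sorts welfare levels from worst off to best off, $u_{[1]} \le u_{[2]} \le \dots \le u_{[n]}$, and geometrically discounts by rank:

```latex
W \;=\; \sum_{i=1}^{n} \beta^{\,i-1}\, u_{[i]}, \qquad 0 < \beta < 1
```

Because worse-off individuals receive exponentially more weight, the total weight available to any number of barely-positive lives is bounded by $\sum_{i} \beta^{i-1} = 1/(1-\beta)$, so for small enough $\beta$ no quantity of marginally good lives can outweigh one very bad life, which is how such views block the very repugnant conclusion.

antimonyanthony on The problem with person-affecting views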
Any reasonable theory of population ethics must surely accept that C is better than B.
I dispute this, at least if we interpret the positive-welfare lives as including only happiness (of varying levels) but no suffering. If a life contains no suffering, such that additional happiness doesn't play any palliative role or satisfy any frustrated preferences or cravings, I'm quite comfortable saying that this additional happiness doesn't add value to the life (hence B = C).
I suspect the strength of the intuition in favor of judging C > B comes from the fact that in reality, extra happiness almost always does play a palliative role and satisfies preferences. But a defender of the procreation asymmetry (not the neutrality principle, which I agree with Michael is unpalatable) doesn't need to dispute this.

michaelstjules on Does Critical Flicker-Fusion Frequency Track the Subjective Experience of Time?
Thanks for the interesting argument. Before I can evaluate it, however, I'd need you to clarify your terms a bit for me. In particular, I'd need to know more about what you mean by "frequency of conscious experience." Based on my best reconstruction of the argument, it can't mean temporal resolution or rate of subjective experience.
My intention was rate of subjective experience. I can rephrase Premise 1:
Premise 1: Any observed conscious temporal resolution frequency for an individual X (within some set of possible conditions C) is a lower bound for the maximum frequency of subjective experience for X (within C).
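In symbols (my paraphrase, writing $f_{\text{res}}$ for an observed conscious temporal resolution frequency and $f_{\text{subj}}$ for the frequency of subjective experience):

```latex
\forall c \in C:\quad f_{\text{res}}(X, c) \;\le\; \max_{c' \in C} f_{\text{subj}}(X, c')
```

That is, any observation of temporal resolution under some condition only bounds the maximum rate of subjective experience from below; it does not pin that rate down.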
Does it make sense to interpret the rate of subjective experience as a frequency, the number of subjective experiences per second? Maybe our conscious experiences are not sufficiently synchronized across our brains for such an interpretation?
Even if it does make sense, Premise 1 could still be false. Or, even if Premise 1 is true, it could be that the actual maximum frequency of subjective experience isn't well correlated with the observed maximum temporal resolution frequency (say as measured by CFF). Maybe the gap is huge, and our max frequency of subjective experiences is millions of times faster than our max temporal resolution frequency.
It's tempting to think that temporal resolution is like the frame rate of a video, and as the temporal resolution goes up or down, so too must the rate of subjective experience. But the mechanisms that govern the intake and processing of perceptual information are a lot more complicated than that, and the mechanisms that govern the subjective experience of time appear to be more complicated still.
Premise 1 depends on interpreting temporal resolution like a lower bound for the frame rate of the video which is our subjective experience, although it isn't committed to the claim about correlation between temporal resolution and the rate of subjective experience.
There is no conceptual tension between the claim that a creature consciously perceives the flicker-to-steady-glow transition at some high threshold (200 Hz vs 60 Hz for humans, say) and the claim that the creature has the same rate of subjective experience as a typical human. (Similarly, there is no conceptual tension between the claim that some creature consciously perceives the transition at the same threshold as humans but has a different rate of subjective experience.)
How this could look is that the 60 Hz max CFF for humans is a bad lower bound for our frequency of subjective experience, which is actually much faster, but to match an individual with a CFF of 200 Hz, our maximum frequency of subjective experience would have to be at least 200 Hz.

jackmalde on The problem with person-affecting views
Thanks for this, I suspected you might make a helpful comment! The procreation asymmetry is my long lost love. It's what I used to believe quite strongly but ultimately I started to doubt it for the same reasons that I've outlined in this post.
My intuition is that giving up IIA is only slightly less barmy than giving up transitivity, but thanks for the suggested reading. I certainly feel like my thinking on population ethics can evolve further and I don't rule out reconnecting with the procreation asymmetry.
For what it's worth my current view is that the repugnant conclusion may only seem repugnant because we tend to think of 'a life barely worth living' as a pretty drab existence. I actually think that such a life is much 'better' than we intuitively think. I have a hunch that various biases are contributing to us overvaluing the quality of our lives in comparison to the zero level, something that David Benatar has written about. My thinking on this is very nascent though and there's always the very repugnant conclusion to contend with which keeps me somewhat uneasy with total utilitarianism.

jacobpfau on My Meta-Ethics and Possible Implications for EA
Thank you for following up, and sorry that I haven't been able to respond as succinctly or clearly as I would've liked. I hope to write a follow-up post which more clearly describes the flow of ideas from those contained in my comments to the original blog post, as your comments have helped me see where my background assumptions are likely to differ from others'.
I see now that it would be better to take a step back to explain at a higher level where I'm coming from. My line of reasoning follows from the ideas of the later Wittgenstein: many words have meaning defined solely by their use. These words do not have any further, more precise meaning -- no underlying rigid scientific, logical or analytic structure. Take for example 'to expect': what does it mean to "expect someone to ring your doorbell at 4pm"? The meaning is irreducibly a melange of criteria and is not well defined for edge cases, e.g. for an amnesiac. There's a lot more to say here, see for example 'Philosophical Investigations' paragraphs 570-625.
That said, I'm perhaps closer to Quine's 'The Roots of Reference' than Wittgenstein when I emphasize the importance of figuring out how we first learn a word's use. I believe that many -- perhaps not all -- words such as 'to expect', moral language, etc. have some core use cases which are particularly salient thanks to our neurological wirings, everyday activities, childhood interactions, etc. and these use cases can help us draw a line between situations in which a word is well defined and situations in which the meaning of a word breaks down.
Here's a simple example: the command "Anticipate the past!" steps outside the boundaries of the meaning of 'to anticipate', because 'to anticipate' usually involves things in the future and thought/actions before the event. When it comes to moral language we have two problems: the first is to distinguish cases of sensible use of moral language from under-defined edge cases, and the second is to distinguish between uses of moral language which are better rewritten in other terms. Let me clarify this second case using 'to anticipate': 'anticipate' can mean to foresee, as in "He anticipated Carlsen's move.", but also to look forward to, as in "He greatly anticipated the celebration". If we want to clarify the first use case, then it's better to set aside the second and treat them separately. Here's another example: "Sedol anticipated his opponent's knowledge of opening theory by playing a novel opening." If Sedol always plays novel openings, and says this game was nothing special, then that sentence is false. If Sedol usually never plays novel openings, but says "My opponent's strength in opening theory was not on my mind", what then? I would say the meaning of 'to anticipate' is simply under-defined in this case.
Although I can't have done justice to Quine and Wittgenstein let's pretend I have, and I'll return to your specific comments.
It sounds like you see the genealogy of moral terms as involving a melange of all of these, which seems to leave the door quite open as to what moral terms actually mean.
I disagree: there is no other actual meaning beyond the sequence of uses we learn for these words. Perhaps in the future we will discover that moral language has some natural scientific basis, as happened with water, but moral language strikes me as far more similar to expectation than to water.
It does sound though, from your reply, that you do think that moral language exclusively concerns experiences
Just as with 'to anticipate', where sometimes you can anticipate without explicitly thinking of the consequence, so too for people using moral language. They often do not explicitly think of these experiences, but their use of the words is still rooted in the relevant experiences (in a fuzzy way). Of course, some other uses of 'right' and 'wrong' are better seen as something entirely different, e.g. 'right' as used to refer to following a samurai's code of honor. This is an important point, so I've elaborated on it in my other reply.
I can observe that there is such-and-such level of inequality in the distribution of income in a society.
If this observation is rooted in experience, i.e. extrapolating from your experience of seeing people in a system with certain levels of inequality, then sure. Of course, since this extrapolation depends on the experiences, you should not be confident in extrapolating the rightness/wrongness of something solely based on a certain Gini coefficient.
But I'm not sure why we should expect any substantive normative answers to be implied by the meaning of moral language.
I do not claim that my framework supports the sort of normativity many philosophers (perhaps you too) are interested in. I do not believe talk of normative force is coherent, but I'd prefer to not go into that here. My claim is simply that my framework lets us coherently answer some questions I'm interested in. Put in different terms, I'd like to focus discussion on my argument 'by its own lights'.