My deeply concerning impression is that OpenPhil (and the average funder) has timelines 2-3x longer than the median safety researcher. Daniel has his AGI training requirements set to 3e29, and I believe the 15th-85th percentiles among safety researchers would span 1e31 +/- 2 OOMs. On that view, Tom's default values are off in the tails.
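To make the percentile claim concrete, here is a rough sketch (all numbers are my illustrative assumptions, modeling the safety-researcher distribution as lognormal; this is not anyone's published estimate) of what a median of 1e31 with 15th-85th percentiles spanning +/- 2 OOMs implies for a 3e29 setting:

```python
from statistics import NormalDist

# Hypothetical safety-researcher distribution over AGI training FLOP,
# modeled as normal in log10-space: median 1e31, with the 15th-85th
# percentiles spanning +/- 2 OOMs around it.
median_oom = 31.0
z85 = NormalDist().inv_cdf(0.85)      # ~1.036 standard deviations
sigma = 2.0 / z85                     # OOMs per standard deviation
dist = NormalDist(mu=median_oom, sigma=sigma)

p15, p85 = dist.inv_cdf(0.15), dist.inv_cdf(0.85)
print(p15, p85)                       # 29.0, 33.0 -> 1e29 to 1e33 FLOP

# 3e29 FLOP is about 10^29.5: inside the range, but well below
# the median of this hypothetical distribution.
print(dist.cdf(29.5))                 # roughly the 22nd percentile
```

On these made-up numbers, the disagreement is not that 3e29 is outside the safety-researcher range, but that it sits in its lower tail while funder defaults sit in the upper one.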
My suspicion is that funders write off this discrepancy, if noticed, as inside-view bias, i.e. thinking that safety researchers self-select for scaling optimism. My (admittedly very crude) mental model of an OpenPhil funder makes two further mistakes in this vein: (1) Mistakenly taking the Cotra report's biological-anchors weighting as a justified default setting of parameters rather than an arbitrary choice which should be updated given recent evidence. (2) Far overweighting the semi-informative priors report, despite semi-informative priors abjectly failing to predict Turing-test-level AI progress. Semi-informative priors apply to large-scale engineering efforts, which in the AI domain has meant AGI and the Turing test. Insofar as funders admit that the engineering challenges involved in passing the Turing test have been solved, they should discard semi-informative priors as failing to be predictive of AI progress.
To be clear, I see my empirical claim about disagreement between the funding and safety communities as most important -- independently of my diagnosis of this disagreement. If this empirical claim is true, OpenPhil should investigate cruxes separating them from safety researchers, and at least allocate some of their budget on the hypothesis that the safety community is correct.
In my opinion, the applications of prediction markets are much more general than these. I have a bunch of AI-safety-inspired markets up on Manifold and Metaculus. I'd say the main purpose of these markets is to direct future research and study; I'd phrase this use of markets as "a sub-field prioritization tool". The hope is that markets would help me integrate information such as (1) a methodology's scalability, e.g. in terms of data, compute, and generalizability; (2) a research direction's rate of progress; and (3) the diffusion of a given research direction through the rest of academia and applications.
Here are a few more markets to give a sense of what other AI research-related markets are out there: Google Chatbot, $100M open-source model, retrieval in GPT-4
Seems to me safety timeline estimation should be grounded in a cross-disciplinary research-timeline prior. Such a prior would be determined by identifying a class of research proposals similar to AI alignment in terms of how applied/conceptual/mathematical/funded/etc. they are, and then collecting data on how long they took.
I'm not familiar with meta-science work, but this would probably involve doing something like finding an NSF (or DARPA) grant category where grants were made public historically and then tracking down what became of those lines of research. Grant-based timelines are likely more analogous to individual sub-questions of AI alignment than the field as a whole; e.g. the prospects for a DARPA project might be comparable to the prospects for working out the details of debate. Converting such data into a safety timelines prior would probably involve estimating how correlated progress is on grants within subfields.
Curating such data and constructing such a prior would be useful both for informing the above estimates and for identifying factors of variation which might be intervened on -- e.g. how many research teams should be funded to work on the same project in theoretical areas? This timelines-prior problem seems like a good fit for a prize, where entries would look like recent progress-studies reports (c.f. here and here).
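As a minimal sketch of how grant-level data might be rolled up into a field-level prior (the durations and correlation structure below are entirely made up for illustration; real inputs would come from tracked NSF/DARPA grant outcomes):

```python
import random

random.seed(0)

# Hypothetical completion times in years for historical grants judged
# similar to individual AI-alignment sub-questions (e.g. "work out the
# details of debate").
grant_years = [3, 5, 8, 4, 12, 6, 7, 15, 5, 9]

def field_timeline_median(n_sub=5, rho=0.5, trials=10_000):
    """Bootstrap a field-level timeline: the field is 'done' when its
    slowest sub-question is done. rho mixes in a shared draw, modeling
    correlated progress across sub-questions within the field."""
    maxima = []
    for _ in range(trials):
        shared = random.choice(grant_years)
        subs = [rho * shared + (1 - rho) * random.choice(grant_years)
                for _ in range(n_sub)]
        maxima.append(max(subs))
    maxima.sort()
    return maxima[trials // 2]

print(field_timeline_median())  # median field-level timeline in years
```

The point of the rho parameter is the correlation question raised above: the more correlated progress is across sub-questions, the less the "slowest sub-question" penalty stretches the field-level timeline beyond a single grant's.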
Do you have a sense of which argument(s) were most prevalent and which were most frequently the interviewees' crux?
It would also be useful to get a sense of which arguments are only common among those with minimal ML/safety engagement. If basic AI safety engagement reduces the appeal of a certain argument, then there's little need for further work on messaging in that area.
A few thoughts on ML/AI safety which may or may not generalize:
You should read successful candidates' SOPs to get a sense of style, level of detail, and content, c.f. 1, 2, 3. Ask current EA PhDs for feedback on your statement. Probably avoid writing a statement focused on an AI safety/EA idea which is not in the ML mainstream, e.g. IDA, mesa-optimization, etc. If you have multiple research ideas, consider writing more than one (i.e. tailored) SOP and submitting the SOP most relevant to faculty at each university.
Look at groups' pages to get a sense of the qualification distribution for successful applicants; this is a better way to calibrate where to apply than looking at rankings, IMO. It is also a good way to calibrate how much experience you're expected to have pre-PhD. My impression is that in many ML programs it is very difficult to get in directly out of undergrad if you do not have an exceptional track record, e.g. top publications or high Putnam scores.
For interviews, bringing up concrete ideas on next steps for a professor's paper is probably very helpful.
My vague impression is that financial security and depression are less relevant than in other fields here, as you can probably find job opportunities partway through if either becomes problematic. Would be interested to hear disagreement.
On-demand Software Engineering Support for Academic AI Safety Labs
AI safety work, e.g. in RL and NLP, involves both theoretical and engineering work, but academic training and infrastructure do not optimize for engineering. An independent non-profit could cover this shortcoming by providing software engineers (SWEs) as contractors, code reviewers, and mentors to academics working on AI safety. AI safety research is often well funded, but even grant-rich professors are bottlenecked by university salary rules and limited professor hours, which make hiring competent SWEs at market rate challenging. An FTX Foundation-funded organization could get around these bottlenecks by independently vetting SWEs, offering industry-competitive salaries, and then having the hired SWEs collaborate with academic safety researchers at no cost to the lab. If successful, academic AI safety work ends up faster in terms of researcher hours and higher impact, because papers are accompanied by more legible and standardized code bases -- i.e. AI safety work ends up looking more like Distill. Estimating the potential impact of this proposal could be done by soliciting input from researchers who moved from academic labs to private AI safety organizations.
EDIT: This seems to already exist at https://alignmentfund.org/
Re: feasibility of AI alignment research, Metaculus already has 'Control Problem solved before AGI invented'. Do you have a sense of what further questions would be valuable?
Ok, seems like this might have been more a terminological misunderstanding on my end. I think I agree with what you say here, 'What if the “Inner As AGI” criterion does not apply? Then the outer algorithm is an essential part of the AGI’s operating algorithm'.
Ok, interesting. I suspect the programmers will not be able to easily inspect the inner algorithm, because the inner/outer distinction will not be as clear cut as in the human case. The programmers may avoid sitting around by fiddling with more observable inefficiencies e.g. coming up with batch-norm v10.
Good clarification. Determining which kinds of factoring are the ones which reduce valence is more subtle than I had thought. I agree with you that the DeepMind set-up seems more analogous to neural nociception (e.g. high-heat detection). My proposed set-up (Figure 5) seems significantly different from the DM/nociception case, because it factors out the step where nociceptive signals affect decision-making and motivation. I'll edit my post to clarify.
Your new setup seems less likely to have morally relevant valence. Essentially the more the setup factors out valence-relevant computation (e.g. by separating out a module, or by accessing an oracle as in your example) the less likely it is for valenced processing to happen within the agent.
Just to be explicit here, I'm assuming estimates of goal achievement are valence-relevant. How generally this is true is not clear to me.
Thanks for the link. I'll have to do a thorough read-through of your post in the future. From scanning it, I do disagree with much of it; many of those points of disagreement were laid out by previous commenters. One point I didn't see brought up: IIRC the biological anchors paper suggests we will have enough compute to do evolution-type optimization before the end of the century. So even if we grant your claim that learning-to-learn is much harder to directly optimize for, I think it's still a feasible path to AGI. Or perhaps you think evolution-like optimization takes more compute than the biological anchors paper claims?
Certainly valenced processing could emerge outside of this mesa-optimization context. I agree that for "hand-crafted" (i.e. no base-optimizer) systems this terminology isn't helpful. To try to make sure I understand your point, let me try to describe such a scenario in more detail: Imagine a human programmer who is working with a bunch of DL modules and interpretability tools and programming heuristics which feed into these modules in different ways -- in a sense the opposite end of the spectrum from monolithic language models. This person might program some noxiousness heuristics that input into a language module. Those might correspond to a Phenumb-like phenomenology. This person might program some other noxiousness heuristics that input into all modules as scalars. Those might end up being valenced or might not, hard to say. Without having thought about this in detail, my mesa-optimization framing doesn't seem very helpful for understanding this scenario.
Ideally we'd want a method for identifying valence which is more mechanistic than mine, in the sense that it lets you identify valence in a system just by looking inside the system, without looking at how it was made. All that said, most contemporary progress on AI happens by running base-optimizers which could support mesa-optimization, so I think it's quite useful to develop criteria which apply to this context.
Hopefully this answers your question and the broader concern, but if I'm misunderstanding let me know.
Your interpretation is a good summary!
Re comment 1: Yes, sorry this was just meant to point at a potential parallel not to work out the parallel in detail. I think it'd be valuable to work out the potential parallel between the DM agent's predicate predictor module (Fig12/pg14) with my factored-noxiousness-object-detector idea. I just took a brief look at the paper to refresh my memory, but if I'm understanding this correctly, it seems to me that this module predicts which parts of the state prevent goal realization.
Re comment 2: Yes, this should read "(positively/negatively)". Thanks for pointing this out.
Re EDIT: Mesa-optimizers may or may not represent a reward signal -- perhaps there's a connection here with Demski's distinction between search and control. But for the purposes of my point in the text, I don't think this much matters. All I'm trying to say is that VPG-type-optimizers have external reward signals, whereas mesa-optimizers can have internal reward signals.
Ah great, I have pledged. Is this new this year? Or maybe I didn't fill out the pledge last year; I don't remember.
Would it make sense for the Giving Tuesday organization to send out an annual reminder email? I have re-categorized all of my EA newsletters, and so they don't go to my main inbox. Maybe most people have calendar events, or the like, set up. Maybe though for people who almost forgot about Giving Tuesday (like me) a reminder email could be useful!
The question of how to aggregate over time may even have important consequences for population-ethics paradoxes. You might be interested in reading Vanessa Kosoy's theory here, in which she sums an individual's utility over time with an increasing penalty over life-span. Although I'm not clear on the justification for these choices, the consequences may be appealing to many: Vanessa herself emphasizes the consequences for evaluating astronomical waste and factory farming.
Agreed, I've been trying to help out a bit with Matt Barnett's new question here. Feedback period is still open, so chime in if you have ideas!
I suspect most Metaculites are accustomed to paying attention to how a question's operationalization deviates from its intent, FWIW. Personally, I find the Montezuma's Revenge criterion quite important; without it, the question would be far from capturing AGI.
My intent in bringing up this question was more to ask how Linch thinks about the reliability of long-term predictions with no obvious frequentist-friendly track record to look at.
Sure, at an individual level deference usually makes for better predictions, but at a community level, deference-as-the-norm can dilute the weight of those who are informed and predict differently from the median. Excessive numbers of deferential predictions also obfuscate how reliable the median prediction is, and thus make it harder for others to do an informed update on the median.
As you say, it's better if people contribute information where their relative value-add is greatest, so I'd say it's reasonable for people to have a 2:1 ratio of questions on which they deviate from the median to questions on which they follow the median. My vague impression is that the actual ratio may be lower -- especially for people predicting on <1-year time-horizon events. I think you, Linch, and other heavier Metaculus users may have a more informed impression here, though, so I'd be happy to see disagreement.
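A toy simulation of the dilution worry (all numbers invented; deferential forecasters here copy the running community median exactly):

```python
import random
from statistics import median

random.seed(1)

# Informed forecasters predict around the true probability with noise;
# deferential forecasters simply copy the running community median.
def community_preds(n_informed, n_deferential, truth=0.7, noise=0.15):
    preds = [min(1.0, max(0.0, random.gauss(truth, noise)))
             for _ in range(n_informed)]
    for _ in range(n_deferential):
        preds.append(median(preds))  # pile in on the current median
    return preds

def iqr(preds):
    """Interquartile range: a crude proxy for visible disagreement."""
    s = sorted(preds)
    return s[3 * len(s) // 4] - s[len(s) // 4]

informed_only = community_preds(20, 0)
with_deference = community_preds(20, 40)

# The median barely moves, but the visible spread collapses, hiding
# how much independent information actually backs the aggregate.
print(iqr(informed_only), iqr(with_deference))
```

The collapsed spread in the second case is the obfuscation point above: an outside observer can no longer tell whether a tight median reflects twenty independent analyses or one analysis echoed forty times.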
I think it would be interesting to have a version of Metaculus on which, for every prediction, you have to select a general category for your update, e.g. "New Probability Calculation", "Updated to Median", "Information source released", etc. Seeing the various distributions for each would likely be quite informative.
Do your opinion updates extend from individual forecasts to aggregated ones? In particular how reliable do you think is the Metaculus median AGI timeline?
On the one hand, my opinion of Metaculus predictions worsened as I saw how the 'recent predictions' showed people piling in on the median on some questions I watch. On the other hand, my opinion of Metaculus predictions improved as I found out that performance doesn't seem to fall as a function of 'resolve minus closing' time (see https://twitter.com/tenthkrige/status/1296401128469471235). Are there some observations which have swayed your opinion in similar ways?
What kinds of evidence and experience could induce you to update for/against the importance of severe suffering?
Do you believe that exposure to or experience of severe suffering would cause the average EA to focus more heavily on it?
Edit: Moving the question "Thinking counterfactually, what evidence and experiences caused you to have the views you do on severe suffering?" down here because it looks like other commenters already asked another version of it.
Out of the rejection pool, are there any avoidable failure modes that come to mind -- i.e. mistakes made by otherwise qualified applicants which caused rejection? For example, in a previous EA-org application I found out that I ought to have included more detail regarding potential roadblocks to my proposed research project. This seemed like a valuable point in retrospect, but somewhat unexpected given my experience with research proposals outside of EA.
EDIT: (Thanks to Rose for answering this question individually and agreeing to let me share her answer here.) Failure modes include: describing the value of proposed research ideas too narrowly instead of discussing long-term value; apparent over-confidence in the description of ideas, i.e. neglecting potential road-bumps and uncertainty.
Thanks for the lively discussion! We've covered a lot of ground, so I plan to try to condense what was said into a follow-up blog post making similar points as the OP but taking into account all of your clarifications.
I’m not sure how broadly you’re construing ‘meta-reactions’, i.e. would this include basically any moral view which a person might reach based on the ordinary operation of their intuitions and reason and would all of these be placed on an equal footing?
'Meta-reactions' are the subset of our universalizable preferences which express preferences over other preferences (and/or their relation). What it means to be 'placed on equal footing' is that all of these preferences are comparable. Which of them will take precedence in a certain judgement depends on the relative intensity of feeling for each preference. This stands in contrast to views such as total utilitarianism in which certain preferences are considered irrational and are thus overruled independently of the force with which we feel them.
more or less any moral argument could result from a process of people reflecting on their views and the views of others and seeking consistency
The key point here is 'seeking consistency': my view is that the extent to which consistency constraints are morally relevant is contingent on the individual. Any sort of consistency only carries force insofar as it is one of the given individual's universalizable preferences. In a way, this view does ‘leave everything as it is’ for non-philosophers' moral debates. I also have no problem with a population ethicist who sees eir task as finding functions which satisfy certain population ethics intuitions. My view only conflicts with population ethics and animal welfare ethics insofar as ey take eir conclusions as a basis for language policing. E.g. When an ethicist claims eir preferred population axiology has implications on understanding everyday uses of moral language.
I have in mind cases of moral thinking, such as the example I gave where we override disgust responses based on reflecting that they aren’t actually morally valuable.
Within my framework we may override disgust responses by e.g. observing that they are less strong than our other responses, or by observing that -- unlike our other responses -- they have multiple meta-reactions stacked against them (fairness, 'call to universality', etc.) and we feel those meta-reactions more strongly. I do not endorse coming up with a theory about moral value and then overriding our disgust responses because of the theoretical elegance or epistemological appeal of that theory. I'm not sure whether you have in mind the former or the latter case?
[From a previous DM comment]
For moral talk to be capable of serving this practical purpose we just need some degree of people being inclined to respond to the same kinds of things or to be persuaded to share the same attitudes. But this doesn’t require any particularly strong, near-universal consensus or consensus on a particular single thing being morally good/bad. [...] This seems compatible with very, very widespread disagreement in fact: it might be that people are disposed to think that some varying combinations of “fraternity, blood revenge, family pride, filial piety, gavelkind, primogeniture, friendship, patriotism, tribute, diplomacy, common ownership, honour, confession, turn taking, restitution, modesty, mercy, munificence, arbitration, mendicancy, and queuing”
Sorry, I should've addressed this directly. The SMB-community picture is somewhat misleading. In reality, you likely have partial overlap in SMB and the intersection of your whole community of friends is less (but does include pain aversion). Moral disagreement attains a particular level of meaningfulness when both speakers share SMB relevant to their topic of debate. I now realize that my use of 'ostensive' was mistaken. I meant to say, as perhaps has already become clear, that SMB lends substance to moral disagreement. SMB plays a role in defining moral disagreement, but, as you say, SMB likely plays a lesser role when it comes to using moral language outside of disagreement.
It doesn’t seem to me like we have any particular reason to privilege these basic intuitive responses as foundational, in cases where they conflict with our more abstruse reasoning.
If we agree that SMB plays a crucial role in lending meaning to moral disagreement, then we can understand the nature of moral disagreement without appeal to any 'abstruse reasoning'. I argue that what we do when disagreeing is emphasizing various parts of SMB to the other. In this picture of moral language = universalizable preferences + elicit disapproval + SMB subset, where does abstruse reasoning enter the picture? It only enters when a philosopher sees a family resemblance between moral disagreement and other sorts of epistemological disagreement and thus feels the urge to bring in talk of abstruse reasoning. As described in the OP, for non-philosophers abstruse reasoning only matters as mediated by meta-reactions. In effect, reasoning constraints enter the picture as a subset of our universalizable preferences, but as such there's no basis for them to override our other object-level universalizable preferences. Of course, I use talk of preferences here loosely; I do believe that these preferences have vague intensities which may sometimes be compared. E.g. someone may feel their meta-reactions particularly strongly and so these preferences may carry more weight than other preferences because of this intensity of feeling.
This leads us back into the practical conclusions in your OP. Suppose that a moral aversion to impure, disgusting things is innate (and arguably one of the most basic moral dispositions). It still seems possible that people routinely overcome and override this basic disposition and just decide that impurity doesn’t matter morally and disgusting things aren’t morally bad.
I'm not sure if I know what you're talking about by 'impure things'. Sewage perhaps? I'm not sure what it means to have a moral aversion to sewage. Maybe you mean something like the aversion to the untouchable caste? I do not know enough about that to comment.
Independently of the meaning of 'impure', let me respond to "people routinely overcome and override this basic disposition": certainly people's moral beliefs often come into conflict e.g. trolley problems. I would describe most of these cases as having multiple conflicting universalizable preferences in play. Sometimes one of those preferences is a meta-reaction, e.g. 'call to universality', and if the meta-reaction is more salient or intense then perhaps it carries more weight than a 'basic disposition'. Let me stress again that I do not make a distinction between universalizable preferences which are 'basic dispositions' and those which I refer to as meta-reactions. These should be treated on an equal footing.
Thanks for the long reply. I feel like our conversation becomes more meaningful as it goes on.
Thanks for clarifying. This doesn't change my response though since I don't think there's a particularly notable convergence in emotional reactions to observing others in pain which would serve to make valenced emotional reactions a particularly central part of the meaning of moral terms. For example, it seems to me like children (and adults) often think that seeing others in pain is funny (c.f. punch and judy shows or lots of other comedy), fun to inflict and often well-deserved
Yes, it's hard to point to exactly what I'm talking about, and perhaps even somewhat speculative since the modern world doesn't have too much suffering. Let me highlight cases that could change my mind: Soldiers often have PTSD, and I suspect some of this is due to the horrifying nature of what they see. If soldiers' PTSD was found to be entirely caused by lost friends and had nothing to do with visual experience, I would reduce my credence on this point. When I watched Land of Hope and Glory I found seeing the suffering of animals disturbing, and this would obviously be worse if the documentary had people suffering in similar conditions to the animals. I am confident that most people have similar reactions, but if they don't I would change my view of the above. The most relevant childhood experiences are likely those which involve prolonged pain: a skinned knee, a fever, a burn etc. I think what I'm trying to point at could be described as 'pointless suffering'. Pain in the context of humor, cheap thrills, couch-viewing etc. is not what I'm referring to.
there's a good case that people (and primates for that matter) have innate moral reactions to (un)fairness
This seems plausible to me, and I don't claim that pleasure/pain serve as the only ostensive root grounding moral language. Perhaps (un)fairness is even more prominent, but nevertheless I claim that this group of ostensive bases (pain, unfairness, etc.) is necessary to understand some of moral language's distinctive features cf. my original post:
When confronted with such suffering we react sympathetically, experiencing sadness within ourselves. This sadness may be both attributable to a conscious process of building empathy by imagining the others’ experience, or perhaps an involuntary immediate reaction resulting from our neural wiring.
Perhaps some of these "involuntary immediate reaction"s are best described as reactions to unfairness. For brevity let me refer below to this whole family of ostensive bases by Shared Moral Base, SMB.
Notably, it seems like a very common feature (until very recently in advanced industrial societies anyway) of cases of children's initial training in morality involved parents or others directly inflicting pain on children when they did something wrong and often
Let me take this opportunity to emphasize that I agree: The subsequent tendency to disapprove following use of moral language is an important feature of moral language.
that I think others should disapprove of you and I would disapprove of them if they don't
This is the key point. Why do we express disapproval of others when they don't disapprove of the person who did the immoral act? I claim it's because we expect them to share certain common, basic reactions, e.g. to pain, unfairness, etc., and when these basic reactions are not salient enough in their actions and their minds, we express disapproval to remind them of SMB. Here's a prototypical example: an aunt chastises a mother for failing to stop her husband from striking their child in anger. The aunt does so because she knows the mother cares about her children, and more generally doesn't want people to be hurt unreasonably. If the mother were one of our madmen from above, then the aunt would find it futile to chastise her. To return to my example of "a world filled with people whose innate biases varied randomly": in that world, we would not find it fruitful to disapprove of others when they didn't disapprove of you. Do you not agree that disapproval would have less significance in that world?
It doesn't seem to me that learning what it means for them to say that such and such is morally wrong vs what it means for them to say that they dislike something requires that we learn what specific things people (specifically or in general) think morally wrong / dislike.
True, the learner merely has to learn that they have within themselves some particular disposition towards the morally wrong cases. These dispositions may be various: aversion to pain, aversion to unfairness, guilt, etc. The learner later finds it useful to continue to use moral language, because others outside of her home share these dispositions to morally wrong cases. To hyperbolize this point: moral language would have a different role if SMB were similar to eye color i.e. usually shared within the family, but diverse outside of the family.
What seems to matter to me, as a test of the meaning of moral terms, is whether we can understand someone who says "Hurting people is good" as uttering a coherent moral sentence and, as I mentioned before, in this purely linguistic sense I think we can.
I agree that it would be natural to call "Hurting people is good" a use of moral language on the part of the madman. I only claim that we can have a different, more substantial, kind of disagreement within our community of people who share SMB than we can with the madman. E.g. the kind of disagreement I describe in the family with the aunt above.
I also agree that moral language is often used to persuade people who share some of our moral views or to persuade people to share our moral views, but don't think this requires that the meaning of the moral terms depends on or involves consensus about the rightness or wrongness of specific moral things. For moral talk to be capable of serving this practical purpose we just need some degree of people being inclined to respond to the same kinds of things or to be persuaded to share the same attitudes. But this doesn’t require any particularly strong, near-universal consensus or consensus on a particular single thing being morally good/bad.
Yes, I agree. However, cases in which our conversations are founded on SMB have a distinctive character which is of great importance. I agree that the view described in my original post likely becomes less relevant when applied to disagreements across moral cultures i.e. between groups with very different SMB. I'm not particularly bothered by this caveat since most discussion of object-level ethics seems to occur within communities of shared SMB e.g. medical ethics, population ethics, etc.
I don't think there's a particularly noteworthy consensus about it being bad for other people to be in pain
Sorry, I should've been more clear about what I'm referring to. When you say "People routinely seem to think" and "People sometimes try to argue", I suspect we're talking past each other. I am not concerned with such learned behaviors, but rather with our innate, neurologically shared emotional response to seeing someone suffering. If you see someone dismembered, it must be viscerally unpleasant. If, as a toddler, you see someone strike your mother, it must be shocking and will make you cry. (To reiterate, I focus on these innate tendencies because they are what let us establish common reference. Downstream uses of moral and other language are then determined by our shared and personal inductive biases.)
you would be wrong not to give me $10 and would be apt for disapproval if you did not
Exciting, perhaps we've gotten to the crux of our disagreement here! How do we learn which cases have "aptness for disapproval"? This is only possible if we share some initial consensus over what aptness for disapproval involves. I suggest that this initial consensus is the abovementioned shared aversion to physical suffering. Of course, when you learn language from your parents they need not and cannot point at your aversions, but you implicitly use these aversions as the best-fitting explanation to generalize your parents' language. In effect, your task as a toddler is to figure out why your parents sometimes say "that was wrong, don't do that" instead of "I didn't like what you did, don't do that". I suggest the "that was wrong" cases more often involve a shared reaction on your part -- prototypically when your parents are referring to something that caused pain. Compare to a child whose parents' notion of bad includes burning your fingers, but only on weekends: she will have more difficulty learning their uses of moral language, because this use does not match our genetic/neurological biases.
Another way of seeing why the core cases of agreement (aka the ostensive basis) for moral language is so important, is to look at what happens when someone disagrees with this basis: Consider a madman who believes hurting people is good and letting them go about their life is wrong. I suspect that most people believe we cannot meaningfully argue with him. He may utter moral words but always with entirely different meaning (extension). In slogan form, "There's no arguing with a madman". Or take another sort of madman: someone who agrees with you that usually hurting people is wrong, but then remorselessly goes berserk when he sees anyone with a nose of a certain shape. He simply has a different inductive bias (mental condition). If you deny the significance of the consensus I described in the first paragraph, how do you distinguish between these two madmen and more sensible cases of moral disagreement?
In a world filled with people whose innate biases varied randomly, and who had arbitrary aversions, one could still meaningfully single out a subset of an individual's preferences which had a universalisable character -- i.e. those preferences which she would prefer everyone to hold. However, people's universalisable preferences would hold no special significance to others, and would function in conversation just as all other preferences do. In contrast, in our world, many of our universalisable preferences are shared and so it makes sense to remind others of them. The fact that these universalisable preferences are shared makes them "apt for disapproval" across the whole community, and this is why we use moral language.
One can sensibly say "I like/don't like this pleasant/painful sensation" without thereby saying "It is morally right that you act to promote/alleviate my experience"
Yes, naturally. The reason why the painful sensations matter is that they help us arrive at a shared understanding of the "aptness for disapproval" you describe.
[From DM's other comment]
Conversely it seems to me that moral discourse is characterised by widespread disagreement i.e. we can sensibly disagree about whether it's right or wrong to torture
Yes, I agree work has to be done to explain why utilitarianism parallels arithmetic despite apparent differences. I will likely disagree with you in many places, so hopefully I'll find time to re-read Kripke. I would enjoy talking about it then.
I had been accepted to study for a PhD on the implications of Wittgensteinian meta-philosophy for ethics.
Well, I for one, would've liked to have read the thesis! Wonderful, I suppose then most of my background talk was redundant. When it comes to mathematics, I found the arguments in Kripke's 'Wittgenstein on Rules and Private Language' quite convincing. I would love to see someone do an in depth translation applying everything Kripke says about arithmetic to total utilitarianism. I think this would be quite useful, and perhaps work well with my ideas here.
Yes, I agree that what I've been doing looks a lot like language policing, so let me clarify. Rather than claiming talk of population ethics etc. is invalid or incoherent, it would be more accurate to say I see it as apparently baseless and that I do not fully understand the connection with our other uses of moral language. When others choose to extend their moral language to population ethics, their language is likely coherent within their community. Probably, they have found a group within which they share similar inductive biases which endow their novel uses of moral language with reference. However, insofar as they expect me to follow along with this extension (indeed insofar as they expect their conclusions about population ethics to have force for non-population-ethicists) they must explain how their extension of moral language follows from our shared ostensive basis for moral language and our shared inductive biases. My arguments have attempted to show that our shared ostensive basis for moral language does not straightforwardly support talk of population ethics, because such talk does not share the same basis in negatively/positively valenced emotions.
Put in more Wittgensteinian terms, population ethics language bears a family resemblance to our more mundane use of moral language, but it does not share the universal motivating force provided by our common emotional reactions to e.g. a hit to the head. Of course, probably, some philosophers react viscerally and emotionally to talk of the repugnant conclusion. In that case, for them the repugnant conclusion carries some force that it does not for others. So to return to the policing question, I am not policing insofar as I agree that their language is meaningful and provides insight to their community. Claims like "Total utilitarianism better captures our population ethics intuitions than ..." can be true or false. However, any move to then say "Your use of moral language should be replaced by uses which agree with our population ethics intuitions" seems baseless and perhaps could be described as an act of policing on the part of the speaker.
Thanks for the clarification, this certainly helps us get more concrete.
We don't need people to agree even slightly about whether chocolate/durian are tasty or yucky to learn the meanings of terms.
I agree that I was exaggerating my case. In durian-type-food-only worlds we would merely no longer expect 'X is tasty' to convey information to the listener about whether she/he should eat it. This difference does the work in the analogy with morality. Moral language is distinct from expression of other preferences in that we expect morality-based talk to be somehow more universal instead of merely expressing our personal preference.
even that there is [not] much more consensus about the moral badness of pain/goodness of pleasure than about other issues
I believe that we have much greater overlap in our emotional reaction to experiencing certain events e.g. being hit, and we have much greater overlap in our emotional reaction to witnessing certain painful events e.g. seeing someone lose their child to an explosion. Perhaps you don't want to use the word consensus to describe this phenomenon? Or else you think these sorts of universally shared reactions are unimportant to how we learn moral language?
Likewise with moral language, I don't think we broadly need widespread agreement about whether specific things are good/bad to learn that if someone says something is "bad" this means they don't want us to do it, they disapprove of it and we will be punished if we do it etc.
The way you seem to be describing moral language, I'm not clear on how it is distinct from desire and other preferences? If we did not have shared aversions to pain, and a shared aversion to seeing someone in pain, then moral language would no longer be distinguishable from talk of desire. I suspect you again disagree here, so perhaps you could clarify how, on your account, we learn to distinguish moral injunctions from personal preference based injunctions?
Here's another way of explaining where I'm coming from. The meaning of our words is set by ostensive definition plus our inductive bias. E.g. when defining red and purple we agree upon some prototypical cases of red and purple by perhaps pointing at red and saying 'red'. Then upon seeing maroon for the first time, we call it red because our brains process maroon in a similar way to how they process red. (Incidentally, the first part -- pointing at red -- is also only meaningful because we share inductive biases around pointing and object boundaries.) Of course in some lucky cases, e.g. 'water', 'one', etc., a scientific or formal definition appears coextensive with the ostensive definition and so is preferred for some purposes.
As another example take durian. Imagine you are trying to explain what the word tasty means and so you feed someone some things that are tasty to you e.g. candy and durian. Unfortunately people have very different reactions to durian, so it would not be a good idea to use durian to try to define 'tasty'. In fact, if all the human race ate was durian, we could not use the word tasty in the same way. In a world with only one food and in which people randomly liked or disliked that food, a word similar to 'tasty' would describe people (and their reactions) not the food itself.
Returning to moral language, we almost uniformly agree about the experience of tripping and skinning your knee. This lets moral language get off the ground, and puts us in our world as opposed to the 'durian only moral world'. There are some examples of phenomena over which we disagree: perhaps inegalitarian processes are one. Imagine a wealthy individual decides to donate her money to the townspeople, but distributes her wealth based on an apparently arbitrary 10 second interview with each townsperson. Perhaps some people react negatively, feeling displeasure and disgust when hearing about this behavior, whereas others see this behavior as just as good as if she had uniformly distributed the wealth. This connects with what I was saying above:
Sometimes there remains disagreement, and I think you could explain this by saying our use of moral language has two levels: the individual and the community. In enough cases to achieve shared reference, the community agrees (because their simulations match up adequately) but in many, perhaps most, cases there is no consensus.
I privilege uses of moral language as applied to experiences and in particular pain/pleasure because these are the central cases over which there is agreement, and from which the other uses of moral language flow. There's considerable variance in our inductive biases, and so perhaps for some people the most natural way to extend uses of moral language from its ostensive childhood basis includes inegalitarian processes. Nevertheless inegalitarian processes cannot be seen as the basis for moral language. That would be like claiming the experience of eating durian can be used to define 'tasty'. I do agree that injunctions may perhaps be the first use we learn of 'bad', but the use of 'bad' as part of moral language necessarily connects with its use in referring to pain and pleasure, otherwise it would be indistinguishable from expressions of desire/threats on the part of the speaker.
Thank you for following up, and sorry that I haven't been able to respond as succinctly or clearly as I would've liked. I hope to write a follow-up post which more clearly describes the flow of ideas from those contained in my comments to the original blog post, as your comments have helped me see where my background assumptions are likely to differ from others'.
I see now that it would be better to take a step back to explain at a higher level where I'm coming from. My line of reasoning follows from the ideas of the later Wittgenstein: many words have meaning defined solely by their use. These words do not have any further, more precise meaning -- no underlying rigid scientific, logical or analytic structure. Take for example 'to expect': what does it mean to "expect someone to ring your doorbell at 4pm"? The meaning is irreducibly a melange of criteria and is not well defined for edge cases, e.g. for an amnesiac. There's a lot more to say here, see for example 'Philosophical Investigations' paragraphs 570-625.
That said, I'm perhaps closer to Quine's 'The Roots of Reference' than Wittgenstein when I emphasize the importance of figuring out how we first learn a word's use. I believe that many -- perhaps not all -- words such as 'to expect', moral language, etc. have some core use cases which are particularly salient thanks to our neurological wiring, everyday activities, childhood interactions, etc., and these use cases can help us draw a line between situations in which a word is well defined and situations in which the meaning of a word breaks down.
Here's a simple example: the command "Anticipate the past!" steps outside of the boundaries of 'to anticipate's meaning, because 'to anticipate' usually involves things in the future and thought/actions before the event. When it comes to moral language we have two problems: the first is to distinguish cases of sensible use of moral language from under-defined edge cases, and the second is to distinguish between uses of moral language which are better rewritten in other terms. Let me clarify this second case using 'to anticipate': 'anticipate' can mean to foresee, as in "He anticipated Carlsen's move.", but also to look forward to, as in "He greatly anticipated the celebration". If we want to clarify the first use case, then it's better to set aside the second and treat them separately. Here's another example: "Sedol anticipated his opponent's knowledge of opening theory by playing a novel opening." If Sedol always plays novel openings, and says this game was nothing special, then that sentence is false. If Sedol usually never plays novel openings, but says "My opponent's strength in opening theory was not on my mind", what then? I would say the meaning of 'to anticipate' is simply under-defined in this case.
Although I can't have done justice to Quine and Wittgenstein let's pretend I have, and I'll return to your specific comments.
It sounds like you see the genealogy of moral terms as involving a melange of all of these, which seems to leave the door quite open as to what moral terms actually mean.
I disagree, there is no other actual meaning beyond the sequence of uses we learn for these words. Perhaps in the future we will discover that moral language has some natural scientific basis as happened with water, but moral language strikes me as far more similar to expectation than water.
It does sound though, from your reply, that you do think that moral language exclusively concerns experiences
Just as with 'to anticipate', where sometimes you can anticipate without explicitly thinking of the consequence, so too for people using moral language. They often do not explicitly think of these experiences, but their use of the words is still rooted in the relevant experiences (in a fuzzy way). Of course, some other uses of 'right' and 'wrong' are better seen as something entirely different, e.g. 'right' as used to refer to following a samurai's code of honor. This is an important point, so I've elaborated on it in my other reply.
I can observe that there is such-and-such level of inequality in the distribution of income in a society.
If this observation is rooted in experience, i.e. extrapolating from your experience seeing people in a system with certain levels of inequality, then sure. Of course, since this extrapolation depends on the experiences, you should not be confident in extrapolating the rightness/wrongness of something solely based on a certain Gini coefficient.
But I'm not sure why we should expect any substantive normative answers to be implied by the meaning of moral language.
I do not claim that my framework supports the sort of normativity many philosophers (perhaps you too) are interested in. I do not believe talk of normative force is coherent, but I'd prefer to not go into that here. My claim is simply that my framework lets us coherently answer some questions I'm interested in. Put in different terms, I'd like to focus discussion on my argument 'by its own lights'.
Yes, thanks for clarifying. I believe that it is necessarily harder to make correct judgements in the domain of population ethics. My stronger claim is that any such judgements, even if correct, only carry force as mediated through our 'call to universality' meta-emotion. Hence, even if we have the right population axiology, this likely should not override our more mundane moral intuitions.
Thanks for bringing up these points, I should've been more careful with these distinctions.
The learned meaning of moral language refers to our recollection of/reaction to experiences. These reactions include approval, preferences and beliefs. I suspect that of these, approval is learned first. I imagine a parent harshly pronouncing 'Bad!' after a toddler gets singed wandering too close to a fire. Preferences enter the picture when we try to extend our use of moral language beyond the simple cases learned as a child. When we try to compare two things that are apparently both bad, we might arrive at a preference for one over the other, and in that case the preference precedes the statement of approval/disapproval. Orthogonally, let me note that I think of moral language as necessarily one step removed from experience. In using moral language you look at your experience of something and describe your stance on it. Exclaiming 'Ouch' to describe your experience in the moment is not moral language. I'd like to make sure that my position here is coherent and clear, so do let me know if what I'm saying seems ambiguous or confused.
Thanks for bringing up the X,Y,Z point; I initially had some discussion of this point, but I wasn't happy with my exposition, so I removed it. Let me try again: In cases when there are multiple moral actors and patients there are two sets of considerations. First, the inside view: how would you react as X and Y. Second, the outside view: how would you react as person W who observes X and Y. It seems to me that we learn moral language as a fuzzy mixture of these two, with the first usually being primary. E.g. X and Y are preschoolers on the playground; X accidentally trips Y, feels guilty on seeing Y crying, and remembers crying herself on previous occasions. I don't think the exact formulation of how to "simulate X and Y as Z" as an extension of these simple playground cases is well defined. Just as our experience burning ourselves as children has considerable variance while having enough overlap to support shared use of language, so too for imagining multi-agent situations. Do we all imagine X then Y, rather than Y then X? Probably not, but we do mostly have a shared tendency to focus on the most salient/intense experiences first, i.e. prioritizing the tortured. This shared tendency is probably part of what lets us achieve common reference. If X and Y have an interaction in which they share similar degrees of intensity, e.g. in a dispute between lovers, then my guess is people imagining this situation will tend to first imagine the situation of whomever they knew better or identified with more closely. In this case, it also seems natural to say "Haven't you only seen one side of the story? Try putting yourself in the other's shoes." Sometimes there remains disagreement, and I think you could explain this by saying our use of moral language has two levels: the individual and the community. In enough cases to achieve shared reference, the community agrees (because their simulations match up adequately) but in many, perhaps most, cases there is no consensus.
We often speak of right and wrong as determined by our own simulation process, but other times we speak of right and wrong as determined by the community -- e.g. when we realize we are an outlier w.r.t how we experience some thing.
As for animals, yes I agree to some extent, so I've edited the post to clarify. I do believe, however, that the outside/3rd person view is an important aspect of our use of moral language and it is the easiest and most natural basis for extending moral language to animal welfare.
On the other hand, I think your last paragraph does not follow from my meta-ethics. The spirit of my meta-ethics is to avoid having to rigorously but arbitrarily define 'impartial', and 'harmed'. Instead the appeal of all of those terms is explained in terms of our emotions/meta-emotions and reactions/meta-reactions. If you accept this, then you have to justify the importance of neuron counts on this basis. Much as with population ethics, I suspect this endeavor should be seen as beyond our ability and beyond the boundary of where our use of language remains well-defined. Imagine a future in which we had the scientific/engineering means to continuously morph someone into an animal with a corresponding continuous morph of their conscious processes. In such a future, I suspect moral language would naturally extend to animal welfare. Perhaps neuron counts etc. are evidence for what we would experience in such a world, but the force of this evidence is determined by a very exotic version of the 'call to universality' which I suspect few people identify with. (Perhaps you and EA are more inclined to subscribe to it, and I think that's laudable).
I think the 'Diets of EAs' question could be a decent proxy for the prominence of animal welfare within EA. I think there are similar questions on metaculus for the general US population https://www.metaculus.com/questions/?order_by=-activity&search=vegetarian
I don't see the ethics question as all that useful, since I think most of population ethics presupposes some form of consequentialism.
Somewhat unrelated, but I'll leave this thought here anyway: EA Metaculus users could perhaps benefit from posting question drafts as short-form posts on the EA forum.
Thanks for doing this, great idea! I think Metaculus could provide some valuable insight into how society's/EA's/philosophy's values might drift or converge over the coming decades.
For instance, I'm curious about where population ethics will be in 10-25 years. Something like, 'In 2030 will the consensus within effective altruism be that "Total utilitarianism is closer to describing our best moral theories than average utilitarianism and person affecting views"?'
Having your insight on how to operationalize this would be useful, since I'm not very happy with my ideas: 1. Polling FHI and GW 2. A future PhilPapers Survey if there is one 3. Some sort of citation count/ number of papers on total/average/person utilitarianism. It would probably also be useful to get the opinion of a population ethicist.
Stepping back from that specific question, I think Metaculus could play a sort of sanity-checking, outside-view role for EA. Questions like 'Will EA see AI risk (climate change/bio-risk/etc.) as less pressing in 2030 than they do now?', or 'Will EA in 2030 believe that EA should've invested more and donated less over the 2020s?'
You discuss at one point in the podcast the claim that as AI systems take on larger and larger real world problems, the challenge of defining the reward function will become more and more important. For example for cleaning, the simple number-of-dust-particles objective is inadequate because we care about many other things e.g. keeping the house tidy and many side constraints e.g. avoiding damaging household objects. This isn't quite an argument for AI alignment solving itself, but it is an argument that the attention and resources poured into AI alignment may naturally rise to the challenge without EA effort, and thus perhaps EA effort is misplaced.
First off, I think this is a great steel-man of the LeCun/Etzioni safety-skeptic position, and, importantly, I think it gives a more concrete/falsifiable position to argue against. On the other hand, this argument seems to go through only if most of the tasks worked on by AI researchers are of the kind described -- i.e. the designer of the system has it in their own interest to deal with side constraints and fix reward function specification. In my view, this condition is unlikely to be met. It seems likely to me that most tasks AI companies work on will have a principal-agent complication. Recommender system alignment, automated advertising, stock trading, etc.: in all of these domains, work maximizes profit for the AI researchers' company when it runs roughshod over side constraints. The side constraints here are mostly the preferences of users on the platform (for tech) and of other investors (for finance).
Does this seem right? If so, what are the upshots? Could the legal/lobbying work of strengthening the positions of these principals become a high-value task for EA to take on?
Perhaps include a short form subsection under the Forum Favorites section? It seems to me that most short form posts have very low visibility.
If the forum admins have traffic statistics, they should be able to get a better sense of the visibility issue than I can. In particular, I suspect the short form section receives a fraction of the traffic of the frontpage, but this should be verified empirically.
I enjoyed reading this post! I like Wittgensteinian arguments, and applying them to ethics, so hurrah for this. There was also some lively discussion of it on the EA corner chat.
Another possible misleading motivation for irreducible normativity may be linguistic. It seems to me plausible that anyone who uses the word agony in the standard sense is committing her/himself to agony being undesirable. This is not an argument for irreducible normativity, but it may give you a feeling that there is some intrinsic connection underlying the set of self-evident cases.
From an EA perspective, I thought it could be useful to get a sense of the effectiveness of this post (series). You could, for instance, identify a few philosophy graduate students who hold the position you're arguing against and compare their credence in the relevant position before and after reading. In my experience, people's cruxes for disagreement in ethics are all over the place, and you run a risk of missing the arguments which compel those who believe in e.g. irreducible normativity. I very much like Wittgensteinian arguments against motivations and coherence, but I'm not sure those who subscribe to irreducible normativity will find these arguments compelling. If this concern materializes, you might find it useful to first poll people who disagree with you about the position of interest, and then write a post to address the cruxes you have identified.
Edit: At the moment the EA forum spam filter is, for some reason, preventing me from replying to @antimonyanthony, so I will reply by edit instead: I think this is quite a subtle point, and as I understand it, there is some ongoing disagreement among philosophers about these issues. Let's make things clearer by replacing 'agony' with 'bad experience'. A bad experience for a paperclip maximizer is likely to involve difficulty producing paperclips. More generally, which experiences are considered bad is determined by the agent's nature. However, for humans there's sufficient overlap in our neural nature for there to be self-evident cases of badness, e.g. extreme pain. If someone does not call these self-evident cases bad, then she/he is not using the word bad in its standard sense. There are a lot of complications in this argument, cf. Kripke on c-fibers, but I believe the general argument I sketched holds.
Yes, I recently asked a metaculus mod about this, and they said they're hoping to bring back the ai.metaculus sub-domain eventually. For now, I'm submitting everything to the metaculus main domain.
Medium term AI forecasting with Metaculus
I'm working on a collection of metaculus.com questions intended to generate AI domain specific forecasting insights. These questions are intended to resolve in the 1-15 year range, and my hope is that if they're sufficiently independent, we'll get a range of positive and negative resolutions which will inform future forecasts.
I've already gotten a couple of them live, and am hoping for feedback on the rest:
1. When will AI out-perform humans on argument reasoning tasks?
2. When will multi-modal ML out-perform uni-modal ML?
3. (Not by me) When will image recognition be made robust against an unrestricted adversary?
4. (WIP) When will reinforcement learning methods achieve sample efficiency within four orders of magnitude of human efficiency?*
5. (WIP) When will unsupervised learning methods achieve human level performance on image classification?
The more questions the better, so please make suggestions. Of course we have to avoid burdening the good folks working at metaculus, so 8-10 questions is probably the maximum I'd be willing to personally submit.
*I am not very familiar with reinforcement learning, so input here would be particularly helpful! What is the best way to operationalize this question? How many orders of magnitude? Is there a relevant benchmark? etc. I'd be happy for someone else to take the credit, and post the question themselves as well!
True, it seems like SolarAid's own estimate these days suggests around $5 per tonne. I can't find a more recent external review, unfortunately.
This is a great point!
I'm somewhat hesitant about the CATF recommendation though. After a brief skim of the Founders Pledge report, it looks like they broke down CATF's efforts into three projects which have worked or are working out well. If we assume that Founders Pledge reviewed a number of public advocacy/lobbying groups, there's likely to have been a multiple testing issue. In that light, the retrospective CO2e/$ estimate may not be predictive of their future CO2e/$ ratio. That said, so long as the majority of their funds go into these current efforts, this concern doesn't apply. A quick ctrl+F didn't pull up anything on their funding gap either.
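To make the multiple testing worry concrete, here's a minimal simulation sketch (the function name and all parameter values are my own invention, not from the Founders Pledge report): even if every reviewed charity had the same true cost per tonne, picking the one with the best noisy retrospective estimate systematically overstates its cost-effectiveness.

```python
import random

random.seed(0)

def selection_bias_demo(n_orgs=20, true_cost=10.0, noise=5.0, trials=2000):
    """Winner's-curse sketch: every org has the same true cost per tonne,
    but we always select the org with the best (lowest) noisy estimate."""
    gap = 0.0
    for _ in range(trials):
        # Each org's retrospective cost estimate is true cost plus noise.
        estimates = [true_cost + random.gauss(0, noise) for _ in range(n_orgs)]
        best = min(estimates)  # the retrospectively "most cost-effective" org
        gap += true_cost - best  # how much selection flatters the winner
    return gap / trials

print(selection_bias_demo())
```

With these made-up numbers the average gap is well above zero: the selected org looks several dollars per tonne cheaper than it really is, purely from selection over noise. The more groups reviewed, the larger the flattery, which is why a retrospective best-of-breed estimate shouldn't be read as an unbiased forecast.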
I personally recommend SolarAid to friends. An old estimate put them at $1.34 per tonne. SolarAid's project may be easier to explain briefly as well?
Whatever you recommend, in my experience, telling non-EA people it only costs 10-30 dollars to offset your carbon for the year has gotten very enthusiastic responses!