Anti-tribalism and positive mental health as high-value cause areas 2017-10-17T11:10:52.069Z
An argument for broad and inclusive "mindset-focused EA" 2017-07-16T16:10:13.384Z
Cognitive Science/Psychology As a Neglected Approach to AI Safety 2017-06-05T13:46:49.688Z
Meetup : Social EA career path meetup 2014-10-10T15:26:50.876Z
Tell us about your recent EA activities 2014-10-10T13:18:14.313Z
Meetup : Johdatus efektiiviseen altruismiin 2014-09-27T21:09:33.762Z
Effective altruism as the most exciting cause in the world 2014-09-26T09:17:13.992Z


Comment by Kaj_Sotala on How to succeed as an early-stage researcher: the “lean startup” approach · 2021-09-12T12:09:40.859Z · EA · GW

Draft and re-draft (and re-draft). The writing should go through many iterations. You make drafts, you share them with a few people, you do something else for a week. Maybe nobody has read the draft, but you come back and you’ve rejuvenated your wonderful capacity to look at the work and know why it’s terrible.

Kind of related to this: giving a presentation about the ideas in your article is something that you can use as a form of a draft. If you can't get anyone to listen to a presentation, or don't want to give one quite yet, you can pick some people whose opinion you value and just make a presentation where you imagine that they're in the audience.

I find that if I'm thinking of how to present the ideas in a paper to an in-person audience, it makes me think about questions like "what would be a concrete example of this idea that I could start the presentation with, that would grab the audience's attention right away". And then if I come up with a good way of presenting the ideas in my article, I can rewrite the article to use that same presentation.

(Unfortunately myself I have mostly taken this advice in its reverse form. I've first written a paper and then given a presentation of it afterwards, at which point I've realized that this is actually what I should have said in the paper itself.)

Comment by Kaj_Sotala on "Disappointing Futures" Might Be As Important As Existential Risks · 2021-06-30T15:11:18.584Z · EA · GW

Depends on exactly which definition of s-risks you're using; one of the milder definitions is just "a future in which a lot of suffering exists", such as humanity settling most of the galaxy but each of those worlds having about as much suffering as the Earth has today. Which is arguably not a dystopian outcome or necessarily terrible in terms of how much suffering there is relative to happiness, but still an outcome in which there is an astronomically large absolute amount of suffering.

Comment by Kaj_Sotala on [deleted post] 2021-03-15T18:42:12.505Z

Fair point. Though apparently measures of 'life satisfaction' and 'meaning' produce different outcomes:

So, how did the World Happiness Report measure happiness? The study asked people in 156 countries to “value their lives today on a 0 to 10 scale, with the worst possible life as a 0 and the best possible life as a 10.” This is a widely used measure of general life satisfaction. And we know that societal factors such as gross domestic product per capita, extensiveness of social services, freedom from oppression, and trust in government and fellow citizens can explain a significant proportion of people’s average life satisfaction in a country.

In these measures the Nordic countries—Finland, Sweden, Norway, Denmark, Iceland—tend to score highest in the world. Accordingly, it is no surprise that every time we measure life satisfaction, these countries are consistently in the top 10. [...]

... some people might argue that neither life satisfaction, positive emotions nor absence of depression are enough for happiness. Instead, something more is required: One has to experience one’s life as meaningful. But when Shigehiro Oishi, of the University of Virginia, and Ed Diener, of the University of Illinois at Urbana-Champaign, compared 132 different countries based on whether people felt that their life has an important purpose or meaning, African countries including Togo and Senegal were at the top of the ranking, while the U.S. and Finland were far behind. Here, religiosity might play a role: The wealthier countries tend to be less religious on average, and this might be the reason why people in these countries report less meaningfulness.

Comment by Kaj_Sotala on [deleted post] 2021-03-15T10:24:55.301Z

It has been suggested that people are succumbing to a focusing illusion when they think that having children will make them happy, in that they focus on the good things without giving much thought to the bad.

Worth noting that you might get increased meaningfulness in exchange for the lost happiness, which isn't necessarily an irrational trade to make. E.g. Robin Hanson:

Stats suggest that while parenting doesn’t make people happier, it does give them more meaning. And most thoughtful traditions say to focus more on meaning that happiness. Meaning is how you evaluate your whole life, while happiness is how you feel about now. And I agree: happiness is overrated.

Parenting does take time. (Though, as Bryan Caplan emphasized in a book, less than most think.) And many people I know plan to have an enormous positive influences on the universe, far more than plausible via a few children. But I think they are mostly kidding themselves. They fear their future selves being less ambitious and altruistic, but its just as plausible that they will instead become more realistic.

Also, many people with grand plans struggle to motivate themselves to follow their plans. They neglect the motivational power of meaning. Dads are paid more, other things equal, and I doubt that’s a bias; dads are better motivated, and that matters. Your life is long, most big world problems will still be there in a decade or two, and following the usual human trajectory you should expect to have the most wisdom and influence around age 40 or 50. Having kids helps you gain both.

Comment by Kaj_Sotala on Some thoughts on the EA Munich // Robin Hanson incident · 2020-09-03T07:56:32.660Z · EA · GW

Thanks. It looks to me that much of what's being described at these links is about the atmosphere among the students at American universities, which then also starts affecting the professors there. That would explain my confusion, since a large fraction of my academic friends are European, so largely unaffected by these developments.

there could be a number of explanations aside from cancel culture not being that bad in academia.

I do hear them complain about various other things though, and I also have friends privately complaining about cancel culture in non-academic contexts, so I'd generally expect this to come up if it were an issue. But I could still ask, of course.

Comment by Kaj_Sotala on "Disappointing Futures" Might Be As Important As Existential Risks · 2020-09-03T07:55:23.343Z · EA · GW

We also discussed some possible reasons for why there might be a disappointing future in the sense of having a lot of suffering, in sections 4-5 of Superintelligence as a Cause or Cure for Risks of Astronomical Suffering. A few excerpts:

4.1 Are suffering outcomes likely?

Bostrom (2003a) argues that given a technologically mature civilization capable of space colonization on a massive scale, this civilization "would likely also have the ability to establish at least the minimally favorable conditions required for future lives to be worth living", and that it could thus be assumed that all of these lives would be worth living. Moreover, we can reasonably assume that outcomes which are optimized for everything that is valuable are more likely than outcomes optimized for things that are disvaluable. While people want the future to be valuable both for altruistic and self-oriented reasons, no one intrinsically wants things to go badly.

However, Bostrom has himself later argued that technological advancement combined with evolutionary forces could "lead to the gradual elimination of all forms of being worth caring about" (Bostrom 2004), admitting the possibility that there could be technologically advanced civilizations with very little of anything that we would consider valuable. The technological potential to create a civilization that had positive value does not automatically translate to that potential being used, so a very advanced civilization could still be one of no value or even negative value.

Examples of technology’s potential being unevenly applied can be found throughout history. Wealth remains unevenly distributed today, with an estimated 795 million people suffering from hunger even as one third of all produced food goes to waste (World Food Programme, 2017). Technological advancement has helped prevent many sources of suffering, but it has also created new ones, such as factory-farming practices under which large numbers of animals are maltreated in ways which maximize their production: in 2012, the amount of animals slaughtered for food was estimated at 68 billion worldwide (Food and Agriculture Organization of the United Nations 2012). Industrialization has also contributed to anthropogenic climate change, which may lead to considerable global destruction. Earlier in history, advances in seafaring enabled the transatlantic slave trade, with close to 12 million Africans being sent in ships to live in slavery (Manning 1992).

Technological advancement does not automatically lead to positive results (Häggström 2016). Persson & Savulescu (2012) argue that human tendencies such as “the bias towards the near future, our numbness to the suffering of great numbers, and our weak sense of responsibility for our omissions and collective contributions”, which are a result of the environment humanity evolved in, are no longer sufficient for dealing with novel technological problems such as climate change and it becoming easier for small groups to cause widespread destruction. Supporting this case, Greene (2013) draws on research from moral psychology to argue that morality has evolved to enable mutual cooperation and collaboration within a select group (“us”), and to enable groups to fight off everyone else (“them”). Such an evolved morality is badly equipped to deal with collective action problems requiring global compromises, and also increases the risk of conflict and generally negative-sum dynamics as more different groups get in contact with each other.

As an opposing perspective, West (2017) argues that while people are often willing to engage in cruelty if this is the easiest way of achieving their desires, they are generally “not evil, just lazy”. Practices such as factory farming are widespread not because of some deep-seated desire to cause suffering, but rather because they are the most efficient way of producing meat and other animal source foods. If technologies such as growing meat from cell cultures became more efficient than factory farming, then the desire for efficiency could lead to the elimination of suffering. Similarly, industrialization has reduced the demand for slaves and forced labor as machine labor has become more effective. At the same time, West acknowledges that this is not a knockdown argument against the possibility of massive future suffering, and that the desire for efficiency could still lead to suffering outcomes such as simulated game worlds filled with sentient non-player characters (see section on cruelty-enabling technologies below). [...]

4.2 Suffering outcome: dystopian scenarios created by non-value-aligned incentives.

Bostrom (2004, 2014) discusses the possibility of technological development and evolutionary and competitive pressures leading to various scenarios where everything of value has been lost, and where the overall value of the world may even be negative. Considering the possibility of a world where most minds are brain uploads doing constant work, Bostrom (2014) points out that we cannot know for sure that happy minds are the most productive under all conditions: it could turn out that anxious or unhappy minds would be more productive. [...]

More generally, Alexander (2014) discusses examples such as tragedies of the commons, Malthusian traps, arms races, and races to the bottom as cases where people are forced to choose between sacrificing some of their values and getting outcompeted. Alexander also notes the existence of changes to the world that nearly everyone would agree to be net improvements - such as every country reducing its military by 50%, with the savings going to infrastructure - which nonetheless do not happen because nobody has the incentive to carry them out. As such, even if the prevention of various kinds of suffering outcomes would be in everyone’s interest, the world might nonetheless end up in them if the incentives are sufficiently badly aligned and new technologies enable their creation.

An additional reason for why such dynamics might lead to various suffering outcomes is the so-called Anna Karenina principle (Diamond 1997, Zaneveld et al. 2017), named after the opening line of Tolstoy’s novel Anna Karenina: "all happy families are all alike; each unhappy family is unhappy in its own way". The general form of the principle is that for a range of endeavors or processes, from animal domestication (Diamond 1997) to the stability of animal microbiomes (Zaneveld et al. 2017), there are many different factors that all need to go right, with even a single mismatch being liable to cause failure.

Within the domain of psychology, Baumeister et al. (2001) review a range of research areas to argue that “bad is stronger than good”: while sufficiently many good events can overcome the effects of bad experiences, bad experiences have a bigger effect on the mind than good ones do. The effect of positive changes to well-being also tends to decline faster than the impact of negative changes: on average, people’s well-being suffers and never fully recovers from events such as disability, widowhood, and divorce, whereas the improved well-being that results from events such as marriage or a job change dissipates almost completely given enough time (Lyubomirsky 2010).

To recap, various evolutionary and game-theoretical forces may push civilization in directions that are effectively random, random changes are likely to bad for the things that humans value, and the effects of bad events are likely to linger disproportionately on the human psyche. Putting these considerations together suggests (though does not guarantee) that freewheeling development could eventually come to produce massive amounts of suffering.
Comment by Kaj_Sotala on Some thoughts on the EA Munich // Robin Hanson incident · 2020-09-03T06:29:48.183Z · EA · GW
yet academia is now the top example of cancel culture

I'm a little surprised by this wording? Certainly cancel culture is starting to affect academia as well, but I don't think that e.g. most researchers think about the risk of getting cancelled when figuring out the wording for their papers, unless they are working on some exceptionally controversial topic?

I have lots of friends in academia and follow academic blogs etc., and basically don't hear any of them talking about cancel culture within that context. I did recently see a philosopher recently post a controversial paper and get backlash for it on Twitter, but then he seemed to basically shrug it off since people complaining on Twitter didn't really affect him. This fits my general model that most of the cancel culture influence on academia comes from people outside academia trying to affect it, with varying success.

I don't doubt that there are individual pockets with academia that are more cancely, but the rest of academia seems to me mostly unaffected by them.

Comment by Kaj_Sotala on Some thoughts on the EA Munich // Robin Hanson incident · 2020-09-02T08:22:29.136Z · EA · GW

On the positive side, a recent attempt to bring cancel culture to EA was very resoundingly rejected, with 111 downvotes and strongly upvoted rebuttals.

Comment by Kaj_Sotala on Shifts in subjective well-being scales? · 2020-08-19T12:38:39.368Z · EA · GW

I don't know, but I get the impression that SWB questions are susceptible to framing effects in general: for example, Biswas-Diener & Diener (2001) found that when people in Calcutta were asked for their life satisfaction in general, and also for their satisfaction in 12 subdomains (material resources, friendship, morality, intelligence, food, romantic relationship, family, physical appearance, self, income, housing, and social life), they gave on average a slightly negative rating for the global satisfaction, while also giving positive ratings for all the subdomains. (This result was replicated at least by Cox 2011 in Nicaragua.)

Biswas-Diener & Diener 2001 (scale of 1-3):

The mean score for the three groups on global life satisfaction was 1.93 (on the negative side just under the neutral point of 2). [...] The mean ratings for all twelve ratings of domain satisfaction fell on the positive (satisfied) side, with morality being the highest (2.58) and the lowest being satisfaction with income (2.12).

Cox 2011 (scale of 1-7):

The sample level mean on global life satisfaction was 3.8 (SD = 1.7). Four is the mid-point of the scale and has been interpreted as a neutral score. Thus this sample had an overall mean just below neutral. [...] The specific domain satisfactions (housing, family, income, physical appearance, intelligence, friends, romantic relationships, morality, and food) have means ranging from 3.9 to 5.8, and a total mean of 4.9. Thus all nine specific domains are higher than global life satisfaction. For satisfaction with the broader domains (self, possessions, and social life) the means ranged from 4.4 to 5.2, with a mean of 4.8. Again, all broader domain satisfactions are higher than global life satisfaction. It is thought that global judgments of life satisfaction are more susceptible to positivity bias and that domain satisfaction might be more constrained by the concrete realities of an individual’s life (Diener et al. 2000)
Comment by Kaj_Sotala on A New X-Risk Factor: Brain-Computer Interfaces · 2020-08-14T15:09:10.606Z · EA · GW
In particular, Elon Musk claims that BCIs may allow us to integrate with AI such that AI will not need to outcompete us (Young, 2019). It is unclear at present by what exact mechanism a BCI would assist here, how it would help, whether it would actually decrease risk from AI, or if it is a valid claim at all. Such a ‘solution’ to AGI may also be entirely compatible with global totalitarianism, and may not be desirable. The mechanism by which integrating with AI would lessen AI risk is currently undiscussed; and at present, no serious academic work has been done on the topic.

We have a bit of discussion about this (predating Musk's proposal) in section 3.4. of Responses to Catastrophic AGI Risk; we're also skeptical, e.g. this excerpt from our discussion:

De Garis [82] argues that a computer could have far more processing power than a human brain, making it pointless to merge computers and humans. The biological component of the resulting hybrid would be insignificant compared to the electronic component, creating a mind that was negligibly different from a 'pure' AGI. Kurzweil [168] makes the same argument, saying that although he supports intelligence enhancement by directly connecting brains and computers, this would only keep pace with AGIs for a couple of additional decades.
The truth of this claim seems to depend on exactly how human brains are augmented. In principle, it seems possible to create a prosthetic extension of a human brain that uses the same basic architecture as the original brain and gradually integrates with it [254]. A human extending their intelligence using such a method might remain roughly human-like and maintain their original values. However, it could also be possible to connect brains with computer programs that are very unlike human brains and which would substantially change the way the original brain worked. Even smaller differences could conceivably lead to the adoption of 'cyborg values' distinct from ordinary human values [290].
Bostrom [49] speculates that humans might outsource many of their skills to non-conscious external modules and would cease to experience anything as a result. The value-altering modules would provide substantial advantages to their users, to the point that they could outcompete uploaded minds who did not adopt the modules. [...]
Moravec [194] notes that the human mind has evolved to function in an environment which is drastically different from a purely digital environment and that the only way to remain competitive with AGIs would be to transform into something that was very different from a human.
Comment by Kaj_Sotala on Slate Star Codex, EA, and self-reflection · 2020-06-27T10:38:02.390Z · EA · GW

Let's look at some of your references. You say that Scott has endorsed eugenics; let's look up the exact phrasing (emphasis mine):

Even though I like both basic income guarantees and eugenics, I don’t think these are two things that go well together – making the income conditional upon sterilization is a little too close to coercion for my purposes. Still, probably better than what we have right now.

"I don't like this, though it would probably be better than the even worse situation that we have today" isn't exactly a strong endorsement. Note the bit about disliking coercion which should already suggest that Scott doesn't like "eugenics" in the traditional sense of involuntary sterilization, but rather non-coercive eugenics that emphasize genetic engineering and parental choice.

Simply calling this "eugenics" with no caveats is misleading; admittedly Scott himself sometimes forgets to make this clarification, so one would be excused for not knowing what he means... but not when linking to a comment where he explicitly notes that he doesn't want to have coercive forms of eugenics.

Next, you say that he has endorsed "Charles Murray, a prominent proponent of racial IQ differences". Looking up the exact phrasing again, Scott says:

The only public figure I can think of in the southeast quadrant with me is Charles Murray. Neither he nor I would dare reduce all class differences to heredity, and he in particular has some very sophisticated theories about class and culture. But he shares my skepticism that the 55 year old Kentucky trucker can be taught to code, and I don’t think he’s too sanguine about the trucker’s kids either. His solution is a basic income guarantee, and I guess that’s mine too. Not because I have great answers to all of the QZ article’s problems. But just because I don’t have any better ideas1,2.

What is "the southeast quadrant"? Looking at earlier in the post, it reads:

The cooperatives argue that everyone is working together to create a nice economy that enriches everybody who participates in it, but some people haven’t figured out exactly how to plug into the magic wealth-generating machine, and we should give them a helping hand (“here’s government-subsidized tuition to a school where you can learn to code!”) [...] The southeast corner is people who think that we’re all in this together, but that helping the poor is really hard.

So Scott endorses Murray's claims that... cognitive differences may have a hereditary component, that it might be hard to teach the average trucker and his kids to become programmers, and that we should probably implement a basic income so that these people will still have a reasonable income and don't need to starve. Also, the position that he ascribes to both himself and Murray is the attitude that we should do our best to help everyone, and that it's basically good for everyone try to cooperate together. Not exactly ringing endorsements of white supremacy.

Also one of the foonotes to "I don't have any better ideas" is "obviously invent genetic engineering and create a post-scarcity society, but until then we have to deal with this stuff", which again ties to the part where to the extent that Scott endorses eugenics, he endorses liberal eugenics.

Finally, you note that Scott identifies with the "hereditarian left". Let's look at the article that Scott links to when he says that this term "seems like as close to a useful self-identifier as I’m going to get". It contains an explicit discussion of how the possibility of cognitive differences between groups does not in any sense imply that one of the groups would have more value, morally or otherwise, than the other:

I also think it’s important to stress that contemporary behavioral genetic research is — with very, very few exceptions — almost entirely focused on explaining individual differences within ancestrally homogeneous groups. Race has a lot to do with how behavioral genetic research is perceived, but almost nothing to do with what behavioral geneticists are actually studying. There are good methodological reasons for this. Twin studies are, of course, using twins, who almost always self-identify as the same race. And genome-wide association studies (GWASs) typically use a very large group of people who all have the same self-identified race (usually White), and then rigorously control for genetic ancestry differences even within that already homogeneous group. I challenge anyone to read the methods section of a contemporary GWAS and persist in thinking that this line of research is really about race differences.
Despite all this, racists keep looking for “evidence” to support racism. The embrace of genetic research by racists reached its apotheosis, of course, in Nazism and the eugenics movements in the U.S. After all, eugenics means “good genes”– ascribing value and merit to genes themselves. Daniel Kevles’ In the Name of Eugenics: Genetics and the Uses of Human Heredity should be required reading for anyone interested in both the history of genetic science and in how this research has been (mis)used in the United States. This history makes clear that the eugenic idea of conceptualizing heredity in terms of inherent superiority was woven into the fabric of early genetic science (Galton and Pearson were not, by any stretch, egalitarians) and an idea that was deliberately propagated. The idea that genetic influence on intelligence should be interpreted to mean that some people are inherently superior to other people is itself a racist invention.
Fast-forward to 2017, and nearly everyone, even people who think that they are radical egalitarians who reject racism and white supremacy and eugenic ideology in all its forms, has internalized this “genes == inherent superiority” equation so completely that it’s nearly impossible to have any conversation about genetic research that’s not tainted by it. On both the right and the left, people assume that if you say, “Gene sequence differences between people statistically account for variation in abstract reasoning ability,” what you really mean is “Some people are inherently superior to other people.” Where people disagree, mostly, is in whether they think this conclusion is totally fine or absolutely repugnant. (For the record, and this should go without saying, but unfortunately needs to be said — I fall in the latter camp.) But very few people try to peel apart those ideas. (A recent exception is this series of blog posts by Fredrik deBoer.) The space between, which says, “Gene sequence differences between people statistically account for variation in abstract reasoning ability” but also says “This observation has no bearing on how we evaluate the inherent value or worth of people” is astoundingly small. [...]
But must genetic research necessarily be interpreted in terms of superiority and inferiority? Absolutely not. To get a flavor of other possible interpretations, we can just look at how people describe genetic research on nearly any other human trait.
Take, for example, weight. Here, is a New York Times article that quotes one researcher as saying, “It is more likely that people inherit a collection of genes, each of which predisposes them to a small weight gain in the right environment.” Substitute “slight increase in intelligence” for “small weight gain” in that sentence and – voila! You have the mainstream scientific consensus on genetic influences on IQ. But no one is writing furious think pieces in reaction to scientists working to understand genetic differences in obesity. According to the New York Times, the implications of this line of genetic research is … people shouldn’t blame themselves for a lack of self-control if they are heavy, and a “one size fits all” approach to weight loss won’t be effective.
As another example, think about depression. The headline of one New York Times article is “Hunting the Genetic Signs of Postpartum Depression with an iPhone App.” Pause for a moment and consider how differently the article would be received if the headline were “Hunting the Genetic Signs of Intelligence with an iPhone App.” Yet the research they describe – a genome-wide association study – is exactly the same methodology used in recent genetic research on intelligence and educational attainment. The science isn’t any different, but there’s no talk of identifying superior or inferior mothers. Rather, the research is justified as addressing the needs of “mothers and medical providers clamoring for answers about postpartum depression.” [...]
1. The idea that some people are inferior to other people is abhorrent.
2. The mainstream scientific consensus is that genetic differences between people (within ancestrally homogeneous populations) do predict individual differences in traits and outcomes (e.g., abstract reasoning, conscientiousness, academic achievement, job performance) that are highly valued in our post-industrial, capitalist society.
3. Acknowledging the evidence for #2 is perfectly compatible with belief #1.
4. The belief that one can and should assign merit and superiority on the basis of people’s genes grew out of racist and classist ideologies that were already sorting people as inferior and superior.
5. Instead of accepting the eugenic interpretation of what genetic research means, and then pushing back against the research itself, people – especially people with egalitarian and progressive values — should stop implicitly assuming that genes==inherent merit.

So you are arguing that Scott is a white supremacist, and your pieces of evidence include:

  • A comment where Scott says that he doesn't want to have coercive eugenics
  • An essay where Scott talks about the best ways of helping people who might be cognitively disadvantaged, and suggests that we should give them a basic income guarantee
  • A post where Scott links to and endorses an article which focuses on arguing that considering some people as inferior to others is abhorrent, and that we should reject the racist idea of genetics research having any bearing to how inherently valuable people are
Comment by Kaj_Sotala on Slate Star Codex, EA, and self-reflection · 2020-06-27T09:39:46.950Z · EA · GW

Also the sleight of hand where the author implies that Scott is a white supremacist, and supports this not by referencing anything that Scott said, but by referencing things that unrelated people hanging out on the SSC subreddit have said and which Scott has never shown any signs of endorsing. If Scott himself had said anything that could be interpreted as an endorsement of white supremacy, surely it would have been mentioned in this post, so its absence is telling.

As Tom Chivers recently noted:

It’s part of the SSC ethos that “if you don’t understand how someone could possibly believe something as stupid as they do”, then you should consider the possibility that that’s because you don’t understand, rather than because they’re stupid; the “principle of charity”. So that means taking ideas seriously — even ones you’re uncomfortable with. And the blog and its associated subreddit have rules of debate: that you’re not allowed to shout things down, or tell people they’re racist; you have to politely and honestly argue the facts of the issue at hand. It means that the sites are homes for lively debate, rare on the modern internet, between people who actually disagree; Left and Right, Republican and Democrat, pro-life and pro-choice, gender-critical feminists and trans-activist, MRA and feminist.
And that makes them vulnerable. Because if you’re someone who wants to do a hatchet job on them, you can easily go through the comments and find something that someone somewhere will find appalling. That’s partly a product of the disagreement and partly a function of how the internet works: there’s an old law of the internet, the “1% rule”, which says that the large majority of online comments will come from a hyperactive 1% of the community. That was true when I used to work at Telegraph Blogs — you’d get tens of thousands of readers, but you’d see the same 100 or so names cropping up every time in the comment sections.
(Those names were often things like Aelfric225 or TheUnBrainWashed, and they were usually really unhappy about immigration.)
That’s why the rationalists are paranoid. They know that if someone from a mainstream media organisation wanted to, they could go through those comments, cherry-pick an unrepresentative few, and paint the entire community as racist and/or sexist, even though surveys of the rationalist community and SSC readership found they were much more left-wing and liberal on almost every issue than the median American or Briton. And they also knew that there were people on the internet who unambiguously want to destroy them because they think they’re white supremacists.
Comment by Kaj_Sotala on Slate Star Codex, EA, and self-reflection · 2020-06-27T09:32:45.083Z · EA · GW
Not to be rude, but what context do you recommend would help for interpreting the statement, "I like both basic income guarantees and eugenics," or describing requiring poor people to be sterilized to receive basic income as "probably better than what we have right now?"

The part from the middle of that excerpt that you left out certainly seems like relevant context: "Even though I like both basic income guarantees and eugenics, I don’t think these are two things that go well together – making the income conditional upon sterilization is a little too close to coercion for my purposes. Still, probably better than what we have right now." (see my top-level comment)

Comment by Kaj_Sotala on Reducing long-term risks from malevolent actors · 2020-04-30T11:01:04.340Z · EA · GW
Malevolent humans with access to advanced technology—such as whole brain emulation or other forms of transformative AI—could cause serious existential risks and suffering risks.

Possibly relevant: Machiavellians Approve of Mind Upload Technology Directly and Through Utilitarianism (Laakasuo et al. 2020), though it mainly tested whether machiavellians express moral condemnation of mind uploading, rather than their interest directly.

In this preregistered study, we have two novel findings: 1) Utilitarian moral preferences are strongly and psychopathy is mildly associated with positive approval of MindUpload; and 2) that Machiavellianism – essentially a calculative self-interest related trait – is strongly associated with positive approval of Mind Upload, even after controlling for Utilitarianism and the previously known predictor of Sexual Disgust (and conservatism). In our preregistration, we had assumed that the effect would be dependent on Psychopathy (another Dark Triad personality dimension), rather than Machiavellianism. However, given how closely related Machiavellianism and Psychopathy are, we argue that the results match our hypothesis closely. Our results suggest that the perceived risk of callous and selfish individuals preferring Mind Upload should be taken seriously, as previously speculated by Sotala & Yampolskiy (2015)
Comment by Kaj_Sotala on Finding it hard to retain my belief in altruism · 2019-01-02T18:58:54.652Z · EA · GW

You seem to be working under the assumption that we have either emotional or logical motivations for doing something. I think that this is mistaken: logic is a tool for achieving our motivations, and all of our motivations ultimately ground in emotional reasons. In fact, it has been my experience that focusing too much on trying to find "logical" motivations for our actions may lead to paralysis, since absent an emotional motive, logic doesn't provide any persuasive reason to do one thing over another.

You said that people act altruistically because "ultimately they're doing it to not feel bad, to feel good, or to help a loved one". I interpret this to mean that these are all reasons which you think are coming from the heart. But can you think of any reason for doing anything which does *not* ultimately ground in something like these reasons?

I don't know you, so I don't want to suggest that I think that I know how your mind works... but reading what you've written, I can't help getting the feeling that the thought of doing something which is motivated in emotion rather than logic makes you feel bad, and that the reason why you don't want to do things which are motivated by emotion is that you have an emotional aversion to it. In my experience, it's very common for people to have an emotional aversion to what they think emotional reasoning is, causing them to convince themselves that they are making their decisions based on logic rather than emotion. If someone has a strong (emotional) conviction that logic is good and emotion is bad, then they will be strongly motivated to try to ground all of their actions in logical reasoning. All the while being unmotivated to notice the reason why they are so invested in logical reasoning. I used to do something like this, which is how I became convinced of the inadequacy of logical reasoning for resolving conflicts such as these. I tried and failed for a rather long time before switching tactics.

The upside of this is that you don't really need to find a logical reason for acting altruistically. Yes, many people who are driven by emotion end up acting selfishly rather than altruistically. But since everyone is ultimately driven by emotions, then as long as you believe that there are people who act altruistically, then that implies that it's possible to act altruistically while being motivated emotionally.

What I would suggest, would be to embrace everything being driven by emotion, and then trying to find a solution which satisfies all of your emotional needs. You say that studying to get a PhD in machine learning would make you feel bad, and also that not doing it is also bad. I don't think that either of these feelings is going to just go away: if you just chose to do a machine learning PhD, or just chose to not do it, then the conflict would keep bothering you regardless, and you'd feel unhappy either way you chose. I'd recommend figuring out the reasons why you would hate the machine learning path, and also the conditions under which you feel bad about not doing enough altruistic work, and then figuring out a solution which would satisfy all of your emotional needs. (CFAR's workshops teach exactly this kind of thing .)

I should also remark that I was recently in a somewhat similar situation as you: I felt that the right thing to do would be to work on AI stuff, but also that I didn't want to. Eventually I came to the conclusion that the reason why I didn't want it was that a part of my mind was convinced that the kind of AI work that I could do, wouldn't actually be as impactful as other things that I could be doing - and this judgment has mostly held up under logical analysis. This is not to say that doing the ML PhD would genuinely be a bad idea for you as well, but I do think that it would be worth examining the reasons for why exactly you wouldn't want to do studies. Maybe your emotions are actually trying to tell you something important? (In my experience, they usually are, though of course it's also possible for them to be mistaken.)

One particular question that I would ask is: you say you would enjoy working in AI, but you wouldn't enjoy learning the stuff that you need to do in order to work in AI. This might make sense in a field where you are required to study something that's entirely unrelated to what's useful for your job. But particularly once you get around doing doing your graduate studies, much of that stuff will be directly relevant for your work. If you think that you would hate to be in an environment where you get to spend most of your time learning about AI, why do you think that you would enjoy a research job, which also requires you to spend a lot of time learning about AI?

Comment by Kaj_Sotala on The case for taking AI seriously as a threat to humanity · 2018-12-27T13:15:38.044Z · EA · GW
My perspective here is that many forms of fairness are inconsistent, and fall apart on significant moral introspection as you try to make your moral preferences consistent. I think the skin-color thing is one of them, which is really hard to maintain as something that you shouldn't pay attention to, as you realize that it can't be causally disentangled from other factors that you feel like you definitely should pay attention to (such as the person's physical strength, or their height, or the speed at which they can run).

I think that a sensible interpretation of "is the justice system (or society in general) fair" is "does the justice system (or society) reward behaviors that are good overall, and punish behaviors that are bad overall"; in other words, can you count on society to cooperate with you rather than defect on you if you cooperate with it. If you get jailed based (in part) on your skin color, then if you have the wrong skin color (which you can't affect), there's an increased probability of society defecting on you regardless of whether you cooperate or defect. This means that you have an extra incentive to defect since you might get defected on anyway. This feels like a sensible thing to try to avoid.

Comment by Kaj_Sotala on The harm of preventing extinction · 2018-12-27T11:12:10.910Z · EA · GW

On the other hand, there are also arguments for why one should work to prevent extinction even if one did have the kind of suffering-focused view that you're arguing for; see e.g. this article. To briefly summarize some of its points:

If humanity doesn't go extinct, then it will eventually colonize space; if we don't colonize space, it may eventually be colonized by an alien species with even more cruelty than us.

Whether alternative civilizations would be more or less compassionate or cooperative than humans, we can only guess. We may however assume that our reflected preferences depend on some aspects of being human, such as human culture or the biological structure of the human brain[48]. Thus, our reflected preferences likely overlap more with a (post-)human civilization than alternative civilizations. As future agents will have powerful tools to shape the world according to their preferences, we should prefer (post-)human space colonization over space colonization by an alternative civilization.

A specific extinction risk is the creation of unaligned AI, which might first destroy humanity and then go on to colonize space; if it lacked empathy, it might create a civilization where none of the agents cared about the suffering of others, causing vastly more suffering to exist.

Space colonization by an AI might include (among other things of value/disvalue to us) the creation of many digital minds for instrumental purposes. If the AI is only driven by values orthogonal to ours, it would likely not care about the welfare of those digital minds. Whether we should expect space colonization by a human-made, misaligned AI to be morally worse than space colonization by future agents with (post-)human values has been discussed extensively elsewhere. Briefly, nearly all moral views would most likely rather have human value-inspired space colonization than space colonization by AI with arbitrary values, giving extra reason to work on AI alignment especially for future pessimists.

Trying to prevent extinction also helps avoid global catastrophic risks (GCRs); GCRs could set social progress back, causing much more violence and other kinds of suffering than we have today.

Global catastrophe here refers to a scenario of hundreds of millions of human deaths and resulting societal collapse. Many potential causes of human extinction, like a large scale epidemic, nuclear war, or runaway climate change, are far more likely to lead to a global catastrophe than to complete extinction. Thus, many efforts to reduce the risk of human extinction also reduce global catastrophic risk. In the following, we argue that this effect adds substantially to the EV of efforts to reduce extinction risk, even from the very-long term perspective of this article. This doesn’t hold for efforts to reduce risks that, like risks from misaligned AGI, are more likely to lead to complete extinction than to a global catastrophe. [...]
Can we expect the “new” value system emerging after a global catastrophe to be robustly worse than our current value system? While this issue is debated[60], Nick Beckstead gives a strand of arguments suggesting the “new” values would in expectation be worse. Compared to the rest of human history, we currently seem to be on a unusually promising trajectory of social progress. What exactly would happen if this period was interrupted by a global catastrophe is a difficult question, and any answer will involve many judgements calls about the contingency and convergence of human values. However, as we hardly understand the driving factors behind the current period of social progress, we cannot be confident it would recommence if interrupted by a global catastrophe. Thus, if one sees the current trajectory as broadly positive, one should expect this value to be partially lost if a global catastrophe occurs.

Efforts to reduce extinction risk often promote coordination, peace and stability, which can be useful for reducing the kinds of atrocities that you're talking about.

Taken together, efforts to reduce extinction risk also promote a more coordinated, peaceful and stable global society. Future agents in such a society will probably make wiser and more careful decisions, reducing the risk of unexpected negative trajectory changes in general. Safe development of AI will specifically depend on these factors. Therefore, efforts to reduce extinction risk may also steer the world away from some of the worst non-extinction outcomes, which likely involve war, violence and arms races.
Comment by Kaj_Sotala on The harm of preventing extinction · 2018-12-26T21:51:52.109Z · EA · GW
Do you have a short summary of why he thinks that someone answering the question of "would you have preferred to die right after child birth?" with "No?" is not strong evidence that they should have been born?

I don't know what Benatar's response to this is, but - consider this comment by Eliezer in a discussion of the Repugnant Conclusion:

“Barely worth living” can mean that, if you’re already alive and don’t want to die, your life is almost but not quite horrible enough that you would rather commit suicide than endure. But if you’re told that somebody like this exists, it is sad news that you want to hear as little as possible. You may not want to kill them, but you also wouldn’t have that child if you were told that was what your child’s life would be like.

As a more extreme version, suppose that we could create arbitrary minds, and chose to create one which, for its entire existence, experienced immense suffering which it wanted to stop. Say that it experienced the equivalent of being burned with a hot iron, for every second of its existence, and never got used to it. Yet, when asked whether it wanted to die, or would have preferred to die right after it was born, we'd design it in such a way that it would consider death even worse and respond "no". Yet it seems obvious to me that it outputting this response is not a compelling reason to create such a mind.

If people already exist, then there are lots of strong reasons about respecting people's autonomy etc. for why we should respect their desire to continue existing. But if we're making the decision about what kinds of minds should come to existence, those reasons don't seem to be particularly compelling. Especially not since we can construct situations in which we could create a mind that preferred to exist, but where it nonetheless seems immoral to create it.

You can of course reasonably argue that whether a mind should exist, depends on whether they would want to exist and some additional criteria about e.g. how happy they would be. Then if we really could create arbitrary minds, then we might as well (and should) create ones that were happy and preferred to exist, as opposed to ones which were unhappy and preferred to exist. But in that case we've already abandoned the simplicity of just basing our judgment on asking whether they're happy with having survived to their current age.

I surely prefer to exist and would be pretty sad about a world in which I wasn't born (in that I would be willing to endure significant additional suffering in order to cause a world in which I was born).

This doesn't seem coherent to me; once you exist, you can certainly prefer to continue existing, but I don't think it makes sense to say "if I didn't exist, I would prefer to exist". If we've assumed that you don't exist, then how can you have preferences about existing?

If I ask myself the question, "do I prefer a world where I hadn't been born versus a world where I had been born", and imagine that my existence would actually hinge on my answer, then that means that I will in effect die if I answer "I prefer not having been born". So then the question that I'm actually answering is "would I prefer to instantly commit a painless suicide which also reverses the effects of me having come into existence". So that's smuggling in a fair amount of "do I prefer to continue existing, given that I already exist". And that seems to me unavoidable - the only way we can get a mind to tell us whether or not it prefers to exist, is by instantiating it, and then it will answer from a point of view where it actually exists.

I feel like this makes the answer to the question "if a person doesn't exist, would they prefer to exist" either "undefined" or "no" ("no" as in "they lack an active desire to exist", though of course they also lack an active desire to not-exist). Which is probably for the better, given that there exist all kinds of possible minds that would probably be immoral to instantiate, even though once instantiated they'd prefer to exist.

Comment by Kaj_Sotala on 2018 AI Alignment Literature Review and Charity Comparison · 2018-12-19T21:14:03.577Z · EA · GW
In the past [EAF/FRI] have been rather negative utilitarian, which I have always viewed as an absurd and potentially dangerous doctrine. If you are interested in the subject I recommend Toby Ord’s piece on the subject. However, they have produced research on why it is good to cooperate with other value systems, making me somewhat less worried.

(I work for FRI.) EA/FRI is generally "suffering-focused", which is an umbrella term covering a range of views; NU would be the most extreme form of that, and some of us do lean that way, but many disagree with it and hold some view which would be considered much more plausible by most people (see the link for discussion). Personally I used to lean more NU in the past, but have since then shifted considerably in the direction of other (though still suffering-focused) views.

Besides the research about the value of cooperation that you noted, this article discusses reasons why the expected value of x-risk reduction could be positive even from a suffering-focused view; the paper of mine referenced in your post also discusses why suffering-focused views should care about AI alignment and cooperate with others in order to ensure that we get aligned AI.

And in general it's just straightforwardly better and (IMO) more moral to try to create a collaborative environment where people who care about the world can work together in support of their shared points of agreement, rather than trying to undercut each other. We are also aware of the unilateralist's curse, and do our best to discourage any other suffering-focused people from doing anything stupid.

Comment by Kaj_Sotala on Is Effective Altruism fundamentally flawed? · 2018-03-16T13:02:36.817Z · EA · GW

The following is roughly how I think about it:

If I am in a situation where I need help, then for purely selfish reasons, I would prefer people-who-are-capable-of-helping-me to act in such a way that has the highest probability of helping me. Because I obviously want my probability of getting help, to be as high as possible.

Let's suppose that, as in your original example, I am one of three people who need help, and someone is thinking about whether to act in a way that helps one person, or to act in a way that helps two people. Well, if they act in a way that helps one person, then I have a 1/3 chance of being that person; and if they act in a way that helps two people, then I have a 2/3 chance of being one of those two people. So I would rather prefer them to act in a way that helps as many people as possible.

I would guess that most people, if they need help and are willing to accept help, would also want potential helpers to act in such a way that maximizes their probability of getting help.

Thus, to me, reason and empathy would say that the best way to respect the desires of people who want help, is to maximize the amount of people you are helping.

Comment by Kaj_Sotala on The Technological Landscape Affecting Artificial General Intelligence and the Importance of Nanoscale Neural Probes · 2018-01-12T15:01:14.076Z · EA · GW

Hi Daniel,

you argue in section 3.3 of your paper that nanoprobes are likely to be the only viable route to WBE, because of the difficulty in capturing all of the relevant information in a brain if an approach such as destructive scanning is used.

You don't however seem to discuss the alternative path of neuroprosthesis-driven uploading:

we propose to connect to the human brain an exocortex, a prosthetic extension of the biological brain which would integrate with the mind as seamlessly as parts of the biological brain integrate with each other. [...] we make three assumptions which will be further fleshed out in the following sections:

There seems to be a relatively unified cortical algorithm which is capable of processing different types of information. Most, if not all, of the information processing in the brain of any given individual is carried out using variations of this basic algorithm. Therefore we do not need to study hundreds of different types of cortical algorithms before we can create the first version of an exocortex.
We already have a fairly good understanding on how the cerebral cortex processes information and gives rise to the attentional processes underlying consciousness. We have a good reason to believe that an exocortex would be compatible with the existing cortex and would integrate with the mind.
The cortical algorithm has an inbuilt ability to transfer information between cortical areas. Connecting the brain with an exocortex would therefore allow the exocortex to gradually take over or at least become an interface for other exocortices.

In addition to allowing for mind coalescence, the exocortex could also provide a route for uploading human minds. It has been suggested that an upload can be created by copying the brain layer-by-layer [Moravec, 1988] or by cutting a brain into small slices and scanning them [Sandberg & Bostrom, 2008]. However, given our current technological status and understanding of the brain, we suggest that the exocortex might be a likely intermediate step. As an exocortex-equipped brain aged, degenerated and eventually died, an exocortex could take over its functions, until finally the original person existed purely in the exocortex and could be copied or moved to a different substrate.

This seems to avoid the objection of it being too hard to scan the brain in all detail. If we can replicate the high-level functioning of the cortical algorithm, then we can do so in a way which doesn't need to be biologically realistic, but which will still allow us to implement the brain's essential functions in a neural prosthesis (here's some prior work that also replicates some aspect of brain's functioning and re-implements it in a neuroprosthesis, without needing to capture all of the biological details). And if the cortical algorithm can be replicated in a way that allows the person's brain to gradually transfer over functions and memories as the biological brain accumulates damage, the same way that function in the biological brain gets reorganized and can remain intact even as it slowly accumulates massive damage61127-1), then that should allow the entirety of the person's cortical function to transfer over to the neuroprosthesis. (of course, there are still the non-cortical parts of the brain that need to be uploaded as well)

A large challenge here is in getting the required amount of neural connections between the exocortex and the biological brain; but we are already getting relatively close, taking into account that the corpus callosum that connects the two hemispheres "only" has on the order of 100 million connections:

Earlier this year, the US Defense Advanced Research Projects Agency (DARPA) launched a project called Neural Engineering System Design. It aims to win approval from the US Food and Drug Administration within 4 years for a wireless human brain device that can monitor brain activity using 1 million electrodes simultaneously and selectively stimulate up to 100,000 neurons. (source)

Comment by Kaj_Sotala on 2017 AI Safety Literature Review and Charity Comparison · 2017-12-22T15:53:01.215Z · EA · GW

Also, one forthcoming paper of mine released as a preprint; and another paper that was originally published informally last year but published in somewhat revised and peer-reviewed form this year:

Both were done as part of my research for the Foundational Research Institute; maybe include us in your organizational comparison next year? :)

Comment by Kaj_Sotala on Anti-tribalism and positive mental health as high-value cause areas · 2017-10-18T15:22:04.142Z · EA · GW

There seem to be a lot of leads that could help us figure out the high-value interventions, though: i) knowledge about what causes it and what has contributed to changes of it over time ii) research directions that could help further improve our understanding of what causes it / what doesn't cause it iii) various interventions which already seem like they work in a small-scale setting, though it's still unclear how they might be scaled up (e.g. something like Crucial Conversations is basically about increasing trust and safety in one-to-one and small-group conversations) iv) and of course psychology in general is full of interesting ideas for improving mental health and well-being that haven't been rigorously tested, which also suggests that v) any meta-work that would improve psychology's research practices would also be even more valuable than we previously thought.

As for the "pointing out a problem people have been aware of for millenia", well, people have been aware of global poverty for millenia too. Then we got science and randomized controlled trials and all the stuff that EAs like, and got better at fixing the problem. Time to start looking at how we could apply our improved understanding of this old problem, to fixing it.

Comment by Kaj_Sotala on Anti-tribalism and positive mental health as high-value cause areas · 2017-10-18T15:12:41.956Z · EA · GW

Thanks for the reference! That sounds valuable.

Comment by Kaj_Sotala on Why I think the Foundational Research Institute should rethink its approach · 2017-07-25T23:17:19.852Z · EA · GW

I think whether suffering is a 'natural kind' is prior to this analysis: e.g., to precisely/objectively explain the functional role and source of something, it needs to have a precise/crisp/objective existence.

I take this as meaning that you agree that accepting functionalism is orthogonal to the question of whether suffering is "real" or not?

If it is a placeholder, then I think the question becomes, "what would 'something better' look like, and what would count as evidence that something is better?

What something better would look like - if I knew that, I'd be busy writing a paper about it. :-) That seems to be a part of the problem - everyone (that I know of) agrees that functionalism is deeply unsatisfactory, but very few people seem to have any clue of what a better theory might look like. Off the top of my head, I'd like such a theory to at least be able to offer some insight into what exactly is conscious, and not have the issue where you can hypothesize all kinds of weird computations (like Aaronson did in your quote) and be left confused about which of them are conscious and which are not, and why. (roughly, my desiderata are similar to Luke Muehlhauser's)

Comment by Kaj_Sotala on Cognitive Science/Psychology As a Neglected Approach to AI Safety · 2017-07-25T13:10:42.532Z · EA · GW

That's super-neat! Thanks.

Comment by Kaj_Sotala on Why I think the Foundational Research Institute should rethink its approach · 2017-07-25T11:01:35.955Z · EA · GW

Wait, are you equating "functionalism" with "doesn't believe suffering can be meaningfully defined"? I thought your criticism was mostly about the latter; I don't think it's automatically implied by the former. If you had a precise enough theory about the functional role and source of suffering, then this would be a functionalist theory that specified objective criteria for the presence of suffering.

(You could reasonably argue that it doesn't look likely that functionalism will provide such a theory, but then I've always assumed that anyone who has thought seriously about philosophy of mind has acknowledged that functionalism has major deficiencies and is at best our "least wrong" placeholder theory until somebody comes up with something better.)

Comment by Kaj_Sotala on Towards a measure of Autonomy and what it means for EA · 2017-07-22T08:22:05.872Z · EA · GW

Another discussion and definition of autonomy, by philosopher John Danaher:

Many books and articles have been written on the concept of ‘autonomy’. Generations of philosophers have painstakingly identified necessary and sufficient conditions for its attainment, subjected those conditions to revision and critique, scrapped their original accounts, started again, given up and argued that the concept is devoid of meaning, and so on. I cannot hope to do justice to the richness of the literature on this topic here. Still, it’s important to have at least a rough and ready conception of what autonomy is and the most general (and hopefully least contentious) conditions needed for its attainment.

I have said this before, but I like Joseph Raz’s general account. Like most people, he thinks that an autonomous agent is one who is, in some meaningful sense, the author of their own lives. In order for this to happen, he says that three conditions must be met:

Rationality condition: The agent must have goals/ends and must be able to use their reason to plan the means to achieve those goals/ends.

Optionality condition: The agent must have an adequate range of options from which to choose their goals and their means.

Independence condition: The agent must be free from external coercion and manipulation when choosing and exercising their rationality.

I have mentioned before that you can view these as ‘threshold conditions’, i.e. conditions that simply have to be met in order for an agent to be autonomous, or you can have a slightly more complex view, taking them to define a three dimensional space in which autonomy resides. In other words, you can argue that an agent can have more or less rationality, more or less optionality, and more or less independence. The conditions are satisfied in degrees. This means that agents can be more or less autonomous, and the same overall level of autonomy can be achieved through different combinations of the relevant degrees of satisfaction of the conditions. That’s the view I tend to favour. I think there possibly is a minimum threshold for each condition that must be satisfied in order for an agent to count as autonomous, but I suspect that the cases in which this threshold is not met are pretty stark. The more complicated cases, and the ones that really keep us up at night, arise when someone scores high on one of the conditions but low on another. Are they autonomous or not? There may not be a simple ‘yes’ or ‘no’ answer to that question.

Anyway, using the three conditions we can formulate the following ‘autonomy principle’ or ‘autonomy test’:

Autonomy principle: An agent’s actions are more or less autonomous to the extent that they meet the (i) rationality condition; (ii) optionality condition and (iii) independence condition.

Comment by Kaj_Sotala on Why I think the Foundational Research Institute should rethink its approach · 2017-07-21T19:54:03.937Z · EA · GW

Rather than put words in the mouths of other people at FRI, I'd rather let them personally answer which philosophical premises they accept and what motivates them, if they wish.

For me personally, I've just had, for a long time, the intuition that preventing extreme suffering is the most important priority. To the best that I can tell, much of this intuition can be traced to having suffered from depression and general feelings of crushing hopelessness for large parts of my life, and wanting to save anyone else from experiencing a similar (or worse!) magnitude of suffering. I seem to recall that I was less suffering-focused before I started getting depressed for the first time.

Since then, that intuition has been reinforced by reading up on other suffering-focused works; something like tranquilism feels like a sensible theory to me, especially given some of my own experiences with meditation which are generally compatible with the kind of theory of mind implied by tranquilism. That's something that has come later, though.

To clarify, none of this means that I would only value suffering prevention: I'd much rather see a universe-wide flourishing civilization full of minds in various states of bliss, than a dead and barren universe. My position is more of a prioritarian one: let's first take care of everyone who's experiencing enormous suffering, and make sure none of our descendants are going to be subject to that fate, before we start thinking about colonizing the rest of the universe and filling it with entirely new minds.

Comment by Kaj_Sotala on Why I think the Foundational Research Institute should rethink its approach · 2017-07-20T23:10:53.756Z · EA · GW

This looks sensible to me. I'd just quickly note that I'm not sure if it's quite accurate to describe this as "FRI's metaphysics", exactly - I work for FRI, but haven't been sold on the metaphysics that you're criticizing. In particular, I find myself skeptical of the premise "suffering is impossible to define objectively", which you largely focus on. (Though part of this may be simply because I haven't yet properly read/considered Brian's argument for it, so it's possible that I would change my mind about that.)

But in any case, I've currently got three papers in various stages of review, submission or preparation (that other FRI people have helped me with), and none of those papers presuppose this specific brand of metaphysics. There's a bunch of other work being done, too, which I know of and which I don't think presupposes it. So it doesn't feel quite accurate to me to suggest that the metaphysics would be holding back our progress, though of course there can be some research being carried out that's explicitly committed to this particular metaphysics.

(opinions in this comment purely mine, not an official FRI statement etc.)

Comment by Kaj_Sotala on An argument for broad and inclusive "mindset-focused EA" · 2017-07-17T10:02:17.586Z · EA · GW

I agree that if one thinks that x-risk is an immediate concern, then one should focus specifically on that now. This is explicitly a long-term strategy, so assumes that there will be a long term.

Comment by Kaj_Sotala on An argument for broad and inclusive "mindset-focused EA" · 2017-07-17T10:00:47.167Z · EA · GW

I'm somewhat confused that you list the formation of many groups as a benefit of broad mindset spread, but then say that we should try to achieve the formation of one very large group (that of "low-level EA"). If our goal is many groups, maybe it would be better to just create many groups?

I must have expressed myself badly somehow - I specifically meant that "low-level EA" would be composed of multiple groups. What gave you the opposite impression?

For example, the current situation is that organizations like the Centre for Effective Altruism and Open Philanthropy Project are high-level organizations: they are devoted to finding the best ways of doing good in general. At the same time, organizations like Centre for the Study of Existential Risk, Animal Charity Evaluators, and Center for Applied Rationality are low-level organizations, as they are each devoted to some specific cause area (x-risk, animal welfare, and rationality, respectively). We already have several high- and low-level EA groups, and spreading the ideas would ideally cause even more of both to be formed.

If our goal is to spread particular memes, why not the naive approach of trying to achieve positions of influence in order to spread those particular memes?

This seems completely compatible with what I said? On my own behalf, I'm definitely interested in trying to achieve a position of higher influence to better spread these ideas.

Comment by Kaj_Sotala on An argument for broad and inclusive "mindset-focused EA" · 2017-07-17T09:49:46.564Z · EA · GW

"General vs. specific" could also be one

Comment by Kaj_Sotala on An argument for broad and inclusive "mindset-focused EA" · 2017-07-17T09:48:53.093Z · EA · GW

Ian David Moss has a post on this forum arguing for things along the lines of 'EA for the rich country fine arts' and other such restricted scope versions of EA.

Thanks for the link! I did a quick search to find if someone had already said something similar, but missed that.

My biggest objection to this is that to stay in line with people's habitual activities the rationales for the restricted scope have to be very gerrymandered (perhaps too much to be credible if stated explicitly), and optimizing within that restricted objective function may be pick out things that are overall bad,

I'm not sure whether the first one is really an issue - just saying that "these are general tools that you can use to improve whatever it is that you care about, and if you're not sure what you care about, you can also apply the same concepts to find that" seems reasonable enough to me, and not particularly gerrymandering.

I do agree that optimizing too specifically within some narrow domain can be a problem that produces results that are globally undesirable, though.

Comment by Kaj_Sotala on An argument for broad and inclusive "mindset-focused EA" · 2017-07-17T09:43:15.387Z · EA · GW

Thanks for the comment!

  1. It might be better to spread rationality and numeracy concepts like expected value, opportunity costs, comparative advantage, cognitive biases, etc completely unconnected to altruism than to try to explicitly spread narrow or cause-specific EA. People on average care much more about being productive, making money, having good relationships, finding meaning, etc than about their preferred altruistic causes. And it really would be a big win if they succeeded -- less ambiguously so than with narrow EA I think (see Carl's comment below). The biggest objection to this is probably crowdedness/lack of obvious low-hanging fruit.

I agree with the "lack of obvious low-hanging fruit". It doesn't actually seem obvious to me how useful these concepts are to people in general, as opposed to more specific concrete advice (such as specific exercises for improving their social skills etc.). In particular, Less Wrong has been devoted to roughly this kind of thing, and even among LW regulars who may have spent hundreds of hours participating on the site, it's always been controversial whether the concepts they've learned from the site have translated into any major life gains. My current inclination would be that "general thinking skills" just aren't very useful for dealing with your practical life, and that concrete domain-specific ideas are much more useful.

You said that people in general care much more about concrete things in their own lives than their preferred altruistic causes, and I agree with this. But on the other hand, the kinds of people who are already committed to working on some altruistic cause are probably a different case: if you're already devoted to some specific goal, then you might have more of an interest in applying those things. If you first targeted people working in existing organizations and won them over to using these ideas, then they might start teaching the ideas to all of their future hires, and over time the concepts could start to spread to the general population more.

  1. Another alternative might be to focus on spreading the prerequisites/correlates of cause-neutral, intense EA: e.g. math education, high levels of caring/empathy, cosmopolitanism, motivation to think systematically about ethics, etc. I'm unsure how difficult this would be.

Maybe. One problem here is that some of these correlate only very loosely with EA: a lot of people have completed math education who aren't EAs. And I think that another problem is that in order to really internalize an idea, you need to actively use it. My thinking here is similar to Venkatesh Rao's, who wrote:

Strong views represent a kind of high sunk cost. When you have invested a lot of effort forming habits, and beliefs justifying those habits, shifting a view involves more than just accepting a new set of beliefs. You have to:

  1. Learn new habits based on the new view
  2. Learn new patterns of thinking within the new view

The order is very important. I have never met anybody who has changed their reasoning first and their habits second. You change your habits first. This is a behavioral conditioning problem largely unrelated to the logical structure and content of the behavior. Once you’ve done that, you learn the new conscious analysis and synthesis patterns.

This is why I would never attempt to debate a literal creationist. If forced to attempt to convert one, I’d try to get them to learn innocuous habits whose effectiveness depends on evolutionary principles (the simplest thing I can think of is A/B testing; once you learn that they work, and then understand how and why they work, you’re on a slippery slope towards understanding things like genetic algorithms, and from there to an appreciation of the power of evolutionary processes).

I wouldn't know how to spread something like cosmopolitanism, to a large extent because I don't know how to teach the kind of thinking habits that would cause you to internalize cosmopolitanism. And even after that, there would still be the step of getting from all of those prerequisites to applying EA principles in concepts. In contrast, teaching EA concepts by getting people to apply them to a charitable field they already care about, gets them into applying EA-ish thinking habits directly.

Both of these alternatives seem to have what is (to me) an advantage: they don't involve the brand and terminology of EA. I think it would be easier to push on the frontiers of cause-neutral/broad EA if the label were a good signal of a large set of pretty unusual beliefs and attitudes, so that people can have high trust collaboration relatively quickly.

That's an interesting view, which I hadn't considered. I might view it more as a disadvantage, in that in the model that I was thinking of, people who got into low-level EA would almost automatically also be exposed to high-level EA, causing the idea of high-level EA to spread further. If you were only teaching related concepts, that jump from them to high-level EA wouldn't happen automatically, but would require some additional steps. (That said, if you could teach enough of those prerequisites, maybe the jump would be relatively automatic. But this seems challenging for the reasons I've mentioned above.)

Comment by Kaj_Sotala on An argument for broad and inclusive "mindset-focused EA" · 2017-07-17T09:02:55.127Z · EA · GW


1) Having a high/low distinction is part of what has led people to claim EAs are misleading. One version of it involves getting people interested through global poverty (or whatever causes they're already interested in), and then later trying to upsell them into high-level EA, which presumably has a major focus on GCRs, meta and so on.

Yeah, agreed. Though part of what I was trying to say is that, as you mentioned, we have the high/low distinction already - "implementing" that distinction would just be giving an explicit name to something that already exists. And something that has a name is easier to refer to and talk about, so having some set of terms for the two types could make it easier to be more transparent about the existence of the distinction when doing outreach. (This would be the case regardless of whether we want to expand EA to lower-impact causes or not.)

2) It sometimes seems like the most innovative and valuable idea within EA is cause-selection. It's what makes us different from simply "competent" do-gooding, and often seems to be where the biggest gains in impact lie. Low level EA seems to basically be EA minus cause selection, so by promoting it, you might lose most of the value. You might need a very big increase in scale of influence to offset this.

I guess the question here is, how much would efforts to bring in low-level EAs hurt the efforts to bring in high-level EAs. My intuition would be that the net effect would be to bring in more high-level EAs overall (a smaller percentage of incoming people would become high-level EAs, but that would be offset by there being more incoming people overall), but I don't have any firm support for that intuition and one would have to test it somehow.

3) Often the best way to promote general ideas is to live them. ... Maybe if EA wants to have more general impact on societal norms, the first thing we should focus on doing is just having a huge impact - finding the "airbnb of EA" or the "Newton of EA".

I agree that the best way to promote general ideas can be to live them. But I think we need to be more specific about what a "huge impact" would mean in this context. E.g. High Impact Science suggests that Norman Borlaug is one of the people who have had the biggest positive impact on the world - but most people have probably never heard of him. So for spreading social norms, it's not enough to live the ideas and make a big impact, one has to do it in a sufficiently visible way.

Comment by Kaj_Sotala on An argument for broad and inclusive "mindset-focused EA" · 2017-07-16T16:43:23.695Z · EA · GW

"Total indifference to cause area" isn't quite how I'd describe my proposal - after all, we would still be talking about high-level EA, a lot of people would still be focused on high-level EA and doing that, etc. The general recommendation would still be to go into high-impact causes if you had no strong preference.

Comment by Kaj_Sotala on My current thoughts on MIRI's "highly reliable agent design" work · 2017-07-10T18:45:32.478Z · EA · GW

There's a strong possibility, even in a soft takeoff, that an unaligned AI would not act in an alarming way until after it achieves a decisive strategic advantage.

That's assuming that the AI is confident that it will achieve a DSA eventually, and that no competitors will do so first. (In a soft takeoff it seems likely that there will be many AIs, thus many potential competitors.) The worse the AI thinks its chances are of eventually achieving a DSA first, the more rational it becomes for it to risk non-cooperative action at the point when it thinks it has the best chances of success - even if those chances were low. That might help reveal unaligned AIs during a soft takeoff.

Interestingly this suggests that the more AIs there are, the easier it might be to detect unaligned AIs (since every additional competitor decreases any given AI's odds of getting a DSA first), and it suggests some unintuitive containment strategies such as explicitly explaining to the AI when it would be rational for it to go uncooperative if it was unaligned, to increase the odds of unaligned AIs really risking hostile action early on and being discovered...

Comment by Kaj_Sotala on Cognitive Science/Psychology As a Neglected Approach to AI Safety · 2017-07-09T16:15:53.672Z · EA · GW

(We seem to be talking past each other in some weird way; I'm not even sure what exactly it is that we're disagreeing over.)

It would (arguably) give results that people wouldn't like, but assuming that the moral theory is correct and the machine understands it, almost necessarily it would do morally correct things.

Well sure, if we proceed from the assumption that the moral theory really was correct, but the point was that none of those proposed theories has been generally accepted by moral philosophers.

But that's an even stronger claim than the one that moral philosophy hasn't progressed towards such a goal. What reasons are there?

I gave one in the comment? That philosophy has accepted that you can't give a set of human-comprehensible set of necessary and sufficient criteria for concepts, and if you want a system for classifying concepts you have to use psychology and machine learning; and it looks like morality is similar.

Except the field of ethics does it with actual arguments among experts in the field. You could make the same story for any field: truths about physics can be determined by social consensus, since that's just what the field of physics is, a physicist presents an experiment or hypothesis, another attacks it, if the hypothesis survives the attacks and is compelling then it is eventually accepted! And so on for all non-moral fields of inquiry as well. I don't see why you think ethics would be special; basically everything can be modeled like this. But that's ridiculous. We don't look at social consensus for all forms of inquiry, because there is a difference between what ordinary people believe and what people believe when they are trained professionals in the subject.

I'm not sure what exactly you're disagreeing with? It seems obvious to me that physics does indeed proceed by social consensus in the manner you describe. Someone does an experiment, then others replicate the experiment until there is consensus that this experiment really does produce these results; somebody proposes a hypothesis to explain the experimental results, others point out holes in that hypothesis, there's an extended back-and-forth conversation and further experiments until there is a consensus that the modified hypothesis really does explain the results and that it can be accepted as an established scientific law. And the same for all other scientific and philosophical disciplines. I don't think that ethics is special in that sense.

Sure, there is a difference between what ordinary people believe and what people believe when they're trained professionals: that's why you look for a social consensus among the people who are trained professionals and have considered the topic in detail, not among the general public.

Then why don't you believe in morality by social consensus? (Or do you? It seems like you're probably not, given that you're an effective altruist.

I do believe in morality by social consensus, in the same manner as I believe in physics by social consensus: if I'm told that the physics community has accepted it as an established fact that e=mc^2 and that there's no dispute or uncertainty about this, then I'll accept it as something that's probably true. If I thought that it was particularly important for me to make sure that this was correct, then I might look up the exact reasoning and experiments used to determine this and try to replicate some of them, until I found myself to also be in consensus with the physics community.

Similarly, if someone came to me with a theory of what was moral and it turned out that the entire community of moral philosophers had considered this theory and accepted it after extended examination, and I could also not find any objections to that and found the justifications compelling, then I would probably also accept the moral theory.

But to my knowledge, nobody has presented a conclusive moral theory that would satisfy both me and nearly all moral philosophers and which would say that it was wrong to be an effective altruist - quite the opposite. So I don't see a problem in being an EA.

Comment by Kaj_Sotala on My current thoughts on MIRI's "highly reliable agent design" work · 2017-07-08T23:32:13.313Z · EA · GW

I haven't found any instances of complete axiomatic descriptions of AI systems being used to mitigate problems in those systems (e.g. to predict, postdict, explain, or fix them) or to design those systems in a way that avoids problems they'd otherwise face. [...] It seems plausible that the kinds of axiomatic descriptions that HRAD work could produce would be too taxing to be usefully applied to any practical AI system.

I wonder if slightly analogous example could be found in the design of concurrent systems.

As you may know, it's surprisingly difficult to design software that has multiple concurrent processes manipulating the same data. You typically either screw up by letting the processes edit the same data at the same time or in the wrong order, or by having them wait for each other forever.

So to help reason more clearly about this kind of thing, people developed different forms of temporal logic that let them express in a maximally unambiguous form different desiderata that they have for the system. Temporal logic lets you express statements that say things like "if a process wants to have access to some resource, it will eventually enter a state where it has access to that resource". You can then use temporal logic to figure out how exactly you want your system to behave, in order for it to do the things you want it to do and not run into any problems.

Building a logical model of how you want your system to behave is not the same thing as building the system. The logic only addresses one set of desiderata: there are many others it doesn't address at all, like what you want the UI to be like and how to make the system efficient in terms of memory and processor use. It's a model that you can use for a specific subset of your constraints, both for checking whether the finished system meets those constraints, and for building a system so that it's maximally easy for it to meet those constraints. Although the model is not a whole solution, having the model at hand before you start writing all the concurrency code is going to make things a lot easier for you than if you didn't have any clear idea of how you wanted the concurrent parts to work and were just winging it as you went.

So similarly, if MIRI developed HRAD into a sufficiently sophisticated form, it might yield a set of formal desiderata of how we want the AI to function, as well as an axiomatic model that can be applied to a part of the AI's design, to make sure everything goes as intended. But I would guess that it wouldn't really be a "complete axiomatic descriptions of" the system, in the way that temporal logics aren't a complete axiomatic description of modern concurrent systems.

Comment by Kaj_Sotala on Cognitive Science/Psychology As a Neglected Approach to AI Safety · 2017-06-22T20:38:59.791Z · EA · GW

it seems more useful to learn ML rather than cog sci/psych.

Got it. To clarify: if the question as framed as "should AI safety researchers learn ML, or should they learn cogsci/psych", then I agree that it seems better to learn ML.

Comment by Kaj_Sotala on Cognitive Science/Psychology As a Neglected Approach to AI Safety · 2017-06-20T21:09:49.609Z · EA · GW

Hi Peter, thanks for the response!

Your comment seems to suggest that you don't think the arguments in my post are relevant for technical AI safety research. Do you feel that I didn't make a persuasive case for psych/cogsci being relevant for value learning/multi-level world-models research, or do you not count these as technical AI safety research? Or am I misunderstanding you somehow?

I agree that the "understanding psychology may help persuade more people to work on/care about AI safety" and "analyzing human intelligences may suggest things about takeoff scenarios" points aren't related to technical safety research, but value learning and multi-level world-models are very much technical problems to me.

Comment by Kaj_Sotala on Cognitive Science/Psychology As a Neglected Approach to AI Safety · 2017-06-16T14:41:54.479Z · EA · GW

I am an anti-realist, and I think the prospects for identifying anything like moral truth are very low. I favor abandoning attempts to frame discussions of AI or pretty much anything else in terms of converging on or identifying moral truth.

Ah, okay. Well, in that case you can just read my original comment as an argument for why one would want to use psychology to design an AI that was capable of correctly figuring out just a single person's values and implementing them, as that's obviously a prerequisite for figuring out everybody's values. The stuff that I had about social consensus was just an argument aimed at moral realists, if you're not one then it's probably not relevant for you.

(my values would still say that we should try to take everyone's values into account, but that disagreement is distinct from the whole "is psychology useful for value learning" question)

I'm puzzled by this remark:

Sorry, my mistake - I confused utilitronium with hedonium.

Comment by Kaj_Sotala on Cognitive Science/Psychology As a Neglected Approach to AI Safety · 2017-06-12T06:57:32.261Z · EA · GW

My main objection is that that even if we pursue this project, it does not achieve the heavy metaethical lifting you were alluding to earlier. It doesn’t demonstrate nor provide any particularly good reason to regard the outputs of this process as moral truth.

Well, what alternative would you propose? I don't see how it would even be possible to get any stronger evidence for the moral truth of a theory, than the failure of everyone to come up with convincing objections to it even after extended investigation. Nor a strategy for testing the truth which wouldn't at some point reduce to "test X gives us reason to disagree with the theory".

I would understand your disagreement if you were a moral antirealist, but your comments seem to imply that you do believe that a moral truth exists and that it's possible to get information about it, and that it's possible to do "heavy metaethical lifting". But how?

I want to convert all matter in the universe to utilitronium.

I think anything as specific as this sounds worryingly close to wanting an AI to implement favoritepoliticalsystem.

What the first communist revolutionaries thought would happen, as the empirical consequence of their revolution, was that people’s lives would improve: laborers would no longer work long hours at backbreaking labor and make little money from it. This turned out not to be the case, to put it mildly. But what the first communists thought would happen, was not so very different from what advocates of other political systems thought would be the empirical consequence of their favorite political systems. They thought people would be happy. They were wrong.

Now imagine that someone should attempt to program a “Friendly” AI to implement communism, or libertarianism, or anarcho-feudalism, or favoritepoliticalsystem, believing that this shall bring about utopia. People’s favorite political systems inspire blazing suns of positive affect, so the proposal will sound like a really good idea to the proposer.

We could view the programmer’s failure on a moral or ethical level—say that it is the result of someone trusting themselves too highly, failing to take into account their own fallibility, refusing to consider the possibility that communism might be mistaken after all. But in the language of Bayesian decision theory, there’s a complementary technical view of the problem. From the perspective of decision theory, the choice for communism stems from combining an empirical belief with a value judgment. The empirical belief is that communism, when implemented, results in a specific outcome or class of outcomes: people will be happier, work fewer hours, and possess greater material wealth. This is ultimately an empirical prediction; even the part about happiness is a real property of brain states, though hard to measure. If you implement communism, either this outcome eventuates or it does not. The value judgment is that this outcome satisfices or is preferable to current conditions. Given a different empirical belief about the actual realworld consequences of a communist system, the decision may undergo a corresponding change.

We would expect a true AI, an Artificial General Intelligence, to be capable of changing its empirical beliefs (or its probabilistic world-model, et cetera). If somehow Charles Babbage had lived before Nicolaus Copernicus, and somehow computers had been invented before telescopes, and somehow the programmers of that day and age successfully created an Artificial General Intelligence, it would not follow that the AI would believe forever after that the Sun orbited the Earth. The AI might transcend the factual error of its programmers, provided that the programmers understood inference rather better than they understood astronomy. To build an AI that discovers the orbits of the planets, the programmers need not know the math of Newtonian mechanics, only the math of Bayesian probability theory.

The folly of programming an AI to implement communism, or any other political system, is that you’re programming means instead of ends. You’re programming in a fixed decision, without that decision being re-evaluable after acquiring improved empirical knowledge about the results of communism. You are giving the AI a fixed decision without telling the AI how to re-evaluate, at a higher level of intelligence, the fallible process which produced that decision.

Comment by Kaj_Sotala on Cognitive Science/Psychology As a Neglected Approach to AI Safety · 2017-06-11T19:14:35.648Z · EA · GW

Even if we found the most agreeable available set of moral principles, that amount may turn out not to constitute the vast majority of people. It may not even reach a majority at all. It is possible that there simply is no moral theory that is acceptable to most people.

It's certainly possible that this is the case, but looking for the kind of solution that would satisfy as many people as possible certainly seems like the thing we should try first and only give it up if it seems impossible, no?

More importantly, it is unclear whether or not I have any rational or moral obligation to care about the outputs of this exercise. I do not want to implement the moral system that most people find agreeable.

Well, the ideal case would be that the AI would show you a solution which it had found, and upon inspecting it and considering it through you'd be convinced that this solution really does satisfy all the things you care about - and all the things that most other people care about, too.

From a more pragmatic perspective, you could try to insist on an AI which implemented your values specifically - but then everyone else would also have a reason to fight to get an AI which fulfilled their values specifically, and if it was you versus everyone else in the world, it seems like a pretty high probability that somebody else would win. Which means that your values would have a much higher chance of getting shafted than if everyone had agreed to go for a solution which tried to take into everyone's preferences into account.

And of course, in the context of AI, everyone insisting on their own values and their values only means that we'll get arms races, meaning a higher probability of a worse outcome for everyone.

See also Gains from Trade Through Compromise.

Comment by Kaj_Sotala on Cognitive Science/Psychology As a Neglected Approach to AI Safety · 2017-06-11T14:50:25.560Z · EA · GW

Also, I find pretty compelling the argument that the classical definition of moral philosophy in trying to define "the good" is both impossible and not even a particularly good target to aim at, and that trying to find generally-agreeable moral solutions is something much more useful; and if we accept this argument, then moral psychology is relevant, because it can help us figure out generally-agreeable solutions.

As Martela (2017) writes:

...there is a deeper point in Williams's book that is even harder to rebut. Williams asks: What can an ethical theory do, if we are able to build a convincing case for one? He is skeptical about the force of ethical considerations and reminds us that even if we were to have a justified ethical theory, the person in question might not be concerned about it. Even if we could prove to some amoralists that what they are about to do is (a) against some universal ethical standard, (b) is detrimental to their own well-being, and/or (c) is against the demands of rationality or internal coherence, they still have the choice of whether to care about this or not. They can choose to act even if they know that what they are about to do is against some standard that they believe in. Robert Nozick—whom Williams quotes—describes this as follows: “Suppose that we show that some X he [the immoral man] holds or accepts or does commits him to behaving morally. He now must give up at least one of the following: (a) behaving immorally, (b) maintaining X, (c) being consistent about this matter in this respect. The immoral man tells us, ‘To tell you the truth, if I had to make the choice, I would give up being consistent’” (Nozick 1981, 408).

What Williams in effect says is that the noble task of finding ultimate justification for some ethical standards could not—even if it was successful—deliver any final argument in practical debates about how to behave. “Objective truth” would have only the motivational weight that the parties involved choose to give to it. It no longer is obvious what a philosophical justification of an ethical standard is supposed to do or even “why we should need such a thing” (Williams 1985, 23).

Yet when we look at many contemporary ethical debates, we can see that that they proceed as if the solutions to the questions they pose would matter. In most scientific disciplines the journal articles have a standard section called “practical bearings,” where the practical relevance of the accumulated results are discussed. Not so for metaethical articles, even though they otherwise simulate the academic and peer-reviewed writing style of scientific articles. When we read someone presenting a number of technical counterarguments against quasi-realist solutions to the Frege-Geach problem, there usually is no debate about what practical bearings the discussion would have, whether these arguments would be successful or not. Suppose that in some idealized future the questions posed by the Frege-Geach problem would be conclusively solved. A new argument would emerge that all parties would see as so valid and sound that they would agree that the problem has now been finally settled. What then? How would ordinary people behave differently, after the solution has been delivered to them? I would guess it is fair to say—at least until it is proven otherwise—that the outcome of these debates is only marginally relevant for any ordinary person's ethical life. [...]

This understanding of morality means that we have to think anew what moral inquiry should aim at. [...] Whatever justification can be given for one moral doctrine over the other, it has to be found in practice—simply because there are no other options available. Accordingly, for pragmatists, moral inquiry is in the end directed toward practice, its successfulness is ultimately judged by the practical bearings it has on people's experiences: “Unless a philosophy is to remain symbolic—or verbal—or a sentimental indulgence for a few, or else mere arbitrary dogma, its auditing of past experience and its program of values must take effect in conduct” (Dewey 1916, 315). Moral inquiry should thus aim at practice; its successfulness is ultimately measured by how it is able to influence people's moral outlook and behavior. [...]

Moral principles, ideals, rules, theories, or conclusions should thus be seen “neither as a cookbook, nor a remote calculus” (Pappas 1997, 546) but as instruments that we can use to understand our behavior and change it for the better. Instead of trying to discover the correct ethical theories, the task becomes one of designing the most functional ethical theories. Ethics serves certain functions in human lives and in societies, and the task is to improve its ability to serve these functions (Kitcher 2011b). In other words, the aim of ethical theorizing is to provide people with tools (see Hickman 1990, 113–14) that help them in living their lives in a good and ethically sound way. [...]

It is true that the lack of foundational principles in ethics denies the pragmatist moral philosopher the luxury of being objectively right in some moral question. In moral disagreements, a pragmatist cannot “solve” the disagreement by relying on some objective standards that deliver the “right” and final answer. But going back to Williams's argument raised at the beginning of this article, we can ask what would it help if we were to “solve” the problem. The other party still has the option to ignore our solution. Furthermore, despite the long history of ethics we still haven't found many objective standards or “final solutions” that everyone would agree on, and thus it seems that waiting for such standards to emerge is futile.

In practice, there seem to be two ways in which moral disagreements are resolved. First is brute force. In some moral disputes I am in a position in which I can force the other party to comply with my standards whether that other party agrees with me or not. The state with its monopoly on the legitimate use of violence can force its citizens to comply with certain laws even when the personal moral code of these citizens would disagree with the law. The second way to resolve a moral disagreement is to find some common ground, some standards that the other believes in, and start building from there a case for one's own position.

In the end, it might be beneficial that pragmatism annihilates the possibility of believing that I am absolutely right and the other party is absolutely wrong. As Margolis notes: “The most monstrous crimes the race has ever (been judged to have) perpetrated are the work of the partisans of ‘right principles’ and privileged revelation” (1996, 213). Instead of dismissing the other's perspective as wrong, one must try to understand it in order to find common ground and shared principles that might help in progressing the dialogue around the problem. If one really wants to change the opinion of the other party, instead of invoking some objective standards one should invoke some standards that the other already believes in. This means that one has to listen to the other person, try to see the world from his or her point of view. Only through understanding the other's perspective one can have a chance to find a way to change it—or to change one's own opinion, if this learning process should lead to that. One can aim to clarify the other's points of view, unveil their hidden assumptions and values, or challenge their arguments, but one must do this by drawing on principles and values that the other is already committed to if one wants to have a chance to have a real impact on the other's way of seeing the world, or actually to resolve the disagreement. I believe that this kind of approach, rather than a claim for a more objective position, has a much better chance of actually building common understanding around the moral issue at hand.

Comment by Kaj_Sotala on Cognitive Science/Psychology As a Neglected Approach to AI Safety · 2017-06-11T14:49:19.117Z · EA · GW

It took me a while to respond to this because I wanted to take the time to read "The Normative Insignificance of Neuroscience" first. Having now read it, I'd say that I agree with its claims with regard to criticism of Greene's approach. I don't think it disproves the notion of psychology being useful for defining human values, though, for I think there's an argument for psychology's usefulness that's entirely distinct from the specific approach that Greene is taking.

I start from the premise that the goal of moral philosophy is to develop a set of explicit principles that would tell us what is good. Now this is particularly relevant for designing AI, because we also want our AIs to follow those principles. But it's noteworthy that at their current state, none of the existing ethical theories are up to the task of giving us such a set of principles that, when programmed into an AI, would actually give results that could be considered "good". E.g. Muehlhauser & Helm 2012:

Let us consider the implications of programming a machine superoptimizer to implement particular moral theories.

We begin with hedonistic utilitarianism, a theory still defended today (Tännsjö 1998). If a machine superoptimizer’s goal system is programmed to maximize pleasure, then it might, for example, tile the local universe with tiny digital minds running continuous loops of a single, maximally pleasurable experience. We can’t predict exactly what a hedonistic utilitarian machine superoptimizer would do, but we think it seems likely to produce unintended consequences, for reasons we hope will become clear. [...]

Suppose “pleasure” was specified (in the machine superoptimizer’s goal system) in terms of our current understanding of the human neurobiology of pleasure. Aldridge and Berridge (2009) report that according to “an emerging consensus,” pleasure is “not a sensation” but instead a “pleasure gloss” added to sensations by “hedonic hotspots” in the ventral pallidum and other regions of the brain. A sensation is encoded by a particular pattern of neural activity, but it is not pleasurable in itself. To be pleasurable, the sensation must be “painted” with a pleasure gloss represented by additional neural activity activated by a hedonic hotspot (Smith et al. 2009).

A machine superoptimizer with a goal system programmed to maximize human pleasure (in this sense) could use nanotechnology or advanced pharmaceuticals or neurosurgery to apply maximum pleasure gloss to all human sensations—a scenario not unlike that of plugging us all into Nozick’s experience machines (Nozick 1974, 45). Or, it could use these tools to restructure our brains to apply maximum pleasure gloss to one consistent experience it could easily create for us, such as lying immobile on the ground.

Or suppose “pleasure” was specified more broadly, in terms of anything that functioned as a reward signal—whether in the human brain’s dopaminergic reward system (Dreher and Tremblay 2009), or in a digital mind’s reward signal circuitry (Sutton and Barto 1998). A machine superoptimizer with the goal of maximizing reward signal scores could tile its environs with trillions of tiny minds, each one running its reward signal up to the highest number it could. [...]

What if a machine superoptimizer was programmed to maximize desire satisfaction in humans? Human desire is implemented by the dopaminergic reward system (Schroeder 2004; Berridge, Robinson, and Aldridge 2009), and a machine superoptimizer mizer could likely get more utility by (1) rewiring human neurology so that we attain maximal desire satisfaction while lying quietly on the ground than by (2) building and maintaining a planet-wide utopia that caters perfectly to current human preferences. [...]

Consequentialist designs for machine goal systems face a host of other concerns (Shulman, Jonsson, and Tarleton 2009b), for example the difficulty of interpersonal comparisons of utility (Binmore 2009), and the counterintuitive implications of some methods of value aggregation (Parfit 1986; Arrhenius 2011). [...]

We cannot show that every moral theory yet conceived would produce substantially unwanted consequences if used in the goal system of a machine superoptimizer. Philosophers have been prolific in producing new moral theories, and we do not have the space here to consider the prospects (for use in the goal system of a machine superoptimizer) for a great many modern moral theories. These include rule utilitarianism (Harsanyi 1977), motive utilitarianism (Adams 1976), two-level utilitarianism (Hare 1982), prioritarianism (Arneson 1999), perfectionism (Hurka 1993), welfarist utilitarianism (Sen 1979), virtue consequentialism (Bradley 2005), Kantian consequentialism (Cummiskey 1996), global consequentialism (Pettit and Smith 2000), virtue theories (Hursthouse 2012), contractarian theories (Cudd 2008), Kantian deontology (R. Johnson 2010), and Ross’ prima facie duties (Anderson, Anderson, and Armen 2006).

Yet the problem remains: the AI has to be programmed with some definition of what is good.

Now this alone isn't yet sufficient to show that philosophy wouldn't be up to the task. But philosophy has been trying to solve ethics for at least the last 2500 years, and it doesn't look like there would have been any major progress towards solving it. The PhilPapers survey didn't show any of the three major ethical schools (consequentialism, deontology, virtue ethics) being significantly more favored by professional philosophers than the others, nor does anyone - to my knowledge - even know what a decisive theoretical argument in favor of one of them could be.

And at this point, we have pretty good theoretical reasons for believing that the traditional goal of moral philosophy - "developing a set of explicit principles for telling us what is good" - is in fact impossible. Or at least, it's impossible to develop a set of principles that would be simple and clear enough to write down in human-understandable form and which would give us clear answers to every situation, because morality is too complicated for that.

We've already seen this in trying to define concepts: as philosophy noted a long time ago, you can't come up with a set of explicit rules that would define even any concept even as simple as "man" in such a way that nobody could develop a counterexample. "The Normative Insignificance of Neuroscience" also notes that the situation in ethics looks similar to the situation with trying to define many other concepts:

... what makes the trolley problem so hard—indeed, what has led some to despair of our ever finding a solution to it—is that for nearly every principle that has been proposed to explain our intuitions about trolley cases, some ingenious person has devised a variant of the classic trolley scenario for which that principle yields counterintuitive results. Thus as with the Gettier literature in epistemology and the causation and personal identity literatures in metaphysics, increasingly baroque proposals have given way to increasingly complex counterexamples, and though some have continued to struggle with the trolley problem, many others have simply given up and moved on to other topics.

Yet human brains do manage to successfully reason with concepts, despite it being impossible to develop a set of explicit necessary and sufficient criteria. The evidence from both psychology and artificial intelligence (where we've managed to train neural nets capable of reasonably good object recognition) is that a big part of how they do it is by building up complicated statistical models of what counts as a "man" or "philosopher" or whatever.

So given that

  • we can't build explicit verbal models of what a concept is * but we can build machine-learning algorithms that use complicated statistical analysis to identify instances of a concept


  • defining morality looks similar to defining concepts, in that we can't build explicit verbal models of what morality is

it would seem reasonable to assume that

  • we can build machine-learning algorithms that can learn to define morality, in that it can give such answers to moral dilemmas that a vast majority of people would consider them acceptable

But here it looks likely that we need information from psychology to narrow down what those models should be. What humans consider to be good has likely been influenced by a number of evolutionary idiosyncrasies, so if we want to come up with a model of morality that most humans would agree with, then our AI's reasoning process should take into account those considerations. And we've already established that defining those considerations on a verbal level looks insufficient - they have to be established on a deeper level, of "what are the actual computational processes that are involved when the brain computes morality".

Yes, I am here assuming "what is good" to equate to "what do human brains consider good", in a way that may be seen as reducing to "what would human brains accept as a persuasive argument for what is good". You could argue that this is flawed, because it's getting dangerously close to defining "good" by social consensus. But then again, the way the field of ethics itself proceeds is basically the same: a philosopher presents an argument for what is good, another attacks it, if the argument survives attacks and is compelling then it is eventually accepted. For empirical facts we can come up with objective tests, but for moral truths it looks to me unavoidable - due to the is-ought gap - that some degree of "truth by social consensus" is the only way of figuring out what the truth is, even in principle.

Comment by Kaj_Sotala on Cognitive Science/Psychology As a Neglected Approach to AI Safety · 2017-06-09T12:31:02.358Z · EA · GW

I'd seen that, but re-reading it was useful. :)

Comment by Kaj_Sotala on Cognitive Science/Psychology As a Neglected Approach to AI Safety · 2017-06-09T11:31:09.393Z · EA · GW

Neat, thanks for the find. :)

Comment by Kaj_Sotala on Cognitive Science/Psychology As a Neglected Approach to AI Safety · 2017-06-06T18:07:34.850Z · EA · GW

My main guess is "they mostly come from a math/CS background so haven't looked at this through a psych/cogsci perspective and seen how it could be useful".

That said, some of my stuff linked to above has been mostly met with a silence, and while I presume it's a question of inferential silence - a sufficiently long inferential distance that a claim doesn't provoke even objections, just uncomprehending or indifferent silence - there is also the possibility of me just being so wrong about the usefulness of my ideas that nobody's even bothering to tell me.