Toby Ord gives a good summary of a range of arguments against negative utilitarianism here.

Personally, I think that valuing positive experiences instrumentally is insufficient, given that the future has the potential to be fantastic.

The argument for doom by default seems to rest on a default misunderstanding of human values as the programmer attempts to communicate them to the AI.

I don't think this is correct. The argument rests on AIs having any values which aren't human values (e.g. maximising paperclips), not just misunderstood human values.

Multiple terminal values will always lead to irreconcilable conflicts.

This is not the case when there's a well-defined procedure for resolving such conflicts. For example, you can map several terminal values onto a numerical "utility" scale.
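
As a minimal sketch of what such a procedure could look like, assume each terminal value is scored on a common scale and the scores are combined with fixed weights. The value names, weights and scores below are hypothetical, chosen purely for illustration, and a weighted sum is only one of many possible aggregation rules.

```python
# Hypothetical terminal values and weights; the numbers are illustrative only.
WEIGHTS = {"happiness": 0.5, "knowledge": 0.3, "freedom": 0.2}


def utility(scores: dict[str, float]) -> float:
    """Map scores on several terminal values onto one numerical utility."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def choose(options: dict[str, dict[str, float]]) -> str:
    """Resolve a conflict between values by picking the highest-utility option."""
    return max(options, key=lambda name: utility(options[name]))


# Two hypothetical options that trade the values off against each other.
options = {
    "A": {"happiness": 0.9, "knowledge": 0.2, "freedom": 0.4},
    "B": {"happiness": 0.4, "knowledge": 0.9, "freedom": 0.7},
}
best = choose(options)
```

Once every option has a single number attached, any conflict between the underlying values is settled by comparing those numbers, so the conflict is no longer irreconcilable (though the choice of weights is of course doing all the work).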

From skimming the SEP article on pluralism, it doesn't quite seem like what I'm talking about. Pluralism + incomparability comes closer, but still seems like a subset of my position, since there are other ways that indefinability could be true (e.g. there's only one type of value, but it's intrinsically vague).

This seems plausible, but also quite distinct from the claim that "roles for programmers in direct work tend to sit open for a long time", which I took the list of openings to be supporting evidence for.

The OpenAI and DeepMind posts you linked aren't necessarily relevant, e.g. the Software Engineer, Science role is not for DeepMind's safety team, and it's pretty unclear to me whether the OpenAI ML engineer role is safety-relevant.

The example you've given me shows that agents which implement exactly the same (high-level) algorithm can cooperate with each other. The metric I'm looking for is: how can we decide how similar two agents are when their algorithms are non-identical? Presumably we want a smoothness property for that metric such that if our algorithms are very similar (e.g. only differ with respect to some radically unlikely edge case) the reduction in cooperation is negligible. But it doesn't seem like anyone knows how to do this.

Can you give some examples of "more responsible" ways?

I agree that in general calculating your own random digits feels a lot like rolling your own crypto. (Edit: I misunderstood the method and thought there was an easy exploit, which I was wrong about. Nevertheless, at least 1/3 of the digits in the API response are predictable, maybe more, and the whole thing is quite small, so it might be possible to slightly increase your probability of winning by calculating the possibilities by brute force, assuming you get to pick your own contiguous ticket number range. My preliminary calculations suggest that this method would be too difficult, but I'm not an expert, and there may be more sophisticated hacks.)

(edited) I just saw your link above about growth vs value investing. I don't think that's a helpful distinction in this case, and when people talk about a company being undervalued I think that typically includes both unrecognised growth potential and unrecognised current value. (Maybe that's less true for startups, but we're talking about already-listed companies here).

I do think the core claim of "if AGI will be as big a deal as we think it'll be, then the markets are systematically undervaluing AI companies" is a reasonable one, but the arguments you've given here aren't precise enough to justify confidence, especially given the aforementioned need for caution. For example, premise 4 doesn't actually follow directly from premise 3 because the returns could be large but not outsized compared with other investments. I think you can shore that link up, but not without contradicting your other point:

I'm not claiming that investing in AI companies will generate higher-than-average returns in the long run.

Which means (under the definition I've been using) that you're not claiming that they're undervalued.

I agree that the extent to which individual humans are rational agents is often overstated. Nevertheless, there are many examples of humans who spend decades striving towards distant and abstract goals, who learn whatever skills and perform whatever tasks are required to reach them, and who strategically plan around or manipulate the actions of other people. If AGI is anywhere near as agentlike as humans in the sense of possessing the long-term goal-directedness I just described, that's cause for significant concern.

If AI research companies aren't currently undervalued, then your Premise 4 (being an investor in such companies will generate outsized returns on the road to slow-takeoff AGI) is incorrect, because the market will have anticipated those outsized returns and priced them in to the current share price.

"returns that can later be deployed to greater altruistic effect as AI research progresses"

This is hiding an important premise, which is that you'll actually be able to deploy those increased resources well enough to make up for the opportunities you forgo now. E.g. Paul thinks that (as an operationalisation of slow takeoff) the economy will double in 4 years before the first 1-year doubling period starts. So after that 4-year period you might end up with twice as much money but only 1 or 2 years to spend it on AI safety.

I've actually spent a fair while thinking about CAIS, and written up my thoughts here. Overall I'm skeptical about the framework, but if it turns out to be accurate I think that would heavily mitigate arguments 1 and 2, somewhat mitigate 3, and not affect the others very much. Insofar as 4 and 5 describe AGI as an agent, that's mostly because it's linguistically natural to do so - I've now edited some of those phrases. 6b does describe AI as a species, but it's unclear whether that conflicts with CAIS, insofar as the claim that AI will never be agentlike is a very strong one, and I'm not sure whether Drexler makes it explicitly (I discuss this point in the blog post I linked above).

I agree that it's not too concerning, which is why I consider it weak evidence. Nevertheless, there are some changes which don't fit the patterns you described. For example, it seems to me that newer AI safety researchers tend to consider intelligence explosions less likely, despite them being a key component of argument 1. For more details along these lines, check out the exchange between me and Wei Dai in the comments on the version of this post on the alignment forum.

I like "science-aligned" better than "secular", since the former implies the latter as well as a bunch of other important concepts.

Also, it's worth noting that "everyone's welfare is to count equally" in Will's account is approximately equivalent to "effective altruism values all people equally" in Ozymandias' account, but neither of them implies the following paraphrase: "from the effective altruism perspective, saving the life of a baby in Africa is exactly as good as saving the life of a baby in America, which is exactly as good as saving the life of Ozy’s baby specifically." I understand the intention of that phrase, but actually I'd save whichever baby would grow up to have the best life. Is there any better concrete description of what impartiality actually implies?

Your points seem plausible to me. While I don't remember exactly what I intended by the claim above, I think that one influence was some material I'd read referencing the original "productivity paradox" of the 70s and 80s. I wasn't aware that there was a significant uptick in the 90s, so I'll retract my claim (which, in any case, wasn't a great way to make the overall point I was trying to convey).

CBT-I is also recommended in Why We Sleep (see my summary of the book).

Nitpick: "The former two have diminishing returns, but the latter does not." It definitely does - I think getting 12 or 13 hours of sleep is actively worse for you than getting 9 hours.

Posts on the new Forum are split into two categories:
Frontpage posts are timeless content covering the ideas of effective altruism. They should be useful or interesting even to readers who only know the basic concepts of EA and aren’t very active within the community.

I'm a little confused about this description. I feel like intellectual progress often requires presupposition of fairly advanced ideas which build on each other, and which are therefore inaccessible to "readers who only know the basic concepts". Suppose that I wrote a post outlining views on AI safety aimed at people who already know the basics of machine learning, or a post discussing a particular counter-argument to an unusual philosophical position. Would those not qualify as frontpage posts? If not, where would they go? And where do personal blogs fit into this taxonomy?

It's a clever explanation, but I'm not sure how much to believe it without analysing other hypotheses. E.g. maybe tax-deductibility is a major factor, or maybe it's just much harder to give away large amounts of money quickly.

I think it's a mischaracterisation to think of virtue ethics in terms of choosing the most virtuous actions (in fact, one common objection to virtue ethics is that it doesn't help very much in choosing actions). I think virtue ethics is probably more about being the most virtuous, and making decisions for virtuous reasons. There's a difference: e.g. you're probably not virtuous if you choose normally-virtuous actions for the wrong reasons.

For similar reasons, I disagree with cole_haus that virtue ethicists choose actions to produce the most virtuous outcomes (although there is at least one school of virtue ethics which seems vaguely consequentialist: the eudaimonists). Note, however, that I haven't actually looked into virtue ethics in much detail.

Edit: contractarianism is a fourth approach which doesn't fit neatly into either division

My default position would be that IKEA have an equal obligation, but that it's much more difficult and less efficient to try to make IKEA fulfill that obligation.

A few doubts:

  1. It seems like MSR requires a multiverse large enough to have many well-correlated agents, but not large enough to run into the problems involved with infinite ethics. Most of my credence is on no multiverse or infinite multiverse, although I'm not particularly well-read on this issue.

  2. My broad intuition is something like "Insofar as we can know about the values of other civilisations, they're probably similar to our own. Insofar as we can't, MSR isn't relevant." There are probably exceptions, though (e.g. we could guess the direction in which an r-selected civilisation's values would vary from our own).

  3. I worry that MSR is susceptible to self-mugging of some sort. I don't have a particular example, but the general idea is that you're correlated with other agents even if you're being very irrational. And so you might end up doing things which seem arbitrarily irrational. But this is just a half-fledged thought, not a proper objection.

  4. And lastly, I would have much more confidence in FDT and superrationality in general if there were a sensible metric of similarity between agents, apart from correlation (because if you always cooperate in prisoner's dilemmas, then your choices are perfectly correlated with CooperateBot, but intuitively it'd still be more rational to defect against CooperateBot, because your decision algorithm isn't similar to CooperateBot in the same way that it's similar to your psychological twin). I guess this requires a solution to logical uncertainty, though.
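
To make the CooperateBot case in point 4 concrete, here is a minimal sketch. The payoff numbers are the conventional prisoner's dilemma values, assumed for illustration, and the agent names and opponent list are hypothetical. It shows that an agent which always cooperates agrees with CooperateBot in every situation, yet would still earn more by defecting against it.

```python
# Conventional prisoner's dilemma payoffs for the row player (assumed values).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def cooperate_bot(_opponent: str) -> str:
    """Cooperates unconditionally, whoever the opponent is."""
    return "C"


def always_cooperate_agent(_opponent: str) -> str:
    """A hypothetical reasoning agent whose policy happens to always output C."""
    return "C"


# Hypothetical opponents, used only to compare the two agents' choices.
opponents = ["cooperate_bot", "defect_bot", "twin"]

# Fraction of situations in which the two agents choose the same action.
agreement = sum(
    cooperate_bot(o) == always_cooperate_agent(o) for o in opponents
) / len(opponents)

# Our payoff against CooperateBot for each action available to us.
payoff_if_cooperate = PAYOFF[("C", cooperate_bot("me"))]
payoff_if_defect = PAYOFF[("D", cooperate_bot("me"))]
```

Here `agreement` is 1.0, so behavioural correlation alone can't distinguish CooperateBot from a psychological twin, even though `payoff_if_defect` exceeds `payoff_if_cooperate`. Whatever metric we want has to look at the algorithms themselves, not just at their outputs.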

Happy to discuss this more with you in person. Also, I suggest you cross-post to Less Wrong.

As a followup to byanyothername's questions: Could you say a little about what distinguishes your coaching from something like a CFAR workshop?

Kudos for doing this. The main piece of advice which comes to mind is to make sure to push this via university EA groups. I don't think you explicitly identified students as a target demographic in your post, but current students and new grads have the three traits which make the hotel such an attractive proposition: they're unusually time-rich, cash-poor, and willing to relocate.