Fair! Sorry for the slow reply, I missed the comment notification earlier.
I could have been clearer in what I was trying to point at with my comment. I didn't mean to fault you for not meeting an (unmade) challenge to list all your assumptions--I agree that would be unreasonable.
Instead, I meant to suggest an object-level point: that the argument you mentioned seems pretty reliant on a controversial discontinuity assumption--enough that the argument alone (along with other, largely uncontroversial assumptions) doesn't make it "quite easy to reach extremely dire forecasts about AGI." (Though I was thinking more about 90%+ forecasts.)
(That assumption--i.e. the main claims in the 3rd paragraph of your response--seems much more controversial/non-obvious among people in AI safety than the other assumptions you mention, as evidenced by researchers criticizing it and researchers doing prosaic AI safety work.)
Thanks for doing this! I think the most suspicious part of what you found is the donations to representatives who sit on the subcommittee that oversees the CFTC (i.e. the House Agriculture Subcommittee on Commodity Exchanges, Energy, and Credit), so I wanted to look into this more. From a bit of Googling:
- It looks like you're right that Rep. Delgado sits on (and worse, is the Chair of) this subcommittee.
- On the other hand, it looks like Rep. Spanberger doesn't actually sit on this subcommittee, and hasn't done so since 2021. In other words, she hasn't been on this subcommittee since the Protect our Future PAC was founded (which was early 2022).
- Spanberger's Wikipedia page does say she sits on this subcommittee, but Spanberger's own website (both now and before the most recent elections) and the Wikipedia page on the subcommittee don't list her as having served on this subcommittee in the 2021-23 or current session of Congress.
- The latter source also says she served on the subcommittee before 2021, so my guess is that Spanberger's Wikipedia page just has outdated info.
- (I don't think this settles doubts about the PAC.)
I didn't spend much time on this, so I very possibly missed or misinterpreted things.
Nitpick: doesn't the argument you made also assume that there'll be a big discontinuity right before AGI? That seems necessary for the premise about "extremely novel software" (rather than "incrementally novel software") to hold.
why they would want to suggest to these bunch of concerned EAs how to go about trying to push for the ideas that Buck disagrees with better
My guess was that Buck was hopeful that, if the post authors focus their criticisms on the cruxes of disagreement, that would help reveal flaws in his and others' thinking ("inasmuch as I'm wrong it would be great if you proved me wrong"). In other words, I'd guess he was like, "I think you're probably mistaken, but in case you're right, it'd be in both of our interests for you to convince me of that, and you'll only be able to do that if you take a different approach."
[Edit: This is less clear to me now - see Gideon's reply pointing out a more recent comment.]
I interpreted Buck's comment differently. His comment reads to me, not so much like "playing the man," and more like "telling the man that he might be better off playing a different game." If someone doesn't have the time to write out an in-depth response to a post that takes 84 minutes to read, but they take the time to (I'd guess largely correctly) suggest to the authors how they might better succeed at accomplishing their own goals, that seems to me like a helpful form of engagement.
CEA / EVF (I'd never heard it called EVF before FTX)
They announced and explained the name change to Effective Ventures here, about a month before the FTX collapse. (Tl;dr: CEA is still CEA; an umbrella org now goes by EVF.)
This seems helpful, though I'd guess another team that's in more frequent contact with AI safety orgs could do this for significantly lower cost, since they'll be starting off with more of the needed info and contacts.
Thanks for sharing! The speakers on the podcast might not have had the time to make detailed arguments, but I find their arguments here pretty uncompelling. For example:
- They claim that "many belief systems they have a way of segregating and limiting the impact of the most hardcore believers." But (at least from skimming) their evidence for this seems to be just the example of monastic traditions.
- A speaker claims that "the leaders who take ideas seriously don't necessarily have a great track record." But they just provide a few cherry-picked (and dubious) examples, which is a pretty unreliable way of assessing a track record.
- Counting Putin a "man of ideas" because he made a speech with lots of historical references--while ignoring the many better leaders who've also made history-laden speeches--looks like especially egregious cherry-picking.
So I think, although their conclusions are plausible, these arguments don't pass enough of an initial sanity check to be worth lots of our attention.
Thanks for the clarifications!
I think my main hesitation is still... scope seems like an afterthought here? This reasoning from my previous comment--for why scope should get lots of emphasis--still feels relevant:
There are lots of people out there (e.g. many researchers, policy professionals, entrepreneurs) who do good using reasoning; this community's concern for scope seems rare, important, and totally compatible with integrity.
A related intuition: people overwhelmingly default to overlooking scope, so if a community doesn't explicitly emphasize it, they'll miss it.
Thanks for writing this! I want to push back a bit. There's a big middle ground between (i) naive, unconstrained welfare maximization and (ii) putting little to no emphasis on how much good one does. I think "do good, using reasoning" is somewhat too quick to jump to (ii) while passing over intermediate options, like:
- "Do lots of good, using reasoning" (roughly as in this post)
- "be a good citizen, while ambitiously working towards a better world" (as in this post)
- "maximize good under constraints or with constraints incorporated into the notion of goodness"
There are lots of people out there (e.g. many researchers, policy professionals, entrepreneurs) who do good using reasoning; this community's concern for scope seems rare, important, and totally compatible with integrity. Given the large amounts of good you've done, I'd guess you're sympathetic to considering scope. Still, it seems important enough to include in the tagline.
Also, a nitpick:
now it's obvious that the idea of maximizing goodness doesn't work in practice--we have a really clear example of where trying to do that fails (SBF if you attribute pure motives to him); as well as a lot of recent quotes from EA luminaries saying that you shouldn't do that
This feels a bit fast; the fact that this example had to include a (dubious) "if" clause means it's not a really clear example, and maximizing goodness is compatible with constraints if we incorporate constraints into our notion of goodness (just by the fact that any behavior can be thought of as maximizing some notion of goodness).
(Made minor edits within 15 mins of commenting.)
Chris Olah's research taste exercises might be examples of this.
Readers might be interested in the comments over here, especially Daniel K.'s comment:
The only viable counterargument I've heard to this is that the government can be competent at X while being incompetent at Y, even if X is objectively harder than Y. The government is weird like that. It's big and diverse and crazy. Thus, the conclusion goes, we should still have some hope (10%?) that we can get the government to behave sanely on the topic of AGI risk, especially with warning shots, despite the evidence of it behaving incompetently on the topic of bio risk despite warning shots.
Or, to put it more succinctly: The COVID situation is just one example; it's not overwhelmingly strong evidence.
[Mostly typed this before I saw another reply which makes this less relevant.]
Thanks for adding these explanations.
research affiliates: 4
Affiliates aren't staff though, and several of them are already counted anyway under the other orgs or the "Other" section. (Note the overlap between CSER and Leverhulme folks.)
FLI: counted 5 people working on AI policy and governance.
Sure, but that's not the same as 5 non-technical safety researchers. A few of their staff are explicitly listed as e.g. advocates, rather than as researchers.
I think 5 is a good conservative estimate.
I don't think we should be looking at all their researchers. They have a section on their site that specifically lists safety-related people, and this is the section my previous comment was addressing. Counting people who aren't on that website seems like it'll get us counting non-safety-focused researchers.
There are about 45 research profile on Google Scholar with the 'AI governance' tag. I counted about 8 researchers who weren't at the other organizations listed.
Thanks for adding this, but I'm not sure about this either--as with Leverhulme, just because someone is researching AI governance doesn't mean they're a non-technical safety researcher; there's lots of problems other than safety that AI governance researchers can be interested in.
[Edit: I think the following no longer makes sense because the comment it's responding to was edited to add explanations, or maybe I had just missed those explanations in my first reading. See my other response instead.]
Thanks for this. I don't see how the new estimates incorporate the above information. (The medians for CSER, Leverhulme, and FLI seem to still be at 5 each.)
(Sorry for being a stickler here--I think it's important that readers get accurate info on how many people are working on these problems.)
Thanks for the updates!
I have it on good authority that CSET has well under 10 safety-focused researchers, but fair enough if you don't want to take an internet stranger's word for things.
I'd encourage you to also re-estimate the counts for CSER, Leverhulme, and the Future of Life Institute.
- CSER's list of team members related to AI lists many affiliates, advisors, and co-founders but only ~3 research staff.
- The Future of Life Institute seems more focused on policy and field-building than on research; they don't even have a research section on their website. Their team page lists ~2 people as researchers.
- Of the 5 people listed in Leverhulme's relevant page, one of them was already counted for CSER, and another one doesn't seem safety-focused.
I also think the number of "Other" is more like 4.
Thanks for the response! Maybe readers would find it helpful if the summary of your post was edited to incorporate this info, so those who don't scroll to the comments can still get our best estimate.
Thanks for posting, seems good to know these things! I think some of the numbers for non-technical research should be substantially lower--enough that an estimate of ~55 non-technical safety researchers seems more accurate:
- CSET isn't focused on AI safety; maybe you could count a few of their researchers (rather than 10).
- I think SERI and BERI have 0 full-time non-technical research staff (rather than 10 and 5).
- As far as I'm aware, the Leverhulme Centre for the Future of Intelligence + CSER only have at most a few non-technical researchers in total focused on AI safety (rather than 10 & 5). Same for FLI (rather than 5).
- I hear Epoch has ~3 FTEs (rather than 10).
- GoodAI's research roadmap makes no mention of public/corporate policy or governance, so I'd guess they have at most a few non-technical safety-focused researchers (rather than 10).
If I didn't mess up my math, all that should shift our estimate from 93 to ~42. Adding in 8 from Rethink (going by Peter's comment) and 5 (?) from OpenPhil, we get ~55.
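To double-check the math, here's the adjustment laid out explicitly (the exact per-org numbers I plug in for the vaguer phrases like "a few" are my own assumptions, not figures from the post):

```python
# Sanity check on the revised headcount. Each entry maps an org (or pair of
# orgs) to (the post's original count, my assumed revised count).
original_total = 93
adjustments = {
    "CSET": (10, 3),             # "a few" rather than 10 (assumed 3)
    "SERI": (10, 0),             # no full-time non-technical research staff
    "BERI": (5, 0),              # same
    "Leverhulme + CSER": (15, 3),  # "at most a few in total" (assumed 3)
    "FLI": (5, 2),               # "same for FLI" (assumed 2)
    "Epoch": (10, 3),            # ~3 FTEs
    "GoodAI": (10, 3),           # "at most a few" (assumed 3)
}

revised = original_total - sum(old - new for old, new in adjustments.values())
print(revised)           # -> 42
print(revised + 8 + 5)   # + Rethink + Open Phil -> 55
```

Under those assumptions the numbers come out to 42 and 55, matching the estimates above; plugging in 2 instead of 3 for the "a few" entries would shift the total by a handful either way.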
Thanks for posting! I'm sympathetic to the broad intuition that any one person being at the sweet spot where they make a decisive impact seems unlikely, but I'm not sold on most of the specific arguments given here.
Recall that there are decent reasons to think goal alignment is impossible - in other words, it's not a priori obvious that there's any way to declare a goal and have some other agent pursue that goal exactly as you mean it.
I don't see why this is the relevant standard. "Just" avoiding egregiously unintended behavior seems sufficient for avoiding the worst accidents (and is clearly possible, since humans do it often).
Also, I don't think I've heard these decent reasons--what are they?
Recall that engineering ideas very, very rarely work on the first try, and that if we only have one chance at anything, failure is very likely.
It's also unclear that we only have one chance at this. Optimistically (but not that optimistically?), incremental progress and failsafes can allow for effectively multiple chances. (The main argument against seems to involve assumptions of very discontinuous or abrupt AI progress, but I haven't seen very strong arguments for expecting that.)
Recall that getting "humanity" to agree on a good spec for ethical behavior is extremely difficult: some places are against gene drives to reduce mosquito populations, for example, despite this saving many lives in expectation.
Agree, but also unclear why this is the relevant standard. A smaller set of actors agreeing on a more limited goal might be enough to help.
Recall that there is a gigantic economic incentive to keep pushing AI capabilities up, and referenda to reduce animal suffering in exchange for more expensive meat tend to fail.
Yup, though we should make sure not to double-count this, since this point was also included earlier (which isn't to say you're necessarily double-counting).
Recall that we have to implement any solution in a way that appeals to the cultural sensibilities of all major and technically savvy governments on the planet, plus major tech companies, plus, under certain circumstances, idiosyncratic ultra-talented individual hackers.
This also seems like an unnecessarily high standard, since regulations have been passed and enforced before without unanimous support from affected companies.
Also, getting acceptance from all major governments does seem very hard but not quite as hard as the above quotes make it sound. After all, many major governments (developed Western ones) have relatively similar cultural sensibilities, and ambitious efforts to prevent unilateral actions have previously gotten very broad acceptance (e.g. many actors could have made and launched nukes, done large-scale human germline editing, or maybe done large-scale climate engineering, but to my knowledge none of those have happened).
The we-only-get-one-shot idea applies on this stage too.
Yup, though this is also potential double-counting.
+1 on this being a relevant intuition. I'm not sure how limited these scenarios are - aren't information asymmetries and commitment problems really common?
Ah sorry, I had totally misunderstood your previous comment. (I had interpreted "multiply" very differently.) With that context, I retract my last response.
By "satisfaction" I meant high performance on its mesa-objective (insofar as it has one), though I suspect our different intuitions come from elsewhere.
it should robustly include "building copy of itself"
I think I'm still skeptical on two points:
- Whether this is significantly easier than other complex goals
- (The "robustly" part seems hard.)
- Whether this actually leads to a near-best outcome according to total preference utilitarianism
- If satisfying some goals is cheaper than satisfying others to the same extent, then the details of the goal matter a lot
- As a kind of silly example, "maximize silicon & build copies of self" might be much easier to satisfy than "maximize paperclips & build copies of self." If so, a (total) preference utilitarian would consider it very important that agents have the former goal rather than the latter.
- If satisfying some goals is cheaper than satisfying others to the same extent, then the details of the goal matter a lot
getting the "multiply" part right is sufficient, AI will take care of the "satisfaction" part on its own
I'm struggling to articulate how confused this seems in the context of machine learning. (I think my first objection is something like: the way in which "multiply" could be specified and the way in which an AI system pursues satisfaction are very different; one could be an aspect of the AI's training process, while another is an aspect of the AI's behavior. So even if these two concepts each describe aspects of the AI system's objectives/behavior, that doesn't mean its goal is to "multiply satisfaction." That's sort of like arguing that a sink gets built to be sturdy, and it gives people water, therefore it gives people sturdy water--we can't just mash together related concepts and assume our claims about them will be right.)
(If you're not yet familiar with the basics of machine learning and this distinction, I think that could be helpful context.)
I can't, but I'm not sure I see your point?
Maybe, but is "multiply" enough to capture the goal we're talking about? "Maximize total satisfaction" seems much harder to specify (and to be robustly learned) - at least I don't know what function would map states of the world to total satisfaction.
I think this gets a lot right, though
As I am not a preference utilitarian I strongly reject this identification.
While this does seem to be part of the confusion of the original question, I'm not sure (total) preference vs. hedonic utilitarianism is actually a crux here. An AI system pursuing a simple objective wouldn't want to maximize the number of satisfied AI systems; it would just pursue its objective (which might involve relatively few copies of itself with satisfied goals). So highly capable AI systems pursuing very simple or random goals aren't only bad by hedonic utilitarian lights; they're also bad by (total) preference utilitarian lights (not to mention "common sense ethics").
I think many alignment researchers don't accept (2), and also don't accept the claim that the proposed "alternative to alignment" would be much easier than alignment.
my point is that, within the FAW and altpro movements, A is mentioned
Oh interesting, I wasn't aware this point came up much. Taking your word for it, I agree then that (A) shouldn't get more weight than (B) (except insofar as we have separate, non-speculative reasons to be more bullish about economic interventions).
I think you kind of changed the "latter argument" a bit here from what we were discussing before.
Sorry for the confusion--I was trying to say that alt-pro advocates often have an argument that's different from (and better-grounded than) (A) and (B).
In other words, my current view is that (A) and (B) roughly "cancel out" due to being similarly speculative, while the separate view that "good, lasting value change is more likely when it's convenient" is better-grounded than its opposite.
Thanks for the thoughtful response!
I actually think this paragraph you created is worth presenting and considering. The thing is, it's pretty much been presented already. This is, for example, roughly the story of Bruce Friedrich (founder and CEO of GFI), and maybe pretty much GFI too. And that was my story too, and might be the story of a lot of EA animal/alt-pro advocates. So if this argument is presented, why not also consider its counterpart? (what I did)
I think this is subtly off. The story I've heard from alt-pro advocates is that we should focus on making it easier for people to drop factory farming because that would get people to do so, while generations of moral advocacy against factory farming have failed to achieve mass consumer change. That's a historical argument about tractability--it's not a speculative argument about how we might inspire or mislead future advocates.
(To be fair, the above is still not an argument about long-term impacts. But I think the related long-term argument that "good, lasting value change is more likely when it's convenient" is a much better-grounded claim than "good, lasting value change is more likely when advocates have historical examples of entirely morality-driven change"; the latter claim seems entirely speculative, while the former is at least in line with various historical examples and psychological findings.)
I'm not very familiar with the field, so I'm not sure how much these ideas are in scope or already well-researched, but I'd guess a bunch of questions at the intersection of American politics and automation/AI could be interesting (for thinking about the political context in which AI policymaking will play out):
- What are the electoral impacts of job loss from automation, or other severe and sudden job losses?
- Why do some accidents or near-catastrophes bring about serious U.S. policy change, while others don't? When these events do bring about serious change, what factors influence the nature of this change?
- What institutional design features could help make AI-regulating agencies organizationally competent, technically informed, resilient to regulatory capture, and able to quickly adapt to technological changes? (What features have made U.S. regulatory agencies better or worse along these dimensions?)
- How might U.S. legislative, regulatory, and judicial bodies incorporate AI tools for faster or otherwise improved decision-making? How much potential do these tools have for speeding up these organizations' decision-making processes (for better dealing with time-sensitive issues)?
I'm not sure how much of a pain this would be implementation-wise (or stylistically), but I'd be curious to see agree/disagree voting for posts (rather than just comments). After all, arguments for having this type of voting for comments seem to roughly generalize to posts, e.g. it seems useful for readers to be able to quickly distinguish between (i) critical posts that the community tends to appreciate and agree with, and (ii) critical posts that the community tends to appreciate but disagree with.
Thanks for writing! I'm skeptical that a non-morally-motivated ban would create bad value lock-in. Most of this post's arguments for that premise seem to be just the author's speculative intuitions, given with no evidence or argument (e.g. "I also worry that using laws to capture our abolition of moral catastrophes after they become economically inviable, can create a false sense of progress [...] Always waiting for technological changes might mislead us to think that we have less obligation to improve our moral values or actions when the technological/economic incentives are lower.") But I don't think ungrounded intuitions about how society might work are good ways to make predictions; there are too many complications and alternatives that approach might miss.
- As a reason why this kind of argument isn't reliable, we could just as easily come up with intuitive stories that point to the opposite conclusion, e.g. "economic changes that drive moral progress will inspire and inform future advocates to take pragmatic approaches that actually work well rather than engaging in endless but ineffective moral grandstanding; always waiting for moral progress might mislead us to think we have less obligation to improve economic incentives when the tractability of moral advocacy is lower."
- Also, I think the historical importance of economic and military motives for the abolition of slavery are understated.
(Edited within a few mins to delete a weaker argument.)
Thanks for writing this!
I find it unlikely (~20%) that regulation based on the number of floating-point operations needed to train a model would produce a California Effect.
To clarify a detail, do you mean this as an absolute probability, or as a conditional probability (that is, conditional on California passing such regulation)? I didn't fully understand that, since:
- An argument for this forecast seems to be that there's "apparent lack of public will to regulate large-scale models," which seems to suggest this forecast is not conditioning on the regulation passing.
- But the phrasing ("would" rather than "will") maybe has connotations of conditional probability.
But it seems like such a narrow notion of alignment that it glosses over almost all of the really hard problems in real AI safety -- which concern the very real conflicts between the humans who will be using AI.
I very much agree that these political questions matter, and that alignment to multiple humans is conceptually pretty shaky; thanks for bringing up these issues. Still, I think some important context is that many AI safety researchers think that it's a hard, unsolved problem to just keep future powerful AI systems from literally killing everyone (or doing other unambiguously terrible things). They're often worried that CIRL and every other approach that's been proposed will completely fail. From that perspective, it no longer looks like almost all of the really hard problems are about conflicts between humans.
(On CIRL, here's a thread and a longer writeup on why some think that "it almost entirely fails to address the core problems" of AI safety. This video and this post outline some broader potential limitations of current approaches to safety.)
Thanks for the comment! I agree these are important considerations and that there's plenty my post doesn't cover. (Part of that is because I assumed the target audience of this post--technical readers of this forum--would have limited interest in governance issues and would already be inclined to think about the impacts of their work. Though maybe I'm being too optimistic with the latter assumption.)
Were there any specific misuse risks involving the tools discussed in the post that stood out to you as being especially important to consider?
Thanks for writing this. I think there are actually some pretty compelling examples of people/movements being quite successful at helping future generations (while partly trying to do so):
- Some sources suggest that Lincoln had long-term motivations for permanently abolishing slavery, saying, "The abolition of slavery by constitutional provision settles the fate, for all coming time, not only of the millions now in bondage, but of unborn millions to come--a measure of such importance that these two votes must be procured." Looking back now, abolition still looks like a great move for future generations.
- I don't know how accurate those sources are, but at least a U.S. constitutional amendment is structured to have very long-lasting impacts, given the extreme difficulty of undoing it.
- The U.S. constitution appears to have been partly aiming to create a long-lasting democracy, citing "our posterity" in its preamble. It seems to have largely worked.
- Proponents of measures to avoid nuclear war and reduce nuclear weapons testing often cited future generations as one motivation. (For example, in a famous speech he gave before launching U.S.-Soviet cooperation on limiting nuclear testing and nonproliferation, Kennedy appealed to the importance of "not merely peace in our time but peace for all time" and to "the right of future generations to a healthy existence.") These efforts have been quite successful; we've had about 77 years with no wartime use of nuclear weapons, nuclear testing has plummeted, and far fewer states than once feared now have nuclear weapons.
[Edited to add] All this looks to me like a mixed (and maybe fairly good overall) track record, not a terrible one. (Though a deeper problem is that we can't justifiably draw almost any conclusions about base rates from these or the post's examples, since we've made no serious efforts to find a representative sample of historical longtermist efforts.)
Maybe, I'm not sure though. Future applications that do long-term, large-scale planning seem hard to constrain much while still letting them do what they're supposed to do. (Bounded goals--if they're bounded to small-scale objectives--seem like they'd break large-scale planning, time limits seem like they'd break long-term planning, and as you mention the "don't kill people" counter would be much trickier to implement.)
The research summarized here and this paper are the main efforts at quantifying these risks that I'm aware of. (They usually take an approximately Bayesian approach to forecasting, which can be criticized for its subjectivity, but I don't know of any approaches that seem better--ignoring risks that we can't objectively quantify doesn't seem like a good alternative.)
I also used to be pretty skeptical about the credibility of the field. I was surprised to learn about how much mainstream, credible support AI safety concerns have received:
- Multiple leading AI labs have large (e.g. 30-person) teams of researchers dedicated to AI alignment.
- They sometimes publish statements like, "Unaligned AGI could pose substantial risks to humanity and solving the AGI alignment problem could be so difficult that it will require all of humanity to work together."
- Key findings that are central to concerns over AI risk have been accepted (with peer review) into top ML conferences.
- A top ML conference is hosting a workshop on ML safety (with a description that emphasizes "long-term and long-tail safety risks").
- Reports and declarations from some major governments have endorsed AI risk worries.
- The UK's National AI Strategy states, "The government takes the long term risk of non-aligned Artificial General Intelligence, and the unforeseeable changes that it would mean for the UK and the world, seriously."
- There are AI faculty at universities including MIT, UC Berkeley, and Cambridge who endorse AI risk worries.
To be fair, AI risk worries are far from a consensus view. But in light of the above, the idea that all respected AI researchers find AI risk laughable seems plainly mistaken. Instead, it seems clear that a significant fraction of respected AI researchers and institutions are worried. Maybe these concerns are misguided, but probably not for any reason that's obvious to anyone with basic knowledge of AI--otherwise these worried AI experts would have noticed.
(Also, in case you haven't seen it yet, you might find this discussion on whether there are any experts on these questions interesting.)
To counter that, let me emphasize the aspects of AI risk that are not disproven here.
Adding to this list, much of the field thinks a core challenge is making highly capable, agentic AI systems safe. But (ignoring inner alignment issues) severe constraints create safe AI systems that aren't very capable agents. (For example, if you make an AI that only considers what will happen within a time limit of 1 minute, it probably won't be very good at long-term planning. Or if you make an AI system that only pursues very small-scale goals, it won't be able to solve problems that you don't know how to break up into small-scale goals.) So on its own, this doesn't seem to solve outer alignment for highly capable agents.
(See e.g. the "2. Competitive" section of this article by Paul Christiano for some more discussion of why a core desideratum for safety solutions is their performance competitiveness.)
Thanks for the reply, I'm not sure I buy this.
Even now, climate change is a very clear existential risk causing tons of damage, and yet we still haven't really managed to rally the world's resources against it.
I'm not sure about the analogy here. I think we haven't rallied the world's resources against climate change largely because most people are kind of apathetic toward it, i.e. most people don't see it as a terrible dystopia.
As another disanalogy, opponents of a new dystopian proposal just have to get governments to not act, while advocates of climate spending/regulations have to get governments to act (which is harder).
I used the contraception/eugenics/nuclear-war examples because they demonstrate that it's relatively easy to start creating dystopian outcomes
Do they demonstrate this? E.g. I wouldn't describe a global contraception ban as "relatively easy" to bring about--seems extremely difficult to bring about.
I think EA is an extremely unusual movement in that it is [...]
(1) and (2) seem to me like they'd describe many movements and social systems, rather than being particularly unusual.
I don't buy that (3) on its own is nearly enough to make EA unusually threatening, but also I think (3) isn't right. There's significant diversity of views within individual EAs and across EAs--enough that I don't think EA is that unusually single-minded. You mention OpenPhil's worldview diversification; many others in the movement (including some influential figures) have varied and ~pluralistic views:
- MacAskill doesn't consider himself a utilitarian, instead embracing moral uncertainty
- Bostrom is apparently not a consequentialist, and neither are about 1 in 5 self-reported EAs
- Ord has written on the importance of moral trade and moral reflection
- A significant fraction of the movement is more sympathetic to preference utilitarianism, which inherently makes them less likely to do things that many people don't like
- Deontology-flavored writings have been well-received by the community
- A very well-received post on the forum argued that, "You have more than one goal, and that's fine"
(I skimmed; apologies if I missed relevant things.)
no one can bring about a dystopian future unless their ability to accomplish their goals is significantly more advanced than everyone else’s
[...] the EA community [...] is itself a substantial existential risk
This post seems to rely on the assumption that, in the absence of extremely unusual self-limits, EA's ability to accomplish its goals will somehow become significantly more advanced than those of the rest of the world combined. That's quite a strong, unusual assumption to make about any social movement--I think it'd take lots more argument to make a convincing case for it.
I'm not sure how much I agree with this / its applicability, but one argument I've heard is that, for individual decision-making and social norm-setting,
total abstinence is easier than perfect moderation
(Kind of a stretch, but I enjoyed this speech on the cultural and coordinating power of simple norms, which can be seen as a case against nuanced norms. Maybe the simplicity of some standards--as individuals' principles, advocacy goals, and social norms--makes them more resilient to pressure, whereas more nuanced standards might more easily fall down slippery slopes, incrementally succumbing to pressure to expand the scope of their exceptions until they're far too broad.)
I think preference-based views fit neatly into the asymmetry.
Here I'm moving on from the original topic, but if you're interested in following this tangent--I'm not quite getting how preference-based views (specifically, person-affecting preference utilitarianism) maintain the asymmetry while avoiding (a slightly/somewhat weaker version of) "killing happy people is good."
Under "pure" person-affecting preference utilitarianism (ignoring broader pluralistic views of which this view is just one component, and also ignoring instrumental justifications), clearly one reason why it's bad to kill people is that this would frustrate some of their preferences. Under this view, is another (pro tanto) reason why it's bad to kill (not-entirely-satisfied) people that their satisfaction/fulfillment is worth preserving (i.e. is good in a way that outweighs associated frustration)?
My intuition is that one answer to the above question breaks the asymmetry, while the other revives some very counterintuitive implications.
- If we answer "Yes," then, through that answer, we've accepted a concept of "actively good things" into our ethics, rejecting the view that ethics is just about fixing states of affairs that are actively problematic. Now we're back in (or much closer to?) a framework of "maximize goods minus bads" / "there are intrinsically good things," which seems to (severely) undermines the asymmetry.
- If we answer "No," on the grounds that fulfillment can't outweigh frustration, this would seem to imply that one should kill people, whenever their being killed would frustrate them less than their continued living. Problematically, that seems like it would probably apply to many people, including many pretty happy people.
- After all, suppose someone is fairly happy (though not entirely, constantly fulfilled), is quite myopic, and only has a moderate intrinsic preference against being killed. Then, the preference utilitarianism we're considering seems to endorse killing them (since killing them would "only" frustrate their preferences for a short while, while continued living would leave them with decades of frustration, amid their general happiness).
- There seem to be additional bizarre implications, like "if someone suddenly gets an unrealizable preference, even if they mistakenly think it's being satisfied and are happy about that, this gives one stronger reasons to kill them." (Since killing them means the preference won't go unsatisfied as long.)
- (I'm assuming that frustration matters (roughly) in proportion to its duration, since e.g. long-lasting suffering seems especially bad.)
- (Of course, hedonic utilitarianism also endorses some non-instrumental killing, but only under what seem to be much more restrictive conditions--never killing happy people.)
Of the experience-based asymmetric views discussed in the OP, my posts on tranquilism and suffering-focused ethics mention value pluralism and the idea that things other than experiences (i.e., preferences mostly) could also be valuable. Given these explicit mentions it seems false to claim that "these views don't easily fit into a preference-focused framework." [...] I'm not sure why you think [a certain] argument would have to be translated into a preference-focused framework.
I think this misunderstands the point I was making. I meant to highlight how, if you're adopting a pluralistic view, then to defend a strong population asymmetry (the view emphasized in the post's title), you need reasons why none of the components of your pluralistic view value making happy people.* This gets harder the more pluralistic you are, especially if you can't easily generalize hedonic arguments to other values. As you suggest, you can get the needed reasons by introducing additional assumptions/frameworks, like rejecting the principle that it's better for there to be more good things. But I wouldn't call that an "easy fit"; that's substantial additional argument, sometimes involving arguing against views that many readers of this forum find axiomatically appealing (like that it's better for there to be more good things).
(* Technically you don't need reasons why none of the views consider the making of happy people valuable, just reasons why overall they don't. Still, I'd guess those two claims are roughly equivalent, since I'm not aware of any prominent views which hold the creation of purely happy people to be actively bad.)
Besides that, I think at this point we're largely in agreement on the main points we've been discussing?
- I've mainly meant to argue that some of the ethical frameworks that the original post draws on and emphasizes, in arguing for a population asymmetry, have implications that many find very counterintuitive. You seem to agree.
- If I've understood, you've mainly been arguing that there are many other views (including some that the original post draws on) which support a population asymmetry while avoiding certain counterintuitive implications. I agree.
- Your most recent comment seems to frame several arguments for this point as arguments against the first bullet point above, but I don't think they're actually arguments against the above, since the views you're defending aren't the ones my most-discussed criticism applies to (though that does limit the applicability of the criticism).
- In the context of massive nuclear attacks, why isn't the danger of nuclear winter widely seen as making nuclear retaliation redundant (as Ellsberg suggested on another 80K podcast episode)?
- The US threatens to retaliate (with nukes) against anyone who nukes certain US allies--how credible is this threat, and why?
- Part of how the US tries to make this threat more credible is by sharing nukes with some of its allies. How does this sharing work? Does the US share nukes in such a way that, in a crisis, a non-nuclear host country could easily seize and launch a nuke?
- Why has the US relied on mutually assured destruction instead of minimal deterrence?
- What are some ways in which recent developments in cybersecurity and machine learning interact with nuclear deterrence?
- Why have the US and the USSR/Russia invested so much in making lots of land-based nuclear missiles? (If they were mainly trying to ensure their nuclear weapons would survive a first strike, wouldn't it have been better to put that money toward making even more nuclear-armed submarines?)
Other questions on nuclear politics:
- Why did it take decades after the first nuclear weapons for countries to sign nuclear nonproliferation and arms control agreements?
On verification of compliance with arms control agreements:
- The US uses satellites to check whether Russia is following nuclear arms control agreements--as far as is publicly known, just how good are US spy satellites? What about commercial satellites?
- The UN's nuclear watchdog extensively monitors a bunch of nuclear power facilities, to make sure they're not using uranium to make nukes. One might think it'd be helpful to apply similar monitoring at uranium mines, but the watchdog usually doesn't--why not?
- How good do you think current systems are at detecting secret nuclear facilities?
On what else to check out:
- In addition to your own blog and podcast as well as this podcast, what are some of your favorite blogs or podcasts on nuclear security or international security?
(Edited to add a few.)
(I don't know that there's much of an EA consensus on nuclear weapons issues--and if there is, I don't know what the consensus is--so these aren't quite questions about what EA gets right/wrong on this.)
Thanks for the thoughtful reply; I've replied to many of these points here.
In short, I think you're right that Magnus doesn't explicitly assume consequentialism or hedonism. I understood him to be implicitly assuming these things because of the post's focus on creating happiness and suffering, as well as the apparent prevalence of these assumptions in the suffering-focused ethics community (e.g. the fact that it's called "suffering-focused ethics" rather than "frustration-focused ethics"). But I should have more explicitly recognized those assumptions and how my arguments are limited to them.
Thanks for the thoughtful reply; I've replied to many of these points here.
On a few other ends:
- I agree that strong negative utilitarian views can be highly purposeful and compassionate. By "semi-nihilistic" I was referring to how some of these views also devalue much (by some counts, half) of what others value. [Edit: Admittedly, many pluralists could say the same to pure classical utilitarians.]
- I agree classical utilitarianism also has bullets to bite (though many of these sure look like they're appealing to our intuitions in scenarios where we should expect to have bad intuitions, due to scope insensitivity).
Thanks for the thoughtful reply. You're right, you can avoid the implications I mentioned by adopting a preference/goal-focused framework. (I've edited my original comment to flag this; thanks for helping me recognize it.) That does resolve some problems, but I think it also breaks most of the original post's arguments, since they weren't made in (and don't easily fit into) a preference-focused framework. For example:
- The post argues that making happy people isn't good and making miserable people is bad, because creating happiness isn't good and creating suffering is bad. But it's unclear how this argument can be translated into a preference-focused framework.
- Could it be that "satisfying preferences isn't good, and frustrating preferences is bad"? That doesn't make sense to me; it's not clear to me there's a meaningful distinction between satisfying a preference and keeping it from being frustrated.
- Could it be that "satisfying positive preferences isn't good, and satisfying negative preferences is good?" But that seems pretty arbitrary, since whether we call some preference positive or negative seems pretty arbitrary (e.g. do I have a positive preference to eat or a negative preference to not be hungry? Is there a meaningful difference?).
- The second section of the original post emphasizes extreme suffering and how it might not be outweighable. But what does this mean in a preference-focused context? Extreme preference frustration? I suspect, for many, that doesn't have the intuitive horribleness that extreme suffering does.
- The third section of the post focuses on surveys that ask questions about happiness and suffering, so we can't easily generalize these results to a preference-focused framework.
(I also agree--as I tried to note in my original comment's first bullet point--that pluralistic or "all-things-considered" views avoid the implications I mentioned. But I think ethical views should be partly judged based on the implications they have on their own. The original post also seems to assume this, since it highlights the implications total utilitarianism has on its own rather than as a part of some broader pluralistic framework.)
Thanks for writing. You're right that MacAskill doesn't address these non-obvious points, though I want to push back a bit. Several of your arguments are arguments for the view that "intrinsically positive lives do not exist," and more generally that intrinsically positive moments do not exist. Since we're talking about repugnant conclusions, readers should note that this view has some repugnant conclusions of its own.
[Edit: I stated the following criticism too generally; it only applies when one makes an additional assumption: that experiences matter, while things that don't affect anyone's experiences don't matter. As I argue in the below comment thread, that strong focus on experiences seems necessary for some of the original post's main arguments to work.]
It implies that there wouldn't be anything wrong with immediately killing everyone reading this, their families, and everyone else, since this supposedly wouldn't be destroying anything positive. It also implies that there was nothing good (in an absolute sense) in the best moment of every reader's life--nothing actively good about laughing with friends, or watching a sunset, or hugging a loved one. To me, that's deeply and obviously counterintuitive. (And, as the survey you brought up shows, the large majority of people don't hold these semi-nihilistic views.) Still, to caveat:
- In practice, to their credit, people sympathetic with these views tend to appreciate that, all things considered, they have many reasons to be cooperative toward others and avoid violence.
- None of the above criticisms apply to weak versions of the asymmetry. People can think that reducing suffering is somewhat more important than increasing happiness--rather than infinitely so--and then they avoid most of these criticisms. But they can't ground these views on the premise that good experiences don't exist.
Also, as long as we're reviving old debates, readers may be interested in Toby Ord's arguments against many of these views (and e.g. this response).
Thanks for the thoughtful post!
Some of the disconnect here might be semantic--my sense is people here often use "moral progress" to refer to "progress in people's moral views," while you seem to be using the term to mean both that and also other kinds of progress.
Other than that, I'd guess people might not yet be sold on how tractable and high-leverage these interventions are, especially in comparison to other interventions this community has identified. If you or others have more detailed cases to make on the tractability of any of these important problems, I'd be curious to see them, and I imagine others would be, too. (As you might have guessed, you might find more receptive ears if you argue for relevance to x-risks, since the risk aversion of the global health and development parts of EA seems to leave them with little interest in hard-to-measure interventions.)
Some GiveWell charities largely benefit young children, too, but if I recall correctly, I think donations have been aimed at uses for the next year or two, so maybe only very young children would not benefit on such a person-affecting view, and this wouldn't make much difference.
Agreed that this wouldn't make much of a difference for donations, although maybe it matters a lot for some career decisions. E.g. if future people weren't ethically important, then there might be little value in starting a 4+ year academic degree to then donate to these charities.
(Tangentially, the time inconsistency of presentists' preferences seems pretty inconvenient for career planning.)