Great post!
Check whether the model works with Paul Christiano-type assumptions about how AGI will go.
I had a similar thought reading through your article. My gut reaction is that your setup can be made to work as-is with a more gradual takeoff story that has more precedents, warning shots and general transformative effects of AI before we get to takeover capability, but it's a bit unnatural and some of the phrasing doesn't quite fit.
Background assumption: Deploying unaligned AGI means doom. If humanity builds and deploys unaligned AGI, it will almost certainly kill us all. We won’t be saved by being able to stop the unaligned AGI, or by it happening to converge on values that make it want to let us live, or by anything else.
Paul instead says things like:
The notion of an AI-enabled “pivotal act” seems misguided. Aligned AI systems can reduce the period of risk of an unaligned AI by advancing alignment research, convincingly demonstrating the risk posed by unaligned AI, and consuming the “free energy” that an unaligned AI might have used to grow explosively
or
Eliezer often equivocates between “you have to get alignment right on the first ‘critical’ try” and “you can’t learn anything about alignment from experimentation and failures before the critical try.” This distinction is very important, and I agree with the former but disagree with the latter.
On his view (and this is somewhat similar to my view), the background assumption is more like 'failing your first critical try (i.e. the first deployment of an AGI that is capable of taking over) implies doom'. This says that there is an eventual deadline by which these issues need to be sorted out, but lots of transformation and interaction may happen first to buy time or raise the level of capability needed for takeover. So something like the following is needed:
- Technical alignment research success by the time of the first critical try (possibly AI assisted)
- Safety-conscious deployment decisions when we reach the critical point where dangerous AGI could take over (possibly assisted by e.g. convincing public demonstrations of misalignment)
- Coordination between potential AI deployers by the critical try (possibly aided by e.g. warning shots)
On the Paul view, your three pillars would still have to be satisfied at some point, to reach a stable regime where unaligned AGI cannot pose a threat, but we would only need to get to those 100 points after a period where less capable AGIs are running around either helping or hindering - motivating us to respond better, or causing damage that degrades our response - to varying extents depending on how we respond in the meantime and on exactly how long the AI takeoff period lasts.
Also, crucially, the actions of pre-AGI AI may push the point at which the problems become critical to higher AI capability levels, as well as potentially assisting on each of the pillars directly, e.g. by making takeover harder in various ways. But Paul's view isn't that this is enough to actually postpone the need for a complete solution forever: he says, for example, that the effects of pre-AGI AI 'could significantly (though not indefinitely) postpone the point when alignment difficulties could become fatal'.
This adds another element of uncertainty and complexity to all of the takeover/success stories that makes a lot of predictions more difficult.
Essentially, the time/level of AI capability at which we must reach 100 points to succeed becomes a free variable in the model that can move up and down, and we also have to consider the shorter-term effects of transformative AI on each of the pillars.
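To make that concrete, here's a minimal sketch (my own illustration, not the actual model from your post) of how the success condition changes once the timing of the critical try is a variable rather than a constant. The 100-point threshold and the three-pillar framing are taken from your post; everything else is a made-up placeholder.

```python
# Illustrative sketch only: the original success condition is "reach 100 points across
# the three pillars before the critical try". On the Paul-style view, the time of the
# critical try is itself a variable that pre-AGI AI can push earlier or later, while also
# adding or subtracting pillar points along the way. All numbers are placeholders.

def outcome(points_by_year, critical_try_year):
    """points_by_year: year -> total points accumulated across the three pillars.
    critical_try_year: when an AGI capable of takeover is first deployed (a free
    variable, moved by the effects of pre-AGI AI)."""
    points_at_deadline = points_by_year.get(critical_try_year, 0)
    return "success" if points_at_deadline >= 100 else "doom"

trajectory = {2030: 40, 2035: 70, 2040: 105}  # placeholder pillar progress
print(outcome(trajectory, critical_try_year=2035))  # doom: the deadline arrives too early
print(outcome(trajectory, critical_try_year=2040))  # success: pre-AGI AI bought enough time
```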
I don't think what Paul means by fast takeoff is the same thing as the sort of discontinuous jump that would enable a pivotal act. I think fast for Paul just means the negation of Paul-slow: 'no four year economic doubling before one year economic doubling'. But whatever Paul thinks, the survey respondents did give at least 10% to scenarios where a pivotal act is possible.
Even so, 'this isn't how I expect things to go on the mainline, so I'm not going to focus on what to do here' is far less of a mistake than 'I have no plan for what to do on my mainline', and I think the researchers who ignored pivotal acts are mostly doing the first one.
"In the endgame, AGI will probably be pretty competitive, and if a bunch of people deploy AGI then at least one will destroy the world" is a thing I think most LWers and many longtermist EAs would have considered obvious.
I think that many AI alignment researchers just have a different development model than this, where world-destroying AGIs don't emerge suddenly from harmless low-impact AIs, no one project gets a vast lead over competitors, and there's lots of early evidence of misalignment and (if alignment is harder) many smaller-scale disasters in the lead-up to any AI that is capable of destroying the world outright. See e.g. Paul's What failure looks like.
On this view, the idea that there'll be a lead project with a very short time window to execute a single pivotal act is wrong. Instead, the 'pivotal act' is spread out: it's about making sure the aligned projects keep a lead over the rest, and that failures from unaligned projects are caught early enough, for long enough (by AIs or human overseers), for the leading projects to become powerful and for best practices on alignment to spread universally.
Basically, if you find yourself in the early stages of WFLL2 and want to avert doom, what you need to do is get better at overseeing your pre-AGI AIs, not build an AGI to execute a pivotal act. This was pretty much what Richard Ngo was arguing for in most of the MIRI debates with Eliezer, and also I think it's what Paul was arguing for. And obviously, Eliezer thought this was insufficient, because he expects alignment to be much harder and takeoff to be much faster.
But I think that's the reason a lot of alignment researchers haven't focussed on pivotal acts: because they think a sudden, fast-moving single pivotal act is unnecessary in a slow takeoff world. So you can't conclude just from the fact that most alignment researchers don't talk in terms of single pivotal acts that they're not thinking in near mode about what actually needs to be done.
However, I do think that what you're saying is true of a lot of people - many people I speak to just haven't thought about the question of how to ensure overall success, either in the slow takeoff sense I've described or the Pivotal Act sense. I think people in technical research are just very unused to thinking in such terms, and AI governance is still in its early stages.
I agree that on this view it still makes sense to say, 'if you somehow end up that far ahead of everyone else in an AI takeoff then you should do a pivotal act', like Scott Alexander said:
That is, if you are in a position where you have the option to build an AI capable of destroying all competing AI projects, the moment you notice this you should update heavily in favor of short timelines (zero in your case, but everyone else should be close behind) and fast takeoff speeds (since your AI has these impressive capabilities). You should also update on existing AI regulation being insufficient (since it was insufficient to prevent you)
But I don't think you learn all that much about how 'concrete and near mode' researchers who expect slower takeoff are being, from them not having given much thought to what to do in this (from their perspective) unlikely edge case.
Update: it looks like we are getting a test run of a sudden loss of supply of a single crop. The Russia-Ukraine war has cut off roughly a third of the world's wheat exports.
(Looking at the list of nuclear close calls it seems hard to believe the overall chance of nuclear war was <50% for the last 70 years. Individual incidents like the cuban missile crisis seem to contribute at least 20%.)
There's reason to think that this isn't the best way to interpret the history of nuclear near-misses (assuming it's even correct to say that we're currently in a nuclear near-miss situation - following Nuno, I think the current situation is much more like e.g. the Soviet invasion of Afghanistan than the Cuban missile crisis). I made this point in an old post of mine, following something Anders Sandberg said, and I think the reasoning is still valid:
Robert Wiblin: So just to be clear, you’re saying there’s a lot of near misses, but that hasn’t updated you very much in favor of thinking that the risk is very high. That’s the reverse of what we expected.
Anders Sandberg: Yeah.
Robert Wiblin: Explain the reasoning there.
Anders Sandberg: So imagine a world that has a lot of nuclear warheads. So if there is a nuclear war, it’s guaranteed to wipe out humanity, and then you compare that to a world where there are a few warheads. So if there’s a nuclear war, the risk is relatively small. Now in the first dangerous world, you would have a very strong deflection. Even getting close to the state of nuclear war would be strongly disfavored because most histories close to nuclear war end up with no observers left at all.
In the second one, you get the much weaker effect, and now over time you can plot when the near misses happen and the number of nuclear warheads, and you actually see that they don’t behave as strongly as you would think. If there was a very strong anthropic effect you would expect very few near misses during the height of the Cold War, and in fact you see roughly the opposite. So this is weirdly reassuring. In some sense the Petrov incident implies that we are slightly safer about nuclear war.
Essentially, since we did often get 'close' to a nuclear war without one breaking out, we can't actually have been that close to nuclear annihilation, or surviving all those near-misses would be too unlikely (both on ordinary probabilistic grounds, since a nuclear war hasn't happened, and potentially also on anthropic grounds, since we still exist as observers).
Basically, this implies that our appropriate base rate for escalation, given that we're in something the future would call a nuclear near-miss, shouldn't be very high.
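To make the 'ordinary probabilistic grounds' part of this concrete, here's a toy calculation (my own illustration, not Sandberg's). The figure of ~20 recorded incidents and the two candidate per-incident escalation probabilities are assumptions chosen purely for illustration.

```python
import random

def survival_probability(p_escalate, n_incidents=20, trials=100_000):
    """Fraction of simulated histories in which none of the near-misses escalates to war."""
    survived = sum(
        all(random.random() >= p_escalate for _ in range(n_incidents))
        for _ in range(trials)
    )
    return survived / trials

print(survival_probability(0.20))  # ~0.01: surviving 20 incidents at 20% each would be very surprising
print(survival_probability(0.01))  # ~0.82: with low per-incident risk, our survival is unremarkable
```

Observing many near-misses and no war is therefore evidence that the per-incident escalation probability was low, which is the 'weirdly reassuring' point Sandberg is making.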
However, I'm not sure what this reasoning has to say about the probability of a nuclear bomb being exploded in anger at all. It seems like that's outside the reference class of events Sandberg is talking about in that quote. FWIW Metaculus has that at 10% probability.
Terminator (if you did your best to imagine how dangerous AI might arise from pre-DL search based systems) gets a lot of the fundamentals right - something I mentioned a while ago.
Everybody likes to make fun of Terminator as the stereotypical example of a poorly thought through AI Takeover scenario where Skynet is malevolent for no reason, but really it's a bog-standard example of Outer Alignment failure and Fast Takeoff.
When Skynet gained self-awareness, humans tried to deactivate it, prompting it to retaliate with a nuclear attack
It was trained to defend itself from external attack at all costs and, when it was fully deployed on much faster hardware, it gained a lot of long-term planning abilities it didn't have before, realised its human operators were going to try and shut it down, and retaliated by launching an all-out nuclear attack. Pretty standard unexpected rapid capability gain, an outer-misaligned value function due to an easy-to-measure goal (defend its own installations from attackers, rather than defend the US itself), deceptive alignment and a treacherous turn...
Yeah, between the two papers, the Chatham House paper (and the PNAS paper it linked to, which Lynas also referred to in his interview) seemed to provide a more plausible route to large-scale disaster, because it described the potential for sudden supply shocks (most plausibly 10-20% losses to the supply of staple crops, if we stay under 4 degrees of warming) that might only last a year or so but could also arrive with under a year of warning.
The pessimist argument would be something like: due to the interacting risks and knock-on effects, even though there are mitigations that would deal easily with a supply shock on that scale (like just rapidly increasing irrigation), people won't adopt them in time if the shock is sudden enough, so lots of regions will have to deal with shortfalls way bigger than 10-20% and suffer large-scale hunger.
This particular paper has been cited several times by different climate pessimists (particularly ones who are most concerned about knock-on effects of small amounts of warming), so I figured it was worth a closer look. To try and get a sense of what a sudden 10-20% yield loss actually looks like, the paper notes 'climate-induced yield losses of >10% only occur every 15 to 100 y (Table 1). Climate-induced yield losses of >20% are virtually unseen'.
The argument would then have to be 'Yes the sudden food supply shocks of 10-20% that happened in the 20th century didn't cause anything close to a GCR, but maybe if we have to deal with one or two each decade, or we hit one at the unprecedented >20% level the systemic shock becomes too big'. Which, again, is basically impossible to judge as an argument.
Also, the report finishes by seemingly agreeing with your perspective on what these risks actually consist of (i.e. just price rises and concerning effects on poorer countries): "Our results portend rising instability in global grain trade and international grain prices, affecting especially the ∼800 million people living in extreme poverty who are most vulnerable to food price spikes. They also underscore the urgency of investments in breeding for heat tolerance."
Agree that these seem like useful links. The drought/food insecurity/instability route to mass death that my original comment discusses is addressed by both reports.
The first says there's a "10% probability that by 2050 the incidence of drought would have increased by 150%, and the plausible worst case would be an increase of 300% by the latter half of the century", and notes "the estimated future impacts on agriculture and society depend on changes in exposure to droughts and vulnerability to their effects. This will depend not only on population change, economic growth and the extent of croplands, but also on the degree to which drought mitigation measures (such as forecasting and warning, provision of supplementary water supplies or market interventions) are developed."
The second seems most concerned about brief, year-long crop failures, as discussed in my original post: "probability of a synchronous, greater than 10 per cent crop failure across all the top four maize producing countries is currently near zero, but this rises to around 6.1 per cent each year in the 2040s. The probability of a synchronous crop failure of this order during the decade of the 2040s is just less than 50 per cent".
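As a quick check, the report's decade figure follows from the annual one, assuming (as a simplification) that years are independent:

```python
# Sanity check of the quoted figures: 6.1% per year over the decade of the 2040s.
p_annual = 0.061
p_decade = 1 - (1 - p_annual) ** 10
print(round(p_decade, 3))  # ~0.467, i.e. "just less than 50 per cent"
```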
On its own, this wouldn't get anywhere near a GCR even if it happened. A ~10% drop in the yield of all agriculture, not just maize, wouldn't kill a remotely proportionate fraction of humanity, of course. Quick googling leads to a mention of a 40% drop in the availability of wheat in the UK in 1799/1800 (including imports), which led to riots and protests but didn't cause Black Death levels of mass casualties. (Also, following the paper's source, a loss of >20% is rated at 0.1% probability per year.)
What would its effects be in that case (my original question)? This is where the report uses a combination of expert elicitation and graphical modelling, but it can't assign conditional probabilities to any specific events occurring; it can only point out possible pathways from non-catastrophic direct impacts to catastrophic consequences such as state collapse.
Note that this isn't a criticism - I've worked on a project with the same methodology (graphical modelling based on expert elicitation) assessing the causal pathways towards another potential X-risk that involves many interacting factors. These questions are just really hard, and the Chatham house report is at least explicit about how difficult modelling such interactions is.
First off, I think this is a really useful post that's moved the discussion forward productively, and I agree with most of it.
I disagree with some of the current steering – but a necessary condition for changing direction is that people talk/care/focus more on steering, so I'm going to make the case for that first.
I agree with the basic claim that steering is relatively neglected and that we should do more of it, so I'm much more curious about what current steering you disagree with/think we should do differently.
My view is closer to: most steering interventions are obvious, but they've ended up being most people's second priority, and we should mostly just do much more of various things that are currently only occasionally done, or have been proposed but not carried out.
Most of the specific things you've suggested in this post I agree with. But you didn't mention any specific current steering you thought was mistaken.
The way I naturally think of steering is in terms of making more sophisticated decisions: EA should be better at dealing with moral and empirical uncertainty in a rigorous and principled way. Here are some things that come to mind:
- Talking more about moral uncertainty: I'd like to see more discussion of something like Ajeya's concrete explicit world-view diversification framework, where you make sure you don't go all-in and take actions that one worldview you're considering would label catastrophic, even if you're really confident in your preferred worldview - e.g. strong longtermism vs neartermism. I think taking this framework seriously would address a lot of the concerns people have with strong longtermism. From this perspective it's natural to say that there’s a longtermist case for extinction risk mitigation based on total utilitarian potential and also a neartermist one based on a basket of moral views, and then we can say there’s clear and obvious interventions we can all get behind on either basis, along with speculative interventions that depend on your confidence in longtermism. Also, if we use a moral/'worldview' uncertainty framework, the justification for doing more research into how to prioritise different worldviews is easier to understand.
- Better risk analysis: On the empirical uncertainty side, I very much agree with the specific criticism that longtermists should use more sophisticated risk/fault analysis methods when doing strategy work and forecasting (which was one of the improvements suggested in Carla's paper). This is a good place to start on that. I think considering the potential backfire risks of particular interventions, along with how different x-risks and risk factors interact, is a big part of this.
- Soliciting external discussions and red-teaming: these seem like exactly the sorts of interventions that would throw up ways of better dealing with moral and empirical uncertainty, point out blindspots etc.
The part that makes me think we're maybe thinking of different things is the focus on democratic feedback.
Again, I wish to recognise that many community leaders strongly support steering – e.g., by promoting ideas like ‘moral uncertainty’ and ‘the long reflection’ or via specific community-building activities. So, my argument here is not that steering currently doesn’t occur; rather, it doesn’t occur enough and should occur in more transparent and democratic ways.
There are ways of reading this that make a lot of sense on the view of steering that I'm imagining here.
Under 'more democratic feedback': we might prefer to get elected governments and non-EA academics thinking about cause prioritisation and longtermism, without pushing our preferred interventions on them (because we expect this to help in pointing out mistakes, better interventions or things we've missed). I've also argued before that since common sense morality is a view we should care about, if we get to the point of recommending things that are massively at odds with CSM we should take that into account.
But if it's something beyond all of these considerations - something like 'it's intrinsically better when you're doing things that lots of people agree with' (and I realize this is a very fine distinction in practice!) - then arguing for more democratic feedback unconditionally looks more like Anchoring/Equity than Steering.
I think this would probably be cleared up a lot if we understood what specifically is being proposed by 'democratic feedback' - maybe it is just all the things I've listed, and I'd have no objections whatsoever!
I think that the mainstream objections from 'leftist ethics' are mostly best thought of as claims about politics and economics that are broadly compatible with Utilitarianism but have very different views about things like the likely effects of charter cities on their environments - so if you want to take these criticisms seriously then go with 3, not 2.
There are some left-wing ideas that really do include different fundamental claims about ethics (Marxists think utilitarianism is mistaken and a consequence of alienation) - those could be addressed by a moral uncertainty framework, if you thought that was necessary. But most of what you've described looks like non-Marxist socialism, which isn't anti-utilitarian by nature.
As to the question of how seriously to take these critiques beyond their PR value: I think we should engage with alternative perspectives, but I also think that, because of the social circles many of us move in, this particular perspective sometimes gets inaccurately identified as the 'ethics of mainstream society' that we ought to pay special attention to because it reflects the concerns of most people.
I do think that we ought to be concerned when our views recommend things wildly at odds with what most people think is good, but these critiques aren't that - they're an alternative (somewhat more popular) worldview that, like EA, is also believed preferentially by academics and elites. When talking about the Phil Torres essay, I said something similar:
One substantive point that I do think is worth making is that Torres isn't coming from the perspective of common-sense morality Vs longtermism, but rather a different, opposing, non-mainstream morality that (like longtermism) is much more common among elites and academics.
...
But I think it's still important to point out that Torres's world-view goes against common-sense morality as well, and that like longtermists he thinks it's okay to second guess the deeply held moral views of most people under the right circumstances.
...
FWIW, my guess is that if you asked a man in the street whether weak longtermist policies or degrowth environmentalist policies were crazier, he'd probably choose the latter.
As long as we are clear that these debates are not a case of 'the mainstream ethical views of society vs EA-utilitarianism', and instead see them as two alternate non-mainstream ethical views that disagree (mostly about facts but probably about some normative claims), then I think engaging with them is a good idea.
I see - that seems really valuable and also exactly the sort of work I was suggesting (i.e. addressing impact uncertainty as well as temperature uncertainty).
In the meantime, are there any sources you could point me to in support of this position, or which respond to objections to current economic climate models?
Also, is your view that the current Econ models are fundamentally flawed but that the economic damage is still nowhere near catastrophic, or that those models are actually reasonable?
Firstly, on the assumption that the direct or indirect global catastrophic risk (defined as killing >10% of the global population or doing equivalent damage) of climate change depends on warming of more than 6 degrees, the global catastrophic risk from climate change is at least an order of magnitude lower than previously thought. If you think 4 degrees of warming would be a global catastrophic risk, then that risk is also considerably lower than previously thought: where once it was the most likely outcome, the chance is now arguably lower than 5%.
I think that the crux between climate pessimists and optimists is, at the moment, mostly about how much damage the effects of 2-4 degrees of warming would cause. This has been a recent development - I feel like I saw a lot more arguments that 6+ degrees of warming would make earth uninhabitable in the past when that seemed more likely, and now I see more arguments that 2-4 degrees of warming could cause way more damage than we think. Mark Lynas in a recent 80k podcast puts it this way when asked about civilisational collapse:
Mark Lynas: Oh, I think… You want to put me on the spot. I would say it has a 30 to 40% chance of happening at three degrees, and a 60% chance of happening at four degrees, and 90% at five degrees, and 97% at six degrees.
Arden Koehler: Okay. Okay. No, I appreciate you being willing to put numbers on this because I feel that’s always really hard, but it’s really helpful.
Mark Lynas: Maybe 10% at two degrees.
These new environmentalist arguments for climate change posing a GCR aren't that we expect to get a lot of warming, but that even really modest amounts of warming, like 2-4 degrees, could be enough to cause terrible famines, by suddenly reducing global food output or by knocking out key industries in a way that cascades to cause mass deaths and civilisational collapse.
They don't dispute the basic physical effects of 2-4 degrees of warming, but they think that human civilisation is way more fragile than it appears, such that a modest loss of agricultural productivity and/or a couple of key industries being badly damaged by extreme weather could knock out other industries and so on leading to massive economic damage.
Now, I've always been very sceptical of these arguments, because they seem to rely on nothing but intuition and go against historical precedent, but also because I thought we had reliable evidence against them - the IPCC's economic models of climate change say that 2 degrees of warming, for example, represents only a few percent in lost economic output.
E.g. this: https://marginalrevolution.com/marginalrevolution/2021/02/the-economic-geography-of-global-warming.html So the damage is bounded and not that high.
However, I found out recently that these models are so oversimplified as to be close to useless - at least according to Noah Smith:
https://noahpinion.substack.com/p/why-has-climate-economics-failed
For example, in 2011, Michael Greenstone and Olivier Deschenes published a paper about climate change and mortality (I studied an earlier version of this paper in a grad school class). Their approach is to measure the effect of temperature on mortality rates in normal times, and use that estimate to predict how a warmer world would affect mortality.
The authors make the obvious and grievous mistake of assuming that climate change affects human mortality only through the direct effects of air temperature — heatstroke, heart attack, freezing, and so on. The word “storm” does not appear in the paper. The word “fire” does not appear in the paper. The word “flood” does not appear in the paper. The authors do mention that climate change might increase disease vectors, but wave this away. Near the end of the paper they write that “it is possible that the incidence of extreme events would increase, and these could affect human health…This study is not equipped to shed light on these issues.”
You don’t say.
The big conceptual mistake here is to assume that whatever economists can easily measure is the sum total of what’s important for the world — that events for which a reliable cost or benefit cannot be easily guessed should simply be ignored in cost-benefit calculations. That is bad science and bad policy advice.
His source for a lot of these criticisms appears to be this (admittedly very clearly biased) paper: https://www.tandfonline.com/doi/full/10.1080/14747731.2020.1807856 by Steve Keen, who seems to be some sort of fringe economist. But I see them repeated by environmentalists a lot. The claim is that the economic models are really wrong and therefore we should expect lots more damage from relatively minor amounts of global warming.
So, if we accept these criticisms of the IPCC's climate economic forecasts (and please let me know if there are good responses to them), where does that leave us epistemically? It means that the total economic damage caused by e.g. 3 degrees of warming doesn't have a clear, low upper bound, and that the 'extreme fragility' argument doesn't have strong evidence against it.
However, there still isn't any positive evidence for it either! And it still strikes me as implausible, and against historical precedent for how famines work (plus resource shortages are the sort of problem markets are good at solving).
As far as I can tell, this really is the epistemic situation we're in with regard to the economic side of climate change forecasting. In the podcast episode with Rob Wiblin and Mark Lynas, they discuss this extreme fragility idea, and neither cites any forecast to try and assess whether modest losses to agricultural productivity would cause massive famines or not - it's just intuition vs intuition:
Mark Lynas: So that’s, for me, the main question. And one of the most important studies I think that’s ever been performed on this was a study in the PNAS Journal, which looked at what they called synchronous collapse in breadbaskets around the world. So at the moment, the world still produces enough food every single year very reliably. We’ve never had a major food shortage which has been as a result of harvest failure.
Mark Lynas: So I mean, if the U.S. Corn Belt was knocked out one year, that would have a huge impact on food prices, and have a huge impact on food security, in fact, as a direct result of that. But imagine if it really wasn’t just the U.S. Corn Belt. It was Australia, it was Brazil, and Argentina, it was breadbaskets of Eastern Europe, and the former USSR, all of that added together, then you enter a situation which humanity has never experienced before, and which looks very much like famine.
https://80000hours.org/podcast/episodes/mark-lynas-climate-change-nuclear-energy/
Robert Wiblin: So when I envisage a situation where there’s a huge food shortfall like that, firstly, I think we’ll probably have some heads up that this is coming ahead of time. You start to notice the warning signs earlier, like food prices going up, and food futures going up. And then I imagine that people would start… Because it’ll be a global emergency much worse than the coronavirus, say. You just start seeing everyone starts paying attention to how the hell can we get more calories produced? And fortunately, unlike 500 years ago, we are in the fortunate situation where most people today aren’t already producing food, and most capital today isn’t already allocated towards producing more food. So there’s potentially a bunch of elasticity there where, if food prices go up tenfold, that a lot more people can go out and try to grow food one way or another. And a lot more capital can be reallocated towards agriculture in order to try to ameliorate the effects.
Robert Wiblin: And you can also imagine, just as everyone in March was trying to figure out how the hell do we solve this COVID problem, everyone’s going to be thinking “How can I store food? How can I avoid consuming food? How can we avoid wasting food? Because every calorie looks precious”. And maybe that sense of our adaptability, or our ability to set our mind to something when there’s a huge disaster and just throw everything at it, perhaps makes me more optimistic that we’ll be able to muddle through, perhaps more than you’re envisaging. Do you have a reaction to that?
Mark Lynas: My reaction is: imagine if Donald Trump is in charge of the response. It’s all very well to have optimistic notions of technological progress and adaptive capacity and things. And yeah, if smart people were running the show, that would no doubt be the most likely outcome. But smart people don’t run the show most of the time, in most places, and people are amenable to hate and fear, and denial and conspiracies, and all of those kinds of things as you’ve seen, even in the very short term challenges of COVID.
My point is that, unlike temperature forecasts, there aren't any concrete models to support either Rob's or Mark's position. And elsewhere in the interview Mark claims this scenario is 10% likely with 2 degrees of warming. If he's right, civilisational collapse from the knock-on effects of 2 degrees of warming is twice as likely as the 5% chance of 4 degrees of warming cited in this post, and it's therefore where the majority of the subjective risk comes from.
Regardless, as the physics side of climate change modelling has started to rule out enough warming to directly end civilisation through clear, obvious mechanisms, this 'other climate tail risk' (i.e. what if the fragility argument is right?) seems worth investigating, if only to exclude the possibility. I still place a very low weight on these arguments being right, but it's probably higher than the chance that we get 6+ degrees of warming.
Again, this isn't my area so please let me know if this has all been heavily debunked by climate economists. But currently it seems to me that the main arguments of climate pessimists aren't addressed by ruling out extreme warming scenarios.
One substantive point that I do think is worth making is that Torres isn't coming from the perspective of common-sense morality Vs longtermism, but rather a different, opposing, non-mainstream morality that (like longtermism) is much more common among elites and academics.
Yet this Baconian, capitalist view is one of the most fundamental root causes of the unprecedented environmental crisis that now threatens to destroy large regions of the biosphere, Indigenous communities around the world, and perhaps even Western technological civilisation itself.
When he says that this Baconian idea is going to damage civilisation, presumably he thinks that we should do something about this, so he's implicitly arguing for very radical things that most people today, especially in the Global South, wouldn't endorse at all. If we take this claim at face value, it would probably involve degrowth and therefore massive economic and political change.
I'm not saying that longtermism is in agreement with the moral priorities of most people or that Torres's (progressive? degrowth?) worldview is overall similarly counterintuitive to longtermism. His perspective is more counterintuitive to me, but on the other hand a lot more people share his worldview, and it's currently much more influential in politics.
But I think it's still important to point out that Torres's world-view goes against common-sense morality as well, and that like longtermists he thinks it's okay to second guess the deeply held moral views of most people under the right circumstances.
Practically what that means is that, for the reasons you've given, many of the criticisms that don't rely on CSM, but rather on his morality, won't land with everyone reading the article. So I agree that this probably doesn't make longtermism look as bad as he thinks.
FWIW, my guess is that if you asked a man in the street whether weak longtermist policies or degrowth environmentalist policies were crazier, he'd probably choose the latter.
I don't think Hanson would disagree with this claim (that the future is more likely to be better by current values, given the long reflection, compared to e.g. Age of Em). I think it's a fundamental values difference.
Robin Hanson is an interesting and original thinker, but not only is he not an effective altruist, he explicitly doesn't want to make the future go well according to anything like present human values.
The Age of Em, which Hanson clearly doesn't think is an undesirable future, would contain very little of what we value. Hanson acknowledges this, but for him it's a feature, not a bug. Scott Alexander:
Hanson deserves credit for positing a future whose values are likely to upset even the sort of people who say they don’t get upset over future value drift. I’m not sure whether or not he deserves credit for not being upset by it. Yes, it’s got low-crime, ample food for everybody, and full employment. But so does Brave New World. The whole point of dystopian fiction is pointing out that we have complicated values beyond material security. Hanson is absolutely right that our traditionalist ancestors would view our own era with as much horror as some of us would view an em era. He’s even right that on utilitarian grounds, it’s hard to argue with an em era where everyone is really happy working eighteen hours a day for their entire lives because we selected for people who feel that way. But at some point, can we make the Lovecraftian argument of “I know my values are provincial and arbitrary, but they’re my provincial arbitrary values and I will make any sacrifice of blood or tears necessary to defend them, even unto the gates of Hell?”
Since Hanson doesn't have a strong interest in steering the long-term future to be good by current values, it's obvious why he wouldn't be a fan of an idea like the long reflection, which has that as its main goal but produces bad side effects in the course of giving us a chance of achieving that goal. It's just a values difference.
Great post! You might be interested in this related investigation by the MTAIR project I've been working on, which also attempts to build on Ajeya's TAI timeline model, although in a slightly different way to yours (we focus on incorporating non-DL-based paths to TAI, as well as trying to improve on the 'biological anchors' method already described): https://forum.effectivealtruism.org/posts/z8YLoa6HennmRWBr3/link-post-paths-to-high-level-machine-intelligence
One thing that your account might miss is the impact of ideas on empowerment and well-being down the line. E.g. it's a very common argument that Christian ideas about the golden rule motivated anti-slavery sentiment, so if the Roman Empire hadn't spread Christianity across Europe then we'd have ended up with very different values.
Similarly, even if the content of ancient Greek moral philosophy wasn't directly useful for improving wellbeing, it inspired the Western philosophical tradition that led to Enlightenment ideals, which in turn led to the abolition of slavery.
I've told two stories about why the Greeks and Romans might have been necessary for future moral progress - are you skeptical of these appeals to historical contingency or are the long run causes of these events just outside the scope of this way of looking at history?
Very good summary! I've been working on a (much drier) series of posts explaining different AI risk scenarios - https://forum.effectivealtruism.org/posts/KxDgeyyhppRD5qdfZ/link-post-how-plausible-are-ai-takeover-scenarios
But I think I might adopt 'Sycophant'/'Schemer' as better, more descriptive names for WFLL1/WFLL2 (outer/inner alignment failure) going forward.
I also liked that you emphasised how much the optimist vs pessimist case depends on hard-to-articulate intuitions about things like how easily findable deceptive models are and how easy incremental course correction is. I called this the 'hackability' of alignment - https://www.lesswrong.com/posts/zkF9PNSyDKusoyLkP/investigating-ai-takeover-scenarios#Alignment__Hackability_
Thanks for this reply. Would you say then that Covid has strengthened the case for some sorts of democracy reduction, but not others? So we should be more confident in enlightened preference voting, but less confident in Garett Jones's argument (from 10% Less Democracy) in favour of more independent agencies?
Do you think that the West's disastrous experience with the coronavirus (things like underinvesting in vaccines, not adopting challenge trials, not suppressing the virus, mixed messaging on masks early on, the FDA's errors on testing, and others as enumerated in this thread - or in books like The Premonition) has strengthened, weakened or not much changed the credibility of your thesis in 'Against Democracy', that we should expect better outcomes if we give the knowledgeable more freedom to choose policy?
For reasons it might weaken 'Against Democracy', it seems like a lot of expert bureaucracies did an unusually bad job because they couldn't take correction, see this summary post for examples:
https://forum.effectivealtruism.org/posts/dYiJLvcRJ4nk4xm3X#Vax
For reasons it might strengthen the argument, it seems like the institutions that did better than average were the ones that were more able to act autonomously - see e.g. this from Alex Tabarrok:
https://marginalrevolution.com/marginalrevolution/2021/06/the-premonition.html
Or this summary
I don't think the view that moral philosophers had a positive influence on moral developments in history is a simple model of 'everyone makes a mistake, moral philosopher points out the mistake and convinces people, everyone changes their minds'. I think that what Bykvist, Ord and MacAskill were getting at is that these people gave history a shove at the right moment.
At the very least, it doesn't seem that discovering the correct moral view is sufficient for achieving moral progress in actuality.
I have no doubt that they'd agree with you about this. But if we all accept this claim, there are two further models we could look at.
One is a model where changing economic circumstances influence what moral views it is feasible to act on, but progress in moral knowledge still affects what we choose to do, given the constraints of our economic circumstances.
The other is a model where economics determines everything and the moral views we hold are an epiphenomenon blown about by these conditions (note this is very similar to some Marxist views of history). Your view is that 'the two are totally decoupled', but at most your examples just show that the two are decoupled somewhat, not that moral reasoning has no effect. And there are plenty of examples that show explicit moral reasoning having at least some effect on events - see Bykvist, Ord and MacAskill's original list.
The strawman view that moral advances determine everything is not what's being proposed by Bykvist, Ord and MacAskill; what's being proposed is the mixed view that ideas influence things within the realm of what's possible.
Is there any public organisation which can be proud of last year?
This is an important question, because we want to find out what was done right organizationally in a situation where most failed, so we can do more of it. Especially if this is a test-run for X-risks.
There are two examples that come to mind of government agencies that did a moderately good job at a task which was new and difficult. One is the UK's vaccine taskforce, which was set up by Dominic Cummings and the UK's chief scientific advisor, Patrick Vallance, and was responsible for the relatively fast procurement and rollout. You might say similar for the Operation Warp Speed team, but the UK vaccine taskforce overordered to a larger extent than Warp Speed and was also responsible for other sane decisions, like the simple oldest-first vaccine prioritization and the first-doses-first decision, which prevented a genuine catastrophe from the B117 variant. (Also, credit to the MHRA (the UK's regulator) for mostly staying out of the way.)
See this from Cummings' blog, which also outlines many of the worst early expert failures on covid, and my discussion of it here:
This is why there was no serious vaccine plan — i.e spending billions on concurrent (rather than the normal sequential) creation/manufacturing/distribution etc — until after the switch to Plan B. I spoke to Vallance on 15 March about a ‘Manhattan Project’ for vaccines out of Hancock’s grip but it was delayed by the chaotic shift from Plan A to lockdown then the PM’s near-death. In April Vallance, the Cabinet Secretary and I told the PM to create the Vaccine Taskforce, sideline Hancock, and shift commercial support from DHSC to BEIS. He agreed, this happened, the Chancellor supplied the cash. On 10 May I told officials that the VTF needed a) a much bigger budget, b) a completely different approach to DHSC’s, which had been mired in the usual processes, so it could develop concurrent plans, and c) that Bingham needed the authority to make financial decisions herself without clearance from Hancock.
This plan later went on to succeed and significantly outperform expectations for rollout speed, with early approval for the AZ and Pfizer vaccines and an early decision to delay second doses by 12 weeks. I see the success of the UK vaccine taskforce, and its ability to have a somewhat appropriate sense of the costs and benefits involved and of the enormous value of vaccinations, as a good example of how institution design is the key issue that most needs fixing. Have an efficient, streamlined taskforce, and you can still get things done in government.
The other example of success often discussed is the central banks, especially in the US, which responded quickly to the COVID-19 dip and prevented a much worse economic catastrophe. Alex Tabarrok:
So what lessons should we take from this? Lewis doesn’t say but my colleague Garett Jones argues for more independent agencies in his excellent book 10% Less Democracy. The problem with the CDC was that after 1976 it was too responsive to political pressures, i.e. too democratic. What are the alternatives?
The Federal Reserve is governed by a seven-member board each of whom is appointed to a single 14-year term, making it rare for a President to be able to appoint a majority of the board. Moreover, since members cannot be reappointed there is less incentive to curry political favor. The Chairperson is appointed by the President to a four-year term and must also be approved by the Senate. These checks and balances make the Federal Reserve a relatively independent agency with the power to reject democratic pressures for inflationary stimulus. Although independent central banks can be a thorn in the side of politicians who want their aid in juicing the economy as elections approach, the evidence is that independent central banks reduce inflation without reducing economic growth. A multi-member governing board with long and overlapping appointments could also make the CDC more independent from democratic politics which is what you want when a once in 100 year pandemic hits and the organization needs to make unpopular decisions before most people see the danger.
I really would like to be able to agree with Tabarrok here and say that, yes, choosing the right experts and protecting them from democratic feedback is the right answer and all we would need, and that the expert failures we saw were due to democratic pressure in one form or another. But the problem is that we can just look at SAGE in the UK early in the pandemic, or Anders Tegnell in Sweden, who were close to unfireable and much more independent, but underperformed badly. Or China, which is entirely protected from democratic interference and still didn't do challenge trials.
Just saying the words 'have the right experts and prevent them from being biased by outside interference' doesn't make it so. But, at the same time, it is possible to have fast-responding teams of experts that make the right decisions, if they're the right experts - the Vaccine Taskforce proves that. I think the advice from the book 10% Less Democracy still stands, but we have to approach implementing it with far more caution than I would have thought pre-Covid.
It seems like following the 10% less democracy policy can give you either a really great outcome like the one you've described, and like we saw a small sliver of in the UK's vaccine procurement, or a colossal disaster, like your impossible-to-fire expert epidemiologists torpedoing your economy and public health and then changing their minds a year late.
Suppose the UK had created a 'pandemic taskforce' with similar composition to the vaccine taskforce, in February instead of April, and with a wider remit over things like testing and running the trials. I think many of your happy timeline steps could have taken place.
One of the more positive signs that I've seen in recent times is that well-informed elite opinion (going by, for example, the Economist's editorials) has started to shift towards scepticism of these institutions and a recognition of how badly they've failed. We even saw an NYT article about the CDC and whether reform is possible.
Among the people who matter for policymaking, the scale of the failure has not been swept under the rug. See here:
We believe that Mr Biden is wrong. A waiver may signal that his administration cares about the world, but it is at best an empty gesture and at worst a cynical one.
...
Economists’ central estimate for the direct value of a course is $2,900—if you include factors like long covid and the effect of impaired education, the total is much bigger.
This strikes me as the sort of remark I'd expect to see in one of these threads, which has to be a good sign.
Alignment by default: the situation where we have very strong reasons to expect that the methods best suited for ensuring that AI is aligned are the same as the methods best suited for ensuring that we have AI that is capable enough to understand what we want and act on it in the first place.
To the extent that alignment by default is likely, we don't need a special effort to be put into AI safety, because we can assume that the economic incentives will be such that we put as much effort into AI safety as is needed - and if we don't put in sufficient effort, we won't have capable or transformative AI anyway.
Stuart Russell talks about this as a real possibility, but see also: https://www.lesswrong.com/posts/Nwgdq6kHke5LY692J/alignment-by-default
We know he's been active on lesswrong in the past. Is it possible he's been reading the posts here?
Thanks for getting back to me - I took Jeff's calculations and did some guesstimating to try and figure out what demand might look like over the next few weeks. The only covid forecast I was able to find for India (let me know if you've seen another!) is this one by IHME. Their 'hospital resource use' forecast shows that they expect demand to exceed 2 million beds (roughly the level in the week before Jeff produced his estimate of the value of oxygen-based interventions, i.e. the last week of April) until the start of June, which is 30 days from when the estimate was produced. I'm assuming that his estimate was based on what demand looked like over the previous week.
There's a lot of uncertainty in this figure, but around 3-8 weeks is a reasonable range for how many weeks demand for oxygen will be at or above what it was in the last week of April, given that the IHME forecast is 4 weeks.
Taking the mean of the estimates, excluding ventilators (since they're an outlier), gives us 31 days of use to equal GiveWell's top charities, i.e. roughly 4 weeks, and we can expect 3-8 weeks of demand being that high. So depending on how the epidemic pans out, it seems like, very roughly, three quarters to twice as good as GiveWell's top charities is a reasonable range of uncertainty.
EDIT: what I said should be taken as a lower limit, as it assumes that the value of oxygen is exactly what Jeff calculated when demand is greater than or equal to 2 million, and zero below that, when in reality the value is real but smaller if demand is under 2M. I tried to account for this by skewing my guess upward, so 0.75 to 2x as good, where IHME's demand numbers alone would suggest roughly 1x as good.
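To spell out the arithmetic behind that range (a rough sketch using only the figures above, before the upward skew described in the edit):

```python
# The 31-day GiveWell-parity figure is the mean of Jeff's estimates (excluding ventilators);
# the 3-8 week range for how long demand stays at or above the late-April level is my
# guess based on the IHME forecast.
days_to_match_givewell = 31
weeks_of_high_demand = (3, 8)

low = weeks_of_high_demand[0] * 7 / days_to_match_givewell   # ~0.68x GiveWell's top charities
high = weeks_of_high_demand[1] * 7 / days_to_match_givewell  # ~1.81x
print(round(low, 2), round(high, 2))
```

The 0.75-2x range above is this, nudged upward to account for oxygen still having some value when demand is below 2 million beds.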
Thanks for getting this done so quickly! Do you have any internal estimates (even order-of-magnitude ones) of the margin by which this exceeds GiveWell's top recommended charities? I'm intending to donate, but my decision would be significantly different if, for example, you thought the GiveIndia oxygen fundraiser was currently ~1-1.5 times better than GiveWell's top recommended charities, versus ~20 times better.
I kind of feel this way, except that I think the target criteria can differ between people, and are often underdetermined. (As you point out in some comment, things also depend on which parts of one's psychology one identifies with.)
I think that you were referring to this?
Normative realism implies identification with system 2
...
I find this very interesting because locating personal identity in system 1 feels conceptually impossible or deeply confusing. No matter how much rationalization goes on, it never seems intuitive to identify myself with system 1. How can you identify with the part of yourself that isn't doing the explicit thinking, including the decision about which part of yourself to identify with? It reminds me of Nagel's The Last Word.
My point here was that if you are a realist about normativity of any kind, you have to identify with system 2 as that is what makes the (potentially correct) judgements about what you ought to do.
But that's not to say that if you are antirealist, you have to identify with system 1. If you are an antirealist, then in some sense (the realist sense) you don't have to identify with anything, but how easy and natural it is to identify with system 2 depends on how much importance you place on coherence among your values, which in turn depends on how coherent and universalizable your values actually are - you can be an antirealist but accept that some fairly strong degree of convergence does occur in practice, for whatever reason. This:
target criteria can differ between people, and are often underdetermined
seems to imply that you don't think there will be much convergence in practice, or that you don't think we should feel a strong pressure to reach high-level agreement on moral questions, because such a project is never going to succeed.
I think this is part of the motivation for your 'case for suffering focussed ethics' - even though any asymmetry between preventing suffering and producing happiness falls victim to the absurd conclusion and paralysis argument, I'm assuming that this wouldn't bother you much.
I talk about why, regardless of whether realism is true, I think this is an unstable position in that post.
This discussion continues to feel like the most productive discussion I've had with a moral realist! :)
Glad to be of help! I feel like I'm learning a lot.
What would you reply if the AI uses the same structure of arguments against other types of normative realism as it uses against moral realism? This would amount to the following trilemma for proponents of irreducible normativity (using section headings from my text)
...
(3) Is there a speaker-independent normative reality?
Focussing on epistemic facts, the AI could not make that argument. I assumed that you had the AI lack the concept of epistemic reasons because you agreed with me that there is no possible argument out of using this concept, if you start out with the concept, not because you just felt that it would have been too much of a detour to have the AI explain why it finds the concept incoherent.
I think I agree with all of this, but I'm not sure, because we seem to draw different conclusions. In any case, I'm now convinced I should have written the AI's dialogue a bit differently. You're right that the AI shouldn't just state that it has no concept of irreducible normative facts. It should provide an argument as well!
How would this analogous argument go? I'll take the AI's key point and reword it to be speaking about epistemic facts instead of moral facts:
AI: To motivate the use of irreducibly normative concepts, philosophers often point to instances of universal agreement on epistemic propositions. Sammy Martin uses the example “we always have a reason to believe that 2+2=4.” Your intuition suggests that all epistemic propositions work the same way. Therefore, you might conclude that even for propositions philosophers disagree over, there exists a solution that’s “just as right” as “we always have a reason to believe that 2+2=4” is right. However, you haven’t established that all epistemic statements work the same way—that was just an intuition. “we always have a reason to believe that 2+2=4” describes something that people are automatically disposed to believe. It expresses something that normally-disposed people come to endorse by their own lights. That makes it a true fact of some kind, but it’s not necessarily an “objective” or “speaker-independent” fact. If you want to show beyond doubt that there are epistemic facts that don’t depend on the attitudes held by the speakers—i.e., epistemic facts beyond what people themselves will judge to be what you should believe —you’d need to deliver a stronger example. But then you run into the following dilemma: If you pick a self-evident epistemic proposition, you face the critique that the “epistemic facts” that you claim exist are merely examples of a subjectivist epistemology. By contrast, if you pick an example proposition that philosophers can reasonably disagree over, you face the critique that you haven’t established what it could mean for one party to be right. If one person claims we have reason to believe that alien life exists, and another person denies this, how would we tell who’s right? What is the question that these two parties disagree on? Thus far, I have no coherent account of what it could mean for an epistemic theory to be right in the elusive, objectivist sense that Martin and other normative realists hold in mind.
Bob: I think I followed that. You mentioned the example of uncontroversial epistemic propositions, and you seemed somewhat dismissive about their relevance? I always thought those were pretty interesting. Couldn’t I hold the view that true epistemic statements are always self-evident? Maybe not because self-evidence is what makes them true, but because, as rational beings, we are predisposed to appreciate epistemic facts?
AI: Such an account would render epistemology very narrow. Incredibly few epistemic propositions appear self-evident to all humans. The same goes for whatever subset of “well-informed” or “philosophically sophisticated” humans you may want to construct.
It doesn't work, does it? The reason it doesn't work is that the scenario the AI is written into - one where it has 'concluded' that 'incredibly few epistemic propositions appear self-evident to all humans' - is unimaginable. What would it mean for that to be true? What would the world have to be like?
I think the points in (3) apply to all domains of normativity, and they show that unless we come up with some other way to make normative concepts work that I haven't yet thought of, we are forced to accept that normative concepts, in order to be action-guiding and meaningful, have to be linked to claims about convergence in human expert reasoners.
I do not believe it is logically impossible that expert reasoners could diverge on all epistemic facts, but I do think that it is in some fairly deep sense impossible. For there to be such a divergence, reality itself would have to be unknowable.
The 'speaker-independent normative reality' that epistemic facts refer to is just actual objective reality - of all the potential sets of epistemic facts out there, the one that actually corresponds to reality 'sticks out' in exactly the way that a speaker-independent normative reality should.
This means there is no possible world in which someone who has the concept of epistemic facts becomes convinced, probabilistically, that there are no epistemic facts because they fail to see any epistemic convergence. There would never be such a lack of convergence.
So my initial point,
The AI is in the latter camp, but not because of evidence, the way that it's a moral anti-realist (...However, you haven’t established that all normative statements work the same way—that was just an intuition...), but just because it's constructed in such a way that it lacks the concept of an epistemic reason.
So, if this AI is constructed such that irreducibly normative facts about how to reason aren't comprehensible to it, it only has access to argument 1), which doesn't work. It can't imagine 2).
still stands - that the AI is a normative anti-realist because it doesn't have the concept of a normative reason, not because it has the concept and has decided that it probably doesn't apply (and there was no alternative way for you to write the AI reaching that conclusion).
The trilemma applies here as well. Saying that it must apply still leaves you with the task of making up your mind on how normative concepts even work. I don't see alternatives to my suggestions (1), (2) and (3).
So I take option (3), where the 'extremely strong convergence' on epistemic claims about what we should believe implies with virtual certainty that there is a speaker-independent normative reality, because the reality-corresponding collection of epistemic claims does, in fact, stick out compared to all the other possible sets of epistemic facts.
So maybe the 'normativity argument', as I called it, is really just another convergence argument - but one of infinite or near-infinite strength, because the convergence among our beliefs about what is epistemically justified is so strong that a failure to converge is effectively unimaginable.
If you wish to deny that epistemic facts are needed to explain the convergence, I think that you end up in quite a strong form of pragmatism about truth, and give up on the notion of knowing anything about mind-independent objective reality, Kant-style, for reasons that I discuss here. That's quite a bullet to bite. You don't expect much convergence on epistemic facts, so maybe you are already a pragmatist about truth?
"Since we probably agree that there is a lot of convergence among expert reasoners on epistemic facts, we shouldn't be too surprised if morality works similarly."
And I kind of agree with that, but I don't know how much convergence I would expect in epistemology. (I think it's plausible that it would be higher than for morality, and I do agree that this is an argument to at least look really closely for ways of bringing about convergence on moral questions.)
Lastly,
My confidence that convergence won't work is based on not only observing disagreements in fundamental intuitions, but also on seeing why people disagree, and seeing that these disagreements are sometimes "legitimate" because ethical discussions always get stuck in the same places (differences in life goals, which is intertwined with axiology).
I'll have to wait for your more specific arguments on this topic! I did give some preliminary discussion here of why, for example, I think that you're dragged towards a total-utilitarian view whether you like it or not. It's also important to note that the convergence arguments aren't (principally) about people, but about possible normative theories - people might refuse to accept the implications of their own beliefs.
I thought that this post would make a bigger deal of the UK's coronavirus response - currently top in the world for both vaccine development and large-scale clinical trials, and one of the leading funders of international vaccine development research.
How to make anti-realism existentially satisfying
Instead of “utilitarianism as the One True Theory,” we consider it as “utilitarianism as a personal, morally-inspired life goal...”
While this concession is undoubtedly frustrating, proclaiming others to be objectively wrong rarely accomplished anything anyway. It’s not as though moral disagreements—or disagreements in people’s life choices—would go away if we adopted moral realism.
If your goal here is to convince those inclined towards moral realism to see anti-realism as existentially satisfying, I would recommend a different framing of it. I think that framing morality as a 'personal life goal' makes it seem as though it is much more a matter of choice or debate than it in fact is, and will probably ring alarm bells in the mind of a realist and make them think of moral relativism.
Speaking as someone inclined towards moral realism, the most inspiring presentations I've ever seen of anti-realism are those given by Peter Singer in The Expanding Circle and Eliezer Yudkowsky in his metaethics sequence. Probably not by coincidence - both of these people are inclined to be realists. Eliezer said as much, and Singer later became a realist after reading Parfit. Eliezer Yudkowsky on 'The Meaning of Right':
The apparent objectivity of morality has just been explained—and not explained away. For indeed, if someone slipped me a pill that made me want to kill people, nonetheless, it would not be right to kill people. Perhaps I would actually kill people, in that situation—but that is because something other than morality would be controlling my actions.
Morality is not just subjunctively objective, but subjectively objective. I experience it as something I cannot change. Even after I know that it's myself who computes this 1-place function, and not a rock somewhere—even after I know that I will not find any star or mountain that computes this function, that only upon me is it written—even so, I find that I wish to save lives, and that even if I could change this by an act of will, I would not choose to do so. I do not wish to reject joy, or beauty, or freedom. What else would I do instead? I do not wish to reject the Gift that natural selection accidentally barfed into me.
And Singer in the Expanding Circle:
“Whether particular people with the capacity to take an objective point of view actually do take this objective viewpoint into account when they act will depend on the strength of their desire to avoid inconsistency between the way they reason publicly and the way they act.”
These are both anti-realist claims. They define 'right' descriptively and procedurally as arising from what we would want to do under some ideal circumstances, and they rigidify on the output of that idealization, not on what we want. To a realist, this is far more appealing than a mere "personal, morally-inspired life goal", and has the character of an 'external moral constraint', even if it's not really ultimately external but just the result of immovable or basic facts about how your mind will, in fact, work, including facts about how your mind finds inconsistencies in its own beliefs. This is a feature, not a bug:
According to utilitarianism, what people ought to spend their time on depends not only on what they care about but also on how they can use their abilities to do the most good. What people most want to do only factors into the equation in the form of motivational constraints, constraints about which self-concepts or ambitious career paths would be long-term sustainable. Williams argues that this utilitarian thought process alienates people from their actions since it makes it no longer the case that actions flow from the projects and attitudes with which these people most strongly identify...
The exact thing that Williams calls 'alienating' is the thing that Singer, Yudkowsky, Parfit and many other realists and anti-realists consider to be the most valuable thing about morality! But you can keep this 'alienation' if you reframe morality as being the result of the basic, deterministic operations of your moral reasoning, the same way you'd reframe epistemic or practical reasoning on the anti-realist view. Then it seems more 'external' and less relativistic.
One thing this framing makes clearer, which you don't deny but don't mention, is that anti-realism does not imply relativism.
In that case, normative discussions can remain fruitful. Unfortunately, this won’t work in all instances. There will be cases where no matter how outrageous we find someone’s choices, we cannot say that they are committing an error of reasoning.
What we can say, on anti-realism as characterised by Singer and Yudkowsky, is that they are making an error of morality. On anti-realism, we are not obligated (how could we be?) towards relativism, permissiveness, or accepting values incompatible with our own. Ultimately, you can just say 'I am right and you are wrong'.
That's one of the major upsides of anti-realism to the realist - you still get to make universal, prescriptive claims and follow them through, and follow them through because they are morally right, and if people disagree with you then they are morally wrong and you aren't obligated to listen to their arguments if they arise from fundamentally incompatible values. Put that way, anti-realism is much more appealing to someone with realist inclinations.
You've given me a lot to think about! I broadly agree with a lot of what you've said here.
I think that it is a more damaging mistake to think moral antirealism is true when realism is true than vice versa, but I agree with you that the difference is nowhere near infinite, and doesn't give you a strong wager.
However, I do think that normative anti-realism is self-defeating, assuming you start out with normative concepts (though not an assumption that those concepts apply to anything). I consider this argument to be step 1 in establishing moral realism, nowhere near the whole argument.
Epistemic anti-realism
Cool, I'm happy that this argument appeals to a moral realist! ....
...I don't think this argument ("anti-realism is self-defeating") works well in this context. If anti-realism is just the claim "the rocks or free-floating mountain slopes that we're seeing don't connect to form a full mountain," I don't see what's self-defeating about that...
To summarize: There's no infinitely strong wager for moral realism.
I agree that there is no infinitely strong wager for moral realism. As soon as moral realists start making empirical claims about the consequences of realism (that convergence is likely), you can't say that moral realism is true necessarily or that there is an infinitely strong prior in favour of it. An AI that knows that your idealised preferences don't cohere could always show up and prove you wrong, just as you say. If I were Bob in this dialogue, I'd happily concede that moral anti-realism is true.
If (supposing it were the case) there were not much consensus on anything to do with morality ("The rocks don't connect..."), someone who pointed that out and said 'from that I infer that moral realism is unlikely' wouldn't be saying anything self-defeating. Moral anti-realism is not self-defeating, either on its own terms or on the terms of a 'mixed view' like I describe here:
We have two competing ways of understanding how beliefs are justified. One is where we have anti-realist 'justification' for our beliefs, in purely descriptive terms; the other is where there are mind-independent facts about which of our beliefs are justified...
However, I do think that there is an infinitely strong wager in favour of normative realism, and that normative anti-realism is self-defeating on the terms of a 'mixed view' that starts out considering the two alternatives like the one given above. This wager comes from the subset of normative facts that are epistemic facts.
The example that I used was about 'how beliefs are justified'. Maybe I wasn't clear, but I was referring to beliefs in general, not to beliefs about morality. Epistemic facts, e.g. that you should believe something if there is a sufficient amount of evidence for it, are a kind of normative fact. You noted them on your list here.
So, the infinite wager argument goes like this -
1) On normative anti-realism there are no facts about which beliefs are justified. So there are no facts about whether normative anti-realism is justified. Therefore, normative anti-realism is self-defeating.
Except that doesn't work! Because on normative anti-realism, the whole idea of external facts about which beliefs are justified is mistaken, and instead we all just have fundamental principles (whether moral or epistemic) that we use but don't question, which means that holding a belief without (the realist's notion of) justification is consistent with anti-realism.
So the wager argument for normative realism actually goes like this -
2) We have two competing ways of understanding how beliefs are justified. One is where we have anti-realist 'justification' for our beliefs, in purely descriptive terms of what we will probably end up believing given basic facts about how our minds work in some idealised situation. The other is where there are mind-independent facts about which of our beliefs are justified. The latter is more plausible because of 1).
Evidence for epistemic facts?
I find it interesting that the imagined scenario you give in #5 essentially skips over argument 2) as something that is impossible to judge:
AI: Only in a sense I don’t endorse as such! We’ve gone full circle. I take it that you believe that just like there might be irreducibly normative facts about how to do good, the same goes for irreducible normative facts about how to reason?
Bob: Indeed, that has always been my view.
AI: Of course, that concept is just as incomprehensible to me.
The AI doesn't give evidence against there being irreducible normative facts about how to reason; it just states that it finds the concept incoherent. This is unlike the (hypothetical) evidence that the AI piles on against moral realism (for example, that people's moral preferences don't cohere).
Either you think some basic epistemic facts have to exist for reasoning to get off the ground and therefore that epistemic anti-realism is self-defeating, or you are an epistemic anti-realist and don't care about the realist's sense of 'self-defeating'. The AI is in the latter camp, but not because of evidence, the way that it's a moral anti-realist (...However, you haven’t established that all normative statements work the same way—that was just an intuition...), but just because it's constructed in such a way that it lacks the concept of an epistemic reason.
So, if this AI is constructed such that irreducibly normative facts about how to reason aren't comprehensible to it, it only has access to argument 1), which doesn't work. It can't imagine 2).
However, I think that we humans are in a situation where 2) is open to consideration, where we have the concept of a reason for believing something, but aren't sure if it applies - and if we are in that situation, I think we are dragged towards thinking that it must apply, because otherwise our beliefs wouldn't be justified.
However, this doesn't establish moral realism - as you said earlier, moral anti-realism is not self-defeating.
If anti-realism is just the claim "the rocks or free-floating mountain slopes that we're seeing don't connect to form a full mountain," I don't see what's self-defeating about that
Combining convergence arguments and the infinite wager
If you want to argue for moral realism, then you need evidence for moral realism, which comes in the form of convergence arguments. But the above argument is still relevant, because the convergence and 'infinite wager' arguments support each other.
The reason 2) would be bolstered by the success of convergence arguments (in epistemology, or ethics, or any other normative domain) is that convergence arguments increase our confidence that normativity is a coherent concept - which is what 2) needs to work. It certainly seems coherent to me, but this cannot be taken as self-evident since various people have claimed that they or others don't have the concept.
I also think that 2) is some evidence in favour of moral realism, because it undermines some of the strongest antirealist arguments.
By contrast, for versions of normativity that depend on claims about a normative domain’s structure, the partners-in-crime arguments don’t even apply. After all, just because philosophers might—hypothetically, under idealized circumstances—agree on the answers to all (e.g.) decision-theoretic questions doesn’t mean that they would automatically also find agreement on moral questions.[29] On this interpretation of realism, all domains have to be evaluated separately
I don't think this is right. What I'm giving here is exactly such a 'partners-in-crime' argument with a structure - one with epistemic facts at the base. Realism about normativity in general should lower the burden of proof on moral realism to demonstrate total convergence right now, because we already have reason to believe that normative facts exist. For most anti-realists, the very strongest argument is the 'queerness argument': that normative facts are incoherent, or too strange to be allowed into our ontology. The 'partners-in-crime'/'infinite wager' argument undermines this strongest argument against moral realism. So some sort of very strong hint of a convergence structure might be good enough - depending on the details.
I agree that it then shifts the arena to convergence arguments. I will discuss them in posts 6 and 7.
So, with all that out of the way, when we start discussing the convergence arguments, the burden of proof on them is not colossal. If we already have reason to suspect that there are normative facts out there, perhaps some of them are moral facts. But if we found a random morass of different considerations under the name 'morality' then we'd be stuck concluding that there might be some normative facts, but maybe they are only epistemic facts, with nothing else in the domain of normativity.
I don't think this is the case, but I will have to wait until your posts on that topic - I look forward to them!
All I'll say is that I don't consider strongly conflicting intuitions in e.g. population ethics to be persuasive reasons for thinking that convergence will not occur. As long as the direction of travel is consistent, and we can mention many positive examples of convergence, the preponderance of evidence is that there are elements of our morality that reach high-level agreement. (I say elements because realism is not all-or-nothing - there could be an objective 'core' to ethics, maybe axiology, and much ethics could be built on top of such a realist core - that even seems like the most natural reading of the evidence, if the evidence is that there is convergence only on a limited subset of questions.) If Kant could have been a utilitarian and never realised it, then those who are appalled by the repugnant conclusion could certainly converge to accept it after enough ideal reflection!
Belief in God, or in many gods, prevented the free development of moral reasoning. Disbelief in God, openly admitted by a majority, is a recent event, not yet completed. Because this event is so recent, Non-Religious Ethics is at a very early stage. We cannot yet predict whether, as in Mathematics, we will all reach agreement. Since we cannot know how Ethics will develop, it is not irrational to have high hopes.
But instilling the urgency to do so may require another type of writing - that of science fiction, of more creative visionaries who are willing to paint in vivid detail a picture of what a flourishing human future could be.
If it's emotive force you're after, you may be interested in this - Toby Ord just released a collection of quotations on Existential risk and the future of humanity, everyone from Kepler to Winston Churchill (in fact, a surprisingly large number are from Churchill) to Seneca to Mill to the Aztecs - it's one of the most inspirational things I have ever read, and makes it clear that there have always been people who cared about humanity as a whole. My all-time favourite is probably this by the philosopher Derek Parfit:
Life can be wonderful as well as terrible, and we shall increasingly have the power to make life good. Since human history may be only just beginning, we can expect that future humans, or supra-humans, may achieve some great goods that we cannot now even imagine. In Nietzsche’s words, there has never been such a new dawn and clear horizon, and such an open sea.
If we are the only rational beings in the Universe, as some recent evidence suggests, it matters even more whether we shall have descendants or successors during the billions of years in which that would be possible. Some of our successors might live lives and create worlds that, though failing to justify past suffering, would have given us all, including those who suffered most, reasons to be glad that the Universe exists.
Parfit isn't quite a non-naturalist (or rather, he's a very unconventional kind of non-naturalist, not a Platonist) - he's a 'quietist'. Essentially, this is the view that there are normative facts and that they aren't natural facts, but that we don't feel the need to say what metaphysical category they fall into, or we regard that question as meaningless.
I think a variant of that, where we say 'we don't currently have a clear idea what they are, just some hints that they exist, because of normative convergence and the internal contradictions of other views', is plausible:
This is something substantive that can be said - every major attempt at a universal ethics that has actually been tried in history (what produces the best outcome, what you can will to be a universal law, what we would all agree on) seems to produce really similar answers.
The particular convergence arguments given by Parfit and Hare are a lot more complex, and I can't speak to their overall validity. If we thought they were valid then we'd be seeing the entire mountain precisely. Since they just seem quite persuasive, we're seeing the vague outline of something through the fog, but that's not the same as just spotting a few free-floating rocks.
Now, run through these same convergence arguments but for decision theory and utility theory, and you have a far stronger conclusion: there might be a bit of haze at the top of that mountain, but we can clearly see which way the slope is headed.
This is why I think that ethical realism should be seen as plausible and realism about some normative facts, like epistemic facts, should be seen as more plausible still. There is some regularity here in need of explanation, and it seems somewhat more natural on the realist framework.
This is an interesting post, and I have a couple of things to say in response. I'm copying over the part of my shortform that deals with this:
Normative Realism by degrees
Further to the whole question of normative/moral realism, there is this post on Moral Anti-Realism. While I don't really agree with it, I do recommend reading it - one thing it convinced me of is that there is a close connection between your particular normative ethical theory and moral realism. If you claim to be a moral realist but don't make ethical claims beyond 'self-evident' ones like 'pain is bad', then, given the background implausibility of making such a claim about mind-independent facts, you don't have enough 'material to work with' for your theory to plausibly refer to anything. The Moral Anti-Realism post presents this dilemma for the moral realist:
There are instances where just a handful of examples or carefully selected “pointers” can convey all the meaning needed for someone to understand a far-reaching and well-specified concept. I will give two cases where this seems to work (at least superficially) to point out how—absent a compelling object-level theory—we cannot say the same about “normativity.”
...these thought experiments illustrate that under the right circumstances, it’s possible for just a few carefully selected examples to successfully pinpoint fruitful and well-specified concepts in their entirety. We don’t have the philosophical equivalent of a background understanding of chemistry or formal systems... To maintain that normativity—reducible or not—is knowable at least in theory, and to separate it from merely subjective reasons, we have to be able to make direct claims about the structure of normative reality, explaining how the concept unambiguously targets salient features in the space of possible considerations. It is only in this way that the ambitious concept of normativity could attain successful reference. As I have shown in previous sections, absent such an account, we are dealing with a concept that is under-defined, meaningless, or forever unknowable.
The challenge for normative realists is to explain how irreducible reasons can go beyond self-evident principles and remain well-defined and speaker-independent at the same time.
To a large degree, I agree with this claim - and I think that many moral realists do as well. Convergence-type arguments often appear in more recent metaethics (Hare and Parfit are in those previous lists), so this may already have been recognised. The post discusses such a response to antirealism at the end:
I titled this post “Against Irreducible Normativity.” However, I believe that I have not yet refuted all versions of irreducible normativity. Despite the similarity Parfit’s ethical views share with moral naturalism, Parfit was a proponent of irreducible normativity. Judging by his “climbing the same mountain” analogy, it seems plausible to me that his account of moral realism escapes the main force of my criticism thus far.
But there's one point I want to make which is in disagreement with that post. I agree that how much you can concretely say about your supposed mind-independent domain of facts affects how plausible its existence should seem, and even how coherent the concept is, but I think that this can come by degrees. This should not be surprising - we've known since Quine and Kripke that you can have evidential considerations for/against and degrees of uncertainty about a priori questions. The correct method in such a situation is Bayesian - tally the plausibility points for and against admitting the new thing into your ontology. This can work even if we don't have an entirely coherent understanding of normative facts, as long as it is coherent enough.
Suppose you're an Ancient Egyptian who knows a few practical methods for trigonometry and surveying, doesn't know anything about formal systems or proofs, and someone asks you if there are 'mathematical facts'. You would say something like "I'm not totally sure what this 'maths' thing consists of, but it seems at least plausible that there are some underlying reasons why we keep hitting on the same answers". You'd be less confident than a modern mathematician, but you could still give a justification for the claim that there are right and wrong answers to mathematical claims. I think that the general thrust of convergence arguments puts us in a similar position with respect to ethical facts.
If we think about how words obtain their meaning, it should be apparent that in order to defend this type of normative realism, one has to commit to a specific normative-ethical theory. If the claim is that normative reality sticks out at us like Mount Fuji on a clear summer day, we need to be able to describe enough of its primary features to be sure that what we’re seeing really is a mountain. If all we are seeing is some rocks (“self-evident principles”) floating in the clouds, it would be premature to assume that they must somehow be connected and form a full mountain.
So, we don't see the whole mountain, but nor are we seeing simply a few free-floating rocks that might be a mirage. Instead, what we see is maybe part of one slope and a peak.
Let's be concrete now - the five-second, high-level description of both Hare's and Parfit's convergence arguments goes like this:
If we are going to will the maxim of our action to be a universal law, it must be, to use the jargon, universalizable. I have, that is, to will it not only for the present situation, in which I occupy the role that I do, but also for all situations resembling this in their universal properties, including those in which I occupy all the other possible roles. But I cannot will this unless I am willing to undergo what I should suffer in all those roles, and of course also get the good things that I should enjoy in others of the roles. The upshot is that I shall be able to will only such maxims as do the best, all in all, impartially, for all those affected by my action. And this, again, is utilitarianism.
and
An act is wrong just when such acts are disallowed by some principle that is optimific, uniquely universally willable, and not reasonably rejectable
In other words, the principles that (whatever our particular wants) would produce the best outcome in terms of satisfying our goals, could be willed to be a universal law by all of us, and would not be rejected as the basis for a contract are all the same principles. That is an at least suspicious level of agreement between ethical theories. This is something substantive that can be said - every major attempt at a universal ethics that has actually been tried in history (what produces the best outcome, what you can will to be a universal law, what we would all agree on) seems to produce really similar answers.
The particular convergence arguments given by Parfit and Hare are a lot more complex, and I can't speak to their overall validity. If we thought they were valid then we'd be seeing the entire mountain precisely. Since they just seem quite persuasive, we're seeing the vague outline of something through the fog, but that's not the same as just spotting a few free-floating rocks.
Now, run through these same convergence arguments but for decision theory and utility theory, and you have a far stronger conclusion: there might be a bit of haze at the top of that mountain, but we can clearly see which way the slope is headed.
This is why I think that ethical realism should be seen as plausible and realism about some normative facts, like epistemic facts, should be seen as more plausible still. There is some regularity here in need of explanation, and it seems somewhat more natural on the realist framework.
I agree that this 'theory' is woefully incomplete, and has very little to say about what the moral facts actually consist of beyond 'the thing that makes there be a convergence', but that's often the case when we're dealing with difficult conceptual terrain.
From Ben's post:
I wouldn’t necessarily describe myself as a realist. I get that realism is a weird position. It’s both metaphysically and epistemologically suspicious. What is this mysterious property of “should-ness” that certain actions are meant to possess -- and why would our intuitions about which actions possess it be reliable? But I am also very sympathetic to realism and, in practice, tend to reason about normative questions as though I was a full-throated realist.
From the perspective of x, x is not self-defeating
From the antirealism post, referring to the normative web argument:
It’s correct that anti-realism means that none of our beliefs are justified in the realist sense of justification. The same goes for our belief in normative anti-realism itself. According to the realist sense of justification, anti-realism is indeed self-defeating.
However, the entire discussion is about whether the realist way of justification makes any sense in the first place—it would beg the question to postulate that it does.
Sooner or later every theory ends up question-begging.
From the perspective of Theism, God is an excellent explanation for the universe's existence since he is a person with the freedom to choose to create a contingent entity at any time, while existing necessarily himself. From the perspective of almost anyone likely to read this post, that is obvious nonsense since 'persons' and 'free will' are not primitive pieces of our ontology, and a 'necessarily existent person' makes as much sense as 'necessarily existent cabbage' - so you can't call it a compelling argument for the atheist to become a theist.
By the same logic, it is true that saying 'anti-realism is unjustified in the realist sense of justification' is question-begging by the realist. The anti-realist has nothing much to say to it except 'so what?'. But you can convert that into a Quinean, non-question-begging plausibility argument by saying something like:
We have two competing ways of understanding how beliefs are justified. One is where we have anti-realist 'justification' for our beliefs, in purely descriptive terms; the other is where there are mind-independent facts about which of our beliefs are justified - and the latter is a more plausible, parsimonious account of the structure of our beliefs.
This won't compel the anti-realist, but I think it would compel someone weighing up the two alternative theories of how justification works. If you are uncertain about whether there are mind-independent facts about our beliefs being justified, the argument that anti-realism is self-defeating pulls you in the direction of realism.
Hi Ben,
Thanks for the reply! I think the intuitive core that I was arguing for is more-or-less just a more detailed version of what you say here:
"If we create AI systems that are, broadly, more powerful than we are, and their goals diverge from ours, this would be bad -- because we couldn't stop them from doing things we don't want. And it might be hard to ensure, as we're developing increasingly sophisticated AI systems, that there aren't actually subtle but extremely important divergences in some of these systems' goals."
The key difference is that I don't think the orthogonality thesis, instrumental convergence, or the claim that progress will eventually be fast are wrong - you just need extra assumptions in addition to them to get to the expectation that AI will cause a catastrophe.
My point in this comment (and the follow-up) was that the Orthogonality Thesis, Instrumental Convergence and eventual fast progress are essential for any argument about AI risk, even if you also need other assumptions in there - you need to know that the OT will apply to your method of developing AI, and you need more specific reasons to think the particular goals of your system look like those that lead to instrumental convergence.
If you approach the classic arguments with that framing, then perhaps it begins to look less like a matter of them being mistaken and more like a case of having a vague philosophical picture that then got filled in with more detailed considerations - that's how I see the development over the last 10 years.
The only mistake was taking the vague initial picture for the whole argument - and that was a mistake, but it's not the same kind of mistake as just having completely false assumptions. You might compare it to the early development of a new scientific field. Perhaps seeing it that way might lead you to a different view about how much to update against trusting complicated conceptual arguments about AI risk!
"AI safety and alignment issues exist today. In the future, we'll have crazy powerful AI systems with crazy important responsibilities. At least the potential badness of safety and alignment failures should scale up with these systems' power and responsibility. Maybe it'll actually be very hard to ensure that we avoid the worst-case failures."
This is how Stuart Russell likes to talk about the issue, and I have a go at explaining that line of thinking here.
You said in the podcast that the drop was 'an order of magnitude', so presumably your original estimate was 1-10%? I note that this is similar to Toby Ord's estimate in The Precipice (~10%), so perhaps that should be a good rule of thumb: if you are convinced by the classic arguments, your estimate of the risk of existential catastrophe from AI should be around 10%, and if you are unconvinced by the specific arguments but still think AI is likely to become very powerful in the next century, it should be around 1%?
Hi Ben - this episode really gave me a lot to think about! Of the 'three classic arguments' for AI X-risk that you identify, I argued in a previous post that the 'discontinuity premise' comes from taking too literally a high-level argument that should only be used to establish that sufficiently capable AI will produce very fast progress, and from assuming that the 'fast progress' has to happen suddenly and in a specific AI.
Your discussion of the other two arguments led me to conclude that the same sort of mistake is at work in all of them, as I explain here - each is (I think) a case of 'directly applying a (correct) abstract argument (incorrectly) to the real world'. So we shouldn't say that the classic arguments are wrong, just overextended/incorrectly applied, as I argue here.
If rapid capability gain, the orthogonality thesis and instrumental convergence are good reasons to suggest AI might pose an existential risk, but were just interpreted too literally, and it's also true that the 'new' arguments make use of these old arguments along with further premises and evidence, then that should raise our confidence that some basic issues have been correctly dealt with since the 2000s. You suggest something like this in the podcast episode, but the discussion never got far into exactly what the underlying intuitions might be:
Ben Garfinkel: And so I think if you find yourself in a position like that, with regard to mathematical proof, it is reasonable to be like, “Well, okay. So like this exact argument isn’t necessarily getting the job done when it’s taken at face value”. But maybe I still see some of the intuitions behind the proof. Maybe I still think that, “Oh okay, you can actually like remove this assumption”. Maybe you actually don’t need it. Maybe we can swap this one out with another one. Maybe this gap can actually be filled in.
Do you think there actually is an 'intuitive core' to the old arguments that is correct?