Critical Review of 'The Precipice': A Reassessment of the Risks of AI and Pandemics

post by Fods12 · 2020-05-11T11:11:13.231Z · score: 77 (44 votes) · EA · GW · 32 comments

Contents

  Introduction
  Why probability estimates matter
  Engineered Pandemics
    Extinction level agent exists
    Extinction level agent technologically feasible
    Extinction level agent produced and delivered
    Failure of timely public policy response
    Failure of timely biomedical response
  Unaligned Artificial Intelligence
    AI experts and AI timelines
    AI has the power to usurp humanity
    AI has reason to usurp humanity
    AI retains permanent control over humanity
  Probability Estimates
    Probability of engineered pandemics
    Probability of unaligned artificial intelligence
    Arriving at credible estimates
  Conclusion

Introduction

In this essay I will present a critical response to Toby Ord’s recent book The Precipice (page numbers refer to the soft cover version of this book). Rather than attempting to address all of the many issues discussed by Ord, I will focus on what I consider to be one of the most critical claims of the book. Namely, Ord claims that the present century is a time of unprecedented existential risk, that “we stand at a crucial moment in the history of our species” (p. 3), a situation which is “unsustainable” (p. 4). Such views are encapsulated in Ord’s estimate of the probability of an existential catastrophe over the next century, which he places at one in six. Of this roughly seventeen percent chance, he attributes roughly ten percentage points to the risks posed by unaligned artificial intelligence, and another three percentage points to the risks posed by engineered pandemics, with most of the remaining risk due to unforeseen and ‘other’ anthropogenic risks (p. 167). In this essay I will focus on the two major sources of risk identified by Ord: artificial intelligence and engineered pandemics. I will consider the analysis presented by Ord, and argue that by neglecting several critical considerations, Ord dramatically overestimates the magnitude of the risks from these two sources. This short essay is insufficient to provide a full justification for all of my views about these risks. Instead, my aim is to highlight some of what I believe to be the major flaws and omissions of Ord’s account, and also to outline some of the key considerations that I believe support a significantly lower assessment of the risks.

Why probability estimates matter

Before analysing the details of Ord’s claims about the risks of engineered pandemics and unaligned artificial intelligence, I will first explain why I think it is important to establish estimates of the magnitude of these existential risks that are as accurate as possible. After all, it could be argued that even if the risks are significantly lower than Ord presents them, they are still far higher than we would like them to be, and causes such as unaligned AI and engineered pandemics are clearly neglected and require much more attention than they currently receive. As such, does it really matter what precise probabilities we assign to these risks? I believe it does matter, for a number of reasons.

First, Ord’s core thesis in his book is that humanity faces a ‘precipice’, a relatively short period of time with uniquely high and unsustainable levels of existential risk. To substantiate this claim, Ord needs to show not just that existential risks are high enough to warrant our attention, but that existential risk is much higher now than in the past, and that the risks are high enough to represent a ‘precipice’ at which humanity stands at the edge. Ord articulates this in the following passage:

“If I’m even roughly right about their (the risks’) scale, then we cannot survive many centuries with risk like this. It is an unsustainable level of risk. Thus, one way or another, this period is unlikely to last more than a small number of centuries. Either humanity takes control of its destiny and reduces the risk to a sustainable level, or we destroy ourselves.” (p. 31)

Critical here is Ord’s linkage of the scale of the risk with our inability to survive many centuries of this scale of risk. He goes on to argue that this is what leads to the notion of a precipice:

This comparatively brief period is a unique challenge in the history of our species... Historians of the future will name this time, and schoolchildren will study it. But I think we need a name now. I call it the Precipice. The Precipice gives our time immense meaning. (p. 31)

Given these passages, it is clear that there is a direct connection between the magnitude of the existential risks over the next century or so, and the existence of a ‘precipice’ that uniquely defines our time as historically special. This is a distinct argument from the weaker claim that existential risks are far higher than we should be comfortable with, and that more should be done to reduce them. My argument in this essay is that the main sources of the abnormally high risk identified by Ord, namely engineered pandemics and unaligned artificial intelligence, do not pose nearly as high a risk as Ord contends, and therefore his argument that the present period constitutes a ‘precipice’ is unpersuasive.

Second, I think precise estimates of the probabilities matter because there is a very long history of predicting the end of the world (or the end of civilisation, or other existential catastrophes), so the baseline for accuracy of such claims is poor. As such it seems reasonable to exercise some scepticism and caution when evaluating such claims, and ensure that they are based on sufficiently plausible evidence and reasoning to be taken seriously. This is also important for convincing others of such risks, as exaggeration of risks to humanity is very common, and is likely to reduce the credibility of those attempting to raise awareness of such risks. Ord makes a similar argument when he advises:

Don’t exaggerate the risks. There is a natural tendency to dismiss claims of existential risk as hyperbole. Exaggerating the risks plays into that, making it much harder for people to see that there is sober, careful analysis amidst the noise. (p. 213)

Third, I think that accurate estimates of probabilities of different forms of existential risk are important because it helps us to align our efforts and resources in proportion to the amount of risk posed by different causes. For example, if one type of risk is estimated to pose one hundred times as much risk as another, this implies a different distribution of efforts compared to if both causes posed roughly comparable amounts of risk. Ord makes this argument as follows:

This variation (in risk) makes it extremely important to prioritise our efforts on the right risks. And it also makes our estimate of the total risk very sensitive to the estimates of the top few risks (which are among the least well understood). So getting better understanding and estimates for those becomes a key priority. (p. 168)

As such, I believe it is important to carefully consider the probability of various proposed existential risk scenarios. In the subsequent two sections I will consider risks of engineered pandemics and unaligned artificial intelligence.

Engineered Pandemics

Extinction level agent exists

One initial consideration that must be addressed is how likely it is that any biological pathogen can even kill enough people to drive humanity to extinction. This places an upper limit on what any biotechnology could achieve, regardless of how advanced. Note that here I am referring to an agent such as a virus or bacterium that is clearly biological in nature, even if it is engineered to be more deadly than any naturally-occurring pathogen. I am not including entities that are non-biological in nature, such as artificial nanotechnology or other chemical agents. Whilst it is impossible to determine the ultimate limits of biology, one relevant point of comparison is the most deadly naturally-occurring infectious disease. To my knowledge, the highest fatality rate for any infectious biological agent that is readily transmissible between living humans is the Zaire ebolavirus, with a fatality rate of around 90%. It is unclear whether such a high fatality rate would be sustained outside of the social and climatic environment of Central Africa, where the disease originated, but nevertheless we can consider this to be a plausible baseline for the most deadly known human infectious pathogen. Critically, it appears unlikely that the death of even 90% of the world population would result in the extinction of humanity. Death rates of up to 50% during the Black Death in Europe do not appear to have even come close to causing civilisational collapse in that region, while population losses of up to 90% in Mesoamerica over the course of the invasion and plagues of the 16th century did not lead to the end of civilisation in those regions (though the social and political disruption during these events was massive).

If we think the minimal viable human population is roughly 7,000 (which is near the upper end of the figures cited by Ord (p. 41), though rounded for simplicity), then a pathogen would need to directly or indirectly lead to the deaths of more than 99.9999% of the current world population in order to lead to human extinction. One could argue that the pathogen would only need to directly cause a much smaller number of deaths, with the remaining deaths caused by secondary disruptions such as war or famine. However to me this seems very unlikely, considering that such a devastating pathogen would significantly impair the ability of nations to wage war, and it is hard to see how warfare would affect all areas of the globe sufficiently to bring about such significant population loss. Global famine also seems unlikely, given that the greater the number of pandemic deaths, the more food stores would be available to survivors. Perhaps the most devastating scenario would be a massive global pandemic followed by a full-scale nuclear war, though it is unclear why a nuclear exchange would follow a pandemic. One can of course devise various hypothetical scenarios, but overall it appears to me that a pathogen would have to have an extremely high fatality rate in order to have the potential to cause human extinction.
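The threshold above can be checked with simple arithmetic. The sketch below is a back-of-envelope check only; the world population figure of roughly 7.8 billion (circa 2020) is my assumption, not a figure from the book:

```python
# Back-of-envelope check of the extinction threshold discussed above.
# Assumptions: world population of ~7.8 billion (circa 2020, my figure),
# minimum viable population of ~7,000 (upper end of Ord's range, p. 41).
world_population = 7.8e9
minimum_viable_population = 7_000

survivor_fraction = minimum_viable_population / world_population
required_death_fraction = 1 - survivor_fraction

# The pathogen would need to kill, directly or indirectly, all but
# roughly one person in a million.
print(f"{required_death_fraction:.6%}")
```

With these assumptions the required death fraction comes out just above 99.9999%, consistent with the figure quoted above.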

In addition to a high fatality rate, an extinction-level pathogen would also have to be sufficiently infectious such that it would be able to spread rapidly through human populations. It would need to have a long enough incubation time such that infected persons can travel and infect more people before they can be identified and quarantined. It would also need to be able to survive and propagate in a wide range of temperatures and climatic conditions. Finally, it would also need to be sufficiently dangerous to a wide range of ages and genetic populations, since any pockets of immunity would render extinction considerably less likely. Overall, it is highly unclear whether any biological agent with all these properties is even possible. In particular, pathogens which are sufficiently virulent to cause 99% or more fatality rates are likely to place such a burden on human physiology that they would have a short incubation time, potentially rendering it easier to quarantine infected persons. Of course we do not know what is possible at the limits of biology, but given the extreme properties required of such an extinction-level pathogen, in my view it is very unlikely that such a pathogen is even possible.

Extinction level agent technologically feasible

Even if biological agents with the potential of wiping out humanity are theoretically possible, the question remains as to how long it will be until it becomes technologically feasible to engineer such an agent. While our current scientific understanding places significant limitations on what can be engineered, Ord argues that “it is not twentieth-century bioweaponry that should alarm us, but the next hundred years of improvements” (p. 133), which indicates that he believes that biotechnological advances over the next century are likely to enable the creation of a much wider range of dangerous biological agents. Of course, it is impossible to know how rapidly such technology will develop in the coming decades; however, I believe that Ord overstates the current capabilities of such technology, and underestimates the challenges in developing pathogens of dramatically greater lethality than existing natural agents.

For example, Ord states that it is possible to “create entire functional viruses from their written code” (p. 128). I believe this claim is misleading, especially when read alongside Ord’s concern about the ease of obtaining synthesised DNA, as it can potentially be read as asserting that viruses can be created by entirely synthetic means from their DNA alone. This is false, as the methods cited by Ord describe techniques in which synthesised viral DNA is cultured in cellular extracts, which as Ord also notes is not trivial and requires careful technique (p. 359). This approach still relies critically on utilising the ribosomes and other cellular machinery to translate viral DNA and produce the needed viral proteins. It does not involve the degree of control or understanding of the precise molecular processes involved that would be implied if an intact virus could be produced from its DNA using entirely synthetic means.

Ord also cites the 2012 experiments of Ron Fouchier, who conducted a gain-of-function experiment with H5N1 influenza in ferrets. Ord states that “by the time it passed to the final ferret, his strain of H5N1 had become directly transmissible between mammals” (p. 129). While technically correct, I believe this claim is misleading, since only a few sentences prior Ord states that this strain of influenza had an estimated 60% mortality rate in humans, implying that this would also apply to an airborne variant of the same virus. However in Fouchier’s study, it is reported that “although the six ferrets that became infected via respiratory droplets or aerosol also displayed lethargy, loss of appetite, and ruffled fur, none of these animals died within the course of the experiment.” Furthermore, the mere possibility of airborne transmission says nothing about the efficiency of this transmission mechanism. As reported in the paper:

Although our experiments showed that A/H5N1 virus can acquire a capacity for airborne transmission, the efficiency of this mode remains unclear. Previous data have indicated that the 2009 pandemic A/H1N1 virus transmits efficiently among ferrets and that naïve animals shed high amounts of virus as early as 1 or 2 days after exposure. When we compare the A/H5N1 transmission data with that of [another paper]..., the data shown in Figs. 5 and 6 suggest that A/H5N1 airborne transmission was less robust, with less and delayed virus shedding compared with pandemic A/H1N1 virus.

These qualifications illustrate the fundamental point that most biological systems exist as a set of tradeoffs and balances between competing effects and conflicting needs. Thus changing one aspect of a pathogen, such as its mode of transmission, is likely to have effects on other aspects of the pathogen, such as its lethality, incubation period, susceptibility to immune system attack, or survival outside a host. In theory it may be possible to design a pathogen with properties optimised to be as lethal to humans as possible, but doing so would require far greater understanding of protein folding pathways, protein-protein interactions, gene expression, mechanisms of pathogen invasion, immune system evasion strategies, and other such factors than is currently possessed. Thus it is by no means clear that Ord is correct when he states that “this progress in biotechnology seems unlikely to fizzle out soon: there are no insurmountable challenges looming; no fundamental laws blocking further developments” (p. 128). Indeed, I believe there are many fundamental challenges and gaps in our understanding which prevent the development of pathogens with arbitrarily specified properties.

Extinction level agent produced and delivered

Even if it were technologically possible to produce a pathogen capable of causing human extinction, the research, production, and distribution of such an infectious agent would still need to be carried out by an organisation with the capabilities and desire to do so. While Ord’s example of the Aum Shinrikyo cult does demonstrate that such groups exist, the very small number of such attacks historically appears to indicate that such groups do not exist in large numbers. Very few ideologies have an interest in bringing humanity to an end through violent means. Indeed as Ord notes:

For all our flirtation with biowarfare, there appear to have been relatively few deaths from either accidents or use... Exactly why this is so is unclear. One reason may be that bioweapons are unreliable and prone to backfiring, leading states to use other weapons in preference. (p. 132)

Ord partially counters this observation by arguing that the severity of events such as terrorist attacks and incidents of biowarfare follow a power law distribution, with very rare, very high impact events meaning that the average size of past events will underestimate the expected size of future events. However this response does not seem to address the core observation that bioweapons have proven very hard to control, and that very few agents or organisations have any interest in unleashing a pathogen that kills humans indiscriminately. This appears to be reflected in the fact that as far as is publicly known, very few attempts have even been made to deploy such weapons in modern times. I thus believe that we have good reason to think that the number of people and amount of effort devoted to developing such dangerous bioweapons is likely to be low, especially for non-state actors.
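Ord’s power-law point can be made concrete with a toy calculation. The event sizes below are invented purely for illustration (they are not data from the book): when severity is heavy-tailed, a single extreme event dominates the mean, so a historical record that happens to lack such an event will understate the expected severity of future events:

```python
# Toy illustration of the power-law argument (all numbers hypothetical):
# ninety small events, nine medium ones, and one very large one.
event_sizes = [1] * 90 + [10] * 9 + [1_000]

mean_including_extreme = sum(event_sizes) / len(event_sizes)            # 11.8
mean_excluding_extreme = sum(event_sizes[:-1]) / len(event_sizes[:-1])  # ~1.8

# A record that misses the single largest event understates the mean
# severity by a factor of more than six.
print(mean_including_extreme, mean_excluding_extreme)
```

This only illustrates why averages of past events can mislead under a power law; it does not answer the separate objection that very few actors have any interest in deploying such weapons.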

Furthermore, Ord fails to consider the practical difficulties of developing and releasing a pathogen sufficiently deadly to cause human extinction. In particular, developing a novel organism would require lengthy research and extensive testing. Even if all the requisite supplies, technology, and expertise could be obtained over a period of time without arousing enough suspicion for the project to be investigated and shut down, there still remains the challenge of how such a pathogen could be tested. No animal model is perfect, and so any novel pathogen would (just like vaccines and other medical treatments) need to be tested on large numbers of human subjects, and likely adjusted in response to results. It would need to be trialed in different environments and climates to determine whether it would spread sufficiently rapidly and survive outside a host long enough. Without such tests, it is virtually impossible that an untested novel pathogen would be sufficiently optimised to kill enough people across a wide enough range of environments to cause human extinction. However, it is hard to see how it would be possible to carry out such widespread testing with a diverse enough range of subjects without drawing the attention of authorities.

A rogue state such as North Korea might be able to circumvent this particular problem; however, that raises a range of new difficulties, such as why it would ever be in the interest of a state actor (as opposed to a death cult terrorist group) to develop such a deadly, indiscriminate pathogen. Ord raises the possibility of its use as a deterrent (akin to the deterrence function of nuclear weapons), but the analogy does not appear to hold up. Nuclear weapons work as a deterrent because their possession can be publicly demonstrated (by testing), their devastating impact is widely known, and there is no practical defence against them. None of these properties are true of an extremely lethal novel pathogen. A rogue state would have great difficulty proving that it possessed such a weapon without revealing enough information about the pathogen that the world would likely be able to develop countermeasures to that particular pathogen. As such, it does not appear feasible to use bioweapons as effective deterrents, which may partly explain why, despite extensive research into the possibility, no states have yet used them in this manner. As a result of these considerations, I conclude that even if it were technologically possible to develop a pathogen sufficiently lethal to cause human extinction, it is unlikely that anyone would actually have both the desire and the ability to successfully produce and deliver the pathogen.

Failure of timely public policy response

The release of a pathogen that has the potential to cause human extinction does not in itself imply that human extinction would inevitably occur. Whether this would follow depends on the extent of the governmental and societal responses to the outbreak of the novel pandemic, such as quarantines, widespread testing, and contact tracing. In considering the balance of positive and negative effects that organisational and civilisational advances have had on the ability to respond to the risk of pathogens, Ord states that “it is hard to know whether these combined effects have increased or decreased the existential risk from pandemics” (p. 127). This argument, however, seems implausible, since deaths from infectious diseases, and pandemics in particular, have decreased in recent centuries, with no major plague epidemics in Western Europe since the early eighteenth century. The disappearance of plague from Western Europe, while still not well understood, plausibly may have been caused at least in part by the improvement of quarantine and public policy responses to plague. In the US, the crude death rate from infectious diseases fell by about 90% over the course of the twentieth century. Furthermore, a successful public policy response to a pathogen outbreak in even a single country would likely be enough to prevent extinction, even if most countries failed to enact a sufficient public policy response. As such, I believe it is unlikely that even an extinction-level novel pathogen would be able to sufficiently evade all public health responses so as to cause human extinction.

Failure of timely biomedical response

In addition to the failure of public policy responses, extinction of humanity by a novel pathogen would also require the failure of any biomedical response to the pandemic. Ord believes that as biological techniques become easier and cheaper, they become accessible to more and more people, and hence represent a greater and greater risk. He argues:

As the pool of people with access to a technique grows, so does the chance it contains someone with malign intent. (p. 134)

This argument, however, appears to consider only one side of the issue. As the pool of people with access to a technique grows, so too does the number of people who wish to use that technique to do good. This includes developing techniques and technologies for more easily detecting, controlling, and curing infectious diseases. It surprises me that Ord never mentions this, since the development of biomedical technologies does not only mean that there is greater scope for using the technology to cause disease, but also greater scope for using new techniques to prevent and cure disease. Indeed, since the prevention of disease receives far more research attention than causing disease, it seems reasonable to assume that our ability to develop treatments, tests, and vaccines for diseases will advance more rapidly than our ability to cause disease. There are a range of emerging biomedical technologies that promise to greatly improve our ability to fight existing and novel diseases, including transmissible vaccines, rational design of drugs, and reverse vaccinology. As such, I regard it as unlikely that if biomedical technology had advanced sufficiently to be able to produce an extinction-level pathogen, it would nevertheless fail to develop sufficient countermeasures to the pathogen to at least prevent full human extinction.

Unaligned Artificial Intelligence

AI experts and AI timelines

Although Ord appeals to surveys of AI researchers as evidence of the plausibility of the development of superhuman artificial intelligence in the next century, experts in artificial intelligence do not have a good track record of predicting future progress in AI. Massively inflated expectations of the capabilities of symbolic AI systems in the 1950s and 1960s, and of expert systems in the 1980s, are well-known examples of this. More generally, it is unclear why we should even expect AI researchers to have any particular knowledge about the future trajectories of AI capabilities. Such researchers study and develop particular statistical and computational techniques to solve specific types of problems. I am not aware of any focus of their training on extrapolating technological trends, or in investigating historical case studies of technological change. Indeed, it would seem that cognitive psychologists or cognitive neuroscientists might be better placed (although probably still not very well placed) to make judgements about the boundaries of human capability and what would be required for these to be exceeded in a wide range of tasks, since AI researchers have no particular expertise in the limits of human ability. AI researchers generally only consider human-level performance in the context of baseline levels of performance on well-defined tasks such as image recognition, categorisation, or game-playing. This is far removed from being able to make judgements about when AIs would be able to outperform humans on ‘every task’. For example, do AI researchers really have any expertise on when AIs are likely to overtake human ability to do philosophy, serve as political leaders, compose a novel, or teach high school mathematics? These are simply not questions that are studied by AI researchers, and therefore I don’t see any reason why they should be regarded as having special knowledge about them.

These concerns are further emphasised by the inconsistency of researcher responses to AI timeline surveys:

Asked when an AI system would be ‘able to accomplish every task better and more cheaply than human workers’, on average they estimated a 50 percent chance of this happening by 2061. (p. 141)

However in a footnote Ord notes:

Note also that this estimate may be quite unstable. A subset of the participants were asked a slightly different question instead (emphasising the employment consequences by talking of all occupations instead of all tasks). Their time by which there would be a 50% chance of this standard being met was 2138, with a 10% chance of it happening as early as 2036. (p. 362)

Another factor highly pertinent to establishing the relevant set of experts concerns how the topics currently researched by AI researchers relate to the set of methods and techniques eventually used in building an AGI. Ord seems to think that developments of current methods may be sufficient to develop AGI:

One of the leading paradigms for how we might eventually create AGI combines deep learning with an earlier idea called reinforcement learning. (p. 143)

However such current methods, in particular deep learning, are known to be subject to a wide range of limitations. Major concerns include the ease with which adversarial examples can be used to ‘fool’ networks into misclassifying basic stimuli, the lack of established methods for integrating syntactically-structured information with neural networks, the fact that deep learning is task-specific and does not generalise well, the inability of deep learning systems to develop human-like ‘understanding’ that permits robust inferences about the world, and the requirement for very large datasets for deep learning algorithms to be trained on. While it remains possible that all these limitations may be overcome in the future, at present they represent deep theoretical limitations of current methods, and as such I see little reason to expect they can be overcome without the development of substantially new and innovative concepts and techniques. If this is correct, then there seems little reason to expect AI researchers to have any expertise in predicting when such developments are likely to take place. AI researchers study current techniques, but if (as I have argued) such techniques are fundamentally inadequate for the development of true AGI, then such expertise is of limited relevance in assessing plausible AI timelines.

One argument that Ord gives in apparent support of the notion that current methods may in principle be sufficient for the development of AGI relates to the success of using deep neural networks and reinforcement learning to train artificial agents to play Atari games:

The Atari-playing systems learn and master these games directly from the score and the raw pixels on the screen. They are a proof of concept for artificial general agents: learning to control the world from raw visual input; achieving their goals across a diverse range of environments. (p. 141)

I believe this is a gross overstatement. While these developments are impressive, they in no way provide a proof of concept for ‘artificial general agents’, any more than programs developed in the 1950s and 1960s to solve grammatical or geometric problems in simple environments provided such a proof of concept. Atari games are highly simplified environments with comparatively few degrees of freedom, where the number of possible actions is highly limited and a clear measure of success (score) is available. Real-world environments are extremely complicated, with a vast number of possible actions, and often no clear measure of success. Uncertainty also plays little direct role in Atari games, since a complete picture of the current gamespace is available to the agent. In the real world, all information gained from the environment is subject to error, and must be carefully integrated to provide an approximate model of the environment. Given these considerations, I believe that Ord overstates how close we currently are to achieving superhuman artificial intelligence, and understates the difficulties that scaling up current techniques would face in attempting to achieve this goal.

AI has the power to usurp humanity

Ord argues that artificial intelligence that was more intelligent than humans would be able to usurp humanity’s position as the most powerful species on Earth:

What would happen if sometime this century researchers created an artificial general intelligence surpassing human abilities in almost every domain? In this act of creation, we would cede our status as the most intelligent entities on Earth. So without a very good plan to keep control, we should also expect to cede our status as the most powerful species, and the one that controls its own destiny. (p. 143)

The assumption behind this claim appears to be that intelligence alone is the critical determining factor behind which species or entity maintains control over Earth’s resources and future. This premise, however, conflicts with what Ord says earlier in the book:

What set us (humanity) apart was not physical, but mental - our intelligence, creativity, and language...each human’s ability to cooperate with the dozens of other people in their band was unique among large animals. (p. 12)

Here Ord identifies not only intelligence, but also creativity and the ability to cooperate with others as critical to the success of humanity. This seems consistent with the fact that human intelligence, as far as can be determined, has not fundamentally changed over the past 10,000 years, even while our power and capabilities have dramatically increased. Obviously, what has changed is our ability to cooperate at much larger scales, and also our ability to build upon the achievements of previous generations to gradually increase our knowledge, and build up more effective institutions and practices. Given these considerations, it seems far from obvious to me that the mere existence of an agent more intelligent than any individual human would enable it to usurp humanity’s position. Indeed, Ord’s own examples seem to further emphasise this point:

History already involves examples of individuals with human-level intelligence (Hitler, Stalin, Genghis Khan) scaling up from the power of an individual to a substantial fraction of all global power. (p. 147)

Whilst we have no clear data on the intelligence of these three individuals, what does seem clear is that none of them achieved the positions they did by acts of profound intellect. They were capable men, with Stalin in particular being very widely read and Hitler known to have a sharp memory for technical details; nevertheless, they were far from being the greatest minds of their generation. Nor did they achieve their positions by ‘scaling up’ from an individual to a world superpower. I think it is more accurate to say that they used their individual talents (military leadership for Genghis Khan, administrative ability and political scheming for Stalin, and oratory and political scheming for Hitler) to gain control over existing power structures (respectively the Mongol tribes, the Soviet government, and the German government). They did not build these things from scratch themselves (though Genghis Khan did establish a unified Mongol state, so comes closer than the others), but were able to hijack existing systems and convince enough people to follow their leadership. These skills may be regarded as a subset of a very broad notion of intelligence, but they do not seem to correspond very closely to the way we normally use the word ‘intelligence’, nor do they seem likely to be the sorts of things AIs would be very good at doing.

Lacking a physical body with which to interact with people, an AI would find it hard to inspire the same levels of loyalty and fear that these three leaders (and many others like them) relied upon in their subordinates and followers. Of course, an AI could manipulate humans to do this job for it, but this would raise the immense difficulty of ensuring that its human pawns do not usurp its authority, which would be very hard if none of the humans the AI is attempting to control have any personal loyalty to the AI itself. Perhaps the AI could pit multiple humans against one another and retain control over them in this manner (indeed, that is effectively what Hitler did with his subordinates), but doing so generally requires some degree of trust and loyalty on the part of one’s subordinates to be sustainable. Such methods are also very difficult to manage (for instance, the need to prevent plots by subordinates against the leader), and place clear limits on how effectively the central ruler can personally control everything. Of course one could always say ‘if an AI is intelligent enough it can solve these problems’, but my argument is precisely that it is not at all clear to me that ‘intelligence’ is even the key factor determining success. A certain level of intelligence is needed, but various forms of subtle interpersonal skill distinct from intelligence seem far more important in acquiring and maintaining such positions, skills which a non-embodied AI would face particular difficulty in acquiring.

Overall, I am not convinced that the mere existence of a highly intelligent AI would imply anything about the ability of that AI to acquire significant power over humanity. Gaining power requires much more than individual intelligence: it also requires the ability to coordinate large numbers of people, to exercise creativity, to inspire loyalty, to build upon past achievements, and much else besides. I am not saying that an AI could not do these things, only that it would not automatically be able to do them simply by being very intelligent, nor would it necessarily be able to do them quickly.

AI has reason to usurp humanity

Although Ord’s general case for concern about AI does not appeal to any specific vision for what AI might look like, an analysis of the claims that he makes indicates that his arguments are mostly relevant to a specific type of agent based on reinforcement learning. He says:

One of the leading paradigms for how we might eventually create AGI combines deep learning with an earlier idea called reinforcement learning... unfortunately, neither of these methods can be easily scaled up to encode human values in the agent’s reward function. (p. 144)

While Ord presents this as merely a ‘leading paradigm’, his subsequent discussion appears to assume that an AI would likely embody this paradigm. For example, he remarks:

An intelligent agent would also resist attempts to change its reward function to something more aligned with human values. (p. 145)

Similarly he argues:

The real issue is that AI researchers don’t yet know how to make a system which, upon noticing this misalignment, updates its ultimate values to align with ours rather than updating its instrumental goals to overcome us. (p. 146)

While this seems plausible in the case of a reinforcement learning agent, it is far less clear that it would apply to other forms of AI. In particular, it is not even clear whether humans actually possess anything that corresponds to a ‘reward function’, nor, if they do, whether it is immutable with experience or over the lifespan. To assume that an AI would have such a thing is therefore to make specific assumptions about the form such an AI would take. This is also apparent when Ord argues:

It (the AI) would seek to acquire additional resource, computational, physical or human, as these would let it better shape the world to receive higher reward. (p. 145)

Again, this remark explicitly assumes that the AI is maximising some kind of reward function. Humans often act not as maximisers but as satisficers, choosing an outcome that is good enough rather than searching for the best possible outcome. Humans also often act on the basis of habit or by following simple rules of thumb, and are frequently risk averse. As such, I believe that to assume an AI agent would necessarily be maximising its reward is to make fairly strong assumptions about the nature of the AI in question. Absent these assumptions, it is not obvious why an AI would have any particular reason to usurp humanity.

Related to this question about the nature of AI motivations, I was surprised that (as far as I could find) Ord says nothing about the possible development of artificial intelligence through the avenue of whole brain emulation. Although currently infeasible, simulation of the neural activity of an entire human brain is a potential route to AI which requires only very minimal theoretical assumptions and no major conceptual breakthroughs. A low-level computer simulation of the brain would only require sufficient scanning resolution to measure neural connectivity and the parameters of neuron physiology, and sufficient computing power to run the simulation in reasonable time. Plausible estimates indicate that, extrapolating from current trends, such technologies are likely to be developed by the second half of this century. Although it is by no means certain, I believe it is likely that whole brain emulation will be achievable before it is possible to build a general artificial intelligence using techniques that do not attempt to directly emulate the biology of the brain. This potentially results in a significantly different analysis of the risks than that presented by Ord. In particular, while misaligned values still represent a problem for emulated intelligences, we do at least possess an in-principle method for aligning their values, namely the same sort of socialisation that is used with general success in aligning the values of each new generation of humans. As a result of such considerations, I am not convinced that it is especially likely that an artificial intelligence would have any particular reason or motivation to usurp humanity over the next century.

AI retains permanent control over humanity

Ord seems to assume that once an AI attained a position of power over the destiny of humanity, it would inevitably maintain this position indefinitely. For instance he states:

Such an outcome needn’t involve the extinction of humanity. But it could easily be an existential catastrophe nonetheless. Humanity would have permanently ceded its control over the future. Our future would be at the mercy of how a small number of people set up the computer system that took over. If we are lucky, this could leave us with a good or decent outcome, or we could just as easily have a deeply flawed or dystopian future locked in forever. (p. 148)

In this passage Ord speaks of the AI as if it is simply a passive tool, something that is created and forever after follows its original programming. Whilst I do not say this is impossible, I believe it is an unsatisfactory way to describe an entity that is supposedly a superintelligent agent, capable of making decisions and taking actions of its own volition. Here I do not mean to imply anything about the nature of free will, only that we do not regard the behaviour of humans as simply the product of what evolution has ‘programmed into us’. While it must be granted that evolutionary forces are powerful in shaping human motivations and actions, the range of possible sets of values, social arrangements, personality types, life goals, beliefs, and habits consistent with such forces is extremely broad. Indeed, this is presupposed by Ord’s claim that “humanity is currently in control of its own fate. We can choose our future.” (p. 142).

If humanity’s fate is in our own hands and not predetermined by evolution, why should we not also say that the fate of a humanity dominated by an AI would be in the hands of that AI (or collective of AIs sharing control), rather than in the hands of the designers who built it? This matters because it highlights that an AI-dominated future is by no means one in which the AI’s goals, beliefs, motivations, and values are static and unchanging. To assume otherwise is to assume that the AI in question takes a very specific form which, as I have argued above, I regard as unlikely. This significantly reduces the likelihood that a negative outcome with AI at one time represents a permanent negative outcome. Of course, this is irrelevant if the AI has driven humans to extinction, but it becomes highly relevant in other situations in which an AI has placed humans in an undesirable, subservient position. I am not convinced that such a situation would be perpetuated indefinitely.

Probability Estimates

Taking into consideration the analysis I have presented above, I would like to close by presenting some estimates of my best guess of the probability of an existential catastrophe occurring within the next century by an engineered pandemic and unaligned artificial intelligence. These estimates should not be taken very seriously. I do not believe we have enough information to make sensible quantitative estimates about these eventualities. Nevertheless, I present my estimates largely in order to illustrate the extent of my disagreement with Ord’s estimates, and to illustrate the key considerations I examine in order to arrive at an estimate.

Probability of engineered pandemics

Considering the issue of how an engineered pandemic could lead to the extinction of humanity, I identify five separate things that must occur, which to a first approximation I will regard as being conditionally independent of one another:

1. There must exist a biological pathogen with the right balance of properties to have the potential of leading to human extinction.

2. It must become technologically feasible within the next century to evolve or engineer this pathogen.

3. The extinction-level agent must be actually produced and delivered by some person or organisation.

4. The public policy response to the emerging pandemic must fail in all major world nations.

5. Any biomedical response to the pandemic, such as developing tests, treatments, or vaccines, must fail to be developed within sufficient time to prevent extinction.

On the basis of the reasoning presented in the previous sections, I regard 1) as very unlikely; 2), 4), and 5) as unlikely; and 3) as slightly less unlikely. I will operationalise ‘very unlikely’ as corresponding to a probability of 1%, ‘unlikely’ as corresponding to 10%, and ‘slightly less unlikely’ as 20%. Note that each of these probabilities is taken as conditional on all the previous elements; so, for example, my claim is that conditional on an extinction-level pathogen being possible, there is a 10% chance that it will be technologically feasible to produce this pathogen within the next century. Combining all these elements results in the following probability:

P(bio extinction) = P(extinction level agent exists) x P(extinction level agent technologically feasible) x P(extinction level agent produced and delivered) x P(failure of timely public policy response) x P(failure of timely biomedical response)

P(bio extinction) = 0.01×0.1×0.2×0.1×0.1 = 2×10^(-6)

In comparison, Ord’s estimated risk from engineered pandemics is 1/30, or roughly 3×10^(-2). Ord’s estimated risk is thus more than 10,000 times larger than mine.
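As a transparency check, the five-factor chain above can be reproduced in a few lines of code; the factor values are simply the operationalisations chosen in this essay, not independently derived figures:

```python
from math import prod

# Conditional probabilities for the essay's five-step causal chain
# (1% = "very unlikely", 10% = "unlikely", 20% = "slightly less unlikely")
bio_factors = {
    "extinction-level agent exists": 0.01,
    "agent technologically feasible": 0.10,
    "agent produced and delivered": 0.20,
    "failure of timely public policy response": 0.10,
    "failure of timely biomedical response": 0.10,
}

p_bio = prod(bio_factors.values())
print(f"P(bio extinction) = {p_bio:.0e}")        # 2e-06
print(f"ratio to Ord's 1/30: {(1/30) / p_bio:,.0f}x")
```

Writing the chain out this way makes each disputable judgement visible: anyone who disagrees with a single factor can substitute their own value and recompute.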

Probability of unaligned artificial intelligence

Considering the issue of unaligned artificial intelligence, I identify four key stages that would need to happen for this to occur, which again I will regard to first approximation as being conditionally independent of one another:

1. Artificial general intelligence, or an AI which is able to out-perform humans in essentially all human activities, is developed within the next century.

2. This artificial intelligence acquires the power to usurp humanity and achieve a position of dominance on Earth.

3. This artificial intelligence has a reason/motivation/purpose to usurp humanity and achieve a position of dominance on Earth.

4. This artificial intelligence either brings about the extinction of humanity, or otherwise retains permanent dominance over humanity in a manner so as to significantly diminish our long-term potential.

On the basis of the reasoning presented in the previous sections, I regard 1) as roughly as likely as not, and 2), 3), and 4) as being unlikely. Combining all these elements results in the following probability:

P(AI x-risk) = P(AI of sufficient capability is developed) x P(AI gains power to usurp humanity) x P(AI has sufficient reason to usurp humanity) x P(AI retains permanent usurpation of humanity)

P(AI x-risk) = 0.5×0.1×0.1×0.1 = 5×10^(-4)

In comparison, Ord’s estimated risk from unaligned AI is 1/10, or 10^(-1). Ord’s estimated risk is roughly 200 times larger than mine.
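Because the estimate is a simple product, it is linear in each factor, which makes it easy to see how sensitive the bottom line is to any single judgement. A minimal sketch (the factor values are this essay's; the 0.5 revision below is an illustrative what-if, not a claim about any particular step):

```python
from math import prod

# The essay's four-step chain: AGI developed, gains power,
# has reason to usurp, retains permanent control
ai_factors = [0.5, 0.1, 0.1, 0.1]
p_ai = prod(ai_factors)
print(f"P(AI x-risk) = {p_ai:.0e}")  # 5e-04

# Revising any single "unlikely" step (0.1) up to even odds (0.5)
# multiplies the whole estimate by 5, from 5e-4 to 2.5e-3
for i in range(1, 4):
    revised = list(ai_factors)
    revised[i] = 0.5
    print(f"step {i + 1} at 0.5 -> P = {prod(revised):.1e}")
```

This also shows why the disagreement with Ord is concentrated in steps 2–4: step 1 is already close to Ord's position, so the 200-fold gap comes entirely from the three later factors.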

Arriving at credible estimates

Although I do not think the specific numbers I present should be taken very seriously, I would like to defend the process I have gone through in estimating these risks. Specifically, I have identified the key processes I believe would need to occur in order for extinction or other existential catastrophe to occur, and then assessed how likely each of these processes is on the basis of the historical, scientific, social, and other considerations that I believe to be relevant. I then combine these probabilities to produce an overall estimate.

Though far from perfect, I believe this process is far more transparent than the estimates provided by Ord, for which no explanation is offered as to how they were derived. This means that it is effectively impossible to subject them to critical scrutiny. Indeed, Ord even states that his probabilities “aren’t simply an encapsulation of the information and argumentation in the chapters on the risks” (p. 167), which seems to imply that it is not even possible to subject them to critical analysis on the basis of the information presented in the book. While he defends this on the basis that what he knows about the risks “goes beyond what can be distilled into a few pages” (p. 167), I do not find this a very satisfactory response given the total lack of explanation of these numbers in a book of over 400 pages.

Conclusion

In this essay I have argued that in his book The Precipice, Toby Ord has failed to provide a compelling argument that humanity faces a ‘precipice’ with unprecedentedly high and clearly unsustainable levels of existential risk. My main objective was to present an alternative analysis of the risks associated with engineered pandemics and unaligned artificial intelligence, highlighting issues and considerations that I believe Ord does not grant sufficient attention. Furthermore, on the basis of this analysis I presented an alternative set of probability estimates for these two risks, both of which are considerably lower than those presented by Ord. While far from comprehensive or free from debatable premises, I hope that the approach I have outlined here provides a different perspective on the debate, and helps in the development of a nuanced understanding of these important issues.

32 comments

Comments sorted by top scores.

comment by rohinmshah · 2020-05-11T15:58:59.940Z · score: 65 (35 votes) · EA(p) · GW(p)
Though far from perfect, I believe this process is far more transparent than the estimates provided by Ord, for which no explanation is offered as to how they were derived. This means that it is effectively impossible to subject them to critical scrutiny.

I want to note that I agree with this, and I think it's good for people to write down their explicit reasoning.

That said, I disagree pretty strongly with the section on AI.

More generally, it is unclear why we should even expect AI researchers to have any particular knowledge about the future trajectories of AI capabilities. Such researchers study and develop particular statistical and computational techniques to solve specific types of problems. I am not aware of any focus in their training on extrapolating technological trends, or on investigating historical case studies of technological change.

I don't see why people keep saying this. Given the inconsistent expert responses to surveys, I think it makes sense to say that AI researchers probably aren't great at predicting future trajectories of AI capabilities. Nonetheless, if I had no inside-view knowledge and I wanted to get a guess at AI timelines, I'd ask experts in AI. (Even now, after people have spent a significant amount of time thinking about AI timelines, I would not ask experts in trend extrapolation; I seriously doubt that they would know which trends to extrapolate without talking to AI researchers.)

I suppose you could defend a position of the form "we can't know AI timelines", but it seems ridiculous to say "we can't know AI timelines, therefore AGI risk is low".

However such current methods, in particular deep learning, are known to be subject to a wide range of limitations. [...] at present they represent deep theoretical limitations of current methods

I disagree. So do many of the researchers at OpenAI and DeepMind, who are explicitly trying to build AGI using deep learning, reinforcement learning, and similar techniques. Meanwhile, academics tend to agree. I think from an outside view this should be maybe a 2x hit to the probability of developing AGI soon (if you start from the OpenAI / DeepMind position).

Atari games are highly simplified environments with comparatively few degrees of freedom, a highly limited set of possible actions, and a clear measure of success (the score). Real-world environments are extremely complicated, with a vast number of possible actions and often no clear measure of success. Uncertainty also plays little direct role in Atari games, since a complete picture of the current gamespace is available to the agent. In the real world, all information gained from the environment is subject to error and must be carefully integrated to provide an approximate model of the environment.

All of these except for the "clear measure of success" have already been surmounted (see OpenAI Five or AlphaStar for example). I'd bet that we'll see AI systems based on deep imitation learning and related techniques that work well in domains without a clear measure of success within the next 5 years. There definitely are several obstacles to general AI systems, but these aren't the obstacles.

These skills may be regarded as a subset of a very broad notion of intelligence, but do not seem to correspond very closely at all to the way we normally use the word ‘intelligence’, nor do they seem likely to be the sorts of things AIs would be very good at doing.

... Why wouldn't AIs be good at doing these things? It seems like your main point is that AI will lack a physical body and so will be bad at social interactions, but I don't see why an AI couldn't have social interactions from a laptop screen (just like the rest of us in the era of COVID-19).

More broadly, if you object to the implication "superintelligence implies ability to dominate the world", then just take whatever mental property P you think does allow an agent to dominate the world; I suspect both Toby and I would agree with "there is a non-trivial chance that future AI systems will be superhuman at P and so would be able to dominate the world".

While this seems plausible in the case of a reinforcement learning agent, it is far less clear that it would apply to other forms of AI. In particular, it is not even clear whether humans actually possess anything that corresponds to a ‘reward function’, nor, if they do, whether it is immutable with experience or over the lifespan. To assume that an AI would have such a thing is therefore to make specific assumptions about the form such an AI would take.

I agree with this critique of Toby's argument; I personally prefer the argument given in Human Compatible, which roughly goes:

  • Almost every AI system we've created so far (not just deep RL systems) has some predefined, hardcoded, certain specification that the AI is trying to optimize for.
  • A superintelligent agent pursuing a known specification has convergent instrumental subgoals (the thing that Toby is worried about).
  • Therefore, if we want superintelligent AI systems that don't have these problems, we need to change how AI is done.

This doesn't tell you the probability with which superintelligent AI has convergent instrumental subgoals, since maybe we were always going to change how AI is done, but it does show why you might expect the "default assumption" to be an AI system that has convergent instrumental subgoals, instead of one that is more satisficing like humans are.

the fate of a humanity dominated by an AI would be in the hands of that AI (or collective of AIs that share control)

This seems true to me, but if an AI system was so misaligned as to subjugate humans, I don't see why you should be hopeful that future changes in its motivations lead to it not subjugating humans. It's possible, but seems very unlikely (< 1%).

I regard 1) as roughly as likely as not

Isn't this exactly the same as Toby's estimate? (I actually don't know, I have a vague sense that this is true and was stated in The Precipice.)

Probability of unaligned artificial intelligence

Here are my own estimates for your causal pathway:

1: 0.8

2 conditioned on 1: 0.05 (I expect that there will be an ecosystem of AI systems, not a single AI system that can achieve a decisive strategic advantage)

3 conditioned on 1+2: 0.3 (If there is a single AI system that has a DSA, probably it took us by surprise, seems less likely we solved the problem in that world)

4 conditioned on 1+2+3: 0.99

Which gives in total ~0.012, or about 1%.

But really, the causal pathway I would want involves a change to 2 and 3:

2+3: Some large fraction of the AI systems in the world have reason / motivation to usurp power, and by coordinating they are able to do it.

Then:

1: 0.8

2+3 conditioned on 1: 0.1 (with ~10% on "has the motivation to usurp power" and ~95% on "can usurp power")

4: 0.99

Which comes out to ~0.08, or 8%.
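For comparison, both decompositions in this comment can be checked the same way as the essay's estimates (the numbers below are the ones given above):

```python
from math import prod

# Rohin's estimates for the essay's four-step pathway
route_essay = [0.8, 0.05, 0.3, 0.99]   # ~1.2%
# Preferred pathway: steps 2 and 3 merged into one step where an
# ecosystem of AI systems coordinates to usurp power
route_merged = [0.8, 0.1, 0.99]        # ~8%

print(f"essay pathway:  {prod(route_essay):.3f}")   # 0.012
print(f"merged pathway: {prod(route_merged):.3f}")  # 0.079
```

Note that the two pathways differ mainly in how the conditional structure is carved up, which is itself a source of disagreement distinct from the individual probabilities.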

comment by richard_ngo · 2020-05-14T21:19:05.792Z · score: 15 (6 votes) · EA(p) · GW(p)
at present they represent deep theoretical limitations of current methods

+1 on disagreeing with this. It's not clear that there's enough deep theory of current methods for them to have deep theoretical limitations :P

More generally, I broadly agree with Rohin, but (as I think we've discussed) find this argument pretty dubious:

Almost every AI system we've created so far (not just deep RL systems) has some predefined, hardcoded, certain specification that the AI is trying to optimize for.
A superintelligent agent pursuing a known specification has convergent instrumental subgoals (the thing that Toby is worried about).
Therefore, if we want superintelligent AI systems that don't have these problems, we need to change how AI is done.

Convergent instrumental subgoals aren't the problem. Large-scale misaligned goals (instrumental or not) are the problem. Whether or not a predefined specification gives rise to those sorts of goals depends on the AI architecture and training process in a complicated way. Once you describe in more detail what it actually means for an AI system to "have some specification", the "certain" bit also stops seeming like a problem.

I'd like to refer to a better argument here, but unfortunately there is no source online that makes the case that AGI will be dangerous in a satisfactory way. I think there are enough pieces floating around in people's heads/private notes to make a compelling argument, but the fact that they haven't been collated publicly is a clear failure of the field.

comment by rohinmshah · 2020-05-14T22:45:22.677Z · score: 9 (5 votes) · EA(p) · GW(p)

We have discussed this, so I'll just give brief responses so that others know what my position is. (My response to you is mostly in the last section, the others are primarily explanation for other readers.)

Convergent instrumental subgoals aren't the problem. Large-scale misaligned goals (instrumental or not) are the problem.

I'm not entirely sure what you mean by "large-scale", but misaligned goals simply argue for "the agent doesn't do what you want". To get to "the agent kills everyone", you need to bring in convergent instrumental subgoals.

Once you describe in more detail what it actually means for an AI system to "have some specification", the "certain" bit also stop seeming like a problem.

The model of "there is an POMDP, it has a reward function, the specification is to maximize expected reward" is fully formal and precise (once you spell out the MDP and reward), and the optimal solution usually involves convergent instrumental subgoals.

Whether or not a predefined specification gives rise to those sorts of goals depends on the AI architecture and training process in a complicated way.

I'm assuming you agree with:

1. The stated goal of AI research would very likely lead to human extinction

I agree that it is unclear whether AI systems actually get anywhere close to optimal for the tasks we train them for. However, if you think that we will get AGI and be fine, but we'll continue to give certain specifications of what we want, it seems like you also have to believe:

2. We will build AGI without changing the stated goal of AI research

3. AI research will not achieve its stated goal

The combination of 2 + 3 seems like a strange set of beliefs to have. (Not impossible, but unlikely.)

comment by Max_Daniel · 2020-05-18T08:43:43.555Z · score: 20 (12 votes) · EA(p) · GW(p)

This discussion (incl. child comments) was one of the most interesting things I read in the last weeks, maybe months. - Thank you for having it publicly. :)

comment by richard_ngo · 2020-05-16T12:03:29.829Z · score: 17 (5 votes) · EA(p) · GW(p)
1. The stated goal of AI research would very likely lead to human extinction

I disagree pretty strongly with this. What does it even mean for a whole field to have a "stated goal"? Who stated it? Russell says in his book that "From the very beginnings of AI, intelligence in machines has been defined in the same way", but then a) doesn't give any citations or references to the definition he uses (I can't find the quoted definition online from before his book); and b) doesn't establish that building "intelligent machines" is the only goal of the field of AI. In fact there are lots of AI researchers concerned with fairness, accountability, transparency, and so on - not just intelligence. Insofar as those researchers aren't concerned about existential risk from AI, it's because they don't think it'll happen, not because they think it's somehow outside their remit.

Now in practice, a lot of AI researcher time is spent trying to make things that better optimise objective functions. But that's because this has been the hardest part so far - specification problems have just not been a big issue in such limited domains (and insofar as they are, that's what all the FATE researchers are working on). So this observed fact doesn't help us distinguish between "everyone in AI thinks that making AIs which intend to do what we want is an integral part of their mission, but that the 'intend' bit will be easy" vs "everyone in AI is just trying to build machines that can achieve hardcoded literal objectives even if it's very difficult to hardcode what we actually want". And without distinguishing them, then the "stated goal of AI" has no predictive power (if it even exists).

We'll continue to give certain specifications of what we want

What is a "certain specification"? Is training an AI to follow instructions, giving it strong negative rewards every time it misinterprets us, then telling it to do X, a "certain specification" of X? I just don't think this concept makes sense in modern ML, because it's the optimiser, not the AI, that is given the specification. There may be something to the general idea regardless, but it needs a lot more fleshing out, in a way that I don't think anyone has done.


More constructively, I just put this post online [AF · GW]. It's far from comprehensive, but it points at what I'm concerned about more specifically than anything else.

comment by rohinmshah · 2020-05-16T16:30:15.686Z · score: 10 (6 votes) · EA(p) · GW(p)
What is a "certain specification"?

I agree this is a fuzzy concept, in the same way that "human" is a fuzzy concept.

Is training an AI to follow instructions, giving it strong negative rewards every time it misinterprets us, then telling it to do X, a "certain specification" of X?

No, the specification there is to follow instructions. I am optimistic about these sorts of "meta" specifications; CIRL / assistance games can also be thought of as a "meta" specification to assist the human. But like, afaict this sort of idea has only recently become common in the AI community; I would guess partly because of people pointing out problems with the regular method of writing down specifications.

Broadly speaking, think of certain specifications as things that you plug in to hardcoded optimization algorithms (not learned ones which can have "common sense" and interpret you correctly).

I just don't think this concept makes sense in modern ML, because it's the optimiser, not the AI, that is given the specification.

If you use a perfect optimizer and train in the real world with what you would intuitively call a "certain specification", an existential catastrophe almost certainly happens. Given agreement on this fact, I'm just saying that I want a better argument for safety than "it's fine because we have a less-than-perfect optimizer", which as far as I can tell is ~the argument we have right now, especially since in the future we will presumably have better optimizers (where more compute during training is a type of better optimization).

More constructively, I just put this post online [AF · GW]. It's far from comprehensive, but it points at what I'm concerned about more specifically than anything else.

I also find that the most plausible route by which you actually get to extinction, but it's way more speculative (to me) than the arguments I'm using above.

So this observed fact doesn't help us distinguish between "everyone in AI thinks that making AIs which intend to do what we want is an integral part of their mission, but that the 'intend' bit will be easy" vs "everyone in AI is just trying to build machines that can achieve hardcoded literal objectives even if it's very difficult to hardcode what we actually want".

??? I agree that you can't literally rule the first position out, but I've talked to many people in AI, and the closest people get to this position is saying "well maybe the 'intend' bit will be easy"; I haven't seen anyone argue for it.

I feel like you're equivocating between what AI researchers want (obviously they don't want extinction) and what they actually do (things that, if extrapolated naively, would lead to extinction).

I agree that they will start (and have started) working on the 'intend' bit once it's important, but to my mind that means at that point they will have started working on the category of work that we call "AI safety". This is consistent with my statement above:

Therefore, if we want superintelligent AI systems that don't have these problems, we need to change how AI is done.

(We in that statement was meant to refer to humanity as a whole.)

And without distinguishing them, then the "stated goal of AI" has no predictive power (if it even exists).

I specifically said this was not a prediction for this reason:

This doesn't tell you the probability with which superintelligent AI has convergent instrumental subgoals, since maybe we were always going to change how AI is done

Nonetheless, it still establishes "AI safety work needs to be done by someone", which seems like the important bit.

Perhaps you think that to motivate work by EAs on AI safety, you need to robustly demonstrate that a) there is a problem AND b) the problem won't be solved by default. I think this standard eliminates basically all x-risk prevention efforts, because you can always say "but if it's so important, someone else will probably solve it" (a thing that I think is approximately true).

(I don't think this is actually your position though, because the same critique could be applied to your new post.)

comment by richard_ngo · 2020-05-18T03:45:42.244Z · score: 24 (6 votes) · EA(p) · GW(p)

If you use a perfect optimizer and train in the real world with what you would intuitively call a "certain specification", an existential catastrophe almost certainly happens. Given agreement on this fact, I'm just saying that I want a better argument for safety than "it's fine because we have a less-than-perfect optimizer"

I think this is the central point of disagreement. I agree that perfect optimisers are pathological. But we are not going to train anything that is within light-years of perfect optimisation. Perfect optimisation is a totally different type of thing [LW · GW] to what we're doing. This argument feels to me like saying "We shouldn't keep building bigger and bigger bombs because in the limit of size they'll form a black hole and destroy the Earth." It may be true that building sufficiently big bombs will destroy the earth, but the mechanism in the limit of size is not the relevant one, and is only very loosely analogous to the mechanism we're actually worried about. (In the case of AI, to be very explicit, I'm saying that inner misalignment is the thing which might kill us, and that outer misalignment of perfect optimizers is the thing that's only very loosely analogous to it. Outer misalignment of imperfect optimisers is somewhere in the middle).

The rest of this comment is more meta.

The reason I am particularly concerned about spreading arguments related to perfect optimisers is threefold. Firstly because it feels reminiscent of the utility-maximisation arguments made by Yudkowsky - in both cases the arguments are based on theoretical claims which are literally true but in practice irrelevant or vacuous. This is specifically what made the utility-maximisation argument so misleading, and why I don't want another argument of this type to gain traction.

Secondly because I think that five years ago, if you'd asked a top ML researcher why they didn't believe in the existing arguments for AI risk, they'd have said something like:

Well, the utility function thing is a trivial mathematical result. And the argument about paperclips is dumb because the way we train AIs is by giving them rewards when they do things we like, and we're not going to give them arbitrarily high rewards for building arbitrarily many paperclips. What if we write down the wrong specification? Well, we do that in RL but in supervised learning we use human-labeled data, so if there's any issue with written specifications we can use that approach.

I think that these arguments would have been correct rebuttals to the public arguments for AI risk which existed at that time. We may have an object-level disagreement about whether a top ML researcher would actually have said something like this, but I am now strongly inclined to give the benefit of the doubt to mainstream ML researchers when I try to understand their positions. In particular, if I were in their epistemic position, I'm not sure I would make specific arguments for why the "intends" bit will be easy either, because it's just the default hypothesis: we train things, then if they don't do what we want, we train them better.

Thirdly, because I am epistemically paranoid about giving arguments which aren't actually the main reason to believe in a thing. I agree that the post I linked is super speculative, but if someone disproved the core intuitions that the post is based on, that'd make a huge dent in my estimates of AI risk. Whereas I suspect that the same is not really the case for you and the argument you give (although I feel a bit weird asserting things about your beliefs, so I'm happy to concede this point if you disagree).

Firstly because (even disregarding my other objections) it doesn't establish that AI safety work needs to be done by someone, it just establishes that AI researchers have to avoid naively extrapolating their current work. Maybe they could extrapolate it in non-naive ways that don't look anything like safety work. "Don't continue on the naively extrapolated path" is often a really low bar, because naive extrapolations can be very dubious (if we naively extrapolate a baby's growth, it'll end up the size of the earth pretty quickly).

Secondly because the argument is also true for image classifiers, since under perfect optimisation they could hack their loss functions. Insofar as we're much less worried about them than RL agents, most of the work needed to establish the danger of the latter must be done by some other argument.

Thirdly because I do think that counterfactual impact is the important bit, not "AI safety work needs to be done by someone." I don't think there needs to be a robust demonstration that the problem won't be solved by default, but there do need to be some nontrivial arguments. In my scenario, one such argument is that we won't know what effects our labels will have on the agent's learned goals, so there's no easy way to pay more to get more safety. Other arguments that fill this role are appeals to fast takeoff, competitive pressures, etc.

I specifically said this was not a prediction for this reason

I didn't read this bit carefully enough, mea culpa. I'm still not sure what the value of a "default assumption" is if it's not predictive, though.

(We in that statement was meant to refer to humanity as a whole.)

I also didn't pick up on the we = humanity thing, sorry. Makes more sense now.

comment by rohinmshah · 2020-05-18T18:36:10.192Z · score: 10 (7 votes) · EA(p) · GW(p)
I agree that perfect optimisers are pathological. But we are not going to train anything that is within light-years of perfect optimisation. Perfect optimisation is a totally different type of thing to what we're doing.

If you replace "perfect optimization" with "significantly-better-than-human optimization" in all of my claims, I'd continue to agree with them.

This argument feels to me like saying "We shouldn't keep building bigger and bigger bombs because in the limit of size they'll form a black hole and destroy the Earth."

If somehow I knew that this fact were true, but I didn't know at what size the bombs form a black hole and destroy us all, I would in fact see this as a valid and motivating argument for not building bigger bombs, and for trying to figure out how to build bombs that don't destroy the Earth (or coordinate to not build them at all).

Firstly because it feels reminiscent of the utility-maximisation arguments made by Yudkowsky - in both cases the arguments are based on theoretical claims which are literally true but in practice irrelevant or vacuous.

I strongly disagree with this.

The utility-maximization argument that I disagree with is something like:

"AI is superintelligent" implies "AI is EU-maximizing" implies "AI has convergent instrumental subgoals".

This claim is not true [AF · GW] even theoretically. It's not a question of what's happening in practice.

There is a separate argument which goes

"Superintelligent AI is built by humans" implies "AI is goal-directed" implies "AI has convergent instrumental subgoals"

And I place non-trivial weight on this claim, even though it is a conceptual, fuzzy claim that we're not sure yet will be relevant in practice, and one of the implications doesn't apply in the case where the AI is pursuing some "meta" goal that refers to the human's goals.

(You might disagree with this analysis as well, but I'd guess you'd be in the minority amongst AI safety researchers.)

The argument I gave is much more like the second kind -- a conceptual claim that depends on fuzzy categories like "certain specifications".

Secondly [...]

Sorry, I don't understand your point here. It sounds like "the last time we made an argument, we were wrong, therefore we shouldn't make more arguments", but that can't be what you're saying.

Maybe your point is that ML researchers are more competent than we give them credit for, and so we should lower our probability of x-risk? If so, I mostly just want to ignore this; I'm really not making a probabilistic argument. I'm making an argument "from the perspective of humanity / the full AI community".

I think spreading the argument "if we don't do X, then we are in trouble because of problem Y" seems better than spreading something like "there is a p% of having problem Y, where I've taken into account the fact that people will try to solve Y, and that won't be sufficient because of Z; therefore we need to put more effort into X". The former is easier to understand and more likely to be true / correctly reasoned.

(I would also defend "the chance is not so low that EAs should ignore it", but that's a separate conversation, and seems not very relevant to what arguments we should spread amongst the AI community.)

Thirdly, because I am epistemically paranoid about giving arguments which aren't actually the main reason to believe in a thing. [...] I suspect that the same is not really the case for you and the argument you give.

It totally is. I have basically two main concerns with AI alignment:

  • We're aiming for the wrong thing (outer alignment)
  • Even if we aim for the right thing, we might generalize poorly (inner alignment)

If you told me that inner alignment was magically not a problem -- we always generalize in the way that the reward function would have incentivized -- I would still be worried; though it would make a significant dent in my AI risk estimate.

If you told me that outer alignment was magically not a problem (we're actually aiming for the right thing), that would make a smaller but still significant dent in my estimate of AI risk. It's only smaller because I expect the work to solve this problem to be done by default, whereas I feel less confident about that for inner alignment.

it doesn't establish that AI safety work needs to be done by someone, it just establishes that AI researchers have to avoid naively extrapolating their current work.

Why is "not naively extrapolating their current work" not an example of AI safety work? Like, presumably they need to extrapolate in some as-yet-unknown way, figuring out that way sounds like a central example of AI safety work.

It seems analogous to "biologists just have to not publish infohazards, therefore there's no need to work on the malicious use category of biorisk".

Secondly because the argument is also true for image classifiers, since under perfect optimisation they could hack their loss functions. So insofar as we're not worried about them, then the actual work is being done by some other argument.

I'm not worried about them because there are riskier systems that will be built first, and because there isn't much economic value in having strongly superintelligent image classifiers. If we really tried to build strongly superintelligent image classifiers, I would be somewhat worried (though less so, since the restricted action space provides some safety).

(You might also think that image classifiers are safe because they are myopic, but in this world I'm imagining that we make non-myopic image classifiers, because they will be better at classifying images than myopic ones.)

Thirdly because I do think that counterfactual impact is the important bit, not "AI safety work needs to be done by someone."

I do think that there is counterfactual impact in expectation. I don't know why you think there isn't counterfactual impact. So far it sounds to me like "we should give the benefit of the doubt to ML researchers and assume they'll solve outer alignment", which sounds like a claim about norms, not a claim about the world.

I think the better argument against counterfactual impact is "there will be a strong economic incentive to solve these problems" (see e.g. here [EA · GW]), and that might reduce it by an order of magnitude, but that still leaves a lot of possible impact. But also, I think this argument applies to inner alignment as well (though less strongly).

comment by richard_ngo · 2020-05-18T23:59:56.948Z · score: 10 (3 votes) · EA(p) · GW(p)

A few more meta points:

  • I'm very surprised that we're six levels deep into a disagreement and still actively confused about each other's arguments. I thought our opinions were much more similar. This suggests that we should schedule a time to talk in person, and/or an adversarial collaboration trying to write a version of the argument that you're thinking of. (The latter might be more efficient than this exchange, while also producing useful public records).
  • Thanks for the thorough + high-quality engagement, I really appreciate it.
  • Due to time constraints I'll just try hit two key points in this reply (even though I don't think your responses resolved any of the other points for me, which I'm still very surprised by).

If you replace "perfect optimization" with "significantly-better-than-human optimization" in all of my claims, I'd continue to agree with them.

We are already at significantly-better-than-human optimisation, because none of us can take an environment and output a neural network that does well in that environment, but stochastic gradient descent can. We could make SGD many many times better and it still wouldn't produce a malicious superintelligence when trained on CIFAR, because there just isn't any gradient pushing it in the direction of intelligence; it'll train an agent to memorise the dataset far before that. And if the path to tampering is a few dozen steps long, the optimiser won't find it before the heat death of the universe (because the agent has no concept of tampering to work from, all it knows is CIFAR). So when we're talking about not-literally-perfect optimisers, you definitely need more than just amazing optimisation and hard-coded objective functions for trouble to occur - you also need lots of information about the world, maybe a bunch of interaction with it, maybe a curriculum. This is where the meat of the argument is, to me.

I think spreading the argument "if we don't do X, then we are in trouble because of problem Y" seems better. ... The former is easier to understand and more likely to be true / correctly reasoned.

I previously said:

I'm still not sure what the value of a "default assumption" is if it's not predictive, though.

And I still have this confusion. It doesn't matter if the argument is true and easy to understand if it's not action-guiding for anyone. Compare the argument: "if we (=humanity) don't remember to eat food in 2021, then everyone will die". Almost certainly true. Very easy to understand. Totally skips the key issue, which is why we should assign high enough probability to this specific hypothetical to bother worrying about it.

So then I guess your response is something like "But everyone forgetting to eat food is a crazy scenario, whereas the naive extrapolation of the thing we're currently doing is the default scenario". (Also, sorry if this dialogue format is annoying, I found it an easy way to organise my thoughts, but I appreciate that it runs the risk of strawmanning you.)

To which I respond: there are many ways of naively extrapolating "the thing we are currently doing". For example, the thing we're currently doing is building AI with a 100% success record at not taking over the world. So my naive extrapolation says we'll definitely be fine. Why should I pay any attention to your naive extrapolation?

I then picture you saying: "I'm not using these extrapolations to make probabilistic predictions, so I don't need to argue that mine is more relevant than yours. I'm merely saying: once our optimisers get really really good, if we give them a hard-coded objective function, things will go badly. Therefore we, as humanity, should do {the set of things which will not lead to really good optimisers training on hard-coded objective functions}."

To which I firstly say: no, I don't buy the claim that once our optimisers get really really good, if we give them a hard-coded objective function, "an existential catastrophe almost certainly happens". For reasons which I described above.

Secondly, even if I do accept your claim, I think I could just point out: "You've defined what we should do in terms of its outcomes, but in an explicitly non-probabilistic way. So if the entire ML community hears your argument, agrees with it, and then commits to doing exactly what they were already doing for the next fifty years, you have no grounds to complain, because you have not actually made any probabilistic claims about whether "exactly what they were already doing for the next fifty years" will lead to catastrophe." So again, why is this argument worth making?

Man, this last point felt really nitpicky, but I don't know how else to convey my intuitive feeling that there's some sort of motte and bailey happening in your argument. Again, let's discuss this higher-bandwidth.

comment by MichaelA · 2020-05-20T09:27:44.479Z · score: 8 (5 votes) · EA(p) · GW(p)

Just want to say that I've found this exchange quite interesting, and would be keen to read an adversarial collaboration between you two on this sort of thing. Seems like that would be a good addition to the set of discussions there've been about key cruxes related to AI safety/alignment [LW(p) · GW(p)].

(ETA: Actually, I've gone ahead and linked to this comment thread in that list as well, for now, as it was already quite interesting.)

comment by rohinmshah · 2020-05-19T16:20:25.051Z · score: 5 (3 votes) · EA(p) · GW(p)
This suggests that we should schedule a time to talk in person, and/or an adversarial collaboration trying to write a version of the argument that you're thinking of.

Sounds good, I'll just clarify my position in this response, rather than arguing against your claims.

So then I guess your response is something like "But everyone forgetting to eat food is a crazy scenario, whereas the naive extrapolation of the thing we're currently doing is the default scenario".

It's more like "there isn't any intellectual work to be done / field building to do / actors to coordinate to get everyone to eat".

Whereas in the AI case, I don't know how we're going to fix the problem I outlined; and as far as I can tell nor does anyone else in the AI community, and therefore there is intellectual work to be done.

We are already at significantly-better-than-human optimisation

Sorry, by optimization there I meant something more like "intelligence". I don't really care whether it comes from better SGD, some hardcoded planning algorithm, or a mesa optimizer; the question is whether it is significantly more capable than humans at pursuing goals.

I thought our opinions were much more similar.

I think our predictions of how the world will go concretely are similar; but I'd guess that I'm happier with abstract arguments that depend on fuzzy intuitive concepts than you are, and find them more compelling than more concrete ones that depend on a lot of specific details.

comment by Max_Daniel · 2020-05-18T08:40:05.448Z · score: 7 (4 votes) · EA(p) · GW(p)

(FWIW, when reading the above discussion I independently had almost exactly the same reaction as the following before reading it in Richard's latest comment:

This argument feels to me like saying "We shouldn't keep building bigger and bigger bombs because in the limit of size they'll form a black hole and destroy the Earth."

)

comment by MichaelA · 2020-05-16T08:27:47.321Z · score: 2 (2 votes) · EA(p) · GW(p)
"I regard 1) as roughly as likely as not"
Isn't this exactly the same as Toby's estimate? (I actually don't know, I have a vague sense that this is true and was stated in The Precipice.)

This indeed matches Ord's views. He says on 80k:

Basically, you can look at my 10% [estimate of the existential risk from AI this century] as, there’s about a 50% chance that we create something that’s more intelligent than humanity this century. And then there’s only an 80% chance that we manage to survive that transition, being in charge of our future.

I think he also gives that 50% estimate in The Precipice, but I can't remember for sure.

comment by MichaelA · 2020-05-16T08:27:12.428Z · score: 1 (1 votes) · EA(p) · GW(p)

Good points!

... Why wouldn't AIs be good at doing these things? [...]
More broadly, if you object to the implication "superintelligence implies ability to dominate the world", then just take whatever mental property P you think does allow an agent to dominate the world; I suspect both Toby and I would agree with "there is a non-trivial chance that future AI systems will be superhuman at P and so would be able to dominate the world".

I think this is a key point, and that three related/supporting points can also be made:

  • The timeline surveys discussed in this essay related to when AI will be "able to accomplish every task better and more cheaply than human workers", and some of the tasks/jobs asked about would seem to rely on a wide range of skills, including social skills.
  • Intelligence is already often defined in a very expansive way that would likely include all mental properties that may be relevant to world-dominance. E.g., “Intelligence measures an agent’s ability to achieve goals in a wide range of environments" (Legg and Hutter). That definition is from AI researchers, and I'd guess that AI researchers often have something like that (rather than something like performance on IQ tests) in mind as what they're ultimately aiming for or predicting arrival dates of (though I don't have insider knowledge on AI researchers' views).
    • That said, I do think intelligence is also often used in a narrower way, and so I do value the OP having helped highlight that it's worth being more specific about what abilities are being discussed.
  • I'd guess that there are economic incentives to create AI systems that are very strong in whatever mental abilities are important for achieving goals. If it turns out that a narrow sense of intelligence is not sufficient for that, it doesn't seem likely that people will settle for AI systems that are very strong in only that dimension.
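The Legg and Hutter definition quoted above also has a formal version in their work; a sketch of it, reproduced here from memory and so best treated as illustrative:

```latex
% Universal intelligence of an agent \pi (Legg & Hutter):
% the expected value V^{\pi}_{\mu} the agent achieves across all
% computable reward-bearing environments \mu in a reference class E,
% weighted by simplicity 2^{-K(\mu)}, where K is Kolmogorov complexity.
\Upsilon(\pi) = \sum_{\mu \in E} 2^{-K(\mu)} \, V^{\pi}_{\mu}
```

The point relevant to the discussion: this measure averages goal-achievement over a very wide range of environments, so an agent scoring highly on it would plausibly be strong at whatever mental property P matters for world-dominance, not just at narrow IQ-test-style tasks.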
comment by willbradshaw · 2020-05-11T17:00:48.993Z · score: 31 (14 votes) · EA(p) · GW(p)

I'm mostly going to restrict my comments to your section on biosecurity, since (a) I have a better background in that area and (b) I think it's stronger than the AI section. I haven't read all of The Precipice yet, so I'm responding to your arguments in general, rather than specifically defending Ord's phrasing.

One general comment: this post is long enough that I think it would benefit from a short bullet-point summary of the main claims (the current intro doesn't say much beyond the fact that you disagree with Ord's risk assessments).

Anyway, biosecurity. There's a general problem with info-hazards/tongue-biting in this domain, which can make it very difficult to have a full and frank exchange of views, or even tell exactly where and why someone disagrees with you. I, and ~everyone else, finds this very frustrating, but it is the way of things. So you might well encounter people disagreeing with claims you make without telling you why, or even silently disagreeing without saying so.

That said, it's my impression that most people I've spoken to (who are willing to issue an opinion on the subject!) think that, currently, directly causing human extinction via a pandemic would be extremely hard (there are lots of GCBR-focused biosecurity papers that say this). Your claim that such a pathogen is very likely impossible seems oddly strong to me, given that evolutionary constraints are not the same thing as physical constraints. But even if possible, such agents are certainly very hard to find in pathogen-space. I expect we'll get drastically better at searching that space over the next 100 years, though.

I disagree with parts of all of your points here, but I think the weakest is the section arguing that no-one would want to create such dangerous biological weapons (which, to be fair, you also place the least weight on):

This appears to be reflected in the fact that as far as is publicly known, very few attempts have even been made to deploy such weapons in modern times. I thus believe that we have good reason to think that the number of people and amount of effort devoted to developing such dangerous bioweapons is likely to be low, especially for non-state actors.

We know that state actors (most notably, but not only, the Soviet Union) have put enormous effort and funding into creating some very nasty biological weapons over the past 100 years, including many "strategic" weapons that were intended to spread widely and create mass civilian casualties if released. Whether or not doing so was strategically rational or consistent with their stated goals or systems of ethics, there have in fact been biological weapons programs, which did in fact create "deadly, indiscriminate pathogen[s]."

A rogue state such as North Korea might be able to circumvent this particular problem, however that raises a range of new difficulties, such as why it would ever be in the interest of a state actor (as opposed to a death cult terrorist group) to develop such a deadly, indiscriminate pathogen.

Eppur si muove. Any attempt to tackle the question of how likely it is that someone would seek to develop catastrophic biological weapons must reckon with the fact that such weapons have, in fact, been sought.

comment by MichaelA · 2020-05-16T06:58:28.379Z · score: 11 (4 votes) · EA(p) · GW(p)

I'm glad you mentioned information hazards [LW · GW] in this context. Personally, I felt a bit uncomfortable reading the engineered pandemics section listing an array of obstacles to be surmounted to causing extinction, and ways they might be surmounted.

I agree that it's quite an unfortunate situation that concerns about information hazards make it harder to openly debate levels of risks from various sources and related topics (at least within the biorisk space). I'm also generally quite in favour of people being able to poke and prod at prominent or common views, think this post seems to have done a good job of that in certain parts (although I disagree with quite a few specific points made), and would feel uncomfortable if people felt unable to write anything like this for information hazards reasons.

But I'd personally really hope that, before publishing this, the author at least ran the engineered pandemics section by one person who is fairly familiar with the biorisk or x-risk space, explicitly asking them for their views on how wise it would be to publish it in the current form. Such a person might be able to provide info on where the contents of that to-do list of doom are on a spectrum from:

  • already very widely known (such that publication may not do that much harm)
  • surprisingly novel, or currently receiving little attention from the most concerning actors (who may not have especially high creativity or expertise)

(There's more discussion of the fraught topic of info hazards in these sources [EA(p) · GW(p)].)

comment by MichaelA · 2020-06-17T03:59:14.764Z · score: 2 (1 votes) · EA(p) · GW(p)

In Kevin Esvelt's recent EAGx talk, he provides a lot of interesting thoughts on the matter of information hazards in the bio space. It seems that Esvelt would likewise hope that the engineered pandemics section had at least been run by a knowledgeable and trustworthy person first, or that Esvelt might actually express stronger concerns than I did.

For people low on time, the last bit, from 40:30 onwards, is perhaps especially relevant.

comment by MichaelA · 2020-05-16T06:44:36.863Z · score: 1 (1 votes) · EA(p) · GW(p)
Your claim that such a pathogen is very likely impossible seems oddly strong to me, given that evolutionary constraints are not the same thing as physical constraints.

I think this is an important point (as are the rest of your points), and something similar came to my mind too. I think we may be able to put it more strongly. Your phrasing makes me think of evolution "trying" to create the sort of pathogen that could lead to human extinction, but there being constraints on its ability to do so, which, given that they aren't physical constraints, could perhaps be overcome through active technological effort. It seems to me that evolution isn't even "trying" to create that sort of pathogen in the first place.

In fact, I've seen it argued that natural selection actively pushes against extreme virulence. From the Wikipedia article on optimal virulence:

A pathogen that is too restrained will lose out in competition to a more aggressive strain that diverts more host resources to its own reproduction. However, the host, being the parasite's resource and habitat in a way, suffers from this higher virulence. This might induce faster host death, and act against the parasite's fitness by reducing probability to encounter another host (killing the host too fast to allow for transmission). Thus, there is a natural force providing pressure on the parasite to "self-limit" virulence. The idea is, then, that there exists an equilibrium point of virulence, where parasite's fitness is highest. Any movement on the virulence axis, towards higher or lower virulence, will result in lower fitness for the parasite, and thus will be selected against.

I don't have any background in this area, so I'm not sure how well that Wikipedia article represents expert consensus, what implications to draw from that idea, and whether that's exactly what you were already saying. But it seems to me that this presents additional reason to doubt how much we can extrapolate from what pathogens naturally arise to what pathogens are physically possible.

(Though I imagine that what pathogens are physically possible still provides some evidence, and that it's reasonable to tentatively raise it in discussions of risks from engineered pandemics.)
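A minimal sketch of the trade-off model behind that Wikipedia passage, with all functional forms and parameter values hypothetical (chosen only to show that pathogen fitness can peak at intermediate virulence):

```python
# Toy illustration of the "optimal virulence" equilibrium described above.
# Assumes the standard trade-off model: transmission rate beta rises with
# virulence v but saturates, while higher v kills hosts faster, shortening
# the infectious period. All numbers here are made up for illustration.

def r0(v, b=5.0, c=1.0, mu=0.1, gamma=0.2):
    """Pathogen fitness (basic reproduction number) at virulence v."""
    beta = b * v / (c + v)          # saturating transmission-virulence trade-off
    return beta / (v + mu + gamma)  # host dies of disease (v), dies otherwise (mu), or recovers (gamma)

# Scan virulence levels: fitness peaks at an intermediate v, so selection
# pushes against both very low and very high (extinction-level) virulence.
vs = [i / 100 for i in range(1, 501)]
v_opt = max(vs, key=r0)
print(f"fitness-maximising virulence ~ {v_opt:.2f}")
print(f"R0 at v=0.05: {r0(0.05):.2f}, at v_opt: {r0(v_opt):.2f}, at v=5.0: {r0(5.0):.2f}")
```

Under these made-up numbers the fitness-maximising virulence sits at an intermediate level, so moving towards either extreme lowers fitness — which is the "self-limiting" selective pressure the quote describes, and why an extinction-level pathogen would have to be pushed past what selection favours.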

comment by steve2152 · 2020-05-12T15:26:56.872Z · score: 27 (11 votes) · EA(p) · GW(p)

Again, this remark seems explicitly to assume that the AI is maximising some kind of reward function. Humans often act not as maximisers but as satisficers, choosing an outcome that is good enough rather than searching for the best possible outcome. Often humans also act on the basis of habit or following simple rules of thumb, and are often risk averse. As such, I believe that to assume that an AI agent would be necessarily maximising its reward is to make fairly strong assumptions about the nature of the AI in question. Absent these assumptions, it is not obvious why an AI would necessarily have any particular reason to usurp humanity.

Imagine that, when you wake up tomorrow morning, you will have acquired a magical ability to reach in and modify your own brain connections however you like.

Over breakfast, you start thinking about how frustrating it is that you're in debt, and feeling annoyed at yourself that you've been spending so much money impulse-buying in-app purchases in Farmville. So you open up your new brain-editing console, look up which neocortical generative models [LW · GW] were active the last few times you made a Farmville in-app purchase, and lower their prominence, just a bit.

Then you take a shower, and start thinking about the documentary you saw last night about gestation crates. 'Man, I'm never going to eat pork again!' you say to yourself. But you've said that many times before, and it's never stuck. So after the shower, you open up your new brain-editing console, and pull up that memory of the gestation crate documentary and the way you felt after watching it, and set that memory and emotion to activate loudly every time you feel tempted to eat pork, for the rest of your life.

Do you see the direction that things are going? As time goes on, if an agent has the power of both meta-cognition and self-modification, any one of its human-like goals (quasi-goals which are context-dependent, self-contradictory, satisficing, etc.) can gradually transform itself into a utility-function-like goal (which is self-consistent, all-consuming, maximizing)! To be explicit: during the little bits of time when one particular goal happens to be salient and determining behavior, the agent may be motivated to "fix" any part of itself that gets in the way of that goal, until bit by bit, that one goal gradually cements its control over the whole system.

Moreover, if the agent does gradually self-modify from human-like quasi-goals to an all-consuming utility-function-like goal, then I would think it's very difficult to predict exactly what goal it will wind up having. And most goals have problematic convergent instrumental sub-goals [LW · GW] that could make them into x-risks.

...Well, at least, I find this a plausible argument, and don't see any straightforward way to reliably avoid this kind of goal-transformation. But obviously this is super weird and hard to think about and I'm not very confident. :-)

(I think I stole this line of thought from Eliezer Yudkowsky but can't find the reference.)

Everything up to here is actually just one of several lines of thought that lead to the conclusion that we might well get an AGI that is trying to maximize a reward.

Another line of thought is what Rohin said [EA(p) · GW(p)]: We've been using reward functions since forever, so it's quite possible that we'll keep doing so.

Another line of thought is: We humans actually have explicit real-world goals, like curing Alzheimer's and solving climate change etc. And generally the best way to achieve goals is to have an agent seeking them.

Another line of thought is: Different people will try to make AGIs in different ways, and it's a big world, and (eventually by default) there will be very low barriers-to-entry in building AGIs. So (again by default) sooner or later someone will make an explicitly-goal-seeking AGI, even if thoughtful AGI experts pronounce that doing so is a terrible idea.

comment by steve2152 · 2020-05-12T17:31:15.606Z · score: 17 (7 votes) · EA(p) · GW(p)

A nice short argument that a sufficiently intelligent AGI would have the power to usurp humanity is Scott Alexander's Superintelligence FAQ Section 3.1 [LW · GW].

comment by willbradshaw · 2020-05-22T14:37:28.036Z · score: 6 (5 votes) · EA(p) · GW(p)

One meta-level comment I have here is: while there are many things that could quite reasonably have prevented it in this case, I think these sorts of controversial/gauntlet-throwing posts are generally much more valuable when the author is around to respond to comments and explore disagreements in more depth.

comment by EdoArad (edoarad) · 2020-05-18T07:33:29.242Z · score: 5 (4 votes) · EA(p) · GW(p)

Regarding the possibility of Extinction level agents [EA · GW], there have been at least two species extinctions that likely resulted from pathogens (here or in sci-hub).

Also, the Taíno people were driven nearly to extinction, and that may have been mostly the result of disease, though it seems contested:

In thirty years, between 80% and 90% of the Taíno population died.[76] Because of the increased number of people (Spanish) on the island, there was a higher demand for food. Taíno cultivation was converted to Spanish methods. In hopes of frustrating the Spanish, some Taínos refused to plant or harvest their crops. The supply of food became so low in 1495 and 1496, that some 50,000 died from the severity of the famine.[77] Historians have determined that the massive decline was due more to infectious disease outbreaks than any warfare or direct attacks.[78][79] By 1507, their numbers had shrunk to 60,000. Scholars believe that epidemic disease (smallpox, influenza, measles, and typhus) was an overwhelming cause of the population decline of the indigenous people,[80] and also attributed a "large number of Taíno deaths...to the continuing bondage systems" that existed.[81][82] Academics, such as historian Andrés Reséndez of the University of California, Davis, assert that disease alone does not explain the total destruction of indigenous populations of Hispaniola.

These two cases actually lower my fear of naturally occurring pandemics, because I'd expect to find more evidence. This in turn also slightly lowers my credence in the plausibility of engineered pandemics.

I'm sure that other people here are much more knowledgeable than I am, and this brief analysis might be misleading.

comment by Ben_West · 2020-05-20T22:20:29.953Z · score: 3 (5 votes) · EA(p) · GW(p)

This is really interesting! It seems like there's also compelling evidence for more than two:

While there is no direct evidence that any of the 25 [18] species of Hawaiian land birds that have become extinct since the documented arrival of Culex quinquefasciatus in 1826 [19] were even susceptible to malaria and there is limited anecdotal information suggesting they were affected by birdpox [19], the observation that several remaining species only persist either on islands where there are no mosquitoes or at altitudes above those at which mosquitoes can breed and that these same species are highly susceptible to avian malaria and birdpox [18,19] is certainly very strong circumstantial evidence...

The formerly abundant endemic rats Rattus macleari and Rattus nativitas disappeared from Christmas Island in the Indian Ocean (10°29′ S 105°38′ E) around the turn of the twentieth century. Their disappearance was apparently abrupt, and shortly before the final collapse sick individuals were seen crawling along footpaths [22]. At that time, trypanosomiasis transmitted by fleas from introduced black rats R. rattus was suggested as the causative agent. Recently, Wyatt et al. [22] managed to isolate trypanosome DNA from both R. rattus and R. macleari specimens collected during the period of decline, whereas no trypanosome DNA was present in R. nativitas specimens collected before the arrival of black rats. While this is good circumstantial evidence, direct evidence that trypanosomes caused the mortality is limited

comment by MichaelA · 2020-05-16T07:43:33.818Z · score: 4 (3 votes) · EA(p) · GW(p)

I think I'd say this essay fills a valuable role of poking and prodding at prominent views, and thereby pushing people to grapple with ways in which those views may be flawed, in which those views may not have been fully argued for, or in which people may not understand the full justifications for those views (despite accepting them on authority). It has also updated my thinking somewhat.

And I appreciate the general Fermi estimate approach you've taken in the Probability Estimates section. (On the other hand, it seems worth acknowledging that that approach may make it harder to account for disjunctive scenarios, or scenarios that are hard to predict the pathways of in advance, as alluded to in some other comments.) I've also added those estimates to this database of existential risk estimates [EA · GW].

That said, I disagree with many of the specific points made in this essay, and thus at least partially disagree with its conclusions. I'll split my points into separate comments.

On historical and current pandemics

deaths from infectious diseases and pandemics in particular have decreased in recent centuries

This felt like a statement that should have a source. Here's one. It doesn't look super trustworthy, but does seem to match my previous (non-expert) impression from various (mostly EA or EA-adjacent) sources, which is: the number of deaths from pandemics in a given century seems to vary hugely, without following a clear trend. Deaths from pandemics per century have been lower each century since the Black Death than they were during the Black Death, but:

  • that's just 6 centuries, so not a very large sample
  • the Black Death itself means deaths from pandemics weren't trending downwards before that point
  • it doesn't look like the deaths have trended downwards since the Black Death; just that Black Death was an extreme occurrence. Indeed, the Spanish Flu seems to have caused more deaths than almost any other post-Black-Death pandemic (based on that potentially not trustworthy source)
with no major pandemics in Western Europe since the early eighteenth century.

Wouldn't COVID-19 count? It's definitely been far smaller than e.g. the Black Death so far, in terms of numbers of deaths. And I expect that will remain true. Perhaps you're therefore counting it as not "major"? But it still seems odd to say it's not a major pandemic.

(The public policy response to COVID does seem to more support than detract from your overall points; I'm just questioning that specific claim I quoted there, or at least its phrasing.)

comment by Matthew_Barnett · 2020-05-12T07:56:32.541Z · score: 3 (2 votes) · EA(p) · GW(p)

Regarding the section on estimating the probability of AI extinction, I think a useful framing is to focus on disjunctive scenarios where AI ends up being used. If we imagine a highly detailed scenario where a single artificial intelligence goes rogue, then of course these types of things will seem unlikely.

However, my guess is that AI will gradually become more capable and integrated into the world economy, and there won't be a discrete point where we can say "now the AI was invented." Over the broad course of history, we have witnessed numerous instances of populations displacing other populations, e.g. species displacements in ecosystems and human populations displacing other human populations. If we think about AI as displacing humanity's seat of power in this abstract way, then an AI takeover no longer seems implausible, and indeed I find it quite likely in the long run.

comment by saulius · 2020-05-11T18:34:15.371Z · score: 3 (2 votes) · EA(p) · GW(p)

Hey, it’s an interesting article, thanks for writing it. I’ll just respond to one point.

Lacking a physical body to interact with people, it is hard to see how an AI could inspire the same levels of loyalty and fear that these three leaders (and many others like then) relied upon in their subordinates and followers.

If that is important, AI could make itself a virtual body and show videos of itself talking. People rarely see their leaders physically anyway. And the virtual body could be optimized for whatever is needed to gain power. It could maybe make itself more fearsome because lacking a body makes it less vulnerable, more mysterious and authoritative. It’s not just another human, it’s a whole new type of being that is better than us at everything.

If I am wrong and people would only follow other people, AI could e.g. hire a hitman to assassinate whoever and then assume their identity with deepfake-like videos, claiming that the videos are filmed from a secret location for safety reasons. Or it could construct a new human identity.

This is all super speculative and these probably wouldn't be the strategies it would use, I'm just pointing out some possibilities. Also, note that it’s not my area and I only thought about it for ~15 minutes.

comment by MichaelA · 2020-05-16T08:30:20.688Z · score: 1 (1 votes) · EA(p) · GW(p)

Minor point, regarding:

In considering the balance of positive and negative effects that organisational and civilization advances have had on the ability to respond to the risk of pathogens, Ord states that “it is hard to know whether these combined effects have increased or decreased the existential risk from pandemics” (p. 127). This argument, however, seems implausible

If we interpret Ord as saying "the existential risk from pandemics is just as likely to have increased as to have decreased", then I'd agree that that seems implausible. (Though I'm not an expert on the relevant topics.) For that reason, I think that that wasn't an ideally phrased sentence from Ord.

However, his literal claim is just that it's hard to know whether the risk has risen or fallen. I'd agree with that. It seems to me likely that the risk has fallen, but maybe around a 60-90% chance that that's true, rather than 99%. (These are quite made-up numbers.) And my estimate of the chance the risks have fallen wouldn't be very "resilient" [LW(p) · GW(p)] (i.e., it'd be quite open to movement based on new evidence).

comment by MichaelA · 2020-05-16T07:56:41.035Z · score: 1 (1 votes) · EA(p) · GW(p)

Extinction risk ≠ existential risk [EA · GW]

Considering the issue of how an engineered pandemic could lead to the extinction of humanity, I identify five separate things that must occur... [emphasis added]

Ord's estimates are of existential risk from various sources, not extinction risk. Thus, at least part of the difference between your estimate and his, regarding engineered pandemics, can be explained by you estimating the risk of a narrower subset of very bad outcomes than he is.

I don't think this explains a lot of the difference, because:

  • You already seem to be giving your estimate of the chance an engineered pandemic brings humanity below the minimum viable population, rather than the chance an engineered pandemic "directly"/"itself" reduces the population to literally 0
  • I get the impression that Ord is relatively optimistic (compared to many but not all other x-risk researchers) about humanity's chance of recovery from collapse, and about our chance of being ok in the end as long as we avoid seriously extreme outcomes (e.g., he doesn't seem very concerned by things like a catastrophe resulting in us having notably worse values [EA · GW], which end up persisting over time)

But I think the difference in what you're estimating may explain some of the difference in your estimates.

And in any case, it seems worth noting, because it seems to me reasonable to be less optimistic than Ord about our chances of recovery or issues like catastrophes making our values worse in a persistent way. And that in turn could be a reason to end up with an existential risk estimate closer to Ord's than to yours, even if one agrees with your views about the extinction risks.

comment by matthias_samwald · 2020-05-13T09:33:44.511Z · score: 0 (3 votes) · EA(p) · GW(p)
1. Artificial general intelligence, or an AI which is able to out-perform humans in essentially all human activities, is developed within the next century.
2. This artificial intelligence acquires the power to usurp humanity and achieve a position of dominance on Earth.
3. This artificial intelligence has a reason/motivation/purpose to usurp humanity and achieve a position of dominance on Earth.
4. This artificial intelligence either brings about the extinction of humanity, or otherwise retains permanent dominance over humanity in a manner so as to significantly diminish our long-term potential.

I think one problem here is phrasing 2-4 in the singular ("This artificial intelligence"), when the plural would be more appropriate. If the technological means are available, it is likely that many actors will create powerful AI systems. If the offense-defense balance is unfavorable (i.e., it is much easier for the AGI systems available at a specific time to do harm than to protect from harm), then a catastrophic event might be triggered by just one of very many AGI systems becoming unaligned (the 'unilateralist's curse').

So I would rephrase your estimates like this:

1. Artificial general intelligence (AGI), or an AI which is able to out-perform humans in essentially all human activities, is developed within the next century.

2. AT LEAST ONE of a large number of AGI systems acquires the capability to usurp humanity and achieve a position of dominance on Earth.

3. AT LEAST ONE of those AGI systems has a reason/motivation/purpose to usurp humanity and achieve a position of dominance on Earth (unaligned AGI).

4. The offense-defense balance between AGI systems available at the time is unfavorable (i.e., defense from unaligned AGI through benevolent AGI is difficult)

5. The unaligned AGI either brings about the extinction of humanity, or otherwise retains permanent dominance over humanity in a manner so as to significantly diminish our long-term potential.

My own estimates when phrasing it this way would be 0.99 × 0.99 × 0.99 × 0.5 × 0.1 = roughly a 5% risk, with high uncertainty.

This would make the risk of an unfavorable offense-defense balance (here estimated as 0.5) one of the major determining parameters in my estimate.
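Spelling that arithmetic out (a quick sketch reusing the estimates above; the step labels are just shorthand for the five steps):

```python
# Reproducing the five-step Fermi product above, and checking how sensitive
# the result is to the offense-defense term. All numbers are the estimates
# given in the comment, not established figures.

steps = {
    "AGI developed this century": 0.99,
    "some AGI capable of usurping humanity": 0.99,
    "some AGI motivated to usurp humanity": 0.99,
    "offense-defense balance unfavorable": 0.5,
    "unaligned AGI causes extinction / permanent dominance": 0.1,
}

risk = 1.0
for p in steps.values():
    risk *= p
print(f"overall risk: {risk:.3f}")  # 0.049, i.e. roughly 5%

# The product is linear in each factor, so varying the offense-defense
# estimate moves the headline number proportionally:
for od in (0.25, 0.5, 0.75):
    print(f"offense-defense = {od}: risk = {0.99**3 * od * 0.1:.3f}")
```

Because the three 0.99 terms are nearly 1, the two sub-unity factors (0.5 and 0.1) dominate: halving either roughly halves the headline risk.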

comment by avturchin · 2020-05-12T13:23:09.346Z · score: 0 (11 votes) · EA(p) · GW(p)

There is an idea of a multipandemic, that is several pandemics running simultaneously. This would significantly increase the probability of extinction.

comment by Flodorner · 2020-05-15T18:02:52.543Z · score: 4 (3 votes) · EA(p) · GW(p)

While I am unsure about how good of an idea it is to map out more plausible scenarios for existential risk from pathogens, I agree with the sentiment that the top-level post seems to focus too narrowly on a specific scenario.

comment by avturchin · 2020-05-17T16:26:06.661Z · score: -15 (7 votes) · EA(p) · GW(p)

I once created a map of crazy ideas for how biotech could cause human extinction. It includes around 100 ways.