richard_ngo feed - EA Forum Reader richard_ngo’s posts and comments on the Effective Altruism Forum en-us Comment by richard_ngo on Why do you reject negative utilitarianism? https://forum.effectivealtruism.org/posts/Wj8JymeiMN28Hrrjo/why-do-you-reject-negative-utilitarianism#zmyKAJi5bR3Q9Lf6C <p>Toby Ord gives a good summary of a range of arguments against negative utilitarianism <a href="http://www.amirrorclear.net/academic/ideas/negative-utilitarianism/">here</a>.</p><p>Personally, I think that valuing positive experiences instrumentally is insufficient, given that the future has the potential to be fantastic.</p> richard_ngo zmyKAJi5bR3Q9Lf6C 2019-02-17T22:48:17.451Z Comment by richard_ngo on Ben Garfinkel: How Sure are we about this AI Stuff? https://forum.effectivealtruism.org/posts/9sBAW3qKppnoG3QPq/ben-garfinkel-how-sure-are-we-about-this-ai-stuff#JM5SKA47yJ2HL4gs5 <blockquote>The argument for doom by default seems to rest on a default misunderstanding of human values as the programmer attempts to communicate them to the AI.</blockquote><p>I don&#x27;t think this is correct. The argument rests on AIs having any values which aren&#x27;t human values (e.g. maximising paperclips), not just misunderstood human values.</p> richard_ngo JM5SKA47yJ2HL4gs5 2019-02-13T00:13:55.588Z Comment by richard_ngo on Arguments for moral indefinability https://forum.effectivealtruism.org/posts/TqCDCkp2ZosCiS3FB/arguments-for-moral-indefinability#kNTjECk4oFCtSCzcX <blockquote>Multiple terminal values will always lead to irreconcilable conflicts.</blockquote><p>This is not the case when there&#x27;s a well-defined procedure for resolving such conflicts. For example, you can map several terminal values onto a numerical &quot;utility&quot; scale.</p> richard_ngo kNTjECk4oFCtSCzcX 2019-02-10T10:33:34.511Z Comment by richard_ngo on Arguments for moral indefinability https://forum.effectivealtruism.org/posts/TqCDCkp2ZosCiS3FB/arguments-for-moral-indefinability#mZ9pxw6D7L9KATJk9 <p>From skimming the SEP article on pluralism, it doesn't quite seem like what I'm talking about. Pluralism + incomparability comes closer, but still seems like a subset of my position, since there are other ways that indefinability could be true (e.g. there's only one type of value, but it's intrinsically vague)</p> richard_ngo mZ9pxw6D7L9KATJk9 2019-02-09T14:01:54.518Z Arguments for moral indefinability https://forum.effectivealtruism.org/posts/TqCDCkp2ZosCiS3FB/arguments-for-moral-indefinability <p><em>Epistemic status: I endorse the core intuitions behind this post, but am only moderately confident in the specific claims made. Also, while I do have a degree in philosophy, I am not a professional ethicist, and I’d appreciate feedback on how these ideas relate to existing literature.</em><br/></p><p>Moral indefinability is the term I use for the idea that there is no ethical theory which provides acceptable solutions to all moral dilemmas, and which also has the theoretical virtues (such as simplicity, precision and non-arbitrariness) that we currently desire. I think this is an important and true perspective on ethics, and in this post will explain why I hold it, with the caveat that I&#x27;m focusing more on airing these ideas than constructing a watertight argument.<br/></p><p>Here’s another way of explaining moral indefinability: let’s think of ethical theories as procedures which, in response to a moral claim, either endorse it, reject it, or do neither. 
Moral philosophy is an attempt to find the theory whose answers best match our intuitions about what answers ethical theories should give us (e.g. don’t cause unnecessary suffering), and whose procedure for generating answers best matches our meta-level intuitions about what ethical theories should look like (e.g. they should consistently apply impartial principles rather than using ad-hoc, selfish or random criteria). None of these desiderata are fixed in stone, though - in particular, we sometimes change our intuitions when it’s clear that the only theories which match those intuitions violate our meta-level intuitions. My claim is that eventually we will also need to change our meta-level intuitions in important ways, because it will become clear that the only theories which match them violate key object-level intuitions. In particular, this might lead us to accept theories which occasionally evince properties such as:</p><ul><li><em>Incompleteness</em>: for some claim A, the theory neither endorses nor rejects either A or ~A, even though we believe that the choice between A and ~A is morally important.</li><li><em>Vagueness</em>: the theory endorses an imprecise claim A, but rejects every way of making it precise.</li><li><em>Contradiction</em>: the theory endorses both A and ~A (note that this is a somewhat provocative way of framing this property, since we can always add arbitrary ad-hoc exceptions to remove the contradictions. So perhaps a better term is <em>arbitrariness of scope</em>: when we have both a strong argument for A and a strong argument for ~A, the theory can specify in which situations each conclusion should apply, based on criteria which we would consider arbitrary and unprincipled. Example: when there are fewer than N lives at stake, use one set of principles; otherwise use a different set).</li></ul><p></p><p>Why take moral indefinability seriously? The main reason is that ethics evolved to help us coordinate in our ancestral environment, and did so not by giving us a complete decision procedure to implement, but rather by ingraining intuitive responses to certain types of events and situations. There were many different and sometimes contradictory selection pressures driving the formation of these intuitions - and so, when we construct generalisable principles based on our intuitions, we shouldn&#x27;t expect those principles to automatically give useful or even consistent answers to very novel problems. Unfortunately, the moral dilemmas which we grapple with today have in fact &quot;scaled up&quot; drastically in at least two ways. Some are much greater in scope than any problems humans have dealt with until very recently. And some feature much more extreme tradeoffs than ever come up in our normal lives, e.g. because they have been constructed as thought experiments to probe the edges of our principles.<br/></p><p>Of course, we&#x27;re able to adjust our principles so that we are more satisfied with their performance on novel moral dilemmas. But I claim that in some cases this comes at the cost of those principles conflicting with the intuitions which make sense on the scales of our normal lives. And even when it&#x27;s possible to avoid that, there may be many ways to make such adjustments whose relative merits are so divorced from our standard moral intuitions that we have no good reason to favour one over the other. 
I&#x27;ll give some examples shortly.<br/></p><p>A second reason to believe in moral indefinability is the fact that human concepts tend to be <em><a href="http://oxfordindex.oup.com/view/10.1093/oi/authority.20110803100251341?rskey=DGtCcA&result=0&q=open%20texture">open texture</a></em>: there is often no unique &quot;correct&quot; way to rigorously define them. For example, we all know roughly what a table is, but it doesn’t seem like there’s an objective definition which gives us a sharp cutoff between tables and desks and benches and a chair that you eat off and a big flat rock on stilts. A less trivial example is our inability to rigorously define what entities qualify as being &quot;alive&quot;: edge cases include viruses, fires, AIs and embryos. So when moral intuitions are based on these sorts of concepts, trying to come up with an exact definition is probably futile. This is particularly true when it comes to very complicated systems in which tiny details matter a lot to us - like human brains and minds. It seems implausible that we’ll ever discover precise criteria for when someone is experiencing contentment, or boredom, or many of the other experiences that we find morally significant.<br/></p><p>I would guess that many anti-realists are sympathetic to the arguments I’ve made above, but still believe that we can make morality precise without changing our meta-level intuitions much - for example, by grounding our ethical beliefs in <a href="https://wiki.lesswrong.com/wiki/Coherent_Extrapolated_Volition">what idealised versions of ourselves would agree with</a>, after long reflection. My main objection to this view is, broadly speaking, that there is no canonical “idealised version” of a person, and different interpretations of that term could lead to a very wide range of ethical beliefs. I explore this objection in much more detail in <a href="https://www.lesswrong.com/posts/suxvE2ddnYMPJN9HD/realism-about-rationality">this post</a>. (In fact, the more general idea that humans aren’t really “utility maximisers”, even approximately, is another good argument for moral indefinability.) And even if idealised reflection is a coherent concept, it simply passes the buck to your idealised self, who might then believe my arguments and decide to change their meta-level intuitions.<br/></p><p>So what are some pairs of moral intuitions which might not be simultaneously satisfiable under our current meta-level intuitions? Here’s a non-exhaustive list - the general pattern being clashes between small-scale perspectives, large-scale perspectives, and the meta-level intuition that they should be determined by the same principles:</p><ul><li>Person-affecting views versus non-person-affecting views. Small-scale views: killing children is terrible, but not having children is fine, even when those two options lead to roughly the same outcome. Large-scale view: extinction is terrible, regardless of whether it comes about from people dying or people not being born.</li><li>The mere addition paradox, aka the repugnant conclusion. Small-scale views: adding happy people and making people more equal can&#x27;t make things worse. Large-scale view: a world consisting only of people whose lives are barely worth living is deeply suboptimal. (Note also Arrhenius&#x27; impossibility theorems, which show that you can&#x27;t avoid the repugnant conclusion without making even greater concessions).</li><li>Weighing theories under moral uncertainty. 
I personally find OpenPhil&#x27;s work on <a href="https://www.youtube.com/watch?v=hjuG6nfD080">cause prioritisation under moral uncertainty</a> very cool, and the fundamental intuitions behind it seem reasonable, but some of it (e.g. variance normalisation) has reached a level of abstraction where I feel almost no moral force from their arguments, and aside from an instinct towards definability I&#x27;m not sure why I should care.</li><li>Infinite and relativistic ethics. Same as above. See also <a href="https://www.lesswrong.com/posts/8FRzErffqEW9gDCCW/against-the-linear-utility-hypothesis-and-the-leverage">this LessWrong post</a> arguing against applying the “linear utility hypothesis” at vast scales.</li><li>Whether we should force future generations to have our values. On one hand, we should be very glad that past generations couldn&#x27;t do this. But on the other, the future will probably disgust us, like our present would disgust our ancestors. And along with &quot;moral progress&quot; there&#x27;ll also be value drift in arbitrary ways - in fact, I don&#x27;t think there&#x27;s any clear distinction between the two.</li></ul><p></p><p>I suspect that many readers share my sense that it&#x27;ll be very difficult to resolve all of the dilemmas above in a satisfactory way, but also have a meta-level intuition that they need to be resolved somehow, because it&#x27;s important for moral theories to be definable. But perhaps at some point it&#x27;s this very urge towards definability which will turn out to be the weakest link. I do take seriously Parfit&#x27;s idea that secular ethics is still young, and there&#x27;s much progress yet to be made, but I don&#x27;t see any principled reason why we should be able to <em>complete</em> ethics, except by raising future generations without whichever moral intuitions are standing in the way of its completion (and isn&#x27;t that a horrifying thought?). From an anti-realist perspective, I claim that perpetual indefinability would be better. That may be a little more difficult to swallow from a realist perspective, of course. My guess is that the core disagreement is whether moral claims are more like facts, or <a href="http://thinkingcomplete.blogspot.com/2018/04/a-pragmatic-approach-to-interpreting.html">more like preferences or tastes</a> - if the latter, moral indefinability would be analogous to the claim that there’s no (principled, simple, etc) theory which specifies exactly which foods I enjoy.<br/></p><p>There are two more plausible candidates for moral indefinability which were the original inspiration for this post, and which I think are some of the most important examples:</p><ul><li>Whether to define welfare in terms of preference satisfaction or hedonic states.</li><li>The problem of &quot;maximisation&quot; in utilitarianism.</li></ul><p>I&#x27;ve been torn for some time over the first question, slowly shifting towards hedonic utilitarianism as problems with formalising preferences piled up. While this isn&#x27;t the right place to enumerate those problems (<a href="http://thinkingcomplete.blogspot.com/2017/10/utilitarianism-and-its-discontents.html">see here for a previous relevant post</a>), I&#x27;ve now become persuaded that any precise definition of which preferences it is morally good to satisfy will lead to conclusions which I find unacceptable. 
After making this update, I can either reject a preference-based account of welfare entirely (in favour of a hedonic account), or else endorse a &quot;vague&quot; version of it which I think will never be specified precisely.<br/></p><p>The former may seem the obvious choice, until we take into account the problem of maximisation. Consider that a true (non-person-affecting) hedonic utilitarian would kill everyone who wasn&#x27;t maximally happy if they could replace them with people who were (<a href="http://www.simonknutsson.com/the-world-destruction-argument/">see here for a comprehensive discussion of this argument</a>). And that for any precise definition of welfare, they would search for edge cases where they could push it to extreme values. In fact, reasoning about a &quot;true utilitarian&quot; feels remarkably like reasoning about an unsafe AGI. I don&#x27;t think that&#x27;s a coincidence: psychologically, humans just aren&#x27;t built to be maximisers, and so a true maximiser would be fundamentally adversarial. And yet many of us also have strong intuitions that there are some good things, and it&#x27;s always better for there to be more good things, and it’s best if there are most good things.<br/></p><p>How to reconcile these problems? My answer is that utilitarianism is pointing in the right direction, which is “lots of good things”, and in general we can move in that direction without moving maximally in that direction. What are those good things? I use a vague conception of welfare that balances preferences and hedonic experiences and some of my own parochial criteria - importantly, without feeling like it&#x27;s necessary to find a perfect solution (although of course there will be ways in which my current position can be improved). In general, I think that we can often do well enough without solving fundamental moral issues - see, for example, <a href="https://www.lesswrong.com/posts/Zgwy2QRgYBSrMWDMQ/logarithms-and-total-utilitarianism">this LessWrong post</a> arguing that we’re unlikely to ever face the true repugnant dilemma, because of empirical facts about psychology.<br/></p><p>To be clear, this still means that almost everyone should focus much more on utilitarian ideas, like the enormous value of the far future, because in order to reject those ideas it seems like we’d need to sacrifice important object- or meta-level moral intuitions to a much greater extent than I advocate above. We simply shouldn’t rely on the idea that such value is precisely definable, nor that we can ever identify an ethical theory which meets all the criteria we care about.</p> richard_ngo TqCDCkp2ZosCiS3FB 2019-02-08T11:09:25.547Z Comment by richard_ngo on Simultaneous Shortage and Oversupply https://forum.effectivealtruism.org/posts/JfoutXamSWsoCWbef/simultaneous-shortage-and-oversupply#9sYHpkzWSXeTKt2r4 <p>This seems plausible, but also quite distinct from the claim that &quot;roles for programmers in direct work tend to sit open for a long time&quot;, which I took the list of openings to be supporting evidence for.</p> richard_ngo 9sYHpkzWSXeTKt2r4 2019-02-01T13:20:28.282Z Comment by richard_ngo on Simultaneous Shortage and Oversupply https://forum.effectivealtruism.org/posts/JfoutXamSWsoCWbef/simultaneous-shortage-and-oversupply#xWLceE8eftJYLcTLb <p>The OpenAI and DeepMind posts you linked aren&#x27;t necessarily relevant, e.g. 
the Software Engineer, Science role is not for DeepMind&#x27;s safety team, and it&#x27;s pretty unclear to me whether the OpenAI ML engineer role is safety-relevant.</p> richard_ngo xWLceE8eftJYLcTLb 2019-01-27T15:37:29.219Z Comment by richard_ngo on Request for input on multiverse-wide superrationality (MSR) https://forum.effectivealtruism.org/posts/92wCvqF73Gzg5Jnrr/request-for-input-on-multiverse-wide-superrationality-msr#25vKYg86M6zjvZyXi <p>The example you&#x27;ve given me shows that agents which implement exactly the same (high-level) algorithm can cooperate with each other. The metric I&#x27;m looking for is: how can we decide how similar two agents are when their algorithms are non-identical? Presumably we want a smoothness property for that metric such that if our algorithms are very similar (e.g. only differ with respect to some radically unlikely edge case) the reduction in cooperation is negligible. But it doesn&#x27;t seem like anyone knows how to do this.</p> richard_ngo 25vKYg86M6zjvZyXi 2019-01-27T02:02:39.076Z Comment by richard_ngo on Announcing an updated drawing protocol for the EffectiveAltruism.org donor lotteries https://forum.effectivealtruism.org/posts/ut29rXvY9hAGgatNR/announcing-an-updated-drawing-protocol-for-the#4jK4Hz3jqSDL3Fxtn <p>Can you give some examples of &quot;more responsible&quot; ways?</p><p>I agree that in general calculating your own random digits feels a lot like <a href="https://motherboard.vice.com/en_us/article/wnx8nq/why-you-dont-roll-your-own-crypto">rolling your own crypto</a>. (Edit: I misunderstood the method and thought there was an easy exploit, which I was wrong about. Nevertheless at least 1/3 of the digits in the API response are predictable, maybe more, and the whole thing is quite small, so it might be possible to increase your probability of winning slightly by brute force calculating possibilities, assuming you get to pick your own contiguous ticket number range. My preliminary calculations suggest that this method would be too difficult, but I&#x27;m not an expert, there may be more sophisticated hacks).</p> richard_ngo 4jK4Hz3jqSDL3Fxtn 2019-01-25T11:10:37.974Z Comment by richard_ngo on If slow-takeoff AGI is somewhat likely, don't give now https://forum.effectivealtruism.org/posts/JimLnG3sbYqPF8rKJ/if-slow-takeoff-agi-is-somewhat-likely-don-t-give-now#LiwuQdKNAAAfAS7ap <p>(edited) I just saw your link above about growth vs value investing. I don&#x27;t think that&#x27;s a helpful distinction in this case, and when people talk about a company being undervalued I think that typically includes both unrecognised growth potential and unrecognised current value. (Maybe that&#x27;s less true for startups, but we&#x27;re talking about already-listed companies here).</p><p>I do think the core claim of &quot;if AGI will be as big a deal as we think it&#x27;ll be, then the markets are systematically undervaluing AI companies&quot; is a reasonable one, but the arguments you&#x27;ve given here aren&#x27;t precise enough to justify confidence, especially given the aforementioned need for caution. For example, premise 4 doesn&#x27;t actually follow directly from premise 3 because the returns could be large but not outsized compared with other investments. 
I think you can shore that link up, but not without contradicting your other point:</p><blockquote>I&#x27;m not claiming that investing in AI companies will generate higher-than-average returns in the long run.</blockquote><p>Which means (under the definition I&#x27;ve been using) that you&#x27;re not claiming that they&#x27;re undervalued.</p> richard_ngo LiwuQdKNAAAfAS7ap 2019-01-24T12:18:34.560Z Comment by richard_ngo on Disentangling arguments for the importance of AI safety https://forum.effectivealtruism.org/posts/LprnaEj3uhkmYtmat/disentangling-arguments-for-the-importance-of-ai-safety#BE2ryxsYe2i84pEp3 <p>I agree that the extent to which individual humans are rational agents is often overstated. Nevertheless, there are many examples of humans who spend decades striving towards distant and abstract goals, who learn whatever skills and perform whatever tasks are required to reach them, and who strategically plan around or manipulate the actions of other people. If AGI is anywhere near as agentlike as humans in the sense of possessing the long-term goal-directedness I just described, that&#x27;s cause for significant concern.</p> richard_ngo BE2ryxsYe2i84pEp3 2019-01-24T11:24:12.483Z Comment by richard_ngo on If slow-takeoff AGI is somewhat likely, don't give now https://forum.effectivealtruism.org/posts/JimLnG3sbYqPF8rKJ/if-slow-takeoff-agi-is-somewhat-likely-don-t-give-now#HpZes2skqbJGyyxiW <p>If AI research companies aren&#x27;t currently undervalued, then your Premise 4 (being an investor in such companies will generate outsized returns on the road to slow-takeoff AGI) is incorrect, because the market will have anticipated those outsized returns and priced them in to the current share price.</p> richard_ngo HpZes2skqbJGyyxiW 2019-01-24T01:52:57.839Z Comment by richard_ngo on If slow-takeoff AGI is somewhat likely, don't give now https://forum.effectivealtruism.org/posts/JimLnG3sbYqPF8rKJ/if-slow-takeoff-agi-is-somewhat-likely-don-t-give-now#xcSbi4b6CeQru3DHg <p>&quot;returns that can later be deployed to greater altruistic effect as AI research progresses&quot;</p><p>This is hiding an important premise, which is that you&#x27;ll actually be able to deploy those increased resources well enough to make up for the opportunities you forego now. E.g. Paul thinks that (as an operationalisation of slow takeoff) the economy will double in 4 years before the first 1 year doubling period starts. So after that 4 year period you might end up with twice as much money but only 1 or 2 years to spend it on AI safety.</p> richard_ngo xcSbi4b6CeQru3DHg 2019-01-24T01:50:03.363Z Comment by richard_ngo on Disentangling arguments for the importance of AI safety https://forum.effectivealtruism.org/posts/LprnaEj3uhkmYtmat/disentangling-arguments-for-the-importance-of-ai-safety#Yb8yDunC79Kyr5YpH <p>I&#x27;ve actually spent a fair while thinking about CAIS, and <a href="https://www.alignmentforum.org/posts/HvNAmkXPTSoA4dvzv/comments-on-cais">written up my thoughts here</a>. Overall I&#x27;m skeptical about the framework, but if it turns out to be accurate I think that would heavily mitigate arguments 1 and 2, somewhat mitigate 3, and not affect the others very much. Insofar as 4 and 5 describe AGI as an agent, that&#x27;s mostly because it&#x27;s linguistically natural to do so - I&#x27;ve now edited some of those phrases. 
6b does describe AI as a species, but it&#x27;s unclear whether that conflicts with CAIS, insofar as the claim that AI will <em>never </em>be agentlike is a very strong one, and I&#x27;m not sure whether Drexler makes it explicitly (I discuss this point in the blog post I linked above).</p> richard_ngo Yb8yDunC79Kyr5YpH 2019-01-24T01:19:46.429Z Comment by richard_ngo on Disentangling arguments for the importance of AI safety https://forum.effectivealtruism.org/posts/LprnaEj3uhkmYtmat/disentangling-arguments-for-the-importance-of-ai-safety#DPr7y2WF3RjJiG3CQ <p>I agree that it&#x27;s not too concerning, which is why I consider it weak evidence. Nevertheless, there are some changes which don&#x27;t fit the patterns you described. For example, it seems to me that newer AI safety researchers tend to consider intelligence explosions less likely, despite them being a key component of argument 1. For more details along these lines, check out the exchange between me and Wei Dai in the comments on <a href="https://www.alignmentforum.org/posts/JbcWQCxKWn3y49bNB/disentangling-arguments-for-the-importance-of-ai-safety">the version of this post on the alignment forum</a>.</p> richard_ngo DPr7y2WF3RjJiG3CQ 2019-01-24T01:09:33.910Z Disentangling arguments for the importance of AI safety https://forum.effectivealtruism.org/posts/LprnaEj3uhkmYtmat/disentangling-arguments-for-the-importance-of-ai-safety <p>I recently attended the 2019 Beneficial AGI conference organised by the Future of Life Institute. I’ll publish a more complete write-up later, but I was particularly struck by how varied attendees&#x27; reasons for considering AI safety important were. Before this, I’d observed a few different lines of thought, but interpreted them as different facets of the same idea. Now, though, I’ve identified at least 6 distinct serious arguments for why AI safety is a priority. By distinct I mean that you can believe any one of them without believing any of the others - although of course the particular categorisation I use is rather subjective, and there’s a significant amount of overlap. In this post I give a brief overview of my own interpretation of each argument (note that I don’t necessarily endorse them myself). They are listed roughly from most specific and actionable to most general. I finish with some thoughts on what to make of this unexpected proliferation of arguments. Primarily, I think it increases the importance of clarifying and debating the core ideas in AI safety.</p><ol><li><em>Maximisers are dangerous.</em> Superintelligent AGI will behave as if it’s maximising the expectation of some utility function, since doing otherwise can be <a href="https://www.lesswrong.com/posts/F46jPraqp258q67nE/why-you-must-maximize-expected-utility">shown to be irrational.</a> Yet we can’t write down a utility function which precisely describes human values, and optimising very hard for any other function will lead to that AI rapidly seizing control (as a <a href="https://en.wikipedia.org/wiki/Instrumental_convergence">convergent instrumental subgoal)</a> and building a future which contains very little of what we value (because of <a href="https://en.wikipedia.org/wiki/Goodhart%27s_law">Goodhart’s law </a>and <a href="https://wiki.lesswrong.com/wiki/Complexity_of_value">the complexity and fragility of values)</a>. 
We won’t have a chance to notice and correct misalignment because an AI which has exceeded human level will improve its intelligence very quickly (either by recursive self-improvement or by scaling up its hardware), and then prevent us from modifying it or shutting it down.</li><ol><li>This was the main thesis advanced by Yudkowsky and Bostrom when founding the field of AI safety. Here I’ve tried to convey the original line of argument, although some parts of it have been strongly critiqued since then. In particular, <a href="https://www.fhi.ox.ac.uk/wp-content/uploads/Reframing_Superintelligence_FHI-TR-2019-1.1-1.pdf">Drexler</a> and <a href="https://www.lesswrong.com/posts/NxF5G6CJiof6cemTw/coherence-arguments-do-not-imply-goal-directed-behavior">Shah</a> have disputed the relevance of expected utility maximisation (the latter suggesting the concept of <a href="https://www.lesswrong.com/s/4dHMdK5TLN6xcqtyc/p/DfcywmqRSkBaCB6Ma">goal-directedness</a> as a replacement), while <a href="https://intelligence.org/ai-foom-debate/">Hanson</a> and <a href="https://sideways-view.com/2018/02/24/takeoff-speeds/">Christiano</a> disagree that AI intelligence will increase in a very fast and discontinuous way.</li><li>Most of the arguments in this post originate from or build on this one in some way. This is particularly true of the next two arguments - nevertheless, I think that there’s enough of a shift in focus in each to warrant separate listings.</li></ol><li><em>The target loading problem.</em> Even if we knew exactly what we wanted a superintelligent agent to do, we don’t currently know (even in theory) how to make an agent which actually tries to do that. In other words, if we were to create a superintelligent AGI before solving this problem, the goals we would ascribe to that AGI (by taking the <a href="https://en.wikipedia.org/wiki/Intentional_stance">intentional stance</a> towards it) would not be the ones we had intended to give it. As a motivating example, evolution selected humans for their genetic fitness, yet humans have goals which are very different from just spreading their genes. In a machine learning context, while we can specify a finite number of data points and their rewards, neural networks may then extrapolate from these rewards in non-humanlike ways.</li><ol><li>This is a more general version of the “inner optimiser problem”, and I think it captures the main thrust of the latter while avoiding the difficulties of defining what actually counts as an “optimiser”. I’m grateful to Nate Soares for explaining the distinction, and arguing for the importance of this problem.</li></ol><li><em>The prosaic alignment problem.</em> It is plausible that we build “prosaic AGI”, which replicates human behaviour without requiring breakthroughs in our understanding of intelligence. Shortly after they reach human level (or possibly even before), such AIs will become the world’s dominant economic actors. They will quickly come to control the most important corporations, earn most of the money, and wield enough political influence that we will be unable to coordinate to place limits on their use. Due to economic pressures, corporations or nations who slow down AI development and deployment in order to focus on aligning their AI more closely with their values will be outcompeted. 
As AIs exceed human-level intelligence, their decisions will become too complex for humans to understand or provide feedback on (unless we develop new techniques for doing so), and eventually we will no longer be able to correct the divergences between their values and ours. Thus the majority of the resources in the far future will be controlled by AIs which don’t prioritise human values. This argument was explained in <a href="https://www.alignmentforum.org/posts/YTq4X6inEudiHkHDF/prosaic-ai-alignment">this blog post by Paul Christiano</a>.</li><ol><li>More generally, aligning multiple agents with multiple humans is much harder than aligning one agent with one human, because value differences might lead to competition and conflict even between agents that are each fully aligned with some humans. (As my own speculation, it’s also possible that having multiple agents would increase the difficulty of single-agent alignment - e.g. the question “what would humans want if I didn’t manipulate them” would no longer track our values if we would counterfactually be manipulated by a different agent).</li></ol><li><em>The human safety problem.</em> This line of argument (which Wei Dai <a href="https://www.lesswrong.com/posts/vbtvgNXkufFRSrx4j/three-ai-safety-related-ideas">has</a> <a href="https://www.lesswrong.com/posts/HBGd34LKvXM9TxvNf/new-safety-research-agenda-scalable-agent-alignment-via#2gcfd3PN8GGqyuuHF">recently</a><a href="https://www.lesswrong.com/posts/HTgakSs6JpnogD6c2/two-neglected-problems-in-human-ai-safety">highlighted</a>) claims that no human is “safe” in the sense that giving them absolute power would produce good futures for humanity in the long term, and therefore that building AI which extrapolates and implements the values of even a very altruistic human is insufficient. A prosaic version of this argument emphasises the corrupting effect of power, and the fact that morality is deeply intertwined with social signalling - however, I think there’s a stronger and more subtle version. In everyday life it makes sense to model humans as mostly rational agents pursuing their goals and values. However, this abstraction breaks down badly in more extreme cases (e.g. addictive superstimuli, unusual moral predicaments), implying that human values are somewhat incoherent. One such extreme case is running my brain for a billion years, after which it seems very likely that my values will have shifted or distorted radically, in a way that my original self wouldn’t endorse. Yet if we want a good future, this is the process which we require to go well: a human (or a succession of humans) needs to maintain broadly acceptable and coherent values for astronomically long time periods.</li><ol><li>An obvious response is that we shouldn’t entrust the future to one human, but rather to some group of humans following a set of decision-making procedures. However, I don’t think any currently-known institution is actually much safer than individuals over the sort of timeframes we’re talking about. Presumably a committee of several individuals would have lower variance than just one, but as that committee grows you start running into well-known problems with democracy. And while democracy isn’t a bad system, it seems unlikely to be robust on the timeframe of millennia or longer. 
(Alex Zhu has made the interesting argument that the problem of an individual maintaining coherent values is roughly isomorphic to the problem of a civilisation doing so, since both are complex systems composed of individual “modules” which often want different things.)</li><li>While AGI amplifies the human safety problem, it may also help solve it if we can use it to decrease the value drift that would otherwise occur. Also, while it’s possible that we need to solve this problem in conjunction with other AI safety problems, it might be postponable until after we’ve achieved civilisational stability.</li><li>Note that I use “broadly acceptable values” rather than “our own values”, because it’s very unclear to me which types or extent of value evolution we should be okay with. Nevertheless, there are some values which we definitely find unacceptable (e.g. having a very narrow moral circle, or wanting your enemies to suffer as much as possible) and I’m not confident that we’ll avoid drifting into them by default.</li></ol><li><em>Misuse and vulnerabilities</em>. These might be catastrophic even if AGI always carries out our intentions to the best of its ability:</li><ol><li>AI which is superhuman at science and engineering R&amp;D will be able to invent very destructive weapons much faster than humans can. Humans may well be irrational or malicious enough to use such weapons even when doing so would lead to our extinction, especially if they’re invented before we improve our global coordination mechanisms. It’s also possible that we invent some technology which destroys us unexpectedly, either through unluckiness or carelessness. For more on the dangers from technological progress in general, see Bostrom’s paper on the <a href="https://nickbostrom.com/papers/vulnerable.pdf">vulnerable world hypothesis</a>.</li><li>AI could be used to disrupt political structures, for example via unprecedentedly effective psychological manipulation. In an extreme case, it could be used to establish very stable totalitarianism, with automated surveillance and enforcement mechanisms ensuring an unshakeable monopoly on power for leaders.</li><li>AI could be used for large-scale projects (e.g. climate engineering to prevent global warming, or managing the colonisation of the galaxy) without sufficient oversight or verification of robustness. Software or hardware bugs might then induce the AI to make unintentional yet catastrophic mistakes.</li><li>People could use AIs to hack critical infrastructure (including the other AIs which manage the aforementioned large-scale projects). In addition to exploiting standard security vulnerabilities, hackers might induce mistakes using adversarial examples or ‘data poisoning’.</li></ol><li><em>Argument from large impacts</em>. Even if we’re very uncertain about what AGI development and deployment will look like, it seems likely that AGI will have a very large impact on the world in general, and that further investigation into how to direct that impact could prove very valuable.</li><ol><li>Weak version: development of AGI will be at least as big an economic jump as the industrial revolution, and therefore affect the trajectory of the long-term future. See Ben Garfinkel’s talk at EA Global London 2018 (which I’ll link when it’s available online). 
Ben noted that to consider work on AI safety important, we also need to believe the additional claim that there are feasible ways to positively influence the long-term effects of AI development - something which may not have been true for the industrial revolution. (Personally my guess is that since AI development will happen more quickly than the industrial revolution, power will be more concentrated during the transition period, and so influencing its long-term effects will be more tractable.)</li><li>Strong version: development of AGI will make humans the second most intelligent species on the planet. Given that it was our intelligence which allowed us to control the world to the large extent that we do, we should expect that entities which are much more intelligent than us will end up controlling our future, unless there are reliable and feasible ways to prevent it. So far we have not discovered any.</li></ol></ol><p>What should we think about the fact that there are so many arguments for the same conclusion? As a general rule, the more arguments support a statement, the more likely it is to be true. However, I’m inclined to believe that quality matters much more than quantity - it’s easy to make up weak arguments, but you only need one strong one to outweigh all of them. And this proliferation of arguments is (weak) evidence against their quality: if the conclusions of a field remain the same but the reasons given for holding those conclusions change, that’s a warning sign for motivated cognition (especially when those beliefs are considered socially important). This problem is exacerbated by a lack of clarity about which assumptions and conclusions are shared between arguments, and which aren’t.</p><p>On the other hand, superintelligent AGI is a very complicated topic, and so perhaps it’s natural that there are many different lines of thought. One way to put this in perspective (which I credit to Beth Barnes) is to think about the arguments which might have been given for worrying about nuclear weapons, before they had been developed. Off the top of my head, there are at least four:</p><ol><li>They might be used deliberately.</li><li>They might be set off accidentally.</li><li>They might cause a nuclear chain reaction much larger than anticipated.</li><li>They might destabilise politics, either domestically or internationally.</li></ol><p>And there are probably more which would have been credible at the time, but which seem silly now due to hindsight bias. So if there’d been an active anti-nuclear movement in the 30’s or early 40’s, the motivations of its members might well have been as disparate as those of AI safety advocates today. Yet the overall concern would have been (and still is) totally valid and reasonable.</p><p>I think the main takeaway from this post is that the AI safety community as a whole is still confused about the very problem we are facing. The only way to dissolve this tangle is to have more communication and clarification of the fundamental ideas in AI safety, particularly in the form of writing which is made widely available. And while it would be great to have AI safety researchers explaining their perspectives more often, I think there is still a lot of explicatory work which can be done regardless of technical background. In addition to analysis of the arguments discussed in this post, I think it would be particularly useful to see more descriptions of deployment scenarios and corresponding threat models. 
It would also be valuable for research agendas to highlight which problem they are addressing, and the assumptions they require to succeed.</p><p><em>This post has benefited greatly from feedback from Rohin Shah, Alex Zhu, Beth Barnes, Adam Marblestone, Toby Ord, and the DeepMind safety team. Also see <a href="https://www.lesswrong.com/posts/JbcWQCxKWn3y49bNB/disentangling-arguments-for-the-importance-of-ai-safety">the discussion which has taken place on LessWrong</a>. All opinions are my own.</em></p> richard_ngo LprnaEj3uhkmYtmat 2019-01-23T14:58:27.881Z Comment by richard_ngo on What Is Effective Altruism? https://forum.effectivealtruism.org/posts/wF95uthxv4ZZsujva/what-is-effective-altruism#ktgY9oDx43nn6k4hR <p>I like &quot;science-aligned&quot; better than &quot;secular&quot;, since the former implies the latter as well as a bunch of other important concepts.</p><p>Also, it&#x27;s worth noting that &quot;everyone&#x27;s welfare is to count equally&quot; in Will&#x27;s account is approximately equivalent to &quot;effective altruism values all people equally&quot; in Ozymandias&#x27; account, but neither of them implies the following paraphrase: &quot;from the effective altruism perspective, saving the life of a baby in Africa is exactly as good as saving the life of a baby in America, which is exactly as good as saving the life of Ozy’s baby specifically.&quot; I understand the intention of that phrase, but actually I&#x27;d save whichever baby would grow up to have the best life. Is there any better concrete description of what impartiality actually implies?</p> richard_ngo ktgY9oDx43nn6k4hR 2019-01-10T10:45:45.770Z How democracy ends: a review and reevaluation https://forum.effectivealtruism.org/posts/tbfNF5qCGbjSmpM8E/how-democracy-ends-a-review-and-reevaluation <p>Last month I attended a talk by David Runciman, the author of a recent book called <em>How Democracy Ends</em>. I was prepared for outrage-stirring and pearl-clutching, but was pleasantly surprised by the quality of his arguments, which I’ve summarised below, along with my own thoughts on these issues. Note, however, that I haven’t read the book itself, and so can’t be confident that I’ve portrayed his ideas faithfully.</p><h2>Which lessons from history?</h2><p>Many people have compared recent populist movements with the stirrings of fascism a century ago. And indeed it’s true that a lot of similar rhetoric is being thrown around. But Runciman argues that this is one of the least interesting comparisons to be made between these two times. Some things that would be much more surprising to a denizen of the early 20th century:</p><ul><li>Significant advances in technology</li><li>Massive transformations in societal demographics</li><li>Very few changes in our institutions</li></ul><p>The last of those is particularly surprising in light of the first two. Parliamentary democracies in the Anglophone world have been governed by the same institutions - and in some cases, even the same parties - for centuries. Continental European democracies were more disrupted by World War 2, but have been very stable since then, despite the world changing in many ways overall. That’s true even for institutions that are probably harmful - consider the persistence of the electoral college in the US, the House of Lords in the UK, the monarchy in Australia (despite their ardent republicanism movement), first-past-the-post voting systems in many countries, and so on. 
(In fact, Runciman speculates that Americans voted for Trump partly because of how much confidence they had in the durability of their institutions - a confidence which so far seems to have been well-founded.)</p><p>So history gives us pretty good evidence for the robustness of democratic institutions to the normal flow of time - but not to exceptional circumstances. In fact, an inability to make necessary changes may well render them more fragile in the face of sharp opposition. If and when pressure mounts, are they going to snap like the democratic institutions of 1930s Europe did? Runciman argues that they won’t, because of the nature of the demographic changes the West has seen. There are three particularly important axes of variation:</p><ul><li>Wealth: the average person is many times wealthier than they were a century ago, and the middle class is much larger.</li><li>Education: we’ve gone from only a few percent of people getting tertiary education (and many of the remainder not finishing high school) to nearly 50% of young people being in university in many western countries.</li><li>Age: in the last century, the median age has risen by over ten years in most western countries.</li></ul><p>These three factors are some of the most powerful predictors of behaviour that we have, and so we should take them into account when judging the likelihood of democratic failure. For instance, wealthier and more educated people are much less likely to support populist or extremist groups. But Runciman focuses the most on age, which I think is the correct approach. Wealth is relative - even if people are actually much richer, they can feel poor and angry after a recession (as they did in the 1930s, despite still being many times wealthier than almost all their ancestors). Education may just be correlated with other factors, rather than the actual cause of lasting differences in mindset. But there are pretty clear biological and social reasons to think that the behaviour and priorities of older people are robustly and significantly different from those of younger people. You need only look at the age distribution of violent crime, for example, to see how strong this effect is (although it may have lessened somewhat over recent years, since marriage rates are declining and single men cause more trouble).</p><p>In short: the failures of democracy in the 30’s were based on large populations of young men who could be mobilised in anger by militaristic leaders - see for instance the brownshirts in Germany and blackshirts in Italy. But that’s not what the failure of democracy in our time would look like, because that group of people is much smaller now. For better or worse, older populations are less disruptive and more complacent. To see where that might lead, consider Japan: an ageing population which can’t drag itself out of economic torpor, resistant to immigration, dominated for decades by one political party, betting the country&#x27;s future on using robots to replace the missing workforce.</p><h2>Changes ahead</h2><p>During a Q&amp;A after the talk, I pointed out that Japan is very different to western countries in its particularly strong culture of social conformity and stability. Age trends notwithstanding, I have much more difficulty imagining the same quiet tolerance of slow decline occurring in the US or UK. 
So, given that government institutions are very difficult to change, where will people direct their frustration if lacklustre growth continues in the coming decades?</p><p>In response, Runciman raised two possibilities. Firstly, that people will “go around their governments”, finding new domains in which politics is less relevant. We could call this the “Wild West” possibility. Of course, there’s no longer an uncolonised West to explore - but there is the internet, which isn’t democratically run and probably never will be. We already see fewer young men working full-time, because the alternative of spending most of their time gaming has become more appealing. As virtual worlds become even more immersive, it seems plausible that people will begin to care much less about political issues.</p><p>One problem with the idea of “going around governments”, though, is that governments are just much bigger now than they used to be. And as technology companies profit from the growing role of the internet, there’ll be pressure for governments to intervene even more to fight inequality. So a second option is a more Chinese approach, with increasingly autocratic Western governments exerting heavy pressure on (and perhaps eventually control over) tech companies.</p><p>A more optimistic possibility is for the internet to make democracy more accountable. Runciman invites us to consider Plato’s original argument against direct democracy (in which people vote on individual issues) - that it would lead to rule by the poor, the ignorant, and the young, all of whom necessarily outnumber the wealthy, wise and old. This argument turned out not to apply for representative democracy, since elected representatives tend to be wealthy, educated and old despite their constituents being the opposite. But now it’s inapplicable for a different reason - that although our representatives haven’t changed much, the rest of us are starting to look much more like them. So maybe it’ll become feasible to implement a more direct democracy, facilitated by the internet and modern communication technology. (This still seems like a bad idea to me, though.)</p><h2>Base rates and complacency</h2><p>The last section was a little speculative, so let’s take a step back and think about how to make predictions about these sorts of events in general. Runciman’s analysis above provides good reasons not to draw a specific parallel between the rise of fascism last century and recent political events. But it would take extraordinary evidence to exempt us from extrapolating broader historical trends, in particular the fact that states always collapse eventually, and that the base rate for coups and other forms of internal strife is fairly high. Are the extraordinary changes we’ve seen since the industrial revolution sufficient to justify belief in our exceptionalism?</p><p>It’s true that since World War 2, almost no wealthy democracies have descended into autocracy or chaos (Turkey and Ireland being two edge cases). It’s also true that, despite widespread political disillusionment, norms against violence have held to a remarkably large degree. But drawing judgements from the historical period “since World War 2” is a classic case of the Texan Sharpshooter’s Fallacy (and possibly also anthropic bias?). In fact, this recent lull should make us skeptical about our ability to evaluate the question objectively, because people are in general very bad at anticipating extreme events that haven&#x27;t occurred in living memory. 
I think this is true despite these possibilities being discussed in the media. For example, while there’s a lot of talk about Trump being a potential autocrat, few Americans are responding by stockpiling food or investing in foreign currencies or emigrating. This suggests that hostility towards Trump is driven primarily by partisan politics, rather than genuine concern about democratic collapse. An additional data point in favour of this hypothesis is how easily the Republican political establishment has fallen in line.</p><p>Another key question which isn’t often discussed is the nature of modern military culture. Historically, this has been a major factor affecting governmental stability. But, apart from vague intuitions about modern militaries being fairly placid, I find myself remarkably ignorant on this subject, and suspect others are as well. What facts do you know about your country&#x27;s military, about the character of its commanders or the distribution of power within it, that make you confident that it won&#x27;t launch a coup if, for example, one of its generals is narrowly defeated in a disputed presidential election (as in Gore vs Bush)? Note that military demographics haven’t changed nearly as much as those of our societies overall. They’re still primarily composed of young working-class men without degrees - a group that’s unusually angry about today’s politics. So while I am pretty convinced by Runciman’s arguments, this is one way in which they may not apply. Also consider that warfare is much less hands-on than it used to be, and firepower much more centrally concentrated, both of which make coups easier.</p><h2>And what about extreme events?</h2><p>So far I&#x27;ve looked at societal collapse from a political point of view. But many historical transitions were precipitated by natural disasters or diseases. See, for instance, the Mayan collapse, or the Little Ice Age, or the Black Death. Today, we&#x27;re much safer from natural disasters, both because of our technology and because of the scale of our societies - few people live in countries in which the majority of a population can be struck by a single natural disaster. Similarly, we&#x27;re also much safer from natural diseases. But we&#x27;re much more vulnerable to severe man-made disasters, which I think are very likely to occur over the next century. Since this post is focused on political collapse as a distinct phenomenon to technological disaster, I won’t discuss extreme risks from technology here. However, it&#x27;s worthwhile to look at the ways in which smaller technological harms might exacerbate other trends. AI-caused unemployment and the more general trend towards bimodal outcomes in western countries are likely to cause social unrest. Meanwhile terrorism is going to become much easier - consider being able to 3D-print assassin drones running facial recognition software, for instance. And due to antibiotic overuse, it&#x27;s likely that our safety from disease will decline over the coming years (even without the additional danger of bioterrorism using engineered diseases). Finally, I think we&#x27;re much softer than we used to be - it won&#x27;t take nearly as much danger to disrupt a country. 
Runciman is probably correct that we’re less susceptible to a collapse into authoritarianism than we were in the past - but the same trends driving that change are also pushing us towards new reasons to worry.</p><p></p><p><em>In addition to the talk by Runciman, this post was inspired by discussions with my friends Todor and Julio, and benefited from their feedback.</em></p> richard_ngo tbfNF5qCGbjSmpM8E 2018-11-24T17:41:53.594Z Comment by richard_ngo on Some cruxes on impactful alternatives to AI policy work https://forum.effectivealtruism.org/posts/DW4FyzRTfBfNDWm6J/some-cruxes-on-impactful-alternatives-to-ai-policy-work#7mRd5sv2CC2weADis <p>Your points seem plausible to me. While I don&#x27;t remember exactly what I intended by the claim above, I think that one influence was some material I&#x27;d read referencing the original &quot;<a href="https://en.wikipedia.org/wiki/Productivity_paradox">productivity paradox</a>&quot; of the 70s and 80s. I wasn&#x27;t aware that there was a significant uptick in the 90s, so I&#x27;ll retract my claim (which, in any case, wasn&#x27;t a great way to make the overall point I was trying to convey).</p> richard_ngo 7mRd5sv2CC2weADis 2018-11-24T02:40:38.860Z Some cruxes on impactful alternatives to AI policy work https://forum.effectivealtruism.org/posts/DW4FyzRTfBfNDWm6J/some-cruxes-on-impactful-alternatives-to-ai-policy-work <p><em><a href="https://www.lesswrong.com/posts/DJB82jKwgJE5NsWgT/some-cruxes-on-impactful-alternatives-to-ai-policy-work">Crossposted from Less Wrong</a>.</em></p><p><a href="https://forum.effectivealtruism.org/users/ben-pace">Ben Pace</a> and I (Richard Ngo) recently did a public double crux at the Berkeley REACH on how valuable it is for people to go into AI policy and strategy work: I was optimistic and Ben was pessimistic. During the actual event, we didn&#x27;t come anywhere near to finding a double crux on that issue. But after a lot of subsequent discussion, we&#x27;ve come up with some more general cruxes about where impact comes from.</p><p>I found Ben&#x27;s model of how to have impact very interesting, and so in this post I&#x27;ve tried to explain it, along with my disagreements. Ben liked the goal of writing up a rough summary of our positions and having further discussion in the comments, so while he edited it somewhat he doesn’t at all think that it’s a perfect argument, and it’s not what he’d write if he spent 10 hours on it. He endorsed the wording of the cruxes as broadly accurate.</p><p>(During the double crux, we also discussed how the heavy-tailed worldview applies to community building, but decided on this post to focus on the object level of what impact looks like.)</p><p>Note from Ben: “I am not an expert in policy, and have not put more than about 20-30 hours of thought into it total as a career path. But, as I recently heard Robin Hanson say, there’s a common situation that looks like this: some people have a shiny idea that they think about a great deal and work through the details of, that folks in other areas are skeptical of given their particular models of how the world works. Even though the skeptics have less detail, it can be useful to publicly say precisely why they’re skeptical.</p><p>In this case I’m often skeptical when folks tell me they’re working to reduce x-risk by focusing on policy. Folks doing policy work in AI might be right, and I might be wrong, but it seemed like a good use of time to start a discussion with Richard about how I was thinking about it and what would change my mind. 
If the following discussion causes me to change my mind on this question, I’ll be really super happy with it.”</p><h2>Ben&#x27;s model: Life in a heavy-tailed world</h2><p>A <a href="https://en.wikipedia.org/wiki/Heavy-tailed_distribution">heavy-tailed distribution</a> is one where the probability of extreme outcomes doesn’t drop very rapidly, meaning that outliers dominate the expectation of the distribution. Owen Cotton-Barratt has written a brief explanation of the idea <a href="https://www.effectivealtruism.org/articles/prospecting-for-gold-owen-cotton-barratt/#heavy-tailed-distributions">here</a>. Examples of heavy-tailed distributions include the Pareto distribution and the log-normal distribution; other phrases people use to point at this concept include ‘power laws’ (see <a href="https://www.amazon.co.uk/Zero-One-Notes-Start-Future/dp/0753555204/ref=sr_1_1?ie=UTF8&qid=1538077169&sr=8-1&keywords=zero+to+one">Zero to One</a>) and ‘black swans’ (see the recent <a href="http://slatestarcodex.com/2018/09/19/book-review-the-black-swan/">SSC book review</a>). Wealth is a heavy-tailed distribution, because many people are clustered relatively near the median, but the wealthiest people are millions of times further away. Human height, weight, and running speed are not heavy-tailed; there is no man as tall as 100 people.</p><p>There are three key claims that make up Ben&#x27;s view.</p><p><strong>The first claim is that, since the industrial revolution, we live in a world where the impact that small groups can have is much more heavy-tailed than in the past.</strong></p><ul><li>People can affect incredibly large numbers of other people worldwide. The Internet is an example of a revolutionary development which allows this to happen very quickly.</li><li>Startups are becoming unicorns unprecedentedly quickly, and their valuations are very heavily skewed.</li><li>The impact of global health interventions is heavy-tailed. So is funding raised by Effective Altruism - two donors have contributed more money than everyone else combined.</li><li>Google and Wikipedia qualitatively changed how people access knowledge; people don&#x27;t need to argue about verifiable facts any more.</li><li>Facebook qualitatively changed how people interact with each other (e.g. FB events is a crucial tool for most local EA groups), and can swing elections.</li><li>It&#x27;s not just that we got more extreme versions of the same things, but rather that we can get unforeseen types of outcomes.</li><li>The books <em>HPMOR</em> and <em>Superintelligence</em> both led to mass changes in plans towards more effective ends via the efforts of individuals and small groups.</li></ul><p><strong>The second claim is that you should put significant effort into re-orienting yourself to use high-variance strategies.</strong></p><ul><li>Ben thinks that recommending strategies which are <em>safe</em> and <em>low-risk</em> is insane when you&#x27;re drawing from a heavy-tailed distribution. You want everyone to be taking high-variance strategies.</li><ul><li>This is only true if the tails are long to the right and not to the left, which seems true to Ben. 
Most projects end up pulling no useful levers at all and achieving nothing, but a few pull crucial levers and solve open problems or increase capacity for coordination.</li></ul><li>Your intuitions were built for the ancestral environment, where you didn’t need to be able to think about coordinating humans on the scale of millions or billions - and yet you still rely heavily on those intuitions when navigating the modern environment.</li><li><a href="https://www.lesswrong.com/posts/2ftJ38y9SRBCBsCzy/scope-insensitivity">Scope insensitivity</a>, <a href="https://www.lesswrong.com/posts/Nx2WxEuPSvNBGuYpo/feeling-moral">framing effects</a>, <a href="http://www.overcomingbias.com/2017/12/automatic-norms.html">taboo tradeoffs</a>, and <a href="https://rationalaltruist.com/2013/02/28/risk-aversion-and-investment-for-altruists/">risk aversion</a> are the key things here. You need to learn to train your S1 to understand <em>math</em>.</li><ul><li>By default, you’re not going to spend enough effort finding or executing high-variance strategies.</li></ul><li>We&#x27;re still only 20 years into the internet era. Things keep changing qualitatively, but Ben feels like everyone keeps adjusting to the new technology as if it had always been this way.</li><li>Ben: “My straw model of the vast majority of people’s attitudes is: I guess Facebook and Twitter are just things now. I won’t spend time thinking about whether I could build a platform as successful as those two but optimised better for e.g. intellectual progress or social coordination - basically not just money.”</li><li>Ben: “I do note that never in history has change been happening so quickly, so it makes sense that people’s intuitions are off.”</li><li>While many institutions have been redesigned to fit the internet, Ben feels like almost nobody is trying to improve institutions like science on a large scale, and that this is clear low-hanging altruistic fruit.</li><li>The Open Philanthropy Project&#x27;s shift away from safe, low-risk bets with GiveWell towards <a href="https://www.openphilanthropy.org/blog/hits-based-giving">hits-based giving</a> is an example of this kind of move.</li></ul><p><strong>The third claim is that AI policy is not a good place to get big wins or to learn the relevant mindset.</strong></p><ul><li>Ben: “At first glance, governments, politics and policy look like the sort of place where I would not expect to find highly exploitable strategies, nor a place that will teach me the sorts of thinking that will help me find them in future.”</li><li>People in policy spend a lot of time thinking about how to influence governments. But governments are generally too conventional and slow to reap the benefits of weird actions with extreme outcomes.</li><li>Working in policy doesn&#x27;t cultivate the right type of thinking. You&#x27;re usually in a conventional governmental (or academic) environment, stuck inside the system, getting seduced by local incentive gradients and prestige hierarchies. 
You often need to spend a long time working your way to positions of actual importance in the government, which leaves you prone to value drift or over-specialisation in the wrong thing.</li><ul><li>At the very least, you have to play to the local incentives as effectively as someone who actually cares about them, which can be damaging to one’s ability to think clearly.</li></ul><li>Political landscapes are not the sort of environment where people can easily ignore the local social incentives to focus on long-term, global goals. Short-term thinking (election cycles, media coverage, etc) is not the sort of thinking that lets you build a new institution over 10 years or more.</li><ul><li>Ben: “When I’ve talked to senior political people, I’ve often heard things of the sort ‘We were working on a big strategy to improve infrastructure / international aid / tech policy etc, but then suddenly public approval changed and then we couldn’t make headway / our party wasn’t in power / etc.’ which makes me think long-term planning is strongly disincentivised.”</li></ul><li>One lesson of a heavy-tailed world is that signals that you’re taking safe bets are <em>anti-signals</em> of value. Following a standard academic track and saying “Yeah, I’m gonna get a masters in public policy” sounds <em>fine</em>, <em>sensible, and safe</em>, and therefore <em>cannot</em> be an active sign that you will do something a million times more impactful than the median.</li></ul><p>The above is not a full, gears-level analysis of how to find and exploit a heavy tail, because almost all of the work here lies in identifying the particular strategy. Nevertheless, because of the considerations above, Ben thinks that talented, agenty and rational people should in many cases be able to identify opportunities for big wins and then execute plans to exploit them, and that this is much less the case in policy.</p><h2>Richard&#x27;s model: Business (mostly) as usual</h2><p>I disagree with Ben on all three points above, to varying degrees.</p><p>On the first point, I agree that the distribution of success has become much more heavy-tailed since the industrial revolution. However, I think the distribution of success is often very different from the distribution of impact, because of replacement effects. If Facebook hadn&#x27;t become the leading social network, then MySpace would have. If not Google, then Yahoo. If not Newton, then Leibniz (and if Newton, then Leibniz anyway). Probably the alternatives would have been somewhat worse, but not significantly so (and if they were, different competitors would have come along). The distinguishing trait of modernity is that even a small difference in quality can lead to a huge difference in earnings, via network effects and global markets. But that isn&#x27;t particularly interesting from an x-risk perspective, because money isn&#x27;t anywhere near being our main bottleneck.</p><p>You might think that since Facebook has billions of users, their executives are a small group with a huge amount of power, but I claim that they&#x27;re much more constrained by competitive pressures than they seem. Their success depends on the loyalty of their users, but the bigger they are, the easier it is for them to seem untrustworthy. They also need to be particularly careful since antitrust cases have busted the dominance of several massive tech companies before. 
(While they could swing a few elections before being heavily punished, I don’t think this is unique to the internet age - a small cabal of newspaper owners could probably have done the same centuries ago.) Similarly, I think the founders of Wikipedia actually had fairly little counterfactual impact, and currently have fairly little power, because they&#x27;re reliant on editors who are committed to impartiality.</p><p>What we should be more interested in is cases where small groups didn&#x27;t just ride a trend, but actually created or significantly boosted it. Even in those cases, though, there&#x27;s a big difference between success and impact. Lots of people have become very rich from shuffling around financial products or ad space in novel ways. But if we look at the last fifty years overall, they&#x27;re far from dominated by extreme transformative events - in fact, Western societies have changed very little in most ways. Apart from IT, our technology remains roughly the same, our physical surroundings are pretty similar, and our standards of living have stayed flat or even dropped slightly. (This is a version of Tyler Cowen and Peter Thiel&#x27;s views; for a better articulation, I recommend <em>The Great Stagnation</em> or <em>The Complacent Class</em>.) Well, isn&#x27;t IT enough to make up for that? I think it will be eventually, as AI develops, but right now most of the time spent on the internet is wasted. I don&#x27;t think current IT has had much of an effect by standard metrics of labour productivity, for example.</p><p><strong>Should you pivot?</strong></p><p>Ben might claim that this is because few people have been optimising hard for positive impact using high-variance strategies. While I agree to some extent, I also think that there are pretty strong incentives to have impact regardless. We&#x27;re in the sort of startup economy where scale comes first and monetisation comes second, and so entrepreneurs already strive to create products which influence millions of people even when there’s no clear way to profit from them. And entrepreneurs are definitely no strangers to high-variance strategies, so I expect most approaches to large-scale influence to already have been tried.</p><p>On the other hand, I do think that reducing existential risk is an area where a small group of people are managing to have a large influence, a claim which seems to contrast with the assertion above. I’m not entirely sure how to resolve this tension, but I’ve been thinking lately about an analogy from finance. <a href="https://medium.com/conversations-with-tyler/nate-silver-conversations-with-tyler-1bdafe685d77">Here&#x27;s Tyler Cowen</a>:</p><blockquote>I see a lot of money managers, so there’s Ray Dalio at Bridgewater. He saw one basic point about real interest rates, made billions off of that over a great run. Now it’s not obvious he and his team knew any better than anyone else.</blockquote><blockquote>Peter Lynch, he had fantastic insights into consumer products. Use stuff, see how you like it, buy that stock. He believed that in an age when consumer product stocks were taking off.</blockquote><blockquote>Warren Buffett, a certain kind of value investing. 
Worked great for a while, no big success, a lot of big failures in recent times.</blockquote><p>The analogy isn’t perfect, but the idea I want to extract is something like: once you’ve identified a winning strategy or idea, you can achieve great things by exploiting it - but this shouldn’t be taken as strong evidence that you can do exceptional things in general. For example, having a certain type of personality and being a fan of science fiction is very useful in identifying x-risk as a priority, but not very useful in founding a successful startup. Similarly, being a philosopher is very useful in identifying that helping the global poor is morally important, but not very useful in figuring out how to solve systemic poverty.</p><p>From this mindset, instead of looking for big wins like “improving intellectual coordination”, we should be looking for things which are easy conditional on existential risk actually being important, and conditional on the particular skillsets of x-risk reduction advocates. Another way of thinking about this is as a distinction between high-impact goals and high-variance strategies: once you’ve identified a high-impact goal, you can pursue it without using high-variance strategies. Startup X may have a crazy new business idea, but they probably shouldn&#x27;t execute it in crazy new ways. Actually, their best bet is likely to be joining Y Combinator, getting a bunch of VC funding, and following Paul Graham&#x27;s standard advice. Similarly, reducing x-risk is a crazy new idea for how to improve the world, but it&#x27;s pretty plausible that we should pursue it in ways similar to those which other successful movements used. Here are some standard things that have historically been very helpful for changing the world:</p><ul><li>dedicated activists</li><li>good research</li><li>money</li><li>public support</li><li>political influence</li></ul><p>My prior says that all of these things matter, and that most big wins will be due to direct effects on these things. The last two are the ones which we’re disproportionately lacking; I’m more optimistic about the latter (political influence) for a variety of reasons.</p><p><strong>AI policy is a particularly good place to have a large impact.</strong></p><p>Here&#x27;s a general argument: governments are very big levers, because of their scale and ability to apply coercion. A new law can be a black swan all by itself. When I think of really massive wins over the past half-century, I think about the eradication of smallpox and the near-eradication of polio, the development of space technology, and the development of the internet. All of these relied on and were driven by governments. Then, of course, there are the massive declines in poverty across Asia in particular. It&#x27;s difficult to assign credit for this, since it&#x27;s so tied up with globalisation, but to the extent that any small group was responsible, it was Asian governments and the policies of Deng Xiaoping, Lee Kuan Yew, Rajiv Gandhi, etc.</p><p>You might agree that governments do important things, but think that influencing them is very difficult. Firstly, that&#x27;s true for most black swans, so I don&#x27;t think that should make policy work much less promising even from Ben&#x27;s perspective. But secondly, from the outside view, our chances are pretty good. We&#x27;re a movement comprising many very competent, clever and committed people. 
We&#x27;ve got the sort of backing that makes policymakers take people seriously: we&#x27;re affiliated with leading universities, tech companies, and public figures. It&#x27;s likely that a number of EAs at the best universities already have friends who will end up in top government positions. We have enough money to do extensive lobbying, if that&#x27;s judged a good idea. Also, we&#x27;re correct, which usually helps. The main advantage we&#x27;re missing is widespread popular support, but I don&#x27;t model this as being crucial for issues where what&#x27;s needed is targeted interventions which &quot;pull the rope sideways&quot;. (We&#x27;re also missing knowledge about what those interventions should be, but that makes policy research even more valuable.)</p><p>Here&#x27;s a more specific route to impact: in a few decades (assuming long timelines and slow takeoff) AIs that are less generally intelligent than humans will be causing political and economic shockwaves, whether that&#x27;s via mass unemployment, enabling large-scale security breaches, designing more destructive weapons, psychological manipulation, or something even less predictable. At this point, governments will panic and AI policy advisors will have real influence. If competent and aligned people were the obvious choice for those positions, that&#x27;d be fantastic. If those people had spent several decades researching what interventions would be most valuable, that&#x27;d be even better.</p><p>This perspective is inspired by Milton Friedman, who argued that the way to create large-scale change is by nurturing ideas which will be seized upon in a crisis.</p><blockquote>Only a crisis - actual or perceived - produces real change. When that crisis occurs, the actions that are taken depend on the ideas that are lying around. That, I believe, is our basic function: to develop alternatives to existing policies, to keep them alive and available until the politically impossible becomes the politically inevitable.</blockquote><p>The major influence of the Institute of Economic Affairs on Thatcher’s policies is an example of this strategy’s success. An advantage of this approach is that it can be implemented by clusters of like-minded people collaborating with each other; for that reason, I&#x27;m not so worried about policy work cultivating the wrong mindset (I&#x27;d be more worried on this front if policy researchers were very widely spread out).</p><p>Another fairly specific route to impact: several major AI research labs would likely act on suggestions for coordinating to make AI safer, if we had any. Right now I don’t think we do, and so research into that could have a big multiplier. If a government ends up running a major AI lab (which seems pretty likely conditional on long timelines), then they may also end up following this advice, via the effect described in the paragraph above.</p><p><strong>Underlying generators of this disagreement</strong></p><p>More generally, Ben and I disagree on where the bottleneck to AI safety is. I think that finding a technical solution is probable, but that most solutions would still require careful oversight, which may or may not happen (maybe 50-50). Ben thinks that finding a technical solution is improbable, but that if it&#x27;s found it&#x27;ll probably be implemented well. I also have more credence on long timelines and slow takeoffs than he does. 
I think that these disagreements affect our views on the importance of influencing governments in particular.</p><p>We also have differing views on what the x-risk reduction community should look like. I favour a broader, more diverse community; Ben favours a narrower, more committed community. I don&#x27;t want to discuss this extensively here, but I will point out that there are many people who are much better at working within a system than outside it - people who would do well in AI safety PhDs, but couldn&#x27;t just teach themselves to do good research from scratch like Nate Soares did; brilliant yet absent-minded mathematicians; people who could run an excellent policy research group but not an excellent startup. I think it&#x27;s valuable for such people (amongst whom I include myself) to have a &quot;default&quot; path to impact, even at the cost of reducing the pressure to be entrepreneurial or agenty. I think this is pretty undeniable when it comes to technical research, and cross-applies straightforwardly to policy research and advocacy.</p><p>Ben and I agree that going into policy is much more valuable if you&#x27;re thinking very strategically and <a href="https://www.lesswrong.com/posts/qu95AwSrKqQSo4fCY/the-outside-the-box-box">out of the &quot;out of the box&quot; box</a> than if you&#x27;re not. Given this mindset, there will probably turn out to be valuable non-standard things which you can do.</p><p>Do note that this essay is intrinsically skewed since I haven&#x27;t portrayed Ben&#x27;s arguments with full fidelity and have spent many more words arguing my side. Also note that, despite being skeptical about some of Ben&#x27;s points, I think his overall view is important and interesting and more people should be thinking along similar lines.</p><p><em>Thanks to Anjali Gopal for comments on drafts.</em></p> richard_ngo DW4FyzRTfBfNDWm6J 2018-11-22T13:43:40.684Z Comment by richard_ngo on Insomnia: a promising cure https://forum.effectivealtruism.org/posts/q8g2MXQCmKoYhEjsT/insomnia-a-promising-cure#rJqne5Z68dZEKfo5k <p>CBT-I is also recommended in Why We Sleep (see <a href="http://thinkingcomplete.blogspot.com/2018/08/book-review-why-we-sleep.html">my summary of the book</a>).</p><p>Nitpick: &quot;The former two have diminishing returns, but the latter does not.&quot; It definitely does - I think getting 12 or 13 hours of sleep is actively worse for you than getting 9 hours.</p> richard_ngo rJqne5Z68dZEKfo5k 2018-11-20T15:34:14.648Z Comment by richard_ngo on What's Changing With the New Forum? https://forum.effectivealtruism.org/posts/dMJ475rYzEaSvGDgP/what-s-changing-with-the-new-forum#FX56ZezJZfpjTs4Sj <blockquote>Posts on the new Forum are split into two categories:</blockquote><blockquote>Frontpage posts are timeless content covering the ideas of effective altruism. They should be useful or interesting even to readers who only know the basic concepts of EA and aren’t very active within the community.</blockquote><p>I&#x27;m a little confused about this description. I feel like intellectual progress often requires presupposing fairly advanced ideas which build on each other, and which are therefore inaccessible to &quot;readers who only know the basic concepts&quot;. Suppose that I wrote a post outlining views on AI safety aimed at people who already know the basics of machine learning, or a post discussing a particular counter-argument to an unusual philosophical position. Would those not qualify as frontpage posts? If not, where would they go? 
And where do personal blogs fit into this taxonomy?</p> richard_ngo FX56ZezJZfpjTs4Sj 2018-11-08T10:45:51.092Z Comment by richard_ngo on Why Do Small Donors Give Now, But Large Donors Give Later? https://forum.effectivealtruism.org/posts/pZmjjeAmGz7gFiGCP/why-do-small-donors-give-now-but-large-donors-give-later#TqKajhrThZpW482fG <p>It&#x27;s a clever explanation, but I&#x27;m not sure how much to believe it without analysing other hypotheses. E.g. maybe tax-deductibility is a major factor, or maybe it&#x27;s just much harder to give away large amounts of money quickly. </p> richard_ngo TqKajhrThZpW482fG 2018-10-30T14:55:32.539Z Comment by richard_ngo on What is the Most Helpful Categorical Breakdown of Normative Ethics? https://forum.effectivealtruism.org/posts/93GmXP8ikHhQzjs2i/what-is-the-most-helpful-categorical-breakdown-of-normative#YuieWSp2DrxfRftni <p>I think it's a mischaracterisation to think of virtue ethics in terms of choosing the most virtuous actions (in fact, one common objection to virtue ethics is that it doesn't help very much in choosing actions). I think virtue ethics is probably more about <em>being</em> the most virtuous, and making decisions for virtuous reasons. There's a difference: e.g. you're probably not virtuous if you choose normally-virtuous actions for the wrong reasons.</p> <p>For similar reasons, I disagree with cole_haus that virtue ethicists choose actions to produce the most virtuous outcomes (although there is at least one school of virtue ethics which seems vaguely consequentialist, the eudaimonists. See <a href="https://plato.stanford.edu/entries/ethics-virtue">https://plato.stanford.edu/entries/ethics-virtue</a>). Note however that I haven't actually looked into virtue ethics in much detail.</p> <p>Edit: contractarianism is a fourth approach which doesn't fit neatly into either division</p> richard_ngo YuieWSp2DrxfRftni 2018-08-15T21:23:16.519Z Comment by richard_ngo on Fisher &amp; Syed on Tradable Obligations to Enhance Health https://forum.effectivealtruism.org/posts/pSWwYLJKquZa55WWA/fisher-and-syed-on-tradable-obligations-to-enhance-health#wNmYfRaKWtwAuxZsE <p>My default position would be that IKEA have an equal obligation, but that it's much more difficult and less efficient to try and make IKEA fulfill that obligation.</p> richard_ngo wNmYfRaKWtwAuxZsE 2018-08-14T13:30:35.937Z Comment by richard_ngo on Request for input on multiverse-wide superrationality (MSR) https://forum.effectivealtruism.org/posts/92wCvqF73Gzg5Jnrr/request-for-input-on-multiverse-wide-superrationality-msr#85yKwD5yos4pTnLoD <p>A few doubts:</p> <ol> <li><p>It seems like MSR requires a multiverse large enough to have many well-correlated agents, but not large enough to run into the problems involved with infinite ethics. Most of my credence is on no multiverse or infinite multiverse, although I'm not particularly well-read on this issue.</p> </li> <li><p>My broad intuition is something like &quot;Insofar as we can know about the values of other civilisations, they're probably similar to our own. Insofar as we can't, MSR isn't relevant.&quot; There are probably exceptions, though (e.g. we could guess the direction in which an r-selected civilisation's values would vary from our own).</p> </li> <li><p>I worry that MSR is susceptible to self-mugging of some sort. I don't have a particular example, but the general idea is that you're correlated with other agents <em>even if you're being very irrational</em>. And so you might end up doing things which seem arbitrarily irrational. 
But this is just a half-fledged thought, not a proper objection.</p> </li> <li><p>And lastly, I would have much more confidence in FDT and superrationality in general if there were a sensible metric of similarity between agents, apart from correlation (because if you always cooperate in prisoner's dilemmas, then your choices are perfectly correlated with CooperateBot, but intuitively it'd still be more rational to defect against CooperateBot, because your decision algorithm isn't similar to CooperateBot in the same way that it's similar to your psychological twin). I guess this requires a solution to logical uncertainty, though.</p> </li> </ol> <p>Happy to discuss this more with you in person. Also, I suggest you cross-post to Less Wrong.</p> richard_ngo 85yKwD5yos4pTnLoD 2018-08-14T13:15:21.842Z Comment by richard_ngo on Want to be more productive? https://forum.effectivealtruism.org/posts/yLd2c2rg5Jrq9HR6K/want-to-be-more-productive#uAae86EXBZvMbTirQ <p>As a followup to byanyothername's questions: Could you say a little about what distinguishes your coaching from something like a CFAR workshop?</p> richard_ngo uAae86EXBZvMbTirQ 2018-06-12T00:28:58.430Z Comment by richard_ngo on EA Hotel with free accommodation and board for two years https://forum.effectivealtruism.org/posts/JdqHvyy2Tjcj3nKoD/ea-hotel-with-free-accommodation-and-board-for-two-years#K8Kxiqh3P5EmsdLiv <p>Kudos for doing this. The main piece of advice which comes to mind is to make sure to push this via university EA groups. I don't think you explicitly identified students as a target demographic in your post, but current students and new grads have the three traits which make the hotel such an attractive proposition: they're unusually time-rich, cash-poor, and willing to relocate.</p> richard_ngo K8Kxiqh3P5EmsdLiv 2018-06-05T13:13:07.219Z