Nice, thanks for those links, great to have those linked here since we didn't point to them in the report. I've seen the Open Phil one but I don't think I'd seen the Animal Ethics study, it looks very interesting.
Thanks for raising the point about speed of establishment for Clean Meat and Genetic Circuits! Our definition for the "origin year" (from here) is "The year that the technology or area is purposefully explored for the first time." So it's supposed to be when someone starts working on it, not when someone first has the idea. We think that Willem van Eelen started working on developing clean meat in the 1950's, so we set the origin year to be around then. Whereas as far as we're aware no-one was working on genetic circuits until much later.
At the moment I'm not sure whether the supplementary notes say anywhere that we think van Eelen was working on developing clean meat in the 50's; I think Megan is going to update the notes to make this clearer.
Thanks both (and Owen too), I now feel more confident that geometric mean of odds is better!
(Edit: at 1:4 odds I don't feel great about a blanket recommendation, but I guess the odds at which you're indifferent to taking the bet are more heavily stacked against us changing our mind. And Owen's <1% is obviously way lower)
(don't feel extremely confident about the below but seemed worth sharing)
I think it's really great to flag this! But as I mentioned to you elsewhere I'm not sure we're certain enough to make a blanket recommendation to the EA community.
I think we have some evidence that geometric mean of odds is better, but not that much evidence. Although I haven't looked into the evidence that Simon_M shared from Metaculus.
I guess I can potentially see us changing our minds in a year's time and deciding that arithmetic mean of probabilities is better after all, or that some other method is better than both of these.
Then maybe people will have made a costly change to a new method (learning what odds are, what a geometric mean is, learning how to apply it in practice, maybe understanding the argument for using the new method) that turns out not to have been worth it.
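For anyone unfamiliar with the two aggregation methods being compared, here is a minimal sketch. The function names and the example forecasts are made up for illustration; they don't come from the post or from the Metaculus data.

```python
import math


def arithmetic_mean_of_probabilities(probs):
    """Pool forecasts by averaging the probabilities directly."""
    return sum(probs) / len(probs)


def geometric_mean_of_odds(probs):
    """Pool forecasts by taking the geometric mean of their odds.

    Each probability p is converted to odds p / (1 - p); the geometric
    mean is computed via the mean of the log-odds, and the pooled odds
    are converted back to a probability. Requires 0 < p < 1.
    """
    log_odds = [math.log(p / (1 - p)) for p in probs]
    pooled_odds = math.exp(sum(log_odds) / len(log_odds))
    return pooled_odds / (1 + pooled_odds)


# Hypothetical forecasts from three people for the same event.
forecasts = [0.01, 0.10, 0.50]
print("arithmetic mean of probabilities:",
      round(arithmetic_mean_of_probabilities(forecasts), 4))
print("geometric mean of odds:",
      round(geometric_mean_of_odds(forecasts), 4))
```

With spread-out forecasts like these, the two methods can give noticeably different answers, which is part of why the choice between them matters.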
I mean, depending on what you mean by "an okay approach sometimes... especially when you want to do something quick and dirty" I may agree with you! What I said was:
This is not Tetlock’s advice, nor is it the lesson from the forecasting tournaments, especially if we use the nebulous modern definition of “outside view” instead of the original definition.
I guess I was reacting to the part just after the bit you quoted
For an entire book written by Yudkowsky on why the aforementioned forecasting method is bogus
Which I took to imply "Daniel thinks that the aforementioned forecasting method is bogus". Maybe my interpretation was incorrect. Anyway, seems very possible we in fact roughly agree here.
Re your 1, 2, 3, 4: It seems cool to try doing 4, and I can believe it's better (I don't have a strong view). Fwiw re 1 vs 2, my initial reaction is that partitioning by outside/inside view lets you decide how much weight you give to each, and maybe we think that for non-experts it's better to mostly give weight to the outside view, so the partitioning performed a useful service. I guess this is kind of what you were trying to argue against and unfortunately you didn't convince me to repent :).
Here are some forecasts for near-term progress / impacts of AI on research. They are the results of some small-ish number of hours of reading + thinking, and shouldn't be taken at all seriously. I'm sharing in case it's interesting for people and especially to get feedback on my bottom line probabilities and thought processes. I'm pretty sure there are some things I'm very wrong about in the below and I'd love for those to be corrected.
Deepmind will announce excellent performance from Alphafold2 (AF2) or some successor / relative for multi-domain proteins by end of 2023; or some other group will announce this using some AI scheme: 80% probability
Deepmind will announce excellent performance from AF2 or some successor / relative for protein complexes by end of 2023; or some other group will announce this using some AI scheme: 70% probability
Widespread adoption of a system like OpenAI Codex for data analysis will happen by end of 2023: 20% probability
I realise that "excellent performance" etc is vague; I choose to live with that rather than putting in the time to make everything precise (or not doing the exercise at all).
If you don't know what multi-domain proteins and protein complexes are, I found this Mohammed Al Quraishi blog very useful (maybe try ctrl-f for those terms), although maybe you need to start with some relevant background knowledge. I don't have a great sense for how big a deal this would be for various areas of biological science, but my impression is that they're both roughly the same order of magnitude of usefulness as getting excellent performance on single-domain proteins was (i.e. what AF2 has already achieved).
As for why:
80% chance that excellent AI performance on multi-domain proteins is announced by end of 2023
There is quite a lot of data (though 10x less than for single proteins)
Unsure whether transfer learning (I may be using the wrong term) is relevant here?
Top reasons against
Maybe even if it's done by say mid 2023 it won't be announced until after 2023 because of Deepmind's media strategy
In particular, targeting a CASP would seem to require the high performance to be achieved by mid 2022; maybe this is the most likely scenario in worlds where Deepmind announces this before end of 2023
(although if Deepmind doesn't get there by CASP15, it seems like another group might announce something in say 2023)
Protein complexes are (maybe?) qualitatively different to single proteins
Other reasons against
Maybe the lack of data will be decisive
Maybe Deepmind's priorities will change, etc, as noted above in the multi-domain case
Are rival schemes targeting this?
20% chance of widespread adoption of a system like OpenAI Codex for data analysis by end of 2023
(NB this is just about data analysis / "data science" rather than about usage of Codex in general)
My "best guess" scenario
OpenAI releases an API for data science that is cheap but not free. In its current iteration, the software is "handy" but not more than that. A later iteration, released in 2023, is significantly more powerful and useful. But by the end of 2023 it is still not yet "widely used".
Some reasons against event happening
Maybe Codex is currently not that useful in practice for data analysis
I think OpenAI won't release it for free so it won't become part of the "standard toolkit" in the same way that e.g. RStudio has
Things like RStudio take a long time to diffuse / become adopted
E.g. my guess is ~5 years to get 25% uptake of ipython notebook or rstudio by data scientists, or something like that
How much are OpenAI going to push this on people?
How much are they pushing the data science aspect particularly?
Will this be ~free to use or will it be licensed?
How quickly will it improve? How often does OpenAI release improved versions of things?
How fast did ipython notebook/rstudio get adopted?
Separately, various people seem to think that the appropriate way to make forecasts is to (1) use some outside-view methods, (2) use some inside-view methods, but only if you feel like you are an expert in the subject, and then (3) do a weighted sum of them all using your intuition to pick the weights. This is not Tetlock’s advice, nor is it the lesson from the forecasting tournaments, especially if we use the nebulous modern definition of “outside view” instead of the original definition. (For my understanding of his advice and those lessons, see this post, part 5. For an entire book written by Yudkowsky on why the aforementioned forecasting method is bogus, see Inadequate Equilibria, especially this chapter. Also, I wish to emphasize that I myself was one of these people, at least sometimes, up until recently when I noticed what I was doing!)
This is a bit tangential to the main point of your post, but I thought I'd give some thoughts on this, partly because I basically did exactly this procedure a few months ago in an attempt to come to a personal all-things-considered view about AI timelines (although I did "use some inside-view methods" even though I don't at all feel like I'm an expert in the subject!).
I liked your AI Impacts post, thanks for linking to it! Maybe a good summary of the recommended procedure is the part at the very end. I do feel like it was useful for me to read it.
Tetlock describes how superforecasters go about making their predictions. Here is an attempt at a summary:
1. Sometimes a question can be answered more rigorously if it is first “Fermi-ized,” i.e. broken down into sub-questions for which more rigorous methods can be applied.
2. Next, use the outside view on the sub-questions (and/or the main question, if possible). You may then adjust your estimates using other considerations (‘the inside view’), but do this cautiously.
3. Seek out other perspectives, both on the sub-questions and on how to Fermi-ize the main question. You can also generate other perspectives yourself.
4. Repeat steps 1 – 3 until you hit diminishing returns.
5. Your final prediction should be based on an aggregation of various models, reference classes, other experts, etc.
I'm less sure about the direct relevance of Inadequate Equilibria for this, apart from it making the more general point that ~"people should be less scared of relying on their own intuition / arguments / inside view". Maybe I haven't scrutinised it closely enough.
To be clear, I don't think "weighted sum of 'inside views' and 'outside views'" is the gold standard or something. I just think it's an okay approach sometimes (maybe especially when you want to do something "quick and dirty").
If you strongly disagree (which I think you do), I'd love for you to change my mind! :)
This from Paul Christiano in 2014 is also very relevant (part of it makes similar points to a lot of the recent stuff from Open Philanthropy, but the arguments are very brief; it's interesting to see how things have evolved over the years): Three impacts of machine intelligence
(idea probably stolen from somewhere else) create an organisation employing an army of superforecasters to gather facts and/or forecasts about the world that are vitally important from an EA perspective.
Maybe it's hard to get to $100million? E.g. 400 employees each costing $250k would get you there, which (very naively) seems on the high end of what's likely to work well. Also e.g. other comments in this post have said that CSET was set up for $55m/5 years.
Here are some thoughts after reading a book called "The Inner Game of Tennis" by Timothy Gallwey. I think it's quite a famous book and maybe a lot of people know it well already. I consider it to be mainly about how to prevent your system 2/conscious mind/analytical mind from interfering with the performance of your system 1/subconscious mind/intuitive mind. This is explained in the context of tennis, but it seems applicable to many other contexts, as the author himself argues. If that sounds interesting, I recommend checking the book out, it's short and quite readable.
My interest in the book comes mainly from thinking about the best way to go about doing research, at a day-to-day level. Although the arguments of the book seem most directly applicable to learning a physical skill/activity and (to some extent) to performing well at key moments, I still think there are lessons for mental activities performed routinely, i.e. for activities like research.
I think reading the book has generally pushed me a bit more in favour of "trusting my system 1/intuitive mind" while doing research, e.g. trusting that my brain is doing some important processing when I feel inclined to just stare into space and not make any apparent progress towards whatever it is I'm trying to achieve at that moment. This feels pretty important.
I think Owen Cotton Barratt says some interesting things about trusting his intuition for prioritisation in this interview with Lynette Bye, which feels kind of related.
The book predates by many decades Kahneman's Thinking Fast and Slow, which (I think) popularised the concept of system 1 mind and system 2 mind. The book instead refers to "self 1" and "self 2" which seem to have roughly similar meanings, although unfortunately reversed: Gallwey's self 1 and Kahneman's system 2 refer to the conscious/analytical mind, while Gallwey's self 2 and Kahneman's system 1 refer to the subconscious/intuitive mind.
Here are some disorganised notes on bits that seemed worth highlighting (page numbers refer to 2015 edition published by Pan Books):
p13 mastering the mental side of tennis:
picture desired outcomes as clearly as possible
allow self 2 to perform and learn from successes and failures
learn to see non-judgementally: see what is happening rather than (just) seeing how well or badly it's happening
all subsidiary to the master skill: relaxed concentration
p38 "Remember that you are not your tennis game. You are not your body. Trust the body to learn and to play, as you would trust another person to do a job, and in a short time it will perform beyond your expectations. Let the flower grow."
p41 communicating with self 2
Gallwey exhorts the reader to trust their self 2 (system 1 / intuitive mind). But how can we be sure that self 2 will be optimising for the thing "we" (self 1) think is important? Gallwey gives 3 ways to convey to self 2 what the goal is, in the context of tennis:
Asking for results: visualise the exact path of the ball. Hold that image in your mind for several seconds
Asking for form: observe some particular aspect of your form (e.g. the flatness of your racket while it moves through the ball). Don't make an effort to make the change. Just visualise the change you want
Asking for qualities: imagine you are playing the role of a top tennis player on the court for a film
There are particular benefits of playing the role of someone very different to you
I'm not sure how to turn this into policies for doing research well. Things that seem interesting to explore: visualising the output you want at the start of the day; reflecting each day on how what you did links to your ultimate goals; picturing yourself as playing the role of a researcher you admire.
p80 on the "ego satisfaction" from a self-1-controlled success
Gallwey talks a lot about the ego satisfaction from self-1-controlled success.
In the context of research, this doesn't seem to ring true for me from my experience (maybe it's obviously true for tennis or similar activities for people who have experience there, I don't know).
p82 "Fighting the mind does not work. What works best is learning to focus it"
Gallwey talks about focussing on the seams of the ball and other techniques to focus the (self 1) mind on something kind of irrelevant while playing tennis so that the body and self 2 can perform without interference.
p87 on what focus is: "Focus is not achieved by staring hard at something. It is not trying to force focus, nor does it mean thinking hard about something. Natural focus occurs when the mind is interested. When this occurs, the mind is drawn irresistibly toward the object (or subject) of interest. It is effortless and relaxed, not tense and overly controlled."
Re research, this seems like good advice for tackling a difficult problem or making progress on some task. One related thing is that I find it much easier to "effortlessly focus" on what I think is important if I'm free of distractions.
p127 On managing stress. Pressures come at us from all corners: demands from partners, bosses, coaches, society, etc. These external demands can end up being internalised by self 1 and feeling as if they're things you really want, but this is an illusion.
(kind of reminds me of the message from another book I liked called Essentialism)
Takeaways from some reading about economic effects of human-level AI
I spent some time reading things that you might categorise as “EA articles on the impact of human-level AI on economic growth”. Here are some takeaways from reading these (apologies for not always providing a lot of context / for not defining terms; hopefully clicking the links will provide decent context).
Thanks for this, I think it's really brilliant, I really appreciate how clearly the details are laid out in the blog and report. It's really cool to be able to see external reviewer comments too.
I found it kind of surprising that there isn't any mention of civilizational collapse etc when thinking about growth outcomes for the 21st century (e.g. in Appendix G, but also apparently in your bottom line probabilities in e.g. Section 4.6 "Conclusion" -- or maybe it's there and I missed it / it's not explicit).
I guess your probabilities for various growth outcomes in Appendix G are conditional on ~no civilizational collapse (from any cause) and ~no AI-triggered fundamental reshaping of society that unexpectedly prevents growth? Or should I read them more as "conditional on ~no civilizational collapse etc other than due to AI", with the probability mass for AI-triggered collapse etc being incorporated into your "AI robots don't have a tendency to drive explosive growth because none of our theories are well-suited for this situation" and/or "an unanticipated bottleneck prevents explosive growth"?
I wrote this last Autumn as a private “blog post” shared only with a few colleagues. I’m posting it publicly now (after mild editing) because I have some vague idea that it can be good to make things like this public. Decision theories are a pretty well-worn topic in EA circles and I'm definitely not adding new insights here. These are just some fairly naive thoughts-out-loud about how CDT and EDT handle various scenarios. If you've already thought a lot about decision theory you probably won't learn anything from this.
The last two weeks of the Decision Theory seminars I’ve been attending have focussed on contrasting causal decision theory (CDT) and evidential decision theory (EDT). This seems to be a pretty active area of discussion in the literature - one of the papers we looked at was published this year, and another is yet to be published.
In terms of the history of the field, it seems that Newcomb’s problem prompted a move towards CDT (e.g. in Lewis 1981). I find that pretty surprising because to me Newcomb’s problem provides quite a bit of motivation for EDT, and without weird scenarios like Newcomb’s problem I think I might have taken something like CDT to be the default, obviously correct theory. But it seems like you didn’t need to worry about having a “causal aspect” to decision theories until Newcomb’s problem and other similar problems brought out a divergence in recommendations from (what became known as) CDT and EDT.
I guess this is a very well-worn area (especially in places like Lesswrong) but anyway I can’t resist giving my fairly naive take even though I’m sure I’m just repeating what others have said. When I first heard about things like Newcomb’s problem a few years ago I think I was a pretty ardent CDTer, whereas nowadays I am much more sympathetic to EDT.
In Newcomb’s problem, it seems pretty clear to me that one-boxing is the best option, because I’d rather have $1,000,000 than $1000. Seems like a win for EDT.
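Here is a rough sketch of how the two theories score the options in Newcomb's problem. I'm assuming the standard formulation (an opaque box containing $1,000,000 only if one-boxing was predicted, plus a transparent box with $1000). The predictor accuracy of 0.99 and the function names are my own illustrative assumptions.

```python
import math

BIG = 1_000_000  # opaque box contents if one-boxing was predicted
SMALL = 1_000    # transparent box contents


def edt_expected_value(action, accuracy):
    """EDT: treat your action as evidence about what was predicted."""
    if action == "one-box":
        # With probability `accuracy`, one-boxing was predicted and the box is full.
        return accuracy * BIG
    if action == "two-box":
        # The box is full only if the predictor got it wrong.
        return (1 - accuracy) * BIG + SMALL
    raise ValueError(f"unknown action: {action}")


def cdt_expected_value(action, p_box_full):
    """CDT: the box contents are already fixed, independent of your action."""
    base = p_box_full * BIG
    if action == "one-box":
        return base
    if action == "two-box":
        return base + SMALL
    raise ValueError(f"unknown action: {action}")


accuracy = 0.99  # hypothetical predictor accuracy
for action in ("one-box", "two-box"):
    print(action, "EDT:", edt_expected_value(action, accuracy))

# Under CDT, two-boxing comes out ahead by SMALL whatever your credence
# that the opaque box is full.
for p_full in (0.0, 0.5, 1.0):
    gap = cdt_expected_value("two-box", p_full) - cdt_expected_value("one-box", p_full)
    print("credence box is full:", p_full, "CDT advantage of two-boxing:", gap)
```

Under these assumptions EDT favours one-boxing whenever the predictor is even moderately accurate, while CDT favours two-boxing regardless of the credence you plug in.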
Dicing With Death is designed to give CDT problems, and in my opinion it does this very effectively. In Dicing With Death, you have to choose between going to Aleppo or Damascus, and you know that whichever you choose, death will have predicted your choice and be waiting for you (a very bad outcome for you). Luckily, a merchant offers you a magical coin which you can toss to decide where to go, in which case death won’t be able to predict where you go, giving you a 50% chance of avoiding death. The merchant will charge a small fee for this. However CDT gets into some strange contortions and as a result recommends against paying for the magical coin, even though the outcome if you pay for the magical coin seems clearly better. EDT recommends paying for the coin, another win for EDT.
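A very simple sketch of the evidential comparison in Dicing With Death. This doesn't attempt to reproduce CDT's reasoning; it only shows why paying for the coin looks better once you condition on your own choice. The utility numbers, the fee, and the assumption that death predicts perfectly are all illustrative.

```python
def survival_probability(buys_coin, predictor_accuracy=1.0):
    """Chance of avoiding death, conditional on your choice.

    With the coin, death can't predict where you go, so you survive
    half the time. Without it, you survive only if the prediction fails.
    """
    if buys_coin:
        return 0.5
    return 1.0 - predictor_accuracy


def expected_utility(buys_coin, value_of_life=1_000_000.0, coin_fee=1.0,
                     predictor_accuracy=1.0):
    """Expected utility conditional on buying (or not buying) the coin.

    All numeric defaults are hypothetical placeholders.
    """
    fee = coin_fee if buys_coin else 0.0
    return survival_probability(buys_coin, predictor_accuracy) * value_of_life - fee


print("pay for coin:", expected_utility(True))
print("don't pay:   ", expected_utility(False))
```

As long as the fee is small relative to the value of surviving, paying for the coin comes out ahead on this comparison.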
To me, The Smoking Lesion is a somewhat problematic scenario for EDT. Still, I feel like it’s possible for EDT to do fine here if you think carefully enough.
You could make the following simple model for what happens in The Smoking Lesion: in year 1, no-one knows why some people get cancer and some don’t. In year 2, it’s discovered that everyone who smokes develops cancer, and furthermore there’s a common cause (a lesion) that causes both of these things. Everyone smokes iff they have the lesion, and everyone gets cancer iff they have the lesion. In year 3, following the publication of these results, some people who have the lesion try not to smoke. Either (i) none of them can avoid smoking because the power of the lesion is too strong; or (ii) some of them do avoid smoking, but (since they still have the lesion) they still develop cancer. In case (i), the findings from year 2 remain valid even after everyone knows about them. In case (ii), the findings from year 2 are no longer valid: they just tell you about how the world would have been if the correlation between smoking and cancer wasn’t known.
The cases where you use the knowledge about the year 2 finding to decide not to smoke are exactly the cases where the year 2 finding doesn’t apply. So there’s no point in using the knowledge about the year 2 finding to not smoke: either your not smoking (through extreme self-control etc) is pointless because you still have the lesion and this is a case where the year 2 finding doesn’t apply, or it’s pointless because you don’t have the lesion.
So it seems to me like the right answer is to smoke if you want to, and I think EDT can recommend this by incorporating the fact that if you choose not to smoke purely because of the year 2 finding, this doesn’t give you any evidence about whether you have the lesion (though this is pretty vague and I wouldn’t be that surprised if making it more precise made me realise it doesn’t work).
In general it seems like these issues arise from treating the agent’s decision making process as being removed from the physical world - a very useful abstraction which causes issues in weird edge cases like the ones considered above.
I wrote this last Autumn as a private “blog post” shared only with a few colleagues. I’m posting it publicly now (after mild editing) because I have some vague idea that it can be good to make things like this public. It is quite rambling and doesn't really have a clear point (but I think it's at least an interesting topic).
Say you want to come up with a model for AI timelines, i.e. the probability of transformative AI being developed by year X for various values of X. You put in your assumptions (beliefs about the world), come up with a framework for combining them, and get an answer out. But then you’re not happy with the answer - your framework must have been flawed, or maybe on reflection one of your assumptions needs a bit of revision. So you fiddle with one or two things and get another answer - now it looks much better, close enough to your prior belief that it seems plausible, but not so close that it seems suspicious.
Is this kind of procedure valid? Here’s one case where the answer seems to be yes: if your conclusions are logically impossible, you know that either there’s a flaw in your framework or you need to revise your assumptions (or both).
A closely related case is where the conclusion is logically possible, but extremely unlikely. It seems like there’s a lot of pressure to revise something then too.
But in the right context revising your model in this way can look pretty dodgy. It seems like you’re “doing things the wrong way round” - what was the point of building the model if you were going to fiddle with the assumptions until you got the answer you expected anyway?
I think this is connected to a lot of related issues / concepts:
Option pricing models in finance: you start (both historically and conceptually) with the nice clean Black-Scholes model, which fails to explain actually observed option prices. Due to this, various assumptions are relaxed or modified, adding (arguably, somewhat ad hoc) complexity until, for the right set of parameters, the model gets all (sufficiently important) observed option prices right.
Regularisation / overfitting in ML: you might think of overfitting as “placing too much weight on getting the answer you expect”.
“One person's modus ponens is another’s modus tollens”: if we’re presented with a logical argument, usually the person presenting it wants us to accept the premises and agree that the argument is valid, in which case we must accept the conclusion. If we don’t like the conclusion, we often focus on showing that the argument is invalid. But if you think the conclusion is very unlikely, you also have the option of acknowledging the argument as valid, but rejecting one of the premises. There are lots of fun examples of this from science and philosophy on Gwern’s page on the subject.
“Begging the question”: a related accusation in philosophy that seems to mean roughly “your conclusion follows trivially from your premises but I reject one of your premises (and by the way it should have been obvious that I’d reject one of your premises so it was a waste of both my time and yours that you made this argument)”
Reductio ad absurdum: disprove something by using it as an assumption that leads to an implausible (or maybe logically impossible) conclusion
“Proving too much”: an accusation in philosophy that is supposed to count against the argument doing the “proving”.
(Not) updating your beliefs from an argument that appears convincing on the face of it: if the conclusions are implausible enough, you might not update your beliefs too much the first time you encounter the argument, even if it appears watertight.
Sanity checking your answer: check that the results of a complex calculation or experiment roughly match the result you get from a quick and crude approach.
Presumably, you could put this question of whether and how much to modify your model into some kind of formal Bayesian framework where on learning a new argument you update all your beliefs based on your prior beliefs in the premises, conclusion, and validity of the argument. I’m not sure whether there’s a literature on this, or whether e.g. highly skilled forecasters actually think like this.
In general though, it seems (to me) that there’s something important about “following where the assumptions / model takes you”. Maybe, given all the ways we fall short of being perfectly rational, we should (and I think that in fact we do) put more emphasis on this than a perfectly rational Bayesian agent would. Avoiding having a very strong prior on the conclusion seems helpful here.
One (maybe?) low-effort thing that could be nice would be saying "these are my top 5" or "these are listed in order of how promising I think they are" or something (you may well have done that already and I missed it).
Thanks, I think this is a great topic and this seems like a useful list (although I do find reading through 19 different types of options without much structure a bit overwhelming!).
I'll just ~repost a private comment I made before.
Encouraging and facilitating aspiring/junior researchers and more experienced researchers to connect in similar ways
This feels like an especially promising area to me. I'd guess there are lots of cases where this would be very beneficial for the junior researcher and at least a bit beneficial for the experienced researcher. It just needs facilitation (or something else, e.g. a culture change where people try harder to make this happen themselves, some strong public encouragement to juniors to make this happen, ...).
This isn't based on really strong evidence: it's mostly my own (limited) experience, plus the assumption that at least some experienced researchers are similar to me and that there are lots of excellent junior researcher candidates out there (the latter again from first-hand impressions).
Improving the vetting of (potential) researchers, and/or better “sharing” that vetting
This also seems like a big deal and an area where maybe you could improve things significantly with a relatively small amount of effort. I don't have great context here though.
I wrote this last Autumn as a private “blog post” shared only with a few colleagues. I’m posting it publicly now (after mild editing) because I have some vague idea that it can be good to make things like this public.
In this post I’m going to discuss two papers regarding imprecise probability that I read this week for a Decision Theory seminar. The first paper seeks to show that imprecise probabilities don’t adequately constrain the actions of a rational agent. The second paper seeks to refute that claim.
Just a note on how seriously to take what I’ve written here: I think I’ve got the gist of what’s in these papers, but I feel I could spend a lot more time making sure I’ve understood them and thinking about which arguments I find persuasive. It’s very possible I’ve misunderstood or misrepresented the points the papers were trying to make, and I can easily see myself changing my mind about things if I thought and read more.
Also, a note on terminology: it seems like “sharpness/unsharpness” and “precision/imprecision” are used interchangeably in these papers, as are “probability” and “credence”. There might well be subtle distinctions that I’m missing, but I’ll try to consistently use “imprecise probabilities” here.
I imagine there are (at least) several different ways of formulating imprecise probabilities. One way is the following: your belief state is represented by a set of probability functions, and your degree of belief in a particular proposition is represented by the set of values assigned to it by the set of probability functions. You also then have an imprecise expectation: each of your probability functions has an associated expected utility. Sometimes, all of your probability functions will agree on the action that has the highest expected value. In that case, you are rationally required to take that action. But if there’s no clear winner, that means there’s more than one permissible action you could take.
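To make the formulation above a bit more concrete, here is a minimal sketch for a single proposition H. It reads "permissible" as "has the highest expected utility under at least one probability function in your set", which is just one possible way of cashing out the idea and is my assumption. The actions, payoffs and probability values are hypothetical.

```python
import math


def expected_utility(p_h, payoff_if_h, payoff_if_not_h):
    """Expected utility of an action given a single probability for H."""
    return p_h * payoff_if_h + (1 - p_h) * payoff_if_not_h


def permissible_actions(credal_set, actions):
    """Return the actions that are best under at least one probability function.

    credal_set: iterable of probabilities for H (one per probability function).
    actions: dict mapping action name -> (payoff_if_h, payoff_if_not_h).
    """
    permissible = set()
    for p_h in credal_set:
        utilities = {name: expected_utility(p_h, *payoffs)
                     for name, payoffs in actions.items()}
        best = max(utilities.values())
        permissible |= {name for name, eu in utilities.items()
                        if math.isclose(eu, best, abs_tol=1e-12)}
    return permissible


# Hypothetical decision: bet on H (win 10 / lose 10) or abstain.
actions = {"bet_on_H": (10, -10), "abstain": (0, 0)}

# All probability functions agree, so there is a single required action.
print(permissible_actions([0.7, 0.9], actions))

# The functions disagree, so more than one action is permissible.
print(permissible_actions([0.3, 0.7], actions))
```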
Subjective Probabilities should be Sharp
The first paper, Subjective Probabilities should be Sharp, was written in 2010 by Elga. The central claim is that there’s no plausible account of how imprecise probabilities constrain which choices are reasonable for a perfectly rational agent.
The argument centers around a particular betting scenario: someone tells you “I’m going to offer you bet A and then bet B, regarding a hypothesis H”:
Bet A: win $15 if H, else lose $10
Bet B: lose $10 if H, else win $15
You’re free to choose whether to take bet B independently of whether you choose bet A.
Depending on what you believe about H, it could well be that you prefer just one of the bets to both bets. But it seems like you really shouldn’t reject both bets. Taking both bets guarantees you’ll win exactly $5, which is strictly better than the $0 you’ll win if you reject both bets.
But under imprecise probabilities, it’s rationally permissible to have some range of probabilities for H, which implies that it’s permissible to reject both bet A and bet B. So imprecise probabilities permit something which seems like it ought to be impermissible.
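The arithmetic behind the betting scenario can be sketched as follows. The payoffs are the ones given above; the particular range of probabilities (0.1 to 0.8) and the rule "rejecting a bet is permissible if some probability in your set gives it non-positive expected value" are illustrative assumptions on my part.

```python
# Payoffs as (winnings if H, winnings if not H), taken from the bets above.
BET_A = (15, -10)
BET_B = (-10, 15)


def expected_value(p_h, bet):
    """Expected winnings of a bet given probability p_h for H."""
    win_if_h, win_if_not_h = bet
    return p_h * win_if_h + (1 - p_h) * win_if_not_h


def rejecting_is_permissible(credal_set, bet):
    """Assumed rule: rejecting is permissible if at least one probability
    in the set rates the bet as no better than not betting."""
    return any(expected_value(p_h, bet) <= 0 for p_h in credal_set)


def both_bets_payoff(h_is_true):
    """Total winnings from accepting both bets."""
    index = 0 if h_is_true else 1
    return BET_A[index] + BET_B[index]


# Accepting both bets wins the same amount whether or not H is true.
print(both_bets_payoff(True), both_bets_payoff(False))

# Hypothetical imprecise agent: probabilities for H range from 0.1 to 0.8.
imprecise = [0.1, 0.8]
print(rejecting_is_permissible(imprecise, BET_A),
      rejecting_is_permissible(imprecise, BET_B))

# A sharp agent (here with probability 0.5) can't permissibly reject both.
sharp = [0.5]
print(rejecting_is_permissible(sharp, BET_A),
      rejecting_is_permissible(sharp, BET_B))
```

Bet A has positive expected value when the probability of H is above 0.4, and bet B when it is below 0.6, so any single sharp probability favours at least one of the bets. It's only a range of probabilities spanning both thresholds that makes rejecting each bet look permissible.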
Elga considers various rules that might be added to the initial imprecise probabilities-based decision theory, and argues that none of them are very appealing. I guess this isn’t as good as proving that there are no good rules or other modifications, but I found it fairly compelling on the face of it.
The rules that seemed most likely to work to me were Plan and Sequence. Both rules more or less entail that you should accept bet B if you already accepted bet A, in which case rejecting both bets is impermissible and it looks like the theory is saved.
Elga tries to show that these don’t work by inviting us to imagine the case where a particular agent called Sally faces the decision problem. Sally has imprecise probabilities, maximises expected utility and has a utility function that is linear in dollars.
Elga argues that in this scenario it just doesn’t make sense for Sally to accept bet B only if she already accepted bet A - the decision to accept bet B shouldn’t depend on anything that came before. It might do if Sally had some risk averse decision theory, or had a utility function that was concave in dollars - but by assumption, she doesn’t. So Plan and Sequence, which had seemed like the best candidates for rescuing imprecise probabilities, aren’t plausible rules for a rational agent like Sally.
Should Subjective Probabilities be Sharp?
The 2014 paper by Bradley and Steele, Should Subjective Probabilities be Sharp? is, as the name suggests, a response to Elga’s paper. The core of their argument is that the assumptions for rationality implied by Elga’s argument are too strong and that it’s perfectly possible to have rational choice with imprecise probabilities provided that you don’t make these too-strong assumptions.
I’ll highlight two objections and give my view.
Bradley and Steele give the label Retrospective Rationality to the idea that an agent’s sequence of decisions should not be dominated by another sequence the agent could have made. They seem to reject Retrospective Rationality as a constraint on rational decision making because “[it] is useless to an agent who is wondering what to do… [the agent] should be concerned to make the best decision possible at [the time of the decision]”.
My view: I don’t find this a very compelling argument, at least in the current context - it seems to me that the agent should avoid foreseeably violating Retrospective Rationality, and in Elga’s betting scenario the irrationality of the “reject both bets” sequence of decisions seems perfectly foreseeable.
Their second objection is that Elga is wrong to think that your current decision about whether to accept bet B should be unaffected by whether you previously accepted or rejected bet A (they make a similar point regarding the decision to take bet A with vs without the knowledge that you’re about to be offered bet B).
My view: it’s true that, because in Elga’s betting scenario the outcomes of the bets are correlated, knowing whether or not you previously accepted bet A might well change your inclination to accept bet B, e.g. because of risk aversion or a non-linear utility function. But to me it seems right that for an agent whose decision theory doesn’t include these features, it would be irrational to change their inclination to accept bet B based on what came before - and Elga was considering such an agent. So I think I side with Elga here.
Summary and some thoughts
In summary, in Subjective Probabilities should be Sharp, Elga illustrates how imprecise probabilities appear to permit a risk-neutral agent with linear utility to make irrational choices. In addition, Elga argues that there aren’t any ways to rescue things while keeping imprecise probabilities. In Should Subjective Probabilities be Sharp?, Bradley and Steele argue that Elga makes some implausibly strong assumptions about what it takes to be rational. I didn't find these arguments very convincing, although I might well have just failed to appreciate the points they were trying to make.
I think it basically comes down to this: for an agent with decision theory features like Sally’s, i.e. no risk aversion and linear utility, the only way to avoid passing up opportunities like making a risk-free $5 by taking bet A and bet B is if you’re always willing to take one side of any particular bet. The problem with imprecise probabilities is that they permit you to refrain from taking either side, which implies that you’re permitted to decline the risk-free $5.
The fan of imprecise probabilities can wriggle out of this by saying that you should be allowed to do things like taking bet B only if you just took bet A - but I agree with Elga that this just doesn’t make sense for an agent like Sally. I think the reason this might look overly demanding on the face of it is that we’re not like Sally - we’re risk averse and have concave utility. But agents who are risk averse or have concave utility are allowed both to sometimes decline bets and to take risk-free sequences of bets, even according to Elga’s rationality requirements, so I don’t think this intuition pushes against Elga’s rationality requirements.
It feels kind of useful to have read these papers, because
I’ve been kind of aware of imprecise probabilities and had a feeling I should think about them, and this has given me a bit of a feel for what they’re about.
It makes further reading in this area easier.
It’s good to get an idea of what sort of considerations people think about when deciding whether a decision theory is a good one. Similarly to when I dug more into moral philosophy, I now have more of a feeling along the lines of “there’s a lot of room for disagreement about what makes a good decision theory”.
Relatedly, it’s good to get a bit of a feeling of “there’s nothing really revolutionary or groundbreaking here and I should to some extent feel free to do what I want”.
Thanks for these comments and for the chat earlier!
It sounds like to you, AGI means ~"human minds but better"* (maybe that's the case for everyone who's thought deeply about this topic, I don't know). On the other hand, the definition I used here, "AI that can perform a significant fraction of cognitive tasks as well as any human and for no more money than it would cost for a human to do it", falls well short of that on at least some reasonable interpretations. I definitely didn't mean to use an unusually weak definition of AGI here (I was partly basing it on this seemingly very weak definition from LessWrong, i.e. "a machine capable of behaving intelligently over many domains"), but maybe I did.
Under at least some interpretations of "AI that can perform a significant fraction of cognitive tasks as well as any human and for no more money than it would cost for a human to do it", you don't (as I understand it) think that AGI strongly implies TAI; but my impression is that you don't think AGI under this definition is the right thing to analyse.
Given your AGI definition, I probably want to give a significantly larger probability to "AGI implies TAI" than I did in this post (though on an inside view I'm probably not in "90% seems on the low end" territory, having not thought about this enough to have that much confidence).
I probably also want to push back my AGI timelines at least a bit (e.g. by checking what AGI definitions my outside view sources were using; though I didn't do this very thoroughly in the first place so the update might not be very large).
*I probably missed some nuance here, please feel free to clarify if so.
But for P(TAI|AGI) your bottom line is very different from what most people in the community seem to think
Ah right, I get the point now, thanks. I suppose my P(TAI|AGI) is supposed to be my inside view as opposed to my all-things-considered view, because I'm using it only for the inside view part of the process. The only things that are supposed to be all-things-considered views are things that come out of this long procedure I describe (i.e. the TAI and AGI timelines). But probably this wasn't very clear.
Thanks, this was interesting. Reading this I think maybe I have a bit of a higher bar than you re what counts as transformative (i.e. at least as big a deal as the industrial revolution). And again, just to say I did give some probability to transformative AI that didn't act through economic growth. But the main thing that stands out to me is that I haven't really thought all that much about the different ways powerful AI might be transformative (as is also the case for almost everything else here too!).
Actually I think what distant strangers think can matter a lot to someone, if it corresponds to what they do being highly prestigious. The person experiences that directly through friends/family/random people they meet being impressed (etc).
I guess it's true that, if most of your friends/people you interact with already think EA is great, the effect is at least a bit weaker (maybe much weaker).
I like the point about "diluting the 'quality' of the movement" as being something that potentially biases people against movement growth, it wouldn't have occurred to me.
This still seems like a weaker effect to me than the one I described, but I guess this at least depends on how deeply embedded in EA the person we're thinking about is. And of course being deeply embedded in EA correlates strongly with being in a position to influence movement growth.
I guess I'm saying that getting into social justice is more like "instant gratification", and joining EA is more like "playing the long game" / "taking relative pain now for a huge payoff later".
Also / alternatively, maybe getting into social justice is impressing one group of people but making another group of people massively dislike you (and making a lot of people shrug their shoulders), whereas when the correctness of EA is known to all, having got in early will lead to brownie points from everyone.
So maybe the subgroup is "most people at some future time" or something?
(hopefully it's clear, but I'm ~trying to argue from the point of view of the post; I think this is fun to think about but I'm not sure how much I really believe it)
I think I might have got the >20% number from Ajeya's biological anchors report. Of course, I agree that, say, 18% growth for 20 years might also be at least as big a deal as the Industrial Revolution. It's just a bit easier to think about a particular growth level (for me anyway). Based on this, maybe I should give some more probability to the "high enough growth for long enough to be at least as big a deal as the Industrial Revolution" scenario than I did when I was thinking just about the 20% number. (Edit: just to be clear, I did also give some (though not much) probability to non-extreme-economic-growth versions of transformative AI)
I guess this wouldn't be a big change though so it's probably(?) not where the disagreement comes from. E.g. if people are counting 10% growth for 10 years as at least as big a deal as the Industrial Revolution I might start thinking that the disagreement mostly comes from definitions.
I see a potential tension between how much weight you give this claim within your framework, versus how much you defer to outside views
I don't know, for what it's worth I feel like it's pretty okay to have an inside view that's in conflict with most other people's and to still give a pretty big weight (i.e. 80%) to the outside view. (maybe this isn't what you're saying)
(and potentially even modest epistemology – gasp!)
Not sure I understood this, but the related statement "epistemic modesty implies Ben should give more than 80% weight to the outside view" seems reasonable. Actually maybe you're saying "your inside view is so contrarian that it is very inside view-y, which suggests you should put more weight on the outside view than would otherwise be the case", maybe I can sort of see that.
I really don't have strong arguments here. I guess partly from experience working on an automated trading system (i.e. actually trying to automate something), partly from seeing Robin Hanson arguing that automation has just been continuing at a steady pace for a long time (or something like that; possible I'm completely misremembering this). Partly from guessing that other people can be a bit naive here.
I didn't consciously choose to mostly(?) focus on EAs for my outside view, but I suppose ultimately it's because these are the sources I know about. I wasn't exactly trying to do a thorough survey of relevant literature / thinking here (as I hope was clear!).
I guess ~how much of a biased view that gives depends on how good the possible "non-EA" sources are. I guess I'd be kind of surprised if there were really good "non-EA" sources that I missed. I'd be very interested to hear about examples.
As for the term "outside view", I feel pretty confused about the inside vs outside view distinction, and doing this exercise didn't really help with my confusion :).
I also wanted to share a comment on this from Max Daniel (also from last Autumn) that I found very interesting.
But many EAs already have lots of close personal relationships with other EAs, and so they can already get social status by acting in ways approved by those peers. I'm not sure it helps if the number of distant strangers also liking these ideas grows.
I actually think that, if anything, 'hidden motives' on balance cause EAs to _under_value growth: It mostly won't feel that valuable because it has little effect on your day-to-day life, and it even threatens your status by recruiting competitors.
This is particularly true for proposed growth trajectories that would change the social dynamics of the movement. Most EAs enjoy abstract, intellectual discussions with other people who are smart and are politically liberal, so any proposal that would dilute the 'quality' of the movement or recruit a lot of conservatives is harmful for the enjoyment most current EAs derive from community interactions. (There may also be impartial reasons against such growth trajectories of course.)
I wrote this last Autumn as a private “blog post” shared only with a few colleagues. I’m posting it publicly now (after mild editing) because I have some vague idea that it can be good to make things like this public.
I recently finished listening to Kevin Simler and Robin Hanson’s excellent The Elephant in the Brain. Although I’d probably been exposed to the main ideas before, it got me thinking more about people’s hidden motivations for doing things.
In particular, I’ve been thinking a bit about the motives (hidden or otherwise) for being an Effective Altruist.
It would probably feel really great to save someone’s life by rescuing them from a burning building, or to rescue a drowning child as in Peter Singer’s famous drowning child argument, and so you might think that the feeling of saving a life is reward enough. I do think it would feel really great to pull someone from a burning building or to save a drowning child - but does it feel as great to save a life by giving $4500 to AMF? Not to me.
It’s not too hard to explain why saving someone from a burning building would feel better - you get to experience the gratitude from the person, their loved ones and their friends, for example. Simler and Hanson give an additional reason, or maybe the underlying reason, which I find quite compelling: when you perform a charitable act, you experience a benefit by showing others that you’re the kind of person who will look out for them, making people think that you’d make a good ally (friend, romantic partner, and so on). To be clear, this is a hidden, subconscious motive - according to the theory, you will not be consciously aware that you have this motive.
What explains Effective Altruism, then? Firstly I should say that I don’t think Simler and Hanson would necessarily argue that “true altruism” doesn’t exist - I think they’d say that people are complicated, and you can rarely use a single motive (hidden or not) to explain the behaviour of a diverse group of individuals. So true altruism may well be part of the explanation, even on their view as I understand it. Still, presumably true altruism isn’t the only motive even for really committed Effective Altruists.
One thing that seems true about our selfish, hidden motives is that they only work as long as they can remain hidden. So maybe, in the case of charitable behaviour, it’s possible to alert everyone to the selfish hidden motive: “if you’re donating purely because you want to help others, why don’t you donate to the Against Malaria Foundation, and do much more good than you do currently by donating to [some famous less effective charity]?” When everyone knows that there’s a basically solid argument for only donating to effective charities if you want to benefit others, when people donate to ineffective charities it’ll transparently be due to selfish motives.
Thinking along these lines, joining the Effective Altruism movement can be seen as a way to “get in at the ground floor”: if the movement is eventually successful in changing the status quo, you will get brownie points for having been right all along, and the Effective Altruist area you’ve built a career in will get a large prestige boost when everyone agrees that it is indeed effectively altruistic.
One fairly obvious (and hardly surprising) prediction you would make from this is that if Effective Altruism doesn’t look like it will grow further (either through community growth or through wider adoption of Effective Altruist ideas), you would expect Effective Altruists to feel significantly less motivated.
This in turn suggests that spreading Effective Altruist ideas might be important purely for maintaining motivation for people already part of the Effective Altruist community. This sounds pretty obvious, but I don’t really hear people talking about it.
Maybe this is a neglected source of interventions. This would make sense given the nature of the hidden motives Simler and Hanson describe - a key feature of these hidden motives is that we don’t like to admit that we have them, which is hard to avoid if we want to use them to justify interventions.
In any case, I don’t think that the existence of this motive for being part of the Effective Altruism movement is a particularly bad thing. We are all human, after all. If Effective Altruist ideas are eventually adopted as common sense partly thanks to the Effective Altruism movement, that seems like a pretty big win to me, regardless of what might have motivated individuals within the movement.
It would also strike me as a pretty Pinker-esque story of quasi-inevitable progress: the claim is that these (true) Effective Altruist beliefs will propagate through society because people like being proved right. Maybe I’m naive, but in this particular case it seems plausible to me.
Here are some notes I made while reading a transcript of a seminar called You and Your Research by Richard Hamming. (I'd previously read this article with the same name, but I feel like I got something out of reading this seminar transcript although there's a lot of overlap).
"Once you get your courage up and believe that you can do important problems, then you can"
In the Q&A he talks about researchers in the 40's and 50's naturally having courage after coming out of WW2.
Age makes you less productive because when you have prestigious awards you only work on 'big' problems
Bad working conditions can force you to be creative
You have to work very hard to succeed
"I spent a good deal more of my time for some years trying to work a bit harder and I found, in fact, I could get more work done"
"Just hard work is not enough - it must be applied sensibly."
On coping with ambiguity
"Great scientists… believe the theory enough to go ahead; they doubt it enough to notice the errors and faults so they can step forward and create the new replacement theory"
"If you are deeply immersed and committed to a topic, day after day after day, your subconscious has nothing to do but work on your problem"
"So the way to manage yourself is that when you have a real important problem you don't let anything else get the center of your attention - you keep your thoughts on the problem. Keep your subconscious starved so it has to work on your problem, so you can sleep peacefully and get the answer in the morning, free."
On thinking great thoughts
"Great Thoughts Time" from lunchtime on Friday
E.g. "What will be the role of computers in all of AT&T?", "How will computers change science?"
On having problems to try new ideas on:
Most great scientists "have something between 10 and 20 important problems for which they are looking for an attack. And when they see a new idea come up, one hears them say 'Well that bears on this problem.' They drop all the other things and get after it."
Having an open office door is better:
"if you have the door to your office closed, you get more work done today and tomorrow, and you are more productive than most. But 10 years later somehow you don't quite know what problems are worth working on; all the hard work you do is sort of tangential in importance."
On the importance of working hard
"The people who do great work with less ability but who are committed to it, get more done than those who have great skill and dabble in it, who work during the day and go home and do other things and come back and work the next day. They don't have the deep commitment that is apparently necessary for really first-class work"
On using commitment devices to create pressure on yourself to perform
"I found out many times, like a cornered rat in a real trap, I was surprisingly capable. I have found that it paid to say, 'Oh yes, I'll get the answer for you Tuesday,' not having any idea how to do it"
On putting yourself under stress
"if you want to be a great scientist you're going to have to put up with stress. You can lead a nice life; you can be a nice guy or you can be a great scientist. But nice guys end last, is what Leo Durocher said. If you want to lead a nice happy life with a lot of recreation and everything else, you'll lead a nice life."
"If you want to think new thoughts that are different, then do what a lot of creative people do - get the problem reasonably clear and then refuse to look at any answers until you've thought the problem through carefully how you would do it"
On being very successful over a long career
"Somewhere around every seven years make a significant, if not complete, shift in your field… When you go to a new field, you have to start over as a baby. You are no longer the big mukity muk and you can start back there and you can start planting those acorns which will become the giant oaks"
On vision and research management
"When your vision of what you want to do is what you can do single-handedly, then you should pursue it. The day your vision, what you think needs to be done, is bigger than what you can do single-handedly, then you have to move toward management"
I recently spent some time trying to work out what I think about AI timelines. I definitely don’t have any particular insight here; I just thought it was a useful exercise for me to go through for various reasons (and I did find it very useful!).
As it came out, I "estimated" a ~5% chance of TAI by 2030 and a ~20% chance of TAI by 2050 (the probabilities for AGI are slightly higher). As you’d expect me to say, these numbers are highly non-robust.
When I showed them the below plots a couple of people commented that they were surprised that my AGI probabilities are higher than my TAI ones, and I now think I didn’t think about non-AGI routes to TAI enough when I did this. I’d now probably increase the TAI probabilities a bit and lower the AGI ones a bit compared to what I’m showing here (by “a bit” I mean ~maybe a few percentage points).
I generated these numbers by forming an inside view, an outside view, and making some heuristic adjustments. The inside and outside views are ~weighted averages of various forecasts. My timelines are especially sensitive to how I chose and weighted forecasts for my outside view.
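A rough sketch of this kind of aggregation, purely for illustration. Every forecast value and weight below is a made-up placeholder, not one of the numbers behind the estimates above. The 80% outside-view weight is likewise just an illustrative choice, and the heuristic adjustments are not modelled. The helper name `weighted_average` is hypothetical.

```python
def weighted_average(forecasts):
    """Weighted average of (probability, weight) pairs."""
    total_weight = sum(weight for _, weight in forecasts)
    return sum(prob * weight for prob, weight in forecasts) / total_weight


# Hypothetical placeholder forecasts for "TAI by some year" as (probability, weight).
inside_forecasts = [(0.02, 1.0), (0.10, 1.0)]
outside_forecasts = [(0.15, 2.0), (0.30, 1.0), (0.25, 1.0)]

inside_view = weighted_average(inside_forecasts)
outside_view = weighted_average(outside_forecasts)

# Illustrative weight on the outside view (an assumption for this sketch).
OUTSIDE_WEIGHT = 0.8
combined = OUTSIDE_WEIGHT * outside_view + (1 - OUTSIDE_WEIGHT) * inside_view

print(f"inside={inside_view:.3f} outside={outside_view:.3f} combined={combined:.3f}")
```

Changing which forecasts go into `outside_forecasts`, or their weights, moves `combined` almost one-for-one when the outside view carries most of the weight, which is one way to see why the result is so sensitive to those choices.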
Here are my timelines in graphical form:
And here they are again alongside some other timelines people have made public:
If you want more detail, there’s a lot more in this google doc. I’ll probably write another shortform post with some more thoughts / reflections on the process later.
I wrote this last Summer as a private “blog post” just for me. I’m posting it publicly now (after mild editing) because I have some vague idea that it can be good to make things like this public. These rambling thoughts come from my very naive point of view (as it was in the Summer of 2020; not to suggest my present day point of view is much less naive). In particular if you’ve already read lots of moral philosophy you probably won’t learn anything from reading this.
Generally, reading various moral philosophy writings has probably made me (even) more comfortable trusting my own intuitions / reasoning regarding what “morality” is and what the “correct moral theory” is.
I think that, when you start engaging with moral philosophy, there’s a bit of a feeling that when you’re trying to reason about things like what’s right and wrong, which moral theory is superior to the other, etc, there are some concrete rules you need to follow, and (relatedly) certain words or phrases have a solid, technical definition that everyone with sufficient knowledge knows and agrees on. The “certain words or phrases” I have in mind here are things like “morally right”, “blameworthy”, “ought”, “value”, “acted wrongly”, etc.
To me right now, the situation seems a bit more like the following: moral philosophers (including knowledgeable amateurs, etc) have in mind definitions for certain words, but these definitions may be more or less precise, might change over time, and differ from person to person. And in making a “moral philosophy” argument (say, writing down an argument for a certain moral theory), the philosopher can use flexibility of interpretation as a tool to make their argument appear more forceful than it really is. Or, the philosopher’s argument might imply that certain things are self-evidently true, and the reader might be (maybe unconsciously) fooled into thinking that this is the case, when in fact it isn’t.
It seems to me now that genuinely self-evident truths are in very short supply in moral philosophy. And, now that I think this is the case, I feel like I have much more licence to make up my own mind about things. That feels quite liberating.
But it does also feel potentially dangerous. Of course, I don’t think it’s dangerous that *I* have freedom to decide what “doing good” means to me. But I might find it dangerous that others have that freedom. People can consider committing genocide to be “doing what is right” and it would be nice to have a stronger argument against this than “this conflicts with my personal definition of what good is”. And, of course, others might well think it’s dangerous that I have the freedom to decide what doing good means.
What does morality even mean?
Now that we’re in this free-for-all, even defining morality seems problematic.
I suppose I can make some observations about myself, like
When I see injustice in the world, I feel a strong urge to do something about it
When I see others suffering, I want to relieve that suffering
I have a strong intuition that it’s conscious experience that ultimately matters - “what you don’t know can’t hurt you” is, I think, literally true
And some conscious experiences are clearly very bad (and some are clearly very good)
And so on
I guess these things are all in the region of “wanting to improve the lives of others”. This sounds a lot like wanting to do what is morally good / morally praiseworthy, and seems at least closely related to morality.
In some ways, whether I label some of my goals and beliefs as being to do with “morality” doesn’t matter - either way, it seems clear that the academic field of moral philosophy is pretty relevant. And presumably when people talk about morality outside of an academic context, they’re at least sometimes talking about roughly the thing I’m thinking of.
Some initial thoughts on "Are We Living At The Hinge Of History?"
In the below I give a very rough summary of Will MacAskill’s article Are We Living At The Hinge Of History? and give some very preliminary thoughts on the article and some of the questions it raises.
I definitely don’t think that what I’m writing here is particularly original or insightful: I’ve thought about this for no more than a few days, any points I make are probably repeating points other people have already made somewhere, and/or are misguided, etc. This seems like an incredibly deep topic which I feel like I’ve barely scratched the surface of. Also, this is not a focussed piece of writing trying to make a particular point, it’s just a collection of thoughts on a certain topic.
(If you want to just see what I think, skip down to "Some thoughts on the issues discussed in the article")
The Hinge of History claim (HH): we are among the most influential people ever (past or future). Influentialness is, roughly, how much good a particular person at a particular time can do through direct expenditure of resources (rather than investment)
Two prominent longtermist EA views imply HH
Two worldviews prominent in longtermist EA imply that HH is true:
Time of Perils view: we live at a time of unusually high extinction risk, and we can do an unusual amount to reduce this risk
Value Lock-In view: we’ll soon invent a technology that allows present-day agents to assert their values indefinitely into the future (in the Bostrom-Yudkowsky version of this view, the technology is AI)
Arguments against HH
The base rates argument
Claim: our prior should be that we’re as likely as anyone else, past or future, to be the most influential person ever (Bostrom’s Self-Sampling Assumption (SSA)). Under this prior, it’s astronomically unlikely that any particular person is the most influential person ever.
Then the question is how much should we update from this prior
The standard of evidence (Bayes factor) required to favour HH is incredibly high. E.g. we need a Bayes factor of ~10^7 to move from a 1 in 100 million credence to a 1 in 10 credence. For comparison, a p=0.05 result from a randomised controlled trial gives a Bayes factor of 3 under certain reasonable assumptions.
The arguments for Time of Perils or Value Lock-In might be somewhat convincing; but hard to see how they could be convincing enough
E.g. our track record of understanding the importance of historical events is very poor
When considering how much to update from the prior, we should be aware that there are biases that will tend to make us think HH is more likely than it really is
Counterargument 1: we only need to be at an enormously influential time, not the most influential, and the implications are ~the same either way
Counter 1 to counterargument 1: the Bostrom-Yudkowsky view says we’re at the most influential time ever, so you should reject the Bostrom-Yudkowsky view if you’re abandoning the idea that we’re at the most influential time ever. So there is a material difference between “enormously influential time” and “most influential time”.
Counter 2 to counterargument 1: if we’re not at the most influential time, presumably we should transfer our resources forward to the most influential time, so the difference between “enormously influential time” and “most influential time” is highly action-relevant.
Counterargument 2: the action-relevant thing is the influentialness of now compared to any time we can pass resources on to
Again the Bostrom-Yudkowsky view is in conflict with this
But MacAskill concedes that it does seem right that this is the action-relevant thing. So e.g. we could assume we can only transfer resources 1000 years into the future and define Restricted-HH: we are among the most influential people out of the people who will live over the next 1000 years
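As a quick check on the size of the Bayes factor mentioned in the base rates argument, here is a minimal sketch using the odds form of Bayes' rule (posterior odds = Bayes factor × prior odds). The function names are just illustrative.

```python
def odds(probability):
    """Convert a probability into odds in favour."""
    return probability / (1 - probability)


def required_bayes_factor(prior, posterior):
    """Bayes factor needed to move from a prior credence to a posterior credence."""
    return odds(posterior) / odds(prior)


# From a 1 in 100 million credence to a 1 in 10 credence.
bayes_factor = required_bayes_factor(prior=1e-8, posterior=0.1)
print(f"{bayes_factor:.2e}")
```

This comes out at roughly 1.1 × 10^7, i.e. on the order of ten million, compared with the factor of about 3 quoted for a p=0.05 trial result.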
The inductive argument
Claim: The influentialness of comparable people has been increasing over time, and we should expect this to continue, so the influentialness of future people who we can pass resources onto will be greater
Evidence: if we consider the state of knowledge and ethics in 1600 vs today, or in 1920/1970 vs today, it seems clear that we have more knowledge and better ethics now than we did in 1600 or in 1920/1970
And it seems clear that there are huge gaps in our knowledge today (so it doesn’t seem that we should expect this trend to break)
Arguments for HH
Argument 1: we’re living on a single planet, implying greater influentialness
Implies particular vulnerabilities e.g. asteroid strikes
Implies individual people have an unusually large fraction of total resources
Implies instant global communication
Asteroids are not a big risk
For other prominent risks like AI or totalitarianism, being on multiple planets doesn’t seem to help
We might well have quite a long future period on earth (1000s or 10,000s of years), which makes being on earth now less special
And in the early stages of space settlement the picture isn’t necessarily that relevantly different to the single planet one
Argument 2: we’re now in a period of unusually fast economic and tech progress, implying greater influentialness. We can’t maintain the present-day growth rate indefinitely.
MacAskill seems sympathetic to the argument, but says it implies not that today is the most important time, but that the most important time might be some time in the next few thousand years
Also, maybe longtermist altruists are less influential during periods of fast economic growth because rapid change makes it harder to plan reliably
And comparing economic power across long timescales is difficult
A few other arguments for HH are briefly touched on in a footnote: that existential risk / value lock-in lowers the number of future people in the reference class for the influentialness prior; that we might choose other priors that are more favourable for HH, and that earlier people can causally affect more future people
Some quick meta-level thoughts on the article
I wish it had a detailed discussion about choosing a prior for influentialness, which I think is really important.
There’s a comment that the article ignores the fact that the annual risk of extinction or lock-in in the future has implications for present-day influentialness because in Trammell’s model this is incorporated into the pure rate of time preference. I find that pretty weird. Trammell’s model is barely referenced elsewhere in the paper so I don’t really see why we should neglect to discuss something just because it happens to be interpreted in a certain way within his model. Maybe I missed the point here.
Some thoughts on the issues discussed in the article
Two main points from the article
It kind of feels like there are two somewhat independent things that are most interesting from the article:
1. The claim: we should reject the Time of Perils view, and the Bostrom-Yudkowsky view, because in both cases the implication for our current influentialness is implausible
2. The question: what do high level / relatively abstract arguments tell us about whether we can do the most good by expending resources now or by passing resources on to future generations?
Avoiding rejecting the Time of Perils and Bostrom-Yudkowsky views
I think there are a few ways we can go to avoid rejecting the Time of Perils and Bostrom-Yudkowsky views
We can find the evidence in favour of them strong enough to overwhelm the SSA prior through conventional Bayesian updating
We can find the evidence in favour of them weaker than in the previous case, but still strong enough that we end up giving them significant credence in the face of the SSA prior, through some more forgiving method than Bayesian updating
We can use a different prior, or claim that we should be uncertain between different priors
Or we can just turn the argument (back?) around, and say that the SSA prior is implausible because it implies such a low probability for the Time of Perils and Bostrom-Yudkowsky views. Toby Ord seems to say something like this in the comments to the EA Forum post (see point 3).
A nearby alternative is to modify the Time of Perils and Bostrom-Yudkowsky views a bit so that they don’t imply we’re among the most influential people ever. E.g. for the Bostrom-Yudkowsky view we could make the value lock-in a bit “softer” by saying that for some reason, not necessarily known/stated, the lock-in would probably end after some moderate (on cosmological scales) length of time. I’d guess that many people might find a modified view more plausible even independently of the influentialness implications.
I’m not really sure what I think here, but I feel pretty sympathetic to the idea that we should be uncertain about the prior and that this maybe lends itself to having not too strong a prior against the Time of Perils and Bostrom-Yudkowsky views.
On the question of whether to expend resources now or later
The arguments MacAskill discusses suggest that the relevant time frame is the next few thousand years, because the next few thousand years seem (in expectation) to have especially high influentialness and because it might be effectively impossible to pass our resources further into the future.
It seems like the pivotal importance of priors on influentialness (or similar) then evaporates: it no longer seems that implausible on the SSA prior that now is a good time to expend resources rather than save. E.g. say there’ll be a 20 year period in the next 1000 years where we want to expend philanthropic resources rather than save them to pass on to future generations. Then a reasonable prior might be that we have a 20/1000 = 1 in 50 chance of being in that period. That’s a useful reference point and is enough to make us skeptical about arguments that we are in such a period, but it doesn’t seem overwhelming. In fact, we’d probably want to spend at least some resources now even purely based on this prior.
In particular, it seems like some kind of detailed analysis is needed, maybe along the lines of Trammell’s model or at least using that model as a starting point. I think many of the arguments in MacAskill’s article should be part of that detailed analysis, but, to stress the point, they don’t seem decisive to me.
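To make the 1 in 50 reference point above a bit more concrete, here is a minimal sketch of how such a prior responds to evidence under ordinary Bayesian updating in odds form. The 20-year window and 1000-year horizon come from the example above; the Bayes factors are purely hypothetical numbers chosen for illustration, not estimates of how strong the actual evidence is.

```python
def prior_from_window(window_years: float, horizon_years: float) -> float:
    """Prior probability of being inside the special window,
    assuming we are equally likely to be anywhere in the horizon."""
    return window_years / horizon_years


def posterior(prior: float, bayes_factor: float) -> float:
    """Bayesian update in odds form: posterior odds = prior odds * Bayes factor."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1.0 + posterior_odds)


prior = prior_from_window(20, 1000)  # 20/1000 = 0.02, i.e. 1 in 50

# Hypothetical strengths of evidence that "now" is inside the window.
for bayes_factor in (1, 5, 49):
    print(bayes_factor, round(posterior(prior, bayes_factor), 3))
```

With a prior of 1 in 50 the prior odds are 1:49, so evidence with a Bayes factor of about 49 would already bring us to even odds. That is a meaningful hurdle, but nothing like the hurdle implied by a prior over all people who will ever live.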
In the article, the Inductive Argument is supported by the idea of moral progress: MacAskill cites the apparent progress in our moral values over the past 400 years as evidence for the idea that we should expect future generations to have better moral values than we do. Obviously, whether we should expect moral progress in the future is a really complex question, but I’m at least sympathetic to the idea that there isn’t really moral progress, just moral fashions (so societies closer in time to ours seem to have better moral values just because they tend to think more like us).
Of course, if we don’t expect moral progress, maybe it’s not so surprising that we have very high influentialness: if past and future actors don’t share our values, it seems very plausible on the face of it that we’re better off expending our resources now than passing them off to future generations in the hope they’ll carry out our wishes. So maybe MacAskill’s argument about influentialness should update us away from the idea of moral progress?
But if we’re steadfast in our belief in moral progress, maybe it’s not so surprising that we have high influentialness because we find ourselves in a world where we are among the very few with a longtermist worldview, which won’t be the case in the future as longtermism becomes a more popular view. (I think Carl Shulman might say something like this in the comments to the original EA Forum post)
My overall take
I think “how plausible is this stuff under an SSA prior” is a useful perspective
Still, thinking about this hasn’t caused me to completely dismiss the Time of Perils View or the Bostrom-Yudkowsky view (I probably already had some kind of strong implausibility prior on those views).
The arguments in the article are useful for thinking about how much (e.g.) the EA longtermist community should be spending rather than saving now, but a much more detailed analysis seems necessary to come to a firm view on this.
A quote to finish
I like the way the article ends, providing some motivation for the Inductive Argument in a way I find appealing on a gut level:
Just as our powers to grow crops, to transmit information, to discover the laws of nature, and to explore the cosmos have all increased over time, so will our power to make the world better — our influentialness. And given how much there is still to understand, we should believe, and hope, that our descendents look back at us as we look back at those in the medieval era, marvelling at how we could have got it all so wrong.
Some initial thoughts on moral realism vs anti-realism
I wrote this last Summer as a private “blog post” just for me. I’m posting it publicly now (after mild editing) because I have some vague idea that it can be good to make things like this public. These thoughts come from my very naive point of view (as it was in the Summer of 2020; not to suggest my present day point of view is much less naive). In particular if you’ve already read lots of moral philosophy you probably won’t learn anything from reading this. Also, I hope my summaries of other people’s arguments aren’t too inaccurate.
Recently, I’ve been trying to think seriously about what it means to do good. A key part of Effective Altruism is asking ourselves how we can do the most good. Often, considering this question seems to be mostly an empirical task: how many lives will be saved through intervention A, and how many through intervention B? Aside from the empirical questions, though, there are also theoretical ones. One key consideration is what we mean by doing good.
There is a branch of philosophy called moral philosophy which is (partly) concerned with answering this question.
It’s important to me that I don’t get too drawn into the particular framings that have evolved within the academic discipline of moral philosophy, which are, presumably, partly due to cultural or historical forces, etc. This is because I really want to try to come up with my own view, and I think that (for me) the best process for this involves not taking other people’s views or existing works too seriously, especially while I try to think about these things seriously for the first time.
Still, it seems useful to get familiar with the major insights and general way of thinking within moral philosophy, because
I’ll surely learn a lot of useful stuff
I’ll be able to communicate with other people who are familiar with moral philosophy (which probably includes most of the most interesting people to talk to on this topic).
I’ve read a couple of Stanford Encyclopedia of Philosophy articles, and a series of posts by Lukas Gloor arguing for moral anti-realism.
I found the Stanford Encyclopedia of Philosophy articles fairly tough going but still kind of useful. I thought the Gloor posts were great.
The Gloor posts have kind of convinced me to take the moral anti-realist side, which, roughly, denies the existence of definitive moral truths.
While I suppose I might consider my “inside view” to be moral anti-realist at the moment, I can easily see this changing in the future. For example, I imagine that if I read a well-argued case for moral realism, I might well change my mind.
In fact, prior to reading Gloor’s posts, I probably considered myself to be a moral realist. I think I’d heard arguments, maybe from Will MacAskill, along the lines that i) if moral anti-realism is true, then nothing matters, whereas if realism is true, you should do what the true theory requires you to do, and ii) there’s some chance that realism is true, therefore iii) you should do what the true theory requires you to do.
Gloor discusses an argument like this in one of his posts. He calls belief in moral realism founded on this sort of argument “metaethical fanaticism” (if I’ve understood him correctly).
I’m not sure that I completely understood everything in Gloor’s posts. But the “fanaticism” label does feel appropriate to me. It feels like there’s a close analogy with the kinds of fanaticism that utilitarianism is susceptible to, for example. An example of that might be a Pascal’s wager type argument - if there’s a finite probability that I’ll get infinite utility derived from an eternal life in a Christian heaven, I should do what I can to maximise that probability.
It feels like something has gone wrong here (although admittedly it’s not clear what), and this Pascal’s wager argument doesn’t feel at all like a strong argument for acting as if there’s a Christian heaven. Likewise, the “moral realist wager” doesn’t feel like a strong argument for acting as if moral realism is true, in my current view.
Gloor also argues that we don’t lose anything worth having by being moral anti-realists, at least if you’re his brand of moral anti-realist. I think he calls the view he favours “pluralistic moral reductionism”.
On his view, you take any moral view (or maybe combination of views) you like. These can (and maybe for some people, “should”) be grounded in our moral intuitions, and maybe use notions of simplicity of structure etc, just as a moral realist might ground their guess(?) at the true moral theory in similar principles. Your moral view is then your own “personal philosophy”, which you choose to live by.
One unfortunate consequence of this view is that you don’t really have any grounds to argue with someone else who happens to have a different view. Their view is only “wrong” in the sense that it doesn’t agree with yours; there’s no objective truth here.
From this perspective, it would arguably be nicer if everyone believed that there was a true moral view that we should strive to follow (even if we don’t know what it is). Especially if you also believe that we could make progress towards that true moral view.
I’m not sure how big this effect is, but it feels like more than nothing. So maybe I don’t quite agree that we don’t lose anything worth having by being moral anti-realists.
In any case, the fact that we might wish that moral realism is true doesn’t (presumably) have any bearing on whether or not it is true.
I already mentioned that reading Gloor’s posts has caused me to favour moral anti-realism. Another effect, I think, is that I am more agnostic about the correct moral theory. Some form of utilitarianism, or at least consequentialism, seems far more plausible to me as the moral realist “one true theory” than a deontological theory or virtue ethics theory. Whereas if moral anti-realism is correct, I might be more open to non-consequentialist theories. (I’m not sure whether this new belief would stand up to a decent period of reflection, though - maybe I’d be just as much of a convinced moral anti-realist consequentialist after some reflection).
Thanks for doing this and for doing the 80k podcast, I enjoyed the episode.
What are some longtermist cause areas other than AI, biorisk and cause prioritisation that you'd be keen to see more work on?
I gather that Open Phil has grown a lot recently. Can you say anything about the growth and hiring you expect for Open Phil over the next say 1-3 years? E.g. would you expect to hire lots more generalists, or maybe specialists in new cause areas, etc.
I haven't thought about this angle very much, but it seems like a good angle which I didn't talk about much in the post, so it's great to see this comment.
I guess the question is whether you can take the model, including the optimal allocation assumption, as corresponding to the world as it is plus some kind of (imagined) quasi-effective global coordination in a way that seems realistic. It seems like you're pretty skeptical that this is possible (my own inside view is much less certain about this but I haven't thought about it that much).
One thing that comes to mind is that you could incorporate spending on dangerous tech by individual states for self-defence into the model's hazard rate equation through epsilon - it seems like the risk from this should probably increase with consumption (easier to do it if you're rich), so it doesn't seem that unreasonable. Not sure whether this is getting to the core of the issue you've raised, though.
I suppose you can also think about this through the "beta and epsilon aren't really fixed" lens that I put more emphasis on in the post. It seems like greater / less coordination (generally) implies more / less favourable epsilon and beta, within the model.
Thanks for this, it's pretty interesting to get your perspective as someone who's been (I presume) heavily engaged in the community for some time. I thought your other post on the All-Party Parliamentary Group for Future Generations was awesome, by the way.
You asked for comments including "small" thoughts so here are some from me, for what they're worth. These are my current views which I can easily see changing if I were to think about this more etc.
I think I basically agree that there doesn't seem to have been much progress in cause prioritisation in say the last five years, compared to what you might have hoped for.
(mainly written to clarify my own thoughts:) It seems like you can do cause prioritisation work either by comparing different causes, or by investigating a particular cause (especially a cause that's relatively unknown or poorly investigated), or by doing more "foundational" things like asking "what is moral value anyway?", "how should we compare options under uncertainty", etc.
My impression is that the Effective Altruism community has invested a significant amount of resources into cause prioritisation research, and that the relative lack of progress is because it's hard
The Global Priorities Institute is basically doing cause prioritisation (as far as I know, and by the vague definition of cause prioritisation I have in my head) - maybe it's more on the foundational / academic field building side (i.e. fleshing out and formally writing up existing arguments), but my impression is that it's mostly stuff that seems worth working through to work out how to do the most good
I think you could give the cause prioritisation label to some of the work from the Future of Humanity Institute's macrostrategy team(?)
Open Philanthropy Project spends a lot of their resources doing some version of this, as you noted
Rethink Priorities is basically doing this (though I might agree with you that it would be better if they were able to compare across causes rather than investigating a particular cause)
I'd consider work on forecasting / understanding AI progress, as is done by e.g. AI Impacts, to be cause prioritisation
The above (which is probably far from comprehensive) seems like a decent fraction of the resources of the "longtermist" part of the community (the part I'm familiar with). I suppose I lean towards wanting a larger fraction of resources allocated to cause prioritisation, but I don't think it's that obvious either way. Anyway, regardless of whether the right fraction of resources have been spent on this, I think it's just very hard and that this explains a lot of what you're describing.
Maybe one reason there's not much work comparing causes in particular is that there's so much uncertainty, which makes it very difficult to do well enough that the output is valuable. In particular
people don't agree on empirical issues that can radically alter the relative importance of different causes (e.g. AI timelines)
people don't agree on "the correct moral theory" / whatever the ultimate objective is / what you ~call "different views"
Edit: reading the above you could probably get the impression that I think you're wrong to "raise the alarm" about the need for more / different cause prioritisation, but I don't think that at all. I think I'm pretty sympathetic to most of what you wrote.
This post contains some notes that I wrote after ~ 1 week of reading about Certificates of Impact as part of my work as a Research Scholar at the Future of Humanity Institute, and a bit of time after that thinking and talking about the idea here and there.
In this post, I
describe what Certificates of Impact are, including a concrete proposal,
provide some lists of ways that it might be good or bad, and reasons it might or might not work,
provide some other miscellaneous thoughts relevant to future work on Certificates of Impact, and
provide some links to relevant resources.
I’m sharing this here in case it’s useful - the intended audience is people who are curious about what Certificates of Impact are, and (to some extent) people who are thinking seriously about Certificates of Impact.
Note that, since I haven’t invested much time thinking about Certificates of Impact, my understanding of this area is fairly shallow. I’ve tried to include appropriate caveats in the text to reflect this, but I might not have always succeeded, so please bear this in mind.
What Certificates of Impact are
Within this document, I’m using Certificates of Impact to refer to the general idea of creating a market in altruistic impact. I think that the general idea is also referred to as Impact Certificates, Tradeable Altruistic Impact, and Impact Purchases.
Certificates of Impact is an idea that's been floating around in the Effective Altruism community for some time. Paul Christiano and Katja Grace ran an experiment with Certificates of Impact about 5 years ago. I've seen various EA forum posts about Certificates of Impact too (see the final section of this post for some links).
By a market in altruistic impact, I mean something like the following: we imagine a future where there are people who want to donate to charity, and there are people who are doing high impact projects, and rather than them making the effort to seek each other out, they connect through this market. In the market, the individuals or organisations doing the projects issue Certificates of Impact, and donors buy them. And maybe as a donor you don't need to try so hard to find the best project, you just buy some certificates from some marketplace; and as someone doing a high impact project, you don't have to work so hard to connect to donors, because you find that there are profit seeking organisations that are willing to buy your certificates, and that's your source of funding.
Note both that the above is quite vague, and that there are probably some aspects you could change and still have something that falls under Certificates of Impact.
A semi-concrete proposal
There are lots of varieties of Certificates of Impact-type systems that could be tried. To make things easier, from now on in this document I’ll assess a concrete proposal called Certificates of Impact with Dedication (idea due to Owen Cotton-Barratt):
A market is created / exists where someone can issue a Certificate for work they believe to be altruistically impactful. We call this person the Issuer. There is a statute of limitations on issuing Certificates of two years (i.e. Certificates can’t be issued for work more than two years old). The Certificate is assessed by a Validator who confirms that the work specified on the Certificate has in fact been done. The Issuer then sells the Certificate in the market, maybe via an auction mechanism. Note that the Certificate can refer to some percentage of the project, so for example it might represent 40% of the altruistic impact of a project, while the Issuer keeps the other 60%. The Certificate is traded on the secondary market by professional traders and then bought by an Ultimate Buyer who is the ultimate consumer of the Certificate. The Ultimate Buyer then Dedicates the Certificate, possibly to themselves so that they get the credit for the altruistic impact.
Whoever the certificate gets Dedicated to is the one who gets the credit for the counterfactual altruistic impact of the project that the certificate refers to. And once a certificate has been Dedicated it can't be traded anymore, so Dedication is its end point. Importantly (in my view), if you don't have a Dedication mechanism it's not clear whether people who own Certificates of Impact have bought them so that they can resell them at a profit, or because they want to have altruistic impact.
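As a rough illustration of the lifecycle described above (issue, validate, trade, Dedicate), here is a minimal sketch in Python. It is not part of the original proposal: the class name, field names and error messages are all hypothetical, and the rule that a Certificate must be validated before it can change hands is my reading of the ordering in the description rather than something the proposal states explicitly.

```python
from dataclasses import dataclass, field
from typing import Optional

STATUTE_OF_LIMITATIONS_YEARS = 2  # from the proposal: no Certificates for work older than this


@dataclass
class Certificate:
    issuer: str
    project: str
    share: float            # fraction of the project's impact, e.g. 0.4 for 40%
    work_age_years: float   # how long ago the work was done
    owner: str = field(init=False)
    validated: bool = False
    dedicated_to: Optional[str] = None

    def __post_init__(self) -> None:
        if not 0 < self.share <= 1:
            raise ValueError("share must be in (0, 1]")
        if self.work_age_years > STATUTE_OF_LIMITATIONS_YEARS:
            raise ValueError("work is past the statute of limitations")
        self.owner = self.issuer  # the Issuer holds the Certificate at first

    def validate(self) -> None:
        """A Validator confirms the work on the Certificate has been done."""
        self.validated = True

    def transfer(self, new_owner: str) -> None:
        """Sale or trade. Assumed to require validation; blocked after Dedication."""
        if not self.validated:
            raise RuntimeError("certificate has not been validated")
        if self.dedicated_to is not None:
            raise RuntimeError("dedicated certificates cannot be traded")
        self.owner = new_owner

    def dedicate(self, beneficiary: str) -> None:
        """End point: assigns credit for the impact and freezes the Certificate."""
        if self.dedicated_to is not None:
            raise RuntimeError("certificate is already dedicated")
        self.dedicated_to = beneficiary


cert = Certificate(issuer="Issuer", project="Example project", share=0.4, work_age_years=1.0)
cert.validate()
cert.transfer("Trader")          # secondary market
cert.transfer("Ultimate Buyer")
cert.dedicate("Ultimate Buyer")  # the buyer takes the credit themselves
```

The point of the sketch is just that Dedication is a one-way state change: before it, ownership says nothing about whether the holder wants the impact or wants to resell; after it, the Certificate can no longer move.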
Brainstorm-style lists of ways Certificates of Impact might go well or badly
In this section, I list ways Certificates of Impact might go well or badly, or why it might or might not work. Generally, I’ve tried to err on the side of including things even where I consider them to be very speculative.
Note that my opinions, where I give them, are pretty unstable: I can easily imagine myself changing my mind on reflection or after seeing new arguments.
How good might this be?
Let’s imagine that a well-functioning Certificates of Impact with Dedication system exists with a large number of active participants including profit-seeking intermediaries. I list the ways this could be good below.
Here is a summary of the possible benefits. For each one, I’ve put my opinion regarding how likely it is in brackets.
More efficient allocation of time and money dedicated to altruistically impactful work. (I think this is the most likely benefit)
Improved quality of people working on altruistically impactful work. (I think this is plausible)
Larger pool of donations towards altruistically impactful work. (I think this is plausible)
Assessments of impact can be deferred to the future. (I think this is guaranteed under the proposal I’m using)
Here’s more detail
Resources intended for altruistic impact are allocated more efficiently (in the sense of getting higher impact for the same quantity of input resources through better allocation)
This applies to both cash from donors (the Ultimate Buyers), and cash and non-cash resources, like hours of work, from people working to create altruistic impact (the Issuers).
This comes about from having a well-functioning market in Certificates of Impact. Markets are great at efficient allocation of resources.
Profit-seeking organisations bring prices to correct levels through expert analysis
Financial security analysts
Funding (through ordinary finance world) for altruistically impactful projects / start ups at the start (possibly easier if you can issue certificates for a project before it starts, but this isn’t essential)
Lower the barrier for / correctly incentivise risky altruistic projects
Work intended for altruistic impact gets focussed on the best cause areas (through financial incentive / existence of funding)
I have a bit of a worldview that money is an incredibly effective signal and motivator for getting people to work on stuff - see the army of smart graduates going into law, banking, consultancy, accountancy, etc.
Funding makes altruistically impactful startups sexier -> easier to attract top talent
Larger pool of donations to effective charities through presence of large and salient Certificates of Impact market, which makes it more culturally usual to donate to effective charities
Probably second-order benefits too (get people to think about what is effective)
Some deference to the future since prices today are set by expectations of how things will be valued in the future
Good because it’s easier to assess things after the fact
Also good to the extent that we trust future people’s moral judgements more than our own
How feasible is this?
I list below considerations for thinking about how feasible it is to get to a state where this is big and being used by lots of people (whether it is actually achieving the desired outcomes of improved efficiency etc or not).
Some kind of critical mass will be necessary to get this off the ground.
Not clear to me what size is necessary
Related: lack of standardisation might make it hard to create a liquid market in the certificates.
Related: getting profit-seeking entities involved might require a huge, standardised market. I’m not sure whether the best/most likely version of Certificates of Impact includes profit-seeking entities.
People need to trust and understand what the Certificates of Impact are supposed to represent.
(maybe?) People need to feel that the Certificates of Impact really represent causal impact - buying a Certificate of Impact causes the impact to happen.
Assessing the value of the projects needs to be feasible.
This needs to work under a mix of altruistic preferences (I think this is probably not an issue, but I’m not sure)
How bad might this be?
I list below the ways a large (but maybe not well-functioning) market in Certificates of Impact with Dedication could fail to have a positive impact, or even be net negative.
Here is a summary of the possible issues / harms. For each one, I’ve put my opinion regarding how likely it is and/or how bad it might be in brackets.
The quality of the altruistic impact pricing would be inadequate, and this is net negative, e.g. because it’s just too hard to evaluate the altruistic impact of a project. (I think this is fairly likely but perhaps unlikely to make Certificates of Impact strongly net negative)
The presence of money / explicit valuing of projects would destroy intrinsic motivation for Issuers and generally make the whole thing very sordid and transactional. (I think this is unlikely to be a major issue)
The weirdness of the whole concept from the point of view of the Issuer and/or Ultimate Buyer would make this unworkable. (I think this is somewhat unlikely to happen)
Trying to motivate altruistic behaviour with money will be net negative due to fraud etc. (I think this is quite plausible and would be quite bad)
Explicit certification of rich people as causing large altruistic impact will be very unpopular. (unclear to me how likely this is or how bad it would be)
Poor market behaviour such as crashes and bubbles would be very damaging. (seems unlikely to be net-negative to me, but it’s probably still a concern)
Impact is expensive because you have to pay at the price at which you value the impact, rather than e.g. what the project costs. (unlikely to be a major issue in my view, but probably still a concern)
Infrastructure costs would outweigh the benefits (I think this is possibly an issue / closely related to feasibility)
Here’s more detail
Inadequate pricing of altruistic impact
If Ultimate Buyers have poor altruistic preferences (e.g. favouring cute animals over vast numbers of future people) (or perhaps even completely selfish preferences?) the right things won’t get rewarded and the market could be completely dysfunctional.
Some things are not easy to assess after the fact - an extreme case is an intervention that causes extinction.
Downside risks from projects are not accounted for - this incentivises low (or negative) expected impact projects because the price the Issuer can sell at is floored at zero, distorting the expected benefit for the Issuer
You’d be better off making a GiveWell for longtermist projects, giving money to the Open Philanthropy Project, etc
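To illustrate the point above about the sale price being floored at zero, here is a small worked example. The numbers are made up, and it assumes (as a simplification) that the market prices a Certificate at exactly the realised impact when that impact is positive and at zero otherwise.

```python
def expected_impact(outcomes):
    """Expected altruistic impact over (probability, impact) pairs."""
    return sum(p * impact for p, impact in outcomes)


def expected_sale_price(outcomes):
    """Expected price for the Issuer when the price cannot go below zero
    (simplifying assumption: price = realised impact if positive, else 0)."""
    return sum(p * max(impact, 0.0) for p, impact in outcomes)


# Hypothetical risky project: 50% chance of +10 units of impact, 50% chance of -20.
risky_project = [(0.5, 10.0), (0.5, -20.0)]

print(expected_impact(risky_project))      # -5.0: net harmful in expectation
print(expected_sale_price(risky_project))  # 5.0: still profitable in expectation for the Issuer
```

So under these assumptions a project that is harmful in expectation can still look attractive to an Issuer, because the downside never shows up in the price they receive.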
Undesirable effects of putting a price on everything
This would make the altruistic ecosystem very sordid and transactional (for everyone involved and/or for the general public).
This would remove intrinsic motivation for the Issuer.
The weirdness of the concept of Certificates of Impact makes this unworkable
People who want to donate won’t buy into the idea that buying a certificate causes the impact to happen. It’s too confusing and causally remote from the work the Issuer did.
It would be pretty weird for people who are altruistically motivated to do a project if they sell a certificate for all the work and consequently are not (deemed to be) causally responsible for the impact of their project.
You can sell a fraction of the work, but what fraction should you choose?
Looking at the current financial system helps us see the flaws that markets can have. E.g. fraud by the Issuer - even at a low level, by creating incentives to exaggerate impact. Or any other issues with the Issuer optimising for the value of the certificate rather than optimising for altruistic impact, in the same way that public corporations now have an overwhelmingly strong “duty to shareholders” to maximise profits, even though this is very harmful to society.
Contrast “provide money to enable intrinsically motivated people to do good things” with “pay for anyone to do things that appear high impact”. Maybe you don’t get great outcomes with the latter.
Contrast with the current status quo where things feel pretty cooperative (to me), at least inside Effective Altruism circles.
Ordinary people may be very unhappy with a system that explicitly certifies that a rich person caused lots of positive impact.
The weird things that markets can do - like crash or have bubbles - would be very damaging.
Unlike for a normal market, you have to pay at the price you value the impact at, rather than paying a cheaper price and benefitting from the consumer surplus as you do in ordinary product markets.
The infrastructure costs would outweigh the benefits: e.g. project validation, impact assessment, market infrastructure, etc
Information value of experimentation
The negative points may not weigh so heavily if we think we can run small, reversible experiments to get more information.