Taboo "Outside View"

post by kokotajlod · 2021-06-17T09:39:12.385Z · EA · GW · 20 comments

Contents

  What does “Outside view” mean now?
    List O’ Things People Describe As Outside View:
    List O’ Things People Describe As Inside View:
  What did “Outside view” mean originally?
  This expansion of meaning is bad
  Taboo Outside View; use this list of words instead
  Conclusion
None
20 comments

No one has ever seen an AGI takeoff, so any attempt to understand it must use these outside view considerations.

—[Redacted for privacy]

What? That’s exactly backwards. If we had lots of experience with past AGI takeoffs, using the outside view to predict the next one would be a lot more effective.

—My reaction

Two years ago I wrote a deep-dive summary of Superforecasting and the associated scientific literature. I learned about the “Outside view” / “Inside view” distinction, and the evidence supporting it. At the time I was excited about the concept and wrote: “...I think we should do our best to imitate these best-practices, and that means using the outside view far more than we would naturally be inclined.

Now that I have more experience, I think the concept is doing more harm than good in our community. The term is easily abused and its meaning has expanded too much. I recommend we permanently taboo [? · GW] “Outside view,” i.e. stop using the word and use more precise, less confused concepts instead. This post explains why. 

What does “Outside view” mean now?

Over the past two years I’ve noticed people (including myself!) do lots of different things in the name of the Outside View. I’ve compiled the following lists based on fuzzy memory of hundreds of conversations with dozens of people:

Big List O’ Things People Describe As Outside View:

Big List O’ Things People Describe As Inside View:

What did “Outside view” mean originally?

As far as I can tell, it basically meant reference class forecasting. Kaj Sotala tells me the original source of the concept (cited by the Overcoming Bias post that brought it to our community) was this paper. Relevant quote: “The outside view is ... essentially ignores the details of the case at hand, and involves no attempt at detailed forecasting of the future history of the project. Instead, it focuses on the statistics of a class of cases chosen to be similar in relevant respects to the present one.” If you look at the text of Superforecasting, the “it basically means reference class forecasting” interpretation holds up. Also, “Outside view” redirects to “reference class forecasting” in Wikipedia.

To head off an anticipated objection: I am not claiming that there is no underlying pattern to the new, expanded meanings of “outside view” and “inside view.” I even have a few ideas about what the pattern is. For example, priors are sometimes based on reference classes, and even when they are instead based on intuition, that too can be thought of as reference class forecasting in the sense that intuition is often just unconscious, fuzzy pattern-matching, and pattern-matching is arguably a sort of reference class forecasting. And Ajeya’s model can be thought of as inside view relative to e.g. GDP extrapolations, while also outside view relative to e.g. deferring to Dario Amodei.

However, it’s easy to see patterns everywhere if you squint. These lists are still pretty diverse. I could print out all the items on both lists and then mix-and-match to create new lists/distinctions, and I bet I could come up with several at least as principled as this one.

This expansion of meaning is bad

When people use “outside view” or “inside view” without clarifying which of the things on the above lists they mean, I am left ignorant of what exactly they are doing and how well-justified it is. People say “On the outside view, X seems unlikely to me.” I then ask them what they mean, and sometimes it turns out they are using some reference class, complete with a dataset. (Example: Tom Davidson’s four reference classes for TAI). Other times it turns out they are just using the anti-weirdness heuristic. Good thing I asked for elaboration! 

Separately, various people seem to think that the appropriate way to make forecasts is to (1) use some outside-view methods, (2) use some inside-view methods, but only if you feel like you are an expert in the subject, and then (3) do a weighted sum of them all using your intuition to pick the weights. This is not Tetlock’s advice, nor is it the lesson from the forecasting tournaments, especially if we use the nebulous modern definition of “outside view” instead of the original definition. (For my understanding of his advice and those lessons, see this post, part 5. For an entire book written by Yudkowsky on why the aforementioned forecasting method is bogus, see Inadequate Equilibria, especially this chapter [LW · GW]. Also, I wish to emphasize that I myself was one of these people, at least sometimes, up until recently when I noticed what I was doing!)

Finally, I think that too often the good epistemic standing of reference class forecasting is illicitly transferred to the other things in the list above. I already gave the example of the anti-weirdness heuristic; my second example will be bias correction: I sometimes see people go “There’s a bias towards X, so in accordance with the outside view I’m going to bump my estimate away from X.” But this is a different sort of bias correction. To see this, notice how they used intuition to decide how much to bump their estimate, and they didn’t consider other biases towards or away from X. The original lesson was that biases could be corrected by using reference classes. Bias correction via intuition may be a valid technique, but it shouldn’t be called the outside view.

I feel like it’s gotten to the point where, like, only 20% of uses of the term “outside view” involve reference classes. It seems to me that “outside view” has become an applause light [LW · GW] and a smokescreen for over-reliance on intuition, the anti-weirdness heuristic, deference to crowd wisdom, correcting for biases in a way that is itself a gateway to more bias... 

I considered advocating for a return to the original meaning of “outside view,” i.e. reference class forecasting. But instead I say:

Taboo Outside View; use this list of words instead

I’m not recommending that we stop using reference classes! I love reference classes! I also love trend extrapolation! In fact, for literally every tool on both lists above, I think there are situations where it is appropriate to use that tool. Even the anti-weirdness heuristic.

What I ask is that we stop using the words “outside view” and “inside view.” I encourage everyone to instead be more specific. Here is a big list of more specific words that I’d love to see, along with examples of how to use them:

Conclusion

Whenever you notice yourself saying “outside view” or “inside view,” imagine a tiny Daniel Kokotajlo hopping up and down on your shoulder chirping “Taboo outside view.” 

 

Many thanks to the many people who gave comments on a draft: Vojta, Jia, Anthony, Max, Kaj, Steve, and Mark. Also thanks to various people I ran the ideas by earlier.

20 comments

Comments sorted by top scores.

comment by EliezerYudkowsky · 2021-06-17T17:23:55.491Z · EA(p) · GW(p)

I worriedly predict that anyone who followed your advice here would just switch to describing whatever they're doing as "reference class forecasting" since this captures the key dynamic that makes describing what they're doing as "outside viewing" appealing: namely, they get to pick a choice of "reference class" whose samples yield the answer they want, claim that their point is in the reference class, and then claiming that what they're doing is what superforecasters do and what Philip Tetlock told them to do and super epistemically virtuous and anyone who argues with them gets all the burden of proof and is probably a bad person but we get to virtuously listen to them and then reject them for having used the "inside view".

My own take:  Rule One of invoking "the outside view" or "reference class forecasting" is that if a point is more dissimilar to examples in your choice of "reference class" than the examples in the "reference class" are dissimilar to each other, what you're doing is "analogy", not "outside viewing".

All those experimental results on people doing well by using the outside view are results on people drawing a new sample from the same bag as previous samples.  Not "arguably the same bag" or "well it's the same bag if you look at this way", really actually the same bag: how late you'll be getting Christmas presents this year, based on how late you were in previous years.  Superforecasters doing well by extrapolating are extrapolating a time-series over 20 years, which was a straight line over those 20 years, to another 5 years out along the same line with the same error bars, and then using that as the baseline for further adjustments with due epistemic humility about how sometimes straight lines just get interrupted some year.  Not by them picking a class of 5 "relevant" historical events that all had the same outcome, and arguing that some 6th historical event goes in the same class and will have that same outcome.

Replies from: kokotajlod, Linch
comment by kokotajlod · 2021-06-17T21:05:33.215Z · EA(p) · GW(p)

Good point, I'll add analogy to the list. Much that is called reference class forecasting is really just analogy, and often not even a good analogy.

I really think we should taboo "outside view." If people are forced to use the term "reference class" to describe what they are doing, it'll be more obvious when they are doing epistemically shitty things, because the term "reference class" invites the obvious next questions: 1. What reference class? 2. Why is that the best reference class to use?

comment by Linch · 2021-06-17T17:45:36.305Z · EA(p) · GW(p)

All those experimental results on people doing well by using the outside view are results on people drawing a new sample from the same bag as previous samples.  Not "arguably the same bag" or "well it's the same bag if you look at this way", really actually the same bag: how late you'll be getting Christmas presents this year, based on how late you were in previous years

Hmm, I'm not convinced that this is meaningfully different in kind rather than degree. You aren't predicting a randomly chosen holdout year, so saying that 2021 is from the same distribution as 2011-2020 is still a take. "X thing I do in the future is from the same distribution of all my attempts in past years*" is still a judgement call, albeit a much easier one than AI timelines. 

I agree with (part of) your broader point that incareful applications of the outside view and similar vibes is very susceptible to motivated reasoning (including but not limited to the absurdity heuristic), but I guess my take here is that we should just be more careful individually and more willing to point out bad epistemic moves in others (as you've often done a good job of!) as a community. 

All our tools are limited and corruptible, and I don't think on balance reference class forecasting is more susceptible to motivated reasoning than other techniques.

*are you using your last 10 years? since you've been an adult? all the years you've been alive? 

comment by Ben Garfinkel (bmg) · 2021-06-18T22:20:00.683Z · EA(p) · GW(p)

When people use “outside view” or “inside view” without clarifying which of the things on the above lists they mean, I am left ignorant of what exactly they are doing and how well-justified it is. People say “On the outside view, X seems unlikely to me.” I then ask them what they mean, and sometimes it turns out they are using some reference class, complete with a dataset. (Example: Tom Davidson’s four reference classes for TAI). Other times it turns out they are just using the anti-weirdness heuristic. Good thing I asked for elaboration!

FWIW, as a contrary datapoint, I don’t think I’ve really encountered this problem much in conversation. In my own experience (which may be quite different from yours): when someone makes some reference to an “outside view,” they say something that indicates roughly what kind of “outside view” they’re using. For example, if someone is just extrapolating a trend forward, they’ll reference the trend. Or if someone is deferring to expert opinion, they’ll reference expert opinion. I also don’t think I’d find it too bothersome, in any case, to occasionally have to ask the person which outside view they have in mind.

So this concern about opacity wouldn’t be enough to make me, personally, want people to stop using the term “outside view.”

If there’s a really serious linguistic issue, here, I think it’s probably that people sometimes talk about "the outside view” as though there's only a single relevant outside view. I think Michael Aird made a good comment [EA(p) · GW(p)] on my recent democracy post, where he suggests that people should taboo the phrase “the outside view” and instead use the phrase “an outside view.” (I was guilty of using the phrase “the outside view” in that post — and, arguably, of leaning too hard on one particular way of defining a reference class.) I’d be pretty happy if people just dropped the “the,” but kept talking about “outside views.”[1]


  1. It’s of course a little ambiguous what counts as an “outside view,” but in practice I don’t think this is too huge of an issue. In my experience, which again may be different from yours, “taking an outside view” still does typically refer to using some sort of reference-class-based reasoning. It’s just the case that there are lots of different reference classes that people use. (“I’m extrapolating this 20-year trend forward, for another five years, because if a trend has been stable for 20 years it’s typically stable for another five.” “I’m deferring to the experts in this survey, because experts typically have more accurate views than amateurs.” Etc.) I do feel like this style of reasoning is useful and meaningfully distinct from, for example, reasoning based on causal models, so I’m happy to have a term for it, even if the boundaries of the concept are somewhat fuzzy. ↩︎

Replies from: kokotajlod
comment by kokotajlod · 2021-06-19T06:00:00.197Z · EA(p) · GW(p)

Thanks for this thoughtful pushback. I agree that YMMV; I'm reporting how these terms seem to be used in my experience but my experience is limited.

I think opacity is only part of the problem; illicitly justifying sloppy reasoning is most of it. (My second and third points in "this expansion of meaning is bad" section.) There is an aura of goodness surrounding the words "outside view" because of the various studies showing how it is superior to the inside view in various circumstances, and because of e.g. Tetlock's advice to start with the outside view and then adjust. (And a related idea that we should only use inside view stuff if we are experts... For more on the problems I'm complaining about, see the meme, or Eliezer's comment.) This is all well and good if we use those words to describe what was actually talked about by the studies, by Tetlock, etc. but if instead we have the much broader meaning of the term, we are motte-and-bailey-ing ourselves.




 

Replies from: bmg
comment by Ben Garfinkel (bmg) · 2021-06-19T11:54:11.099Z · EA(p) · GW(p)

I agree that people sometimes put too much weight on particular outside views -- or do a poor job of integrating outside views with more inside-view-style reasoning. For example, in the quote/paraphrase you present at the top of your post, something has clearly gone wrong.[1]

But I think the best intervention, in this case, is probably just to push the ideas "outside views are often given too much weight" or "heavily reliance on outside views shouldn't be seen as praiseworthy" or "the correct way to integrate outside views with more inside-view reasoning is X." Tabooing the term itself somehow feels a little roundabout to me, like a linguistic solution to a methodological disagreement.

I think you're right that "outside view" now has a very positive connotation. If enough community members become convinced that this positive connotation is unearned, though, I think the connotation will probably naturally become less positive over time. For example, the number of upvotes on this post is a signal that people shouldn't currently expect that much applause for using the term "outside view."


  1. As a caveat, although I'm not sure how much this actually matters for the present discussion, I probably am significantly less concerned about the problem than you are. I'm pretty confident that the average intellectual doesn't pay enough attention to "outside views" -- and I think that, absent positive reinforcement from people in your community, it actually does take some degree of discipline to take outside views sufficiently seriously. I'm open to the idea that the average EA community member has over-corrected, here, but I'm not yet convinced of it. I think it's also possible that, in a lot of cases, the natural substitute for bad outside-view-heavy reasoning is worse inside-view-heavy reasoning. ↩︎

Replies from: kokotajlod
comment by kokotajlod · 2021-06-21T08:02:53.371Z · EA(p) · GW(p)

But I think the best intervention, in this case, is probably just to push the ideas "outside views are often given too much weight" or "heavily reliance on outside views shouldn't be seen as praiseworthy" or "the correct way to integrate outside views with more inside-view reasoning is X." Tabooing the term itself somehow feels a little roundabout to me, like a linguistic solution to a methodological disagreement.

 

On the contrary; tabooing the term is more helpful, I think. I've tried to explain why in the post. I'm not against the things "outside view" has come to mean; I'm just against them being conflated with / associated with each other, which is what the term does. If my point was simply that the first Big List was overrated and the second Big List was underrated, I would have written a very different post!

I'm pretty confident that the average intellectual doesn't pay enough attention to "outside views" -- and I think that, absent positive reinforcement from people in your community, it actually does take some degree of discipline to take outside views sufficiently seriously.

By what definition of "outside view?" There is some evidence that in some circumstances people don't take reference class forecasting seriously enough; that's what the original term "outside view" meant. What evidence is there that the things on the Big List O' Things People Describe as Outside View are systematically underrated by the average intellectual?

Replies from: bmg
comment by Ben Garfinkel (bmg) · 2021-06-21T11:26:13.679Z · EA(p) · GW(p)

On the contrary; tabooing the term is more helpful, I think. I've tried to explain why in the post. I'm not against the things "outside view" has come to mean; I'm just against them being conflated with / associated with each other, which is what the term does. If my point was simply that the first Big List was overrated and the second Big List was underrated, I would have written a very different post!

My initial comment was focused on your point about conflation, because I think this point bears on the linguistic question more strongly than the other points do. I haven’t personally found conflation to be a large issue. (Recognizing, again, that our experiences may differ.) If I agreed with the point about conflation, though, then I would think it might be worth tabooing the term "outside view."

By what definition of "outside view?"

By “taking an outside view on X” I basically mean “engaging in statistical or reference-class-based reasoning.” I think it might also be best defined negatively: "reasoning that doesn’t substantially involve logical deduction or causal models of the phenomenon in question. "[1]

I think most of the examples in your list fit these definitions.

Epistemic deference is a kind of statistical/reference-class-based reasoning, for example, which doesn't involve applying any sort of causal model of the phenomenon in question. The logic is “Ah, I should update downward on this claim, since experts in domain X disagree with it and I think that experts in domain X will typically be right.”

Same for anti-weirdness: The idea is that weird claims are typically wrong.

I’d say that trend extrapolation also fits: You’re not doing logical reasoning or relying on a causal model of the relevant phenomenon. You’re just extrapolating a trend forward, largely based on the assumption that long-running trends don’t typically end abruptly.

“Foxy aggregation,” admittedly, does seem like a different thing to me: It arguably fits the negative definition, depending on how you generate your weights, but doesn’t seem to fit statistical/reference-class one. It also feels like more of a meta-level thing. So I wouldn’t personally use the term “outside view” to talk about foxy aggregation. (I also don’t think I’ve personally heard people use the term “outside view” to talk about foxy aggegration, although I obviously believe you have.)

There is some evidence that in some circumstances people don't take reference class forecasting seriously enough; that's what the original term "outside view" meant. What evidence is there that the things on the Big List O' Things People Describe as Outside View are systematically underrated by the average intellectual?

A condensed answer is: (a) I think most public intellectuals barely use any of the items on this list (with the exception of the anti-weirdness heuristic); (b) I think some of the things on this list are often useful; (c) I think that intellectuals and people more generally are very often bad at reasoning causally/logically about complex social phenomena; (d) I expect intellectuals to often have a bias against outside-view-style reasoning, since it often feels somewhat unsatisfying/unnatural and doesn't allow them to display impressive-seeming domain-knowledge, interesting models of the world, or logical reasoning skills; and (e) I do still think Tetlock’s evidence is at least somewhat relevant to most things on the list, in part because I think they actually are somewhat related to each other, although questions of external validity obviously grow more serious the further you move from the precise sorts of questions asked in his tournaments and the precise styles of outside-view reasoning displayed by participants. [2]

There’s also, of course, a bit of symmetry here. One could also ask: “What evidence is there that the things on the Big List O' Things People Describe as Outside View are systematically overrated by the average intellectual?” :)


  1. These definitions of course aren’t perfect, and other people sometimes use the term more broadly than I do, but, again, some amount of fuzziness seems OK to me. Most concepts have fuzzy boundaries and are hard to define precisely. ↩︎

  2. On the Tetlock evidence: I think one thing his studies suggest, which I expect to generalize pretty well to many different contexts, is that people who are trying to make predictions about complex phenemona (especially complex social phenemona) often do very poorly when they don't incorporate outside views into their reasoning processes. (You can correct me if this seems wrong, since you've thought about Tetlock's work far more than I have.) So, on my understanding, Tetlock's work suggests that outside-view-heavy reasoning processes would often substitute for reasoning processes that lead to poor predictions anyways. At least for most people, then, outside-view-heavy reasoning processes don't actually need to be very reliable to constitute improvements -- and they need to be pretty bad to, on average, lead to worse predictions.

    Another small comment here: I think Tetlock's work also counts, in a somewhat broad way, against the "reference class tennis" objection to reference-class-based forecasting. On its face, the objection also applies to the use of reference classes in standard forecasting tournaments. There are always a ton of different reference classes someone could use to forecast any given political event. Forecasters need to rely on some sort of intuition, or some sort of fuzzy reasoning, to decide on which reference classes to take seriously; it's a priori plausible that people would be just consistently very bad at this, given the number of degrees of freedom here and the absence of clear principles for making one's selections. But this issue doesn't actually seem to be that huge in the context of the sorts of questions Tetlock asked his participants. (You can again correct me if I'm wrong.) The degrees-of-freedom problem might be far larger in other contexts, but the fact that the issue is manageable in Tetlockian contexts presumably counts as at least a little bit of positive evidence. ↩︎

Replies from: kokotajlod
comment by kokotajlod · 2021-06-22T21:07:25.285Z · EA(p) · GW(p)

I said in the post, I'm a fan of reference classes. I feel like you think I'm not? I am! I'm also a fan of analogies. And I love trend extrapolation. I admit I'm not a fan of the anti-weirdness heuristic, but even it has its uses. In general most of what you are saying in this thread is stuff I agree with, which makes me wonder if we are talking past each other. (Example 1: Your second small comment about reference class tennis. Example 2: Your first small comment, if we interpret instances of "outside view" as meaning "reference classes" in the strict sense, though not if we use the broader definition you favor. Example 3: your points a, b, c, and e. (point d, again, depends on what you mean by 'outside view,' and also what counts as often.)

My problem is with the term "Outside view." (And "inside view" too!) I don't think you've done much to argue in favor of it in this thread. You have said that in your experience it doesn't seem harmful; fair enough, point taken. In mine it does. You've also given two rough definitions of the term, which seem quite different to me, and also quite fuzzy. (e.g. if by "reference class forecasting" you mean the stuff Tetlock's studies are about, then it really shouldn't include the anti-weirdness heuristic, but it seems like you are saying it does?) I found myself repeatedly thinking "but what does he mean by outside view? I agree or don't agree depending on what he means..." even though you had defined it earlier. You've said that you think the practices you call "outside view" are underrated and deserve positive reinforcement; I totally agree that some of them are, but I maintain that some of them are overrated, and would like to discuss each of them on a case by case basis instead of lumping them all together under one name. Of course you are free to use whatever terms you like, but I intend to continue to ask people to be more precise when I hear "outside view" or "inside view." :)



 

Replies from: bmg
comment by Ben Garfinkel (bmg) · 2021-06-23T12:03:55.872Z · EA(p) · GW(p)

It’s definitely entirely plausible that I’ve misunderstood your views.

My interpretation of the post was something like this:

There is a bag of things that people in the EA community tend to describe as “outside views.” Many of the things in this bag are over-rated or mis-used by members of the EA community, leading to bad beliefs.

One reason for this over-use or mis-use is that the the term “outside view” has developed an extremely positive connotation within the community. People are applauded for saying that they’re relying on “outside views” — “outside view” has become “an applause light” — and so will rely on items in the bag to an extent that is epistemically unjustified.

The things in the bag are also pretty different from each other — and not everyone who uses the term “outside view” agrees about exactly what belongs in the bag. This conflation/ambiguity can lead to miscommunication.

More importantly, when it comes to the usefulness of the different items in the bag, some have more evidential support than others. Using the term “outside view” to refer to everything in the bag might therefore lead people to overrated certain items that actually have weak evidential support.

To sum up, tabooing the term “outside view” might solve two problems. First, it might reduce miscommunication. Second, more importantly, it might cause people to stop overrating some of the reasoning processes that they currently characterize as involving “outside views.” The mechanisms by which tabooing the term can help to solve the second problem are: (a) it takes away an “applause light,” whose existence incentivizes excessive use of these reasoning processes, and (b) it allows people to more easily recognize that some of these reasoning processes don't actually have much empirical support.

I’m curious if this feels roughly right, or feels pretty off.

Part of the reason I interpreted your post this way: The quote you kicked the post off suggested to me that your primary preoccupation was over-use or mis-use of the tools people called “outside views,” including more conventional reference-class forecasting. It seemed like the quote is giving an example of someone who’s refusing to engage in causal reasoning, evaluate object-level arguments, etc., based on the idea that outside views are just strictly dominant in the context of AI forecasting. It seemed like this would have been an issue even if the person was doing totally orthodox reference-class forecasting and there was no ambiguity about what they were doing.[1]

I don’t think that you’re generally opposed to the items in the “outside view” bag or anything like that. I also don’t assume that you disagree with most of the points I listed in my last comment, for why I think intellectuals probably on average underrated the items in the bag. I just listed all of them because you asked for an explanation for my view, I suppose with some implication that you might disagree with it.

You've also given two rough definitions of the term, which seem quite different to me, and also quite fuzzy. (e.g. if by "reference class forecasting" you mean the stuff Tetlock's studies are about, then it really shouldn't include the anti-weirdness heuristic, but it seems like you are saying it does?)

I think it’s probably not worth digging deeper on the definitions I gave, since I definitely don’t think they're close to perfect. But just a clarification here, on the anti-weirdness heuristic: I’m thinking of the reference class as “weird-sounding claims.”

Suppose someone approaches you not the street and hands you a flyer claiming: “The US government has figured out a way to use entangled particles to help treat cancer, but political elites are hoarding the particles.” You quickly form a belief that the flyer’s claim is almost certainly false, by thinking to yourself: “This is a really weird-sounding claim, and I figure that virtually all really weird-sounding claims that appear in random flyers are wrong.”

In this case, you’re not doing any deductive reasoning about the claim itself or relying on any causal models that directly bear on the claim. (Although you could.) For example, you’re not thinking to yourself: “Well, I know about quantum mechanics, and I know entangled particles couldn’t be useful for treating cancer for reason X.” Or: “I understand economic incentives, or understand social dynamics around secret-keeping, so I know it’s unlikely this information would be kept secret.” You’re just picking a reference class — weird-sounding claims made on random flyers — and justifying your belief that way.

I think it’s possible that Tetlock’s studies don’t bear very strongly on the usefulness of this reference class, since I imagine participants in his studies almost never used it. (“The claim ‘there will be a coup in Venezuela in the next five years’ sounds really weird to me, and most claims that sound weird to me aren't true, so it’s probably not true!”) But I think the anti-weirdness heuristic does fit with the definitions I gave, as well as the definition you give that characterizes the term's "original meaning." I also do think that Tetlock's studies remain at least somewhat relevant when judging the potential usefulness of the heuristic.


  1. I initially engaged on the miscommunication, point, though, since this is the concern that would mostly strongly make me want to taboo the term. I’d rather address the applause light problem, if it is a problem, but trying get people in the EA community stop applauding, and the evidence problem, if it is a problem, by trying to just directly make people in the EA community more aware of the limits of evidence. ↩︎

Replies from: kokotajlod
comment by kokotajlod · 2021-06-23T17:03:20.548Z · EA(p) · GW(p)

Wow, that's an impressive amount of charitable reading + attempting-to-ITT you did just there, my hat goes off to you sir!

I think that summary of my view is roughly correct. I think it over-emphasizes the applause light aspect compared to other things I was complaining about; in particular, there was my second point in the "this expansion of meaning is bad" section, about how people seem to think that it is important to have an outside view and an inside view (but only an inside view if you feel like you are an expert) which is, IMO, totally not the lesson one should draw from Tetlock's studies etc., especially not with the modern, expanded definition of these terms. I also think that while I am mostly complaining about what's happened to "outside view," I also think similar things apply to "inside view" and thus I recommend tabooing it also. 

In general, the taboo solution feels right to me; when I imagine re-doing various conversations I've had, except without that phrase, and people instead using more specific terms, I feel like things would just be better. I shudder at the prospect of having a discussion about "Outside view vs inside view: which is better? Which is overrated and which is underrated?" (and I've worried that this thread may be tending in that direction) but I would really look forward to having a discussion about "let's look at Daniel's list of techniques and talk about which ones are overrated and underrated and in what circumstances each is appropriate."

Now I'll try to say what I think your position is:

1. If people were using "outside view" without explaining more specifically what they mean, that would be bad and it should be tabood, but you don't see that in your experience
2. If the things in the first Big List were indeed super diverse and disconnected from the evidence in Tetlock's studies etc., then there would indeed be no good reason to bundle them together under one term. But in fact this isn't the case; most of the things on the list are special cases of reference-class / statistical reasoning, which is what Tetlock's studies are about. So rather than taboo "outside view" we should continue to use the term but mildly prune the list.
3. There may be a general bias in this community towards using the things on the first Big List, but (a) in your opinion the opposite seems more true, and (b) at any rate even if this is true the right response is to argue for that directly rather than advocating the tabooing of the term.

How does that sound?

Replies from: bmg
comment by Ben Garfinkel (bmg) · 2021-06-29T13:05:06.146Z · EA(p) · GW(p)

Thank you (and sorry for my delayed response)!

I shudder at the prospect of having a discussion about "Outside view vs inside view: which is better? Which is overrated and which is underrated?" (and I've worried that this thread may be tending in that direction) but I would really look forward to having a discussion about "let's look at Daniel's list of techniques and talk about which ones are overrated and underrated and in what circumstances each is appropriate."

I also shudder a bit at that prospect.

I am sometimes happy making pretty broad and sloppy statements. For example: "People making political predictions typically don't make enough use of 'outside view' perspectives" feels fine to me, as a claim, despite some ambiguity around the edges. (Which perspectives should they use? How exactly should they use them? Etc.)

But if you want to dig in deep, for example when evaluating the rationality of a particular prediction, you should definitely shift toward making more specific and precise statements. For example, if someone has based their own AI timelines on Katja's expert survey, and they wanted to defend their view by simply evoking the principle "outside views are better than inside views," I think this would probably a horrible conversation. A good conversation would focus specifically on the conditions under which it makes sense to defer heavily to experts, whether those conditions apply in this particular case, etc. Some general Tetlock stuff might come into the conversation, like: "Tetlock's work suggests it's easy to trip yourself up if you try to use your own detailed/causal model of the world to make predictions, so you shouldn't be so confident that your own 'inside view' prediction will be very good either." But mostly you should be more specific.

Now I'll try to say what I think your position is:

  1. If people were using "outside view" without explaining more specifically what they mean, that would be bad and it should be tabood, but you don't see that in your experience
  2. If the things in the first Big List were indeed super diverse and disconnected from the evidence in Tetlock's studies etc., then there would indeed be no good reason to bundle them together under one term. But in fact this isn't the case; most of the things on the list are special cases of reference-class / statistical reasoning, which is what Tetlock's studies are about. So rather than taboo "outside view" we should continue to use the term but mildly prune the list.
  3. There may be a general bias in this community towards using the things on the first Big List, but (a) in your opinion the opposite seems more true, and (b) at any rate even if this is true the right response is to argue for that directly rather than advocating the tabooing of the term.

How does that sound?

I'd say that sounds basically right!

The only thing is that I don't necessarily agree with 3a.

I think some parts of the community lean too much on things in the bag (the example you give at the top of the post is an extreme example). I also think that some parts of the community lean too little on things in the bag, in part because (in my view) they're overconfident in their own abilities to reason causally/deductively in certain domains. I'm not sure which is overall more problematic, at the moment, in part because I'm not sure how people actually should be integrating different considerations in domains like AI forecasting.

There also seem to be biases that cut in both directions. I think the 'baseline bias' is pretty strongly toward causal/deductive reasoning, since it's more impressive-seeming, can suggest that you have something uniquely valuable to bring to the table (if you can draw on lots of specific knowledge or ideas that it's rare to possess), is probably typically more interesting and emotionally satisfying, and doesn't as strongly force you to confront or admit the limits of your predictive powers. The EA community has definitely introduced an (unusual?) bias in the opposite direction, by giving a lot of social credit to people who show certain signs of 'epistemic virtue.' I guess the pro-causal/deductive bias often feels more salient to me, but I don't really want to make any confident claim here that it actually is more powerful.

Replies from: bmg, kokotajlod
comment by Ben Garfinkel (bmg) · 2021-06-29T13:46:41.044Z · EA(p) · GW(p)

As a last thought here (no need to respond), I thought it might useful to give one example of a concrete case where: (a) Tetlock’s work seems relevant, and I find the terms “inside view” and “outside view” natural to use, even though the case is relatively different from the ones Tetlock has studied; and (b) I think many people in the community have tended to underweight an “outside view.”

A few years ago, I pretty frequently encountered the claim that recently developed AI systems exhibited roughly “insect-level intelligence.” This claim was typically used to support an argument for short timelines, since the claim was also made that we now had roughly insect-level compute. If insect-level intelligence has arrived around the same time as insect-level compute, then, it seems to follow, we shouldn’t be at all surprised if we get ‘human-level intelligence’ at roughly the point where we get human-level compute. And human-level compute might be achieved pretty soon.

For a couple of reasons, I think some people updated their timelines too strongly in response to this argument. First, it seemed like there are probably a lot of opportunities to make mistakes when constructing the argument: it’s not clear how “insect-level intelligence” or “human-level intelligence” should be conceptualised, it’s not clear how best to map AI behaviour onto insect behaviour, etc. The argument also hadn't yet been vetted closely or expressed very precisely, which seemed to increase the possibility of not-yet-appreciated issues.

Second, we know that there are previous of examples of smart people looking at AI behaviour and forming the impression that it suggests “insect-level intelligence.” For example, in Nick Bostrom’s paper “How Long Before Superintelligence?” (1998) he suggested that “approximately insect-level intelligence” was achieved sometime in the 70s, as a result of insect-level computing power being achieved in the 70s. In Moravec’s book Mind Children (1990), he also suggested that both insect-level intelligence and insect-level compute had both recently been achieved. Rodney Brooks also had this whole research program, in the 90s, that was based around going from “insect-level intelligence” to “human-level intelligence.”

I think many people didn’t give enough weight to the reference class “instances of smart people looking at AI systems and forming the impression that they exhibit insect-level intelligence” and gave too much weight to the more deductive/model-y argument that had been constructed.

This case is obviously pretty different than the sorts of cases that Tetlock’s studies focused on, but I do still feel like the studies have some relevance. I think Tetlock’s work should, in a pretty broad way, make people more suspicious of their own ability to perform to linear/model-heavy reasoning about complex phenomena, without getting tripped up or fooling themselves. It should also make people somewhat more inclined to take reference classes seriously, even when the reference classes are fairly different from the sorts of reference classes good forecasters used in Tetlock’s studies. I do also think that the terms “inside view” and “outside view” apply relatively neatly, in this case, and are nice bits of shorthand — although, admittedly, it’s far from necessary to use them.

This is the sort of case I have in the back of my mind.

(There are also, of course, cases that point in the opposite direction, where many people seemingly gave too much weight to something they classified as an "outside view." Early under-reaction to COVID is arguably one example.)

Replies from: Paul_Christiano
comment by Paul_Christiano · 2021-06-30T17:35:55.709Z · EA(p) · GW(p)

The Nick Bostrom quote (from here) is:

In retrospect we know that the AI project couldn't possibly have succeeded at that stage. The hardware was simply not powerful enough. It seems that at least about 100 Tops is required for human-like performance, and possibly as much as 10^17 ops is needed. The computers in the seventies had a computing power comparable to that of insects. They also achieved approximately insect-level intelligence.

I would have guessed this is just a funny quip, in the sense that (i) it sure sounds like it's just a throw-away quip, no evidence is presented for those AI systems being competent at anything (he moves on to other topics in the next sentence), "approximately insect-level" seems appropriate as a generic and punchy stand in for "pretty dumb," (ii) in the document he is basically just thinking about AI performance on complex tasks and trying to make the point that you shouldn't be surprised by subhuman performance on those tasks, which doesn't depend much on the literal comparison to insects, (iii) the actual algorithms described in the section (neural nets and genetic algorithms) wouldn't plausibly achieve insect-level performance in the 70s since those algorithms in fact do require large training processes (and were in fact used in the 70s to train much tinier neural networks).

(Of course you could also just ask Nick.)

I also think it's worth noting that the prediction in that section looks reasonably good in hindsight. It was written right at the beginning of resurgent interest in neural networks (right before Yann LeCun's paper on MNIST with neural networks). The hypothesis "computers were too small in the past so that's why they were lame" looks like it was a great call, and Nick's tentative optimism about particular compute-heavy directions looks good. I think overall this is a significantly better take than mainstream opinions in AI. I don't think this literally affects your point, but it is relevant if the implicit claim is "And people talking about insect comparisons were lead astray by these comparisons."

I suspect you are more broadly underestimating the extent to which people used "insect-level intelligence" as a generic stand-in for "pretty dumb," though I haven't looked at the discussion in Mind Children and Moravec may be making a stronger claim. I'd be more inclined to tread carefully if some historical people tried to actually compare the behavior of their AI system to the behavior of an insect and found it comparable as in posts like this one [AF · GW] (it's not clear to me how such an evaluation would have suggested insect-level robotics in the 90s or even today, I think the best that can be said is that today it seems compatible with insect-level robotics in simulation today). I've seen Moravec use the phrase "insect-level intelligence" to refer to the particular behaviors of "following pheromone trails" or "flying towards lights," so I might also read him as referring to those behaviors in particular. (It's possible he is underestimating the total extent of insect intelligence, e.g. discounting the complex motor control performed by insects, though I haven't seen him do that explicitly and it would be a bit off brand.)

ETA: While I don't think 1990s robotics could plausibly be described as "insect-level," I actually do think that the linked post on bee vision could plausibly have been written in the 90s and concluded that computer vision was bee-level, it's just a very hard comparison to make and the performance of the bees in the formal task is fairly unimpressive.

Replies from: bmg
comment by Ben Garfinkel (bmg) · 2021-06-30T18:57:29.949Z · EA(p) · GW(p)

I suspect you are more broadly underestimating the extent to which people used "insect-level intelligence" as a generic stand-in for "pretty dumb," though I haven't looked at the discussion in Mind Children and Moravec may be making a stronger claim.

I think that's good push-back and a fair suggestion: I'm not sure how seriously the statement in Nick's paper was meant to be taken. I hadn't considered that it might be almost entirely a quip. (I may ask him about this.)

Moravec's discussion in Mind Children is similarly brief: He presents a graph of the computing power of different animal's brains and states that "lab computers are roughly equal in power to the nervous systems of insects."He also characterizes current AI behaviors as "insectlike" and writes: "I believe that robots with human intelligence will be common within fifty years. By comparison, the best of today's machines have minds more like those of insects than humans. Yet this performance itself represents a giant leap forward in just a few decades." I don't think he's just being quippy, but there's also no suggestion that he means anything very rigorous/specific by his suggestion.

Rodney Brooks, I think, did mean for his comparisons to insect intelligence to be taken very seriously. The idea of his "nouvelle AI program" was to create AI systems that match insect intelligence, then use that as a jumping-off point for trying to produce human-like intelligence. I think walking and obstacle navigation, with several legs, was used as the main dimension of comparison. The Brooks case is a little different, though, since (IIRC) he only claimed that his robots exhibited important aspects of insect intelligence or fell just short insect intelligence, rather than directly claiming that they actually matched insect intelligence. On the other hand, he apparently felt he had gotten close enough to transition to the stage of the project that was meant to go from insect-level stuff to human-level stuff.

A plausible reaction to these cases, then, might be:

OK, Rodney Brooks did make a similar comparison, and was a major figure at the time, but his stuff was pretty transparently flawed. Moravec's and Bostrom's comments were at best fairly off-hand, suggesting casual impressions more than they suggest outcomes of rigorous analysis. The more recent "insect-level intelligence" claim is pretty different, since it's built on top of much more detailed analysis than anything Moravec/Bostrom did, and it's less obviously flawed than Brooks' analysis. The likelihood that it reflects an erroneous impression is, therefore, a lot lower. The previous cases shouldn't actually do much to raise our suspicion levels.

I think there's something to this reaction, particularly if there's now more rigorous work being done to operationalize and test the "insect-level intelligence" claim. I hadn't yet seen the recent post [AF · GW] you linked to, which, at first glance, seems like a good and clear piece of work. The more rigorous work is done to flesh out the argument, the less I'm inclined to treat the Bostrom/Moravec/Brooks cases as part of an epistemically relevant reference class.

My impression a few years ago was that the claim wasn't yet backed by any really clear/careful analysis. At least, the version that filtered down to me seemed to be substantially based on fuzzy analogies between RL agent behavior and insect behavior, without anyone yet knowing much about insect behavior. (Although maybe this was a misimpression.) So I probably do stand by the reference class being relevant back then.

Overall, to sum up, my position here is something like: "The Bostrom/Moravec/Brooks cases do suggest that it might be easy to see roughly insect-level intelligence, if that's what you expect to see and you're relying on fuzzy impressions, paying special attention to stuff AI systems can already do, or not really operationalizing your claims. This should make us more suspicious of modern claims that we've recently achieved 'insect-level intelligence,' unless they're accompanied by transparent and pretty obviously robust reasoning. Insofar as this work is being done, though, the Bostrom/Moravec/Brooks cases become weaker grounds for suspicion."

Replies from: Paul_Christiano
comment by Paul_Christiano · 2021-06-30T20:34:03.666Z · EA(p) · GW(p)

I do think my main impression of insect <-> simulated robot parity comes from very fuzzy evaluations of insect motor control vs simulated robot motor control (rather than from any careful analysis, of which I'm a bit more skeptical though I do think it's a relevant indicator that we are at least trying to actually figure out the answer here in a way that wasn't true historically). And I do have only a passing knowledge of insect behavior, from watching youtube videos and reading some book chapters about insect learning. So I don't think it's unfair to put it in the same reference class as Rodney Brooks' evaluations to the extent that his was intended as a serious evaluation.

Replies from: abergal
comment by abergal · 2021-07-01T01:39:04.689Z · EA(p) · GW(p)

Yeah, FWIW I haven't found any recent claims about insect comparisons particularly rigorous.

comment by kokotajlod · 2021-07-02T06:22:31.830Z · EA(p) · GW(p)

I am sometimes happy making pretty broad and sloppy statements. For example: "People making political predictions typically don't make enough use of 'outside view' perspectives" feels fine to me, as a claim, despite some ambiguity around the edges. (Which perspectives should they use? How exactly should they use them? Etc.)

I guess we can just agree to disagree on that for now. The example statement you gave would feel fine to me if it used the original meaning of "outside view" but not the new meaning, and since many people don't know (or sometimes forget) the original meaning...
 

A good conversation would focus specifically on the conditions under which it makes sense to defer heavily to experts, whether those conditions apply in this particular case, etc. Some general Tetlock stuff might come into the conversation, like: "Tetlock's work suggests it's easy to trip yourself up if you try to use your own detailed/causal model of the world to make predictions, so you shouldn't be so confident that your own 'inside view' prediction will be very good either." But mostly you should be more specific.

100% agreement here, including on the bolded bit.

I think some parts of the community lean too much on things in the bag (the example you give at the top of the post is an extreme example). I also think that some parts of the community lean too little on things in the bag, in part because (in my view) they're overconfident in their own abilities to reason causally/deductively in certain domains. I'm not sure which is overall more problematic, at the moment, in part because I'm not sure how people actually should be integrating different considerations in domains like AI forecasting.

Also agree here, but again I don't really care which one is overall more problematic because I think we have more precise concepts we can use and it's more helpful to use them instead of these big bags. 

There also seem to be biases that cut in both directions. I think the 'baseline bias' is pretty strongly toward causal/deductive reasoning, since it's more impressive-seeming, can suggest that you have something uniquely valuable to bring to the table (if you can draw on lots of specific knowledge or ideas that it's rare to possess), is probably typically more interesting and emotionally satisfying, and doesn't as strongly force you to confront or admit the limits of your predictive powers. The EA community has definitely introduced an (unusual?) bias in the opposite direction, by giving a lot of social credit to people who show certain signs of 'epistemic virtue.' I guess the pro-causal/deductive bias often feels more salient to me, but I don't really want to make any confident claim here that it actually is more powerful.

I think I agree with all this as well, noting that this causal/deductive reasoning definition of inside view isn't necessarily what other people mean by inside view, and also isn't necessarily what Tetlock meant. I encourage you to use the term "causal/deductive reasoning" instead of "inside view," as you did here, it was helpful (e.g. if you had instead used "inside view" I would not have agreed with the claim about baseline bias)

comment by Linch · 2021-06-17T17:59:56.724Z · EA(p) · GW(p)

I mostly use outside views to mean reference classes, but I agree that this term has expanded to mean more than is originally denoted. I'm not sure how big a problem this is in practice; I think  by default phrases in natural language expands to mean more than their technical beginnings (consider phrases like "modulo", "pop the stack," etc). My intuition is that zealously guarding against this expansion by specifying new broader words (rather than being precise in-context) seems quite doomed as an overall enterprise, though it might buy you a few years. 

A related point is that if we do go with "reference classes" as the preferred phrase, we should be cognizant that for most questions there's a number of different relevant reference classes [LW · GW], and saying that a particular reference class we've picked is the best/only reference class is quite a strong claim, and (as EliezerYudkowsky alludes to) quite susceptible to motivated reasoning. 

Replies from: kokotajlod
comment by kokotajlod · 2021-06-17T18:32:09.007Z · EA(p) · GW(p)

I agree it's hard to police how people use a word; thus, I figured it would be better to just taboo the word entirely. 

I totally agree that it's hard to use reference classes correctly, because of the reference class tennis problem. I figured it was outside the scope of this post to explain this, but I was thinking about making a follow-up... at any rate, I'm optimistic that if people actually use the words "reference class" instead of "outside view" this will remind them to notice how there are more than one reference class available, how it's important to argue that the one you are using is the best, etc.