Potential downsides of using explicit probabilities

post by MichaelA · 2020-01-20T02:14:22.150Z · score: 25 (14 votes) · EA · GW · 17 comments


  Time and effort costs
  Excluding some of one’s knowledge
    Intuitive expertise
    Less measurable or legible things
  Causing overconfidence; underestimating the value of information
  The optimizer’s curse
  Reputational issues
      Sentience scores might reduce our credibility with potential collaborators

Epistemic status: This is basically meant as a collection and analysis of existing ideas, not as anything brand new. I’m not an expert on the topics covered. I’d appreciate feedback or comments in relation to any mistakes, unclear phrasings, etc. (and just in general!).

In various communities (including the EA and rationalist communities), it’s common to make use of explicit, numerical probabilities.[1]

At the extreme end, this may involve explicit attempts to calculate what would maximise expected utility, and then do that thing.

It could also involve attempts to create explicit, probabilistic models (EPMs), perhaps involving expected value calculations, and to use these as an input into decision-making. (So the EPM may not necessarily be the only input, or necessarily be intended to include everything that’s important.) Examples of this include the cost-effectiveness analyses created by GiveWell or ALLFED.

Most simply, a person may generate just a single explicit probability (EP; e.g., “I have a 20% chance of getting this job”), and then use that as an input into decision-making.

(For simplicity, in this post I’ll often say “using EPs” as a catchall term for using a single EP, using EPMs, or maximising expected utility. I’ll also often say “alternative approaches” to refer to more qualitative or intuitive methods, ranging from simply “trusting your gut” to extensive deliberations where you don’t explicitly quantify probabilities.)

Many arguments for the value of using EPs have been covered elsewhere (and won’t be covered here). I find many of these quite compelling, and believe that one of the major things the EA and rationalist communities get right is relying on EPs more than the general public does.

But use of EPs is also often criticised. And I (and, I suspect, most EAs and rationalists) certainly don’t use EPs for most everyday decisions, and I think that’s probably often a good thing.

So the first aim of this post is to explore some potential downsides of using EPs (compared to alternative approaches) that people have proposed. I’ll focus not on the case of ideal rational agents, but on that of actual humans, in practice, with our biases and limited computational abilities. Specifically, I discuss the following (non-exhaustive) list of potential downsides:

  1. Time and effort costs
  2. Excluding some of one’s knowledge (which could’ve been leveraged by alternative approaches)
  3. Causing overconfidence
  4. Underestimating the value of information
  5. The optimizer’s curse
  6. Anchoring (to the EP, or to the EPM’s output)
  7. Causing reputational issues

As I’ll discuss, these downsides will not always apply when using EPs, and many will also sometimes apply when using alternative approaches. And when these downsides do apply to uses of EPs, they may often be outweighed by the benefits of using EPs. So this post is not meant to definitively determine the sorts of situations one should vs shouldn’t use EPs in. But I do think these downsides are often at least important factors to consider.

Sometimes people go further, and link discussion of these potential downsides of using EPs as humans, in practice, to stronger claims: for example, that there’s an absolute, binary distinction between “risk” and “(Knightian) uncertainty”, or between situations in which we “have” vs “don’t have” probabilities. Here’s one statement of this sort of view (from Dominic Roser, who disagrees with it):

According to [one] view, certainty has two opposites: risk and uncertainty. In the case of risk, we lack certainty but we have probabilities. In the case of uncertainty, we do not even have probabilities. [...] According to a popular view, then, how we ought to make policy decisions depends crucially on whether we have probabilities.

I’ve previously argued [LW · GW] that there’s no absolute, binary risk-uncertainty distinction, and that believing that there is such a distinction can lead to using bad decision-making procedures. I’ve also argued [LW · GW] that we can always assign probabilities (or at least use something like an uninformative prior). But I didn’t address the idea that it might be valuable for humans to act as if there’s a binary risk-uncertainty distinction, or as if it’s impossible to assign probabilities in some cases.

Thus, the second aim of this post is to explore whether that’s a good idea. I argue that it is not (with the one potential, partial exception of reputational issues).

So each section will:

Time and effort costs

The most obvious downside of using EPs (or at least EPMs) is that it may often take a lot of time and energy to use them well enough to get better results than one would get from alternative approaches (e.g., trusting your gut).

For example, GiveWell’s researchers collectively spend “hundreds of hours [...] per year on cost-effectiveness analysis”. I’d argue that that’s worthwhile when the stakes are as high as they are in GiveWell’s case (i.e., determining which charities receive tens of millions of dollars each year).

But what if I’m just deciding what headphones to buy? Is it worth it for me to spend a few hours constructing a detailed model of all the factors relevant to the question, and then finding (or estimating) values for each of those factors, for each of a broad range of different headphones?

Here, the stakes involved are quite low, and it’s also fairly unlikely that I’ll use the EPM again. (In contrast, GiveWell continues to use its models, with modifications, year after year, making the initial investment in constructing the models more worthwhile.) It seems the expected value of me bothering to do this EPM is lower than the expected value of me just reading a few reviews and then “going with my gut” (and thus saving time for other things).[2][3]

Does this mean that we must be dealing with “Knightian uncertainty” in this case, or must be utterly unable to “know” the relevant probabilities?

Not at all. In fact, I’d argue that the headphones example is actually one where, if I did spend a few hours doing research, I could come up with probabilities that are much more "trustworthy" [LW(p) · GW(p)] than many of the probabilities involved in situations like GiveWell’s (when it is useful for people to construct EPMs). So I think the issue of time and effort costs may be quite separate even from the question of how trustworthy our probabilities are, let alone the idea that there might be a binary risk-uncertainty distinction.

Excluding some of one’s knowledge

Let’s say that I’m an experienced firefighter in a burning building (untrue on both counts, but go with me on this). I want to know the odds that the floor I’m on will collapse. I could (quite arbitrarily) construct the following EPM:

Probability of collapse = How hot the building is (on a scale from 0-1) * How non-sturdily the building seems to have been built (on a scale from 0-1)

I could also (quite arbitrarily) decide on values of 0.6 and 0.5, respectively. My model would then tell me that the probability of the floor collapsing is 0.3.
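To make the arithmetic concrete, here is a minimal Python sketch of that toy model. The function name `collapse_probability` and its parameter names are hypothetical, chosen only for illustration, and the two inputs are the arbitrary values above rather than real measurements.

```python
def collapse_probability(heat: float, non_sturdiness: float) -> float:
    """Toy EPM from the text: multiply two scores, each on a 0-1 scale."""
    for name, value in (("heat", heat), ("non_sturdiness", non_sturdiness)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return heat * non_sturdiness


# The (quite arbitrary) values used in the text.
p_collapse = collapse_probability(heat=0.6, non_sturdiness=0.5)
print(f"Estimated probability of collapse: {p_collapse:.2f}")
```

Only the two named factors can influence the output here; anything I haven’t thought to include as an input has no effect on the result.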

It seems like that could be done quite quickly, and while doing other things. So it seems that the time and effort costs involved in using this EPM are probably very similar to the costs involved in using an alternative approach (e.g., trusting my gut). Does this mean constructing an EPM here is a wise choice?

Intuitive expertise

There’s empirical evidence that the answer is “No” for examples like this; i.e., examples which meet the “conditions for intuitive expertise”:

In such situations, our intuitions may quite reliably predict later events. Furthermore, we may not consciously, explicitly know the factors that informed these intuitions. As Kahneman & Klein write: “Skilled judges are often unaware of the cues that guide them”.

Klein describes the true story that inspired my example, in which a team of firefighters were dealing with what they thought was a typical kitchen fire, when the lieutenant:

became tremendously uneasy — so uneasy that he ordered his entire crew to vacate the building. Just as they were leaving, the living room floor collapsed. If they had stood there another minute, they would have dropped into the fire below. Unbeknownst to the firefighters, the house had a basement and that’s where the fire was burning, right under the living room.

I had a chance to interview the lieutenant about this incident, and asked him why he gave the order to evacuate. The only reason he could think of was that he had extrasensory perception. He firmly believed he had ESP.

During the interview I asked him what he was aware of. He mentioned that it was very hot in the living room, much hotter than he expected given that he thought the fire was in the kitchen next door. I pressed him further and he recalled that, not only was it hotter than he expected, it was also quieter than he expected. Fires are usually noisy but this fire wasn’t. By the end of the interview he understood why it was so quiet: because the fire was in the basement, and the floor was muffling the sounds.

It seems that the lieutenant wasn’t consciously aware of the importance of the quietness of the fire. As such, if he’d constructed and relied on an EPM, he wouldn’t have included the quietness as a factor, and thus may not have pulled his crew out in time. But through a great deal of expertise, with reliable feedback from the environment, he was intuitively aware of the importance of that factor.

So when the conditions for intuitive expertise are met, methods other than EPM may reliably outperform EPM, even ignoring costs in time and energy, because they allow us to more fully leverage our knowledge.[4]

But, again, does this mean that we must be dealing with “Knightian uncertainty” in this case, or must be utterly unable to “know” the relevant probabilities? Again, not at all. In fact, the conditions for intuitive expertise would actually be met precisely when we could have relatively trustworthy probabilities - there have to be fairly stable patterns in the environment, and opportunities to learn these patterns. The issue is simply that, in practice, we often haven’t learned these probabilities on a conscious, explicit level, even though we theoretically could have.

On the flipside, using EPMs may often beat alternative methods when the conditions for intuitive expertise aren’t met, and this may be most likely when we face especially _un_trustworthy probabilities. Relatedly, it’s worth noting that just the fact that, in a particular situation, we feel more confident in our intuitive assessment than in an EPM doesn’t necessarily mean our intuitive assessment is actually more reliable in that situation. As Kahneman & Klein note:

True experts, it is said, know when they don’t know. However, nonexperts (whether or not they think they are) certainly do not know when they don’t know. Subjective confidence is therefore an unreliable indication of the validity of intuitive judgments and decisions.

[...] Although true skill cannot develop in irregular or unpredictable environments, individuals will sometimes make judgments and decisions that are successful by chance. These “lucky” individuals will be susceptible to an illusion of skill and to overconfidence (Arkes, 2001). The financial industry is a rich source of examples.

Less measurable or legible things

An additional argument is that using EPs may make it harder to leverage knowledge about things that are less measurable and/or legible (with legibility seeming to approximately [EA · GW] mean susceptibility to being predicted, understood, and monitored).

For example, Alice is deciding whether to donate to the Centre for Pesticide Suicide Prevention (CPSP), which focuses on advocating for policy changes, or to GiveDirectly, which simply gives unconditional cash transfers to people living in extreme poverty. She may decide CPSP’s impacts are “too hard to measure”, and “just can’t be estimated quantitatively”. Thus, if she uses EPs, she might neglect to even seriously consider CPSP. But if she considered in-depth, qualitative arguments, she might decide that CPSP seems a better bet.

I think it’s very plausible that this is a sort of situation where, in order to leverage as much of one’s knowledge as possible, it’s wise to use qualitative approaches. But we can still use EPs in these cases - we can just give our best guesses about the value of variables we can’t measure, and about what variables to consider and how to structure our model. (And in fact, GiveWell did construct a quantitative cost-effectiveness model for CPSP.) And it’s not obvious to me which of these approaches would typically make it easier for us to leverage our knowledge in these less measurable and legible cases.

Finally, what implications might this issue have for the idea of a binary risk-uncertainty distinction? I disagree with Alice’s view that CPSP’s impacts “just can’t be estimated quantitatively”. The reality is simply that CPSP’s impacts are very hard to estimate, and that the probabilities we’d arrive at if we estimated them would be quite untrustworthy. In contrast, our estimates of GiveDirectly’s impact would be relatively more trustworthy. That’s all we need to say to make sense of the idea that this is (perhaps) a situation in which we should use approaches other than EPs; I don’t think we need to even act as if there’s a binary risk-uncertainty distinction.

Causing overconfidence; underestimating the value of information

Two common critiques of using EPs are that:

These critiques are closely related, so I’ll discuss both in this section.

An example of the first of those critiques comes from Chris Smith. Smith discusses one particular method for dealing with “poorly understood uncertainty”, and then writes:

Calling [that method] “making a Bayesian adjustment” suggests that we have something like a general, mathematical method for critical thinking. We don’t.

Similarly, taking our hunches about the plausibility of scenarios we have a very limited understanding of and treating those hunches like well-grounded probabilities can lead us to believe we have a well-understood method for making good decisions related to those scenarios. We don’t.

Many people have unwarranted confidence in approaches that appear math-heavy or scientific. In my experience, effective altruists are not immune to that bias.

An example of (I think) both of those critiques together comes from Daniela Waldhorn [EA · GW]:

The existing gaps in this field of research entail that we face significant constraints when assessing the probability that an invertebrate taxon is conscious. In my opinion, the current state of knowledge is not mature enough for any informative numerical estimation of consciousness among invertebrates. Furthermore, there is a risk that such estimates lead to an oversimplification of the problem and an underestimation of the need for further investigation.

I’m somewhat sympathetic to these arguments. But I think it’s very unclear whether arguments about overconfidence and VoI should push us away from rather than towards using EPs; it really seems like it could go either way. This is for two reasons.

Firstly, we can clearly represent low confidence in our EPs, by:

Secondly, if we do use EPs (and appropriately wide confidence intervals), this unlocks ways of moving beyond just the general idea that further information would be valuable; it lets us also:

In fact, there’s an entire body of work on VoI analysis, and a necessary prerequisite for conducting such an analysis is having an EPM.
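As a rough illustration of why a VoI analysis needs an EPM, here is a minimal Python sketch of one of the simplest such calculations, the expected value of perfect information for a decision with two options and two possible states of the world. The function names, the option names, the payoffs, and the 20% credence are all made up for illustration; they are assumptions, not figures from any real analysis.

```python
def expected_value(payoffs, probs):
    """Probability-weighted average payoff of one option."""
    return sum(p * v for p, v in zip(probs, payoffs))


def value_of_perfect_information(payoff_table, probs):
    """Expected gain from learning the true state before choosing.

    payoff_table maps each option to its payoff in each state;
    probs gives our current credence in each state.
    """
    # Best we can do if we must choose now, on current credences.
    best_now = max(expected_value(v, probs) for v in payoff_table.values())
    # If we learned the state first, we would pick the best option for it.
    best_informed = sum(
        probs[s] * max(v[s] for v in payoff_table.values())
        for s in range(len(probs))
    )
    return best_informed - best_now


# Hypothetical numbers: states are (intervention works, it does not work).
probs = [0.2, 0.8]
payoff_table = {
    "fund_speculative_option": [100.0, 0.0],
    "fund_proven_option": [30.0, 30.0],
}
evpi = value_of_perfect_information(payoff_table, probs)
print(f"Expected value of perfect information: {evpi:.1f}")
```

With these made-up inputs, resolving the uncertainty is worth about 14 units of payoff, which puts an upper bound on what it would be worth spending on further research. Without explicit credences and payoffs, there would be nothing to plug into this kind of calculation.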

It does seem plausible to me that, even if we do all of those things, we or others will primarily focus on our (perhaps implicit) point estimate, and overestimate its trustworthiness, just due to human psychology (or EA/rationalist psychology). But that doesn’t seem obvious. Nor does it seem obvious that the overconfidence that may result from using EPs will tend to be greater than the overconfidence that may result from other approaches (like relying on all-things-considered intuitions; recall Kahneman & Klein’s comments from earlier).

And in any case, this whole discussion was easy to have just in terms of very untrustworthy or low-confidence probabilities - there was no need to invoke the idea of a binary risk-uncertainty distinction, or the idea that there are some matters about which we simply can’t possibly estimate any probabilities.[6]

The optimizer’s curse

Smith gives a “rough sketch” of the optimizer’s curse:

Optimizers start by calculating the expected value of different activities.

Estimates of expected value involve uncertainty.

Sometimes expected value is overestimated, sometimes expected value is underestimated.

Optimizers aim to engage in activities with the highest expected values.

Result: Optimizers tend to select activities with overestimated expected value.

[...] The optimizer’s curse occurs even in scenarios where estimates of expected value are unbiased (roughly, where any given estimate is as likely to be too optimistic as it is to be too pessimistic).

[...] As uncertainty increases, the degree to which the cost-effectiveness of the optimal-looking program is overstated grows wildly.

The implications of, and potential solutions to, the optimizer’s curse seem to be complicated and debatable. For more detail, see this post [LW · GW], Smith’s post, comments on Smith's post [EA · GW], and discussion of the related problem of Goodhart's law [LW · GW].
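The basic effect in Smith’s sketch can be reproduced with a small simulation. The following is a minimal Python sketch under stated assumptions: true values and estimation errors are both drawn from normal distributions, the errors are unbiased and independent, and the function name and all parameter values are hypothetical choices made for illustration.

```python
import random
import statistics


def mean_overestimate_of_winner(n_options=10, noise_sd=1.0,
                                n_trials=2000, seed=0):
    """Average (estimate - true value) for the option that looks best."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_trials):
        # True values of the options, unknown to the optimizer.
        true_values = [rng.gauss(0.0, 1.0) for _ in range(n_options)]
        # Unbiased estimates: each is as likely to be too high as too low.
        estimates = [v + rng.gauss(0.0, noise_sd) for v in true_values]
        # The optimizer picks whichever option has the highest estimate.
        best = max(range(n_options), key=lambda i: estimates[i])
        gaps.append(estimates[best] - true_values[best])
    return statistics.mean(gaps)


for sd in (0.5, 1.0, 2.0):
    gap = mean_overestimate_of_winner(noise_sd=sd)
    print(f"noise sd {sd}: winner overestimated by {gap:.2f} on average")
```

Even though no individual estimate is biased, the selected option is overestimated on average, and the size of that overestimate grows as the noise in the estimates grows.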

As best I can tell:

I've deliberately kept the above points brief (again, see the sources linked to for further explanations and justifications). This is because those claims are only relevant to the question of when to use EPs if the optimizer’s curse is a larger problem when using EPs than when using alternative approaches, and I don't think it necessarily is. For example, Smith notes:

The optimizer’s curse can show up even in situations where effective altruists’ prioritization decisions don’t involve formal models or explicit estimates of expected value. Someone informally assessing philanthropic opportunities in a linear manner might have a thought like:

“Thing X seems like an awfully big issue. Funding Group A would probably cost only a little bit of money and have a small chance leading to a solution for Thing X. Accordingly, I feel decent about the expected cost-effectiveness of funding Group A.

Let me compare that to how I feel about some other funding opportunities…”

Although the thinking is informal, there’s uncertainty, potential for bias, and an optimization-like process. (quote marks added because I couldn’t double-indent)

This makes a lot of sense to me. But Smith also adds:

Informal thinking isn’t always this linear. If the informal thinking considers an opportunity from multiple perspectives, draws on intuitions, etc., the risk of [overestimating the cost-effectiveness of the optimal-looking program] may be reduced.

I’m less sure what he means by this. I’m guessing [EA(p) · GW(p)] he simply means that using multiple, different perspectives means that the various errors and uncertainties are likely to “cancel out” to some extent, reducing the effective uncertainty, and thus reducing the amount by which one is likely to overestimate the value of the best-seeming thing. But if so, it seems that this partial protection could also be achieved by using multiple, different EPMs, making different assumptions in them, getting multiple people to estimate values for inputs, etc.
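The “cancelling out” idea can also be sketched in code. The following minimal Python example averages several estimates of each option before picking the best-looking one. It assumes, unrealistically, that the different models’ errors are fully independent; the function name and parameter values are hypothetical.

```python
import random
import statistics


def mean_overestimate(n_models, n_options=10, noise_sd=2.0,
                      n_trials=2000, seed=0):
    """Average overestimate of the winner when n_models estimates are averaged."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_trials):
        true_values = [rng.gauss(0.0, 1.0) for _ in range(n_options)]
        combined = []
        for v in true_values:
            # Each model gives an unbiased estimate with independent error.
            model_estimates = [v + rng.gauss(0.0, noise_sd)
                               for _ in range(n_models)]
            combined.append(sum(model_estimates) / n_models)
        # Choose the option with the highest averaged estimate.
        best = max(range(n_options), key=lambda i: combined[i])
        gaps.append(combined[best] - true_values[best])
    return statistics.mean(gaps)


for k in (1, 2, 8):
    print(f"{k} model(s): winner overestimated by {mean_overestimate(k):.2f}")
```

In this sketch the overestimate shrinks as more models are averaged, but it does not disappear. If the models shared assumptions or data, their errors would be correlated and the reduction would be smaller than shown here.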

So ultimately, I think that the problem Smith raises is significant, but I’m quite unsure if it’s a downside of using EPs in particular.

I also don’t think that the optimizer’s curse suggests it’d be valuable to act as if there’s a binary risk-uncertainty distinction. It is clear that the curse gets worse as uncertainty increases (i.e., when one’s probabilities are less trustworthy), but it does so in a gradual, continuous manner. So it seems to me that, again, we’re best off speaking just in terms of more and less trustworthy probabilities, and not imagining that totally different behaviours are warranted if we’re facing “risk” rather than “Knightian uncertainty”.[7]


Anchoring

Anchoring or focalism is a cognitive bias where an individual depends too heavily on an initial piece of information offered (considered to be the "anchor") when making decisions. (Wikipedia)

One critique of using EPs, or at least making them public, seems to effectively be that people may become anchored on the EPs given. For example, Jason Schukraft [EA · GW] writes:

I contend that publishing specific estimates of invertebrate sentience (e.g., assigning each taxon a ‘sentience score’) would be, at this stage of investigation, at best unhelpful and probably actively counterproductive. [...]

Of course, having studied the topic for some time now, I expect that my estimates would be better than the estimates of the average member of the EA community. If that’s true, then it’s tempting to conclude that making my estimates public would improve the community’s overall position on this topic. However, I think there are at least three reasons to be skeptical of this view.

[One reason is that] It’s difficult to present explicit estimates of invertebrate sentience in a way in which those estimates don’t steal the show. It’s hard to imagine a third party summarizing our work (either to herself or to others) without mentioning lines like ‘Rethink Priorities think there is an X% chance ants have the capacity for valenced experience.’ There are very few serious estimates of invertebrate sentience available, so members of the community might really fasten onto ours.

I think that this critique has substantial merit, but that this is most clear in relation to making EPs public, rather than just in relation to using EPs oneself. As Schukraft writes:

To be clear: I don’t believe it’s a bad idea to think about probabilities of sentience. In fact, anyone directly working on invertebrate sentience ought to be periodically recording their own estimates for various groups of animals so that they can see how their credences change over time.[8]

I expect that one can somewhat mitigate this issue by providing various strong caveats when EPs are quite untrustworthy. And (at least somewhat) similar issues can also occur when not using EPs (e.g., if just saying something is “very likely”, or giving a general impression of disapproval of what a certain organisation is doing). But I think caveats wouldn’t remove the issue entirely.[9] And I’d guess that the anchoring would be worse if using EPs than if not.

Finally, anchoring does seem a more important downside when one’s probabilities are less trustworthy (because then the odds people will be anchored to a bad estimate are higher). But again, it seems easy, and best, to think about this in terms of more and less trustworthy probabilities, rather than in terms of a binary risk-uncertainty distinction.

Reputational issues

Finally, in the same post, Schukraft [EA · GW] notes another issue with using EPs:

Sentience scores might reduce our credibility with potential collaborators

[...] science, especially peer-reviewed science, is an inherently conservative enterprise. Scientists simply don’t publish things like probabilities of sentience. For a long time, even the topic of nonhuman sentience was taboo because it was seen as unverifiable. Without a clear, empirically-validated methodology behind them, such estimates would probably not make it into a reputable journal. Intuitions, even intuitions conditioned by careful reflection, are rarely admitted in the court of scientific opinion.

Rethink Priorities is a new, non-academic organization, and it is part of a movement that is—frankly—sort of weird. To collaborate with scientists, we first need to convince them that we are a legitimate research outfit. I don’t want to make that task more challenging by publishing estimates that introduce the perception that our research isn’t rigorous. And I don’t think that perception would be entirely unwarranted. Whenever I read a post and encounter an overly precise prediction for a complex event (e.g., ‘there is a 16% chance Latin America will dominate the plant-based seafood market by 2025’), I come away with the impression that the author doesn’t sufficiently appreciate the complexity of the forces at play. There may be no single subject more complicated than consciousness. I don’t want to reduce that complexity to a number.

Some of my thoughts on this potential downside mirror those I made with regards to anchoring:

But unlike all the other downsides I’ve covered, this one does seem like it might warrant acting (in public) as if there is a binary risk-uncertainty distinction. This is because the people one wants to maintain a good reputation with may think as though there is such a distinction. But it should be noted that this only requires publicly acting as if there’s such a distinction; you don’t have to think as if there’s such a distinction.

One last thing to note is that it also seems possible that similar reputational issues could result from not using EPs. For example, if one relies on qualitative or intuitive approaches, one’s thinking may be seen as “hand-wavey”, “soft”, and/or imprecise by people from a more “hard science” background.


I’d be interested in people’s thoughts on all of the above; one motivation for writing this post was to see if someone could poke holes in, and thus improve, my thinking.

  1. I should note that this post basically takes as a starting assumption the Bayesian interpretation of probability, “in which, instead of frequency or propensity of some phenomenon, probability is interpreted as reasonable expectation representing a state of knowledge or as quantification of a personal belief” (Wikipedia). But I think at least a decent amount of what I say would hold for other interpretations of probability (e.g., frequentism). ↩︎

  2. Of course, I could quickly and easily make an extremely simplistic EPM, or use just a single EP. But then it’s unclear if that’d do better than similarly quick and easy alternative approaches, for the reasons discussed in the following sections. ↩︎

  3. This seems analogous to the idea that utilitarianism itself may often recommend against the action of trying to explicitly calculate what action utilitarianism would recommend (given that that’s likely to slow one down massively). Amanda Askell has written a post on that topic [EA · GW], in which she says: “As many utilitarians have pointed out, the act utilitarian claim that you should ‘act such that you maximize the aggregate wellbeing’ is best thought of as a criterion of rightness and not as a decision procedure. In fact, trying to use this criterion as a decision procedure will often fail to maximize the aggregate wellbeing. In such cases, utilitarianism will actually say that agents are forbidden to use the utilitarian criterion when they make decisions.” ↩︎

  4. Along similar lines, Holden Karnofsky (of GiveWell, at the time) writes: “It’s my view that my brain instinctively processes huge amounts of information, coming from many different reference classes, and arrives at a prior; if I attempt to formalize my prior, counting only what I can name and justify, I can worsen the accuracy a lot relative to going with my gut.” ↩︎

  5. This is different to the idea that people may tend to overestimate EPs, or overestimate cost-effectiveness, or things like that. That claim is also often made, and is probably worth discussing, but I leave it out of this post. Here I’m focusing instead on the separate possibility of people being overconfident about the accuracy of whatever estimate they’ve arrived at, whether it’s high or low. ↩︎

  6. Here’s Nate Soares [LW · GW] making similar points: “In other words, even if my current credence is 50% I can still expect that in 35 years (after encountering a black swan or two) my credence will be very different. This has the effect of making me act uncertain about my current credence, allowing me to say "my credence for this is 50%" without much confidence. So long as I can't predict the direction of the update, this is consistent Bayesian reasoning.

    As a bounded Bayesian, I have all the behaviors recommended by those advocating Knightian uncertainty. I put high value on increasing my hypothesis space, and I often expect that a hypothesis will come out of left field and throw off my predictions. I'm happy to increase my error bars, and I often expect my credences to vary wildly over time. But I do all of this within a Bayesian framework, with no need for exotic "immeasurable" uncertainty.” ↩︎

  7. Smith’s own views on this point seem a bit confusing. At one point, he writes: “we don’t need to assume a strict dichotomy separates quantifiable risks from unquantifiable risks. Instead, real-world uncertainty falls on something like a spectrum.” But at various other points, he writes things like “The idea that all uncertainty must be explainable in terms of probability is a wrong-way reduction [i.e., a bad idea; see his post for details]”, and “I don’t think ignorance must cash out as a probability distribution”. ↩︎

  8. While I think this is a good point, I also think it may sometimes be worth considering the risk that one might anchor oneself to one’s own estimate. This could therefore be a downside of even just generating an EP oneself, not just of making EPs public. ↩︎

  9. I briefly discuss empirical findings that are somewhat relevant to these points here [EA(p) · GW(p)]. ↩︎


Comments sorted by top scores.

comment by Ozzie Gooen (oagr) · 2020-01-20T14:22:07.835Z · score: 9 (5 votes) · EA(p) · GW(p)

Kudos for this write-up, and for your many other posts (both here and on LessWrong, it seems) on uncertainty.

Overall, I'm very much in the "Probabilities are pretty great and should eventually be used for most things" camp. That said, I think the "Scout vs. Soldier" mindset is useful, so to speak; investigating both sides is pretty useful. I'd definitely assign some probability to being wrong here.

My impression is that we're probably in broad agreement here.

Some quick points that come to mind:

  1. The debate on "are explicit probabilities useful" is very similar to those of "are metrics useful", "are cost-benefit analyses useful", and "is consequentialist reasoning useful." I expect that there's broad correlation between those who agree/disagree with these.

  2. In cases where probabilities are expected to be harmful, hopefully probabilities could be used to tell us as such. Like, we could predict that explicit and public use would be harmful.

  3. I'd definitely agree that it's very possible to use probabilities poorly. I think a lot of Holden's criticisms here would fall into this camp. Neural Nets for a while were honestly quite poor, but thankfully that didn't lead to scientists abandoning those. I think probabilities are a lot better now, but we could learn to get much better than them later. I'm not sure how we can get much better without them.

  4. The optimizer's curse can be adjusted for with reasonable use of Bayes. Bayesian hierarchical models should deal with it quite well. There's been some discussion of this around "Goodhart" on LessWrong.

comment by MichaelA · 2020-01-20T23:25:17.750Z · score: 4 (3 votes) · EA(p) · GW(p)

I think I agree with pretty much all of that. And I'd say my position is close to yours, though slightly different; I might phrase mine like: "My understanding is that probabilities should always be used by ideal, rational agents with unlimited computational abilities etc. (Though that's still slightly 'received wisdom' for me.) And I also think that most people, and perhaps even most EAs and rationalists, should use probabilities more often. But I doubt they should actually be used for most tiny decisions, by actual humans. And I think they've sometimes been used with far too little attention to their uncertainty - but I also think that this really isn't an intrinsic issue with probabilities, and that intuitions are obviously also very often used overconfidently."

(Though this post wasn't trying to argue for that view, but rather to explore the potential downsides relatively neutrally and just see what that revealed.)

I'm not sure I know what you mean by the following two statements: "Probabilities [...] should eventually be used for most things" and "I think probabilities are a lot better now, but we could learn to get much better than them later." Could you expand on those points? (E.g., would you say we should eventually use probabilities even the 100th time we make the same decision as before about what to put in our sandwiches?)

Other points:

1. Yes, I share that view. But I think it's also interesting to note it's not a perfect correlation. E.g. Roser writes:

while I believe that we always have probabilities, this paper refrains from taking a stance on how we ought to decide on the basis of these probabilities. The question whether we have probabilities is completely separate from the question how we ought to make use of them. Here, I only ask the former question. The two issues are often not kept separate: the camp that is in favour of relying on probabilities is often associated with processing them in line with expected utility theory. I myself am in favour of relying on probabilities but I reject expected utility theory (and related stances such as cost-benefit analysis), at least if it comes as a formal way of spelling out a maximizing consequentialist moral stance which does not properly incorporate rights.

2. Yes, I agree. Possibly I should've emphasised that more. I allude to a similar point with "It seems the expected value of me bothering to do this EPM is lower than the expected value of me just reading a few reviews and then “going with my gut” (and thus saving time for other things)", and the accompanying footnote about utilitarianism.

4. I think I've seen what you're referring to, e.g. in lukeprog's post [LW · GW] on the optimizer's curse. And I think the basic idea makes sense to me (though not to the extent I could actually act on it right away if you handed me some data). But Chris Smith quotes the proposed solution, and then writes:

For entities with lots of past data on both the (a) expected values of activities and (b) precisely measured, realized values of the same activities, this may be an excellent solution.
In most scenarios where effective altruists encounter the optimizer’s curse, this solution is unworkable. The necessary data doesn’t exist.[7] The impact of most philanthropic programs has not been rigorously measured. Most funding decisions are not made on the basis of explicit expected value estimates. Many causes effective altruists are interested in are novel: there have never been opportunities to collect the necessary data.
The alternatives I’ve heard effective altruists propose involve attempts to approximate data-driven Bayesian adjustments as well as possible given the lack of data. I believe these alternatives either don’t generally work in practice or aren’t worth calling Bayesian.

That seems to me like at least a reason to expect the proposed solution to not work very well. My guess would be that we can still use our best guesses to make adjustments (e.g., just try to quantify our vague sense that a randomly chosen charity wouldn't be very cost-effective), but I don't think I understand the topic well enough to speak on that, really.

(And in any case, I'm not sure it's directly relevant to the question of whether we should use EPs anyway, because, as covered in this post, it seems like the curse could affect alternative approaches too, and like the curse doesn't mean we should abandon our best guess, just that we should be more uncertain about it.)

comment by Ozzie Gooen (oagr) · 2020-01-21T11:49:02.815Z · score: 3 (2 votes) · EA(p) · GW(p)

Hm... Some of this would take a lot more writing than would make sense in a blog post.

On overconfidence in probabilities vs. intuitions: I think I mostly agree with you. One cool thing about probabilities is that they can be much more straightforwardly verified/falsified and measured using metrics for calibration. If we had much larger systems, I believe we could do a great deal of work to better ensure calibration with defined probabilities.
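
As a minimal sketch of what such a metric can look like, the Brier score is one standard choice: the mean squared difference between stated probabilities and what actually happened (lower is better; it rewards both calibration and sharpness). The forecasts and outcomes below are made-up illustrative numbers, not data from this discussion.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes (lower is better)."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical forecasts (stated probabilities) and what happened (1 = occurred).
forecasts = [0.9, 0.8, 0.2]
outcomes = [1, 1, 0]

score = brier_score(forecasts, outcomes)
# Uninformative baseline: always saying 50%.
always_50_50 = brier_score([0.5, 0.5, 0.5], outcomes)
print(f"Brier score: {score:.3f}, always-50% baseline: {always_50_50:.3f}")
```

A purely qualitative statement ("fairly good odds") can't be scored this way without first being mapped to a number, which is roughly the point about verifiability.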

"should eventually be used for most things"

I'm not saying that humans should come up with unique probabilities for most things on most days. One example I'd consider "used for most things" is a case where an AI uses probabilities to tell humans which actions seem the best, and humans go with what the AI states. Similar could be said for "a trusted committee" that uses probabilities as an in-between.

"we could learn to get much better than them later"

I think there are strong claims that topics like Bayes, Causality, Rationality even, are still relatively poorly understood, and may be advanced a lot in the next 30-100 years. As we get better with them, I predict we would get better at formal modeling.

I reject expected utility theory (and related stances such as cost-benefit analysis), at least if it comes as a formal way of spelling out a maximizing consequentialist moral stance which does not properly incorporate rights.

This is a complicated topic. I think a lot of Utilitarians/Consequentialists wouldn't deem many interpretations of rights as metaphysical or terminally-valuable things. Another way to look at it would be to attempt to map the rights to a utility function. Utility functions require very, very few conditions. I'm personally a bit cynical of values that can't be mapped to utility functions, if even in a highly-uncertain way.

But Chris Smith quotes the proposed solution, and then writes...

It's clear Chris Smith has thought about some of this topic a fair bit, but my impression is that I disagree with him. It's quite possible that much of the disagreement is semantic; where he says 'this solution is unworkable' I may say, 'the solution results in a very wide amount of uncertainty'. I think it's clear to everyone (the main researchers anyway) that there's little data about many of these topics, and that Bayesian or any kind of statistical manipulations can't fundamentally convert "very little data" into "a great deal of confidence".

Kudos for identifying that post. The main solution I was referring to was the one described in the second comment [LW(p) · GW(p)]:

In statistics the solution you describe is called Hierarchical or Multilevel Modeling. You assume that you data is drawn from a set of distributions which have their parameters drawn from another distribution. This automatically shrinks your estimates of the distributions towards the mean. I think it's a pretty useful trick to know and I think it would be good to do a writeup but I think you might need to have a decent grasp of bayesian statistics first.

The optimizer's curse arguably is basically within the class of Goodhart-like problems https://www.lesswrong.com/posts/5bd75cc58225bf06703754b2/the-three-levels-of-goodhart-s-curse [LW · GW]

I'm not saying that these are easy to solve, but rather, there is a mathematical strategy to generally fix them in ways that would make sense intuitively. There's no better approach than to try to approximate the mathematical approach, or go with an approach that in-expectation does a decent job at approximating the mathematical approach.
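
A rough simulation sketch of the shrinkage idea, under strong simplifying assumptions: every option's true value is drawn from a normal prior whose mean and spread are simply known (a full hierarchical model would estimate them from the data), and every estimate has the same known noise. All parameter values and names here are hypothetical.

```python
import random
import statistics


def simulate(n_options=20, n_trials=2000, prior_mean=0.0,
             prior_sd=1.0, noise_sd=2.0, seed=0):
    """Return the average error of the naive and the shrunk estimate
    for the option that *looks* best in each trial."""
    rng = random.Random(seed)
    # Normal-normal shrinkage weight: how much to trust the noisy estimate.
    weight = prior_sd ** 2 / (prior_sd ** 2 + noise_sd ** 2)
    naive_errors, shrunk_errors = [], []
    for _ in range(n_trials):
        true_values = [rng.gauss(prior_mean, prior_sd) for _ in range(n_options)]
        estimates = [v + rng.gauss(0.0, noise_sd) for v in true_values]
        best = max(range(n_options), key=lambda i: estimates[i])
        shrunk = prior_mean + weight * (estimates[best] - prior_mean)
        naive_errors.append(estimates[best] - true_values[best])
        shrunk_errors.append(shrunk - true_values[best])
    return statistics.mean(naive_errors), statistics.mean(shrunk_errors)


naive_bias, shrunk_bias = simulate()
print(f"naive bias of best-seeming option: {naive_bias:.2f}")
print(f"bias after shrinking toward the prior mean: {shrunk_bias:.2f}")
```

Under these assumptions the naive estimate of the best-seeming option is systematically too high, while the shrunk estimate is roughly unbiased. With little data about the prior itself, the same approach still runs, but the adjusted estimates come with much wider uncertainty.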

comment by MichaelA · 2020-01-21T23:23:55.072Z · score: 3 (2 votes) · EA(p) · GW(p)

That all seems to make sense to me. Thanks for the interesting reply!

comment by Ozzie Gooen (oagr) · 2020-01-21T22:55:30.388Z · score: 3 (2 votes) · EA(p) · GW(p)

And I think that, even when one is extremely uncertain, the optimizer’s curse doesn’t mean you should change your preference ordering (just that you should be far less certain about it, as you’re probably greatly overestimating the value of best-seeming option).

Ok, I'll flag this too. I'm sure there are statistical situations where an extreme outcome implies that an adjustment for correlation goodharting would make it seem worse than other options; i.e. change order.

That said, I'd guess this isn't likely to happen that often for realistic cases, especially when there aren't highly extreme outliers (which, to be fair, we do have with EA).

I think one mistake someone could make here would be to say that because the ordering may be preserved, the problem wouldn't be "fixed" at all. But, the uncertainties and relationships themselves are often useful information outside of ordering. So a natural conclusion in the case of intense noise (which leads to the optimizer's curse) would be to accept a large amount of uncertainty, and maybe use that knowledge to be more conservative; for instance, trying to get more data before going all-in on anything in particular.

comment by MichaelA · 2020-01-23T08:31:44.814Z · score: 3 (2 votes) · EA(p) · GW(p)

Yeah, I think all of that's right. I ended up coincidentally finding my way to a bunch of stuff about Goodhart on LW that I think is what you were referring to in another comment, and I've realised my explanation of the curse moved too fast and left out details. I think I was implicitly imagining that we'd already adjusted for what we know about the uncertainties of the estimates of the different options - but that wasn't made clear.

I've now removed the sentence you quote (as I think it was unnecessary there anyway), and changed my earlier claims to:

The implications of, and potential solutions to, the optimizer’s curse seem to be complicated and debatable. For more detail, see this post [LW · GW], Smith’s post, comments on Smith's post [EA · GW], and discussion of the related problem of Goodhart's law [LW · GW].
As best I can tell:
  • The optimizer’s curse is likely to be a pervasive problem and is worth taking seriously.
  • In many situations, the curse will just indicate that we're probably overestimating how much better (compared to the alternatives) the option we estimate is best is - it won't indicate that we should actually change what option we pick.
  • But the curse can indicate that we should pick an option other than that which we estimate is best, if we have reason to believe that our estimate of the value of the best option is especially uncertain, and we don't model that information.
I've deliberately kept the above points brief (again, see the sources linked to for further explanations and justifications). This is because those claims are only relevant to the question of when to use EPs if the optimizer’s curse is a larger problem when using EPs than when using alternative approaches, and I don't think it necessarily is.

Now, that's not very clear, but I think it's more accurate, at least :D
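
A toy illustration of the third point, assuming a normal prior over option values and normally distributed estimation noise (the option names and all numbers are hypothetical): when one estimate is much noisier than another, shrinking each toward the prior by its own reliability can flip the ordering.

```python
def shrunk_estimate(estimate, noise_sd, prior_mean=0.0, prior_sd=1.0):
    """Posterior mean under a normal prior and normal estimation noise."""
    weight = prior_sd ** 2 / (prior_sd ** 2 + noise_sd ** 2)
    return prior_mean + weight * (estimate - prior_mean)


# Hypothetical options: (raw estimate, standard deviation of the estimate's noise).
options = {"A": (10.0, 3.0), "B": (4.0, 1.0)}

naive_best = max(options, key=lambda name: options[name][0])
adjusted = {name: shrunk_estimate(est, sd) for name, (est, sd) in options.items()}
adjusted_best = max(adjusted, key=adjusted.get)

print("best by raw estimate:", naive_best)
print("adjusted estimates:", adjusted)
print("best after adjustment:", adjusted_best)
```

If both options had the same noise level, the adjustment would shrink both by the same factor and leave the ordering unchanged, which matches the second point.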

comment by Ozzie Gooen (oagr) · 2020-01-23T09:35:16.953Z · score: 2 (1 votes) · EA(p) · GW(p)

I think that makes sense. Some of it is a matter of interpretation.

From one perspective, the optimizer's curse is a dramatic and challenging dilemma facing modern analysis. From another perspective, it's a rather obvious and simple artifact from poorly-done estimates.

I.e., they sometimes say that if mathematicians realize something is possible, they consider the problem trivial. Here the optimizer's curse is considered a reasonably well-understood phenomenon, unlike some other estimation-theory questions currently being faced.

comment by MichaelA · 2020-07-20T05:52:12.242Z · score: 2 (1 votes) · EA(p) · GW(p)

(If people stumble upon this in future, I'd also recommend reading Greg Lewis' interesting Use resilience, instead of imprecision, to communicate uncertainty [EA · GW].)

comment by cole_haus · 2020-01-24T01:07:35.768Z · score: 2 (2 votes) · EA(p) · GW(p)

Some related things that come to mind:

  • Challenges to Bayesian Confirmation Theory outlines some conceptual potential issues arising from the use of explicit probabilities in a Bayesian framework.
  • Gerd Gigerenzer likes to claim that "fast and frugal" heuristics often just perform better than more formal, quantitative models. These claims can be linked to the bias-variance tradeoff and extreme priors.
  • The optimizer's curse can be generalized to the satisficer's curse. This generalization doesn't obviously seem to differentially affect explicit probabilities though.
comment by MichaelA · 2020-01-24T02:20:32.313Z · score: 1 (1 votes) · EA(p) · GW(p)

Thanks for these links. I know a little about the satisficer's curse, and share the view that "This generalization doesn't obviously seem to differentially affect explicit probabilities though." Hopefully I'll have time to look into the other two things you mention at some point.

(My kneejerk reaction to ""fast and frugal" heuristics often just perform better than more formal, quantitative models" is that if it's predictable that a heuristic would result in more accurate answers, even if we imagine we could have unlimited time for computations or whatever, then that fact, and ideally whatever causes it, can just be incorporated into the explicit model. But that's just a kneejerk reaction. And in any case, if he's just saying that in practice heuristics are often better, then I totally agree.)

comment by Ramiro · 2020-01-21T18:28:26.940Z · score: 1 (1 votes) · EA(p) · GW(p)
And I think that, even when one is extremely uncertain, the optimizer’s curse doesn’t mean you should change your preference ordering (just that you should be far less certain about it, as you’re probably greatly overestimating the value of best-seeming option).

I'm not very sure, but I imagine that the Optimizer's curse might result in a reason against maximizing expected utility (though I'd distinguish it from using explicit probability models in general) if we're dealing with a bounded budget - in which case, one might prefer a suboptimal option with low variance...?

(Plus, idk if this is helpful: in social contexts, a decision rule might incorporate the distribution of the cognitive burdens - I'm thinking about Prudence in Accounting, or maybe something like a limited precautionary principle. If you use an uninformative prior to assess a risk / liability / asset of a company, it might be tempted to hide information)

comment by MichaelA · 2020-01-23T08:51:51.569Z · score: 1 (1 votes) · EA(p) · GW(p)

I now believe the statement of mine you quote was incorrect, and I've updated the optimizer's curse section, primarily to remove the sentence you quoted (as I think it's unnecessary in any case) and to alter an earlier part where I made a very similar claim so that it now says:

As best I can tell:
  • The optimizer’s curse is likely to be a pervasive problem and is worth taking seriously.
  • In many situations, the curse will just indicate that we're probably overestimating how much better (compared to the alternatives) the option we estimate is best is - it won't indicate that we should actually change what option we pick.
  • But the curse can indicate that we should pick an option other than that which we estimate is best, if we have reason to believe that our estimate of the value of the best option is especially uncertain, and we don't model that information.

(I think I already knew this but just previously didn't explain it properly, leaving the conditions I had in mind as assumed, even though they often won't hold in practice.)

But I think this updated version doesn't address the points you make. From "if we're dealing with a bounded budget - in which case, one might prefer a suboptimal option with low variance", it sounds to me like maybe what you're getting at is risk-aversion and/or diminishing returns to a particular thing?

For example, let's say I can choose either A, which gives me $1 thousand in expectation, or B, which gives me $1 million in expectation. So far, B obviously seems way better. But what if B is way higher uncertainty (or way higher risk, if one prefers that phrasing)? Then maybe I'd prefer A.

I'd personally consider this biased if it's pure risk-aversion, and the dollar values perfectly correspond to my "utility" from this. But in reality, each additional dollar is less valuable. For example, perhaps I'm broke, and by far the most important thing is that I get $1000 to get myself out of a real hole - a quite low chance of much higher payoffs isn't worth it, because I get far less than 1000 times as much value out of 1000 times as much money.

If that's what you were getting at, I think that's all valid, and I think the optimizer's curse does probably magnify those reasons to sometimes not go with what you estimate will give you, in expectation, the most of some thing you value. But I think really that doesn't depend on the optimizer's curse, and is more about uncertainty in general. Also, I think it's really important to distinguish "maximising expected utility" from "maximising expected amount of some particular thing I value". My understanding is that "risk-aversion" based on diminishing returns to dollars, for example, can 100% make sense within expected utility maximisation - it's only pure risk-aversion (in terms of utility itself) that can't.
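
To make the distinction concrete, here is a small sketch with hypothetical numbers: A is a certain $1,000, B is a 0.1% chance of $1 billion (so $1 million in expectation), and utility is assumed to be the logarithm of total wealth, starting from $100. The log utility function and the starting wealth are illustrative assumptions, not something argued for above.

```python
import math


def expected_value(lottery):
    """Expected dollar payoff of a list of (probability, payoff) pairs."""
    return sum(p * x for p, x in lottery)


def expected_log_utility(lottery, wealth):
    """Expected utility when utility is log(total wealth)."""
    return sum(p * math.log(wealth + x) for p, x in lottery)


wealth = 100.0                                       # hypothetical starting wealth
option_a = [(1.0, 1_000.0)]                          # certain $1,000
option_b = [(0.001, 1_000_000_000.0), (0.999, 0.0)]  # long shot at $1 billion

ev_a, ev_b = expected_value(option_a), expected_value(option_b)
eu_a = expected_log_utility(option_a, wealth)
eu_b = expected_log_utility(option_b, wealth)

print(f"expected dollars: A={ev_a:,.0f}  B={ev_b:,.0f}")
print(f"expected log utility: A={eu_a:.3f}  B={eu_b:.3f}")
```

Here B wins on expected dollars while A wins on expected utility, so choosing A is itself expected utility maximisation, with no pure risk-aversion needed.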

(Let me know if I was totally misunderstanding you.)

comment by Ramiro · 2020-01-23T12:29:08.531Z · score: 1 (1 votes) · EA(p) · GW(p)

I am very satisfied with the new text. I think you understood me pretty well; the problem is, I was a little bit unclear and ambiguous.

I'm not sure if this impacts your argument: I think diminishing returns accounts pretty well for saturation (i.e., gaining $1 is not as important as losing $1); but it's plausible to complement subjective expected utility theory with pure risk-aversion, like Lara Buchak does.

But what I actually had in mind is something like, in the extreme for unbounded utility, St. Petersburg paradox: if you're willing to constantly bet all your budget, you'll surely end up with $0 and bankrupt. In real life, I guess that if you were constantly updating your marginal utility per dollar, this wouldn't be a problem (so I agree with you - this is not a challenge to expected utility maximisation).

comment by MichaelA · 2020-01-24T00:16:35.199Z · score: 1 (1 votes) · EA(p) · GW(p)

Yeah, I've seen mentions of Buchak's work and one talk from her, but didn't really get it, and currently (with maybe medium confidence?) still think that, when talking about utility itself, and thus having accounted for diminishing returns and all that, one should be risk-neutral.

I hadn't heard of martingales, and have relatively limited knowledge of the St Petersburg paradox. It seems to me (low confidence) that:

  • things like the St Petersburg paradox and Pascal's mugging are plausible candidates for reasons to reject standard expected utility maximisation, at least in certain edge cases, and maybe also expected value reasoning
  • Recognising that there are diminishing returns to many (most?) things at least somewhat blunts the force of those weird cases
  • Things like accepting risk aversion or rounding infinitesimal probabilities to 0 may solve the problems without us having to get rid of expected value reasoning or entirely get rid of expected utility maximisation (just augment it substantially)
  • There are some arguments for just accepting as rational what expected utility maximisation says in these edge cases - it's not totally clear that our aversion to the "naive probabilistic" answer here is valid; maybe that aversion just reflects scope neglect, or the fact that, in the St Petersburg case, there's the overlooked cost of it potentially taking months of continual play to earn substantial sums
  • I don't think these reveal problems with using EPs specifically. It seems like the same problems could occur if you talked in qualitative terms about probabilities (e.g., "at least possible", "fairly good odds"), and in either case the "fix" might look the same (e.g., rounding down either a quantitative or qualitative probability to 0 or to impossibility).
    • But it does seem that, in practice, people not using EPs are more likely to round down low probabilities to 0. This could be seen as good, for avoiding Pascal's mugging, and/or as bad, for a whole host of other reasons (e.g., ignoring many x-risks).

Maybe a fuller version of this post would include edge cases like that, but I know less about them, and I think they could create "issues" (arguably) even when one isn't using explicit probabilities anyway.

comment by Ramiro · 2020-01-24T15:55:23.540Z · score: 2 (2 votes) · EA(p) · GW(p)

I mostly agree with you. I subtracted the reference to martingales from my previous comment because: a) not my expertise, b) this discussion doesn’t need additional complexity.

I'm sorry for having raised issues about paradoxes (perhaps there should be a Godwin's Law about them); I don’t think we should mix edge cases like St. Petersburg (and problems with unbounded utility in general) with the optimizer’s curse – it’s already hard to analyze them separately.

when talking about utility itself, and thus having accounted for diminishing returns and all that, one should be risk-neutral.

Pace Buchak, I agree with that, but I wouldn't say it aloud without adding caveats: in the real world, our problems are often of dynamic choice (and so one may have to think about optimal stopping and strategies, information gathering, etc.), we don't observe utility-functions, we have limited cognitive resources, and we are evaluated and have to cooperate with others, etc. So I guess some "pure" risk-aversion might be a workable satisficing heuristic to [signal you] try to avoid the worst outcomes when you can't account for all that. But that's not talking about utility itself - and certainly not talking about probability / uncertainty itself.

comment by MichaelA · 2020-01-24T23:50:52.240Z · score: 1 (1 votes) · EA(p) · GW(p)
I subtracted the reference to martingales from my previous comment because: a) not my expertise, b) this discussion doesn’t need additional complexity.
I'm sorry for having raised issues about paradoxes (perhaps there should be a Godwin's Law about them); I don’t think we should mix edge cases like St. Petersburg (and problems with unbounded utility in general) with the optimizer’s curse – it’s already hard to analyze them separately.

In line with the spirit of your comment, I believe, I think that it's useful to recognise that not all discussions related to pros and cons of probabilities or how to use them or that sort of thing can or should address all potential issues. And I think that it's good to recognise/acknowledge when a certain issue or edge case actually applies more broadly than just to the particular matter at hand (e.g., how St Petersburg is relevant even aside from the optimizer's curse). An example of roughly the sort of reasoning I mean with that second sentence, from Tarsney writing on moral uncertainty:

The third worry suggests a broader objection, that content-based normalization approach in general is vulnerable to fanaticism. Suppose we conclude that a pluralistic hybrid of Kantianism and contractarianism would give lexical priority to Kantianism, and on this basis conclude that an agent who has positive credence in Kantianism, contractarianism, and this pluralistic hybrid ought to give lexical priority to Kantianism as well. [...]
I am willing to bite the bullet on this objection, up to a point: Some value claims may simply be more intrinsically weighty than others, and in some cases absolutely so. In cases where the agent’s credence in the lexically prioritized value claim approaches zero, however, the situation begins to resemble Pascal’s Wager (Pascal, 1669), the St. Petersburg Lottery (Bernoulli, 1738), and similar cases of extreme probabilities and magnitudes that bedevil decision theory in the context of merely empirical uncertainty. It is reasonable to hope, then, that the correct decision-theoretic solution to these problems (e.g. a dismissal of “rationally negligible probabilities” (Smith, 2014, 2016) or general rational permission for non-neutral risk attitudes (Buchak, 2013)) will blunt the force of the fanaticism objection.

But I certainly don't think you need to apologise for raising those issues! They are relevant and very worthy of discussion - I just don't know if they're in the top 7 issues I'd discuss in this particular post, given its intended aims and my current knowledge base.

comment by Ramiro · 2020-01-25T14:03:41.982Z · score: 1 (1 votes) · EA(p) · GW(p)

Oh, I only apologised because, well, if we start discussing catchy paradoxes, we'll soon lose track of our original point.

But if you enjoy it, and since it is a relevant subject, I think people use 3 broad "strategies" to tackle St. Petersburg paradoxes and the like:

[epistemic status: low, but it kind of makes sense]

a) "economist": "if you use a bounded version, or take time into account, the paradox disappears: just apply a logarithmic function for diminishing returns..."

b) "philosopher": "unbounded utility is weird" or "beware, it's Pascal's Wager with objective probabilities!"

c) "statistician": "the problem is this probability distribution, you can't apply central limit / other theorem, or the indifference principle, or etc., and calculate its expectation"
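
A small sketch of the "economist" move in (a), assuming the standard version of the game (a payoff of $2^k with probability 2^-k, where k is the flip on which the first heads appears) and, as a further simplification, log utility applied to the payoff alone rather than to total wealth. Capping the game at a maximum number of rounds shows the contrast: expected dollars grow without bound as the cap rises, while expected log payoff converges.

```python
import math


def truncated_expected_payoff(max_rounds):
    """Expected dollars when the game is capped at max_rounds flips."""
    return sum((0.5 ** k) * (2 ** k) for k in range(1, max_rounds + 1))


def truncated_expected_log_payoff(max_rounds):
    """Expected log(payoff) for the same capped game."""
    return sum((0.5 ** k) * math.log(2 ** k) for k in range(1, max_rounds + 1))


for cap in (10, 100, 200):
    print(cap, truncated_expected_payoff(cap),
          round(truncated_expected_log_payoff(cap), 6))

# The log version converges to 2 * ln(2), the log of a $4 certain payment.
print("limit:", 2 * math.log(2))
```

Each extra round adds exactly $1 to the expected payoff, which is why the uncapped expectation diverges; the log-utility sum has terms k·ln(2)/2^k, which shrink fast enough to converge.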