List of ways in which cost-effectiveness estimates can be misleading

post by saulius · 2019-08-20T18:05:03.872Z · EA · GW · 27 comments


  How cost estimates can be misleading
  How effectiveness estimates can be misleading
  Complications of estimating the impact of donated money

In my cost-effectiveness estimate of corporate campaigns, I wrote a list [EA · GW] of all the ways in which my estimate could be misleading. I thought it could be useful to have a more broadly applicable version of that list for cost-effectiveness estimates in general. It could be used as a checklist to confirm that no important considerations were missed when cost-effectiveness estimates are made or interpreted.

The list below is probably very incomplete. If you know of more items that should be added, please comment. I tried to optimize the list for skimming.

How cost estimates can be misleading

How effectiveness estimates can be misleading


Complications of estimating the impact of donated money

I'm a research analyst at Rethink Priorities. The views expressed here are my own and do not necessarily reflect the views of Rethink Priorities.

Author: Saulius Šimčikas. Thanks to Ash Hadjon-Whitmey, Derek Foster, and Peter Hurford for reviewing drafts of this post. Also, thanks to Derek Foster for contributing to some parts of the text.


Byford, S., & Raftery, J. (1998). Perspectives in economic evaluation. BMJ, 316(7143), 1529-1530.

Cookson, R., Mirelman, A. J., Griffin, S., Asaria, M., Dawkins, B., Norheim, O. F., ... & Culyer, A. J. (2017). Using cost-effectiveness analysis to address health equity concerns. Value in Health, 20(2), 206-212.

Dolan, P., & Kahneman, D. (2008). Interpretations of utility and their implications for the valuation of health. The Economic Journal, 118(525), 215-234.

Farquhar, S., Cotton-Barratt, O. (2015). Breaking DALYs down into YLDs and YLLs for intervention comparison

GiveWell. (2017). Approaches to Moral Weights: How GiveWell Compares to Other Actors

Hurford, P. (2014). To inspire people to give, be public about your giving.

Hurford, P. (2016). Five Ways to Handle Flow-Through Effects

Hurford, P., Davis, M. A. (2018). What did we take away from our work on vaccines

Kamm, F. (2015). Cost effectiveness analysis and fairness. Journal of Practical Ethics, 3(1).

Karimi, M., Brazier, J., & Paisley, S. (2017). Are preferences over health states informed?. Health and quality of life outcomes, 15(1), 105.

Karnofsky, H. (2011). Leverage in charity

Karnofsky, H. (2015). Key Questions about Philanthropy, Part 1: What is the Role of a Funder?

Karnofsky, H. (2016). Why we can't take expected value estimates literally (even when they're unbiased)

Nord, E. (2005). Concerns for the worse off: fair innings versus severity. Social science & medicine, 60(2), 257-263.

Pyne, J. M., Fortney, J. C., Tripathi, S., Feeny, D., Ubel, P., & Brazier, J. (2009). How bad is depression? Preference score estimates from depressed patients and the general population. Health Services Research, 44(4), 1406-1423.

Sauber, J. (2008). Put your money where your heart is

Sethu, H. (2018). How ranking of advocacy strategies can mislead

Snowden, J. (2018). Revisiting leverage

Vivalt, E. (2019). How Much Can We Generalize from Impact Evaluations?

Wiblin, R. (2017). Most people report believing it's incredibly cheap to save lives in the developing world

Wilson, E. C. (2015). A practical guide to value of information analysis. Pharmacoeconomics, 33(2), 105-121.

Wise, J. (2013). Giving now vs. later: a summary.

  1. I’ve heard the claim that Nothing But Nets used to say that it costs $10 to provide a bednet because it’s an easy number to remember and think about, despite the fact that it costs less. According to GiveWell, on average the total cost to purchase, distribute, and follow up on the distribution of a bednet funded by Against Malaria Foundation is $4.53. ↩︎

  2. Another example of counterfactuals: suppose there is a very cost-effective stall that gives people vegan leaflets. Someone opens another identical stall right next to it. Half of the people who would have gone to the old stall now go to the new one. The new stall doesn’t attract any people who wouldn’t have been attracted anyway, so it has zero impact. But if you estimate its effectiveness ignoring this circumstance, it can still look high. ↩︎


Comments sorted by top scores.

comment by weeatquince · 2019-08-20T22:16:32.433Z · EA(p) · GW(p)


Similar to not costing others' work, you can end up in situations where the same impact is counted multiple times across all the charities involved, giving an inflated picture of the total impact.

E.g., if Effective Altruism (EA) London runs an event and this leads to an individual signing the Giving What We Can (GWWC) pledge and donating more to charity, EA London, GWWC, and the individual may each take 100% of the credit in their impact measurement.

Replies from: Benjamin_Todd, saulius
comment by Benjamin_Todd · 2019-08-22T21:36:33.857Z · EA(p) · GW(p)

Just a quick note that 'double counting' can be fine, since the counterfactual impact of different groups acting in concert doesn't necessarily sum to 100%.

See more discussion here: [EA · GW]

Note that you can also undercount for similar reasons. For instance, if you have impact X, but another org would have done X otherwise, you might count your impact as zero. But that ignores that by doing X, you free up the other org to do something else high impact.

I think I'd prefer to frame this issue as something more like "how you should assign credit as a donor in order to have the best incentives for the community isn't the same as how you'd calculate the counterfactual impact of different groups in a cost-effectiveness estimate".

comment by saulius · 2019-08-20T22:20:49.132Z · EA(p) · GW(p)

Good point, thanks! :)

comment by Derek · 2019-08-26T16:53:50.770Z · EA(p) · GW(p)

Most cost-effectiveness analyses by EA orgs (and other charities) use a ratio of costs to effects, or effects to costs, as the main - or only - outcome metric, e.g. dollars per life saved, or lives affected per dollar. This is a good start, but it can be misleading as it is not usually the most decision-relevant factor.

If the purpose is to inform a decision of whether to carry out a project, it is generally better to present:

(a) The probability that the intervention is cost-effective at a range of thresholds (e.g. there is a 30% chance that it will avert a death for less than my willingness-to-pay of $2,000, 50% at $4,000, 70% at $10,000...). In health economics, this is shown using a cost-effectiveness acceptability curve (CEAC).

(b) The probability that the most cost-effective option has the highest net benefit (a term that is roughly equivalent to 'net present value'), which can be shown with a cost-effectiveness acceptability frontier (CEAF). It's a bit hard to get one's head around, but sometimes the most cost-effective intervention has lower expected value than an alternative, because the distribution of benefits is skewed.

(c) A value of information analysis to assess how much value would be generated by a study to reduce uncertainty. As we found in our evaluation [EA · GW] of Donational, sometimes interventions that have a poor cost-effectiveness ratio and a low probability of being cost-effective nevertheless warrant further research; and the same can be true of interventions that look very strong on those metrics.

See Briggs et al. (2012) for a general overview of uncertainty analysis in health economics, Barton et al. (2008) for CEACs, CEAFs and expected value of perfect information, and Wilson (2014) for a practical guide to VOI analyses (including the value of imperfect information gathered from studies).

Of course, these require probabilistic analyses that tend to be more time-consuming and perhaps less transparent than deterministic ones, so simpler models that give a basic cost-effectiveness ratio may sometimes be warranted. But it should always be borne in mind that they will often mislead users as to the best course of action.
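As a rough illustration of point (a), here is a minimal Python sketch of how the probabilities behind an acceptability curve can be read off a probabilistic model. Everything in it is an assumption made for illustration: the lognormal inputs, their parameters, and the function names are hypothetical, and it looks at a single intervention against fixed willingness-to-pay thresholds, whereas a full CEAC in health economics compares net benefit across alternatives.

```python
import random


def simulate_cost_per_death_averted(n_draws=10_000, seed=0):
    """Draw hypothetical cost-per-death-averted samples (all inputs are made up)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        total_cost = rng.lognormvariate(13.0, 0.3)     # hypothetical programme cost, dollars
        deaths_averted = rng.lognormvariate(5.0, 0.8)  # hypothetical deaths averted
        draws.append(total_cost / deaths_averted)
    return draws


def acceptability_curve(draws, thresholds):
    """For each willingness-to-pay threshold, the share of draws at or below it."""
    n = len(draws)
    return {t: sum(d <= t for d in draws) / n for t in thresholds}


draws = simulate_cost_per_death_averted()
ceac = acceptability_curve(draws, [2_000, 4_000, 10_000])
for threshold, prob in ceac.items():
    print(f"P(cost per death averted <= ${threshold:,}) = {prob:.1%}")
```

The output is one probability per threshold, which is the kind of statement described in (a), rather than a single dollars-per-death ratio.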

Replies from: saulius
comment by saulius · 2019-08-27T20:25:52.186Z · EA(p) · GW(p)

I haven't read the articles you linked, but I'm wondering:

(a) If the outcome of a CEA is a probability distribution like the one below, we can see that there is a 5% probability that it costs less than $1,038 to avert a death, 30.1% probability that it costs less than $2,272, etc. Isn’t that the same?



sometimes the most cost-effective intervention has lower expected value than an alternative, because the distribution of benefits is skewed.

Is that because of the effect that I call “Optimizer’s curse” in my article?

Please don’t feel like you have to answer if you don’t know the answers off the top of your head or it’s complex to explain. I don’t really need these answers for anything, I’m just curious. And if I did need the answers, I could find them in the links :)

comment by Davidmanheim · 2019-11-25T08:48:08.669Z · EA(p) · GW(p)

Undervaluing Diversification: Optimizing for highest Benefit-Cost ratios will systematically undervalue diversification, especially when the analyses are performed individually, instead of as part of a portfolio-building process.

Example 1: Investing in 100 projects to distribute bed-nets correlates the variance of outcomes in ways that might be sub-optimal, even if they are the single best project type. The consequent fragility of the optimized system has various issues, such as increased difficulty embracing new intervention types, or the possibility that the single "best" intervention is actually found to be sub-optimal (or harmful,) destroying the reputation of those who optimized for it exclusively, etc.

Example 2: The least expensive way to mitigate many problems is to concentrate risks or harms. For example, on cost-benefit grounds, the best site for a factory is the industrial areas, not the residential areas. This means that the risks of fires, cross-contamination, and knock-on-effects of any accidents increase because they are concentrated in small areas. Spreading out the factories somewhat will reduce this risk, but the risk externality is a function of the collective decision to pick the lowest cost areas, not any one cost-benefit analysis.

Additional concern: Optimizing for low social costs as measured by economic methods will involve pushing costs on the poorest people, because they typically have the lowest value-to-avoid-harm.
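The correlation point in Example 1 can be made concrete with a small Python sketch. It assumes, purely for illustration, 100 equally sized projects whose outcomes each have the same standard deviation and the same pairwise correlation; the function name and numbers are hypothetical. It uses the standard formula for the variance of a sum of equally correlated variables.

```python
import math


def portfolio_sd(n_projects, sd_each, correlation):
    """Standard deviation of the summed outcome of n equally sized projects
    that share the same pairwise correlation."""
    variance = n_projects * sd_each**2 * (1 + (n_projects - 1) * correlation)
    return math.sqrt(variance)


for rho in (0.0, 0.5, 1.0):
    print(f"correlation {rho}: total sd = {portfolio_sd(100, 1.0, rho):.1f}")
```

With uncorrelated projects the spread of the total grows with the square root of the number of projects. With perfectly correlated projects (for instance, 100 copies of one intervention type that all fail for the same reason) it grows in direct proportion, which a project-by-project benefit-cost ratio does not show.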

comment by Derek · 2019-08-26T15:58:25.230Z · EA(p) · GW(p)

Any deterministic analysis (using point estimates, rather than probability distributions, as inputs and outputs) is unlikely to be accurate because of interactions between parameters. This also applies to deterministic sensitivity analyses: by only changing a limited subset of the parameters at a time (usually just one) they tend to underestimate the uncertainty in the model. See Claxton (2008) for an explanation, especially section 3.

This is one reason I don't take GiveWell's estimates too seriously (though their choice of outcome measure is probably a more serious problem).
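A toy Python sketch of the first point, under assumptions that are entirely made up (a cost-per-outcome model with uniformly distributed inputs; all names and ranges are hypothetical). It shows that plugging point estimates into a nonlinear model need not give the same answer as averaging the model's output over the input distributions.

```python
import random
import statistics


def cost_per_outcome(cost, effect):
    """Toy model: dollars spent per unit of outcome."""
    return cost / effect


rng = random.Random(1)
n_draws = 50_000
costs = [rng.uniform(800, 1200) for _ in range(n_draws)]    # hypothetical cost range
effects = [rng.uniform(0.2, 1.8) for _ in range(n_draws)]   # hypothetical effect range

# Deterministic analysis: feed the mean of each input into the model.
point_estimate = cost_per_outcome(statistics.mean(costs), statistics.mean(effects))

# Probabilistic analysis: run the model on every draw, then average the outputs.
probabilistic_mean = statistics.mean(
    [cost_per_outcome(c, e) for c, e in zip(costs, effects)]
)

print(f"point estimate:      {point_estimate:,.0f}")
print(f"probabilistic mean:  {probabilistic_mean:,.0f}")
```

In this toy setup the two numbers differ noticeably because dividing by an uncertain quantity is nonlinear. Real models have many more parameters, so the gap could be larger or smaller.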

Replies from: Halffull
comment by Halffull · 2019-08-27T19:32:33.182Z · EA(p) · GW(p)

I tend to think this is also true of any analysis which includes only one-way interactions or one-way causal mechanisms, and ignores feedback loops and complex systems analysis. This is true even if each of the parameters is estimated using probability distributions.

comment by Peter_Hurford · 2019-08-22T19:26:04.058Z · EA(p) · GW(p)

I think this would make a great reference checklist for anyone developing CEEs to go through as they write up their CEE and indirect effects sections.

comment by ishaan · 2019-08-21T23:32:44.550Z · EA(p) · GW(p)

brainstorming / regurgitating some random additional ideas -

Goodhart's law - a charity may from the outset design itself or self-modify itself around Effective Altruist metrics, thereby pandering to the biases of the metrics and succeeding in them despite being less Good than a charity which scored well on the same metrics despite no prior knowledge of them. (Think of the difference between someone who has aced a standardized test due to intentional practice and "teaching to the test" vs. someone who aced it with no prior exposure to standardized tests - the latter person may possess more of the quality that the test is designed to measure.) This is related to the "influencing charities" issue, but focuses on the potential for defeating the metric itself, rather than on direct effects of the influence.

Counterfactuals of donations (other than the matching thing)- a highly cost effective charity which can only pull from an effective altruist donor pool might have less impact than a slightly less cost effective charity which successfully redirects donations from people who wouldn't have donated to a cost effective charity (this is more of an issue for the person who controls talent, direction, and other factors, not the person who controls money).

Model inconsistency - Two very different interventions will naturally be evaluated by two very different models, and some models may inherently be harsher or more lenient on the intervention than others. This will be true even if all the models involved are as good and certain as they can realistically be.

Regression to the mean - The expected value of standout candidates will generally regress to the mean of the pool from which they are drawn, since at least some of the factors which caused them to rise to the top will be temporary (including legitimate factors that have nothing to do with mistaken evaluations)
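The regression-to-the-mean item can be illustrated with a short Python simulation. It is only a sketch under made-up assumptions: 1,000 hypothetical charities whose true value is normally distributed, each measured with independent noise, with the top 20 picked by estimate. All numbers and names are illustrative.

```python
import random
import statistics

rng = random.Random(42)
n_charities = 1000

# Hypothetical true cost-effectiveness, and a noisy estimate of each.
true_values = [rng.gauss(10, 2) for _ in range(n_charities)]
estimates = [v + rng.gauss(0, 4) for v in true_values]

# Pick the 20 charities that look best according to the estimates.
ranked = sorted(range(n_charities), key=lambda i: estimates[i], reverse=True)
top = ranked[:20]

mean_estimate_top = statistics.mean([estimates[i] for i in top])
mean_true_top = statistics.mean([true_values[i] for i in top])

print(f"mean estimate of top 20:   {mean_estimate_top:.1f}")
print(f"mean true value of top 20: {mean_true_top:.1f}")
print(f"mean true value overall:   {statistics.mean(true_values):.1f}")
```

The selected group's estimates overstate its true values because part of what put those charities at the top was favourable noise, even though the group is still better than average.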

Replies from: Davidmanheim
comment by Davidmanheim · 2019-11-25T08:51:36.526Z · EA(p) · GW(p)

Good points. (Also, I believe I am personally required to upvote posts that reference Goodhart's law.)

But I think both regression to the mean and Goodhart's law are covered, if perhaps too briefly, under the heading "Estimates based on past data might not be indicative of the cost-effectiveness in the future."

comment by Larks · 2019-09-12T03:08:18.657Z · EA(p) · GW(p)

Thanks for writing this.

Could you give an example of this one, please?

Conflating expected value estimates with effectiveness estimates. There is a difference between a 50% chance to save 10 children, and a 100% chance to save 5 children. Estimates sometimes don’t make a clear distinction.

I understand these are two different things, but am wondering exactly what problems you are seeing this equivocation causing. Is this a risk-aversion issue?

Replies from: saulius
comment by saulius · 2019-09-12T22:03:50.033Z · EA(p) · GW(p)

Yes, the distinction is important for people who want to make sure they had at least some impact (I’ve met some people like that). Also, after reading GiveWell’s CEA, you might be tempted to say “I donated $7000 to AMF so I saved two lives.” Interpreting their CEA this way would be misleading, even if it’s harmless. Maybe you saved 0, maybe you saved 4 (or maybe it’s more complicated because AMF, GiveWell, and whoever invented bednets should get some credit for saving those lives as well, etc.).

Another related problem is that probabilities in CEAs are usually subjective Bayesian probabilities. It’s important to recognize that such probabilities are not always on equal footing. E.g., I remember how people used to say things like “I think this charity has at least 0.000000001% chance of saving the world. If I multiply by how many people I expect to ever live… Oh, so it turns out that it’s way more cost-effective than AMF!” I think that this sort of reasoning is important but it often ignores the fact that the 0.000000001% probability is not nearly as robust as the probabilities GiveWell uses. Hence you are more likely to fall for the Optimizer’s Curse. In other words, choosing between AMF and the speculative charity here feels like choosing between eating at a restaurant with one 5-star Yelp review and eating at a restaurant with 200 Yelp reviews averaging 4.75 stars (wording stolen from Karnofsky (2016)). I'd choose the latter restaurant.

Also, an example where the original point came up in practice can be seen in this comment [EA(p) · GW(p)].

comment by abrahamrowe · 2019-08-21T17:51:57.258Z · EA(p) · GW(p)

Another issue is if multiple charities are working on the same issue, and cooperating, there might be times when a particular charity actively chooses to take less cost-effective actions in order to improve movement wide cost-effectiveness. This happens frequently with the animal welfare corporate campaigns. For example:

Charity A has 100 good volunteers in City A, where Company A is headquartered. To run a campaign against them would cost Charity A $1000, and Company A uses 10M chickens a year. Or, they could run a campaign against Company B in a different city where they have fewer volunteers for $1500.

Charity B has 5 good volunteers in City A, but thinks they could secure a commitment from Company B in City B, where they have more volunteers, for $1000. Company B uses 1M chickens per year. Or, by spending more money, they could secure a commitment from Company A for $1500.

Charities A and B are coordinating, and agree that Companies A and B committing will put pressure on a major target (Company C), and want to figure out how to effectively campaign.

They consider three strategies (note - this isn't how the cost-effectiveness would work for commitments since they impact chickens for longer than a year, etc, but for simplicity's sake):

Strategy 1: They both campaign against both targets, each at half the cost it would be for them to campaign on their own, and a charity evaluator views the victories as split evenly between them.

Charity A cost-effectiveness: (5M + 0.5M chickens) / ($500 + $750) = 4,400 chickens / dollar

Charity B is also 4,400 chickens / dollar.

$2500 total spent across all charities

Strategy 2: Charity A targets Company A, and Charity B targets Company B

Charity A: 10,000 chickens / dollar

Charity B: 1,000 chickens / dollar

$2000 total spent across all charities

Strategy 3: Charity A targets Company B, Charity B targets Company A

Charity A: 667 chickens / dollar

Charity B: 6,667 chickens / dollar

$3,000 total spent across all charities
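The arithmetic for the three strategies can be checked with a short Python sketch. The chicken counts and campaign costs come from the example above; the function names and the way credit is split in the joint strategy (half the chickens and half of each solo cost per charity) are just one way to encode that example.

```python
# Figures taken from the example: chickens used per year and solo campaign costs.
CHICKENS = {"Company A": 10_000_000, "Company B": 1_000_000}
SOLO_COST = {
    ("Charity A", "Company A"): 1000,
    ("Charity A", "Company B"): 1500,
    ("Charity B", "Company A"): 1500,
    ("Charity B", "Company B"): 1000,
}


def solo(assignments):
    """Each charity campaigns alone against its assigned company."""
    return {ch: (CHICKENS[co], SOLO_COST[(ch, co)]) for ch, co in assignments.items()}


def joint():
    """Both charities campaign against both companies, each paying half of its
    solo cost per target and receiving half of the credit."""
    result = {}
    for charity in ("Charity A", "Charity B"):
        chickens = sum(CHICKENS.values()) / 2
        cost = sum(SOLO_COST[(charity, company)] / 2 for company in CHICKENS)
        result[charity] = (chickens, cost)
    return result


def summarize(strategy):
    """Return chickens per dollar for each charity and total spending."""
    ratios = {ch: chickens / cost for ch, (chickens, cost) in strategy.items()}
    total_spent = sum(cost for _, cost in strategy.values())
    return ratios, total_spent


strategies = {
    "Strategy 1": joint(),
    "Strategy 2": solo({"Charity A": "Company A", "Charity B": "Company B"}),
    "Strategy 3": solo({"Charity A": "Company B", "Charity B": "Company A"}),
}

for name, strategy in strategies.items():
    ratios, total_spent = summarize(strategy)
    parts = ", ".join(f"{ch}: {r:,.0f} chickens/$" for ch, r in ratios.items())
    print(f"{name}: {parts}; total spent ${total_spent:,.0f}")
```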

These charities know that a charity evaluator is going to be looking at them, and trying to make a recommendation between the two based on cost-effectiveness. Clearly, the charities should choose Strategy 2, because the least money will be spent overall (and both charities will spend less for the same outcome). But if the charity evaluator is fairly influential, Charity B might push hard for less ideal Strategies 1 or 3, because those make its cost-effectiveness look much better. Strategy 2 is clearly the right choice for Charity B to make, but if they do, an evaluation of their cost-effectiveness will look much worse.

I guess a simple way of putting this is - if multiple charities are working on the same issue, and have different strengths relevant at different times, it seems likely that often they will make decisions that might look bad for their own cost-effectiveness ratings, but were the best thing to do / right decision to make.

Also, on the matching funds note - I personally think it would be better to assume matching funds are true matches rather than not. I've fundraised for maybe 5 nonprofits, and out of probably 20+ matching campaigns in that period, maybe 2 were not true matches. Additionally, nonprofits will often ask major donors to match funds as a way to encourage the major donor to give more (e.g. "you could give $20k like you planned, or you could help us run our $60k year-end fundraiser by matching $30k" type of thing). So I'd guess that for most matching campaigns, the fact that it is a matching campaign means there will be some multiplier on your donation, even if it is small. Maybe it is still misleading then? But overall it is a practice that makes sense for nonprofits.

comment by Davidmanheim · 2019-11-25T08:40:42.747Z · EA(p) · GW(p)

Re: Bias towards measurable results

A closely related issue is justification-bias, where the expectation that a cost-benefit analysis be justified leads to the exclusion of disputed values. One example of this is the US Army Corps of Engineers, which produces cost-benefit analyses that are then given to Congress for funding. Because some values (ecological diversity, human enjoyment, etc.) are both hard to quantify and the subject of debate between political groups, including them leaves the analysis open to far more debate. The pressure to exclude them leads to their implicit minimization.

comment by Peter_Hurford · 2019-08-22T19:26:28.218Z · EA(p) · GW(p)

Do you have any thoughts on how we should change our current approach, if at all, to using and interpreting CEEs in light of these issues?

Replies from: saulius
comment by saulius · 2019-08-23T10:32:38.351Z · EA(p) · GW(p)

Not really. I just think that we should be careful when using CEEs. Hopefully, this post can help with that. I think it contains little new info for people who have been working with CEEs for a while. I imagine that these are some of the reasons why GiveWell and ACE give CEEs only limited weight in recommending charities.

Maybe I’d like some EAs to take CEEs less literally, understand that they might be misleading in some way, and perhaps analyze the details before citing them. I think that CEEs should start conversations, not end them. I also feel that early on some non-robust CEEs were overemphasized when doing EA outreach, but I’m unsure if that’s still a problem nowadays.

comment by saulius · 2019-08-20T18:14:16.411Z · EA(p) · GW(p)

I first published this post on August 7th. However, after about 10 hours, I moved the post to drafts because I decided to make some changes and additions. I have now made those changes and re-published it. I apologize if the temporary disappearance of the article led to any confusion or inconvenience.

Replies from: Derek
comment by Derek · 2019-08-26T15:12:59.050Z · EA(p) · GW(p)

This was my fault, sorry. I was travelling and ill so I was slow giving feedback on the draft. I belatedly sent Saulius some comments without realising it had just been published, so he took it down in order to incorporate some of my suggestions.

Replies from: saulius, Derek
comment by saulius · 2019-08-27T18:11:05.778Z · EA(p) · GW(p)

No need to apologize, Derek. I should've given you a deadline or at least told you that I was about to publish it. Besides, I don't think anyone shared a link to the article in those 10 hours so no harm done. Thank you very much for all your suggestions and comments.

comment by Derek · 2019-08-26T16:57:35.799Z · EA(p) · GW(p)

I wish I'd spent more time reviewing this before publication as I failed to mention some key points. I'll add some of them as comments.

comment by Aaron Gertler (aarongertler) · 2020-05-22T03:33:31.479Z · EA(p) · GW(p)

This post was awarded an EA Forum Prize; see the prize announcement [EA · GW] for more details.

My notes on what I liked about the post, from the announcement:

There’s not much I can say about this post, other than: “Read it and learn”. It’s just a smorgasbord of specific, well-cited examples of ways in which one of the fundamental activities of effective altruism can go awry.

I will note that I appreciate examples of ways in which cost-effectiveness estimates could underestimate the true impact of an action. Posts on this topic often focus only on overestimation, which sometimes makes the whole enterprise of doing good seem faintly underwhelming (should we assume that every estimate we hear is too high? Probably not).

comment by bfinn · 2019-09-02T11:59:42.156Z · EA(p) · GW(p)

Good article. Various things you mention are examples of bad metrics. Another common kind is metrics involving thresholds, e.g. the number of people below a poverty line. They treat all people below, or above, the line as equal to each other, when this is far from the case. (Living on $1/day is far harder than $1.90/day.) This often results in organisations wasting vast amounts of money/effort moving people from just below the line to just above, with little actual improvement, and perhaps ignoring others who could have been helped much more even if they couldn't be moved across the line.
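A tiny Python sketch of the threshold problem, using made-up incomes for four hypothetical people and the $1.90/day line mentioned above. It compares a headcount metric with the total shortfall below the line (in the spirit of poverty-gap measures); the function names and scenario are illustrative assumptions.

```python
POVERTY_LINE = 1.90  # dollars per day


def headcount(incomes, line=POVERTY_LINE):
    """Number of people below the line."""
    return sum(y < line for y in incomes)


def total_shortfall(incomes, line=POVERTY_LINE):
    """Sum of how far each person falls below the line, in dollars per day."""
    return sum(max(line - y, 0.0) for y in incomes)


baseline = [1.00, 1.00, 1.85, 1.85]

# Same budget in both options: $0.20/day in total.
nudge_near_line = [1.00, 1.00, 1.95, 1.95]  # $0.10 each to the two people just below the line
help_poorest = [1.10, 1.10, 1.85, 1.85]     # $0.10 each to the two poorest people

for label, incomes in [
    ("baseline", baseline),
    ("nudge near line", nudge_near_line),
    ("help poorest", help_poorest),
]:
    print(f"{label:16s} headcount={headcount(incomes)}  "
          f"shortfall=${total_shortfall(incomes):.2f}/day")
```

The headcount metric favours nudging the people just below the line, while the shortfall measure favours helping the poorest, even though both options cost the same.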

comment by MichaelStJules · 2020-04-08T00:08:34.470Z · EA(p) · GW(p)

How about: Not being consistent in whether indirect effects like opportunity costs are counted in impacts or total costs.

For example, say you donate to a charity, and they hire someone who would otherwise have earned to give. Should we treat those lost donations as additional costs (possibly weighted by relative cost-effectiveness with your donations) or as a negative impact?

Doing cost-benefit analysis instead of cost-effectiveness analysis would put everything in the same terms and make sure this doesn't happen, but then we'd have to agree on how to convert to or from $.

Have we been generally only treating direct donations towards costs and everything else towards impacts?

Replies from: saulius
comment by saulius · 2020-04-08T08:54:53.036Z · EA(p) · GW(p)

Personally, I don't remember any cost-effectiveness estimate that accounted for things like money lost due to hiring earning-to-givers in any way.

Replies from: MichaelStJules
comment by MichaelStJules · 2020-04-08T16:27:41.166Z · EA(p) · GW(p)

Charity Entrepreneurship has included opportunity costs for cofounders in charity cost-effectiveness analyses towards the charities' impacts, e.g.:

comment by lucy.ea8 · 2019-08-21T05:14:59.701Z · EA(p) · GW(p)

Fairness and health equity. Cost-effectiveness estimates typically treat all health gains as equal. However, many think that priority should be given to those with severe health conditions and in disadvantaged communities, even if it leads to less overall decline in suffering or illness (Nord, 2005; Cookson et al., 2017; Kamm, 2015).

One other example is rural vs. urban: it might be more cost-effective to solve a problem (say, school attendance) in cities and costlier in rural settings. Focusing only on urban settings is wrong in this context. It seems discriminatory.