Many (many!) charities are too small to measure their own impact

post by Aaron Gertler (aarongertler) · 2021-02-15T06:39:10.320Z · 3 comments

This is a link post for https://giving-evidence.com/2021/01/18/small/

Contents

  1. They have the wrong incentive.
  2. They lack the necessary skills in evaluation. 
  3. They often lack the funding to do evaluation research properly. 
  4. They’re too small. 

Why I crossposted this: I found it to be an interesting perspective on a not-uncommon assumption within EA (that every charity should be conducting self-evaluation), written by someone with a lot of field experience in charity consulting/evaluation (who is, I think, mostly aligned with EA's views on the importance of effectiveness).

I don't necessarily endorse any of this, but reasons 2–4 all seem like good grounds for at least not pursuing certain types of self-evaluation. Reason 1 seems like a different kind of problem, but it still rings true and points to the need for funders to be strict about what sorts of research they care about, or otherwise to try to change the incentives in place.


Most charities should not evaluate their own impact. Funders should stop asking them to evaluate themselves. For one thing, asking somebody to mark their own homework was never likely to be a good idea.

This article explains the four very good reasons that most charities should not evaluate themselves, and gives new data about how many of them are too small.

Most operational charities should not (be asked to) evaluate themselves because:

1. They have the wrong incentive.

Their incentive is (obviously!) to make themselves look as great as possible – evaluations are used to compete for funding – so they are motivated to rig the research to make it flattering and/or to bury research that doesn’t flatter them. I say this having been a charity CEO myself, and having done both.

2. They lack the necessary skills in evaluation. 

Most operational charities are specialists in, say, supporting victims of domestic violence or delivering first aid training or distributing cash in refugee camps. These are completely different skills to doing causal research, and one would not expect expertise in these unrelated skills to be co-located.

3. They often lack the funding to do evaluation research properly. 

One major problem is that a good experimental evaluation may involve gathering data about a control group which does not get the programme or which gets a different programme, and few operational charities have access to such a set of people.

A good guide is a mantra from evidence-based medicine, that research should “ask an important question and answer it reliably”. If there is not enough money (or sample size) to answer the question reliably, don’t try to answer it at all.

4. They’re too small. 

Specifically, their programmes are too small: evaluations of just their own programmes don’t have a large enough sample to produce statistically meaningful results, i.e., to distinguish the effects of the programme from those of other factors or random chance. This means that the results of self-evaluations by operational charities are quite likely to be just wrong. For example, when the Institute for Fiscal Studies did a rigorous study of the effects of breakfast clubs, it needed 106 schools in the sample: far more than most operational charities providing breakfast clubs have.
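To make the sample-size point concrete, here is a minimal sketch of a textbook power calculation for comparing a yes/no outcome (such as reoffending within 12 months) between programme participants and a comparison group. It assumes two equal-sized groups, a two-sided test at 5% significance, 80% power and the usual normal approximation; the function names and the 45% vs 40% rates are hypothetical, chosen only for illustration. Real evaluations, for example matched comparisons or designs where whole schools are assigned, typically need more people than this simple formula suggests.

```python
import math
from statistics import NormalDist

# Hypothetical helpers: a standard normal-approximation power calculation
# for a two-sided test comparing two proportions with equal group sizes.
_Z = NormalDist()  # standard normal distribution


def _sds(p_comparison: float, p_programme: float) -> tuple[float, float]:
    """Standard deviations under 'no effect' and under the assumed effect."""
    p_bar = (p_comparison + p_programme) / 2
    null_sd = math.sqrt(2 * p_bar * (1 - p_bar))
    alt_sd = math.sqrt(p_comparison * (1 - p_comparison)
                       + p_programme * (1 - p_programme))
    return null_sd, alt_sd


def n_per_group(p_comparison: float, p_programme: float,
                alpha: float = 0.05, power: float = 0.8) -> int:
    """People needed in EACH group to detect a change between the two rates."""
    z_alpha = _Z.inv_cdf(1 - alpha / 2)  # two-sided critical value (about 1.96)
    z_beta = _Z.inv_cdf(power)           # about 0.84 for 80% power
    null_sd, alt_sd = _sds(p_comparison, p_programme)
    effect = abs(p_comparison - p_programme)
    return math.ceil(((z_alpha * null_sd + z_beta * alt_sd) / effect) ** 2)


def power_for_n(p_comparison: float, p_programme: float, n: int,
                alpha: float = 0.05) -> float:
    """Approximate chance of detecting the change with n people per group."""
    z_alpha = _Z.inv_cdf(1 - alpha / 2)
    null_sd, alt_sd = _sds(p_comparison, p_programme)
    effect = abs(p_comparison - p_programme)
    return _Z.cdf((effect * math.sqrt(n) - z_alpha * null_sd) / alt_sd)


if __name__ == "__main__":
    # Illustrative numbers only: a 45% reoffending rate reduced to 40%.
    print(n_per_group(0.45, 0.40))                  # roughly 1,500 per group
    print(round(power_for_n(0.45, 0.40, 100), 2))   # around 0.1
```

With these made-up figures, a programme serving about 100 people would have only around a one-in-ten chance of detecting a real five-percentage-point reduction, which illustrates how a programme can be "too small" to evaluate on its own.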

Giving Evidence has done some proper analysis to corroborate this view that many operational charities’ programmes are too small to reliably evaluate. The UK Ministry of Justice runs a ‘Data Lab’, which any organisation running a programme to reduce re-offending can ask to evaluate that programme: the Justice Data Lab uses the MoJ’s data to compare the re-offending behaviour of participants in the programme with that of a similar (‘propensity score-matched’) set of non-participants. It’s glorious because, for one thing, it shows loads of charities’ programmes all evaluated in the same way, on the same metric (12-month reoffending rate) by the same independent researchers. It is the sole such dataset of which we are aware, anywhere in the world.

In the most recent data (all its analyses up to October 2020), the JDL had analysed 104 programmes run by charities (‘the voluntary and community sector’), of which fully 62 proved too small to produce conclusive results. In other words, 60% of the charity-run programmes were too small to evaluate reliably.

The analyses also show the case for reliable evaluation, rather than just guessing which charity-run programmes work or assuming that they all do:

a. Some charity-run programmes create harm: they increase reoffending, and

b. Charity-run programmes vary massively in how effective they are.

Hence most charities should not be PRODUCERS of research. But they should be USERS of rigorous, independent research – about where the problems are, why, what works to solve them, and who is doing what about them.

3 comments


comment by Jamie_Harris · 2021-02-18T16:21:09.826Z

Some other, partly overlapping reasons:

  • In rushing to measure their impact to meet requests for impact evaluation, they might just focus on the wrong things, e.g. proxy metrics that sound like good impact evaluation but aren't actually very good indicators. If measuring on their "own" timelines, rather than when asked, charities might have more scope and time to do it carefully.
  • I think there's something to be said for just trying to do something really well and only subsequently stopping to take stock of what you have or haven't achieved. (We've taken pretty much the opposite approach at Animal Advocacy Careers, and I periodically wonder whether that was a mistake.)
  • If you're doing something that seems pretty clearly likely to be cost-effective, given the available evidence, spending resources on further evaluation might just be a waste.
  • Similarly, unless conducting and disseminating research is an important part of your theory of change, the research focus might be a distraction if it doesn't seem likely to affect your decision-making.
comment by alana-UoEstudy2021 · 2021-02-15T15:27:35.174Z

Thanks for this, a really interesting read!

I was wondering where you would suggest charities should get this 'independent research' from? One of the EA virtual events I attended briefly mentioned 'expert' research. Would you agree? If so, I am curious what you mean by 'experts'.

Again, thanks for the post!

comment by Aaron Gertler (aarongertler) · 2021-02-17T10:17:34.796Z

There are a lot of ways that scientific research can be useful to charities. For example, a vaccination charity might design its program based on the design of programs that were shown to be successful at increasing vaccination rates in randomized controlled trials. 

This is different from testing one's own program, which might be impractical for the reasons outlined in this post, but it's a "second-best" option that should at least make you more likely to run an impactful program.

I think EA tends to use a pretty standard definition of "experts" -- people who know a lot about a subject, and have some degree of skill in conducting research that leads them to learn more true information about the world.