Health and happiness research topics—Part 2: The HALY+: Improving preference-based health metrics 2021-01-03T15:08:47.581Z
Health and happiness research topics—Part 3: The sHALY: Developing subjective wellbeing-based health metrics 2020-12-30T15:27:43.198Z
Health and happiness research topics—Part 1: Background on QALYs and DALYs 2020-12-09T00:38:16.512Z
Market-shaping approaches to accelerate COVID-19 response: a role for option-based guarantees? 2020-04-28T10:10:30.266Z
Rethink Grants: an evaluation of Donational’s Corporate Ambassador Program 2019-07-23T23:53:20.274Z


Comment by Derek on Health and happiness research topics—Part 2: The HALY+: Improving preference-based health metrics · 2021-01-02T17:51:52.179Z · EA · GW


Comment by Derek on Health and happiness research topics—Part 1: Background on QALYs and DALYs · 2021-01-02T14:02:06.672Z · EA · GW

Hi Sam,

Thanks for the comments.

1. Have you done much stakeholder engagement? No. I discuss this a little bit in this section of Part 2, but I basically just suggest that people look into this and come up with a strategy before spending a huge amount of time on the research. I do know of academics who may be able to advise on this, e.g. people who have developed previous metrics in consultation with NICE etc, but they’re busy and I suspect they wouldn’t want to invest a lot of time into efforts outside academia.

I think they’d reject the assumption that they are “not improving these metrics” and would point to considerable quantities of research in this area. The main issue, I think, is that they want a different kind of metric than what I’m proposing, e.g. they think it’s important that they are based on public preferences and are focused on health rather than wellbeing. A lot of resources are going into what I see (perhaps unfairly) as “tinkering around the edges,” e.g. testing variations of the time tradeoff/DCE and different versions of the EQ-5D, rather than addressing the fundamental problems.

As I say in Part 3 with respect to the sHALY (SWB-based HALY):

In my view, the strongest reason not to do this project is the apparent lack of interest among key stakeholders. Clinicians, patients, and major HALY “consumers” such as NICE and IHME seem strongly opposed to a pure SWB measure, even if focused on dimensions of health, and to the use of patient-reported values more broadly. As discussed in previous posts, this is due to a combination of normative concerns, such as the belief that those who pay for healthcare have the right to determine its distribution or that disability has disvalue beyond its effect on wellbeing, and doubts about the practicality of SWB measures in these domains.

So this project may only be worth considering if the sHALY would be useful for non-governmental purposes (e.g., within effective altruism), or in “supplementary” analyses alongside more standard methods (e.g., to highlight how QALYs neglect mental health). Either that, or changing the minds of large numbers of influential stakeholders will have to be a major part of the project—which may not be entirely unrealistic, given the increasing prominence of wellbeing in the public sector. We should also consider the possibility that projects such as this, which offer a viable alternative to the status quo, would themselves help to shift opinion.

That said, there is increasing interest in hybrid health/wellbeing measures like the E-QALY and in the use of wellbeing for cross-sector prioritisation, as well as scope for incremental improvement of current HALYs (see Part 2). On cross-sector prioritisation at least, you are likely to know more than me about how to effect policy change within governments.

2. Problem 4 - neglect of spillover effects – probably cannot be solved by changing the metric. I discuss spillovers a little in Part 2 and plan to have a separate post on it in Part 6 (but it might be a while before that’s out, and it’s likely to focus on raising questions rather than providing solutions). I’m still unsure what to do about them and would like to see more research on this. I agree changing the metric alone won’t solve the issue, but it may help—knowing the extent to which the metric captures spillovers seems like an important starting point.

3. Who would you recommend to fund if I want to see more work like this? It probably depends what your aims are. If it’s to influence NICE, IHME, etc, it probably has to go via academia or those institutions. If you want to develop a metric for use in EA, funding individual EAs or EA orgs may work—but even then, it’s probably wise to work closely with relevant academics to avoid reinventing the wheel. So I guess if you have a lot of money to throw at this, funding academics or PhD students may be a good bet; there is already some funding available (I’m applying for PhD scholarships in this area at the moment), but it may be hard to get funding for ideas that depart radically from existing approaches. I list some relevant institutions and individuals in Part 2.

4. How is the E-QALY project going? It got very delayed due to COVID-19. I’m not sure what the new timeline is.

Comment by Derek on Health and happiness research topics—Part 3: The sHALY: Developing subjective wellbeing-based health metrics · 2020-12-30T21:41:35.320Z · EA · GW

Interesting, thanks!

Comment by Derek on Health and happiness research topics—Part 2: The HALY+: Improving preference-based health metrics · 2020-12-26T12:33:18.351Z · EA · GW

Thanks Bob! I will probably do this after publishing the next post.

Comment by Derek on Health and happiness research topics—Part 1: Background on QALYs and DALYs · 2020-12-16T18:11:15.152Z · EA · GW

I've made a few edits to address some of these issues, e.g.:

Clearly, there are many possible “wellbeing approaches” to economic evaluation and population health summary, defined both by the unit of value (hedonic states, preferences, objective lists, SWB) and by how they aggregate those units when calculating total value. Indeed, welfarism can be understood as a specific form of desire theory combined with a maximising principle (i.e., simple additive aggregation); and extra-welfarism, in some forms, is just an objective list theory plus equity (i.e., non-additive aggregation).

However, it seems that most advocates for the use of wellbeing in healthcare reject the narrow welfarist conception of utility, while retaining fairly standard, utility-maximising CEA methods—perhaps with some post-hoc adjustments to address particularly pressing distributional issues. So it seems reasonable to consider it a distinct (albeit heterogenous) perspective.

For the purpose of exposition, I will assume that the objective is to maximise total SWB (remaining agnostic between affect, evaluations, or some combination). This is not because I am confident it’s the right goal; in fact, I think healthcare decision-making should probably, at least in public institutions, give some weight to other conceptions of wellbeing, and perhaps to distributional concerns such as fairness. One reason to do so is normative uncertainty—we can’t be sure that the quasi-utilitarianism implied by that approach is correct—but it’s also a pragmatic response to the diversity of opinions among stakeholders and the challenges of obtaining good SWB measurements, as discussed in later posts.

However, I am fairly confident that SWB-maximization—or indeed any sensible wellbeing-focused strategy—would be an improvement over current practice, so it seems like a reasonable foundation on which to build. Moreover, most of these criticisms should hold considerable force from a welfarist, extra-welfarist, or simply “common sense” perspective. One certainly does not have to be a die-hard utilitarian to appreciate that reform is needed.

Changed the first two problem headings to avoid ambiguity and, in the first case, to focus on the result of the problem rather than the cause, which helps distinguish it from 5.

Comment by Derek on Health and happiness research topics—Part 1: Background on QALYs and DALYs · 2020-12-09T18:41:51.193Z · EA · GW

Hi Michael. Thanks for the feedback.

A few general points to begin with:

  1. I think it’s generally fine to use terminology any way you like as long as you’re clear about what you mean.
  2. In this piece I was summarising debates in health economics, and my framing reflects that literature.
  3. The main objective of these posts is to highlight particular issues that may deserve further attention from researchers, and sometimes that has to come at the expense of conceptual rigour (or at least I couldn’t think of a way to avoid that tradeoff). Like you, my natural inclination is to put everything in mutually exclusive and collectively exhaustive categories, but that doesn’t always result in the most action-relevant information being front and centre.

To address your specific points:

I try to make it very clear what I mean by “welfarism” and its alternatives:

The QALY originally emerged from welfare economics, grounded in expected utility theory (EUT), which defined welfare in terms of the satisfaction of individual preferences. QALYs were intended to reflect, at least approximately, the preferences of a rational individual decision-maker (as described by the von Neumann-Morgenstern [vNM] axioms) concerning their own health, and could therefore properly be called utilities.

Others have argued that QALYs should not represent utility in this sense. These “non-welfarists” or “extra-welfarists” typically believe things like equity, capability, or health itself are of intrinsic value (Brouwer et al., 2008; Coast, Smith, & Lorgelly, 2008; Birch & Donaldson, 2003; Buchanan & Wordsworth, 2015). If such considerations are included in the QALY, the (welfarist) utility of patients may not change proportionally with the size of QALY gains.

Most criticism of HALYs has come from three broad camps: welfare economics (which aims to maximise the satisfaction of individual preferences), extra-welfarism (which has other objectives), and wellbeing (often but not always from a classical utilitarian perspective).

In a nutshell, welfarists complain that QALYs, and CEAs based on them, do not reflect the preferences of rational, self-interested utility-maximizers.

Extra-welfarists, on the other hand, generally think the QALY (and CEA more broadly) is currently too welfarist. Though extra-welfarism is ill-defined and encompasses a broad range of views, the uniting belief is that there is inherent value in things other than the satisfaction of individuals’ preferences (Brouwer et al., 2008).

For the welfarist, there are broader efficiency-related issues with using cost-per-HALY CEAs for resource allocation […]  Therefore, counting everyone’s health the same does not maximise utility in the welfarist sense, even within the health sector.

So it should be clear that welfarism, as the term is used in modern (health) economics, offers a very specific theory of value (satisfaction of rational, self-regarding preferences that adhere to the axioms of expected utility theory) that is much more narrow than most desire theories. That said, I agree welfarism, extra-welfarism, and wellbeing-oriented ideas are not entirely distinct categories, and note overlaps between them:

Hedonism: … This is associated with the classical utilitarianism of Jeremy Bentham and John Stuart Mill, classical economics (mid-18th to late 19th century)…

Desire theories: Wellbeing consists in the satisfaction of preferences or desires. This is linked with neoclassical (welfare) economics, which began defining utility/welfare in terms of preferences around 1900 (largely because they were easier to measure than hedonic states), preference utilitarianism, …

Objective list theories: Wellbeing consists in the attainment of goods that do not consist in merely pleasurable experience nor in desire-satisfaction (though those can be on the list). … These have influenced some conceptions of psychological wellbeing,[46] and many extra-welfarist ideas. The capabilities approach also falls under this heading…

I mention distributional issues in the context of extra-welfarism:

These “non-welfarists” or “extra-welfarists” typically believe things like equity, capability, or health itself are of intrinsic value (Brouwer et al., 2008; Coast, Smith, & Lorgelly, 2008; Birch & Donaldson, 2003; Buchanan & Wordsworth, 2015). If such considerations are included in the QALY, the (welfarist) utility of patients may not change proportionally with the size of QALY gains.

Descriptively, it seems the extra-welfarists are winning. Although QALYs, and CEA as a whole, do not generally include overt consideration of distributional factors, they do depart from traditional welfare economics in a number of ways ...

This “QALY egalitarianism” is often challenged by welfarists on the grounds that WTP varies among individuals, but many extra-welfarists reject it for other reasons. For example, some have argued that more value should be attached to health gained by the young—those who have not yet had their “fair innings”—than by the elderly (Williams, 1997); by those in a worse initial state of health, or for larger individual health gains[43] (e.g., Nord, 2005); by those who were not responsible for their illness (e.g., Dworkin, 1981a, 1981b); by those at the end of life, as currently implemented by NICE; or by people of low socioeconomic status.[44]

They are addressed further in Part 2, where I discuss how HALYs should be aggregated.

I do think I could perhaps have been clearer about the distinction between HALYs and economic evaluation (the latter is typically HALY-maximising, but doesn’t have to be), and analogously between the unit of value (e.g. wellbeing, health) and moral theory (utilitarianism, egalitarianism, etc). I may edit the post later if I have time.

What you call problem 2 I'd reframe as expectations =/= reality.

“Preferences =/= value” was intended as shorthand for something like “the preferences on which current HALY weights are based do not accurately reflect the value of the states to people experiencing them”. Or as I put it elsewhere: “They are based on ill-informed judgements of the general public”. It wasn’t a philosophical comment on desire theories. Still, I can see how it might be misleading (plus it doesn’t strictly apply to DALYs, which arguably aren’t preference-based), so I may change it to your suggestion...though "expectations" doesn't really fit DALYs either, so I'd welcome alternative ideas.

I agree problem 3 (suffering/happiness) is about inadequate scaling and doesn’t presuppose hedonism, but I don’t think I imply otherwise. I decided to include it as a separate problem, even though it’s applicable to more than one type of scale/theory, because it’s an issue that is very neglected—in health economics and elsewhere. As noted above, the aim of this series is to draw attention to issues that I think more people should be working on, not make a conceptually/philosophically rigorous analysis.

That’s also why I didn’t have distributional issues as a separate “problem”. I note at the start of the list that “The criticisms assume the objective is to maximize aggregate SWB” (while also noting that they “should also hold some force from a welfarist, extra-welfarist, or simply 'common sense' perspective”) and from that standpoint the current default (in most HALY-based analyses/guidelines) of HALY maximisation is not a “problem,” so long as they better reflect SWB. That said, as noted above, I do mention distributional issues earlier in the post and in Part 2, in case someone does want to work on those.

Problem 4 is not that HALYs don’t include spillovers; it’s that “They are difficult to interpret, capturing some but not all spillover effects.” (When I say “Neglect of spillover effects,” I mean that the issue of spillovers is problematically neglected in the literature, not that HALYs don’t measure them at all.) This should be clear from the text:

there is some evidence that people valuing health states take into account other factors, especially impact on relatives … On the other hand, it seems reasonable to assume health state values do not fully reflect the consequences for the rest of society—something that would be impossible for most respondents to predict, even if they were wholly altruistic.

I agree this is likely to be an issue with other metrics too (Part 6 is all about this, and it’s mentioned in Part 2), and I suspect it will mostly have to be dealt with at the aggregation stage, but it’s not the case that the content of the metrics is irrelevant. For example, the questionnaires (and therefore the descriptive system) could include items like “To what extent do you feel you’re a burden on others?” (a very common concern expressed in qualitative studies); and/or the valuation exercise could ask people to take into account the impact of their (e.g.) health condition on others (or alternatively to consider only their own health/wellbeing). If this makes a difference to the values produced, it would make HALYs/WELBYs easier to interpret, which would also inform broader evaluation methodology, like whether to administer health/wellbeing measures to relatives separately and add them to the total.

Problem 5 is not merely a restatement of Problem 1, though of course they’re closely connected. Problem 1 focuses on why HALYs aren’t that good at prioritising within healthcare (i.e. achieving technical efficiency, from a fixed budget). Problem 5 is that they are useless at cross-sector prioritisation (i.e. allocative efficiency). The cause is similar (health focus), and I think I combined them in an early draft; but as with states worse than dead, I wanted to have 5 as a separate issue in order to draw particular attention to it. The difference becomes especially relevant when comparing, for example, the sHALY (which assigns weight to health states based on SWB, thereby addressing Problem 1 but not 5) and the WELBY (which potentially addresses both, but probably at the expense of validity within specific domains such as healthcare, in which case it may be useful for high-level cross-sector prioritisation, e.g., setting budgets for different government departments [Problem 5], but not for priority-setting within, say, the NHS [Problem 1]). Following similar feedback from others, I did change 5 to “They are consequently of limited use in prioritising across sectors or cause areas” in my main list in order to highlight the relationship.

(Really, all of these problems are due to (a) the descriptive system, (b) the valuation method, and possibly (c) the aggregation method, so any further breakdown risks overlap and confusion—but those categories don’t really tell you why you should care about them, or what elements you should focus on, so it didn’t seem like a helpful typology for the “Problems” section.)

Still, I am not entirely happy with this way of dividing things up or framing things (e.g., some problems focus more on “causes” and some on “effects”) and would welcome suggestions of alternatives that are both conceptually rigorous/consistent and draw attention to the practical implications.

Comment by Derek on EA Forum feature suggestion thread · 2020-12-03T20:17:44.974Z · EA · GW

As far as I can tell, it isn't possible to have line breaks in footnotes (though I may just be doing something wrong). This also precludes bulleted/numbered lists, block quotes, etc. Any chance that could be changed? 

Comment by Derek on EA Forum feature suggestion thread · 2020-12-03T18:42:41.569Z · EA · GW

H3s are still being converted to regular Paragraph format when I paste them in from GDocs. What am I doing wrong?

Comment by Derek on A counterfactual QALY for USD 2.60–28.94? · 2020-09-10T12:30:53.982Z · EA · GW

I'm sure there are many giving opportunities in global health that are better than the GiveWell top charities, and I'm pleased to see promising small or medium-sized projects like this being brought to the attention of EAs. 

However, I think you should try to get better estimates of QALYs gained (or DALYs averted)—especially if you're going to feature the cost-effectiveness ratio so prominently in your write-up. This should be possible by referring to the relevant literature. The current estimates don't seem all that plausible to me, e.g. an episode of "simple malaria" (by which you presumably mean there are no other complications like anaemia) tends to last a few weeks or less, so even if it could be immediately cured at the beginning, it wouldn't reach your lower estimate of 0.1 QALYs, let alone the upper of 5 QALYs. For life-threatening conditions, I don't think you should have the theoretical maximum of "save all lives" as the upper estimate, as that wouldn't happen in any context, and certainly not this one. If you must rely on your intuitive guesstimates, perhaps you should use 90% or 95% credible intervals.
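To illustrate the arithmetic behind the malaria point, here is a rough sketch. The episode durations and the assumption of a total loss of quality of life for the whole episode are illustrative upper bounds chosen for this example, not figures taken from the literature, and the function name is hypothetical.

```python
WEEKS_PER_YEAR = 52


def max_qaly_gain(duration_weeks: float, utility_loss: float = 1.0) -> float:
    """Upper bound on QALYs gained by instantly curing a non-fatal episode.

    duration_weeks: how long the episode would otherwise have lasted.
    utility_loss: quality-of-life decrement during the episode on a 0-1
        scale; 1.0 (as bad as being dead) is a deliberately generous
        assumption, not an empirical estimate.
    """
    return (duration_weeks / WEEKS_PER_YEAR) * utility_loss


# Hypothetical durations covering "a few weeks or less".
for weeks in (1, 2, 3, 4):
    print(f"{weeks} week(s): at most {max_qaly_gain(weeks):.3f} QALYs")
```

Even with these generous assumptions, a four-week episode stays below 0.1 QALYs; the episode would need to last more than five weeks at a total loss of quality of life to reach that figure.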

Good luck with the project!


Comment by Derek on EA Forum feature suggestion thread · 2020-07-04T20:51:33.300Z · EA · GW

"Next" and "Previous" arrows/buttons at the bottom of a post, to move to the next/previous post - useful when you haven't read the forum for a while and want to catch up. This would obviously have to assume a certain ordering (e.g. chronological vs karma) and selection (e.g. all or excluding Community/Questions), which could perhaps be adjusted in Settings.

Comment by Derek on EA Forum feature suggestion thread · 2020-06-17T22:04:42.759Z · EA · GW

Level 3 headings should be supported. Unless it's changed recently, it currently jumps from Level 2 to Level 4, which makes it hard to logically format complex documents.

Comment by Derek on Market-shaping approaches to accelerate COVID-19 response: a role for option-based guarantees? · 2020-04-29T18:46:50.986Z · EA · GW

Thanks for the comments!

1. The put could cover ~90% of the cost of the accelerated production, taking into account the additional costs.

2. Sales are likely to be higher if they move more quickly: the company with the first billion vaccines is likely to sell a lot more items than the company with the second, and this could more than offset any additional costs. (The second may not sell any, even if it’s a good product, if the first can meet all needs quickly enough.)

3. Some variants outlined in the brief, such as declining payouts, can further incentivise haste.

4. I’ve nothing against academic/PPP efforts, especially if they are under existing arrangements (since they normally take ages to negotiate), and put options will not always be the best approach. But in the current situation we need as many teams on this as we can get, and options-based guarantees may help generate new ideas or get existing ones to market more quickly.

Comment by Derek on What posts do you want someone to write? · 2020-04-03T16:33:26.962Z · EA · GW

Should Covid-19 be a priority for EAs?

A scale-neglectedness-tractability assessment, or even a full cost-effectiveness analysis, of Covid as a cause area (compared to other EA causes) could be useful. I'm starting to look into this now – please let me know if it's already been done.

Comment by Derek on What posts do you want someone to write? · 2020-04-03T16:30:39.643Z · EA · GW

"The longtermist case for animal welfare"

Have you seen this?

Comment by Derek on Coronavirus: how much is a life worth? · 2020-03-27T00:10:11.829Z · EA · GW

Suicide is a very poor indicator of the dead/neutral point, for a host of reasons.

A few small, preliminary surveys I've seen place it around 2/10, though it ranges from about 0.5 to 6 depending on whom and how you ask.

(I share your concerns in parentheses, and am doing some work along these lines - it's been sidelined in part due to covid projects.)

Comment by Derek on What posts you are planning on writing? · 2020-03-26T18:39:32.875Z · EA · GW

Hah! I was working on them before getting sidelined with covid stuff.

I can send you the drafts if you send me a PM. The content is >80% done (I've decided to add more, so the % complete has dropped) but they need reorganising into ~10 manageable posts rather than 3 massive ones.

Comment by Derek on Founders Pledge Charity Recommendation: Action for Happiness · 2020-03-19T22:00:33.464Z · EA · GW

Thanks Aidan! Hope you're feeling better now.

Most of your comments sound about right.

On retention rates: Your general methods seem to make sense, since one would expect gradual tapering off of benefits, but your inputs seem even more optimistic than I originally thought.

I'm not sure Strong Minds is a great benchmark for retention rates, partly because of the stark differences in context (rural Uganda vs UK cities), and partly because IIRC there were a number of issues with SM's study, e.g. a non-randomised allocation and evidence of social desirability bias in outcome measurement, plus of course general concerns related to the fact it was a non-peer-reviewed self-evaluation. Perhaps retention rates of effects from UK psychotherapy courses of similar duration/intensity would be more relevant? But I haven't looked at the SM study for about a year, and I haven't looked into other potential benchmarks, so perhaps yours was a sensible choice.

Also not a great benchmark in a UK context, but Haushofer and colleagues recently did a study* of Problem Management+ in Uganda that found no benefits at the end of a year (paper forthcoming), even though it showed effectiveness at the 3 month mark in a previous study in Kenya.

*Haushofer, J., Mudida, R., & Shapiro, J. (2019). The Comparative Impact of Cash Transfers and Psychotherapy on Psychological and Economic Well-being. Working Paper. Available upon request.

Comment by Derek on AMA: Elie Hassenfeld, co-founder and CEO of GiveWell · 2020-03-19T15:43:02.106Z · EA · GW

Do you think GiveWell top charities are the best of all current giving opportunities? If so, what is the next best opportunity?

Comment by Derek on AMA: Elie Hassenfeld, co-founder and CEO of GiveWell · 2020-03-19T13:01:06.802Z · EA · GW

Do you think adopting subjective wellbeing as your primary focus would materially affect your recommendations?

In particular:

(a) Would using SWB as the primary outcome measure in your cost-effectiveness analysis change the rank ordering of your current top charities in terms of estimated cost-effectiveness?

(b) If it did, would that affect the ranking of your recommendations?

(c) Would it likely cause any of your current top charities to no longer be recommended?

(d) Would it likely cause the introduction of other charities (such as ones focused on mental health) into your top charity list?

Comment by Derek on AMA: Elie Hassenfeld, co-founder and CEO of GiveWell · 2020-03-18T21:46:20.442Z · EA · GW

How likely is it that GiveWell will ultimately (e.g. over a 100-year or 10,000-year period) do more harm than good? If that happens, what is the most likely explanation?

Comment by Derek on AMA: Elie Hassenfeld, co-founder and CEO of GiveWell · 2020-03-18T21:37:21.469Z · EA · GW

A recent post on this forum (one of the most upvoted of all time) argued that "randomista" development projects like GiveWell's top charities are probably less cost-effective than projects to promote economic growth. Do you have any thoughts on this?

Comment by Derek on Founders Pledge Charity Recommendation: Action for Happiness · 2020-03-07T16:51:45.831Z · EA · GW

I like your general approach to this evaluation, especially:

  • the use of formal Bayesian updating from a prior derived in part from evidence for related programmes
  • transparent manual discounting of the effect size based on particular concerns about the direct study
  • acknowledgement of most of the important limitations of your analysis and of the RCT on which it was based
  • careful consideration of factors beyond the cost-effectiveness estimate.

I'd like to see more of this kind of medium-depth evaluation in EA.

I don't have time at the moment for a close look at the CEA, but aside from limitations acknowledged in your text, 3 aspects stand out as potential concerns:

1. The "conservative" and "optimistic" results are quite extreme. This seems to be in part because "conservative" and "optimistic" values for several parameters are multiplied together (e.g. DALYs gained, yearly retention rate of benefits, % completing the course, discount rates...). As you'll know, it is highly improbable that even, say, three independent parameters would simultaneously obtain at, say, the 10th percentile: 0.1*0.1*0.1 = 0.001. Did you consider making a probabilistic model in Guesstimate, Causal, Excel (with macros for Monte Carlo simulation), R, etc in order to generate confidence intervals around the final results? (I appreciate there are major advantages to using Sheets, but it should be fairly straightforward to reproduce at least the "Main CEA" and "Subjective CEA inputs" tabs in, for example, Guesstimate. This would also enable a rudimentary sensitivity analysis.)

2. The inputs for "Yearly retention rate of benefits" (row 10) seem pretty high (0.30, 0.50, and 0.73 for conservative, best guess, and optimistic, respectively) and the results seem fairly sensitive to this parameter. IIRC the study this was based on only had an 8-week follow-up, which would be about half your "conservative" figure (8/52 = 0.15). Even their "extended" follow-up (without a control group) was only for another 2 months. It is certainly plausible that the benefits endure for several months, but I would say that estimates of about 0.1, 0.3, and 0.7 are more reasonable. With those inputs, the cost per DALY increases to about $47,000, $4,500, or $196. That central figure is roughly on a par with CBT for depression in high-income countries, i.e. pretty good but not comparable with developing-country interventions. (And I wouldn't take the "optimistic" figure seriously for the reasons given in (1) above.)

3. I haven't seen the "growth model" on which the cost estimates are based, but my guess is that it doesn't account for the opportunity cost of facilitators' (or participants') time. IIRC each course is led by two "skilled" volunteers who may otherwise do another pro-social activity.
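As a rough illustration of point 1, the sketch below uses a small Monte Carlo simulation to show why multiplying several "conservative" inputs together gives a result far more extreme than a conservative estimate of the product. The three lognormal inputs and their parameters are purely hypothetical stand-ins, not the parameters of the actual CEA, and the helper names are made up for this example.

```python
import random

random.seed(0)
N = 100_000  # number of simulated draws


def percentile(values, q):
    """Simple empirical percentile: the value at fraction q of the sorted list."""
    ordered = sorted(values)
    return ordered[int(q * (len(ordered) - 1))]


# Three independent, hypothetical model inputs (assumed lognormal for illustration).
samples = [[random.lognormvariate(0.0, 0.5) for _ in range(3)] for _ in range(N)]

# "Conservative" value (10th percentile) of each input taken separately.
p10_inputs = [percentile([s[i] for s in samples], 0.10) for i in range(3)]

# Share of draws in which all three inputs are at or below their own 10th percentile.
share_all_low = sum(
    all(s[i] <= p10_inputs[i] for i in range(3)) for s in samples
) / N

# Product of the three conservative inputs vs. the 10th percentile of the product.
naive_conservative = p10_inputs[0] * p10_inputs[1] * p10_inputs[2]
p10_of_product = percentile([s[0] * s[1] * s[2] for s in samples], 0.10)

print(f"Share of draws with all three inputs 'conservative': {share_all_low:.4f}")
print(f"Product of the three 10th percentiles: {naive_conservative:.3f}")
print(f"10th percentile of the product:        {p10_of_product:.3f}")
```

The share of draws in which all three inputs are simultaneously "conservative" should come out close to 0.1*0.1*0.1 = 0.001, and the product of the individual 10th percentiles should be noticeably lower than the 10th percentile of the simulated product. A probabilistic model reports the latter kind of figure directly.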

Comment by Derek on Founders Pledge Charity Recommendation: Action for Happiness · 2020-03-05T22:36:21.531Z · EA · GW

There is also evidence that health problems have a much smaller effect on subjective well-being than one might imagine.

This is only the case for (some) physical health problems, especially those associated with reduced mobility. People tend to underestimate the SWB impact of (at least some) mental health problems. See e.g. Gilbert & Wilson, 2000; De Wit et al., 2000; Dolan & Kahneman, 2007; Dolan 2008; Pyne et al., 2009; Karimi et al., 2017

Comment by Derek on Poverty in Depression-era England: Excerpts from Orwell's "Wigan Pier" · 2020-02-12T01:26:34.067Z · EA · GW

You might want to mention the publication date (1937)

Comment by Derek on A Local Community Course That Raises Mental Wellbeing and Pro-Sociality · 2020-01-31T23:05:48.949Z · EA · GW

Thanks - I missed that on my skim. But the "extended" follow-up is only for another two months. It does seem to indicate that effects persist for at least that period, without any trend towards baseline, which is promising (though without a control group the counterfactual is impossible to establish with confidence). I wonder why they didn't continue to collect data beyond this period.

Comment by Derek on A Local Community Course That Raises Mental Wellbeing and Pro-Sociality · 2020-01-31T22:45:52.944Z · EA · GW

Thanks - "trained facilitator" might be a bit misleading. Still, it looks like there were two volunteer course leaders for each course, selected in part for their unspecified "skills", who were given "on-going guidance and support" to facilitate the sessions, and who have to arrange a venue etc themselves, then go through a follow-up process when it's over. So it's not a trivial amount of overhead for an average of 13 participants.

Comment by Derek on A Local Community Course That Raises Mental Wellbeing and Pro-Sociality · 2020-01-31T14:49:13.690Z · EA · GW

I don't have much time to spend on this, but here are a few thoughts based on a quick skim of the paper.

The study was done by some of the world's leading experts in wellbeing and the study design seems okay-ish ('waitlist randomisation'). The main concern with internal validity, which the authors acknowledge, is that changes in the biomarkers, while mostly heading in the right direction, were far from statistically significant. This could indicate that the effects reported on other measures were due to some factor other than actual SWB improvement, e.g. social desirability bias. But biomarkers are not a great metric, and measures were taken to address these concerns, so I find it plausible that the effects in the study population were (nearly) as large as reported.

- The participants were self-selected, largely from people who were already involved with Action for Happiness ("The charity aims to help people take action to create more happiness, with a focus on pro-social behaviour to bring happiness to others around them"), and largely in the UK. They also had to register online. It's unclear how useful it would be for other populations.
- It's quite an intensive program, involving weekly 2–2.5 hour group meetings with two volunteer facilitators. ("Each of these sessions builds on a thematic question, for example, what matters in life, how to find meaning at work, or how to build happier communities.") This may limit its scalability and accessibility to certain groups.
- Follow-up was only for 2 months, the duration of the course itself. (This limitation seems to be due to the study design: the control group was people taking the course 8 weeks later.)
- The effect sizes for depression and anxiety were smaller than for CBT, so it may still not be the best option for mental health treatment (though the CBT studies were done in populations with a diagnosed mental disorder, so direct comparison is hard; and subgroup analyses showed that people with lower baseline wellbeing benefited most from the program).
- For clarity, the average effect size for life satisfaction was about 1 point on a 10-point scale. This is good compared to most wellbeing interventions, but that might say more about how ineffective most other interventions are than about how good this one is.

So at the risk of sounding too negative: it's hardly surprising that people who are motivated enough to sign up for and attend a course designed to make them happier do in fact feel a bit happier while taking the course. It seems important to find out how long these effects endure, and whether the course is suitable for a broader range of people.

Comment by Derek on The EA Hotel is now the Centre for Enabling EA Learning & Research (CEEALAR) · 2020-01-29T18:05:45.588Z · EA · GW

But I really think the whole name should be reconsidered.

Comment by Derek on The EA Hotel is now the Centre for Enabling EA Learning & Research (CEEALAR) · 2020-01-29T18:05:21.971Z · EA · GW

You could keep the name but drop the first 'A': CEELAR. Excluding the 'A' of Altruism isn't great, but I think you're allowed to take major liberties with acronyms. And really, almost anything is better than CEEALAR.

Comment by Derek on AMA: Rob Mather, founder and CEO of the Against Malaria Foundation · 2020-01-28T11:56:42.157Z · EA · GW

Thanks Rob!

As you've said, in addition to averting deaths it looks like AMF considerably improves lives, e.g. by improving economic outcomes and reducing episodes of illness. Have you considered collecting data on subjective wellbeing in order to help quantify these improvements? Could that be integrated into your program without too much expense/difficulty?

On the other side of the coin, one possible negative impact of programs that increase wealth and/or population size is the suffering of animals farmed for food (since better-off people tend to eat more meat). Do you have any data on dietary changes resulting from bed net distribution (or similar programs)? Would it be feasible to collect that data in future?

Comment by Derek on AMA: Rob Mather, founder and CEO of the Against Malaria Foundation · 2020-01-24T09:51:20.571Z · EA · GW

A recent post on this forum (the fourth most popular ever, at the time of writing) argued that "randomista" development projects like AMF are probably less cost-effective than projects to promote economic growth. Do you have any thoughts on this?

Comment by Derek on AMA: Rob Mather, founder and CEO of the Against Malaria Foundation · 2020-01-24T09:48:03.462Z · EA · GW

What are your thoughts on the indirect ("flow-through") effects of AMF? For example:

1. What do you think are the main positive and negative indirect impacts of the program, both long- and short-term? (E.g. increasing productivity and economic growth, increasing/decreasing total population, strengthening health systems, greenhouse gas emissions, consumption of factory-farmed meat...) Do you have any data on these? Are you planning to gather data on any of them?

2. What proportion of the long-term benefit from the program is due to short-term direct effects such as saving lives and averting unpleasant episodes of malaria, relative to indirect benefits?

3. Do you hold a particular view of population ethics (totalism, averagism, person-affecting, etc)?

4. What is your response to critics who claim we are ultimately "clueless" about the long-run magnitude or even sign of interventions like this? (I think the basic argument is that e.g. averting deaths has a wide range of knock-on effects, both good and bad, and that we may not be justified in being confident that ultimately – say, over the next few hundred years – the impact will be net positive. See e.g. here, here, and here for a better explanation.)

Comment by Derek on Katriel Friedman: The benefits of starting your own charity · 2020-01-24T08:56:50.547Z · EA · GW
It's a core part of the research ethics that they teach you when you're being trained to run an RCT — whether you can run them if you have equipoise (i.e., are certain that an intervention works).

You might want to clarify this. Equipoise is uncertainty about whether the intervention works, and is often considered a pre-requisite for an RCT. I'm sure Katriel understands this but the phrasing here is misleading.

Comment by Derek on AMA: Rob Mather, founder and CEO of the Against Malaria Foundation · 2020-01-22T21:58:55.798Z · EA · GW

Can you explain your '20 minute rule'?

Comment by Derek on Logarithmic Scales of Pleasure and Pain: Rating, Ranking, and Comparing Peak Experiences Suggest the Existence of Long Tails for Bliss and Suffering · 2020-01-20T15:26:15.978Z · EA · GW

Do you have any thoughts on whether valenced experience is asymmetrical, i.e. whether the most negative experiences (e.g. 10/10 on some suitable pain scale) are more bad than the most positive ones (e.g. 10/10 on some suitable pleasure scale) are good?

My hunch is that the worst experiences are more intense, at least if you exclude weird/rare things like Jhanas and 5-MeO-DMT trips, e.g. I'd give up days or weeks of 'maximum happiness' to avoid being burned alive for a minute. But not everyone shares this intuition, and I'm not sure how to settle the debate (at least until you prove and operationalise your symmetry theory of valence).

Comment by Derek on Logarithmic Scales of Pleasure and Pain: Rating, Ranking, and Comparing Peak Experiences Suggest the Existence of Long Tails for Bliss and Suffering · 2020-01-20T15:12:14.612Z · EA · GW

Thanks for this - very interesting.

Do you think your claims would apply to broader measures of subjective wellbeing, e.g. questions like "Overall, how satisfied are you with your life?" and "Overall, how happy were you yesterday?" (often on a 0-10 scale)? Or even to more specific measures of valenced experience, like depression (e.g. PHQ-9)?

Because I've been wondering whether:

(a) the Weber-Fechner law is limited to perception of clear physical stimuli (weight, pain, spiciness, etc), as distinct from 'internal' states and cognitive evaluations (though the internal/external distinction may not make sense here).

(b) a log scale is less useful/accurate when considering long periods of time (a day, a year, a lifetime), over which the variance in average wellbeing in a population will be lower than the variance in the intensity of specific events.
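To make the statistical intuition behind (b) concrete, here is a rough simulation sketch. Everything in it is an assumption for illustration: momentary intensities are drawn independently from a lognormal distribution (a stand-in for a long-tailed scale), and `MOMENTS_PER_PERIOD`, `N_PERIODS`, and `momentary_intensity` are hypothetical names and values, not anything taken from the post.

```python
import random
import statistics

random.seed(1)

MOMENTS_PER_PERIOD = 200   # hypothetical number of moments averaged into one period
N_PERIODS = 2_000          # hypothetical number of person-periods simulated


def momentary_intensity():
    # Assumed long-tailed distribution of momentary intensity (illustrative only).
    return random.lognormvariate(0.0, 1.5)


periods = [
    [momentary_intensity() for _ in range(MOMENTS_PER_PERIOD)]
    for _ in range(N_PERIODS)
]
moments = [x for period in periods for x in period]
period_means = [statistics.fmean(period) for period in periods]

# Ratio of the most extreme value to the typical value, at each level of aggregation.
spread_moments = max(moments) / statistics.median(moments)
spread_means = max(period_means) / statistics.median(period_means)

print(f"max/median across single moments: {spread_moments:.1f}")
print(f"max/median across period averages: {spread_means:.1f}")
```

Under these assumptions the period averages are far more compressed than the individual moments. Real experiences are correlated within a person over time, so the compression would probably be weaker in practice; this only illustrates why the question seems worth asking.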

Comment by Derek on Physical Exercise for EAs – Why and How · 2020-01-12T19:39:25.313Z · EA · GW

This is very good, but I think busy (or unmotivated) EAs without much exercise experience would benefit from even more specific recommendations, especially for resistance exercises (i.e. strength training).

I found the Start Bodyweight program useful when beginning resistance training at home with no equipment other than a pull-up bar. An EA recommended the book Overcoming Gravity for more detailed information on bodyweight exercises.

I now prefer to use the gym. At a glance, the following (which I just found with a quick Google search) seem like sensible gym-based* options for beginners, but maybe you have better ideas. [I'd add some core exercises to this, like situps and planks]

When I'm too busy to do the full range of strength and cardio (or when I'm travelling), I sometimes do moderate/high-intensity interval classes at home using YouTube videos. The Body Coach is pretty good - he has videos with a range of difficulty (beginner to advanced), duration (10 min+), and muscle focus (legs, upper body, abs, full-body, etc). There are also videos meeting specific needs, e.g. low-impact routines so you don't disturb your neighbours or hurt your knees, and ones designed for small spaces. This kind of thing is perhaps the most efficient form of exercise: you can do it anywhere, it doesn't require any equipment, it's free, it covers both cardio and strength, and it doesn't take much time.

When travelling, I also take a resistance band. If you choose the weight carefully, a single band (which folds up to the size of a cigarette packet) can arguably substitute for any dumbbell that you'd use in the gym, and some of the machines as well. (The main thing you're lacking is the ability to do deadlifts, but there are ways around that too.)

I've heard some EAs recommend GymPass, especially if you travel a lot and don't like to exercise alone.

Feel free to correct me on any of this – I don't have any relevant expertise.

*They could obviously be done at home if you buy the equipment. The last one just needs dumbbells or resistance bands, which are pretty cheap.

Comment by Derek on 2019 AI Alignment Literature Review and Charity Comparison · 2020-01-03T13:44:38.972Z · EA · GW

Why isn't there a GiveWell-style evaluator for longtermist (or specifically AI safety) orgs?

Comment by Derek on New research on moral weights · 2020-01-02T20:50:19.827Z · EA · GW

Section 4 on subjective wellbeing is interesting.

• Across poor respondents in Kenya and Ghana, the average life satisfaction ladder score is 2.8 (where 0 is the lowest and 10 is the highest score).
• Respondents with higher consumption have higher life satisfaction ladder scores; doubling consumption is associated with being 0.4 steps higher on the ladder.
• When describing different points on the ladder respondents most often referred to levels of money and material goods. In contrast, health states were mentioned much less often with regards to life satisfaction. Having a health condition was associated with being 0.3 steps lower on the ladder.
• Overall, taken alone, these findings suggest that consumption is of greater relative importance to wellbeing of respondents than their preferences (described in Section 1-3) indicate.

I notice they only measured life satisfaction. Can you tell me why they didn't also include at least one measure of hedonic wellbeing, such as those used in the evaluations of GiveDirectly? It is really important to understand whether potential GiveWell top charity beneficiaries are actually unhappy (i.e. generally feel bad) or just dissatisfied with their material circumstances when someone with a clipboard asks them about it. (Life satisfaction is much more sensitive to relative wealth and status than is pleasure/misery.) For instance, this may be the critical factor when choosing between life-extending and life-improving interventions.

Comment by Derek on Uncertainty and sensitivity analyses of GiveWell's cost-effectiveness analyses · 2020-01-01T18:23:25.917Z · EA · GW

Did you ever get round to running the analysis with your best guess inputs?

If that revealed substantial decision uncertainty (and especially if you were very uncertain about your inputs), I'd also like to see it run with GiveWell's inputs. They could be aggregated distributions from multiple staff members, elicited using standard methods, or in some cases perhaps 'official' GiveWell consensus distributions. I'm kind of surprised this doesn't seem to have been done already, given obvious issues with using point estimates in non-linear models. Or do you have reason to believe the ranking and cost-effectiveness ratios would not be sensitive to methodological changes like this?
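To illustrate the point about point estimates in non-linear models, here is a minimal sketch. The model (`cost_effectiveness`, a simple reciprocal) and the lognormal input distribution are hypothetical stand-ins, not GiveWell's actual model or inputs.

```python
import random
import statistics

random.seed(0)


def cost_effectiveness(cost_per_outcome):
    # Hypothetical non-linear model: outcomes bought per unit of money.
    return 1.0 / cost_per_outcome


# Assumed uncertain input: cost per outcome, lognormally distributed (illustrative).
samples = [random.lognormvariate(0.0, 1.0) for _ in range(100_000)]

# Approach 1: plug the mean input into the model (point estimate).
point_estimate = cost_effectiveness(statistics.fmean(samples))

# Approach 2: run the model on every sample, then average the outputs.
expected_value = statistics.fmean([cost_effectiveness(x) for x in samples])

print(f"model(mean input): {point_estimate:.2f}")
print(f"mean of model(input): {expected_value:.2f}")
```

Because the model is convex in its input, the second figure comes out larger than the first (Jensen's inequality). How much this matters for the actual rankings depends on the real model structure and input distributions, which is exactly what a probabilistic analysis would show.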

Comment by Derek on Is mindfulness good for you? · 2019-12-30T21:02:22.844Z · EA · GW

This is very useful – thanks for writing it up.

This heterogeneity across intervention types means that we should be cautious about broad claims about the efficacy of mindfulness for depression and anxiety.

True, but that applies equally to claims of null or small effect sizes, e.g. some forms of mindfulness could be very effective even if 'on average' it's not. Did any of the meta-analyses contain useful subgroup analyses?

(For what it's worth, a few years ago I used the Headspace app ~5x/week for 3 months and found it to be actively detrimental to my mood. Anecdotally, this seems fairly common:

Comment by Derek on Learning to ask action-relevant questions · 2019-12-28T20:09:21.962Z · EA · GW

I guess "action-relevant" has a better noun form, which could be a non-trivial advantage.

Comment by Derek on Learning to ask action-relevant questions · 2019-12-28T20:06:33.065Z · EA · GW
Ask yourself: “If I imagine a world in which I have answered this question, what would look different?”

This sounds like the "importance" part of the ITN framework. From EA Concepts:

If all problems in the area could be solved, how much better would the world be?
Comment by Derek on Learning to ask action-relevant questions · 2019-12-28T20:02:00.678Z · EA · GW
I'm not sure if "action-relevant" is accepted terminology

About a year ago I heard the term "action guiding", which I guess is the same?

Comment by Derek on We're Rethink Priorities. AMA. · 2019-12-17T23:27:38.555Z · EA · GW

I’ve asked several academics with domain expertise to review draft posts, or sections of posts, or advise on specific issues. Some have been very useful, but they understandably do not have time to engage fully (if at all). As a consequence, I often worry that I’m making dumb mistakes, or just reinventing the wheel, and there are often substantial delays while waiting for expert input. I think the lack of access to academic networks and infrastructure is perhaps the biggest weakness of RP as a research organisation, and it is related to the youth and inexperience of EA as a whole.

I'm not sure it can be fully solved – some fields only have half a dozen people in the world working on them, so it may be impossible to find someone with enough free time to help out. But I suspect a lot of progress could be made, e.g. I bet there are a lot of statisticians and economists who would be willing and able to help if only they knew we needed it. At the mid- and late-career professionals’ meetup at EAG San Francisco last June, it was suggested that retired academics, professional groups, and LinkedIn might be good sources of mentors/advisors. Someone mentioned as well – perhaps not for academic advice, but for support in other areas where EA orgs tend to be lacking, such as management. I'd be interested to see an effort to systematically connect experts with EA projects, perhaps through the EA Hub or 80,000 Hours.

Comment by Derek on We're Rethink Priorities. AMA. · 2019-12-15T12:37:34.183Z · EA · GW

The following is a tidy, oversimplified version of what happened.

I learned about Bentham and Mill in A-level history class (aged 17) and I think I read a Peter Singer book. I was very left-wing at the time but I remember being really frustrated that all the other altruistically-minded kids in my class supported standard leftist policies for ideological reasons even when they harmed disadvantaged people. This influenced me to study philosophy at undergrad level, where I defended utilitarianism.

Unfortunately EA hadn’t been invented at the time so I spent the first year after graduation working in warehouses and call centers, followed by about nine years of direct development work in low-income countries. I got frustrated by the inefficiency of most development orgs and decided to switch fields into either law (‘earning to give’ before I'd heard of the concept) or public health (to do direct work with more quantifiable impacts).

Around the same time I was searching online for information about charity evaluation and came across GiveWell, then the Singer TED Talk and the wider EA community. This may have influenced me to choose public health, though there were other factors (e.g. the 2008 financial crash made it even harder than usual to pursue a lucrative law career). I spent 18 months in Australia doing whatever work I could find – mostly farm labouring – to pay for my master’s course.

During the course I became more involved in EA, and got interested in health economics, especially methods for cost-effectiveness analysis. But I couldn’t get a job or PhD in health economics with a general public health background, so to save up for a second master's I spent two more years doing mostly sub-minimum wage temp jobs, or saving dole money when I couldn’t find work (though I also got a bit of contract work with GiveWell towards the end of this period). Halfway through that course I ran out of money and had some health issues, so I took a leave of absence, during which time I worked on the 2019 Global Happiness Policy Report (Chapter 3), then got the Rethink job.

My reasons for continuing to work in EA are some mixture of those given by my colleagues.

Comment by Derek on We're Rethink Priorities. AMA. · 2019-12-15T11:41:49.903Z · EA · GW

Most likely academic research related to the use of subjective wellbeing in prioritisation systems (healthcare, central government, maybe EA orgs, etc). Might have applied for researcher positions in other EA orgs.

Comment by Derek on We're Rethink Priorities. AMA. · 2019-12-13T23:15:46.443Z · EA · GW

I’ve become a bit more longtermist in outlook and more uncertain of the sign/effect size of most interventions/projects, mostly due to issues around indirect effects/cluelessness.

Comment by Derek on We're Rethink Priorities. AMA. · 2019-12-13T23:08:56.992Z · EA · GW

I’m not a philosopher, but to the extent I have opinions on such things they are about the same as Moss’s, i.e. classical hedonistic utilitarianism with quite a lot of moral uncertainty. I have somewhat suffering-focused intuitions but (a) I’ve never seen a remotely convincing argument for a suffering-focused ethic, and (b) I think my intuitions – and, I suspect, those of many people who identify as suffering-focused – can be explained by other factors. In particular, I think there are problems with the scales people use to measure valence/wellbeing/value of lives, both in reality and in thought experiments, e.g. it seems common for philosophers to assume a symmetrical scale like -10 to +10, whereas it seems pretty obvious to me that the worst lives – or even, say, the 5th percentile of lives – are many times more bad than the best lives are good. So if the best few percent of lives are 10/10 and 0 is equivalent to being dead, the bottom few percent of any large population are probably somewhere between -100 and -100,000. (It is not widely appreciated just how awful things are for so many people.) If true, classical utilitarianism may have policy implications similar to prioritarianism and related theories, e.g. more resources for the worst off (assuming tractability). But I haven’t seen much literature on these scale issues so I’m not confident this is correct. If you know of any relevant research, preferably peer-reviewed, I’d be very interested.

Comment by Derek on How do cash transfers impact the people who don’t receive them? · 2019-12-04T17:18:52.324Z · EA · GW

[EDIT: I no longer endorse all of this comment. After looking more closely at the papers, I'm more confident that the spillover effects of the latest version of the program are neutral to positive (at least on humans – growth in meat consumption is an important caveat).]

Thanks for posting this.

Though not reported here, I was pleased to see that non-market effects were also recorded in the study, and that these were neutral or positive for both recipients (‘treated households’) and non-recipients.

For treated households, we find positive and significant effects for four of the six indices: psychological well-being, food security, education and security [i.e. crime rates]. Estimated effects are close to zero and not significant for the health index and female empowerment index. When looking at total effects including spillovers for the treated, we find a similar pattern for all but the security index. For untreated households, we find no significant effects of local cash transfers except for the education index, which is higher by 0.1 SD (p < 0.10). Importantly, we do not find evidence of adverse spillover effects for untreated households on any of the indices, with point estimates positive for all but the security index, which is indistinguishable from zero (-0.02 SD, SE 0.07).

I’m particularly interested in the “psychological wellbeing” index, which Appendix C1 says comprises a “weighted, standardized average of depression (10 question CES-D scale), happiness, life satisfaction, and perceived stress (PSS-4)”. I would like to know: (a) what measures were used for “happiness” and “life satisfaction”; (b) how the components of the index were weighted; and most of all (c) a breakdown of scores for each measure. I can’t find this information in the paper.

I’m asking because there is a fair amount of research suggesting that one person's income increase causes wellbeing declines among other members of the community (i.e. people feel worse when their neighbour gets richer), at least for some accounts of wellbeing. For instance, Haushofer, Reisinger, & Shapiro (2019) found that neighbours of GiveDirectly cash recipients experienced a decline in psychological wellbeing (seemingly measured by a similar index to the one used in the most recent study) about half as great as the psychological wellbeing benefit to the recipient. Depending on how many neighbours are affected by each transfer, this would seem to indicate that GiveDirectly may have a net negative effect on aggregate wellbeing. However, this effect was driven entirely by life satisfaction, an ‘evaluative’ or ‘cognitive’ measure; there were no negative spillovers on measures of ‘hedonic’ wellbeing, namely “happiness”, “stress”, and “depression”. As the authors note:

This result is intuitive: the wealth of one’s neighbors may plausibly affect one’s overall assessment of life, but have little effect on how many positive emotional experiences one encounters in everyday life. This result complements existing distinctions between these different facets of well-being, e.g. the finding that hedonic well-being has a “satiation point” in income, whereas evaluative well-being may not (Kahneman and Deaton, 2010).

Without seeing the disaggregated scores from the new study, it seems possible that there were non-trivial and statistically significant harms (or benefits) according to some components of the index. This matters to those with a preferred moral theory or conception of wellbeing, e.g. a classical utilitarian probably cares more about hedonic states than life evaluations, and a prioritarian more about severe states like depression than positive ones like happiness.
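The earlier point that the net effect depends on how many neighbours are affected per transfer can be spelled out with some simple arithmetic. This is only a sketch: the function name and parameters are hypothetical, the recipient's gain is set to an arbitrary 1.0 unit, the 0.5 ratio is the "about half as great" figure mentioned above, and it assumes every affected neighbour loses the same amount and that gains and losses can simply be summed.

```python
def net_wellbeing_effect(recipient_gain, spillover_ratio, neighbours_affected):
    """Aggregate change: the recipient's gain minus the total loss among neighbours."""
    loss_per_neighbour = spillover_ratio * recipient_gain
    return recipient_gain - neighbours_affected * loss_per_neighbour


# Recipient gain of 1.0 is an arbitrary unit; 0.5 reflects "about half as great".
for n in range(5):
    print(n, net_wellbeing_effect(1.0, 0.5, n))
```

On these assumptions the break-even point is two affected neighbours per recipient, and the aggregate turns negative beyond that. Since the negative spillover in that study was driven by life satisfaction rather than the hedonic measures, the same calculation done per component could give quite different answers.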