The (un)reliability of moral judgments: A survey and systematic(ish) review 2019-11-01T02:38:48.809Z · score: 9 (6 votes)
Model-free and model-based cognition in deontological and consequentialist reasoning 2019-09-23T20:03:48.793Z · score: 8 (4 votes)
The cost of slow growth chickens 2019-09-12T17:46:14.376Z · score: 17 (10 votes)
Uncertainty and sensitivity analyses of GiveWell's cost-effectiveness analyses 2019-08-31T23:33:58.276Z · score: 57 (21 votes)
Consumer preferences for labgrown and plant-based meat 2019-08-08T18:45:05.581Z · score: 18 (7 votes)
Realizing the Mass Public Benefit of Evidence-Based Psychological Therapies: The IAPT Program 2019-07-16T00:25:07.010Z · score: 8 (5 votes)
"Moral Bias and Corrective Practices" and the possibility of an ongoing moral catastrophe 2019-06-24T22:38:15.036Z · score: 12 (4 votes)
Summary of Cartwright and Hardie's "Evidence-based Policy: A practical guide to doing it better" 2019-06-17T21:25:25.006Z · score: 17 (9 votes)
Doning with the devil 2018-06-15T15:51:06.030Z · score: 3 (3 votes)


Comment by cole_haus on A bunch of new GPI papers · 2019-11-09T00:42:33.724Z · score: 1 (1 votes) · EA · GW

I think there's a missing "max" in the definition of the NBS in "A bargaining-theoretic approach to moral uncertainty", in case anyone gets/got confused. (I already emailed the author.)

Comment by cole_haus on Deliberation May Improve Decision-Making · 2019-11-05T01:35:00.223Z · score: 7 (3 votes) · EA · GW

One interesting impact of deliberation is that it can induce single-peakedness which makes the problem of social choice easier. Briefly:

  • There are a number of social choice impossibility theorems (e.g. Arrow, Gibbard-Satterthwaite) which say that, in general, it's impossible to aggregate individual preferences into social choices/preferences without giving up on some criterion you'd really like to satisfy (e.g. no dictators; if every individual prefers A to B, so does society).

  • Single-peaked preferences are preferences where:

    • every participant views options as varying along a single dimension,
    • every participant has an ideal choice in the set of choices, and
    • options further away from that ideal choice are less preferred.
  • If you restrict the domain of input individual preferences to be single-peaked, you can escape some of the impossibility theorems, and straightforward aggregation options become available.

  • It's plausible that deliberation causes a single dimension to become salient for everyone and for preferences to thereby become single-peaked.
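To make the single-peakedness point concrete, here is a minimal sketch in Python. The option names, the left-to-right axis, and both helper functions are hypothetical illustrations, not taken from the linked paper. It checks whether each ranking is single-peaked along a shared axis and then looks for a Condorcet winner (an option that beats every other option in pairwise majority votes).

```python
def is_single_peaked(ranking, axis):
    """ranking: options from most to least preferred.
    axis: the same options in left-to-right order along the shared dimension."""
    pos = {opt: i for i, opt in enumerate(axis)}
    peak = pos[ranking[0]]
    left = [o for o in ranking if pos[o] < peak]
    right = [o for o in ranking if pos[o] > peak]
    # On each side of the peak, options further from the peak must be ranked lower.
    return (left == sorted(left, key=lambda o: -pos[o])
            and right == sorted(right, key=lambda o: pos[o]))


def condorcet_winner(profile, options):
    """Return the option that wins every pairwise majority vote, or None."""
    for a in options:
        if all(sum(r.index(a) < r.index(b) for r in profile) > len(profile) / 2
               for b in options if b != a):
            return a
    return None


axis = ["L", "C", "R"]  # hypothetical left / centre / right options

# Every voter is single-peaked along the axis: the median peak wins.
peaked = [["L", "C", "R"], ["C", "L", "R"], ["R", "C", "L"]]
# The third voter ranks the far option above the near one: a majority cycle appears.
cyclic = [["L", "C", "R"], ["C", "R", "L"], ["R", "L", "C"]]

peaked_winner = condorcet_winner(peaked, axis)
cyclic_winner = condorcet_winner(cyclic, axis)
```

In the single-peaked profile the pairwise-majority winner is the median voter's ideal point; in the second profile no such winner exists, which is the kind of situation the impossibility theorems exploit.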

Social Choice Theory and Deliberative Democracy: A Reconciliation talks about this kind of thing in more detail.

Comment by cole_haus on The (un)reliability of moral judgments: A survey and systematic(ish) review · 2019-11-01T17:36:26.453Z · score: 3 (3 votes) · EA · GW

I guess I'll also use the comments to offer my hot take which I don't think I can immediately justify in a fully explicit manner:

Of course our moral judgements are unreliable (though not in some of the ways investigated in the literature and almost certainly in additional ways that aren't investigated in the literature). There are some moral judgements which are stable but some which aren't and even the smaller set of unstable judgements is very concerning--especially for EAs and others trying to maximize (cf. optimizer's curse) and taking unconventional views. I don't think the expertise defense or most of the other responses are particularly successful. I think the "moral engineering" approach is essential. We should be building ethical systems that explicitly acknowledge and account for the noise in our intuitions. For example, something like rule utilitarianism seems substantially more appealing to me now than it did before I looked into this area; the regularization imposed by rule utilitarianism limits our ability to fit to noise (cf. bias-variance trade-off).

Comment by cole_haus on The (un)reliability of moral judgments: A survey and systematic(ish) review · 2019-11-01T16:58:01.010Z · score: 1 (1 votes) · EA · GW

Okay, thanks. I added a section to the summary:

Meta: This post attempts to summarize the interdisciplinary work on the (un)reliability of moral judgements. As that work contains many different perspectives with no grand synthesis and no clear winner (at present), this post is unable to offer a single, neat conclusion to take away. Instead, this post is worth reading if the (un)reliability of moral judgements seems important to you and you'd like to understand what the current state of investigation is.

Comment by cole_haus on The (un)reliability of moral judgments: A survey and systematic(ish) review · 2019-11-01T15:01:29.623Z · score: 1 (1 votes) · EA · GW

I'm mostly not trying to argue for any particular conclusion--more trying to summarize and relay the existing work. I was deliberately trying to avoid emphasising my idiosyncratic take because I didn't want readers to have to separate personal speculation from reportage. (I would have thought the "survey and systematic(ish) review" in the title helps to set that expectation. Are those terms more ambiguous than I understand them to be?)

As far as consensus in the literature, there doesn't seem to be much of one. I think consensus is/will be especially hard because of the variety of researchers involved--philosophers, psychologists, etc. You can see the lack of consensus reflected in the wide variety of angles in "Indirect evidence" and "Responses".

Does that all make sense?

Comment by cole_haus on EA Forum Prize: Winners for September 2019 · 2019-10-31T23:56:49.809Z · score: 12 (8 votes) · EA · GW

I think even the updated version of Global development interventions are generally more effective than climate change interventions has serious modeling problems. I left a couple of comments early on but got no response. I've now added a new comment explaining my understanding of the problem more fully here.

I'm commenting here just to signal boost because it seems like it would be unfortunate if we built on the current estimate which looks to be wrong in important ways. (Though it's also possible I'm just confused!)

Comment by cole_haus on [updated] Global development interventions are generally more effective than Climate change interventions · 2019-10-31T23:39:44.994Z · score: 3 (2 votes) · EA · GW

Even after the update, I'm confused by this model. I'll restrict my attention to the "Realistic case" for simplicity.

The global social cost of carbon when constructed from the country-level estimates is effectively:

$$SCC_{global} = SCC_a + SCC_b + SCC_c + \dots$$

where $SCC_{global}$ is the global social cost of carbon and $SCC_a$, $SCC_b$, $SCC_c$, etc. are the country-level social costs of carbon for countries 'a', 'b', 'c', etc.

The spreadsheet model then uses an income adjustment factor of 1,260 expressing that a dollar means more to a poor person than a rich person. We can also think of this as a unit conversion factor: 1,260 American median income dollars = 1 GiveDirectly recipient dollar. We can write this conversion factor as $\frac{1260\ \$_{US}}{1\ \$_{GD}}$.

The next step in the model is to divide the global social cost of carbon by the income adjustment factor. If we expand global social cost of carbon, this looks like:

$$\frac{SCC_a + SCC_b + SCC_c + \dots}{\frac{1260\ \$_{US}}{1\ \$_{GD}}}$$

If we just look at the units, this is:

$$\frac{\$_{w}}{\frac{\$_{US}}{\$_{GD}}} = \frac{\$_a + \$_b + \$_c + \dots}{\frac{\$_{US}}{\$_{GD}}}$$

where $\$_{w}$ is a country-level-cost-of-carbon-weighted dollar, $\$_a$ is a dollar in country 'a', etc. Converting country 'a' dollars to GiveDirectly recipient dollars via $\frac{1260\ \$_{US}}{1\ \$_{GD}}$ is only appropriate if country 'a' is in fact America. Otherwise, the units don't line up.

It seems like what we'd actually want as far as income adjustment is something like:

$$SCC_{global}^{GD} = \frac{SCC_a}{f_a} + \frac{SCC_b}{f_b} + \frac{SCC_c}{f_c} + \dots$$

where $SCC_{global}^{GD}$ is the global social cost of carbon expressed in GiveDirectly recipient dollars and $f_a$, $f_b$, $f_c$, etc. are income adjustment factors specific to countries 'a', 'b', 'c', etc.

In other words, we can't just apply the same income-adjustment factor to every country's social cost of carbon because not every country has the same income. We need per-country income adjustment factors. Applying the 1,260 income adjustment factor to American social cost of carbon works but in applying this adjustment to the global cost of carbon, we are also implicitly saying that the social cost of carbon in Burkina Faso ought to be discounted just as heavily.

I think the only way the current approach works is if we assume that country-level costs have somehow already been adjusted into American median income dollars.
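A toy calculation may make the units problem easier to see. This is only a sketch: apart from the 1,260 factor, every number below (the country-level costs and the per-country adjustment factors) is made up purely for illustration and is not taken from the spreadsheet or the underlying paper.

```python
# Hypothetical country-level social costs of carbon, each in that country's own dollars.
country_scc = {"USA": 40.0, "India": 80.0, "Burkina Faso": 1.0}

# Hypothetical per-country income adjustment factors
# (local dollars per GiveDirectly recipient dollar). Only the US value is from the model.
income_factor = {"USA": 1260.0, "India": 100.0, "Burkina Faso": 2.0}

US_FACTOR = 1260.0  # the single factor the spreadsheet applies to the global total

# Current approach: sum everything, then divide by the American factor.
uniform = sum(country_scc.values()) / US_FACTOR

# Alternative: adjust each country's cost by its own factor, then sum.
per_country = sum(scc / income_factor[c] for c, scc in country_scc.items())

print(f"uniform adjustment:     {uniform:.4f} GiveDirectly-dollars")
print(f"per-country adjustment: {per_country:.4f} GiveDirectly-dollars")
```

With these invented inputs the per-country version comes out more than ten times larger, because the costs borne in poorer countries are no longer discounted as if they fell on median-income Americans. The size of the gap in the real model depends entirely on the real inputs.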

Comment by cole_haus on Attempt at understanding the role of moral philosophy in moral progress · 2019-10-28T22:05:56.849Z · score: 13 (6 votes) · EA · GW

Moral Bias and Corrective Practices: A Pragmatist Perspective (the full article is there if you scroll down) is interesting and somewhat relevant IMO. It argues that "the moral biases of slavery advocates proved largely immune to correction by the dominant methods of moral philosophy, which were deployed by white abolitionists. Ascent to the a priori led to abstract moral principles—the Golden Rule, the equality of humans before God—that settled nothing because their application to this world was contested. Table-turning exercises were ineffective for similar reasons. Reflective equilibrium did not clearly favor the abolitionists, given authoritarian, Biblical, and racist premises shared by white abolitionists and slavery advocates."

Comment by cole_haus on What evaluations are there for interventions to boost altruism? · 2019-10-24T22:43:33.683Z · score: 7 (2 votes) · EA · GW

I don't know offhand of any quantitative estimates but "moral enhancement" is a potentially relevant area of study. For example:

Comment by cole_haus on Older people may place less moral value on the far future · 2019-10-24T17:26:34.041Z · score: 5 (3 votes) · EA · GW

I've gotten the impression that eliciting discount rates is actually quite tricky and noisy. (From e.g. Time Discounting and Time Preference: A Critical Review which finds discount rates varying over many orders of magnitude.) How did you all deal with these sorts of issues? For example, what was your thinking on choosing the question framing you did rather than other frames?

Comment by cole_haus on Oddly, Britain has never been happier · 2019-10-23T00:35:38.061Z · score: 4 (3 votes) · EA · GW

Hmm, I just went to OurWorldInData and looked at their info.

They're each slightly different but it seems useful to look at as many data sources as possible.

Comment by cole_haus on Review of Climate Cost-Effectiveness Analyses · 2019-10-22T20:35:50.582Z · score: 1 (1 votes) · EA · GW

Ahh, if you're specifically looking for comparisons to global health, that makes sense that they're all EA-affiliated.

Comment by cole_haus on Review of Climate Cost-Effectiveness Analyses · 2019-10-20T23:26:14.781Z · score: 2 (2 votes) · EA · GW

Thanks for writing this up!

All four of the estimates you review appear to be (at least loosely) EA-affiliated. Were those the only ones you could find that satisfied your criteria? What were your criteria when deciding which estimates to review?

Comment by cole_haus on Altruistic equity allocation · 2019-10-17T20:17:39.224Z · score: 1 (1 votes) · EA · GW

A charity can try to mitigate this risk by understanding how much they would need to reduce their valuation to get additional donors interested.

A Vickrey auction should be a nice way of addressing this problem. The charity ends up with a complete, honest listing of all bidders' valuations. Google's IPO was structured along these lines.
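For readers who haven't seen the mechanism, here is a minimal sketch of the simplest (single-item, sealed-bid, second-price) Vickrey auction. The donor names and bid amounts are hypothetical, and a real equity sale would need a multi-unit variant, so treat this as an illustration of the idea rather than a proposal.

```python
def vickrey_auction(bids):
    """bids: dict mapping bidder -> sealed bid (needs at least two bidders).
    The highest bidder wins but pays the second-highest bid."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]
    return winner, price


# Hypothetical sealed bids from three donors.
sealed_bids = {"donor_a": 100, "donor_b": 80, "donor_c": 60}
winner, price = vickrey_auction(sealed_bids)
```

Because the winner's payment doesn't depend on their own bid, shading a bid can only change whether you win, not what you pay. That is why bidders have no reason to misreport and why the seller ends up seeing everyone's actual valuation.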

How is valuation determined?

I think it's also nice to think about this from an "inside view". Typically, an asset is priced such that the price equals the net present value of the expected stream of rents the asset generates. The ability to calculate a valuation from (slightly) more tangible inputs means that the charity doesn't have to pick a totally arbitrary number when they're starting their price-finding/fundraising process. This definition also helps guide the "Equity in what?" discussion.
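As a rough illustration of that pricing rule, the sketch below computes a net present value from a stream of expected annual "rents". The cash flows and the 5% discount rate are placeholder assumptions, and `net_present_value` is just an illustrative helper.

```python
def net_present_value(cash_flows, discount_rate):
    """Discount a stream of end-of-period cash flows back to today."""
    return sum(cf / (1 + discount_rate) ** t
               for t, cf in enumerate(cash_flows, start=1))


# Hypothetical: three years of expected value generated, 100 units per year.
expected_rents = [100.0, 100.0, 100.0]
valuation = net_present_value(expected_rents, discount_rate=0.05)
print(round(valuation, 2))
```

The charity would still have to argue for the expected rents and the discount rate, but those are at least inputs that donors can inspect and dispute individually.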

No one has an external incentive to behave honestly. The system is intended as an aid for generally-aligned donors and employees to actually understand how much they are contributing.

I know you have this and other caveats throughout, but I definitely worry that it'll be hard to ignore the numbers when they create perverse incentives (Goodhart's law, Campbell's law, etc.). Market mechanisms have lots of problems and I wouldn't want to import those into the charitable sector unnecessarily (which is not to say that markets don't have good features or that the charitable sector doesn't have structural issues). I'll have to think about this more carefully, but it seems like it would be nice to design a mechanism which is tailored for the unique features of the charitable sector (e.g. incentives ought to be more aligned and less zero sum, any potential market is probably fairly illiquid).

Another thing to think about: Equity typically confers voting rights. Is that appropriate here? Why or why not? The why not argument that comes to mind immediately is: In typical for-profit companies, voting rights are useful as a way of disciplining managers that have their own private incentives. Hopefully, the incentives of donors and charity management are already fairly aligned.

Comment by cole_haus on Effective Altruism and International Trade · 2019-10-16T03:17:06.872Z · score: 6 (4 votes) · EA · GW

Lant Pritchett (influential development economist) makes a related argument in Randomizing Development: Method or Madness?:

[C]ross-national evidence shows that the four-fold transformation of national development, to higher productivity economies, to more responsive states, the more capable organizations and administration and to more equal social treatment produces gains in poverty and human well-being that are orders of magnitude bigger than the best that can be hoped from better programs. Arguments that RCT research is a good (much less “best”) investment depend on both believing in an implausibly low likelihood that non-RCT research can improve progress national development and believing in an implausibly large likelihood that RCT evidence improves outcomes.

Basically, he's arguing for the cost-effectiveness of macro interventions over micro interventions.

Comment by cole_haus on [Link] "How feasible is long-range forecasting?" (Open Phil) · 2019-10-11T23:47:31.451Z · score: 16 (6 votes) · EA · GW

The accuracy of technological forecasts, 1890-1940 is a paper I happened to already know about that seems somewhat relevant but I didn't see mentioned:

Predictions of future technological changes and the effects of those changes, made by Americans between 1890 and 1940, are compared to the actual outcomes. Overall, less than half of the predictions have been fulfilled or are in the process of fulfillment. The accuracy of predictions appears at best weakly related to general technical expertise, and unrelated to specific expertise. One expert (or non-expert) appears to be as good a predictor as another. Predictions of continuing status quo are not significantly more or less accurate than predictions of change. Predictions of the effects of technology are significantly less accurate than predictions of technological changes.

Comment by cole_haus on Shapley values: Better than counterfactuals · 2019-10-11T17:10:05.990Z · score: 2 (2 votes) · EA · GW

and how finely we individuate them

The Banzhaf value should avoid this problem since it has the property of 2-Efficiency: "The 2-Efficiency property states that the allocation rule that satisfies it is immune against artificial merging or splitting of players."

Comment by cole_haus on Shapley values: Better than counterfactuals · 2019-10-10T21:22:19.870Z · score: 10 (6 votes) · EA · GW

I like this angle! It seems useful to compare the Shapley value in this domain to the Banzhaf value. (Brief, dense description: If Shapley value attributes value to pivotal actors during the sequential process of coalition formation (averaged across all permutations of coalition formation orderings), Banzhaf value attributes value to critical actors without which any given coalition would fail. See Shapley-Shubik power index and Banzhaf power index for similar concepts in a slightly different context.)

This paper has a nice table of properties:

| Property | Shapley | Banzhaf |
|---|---|---|
| Efficiency | Yes | No |
| Dummy player property | Yes | Yes |
| Null player property | Yes | Yes |
| Symmetry | Yes | Yes |
| Anonymity | Yes | Yes |
| Additivity | Yes | Yes |
| Transfer property | Yes | Yes |
| 2-Efficiency | No | Yes |
| Total power | No | Yes |
| Strong monotonicity | Yes | Yes |
| Marginal contributions | Yes | Yes |

("Additivity" is the same as "linearity" here.)

Focusing on just the properties where they differ:

  • Efficiency: I've sometimes seen this called "full allocation" which is suggestive. It's basically just whether the full value of the coalition is apportioned to actors of the coalition or if some of it is left over.
  • 2-Efficiency: "The 2-Efficiency property states that the allocation rule that satisfies it is immune against artificial merging or splitting of players."
  • Total power: "The Total power property establishes that the total payoff obtained for the players is the sum of all marginal contributions of every player normalized by $2^{n-1}$."

I'd have to think about this more carefully, but it's not immediately obvious to me which set of properties is better for the purpose at hand.
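To see the Efficiency difference numerically, here is a small sketch that computes both values by brute force over coalitions. The three-player majority game is a hypothetical example of mine (a coalition is worth 1 if it has at least two members, 0 otherwise), and the function names are illustrative.

```python
from itertools import combinations
from math import factorial


def shapley(players, v):
    """Average marginal contribution over all orderings of coalition formation."""
    n = len(players)
    values = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for S in combinations(others, k):
                total += weight * (v(frozenset(S) | {i}) - v(frozenset(S)))
        values[i] = total
    return values


def banzhaf(players, v):
    """Average marginal contribution over all coalitions of the other players."""
    n = len(players)
    values = {}
    for i in players:
        others = [p for p in players if p != i]
        total = sum(v(frozenset(S) | {i}) - v(frozenset(S))
                    for k in range(n) for S in combinations(others, k))
        values[i] = total / 2 ** (n - 1)
    return values


def majority(S):
    # Hypothetical game: any two of the three players suffice.
    return 1.0 if len(S) >= 2 else 0.0


players = ["a", "b", "c"]
shapley_values = shapley(players, majority)
banzhaf_values = banzhaf(players, majority)
print(sum(shapley_values.values()), sum(banzhaf_values.values()))
```

In this game each player gets 1/3 under Shapley, so the values sum to the grand coalition's value of 1. Under Banzhaf each player gets 1/2, so the values sum to 1.5, which is the failure of Efficiency from the table.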

Comment by cole_haus on [Link] Experience Doesn’t Predict a New Hire’s Success (HBR) · 2019-10-04T23:44:39.799Z · score: 11 (7 votes) · EA · GW

They actually have a working paper for an updated version that I was just able to dig up (the links from Google Scholar seem broken ATM): The Validity and Utility of Selection Methods in Personnel Psychology: Practical and Theoretical Implications of 100 Years of Research Findings.

Comment by cole_haus on [Link] Experience Doesn’t Predict a New Hire’s Success (HBR) · 2019-10-04T23:10:48.748Z · score: 12 (4 votes) · EA · GW

The Validity and Utility of Selection Methods in Personnel Psychology: Practical and Theoretical Implications of 85 Years of Research Findings is a broader and (IMO) nice overview of this stuff. Here's a snippet from the central table:

| Personnel measures | Validity (r) |
|---|---|
| GMA tests | .51 |
| Work sample tests | .54 |
| Integrity tests | .41 |
| Conscientiousness tests | .31 |
| Employment interviews (structured) | .51 |
| Employment interviews (unstructured) | .38 |
| Job knowledge tests | .48 |
| Job tryout procedure | .44 |
| Peer ratings | .49 |
| T & E behavioral consistency method | .45 |
| Reference checks | .26 |
| Job experience (years) | .18 |
| Biographical data measures | .35 |
| Assessment centers | .37 |
| T & E point method | .11 |
| Years of education | .10 |
| Interests | .10 |
| Graphology | .02 |
| Age | -.01 |

Comment by cole_haus on Assessing biomarkers of ageing as measures of cumulative animal welfare · 2019-09-28T15:54:03.598Z · score: 7 (6 votes) · EA · GW

Thanks, I find this quite interesting!

To test the limits of the theory, it seems like it might be useful to come up with factors that work in the opposite direction. I've heard that caloric restriction and castration both contribute to longevity in humans and seem (arguably?) affectively negative. Smoking is arguably affectively positive and increases biological ageing. I have to guess we could come up with many other examples.

Obviously, it's highly speculative at this point, but what would you guess the correlation coefficient is between cumulative welfare and biological ageing? How large does the correlation need to be before it's useful?

Comment by cole_haus on [Link] The Case for Charter Cities Within the EA Framework (CCI) · 2019-09-27T21:41:59.839Z · score: 1 (1 votes) · EA · GW

Also, seems like full-on state-building—which is presumably what would need to happen in terra nullius—is a different (harder?) task than a charter city. As I understand things, charter cities typically rely on many services/institutions of their host polity.

Comment by cole_haus on Psychology and Climate Change: An Overview · 2019-09-27T21:37:49.333Z · score: 5 (3 votes) · EA · GW

I thought The Psychology of Environmental Decisions had some good info:

We argue that to promote pro-environmental decisions and to achieve public consensus on the need for action we must address individual and collective understanding (cognition) of environmental problems, as well as individual and collective commitments to take action to mitigate or prevent those problems. We review literature pertaining to psychological predispositions, mental models, framing, psychological distance, and the social context of decisions that help elucidate how these goals of cognition and commitment can be achieved.

Overcoming public resistance to carbon taxes is a bit narrower (perhaps generalizable) but still good and relevant IMO. They outline five general reasons for opposition to carbon taxes:

  • The personal costs are perceived to be too high
  • Carbon taxes can be regressive
  • Carbon taxes could damage the wider economy
  • Carbon taxes are believed not to discourage high‐carbon behavior
  • Governments may want to tax carbon to increase their revenues

and outline some policy advice:

  • Phasing in carbon taxes over time
  • Earmarking tax revenues for additional climate change mitigation
  • Redistributing taxes to improve fairness
  • Information sharing and communication

Comment by cole_haus on [Link] The Case for Charter Cities Within the EA Framework (CCI) · 2019-09-24T19:16:39.235Z · score: 7 (6 votes) · EA · GW

Yeah, I'm not expecting RCTs. I just think that some attempt at causal inference would be great (e.g. instrumental variable, difference in differences). I also don't think this is a purely procedural complaint (i.e. not just a rote repetition of "Correlation isn't causation!")--I think there are real risks around confounding and external validity.

I'm also fully on board with the claim that institutions matter. For me, the uncertainty comes in when we ask "Can this intervention change the right institutions in the right direction and with the right magnitude?".

(Also, I don't think it'll be that productive to talk about without bringing more serious evidence to bear but even 0 doesn't strike me as "very pessimistic". There have been plenty of well-intentioned policies with a net negative effect.)

Comment by cole_haus on [Link] The Case for Charter Cities Within the EA Framework (CCI) · 2019-09-23T21:57:36.046Z · score: 9 (4 votes) · EA · GW

A 0.5% boost in annual GDP per capita growth doesn't strike me as a very pessimistic "pessimistic" estimate.

Regulation and Growth looks to be one of the two citations on this (IMO crucial) parameter, but it's just a correlational study of regulations vs growth. The other is China's Special Economic Zones at 30 which looks to be basically a case study.

The Skeptics Guide to Institutions (four parts total) has some background from the skeptical perspective.

(For the record, I think charter cities are interesting. But I also think the domain is extremely complicated and it seems hard to get impact estimates that are even remotely reliable.)

Comment by cole_haus on [updated] Global development interventions are generally more effective than Climate change interventions · 2019-09-13T20:27:41.473Z · score: 1 (1 votes) · EA · GW

I mentioned it in my comment elsewhere, but—from a quick look at the paper and the supplementary material—I don't think it's much like any of these. They don't make any special mention that I could find of trying to translate purely economic measures into welfare. The only mention I could find about income adjustment is "rich/poor specifications" which appears to be about splitting the formula for growth of damages into one of two forms depending on whether the country is rich or poor.

Edit: They do mention "elasticity of marginal utility" in the discounting module section which is also known as "intergenerational inequality aversion".

Comment by cole_haus on [updated] Global development interventions are generally more effective than Climate change interventions · 2019-09-10T21:03:16.594Z · score: 3 (3 votes) · EA · GW

Thanks, this is interesting! I quickly read through the core paper and am a bit confused.

It seems like you're understanding income adjustment to be one of the main additions in the paper. Where are you seeing that? The title/abstract/etc. seem to be pitching greater spatial resolution as the main contribution. Greater spatial resolution helps with income adjustment but isn't sufficient. As far as I can tell the paper primarily uses regular old GDP per capita (with the de rigueur acknowledgement that GDP isn't a great welfare measure). The only income adjustment I see is a couple of mentions of rich/poor specifications and the supplementary information suggests that this is just splitting the formula for the growth of damages based on whether a country falls into the rich bin or poor bin.

They explain the increased cost not as due to income adjustment (as I understand things) but because:

The median estimates of the GSCC (Fig. 1) are significantly higher than the Inter-agency Working Group estimates, primarily due to the higher damages associated with the empirical macroeconomic production function

All that said, I wish it did use logarithmic utility because that seems like an important improvement!

(FYI: All the inline footnotes still link to a Google doc.)

Comment by cole_haus on Uncertainty and sensitivity analyses of GiveWell's cost-effectiveness analyses · 2019-09-08T23:49:23.503Z · score: 1 (1 votes) · EA · GW

(I think my other two recent comments sort of answer each of your questions.)

Comment by cole_haus on Uncertainty and sensitivity analyses of GiveWell's cost-effectiveness analyses · 2019-09-08T23:32:26.470Z · score: 2 (2 votes) · EA · GW

I'm a bit confused. In the GiveDirectly case for 'value of increasing consumption', you're still holding the discount rate constant, right?

Nope, it varies. One way you can check this intuitively is: if the discount rate and all other parameters were held constant, we'd have a proper function and our scatter plot would show at most one output value for each input.

taking GiveWell's point estimate as the prior mean, how do the cost-effectiveness estimates (and their uncertainty) change as we vary our uncertainty over the input parameters.

There are (at least) two versions I can think of:

  1. Adjust all the input uncertainties in concert. That is, spread all the point estimates by ±20% or all by ±30%, etc. This would be computationally tractable, but I'm not sure it would get us too much extra. I think the key problem with the current approach which would remain is that we're radically more uncertain about some of the inputs than the others.

  2. Adjust all the input uncertainties individually. That is, spread point estimate 1 by ±20%, point estimate 2 by ±10%, etc. Then, spread point estimate 1 by ±10%, spread point estimate 2 by ±20%, etc. Repeat for all combinations of spreads and inputs. This would actually give us somewhat useful information, but would be computationally intractable given the number of input parameters.
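A back-of-the-envelope count shows why the second version blows up. The sketch below assumes three candidate spread levels and 50 input parameters; both numbers are placeholders rather than the actual counts from the models.

```python
def runs_in_concert(spread_levels):
    """Version 1: one full analysis per spread level, applied to every input at once."""
    return len(spread_levels)


def runs_individually(spread_levels, n_inputs):
    """Version 2: one full analysis per assignment of a spread level to each input."""
    return len(spread_levels) ** n_inputs


spreads = [0.10, 0.20, 0.30]  # hypothetical candidate spreads
n_inputs = 50                 # hypothetical number of input parameters

concerted = runs_in_concert(spreads)
individual = runs_individually(spreads, n_inputs)
print(concerted, f"{individual:.2e}")
```

Each of those runs is itself a full Monte Carlo analysis, so even a handful of spread levels is out of reach once there are dozens of inputs.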

Comment by cole_haus on Uncertainty and sensitivity analyses of GiveWell's cost-effectiveness analyses · 2019-09-08T23:23:06.087Z · score: 3 (3 votes) · EA · GW

Short version:

Do the expected values of the output probability distributions equal the point estimates that GiveWell gets from their non-probabilistic estimates?

No, but they're close.

More generally, are there any good write-ups about when and how the expected value of a model with multiple random variables differs from the same model filled out with the expected value of each of its random variables?

Don't know of any write-ups unfortunately, but the linearity of expectation means that the two are equal if and (generally?) only if the model is linear.

Long version:

When I run the Python versions of the models with point estimates, I get:

| Charity | Value/$ |
|---|---|
| GiveDirectly | 0.0038 |
| END | 0.0211 |
| DTW | 0.0733 |
| SCI | 0.0370 |
| Sightsavers | 0.0394 |
| Malaria Consortium | 0.0316 |
| HKI | 0.0219 |
| AMF | 0.0240 |

The (mostly minor) deviations from the official GiveWell numbers are due to:

  1. Different handling of floating point numbers between Google Sheets and Python
  2. Rounded/truncated inputs
  3. A couple of models calculated the net present value of an annuity based on payments at the end of each period instead of the beginning--I never got around to implementing this
  4. Unknown errors

When I calculate the expected values of the probability distributions given the uniform input uncertainty, I get:

| Charity | Value/$ |
|---|---|
| GiveDirectly | 0.0038 |
| END | 0.0204 |
| DTW | 0.0715 |
| SCI | 0.0354 |
| Sightsavers | 0.0383 |
| Malaria Consortium | 0.0300 |
| HKI | 0.0230 |
| AMF | 0.0231 |

I would generally call these values pretty close.

It's worth noting though that the procedure I used to add uncertainty to inputs doesn't produce input distributions that have the original point estimate as their expected value. By creating a 90% CI at ±20% of the original value, the CI is centered around the point estimate, but since log-normal distributions aren't symmetric, the expected value is not precisely at the point estimate. That explains some of the discrepancy.
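Here is a small sketch of that effect. It assumes the log-normal is fit by placing its 5th and 95th percentiles at 0.8 and 1.2 times the point estimate; that is my reading of the procedure described above rather than a copy of the notebook code, and `lognormal_from_ci` is an illustrative name.

```python
from math import exp, log
from statistics import NormalDist


def lognormal_from_ci(point, spread=0.2, coverage=0.90):
    """Return (mu, sigma) of a log-normal whose central `coverage` interval
    runs from point * (1 - spread) to point * (1 + spread)."""
    lo, hi = point * (1 - spread), point * (1 + spread)
    z = NormalDist().inv_cdf(0.5 + coverage / 2)  # about 1.645 for a 90% interval
    mu = (log(lo) + log(hi)) / 2
    sigma = (log(hi) - log(lo)) / (2 * z)
    return mu, sigma


mu, sigma = lognormal_from_ci(1.0)
median = exp(mu)                 # geometric midpoint of the interval
mean = exp(mu + sigma ** 2 / 2)  # expected value of the log-normal
print(f"median={median:.4f} mean={mean:.4f}")
```

Under that assumption, a point estimate of 1 turns into a distribution with an expected value a bit below 1 (roughly 0.99), so each input is nudged slightly before any non-linearity in the model comes into play.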

The rest of the discrepancy is presumably from the non-linearity of the models (e.g. there are some logarithms in the models). In general, the linearity of expectation means that the expected value of a linear model of multiple random variables is exactly equal to the linear model of the expected values. For non-linear models, no such rule holds. (The relatively modest discrepancy between the point estimates and the expected values suggests that the models are "mostly" linear.)
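A tiny exact example illustrates the rule without any sampling noise. The two independent inputs and both models below are hypothetical and are not taken from the GiveWell models.

```python
from itertools import product
from math import log
from statistics import mean

# Two independent inputs, each taking one of two equally likely values.
x_values = [1.0, 3.0]
y_values = [2.0, 4.0]
outcomes = list(product(x_values, y_values))  # four equally likely (x, y) pairs


def linear_model(x, y):
    return 2 * x + 3 * y


def nonlinear_model(x, y):
    return log(x * y)


ex, ey = mean(x_values), mean(y_values)

# Expected value of the model versus the model of the expected values.
e_linear = mean(linear_model(x, y) for x, y in outcomes)
linear_of_e = linear_model(ex, ey)

e_nonlinear = mean(nonlinear_model(x, y) for x, y in outcomes)
nonlinear_of_e = nonlinear_model(ex, ey)

print(e_linear, linear_of_e)
print(round(e_nonlinear, 4), round(nonlinear_of_e, 4))
```

The linear model gives the same number both ways, while the logarithmic one does not: the expected value of the log is smaller than the log of the expected values, as Jensen's inequality predicts for a concave function.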

Comment by cole_haus on Uncertainty and sensitivity analyses of GiveWell's cost-effectiveness analyses · 2019-09-02T17:31:21.629Z · score: 1 (1 votes) · EA · GW

Oh, very cool! I like the idea of sampling from different GiveWell staffers' values (though I couldn't do that here since I regarded essentially all input parameters as uncertain instead of just the highlighted ones).

I hadn't thought about the MPT connection. I'll think about that more.

Comment by cole_haus on Uncertainty and sensitivity analyses of GiveWell's cost-effectiveness analyses · 2019-09-02T17:29:32.993Z · score: 13 (6 votes) · EA · GW

Thanks for your thoughts.

the thrust of what you're saying is "we should do uncertainty analysis (use Monte Carlo simulations instead of point-estimates) as our cost-effectiveness might be sensitive to it"

Yup, this is the thrust of it.

But you haven't shown that GiveWell's estimates are sensitive to a reliance on point estimates (have you?)

I think I have---conditionally. The uncertainty analysis shows that, if you think the neutral uncertainty I use as input is an acceptable approximation, substantially different rankings are within the bounds of plausibility. If I put in my own best estimates, the conclusion would still be conditional. It's just that instead of being conditional upon "if you think the neutral uncertainty I use as input is an acceptable approximation" it's conditional upon "if you think my best estimates of the uncertainty are an acceptable approximation".

So the summary point there is that there's really no way to escape conditional conclusions within a subjective Bayesian framework. Conclusions will always be of the form "Conclusion C is true if you accept prior beliefs B". This makes generic, public communication hard (as we're seeing!), but offers lots of benefits too (which I tried to demonstrate in the post---e.g. an explicit quantification of uncertainty, a sense of which inputs are most influential).

here's a new, really complicated methodology we could use

If I've given the impression that it's really complicated, I think I might have misled you. One of the things I really like about the approach is that you pay a relatively modest fixed cost and then you get this kind of analysis "for free". By which I mean the complexity doesn't infect all your actual modeling code. For example, the GiveDirectly model here actually reads more clearly to me than the corresponding spreadsheet because I'm not constantly jumping around trying to figure out what the cell reference (e.g. B23) means in formulas.

Admittedly, some of the stuff about delta moment-independent sensitivity analysis and different distance metrics is a bit more complicated. But the distance metric stuff is specific to this particular problem---not the methodology in general---and the sensitivity analysis can largely be treated as a black box. As long as you understand what the properties of the resulting number are (e.g. ranges from 0-1, 0 means independence), the internal workings aren't crucial.

I think it would actually be very useful for you to input your best guess inputs (and its likely to be more useful for you to do it than an average EA, given you've thought about this more)

Given the responses here, I think I will go ahead and try that approach. Though I guess even better would be getting GiveWell's uncertainty on all the inputs (rather than just the inputs highlighted in the "User weights" and "Moral inputs" tab).

Sorry for adding even more text to what's already a lot of text :). Hope that helps.

Comment by cole_haus on Uncertainty and sensitivity analyses of GiveWell's cost-effectiveness analyses · 2019-09-01T15:46:53.092Z · score: 4 (3 votes) · EA · GW

You looked at the overall recap and saw the takeaways there? e.g. Sensitivity analysis indicates that some inputs are substantially more influential than others, and there are some plausible values of inputs which would reorder the ranking of top charities.

These are sort of meta-conclusions though and I'm guessing you're hoping for more direct conclusions. That's sort of hard to do. As I mention in several places, the analysis depends on the uncertainty you feed into it. To maintain "neutrality", I just pretended to be equally uncertain about each input. But, given this, any simple conclusions like "The AMF cost-effectiveness estimates have the most uncertainty." or "The relative cost-effectiveness is most sensitive to the discount rate." would be misleading at best.

The only way to get simple conclusions like that is to feed input parameters you actually believe in to the linked Jupyter notebook. Or I could put in my best guesses as to inputs and draw simple conclusions from that. But then you'd be learning about me as much as you'd be learning about the world as you see it.

Does that all make sense? Is there another kind of takeaway that you're imagining?

Comment by cole_haus on Movement Collapse Scenarios · 2019-08-28T21:45:18.128Z · score: 1 (1 votes) · EA · GW

Not sure what "attenuation" means in this context.

It's probably correction for attenuation: 'Correction for attenuation is a statistical procedure ... to "rid a correlation coefficient from the weakening effect of measurement error".'
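For reference, the usual form of this correction (Spearman's) divides the observed correlation by the square root of the product of the two measures' reliabilities. A minimal sketch, with function and parameter names chosen here for illustration:

```python
import math


def correct_for_attenuation(observed_r, reliability_x, reliability_y):
    """Spearman's correction: estimate the correlation between true scores.

    `observed_r` is the correlation between the two noisy measures;
    `reliability_x` and `reliability_y` are their reliabilities in (0, 1].
    """
    if not (0 < reliability_x <= 1 and 0 < reliability_y <= 1):
        raise ValueError("reliabilities must be in the interval (0, 1]")
    return observed_r / math.sqrt(reliability_x * reliability_y)


# Hypothetical numbers: measurement error makes the observed correlation
# smaller than the corrected one.
corrected = correct_for_attenuation(0.3, reliability_x=0.8, reliability_y=0.5)
print(round(corrected, 3))
```

With perfectly reliable measures (both reliabilities equal to 1) the correction leaves the correlation unchanged.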

Comment by cole_haus on Key points from The Dead Hand, David E. Hoffman · 2019-08-12T18:23:08.620Z · score: 2 (2 votes) · EA · GW

I’m a bit unclear about this: it seems that this is true for dirty bombs, but it is extremely hard to make a fission bomb work.

I'm far from an expert, but Global Catastrophic Risks makes it sound like that's not the case:

With modern weapons-grade uranium, the background neutron rate is so low that terrorists, if they have such material, would have a good chance of setting off a high-yield explosion simply by dropping one half of the material onto the other half. Most people seem unaware that if separated HEU is at hand it's a trivial job to set off a nuclear explosion ... even a high school kid could make a bomb in short order.

(the book is actually quoting Luis Alvarez there)

A US government sponsored experiment in the 1960s suggests that several physics graduates without prior experience with nuclear weapons and with access to only unclassified information could design a workable implosion type bomb. The participants in the experiment pursued an implosion design because they decided a gun-type device was too simple and not enough of a challenge (Stober, 2003).

Comment by cole_haus on An overview of arguments for concern about automation · 2019-08-08T17:53:54.080Z · score: 2 (2 votes) · EA · GW

I don't really have a coherent thesis in this response, just some thoughts/references that came to mind:

  • I thought this recent paper was a pretty reasonable framework for thinking about the high-level effects of automation on labor.
  • Wage stagnation: I thought Table 1 here was a pretty good overview of different studies, their methodologies and results
  • Wage stagnation: I think the type of issue raised in GDP-B: Accounting for the Value of New and Free Goods in the Digital Economy is plausibly an important addition to the discussion. We know GDP has never been a great proxy for welfare and it's plausibly getting worse over time. Importantly, it sounds plausible that the growth of the digital economy will be correlated with automation.
  • "it seems reasonable to worry about what might happen should the conditions that led to democracy no longer hold." It also seems plausible to me that there's path dependence/hysteresis such that democracy could persist even in the face of other conditions. One story you could tell along those lines is that populations in many countries are now generally older, wealthier, and more educated.
  • On inequality and stability: Max Weber's criteria for fundamental conflict (from Theoretical Sociology) are:

(1) Membership in social class (life chances in markets and economy), party (house of power or polity), and status groups (rights to prestige and honor) are correlated with each other; those high or low in one of these dimensions of stratification are high and low in the other two.

(2) High levels of discontinuity in the degrees of inequality within social hierarchies built around class, party, and status; that is, there are large gaps between those at high positions and those in middle positions, with large differences between the latter and those in lower positions with respect to class location, access to power, and capacity to command respect. And,

(3) low rates of mobility up and, also, down these hierarchies, thereby decreasing chances for those low in the system of stratification from bettering their station in life.

Comment by cole_haus on "Why Nations Fail" and the long-termist view of global poverty · 2019-07-23T07:44:07.105Z · score: 3 (2 votes) · EA · GW

Yes, I agree they're very incomplete--as advertised. I also think the original claims they're responding to are pretty incomplete.


I agree that time horizons are finite. If you're taking that as meaning that the defect/defect equilibrium reigns due to backward induction on a fixed number of games, that seems much too strong to me. Both empirically and theoretically, cooperation becomes much more plausible in indefinitely iterated games.

Does the single shot game that Acemoglu and Robinson implicitly describe really seem like a better description of the situation to you? It seems very clear to me that it's not a good fit. If I had to choose between a single shot game and an iterated game as a model, I'd choose the iterated game every time (and maybe just set the discount rate more aggressively as needed--as the post points out, we can interpret the discount rate as having to do with the probability of deposition).
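To illustrate how the discount rate does the work here, a minimal sketch using a textbook prisoner's dilemma with a grim trigger strategy. This is a standard stand-in, not necessarily the exact game formalized in the blog posts, and the payoff numbers are hypothetical. With payoffs T > R > P (temptation, reward for mutual cooperation, punishment for mutual defection), cooperation is sustainable when the effective discount factor is at least (T - R) / (T - P). Treating the effective discount factor as patience times the per-period probability of staying in power is the interpretation mentioned above.

```python
def critical_discount_factor(temptation, reward, punishment):
    """Smallest discount factor at which grim trigger sustains cooperation."""
    if not temptation > reward > punishment:
        raise ValueError("expected temptation > reward > punishment")
    return (temptation - reward) / (temptation - punishment)


def cooperation_sustainable(temptation, reward, punishment,
                            patience, survival_probability):
    """Check whether cooperating forever beats defecting once.

    The effective discount factor is the autocrat's pure patience multiplied
    by the per-period probability of not being deposed (an assumption made
    for this sketch).
    """
    effective_delta = patience * survival_probability
    threshold = critical_discount_factor(temptation, reward, punishment)
    return effective_delta >= threshold


# Hypothetical payoffs: T=5, R=3, P=1, so the threshold is 0.5.
print(cooperation_sustainable(5, 3, 1, patience=0.95, survival_probability=0.9))
print(cooperation_sustainable(5, 3, 1, patience=0.95, survival_probability=0.4))
```

A secure autocrat clears the threshold while one facing frequent deposition does not, which is why the average tenure matters for the argument.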

Maybe the crux here is the average tenure of autocrats and who we're thinking of when we use the term?


(I don't say "solve" anywhere in the post so I think the quote marks there are a bit misleading.)

I agree that to come up with something closer to a conclusion, you'd have to do something like analyze the weighted value of each of these structural factors. Even in the absence of such an analysis, I think getting a fuller list of the structural advantages and disadvantages gets us closer to the truth than a one-sided list.

Also, if we accept the claim that Acemoglu and Robinson's empirical evidence is weak, then the fact that I haven't presented any evidence on the real-world importance of these theoretical mechanisms becomes a bit less troubling. It means there's something closer to symmetry in the absence of good evidence bearing on the relative importance of structural advantages and disadvantages in each type of society.

My intuition is that majoritarian tyrannies and collective action problems are huge, pervasive problems in the contemporary world, but I won't argue for that here. I can pretty quickly come up with several examples where it might be in an autocrat's self-interest to confront coordination problems and/or majoritarian tyrannies:

  • Reducing local air pollution would improve an autocrat's health
  • Reducing overuse of antibiotics in animal agriculture could reduce their risk of contracting an antibiotic-resistant infection
  • Allowing/encouraging immigration (for some autocratic country appealing to immigrants) could boost the economy in a way that benefits the autocrat and leads them to overrule the preferences of locals

Obviously, each of these examples is only the briefest sketch and way more work would have to be done to make things conclusive.

Comment by cole_haus on Age-Weighted Voting · 2019-07-18T22:44:04.062Z · score: 10 (5 votes) · EA · GW

This paper has increased my general skepticism about the accuracy of any estimates of discount rates: Time Discounting and Time Preference: A Critical Review. It has a table listing studies that find discount rates ranging from -6% to ∞%.

Comment by cole_haus on Age-Weighted Voting · 2019-07-18T22:39:52.265Z · score: 5 (3 votes) · EA · GW

I'm not sure how much you thought about this aspect, but I've recently become extra wary of surveys on this topic (beyond the ordinary skepticism I'd have for questions which are mostly about expressive preferences and not revealed preferences). Time Discounting and Time Preference: A Critical Review has a table listing studies that find discount rates ranging from -6% to ∞%. Even if that doesn't influence you as much as it did me, the paper has some good discussion of different methods of elicitation (which are especially likely to influence results given the difficulty of the domain).

Comment by cole_haus on "Why Nations Fail" and the long-termist view of global poverty · 2019-07-18T19:11:55.771Z · score: 16 (8 votes) · EA · GW

In addition to the empirical problems, I was very underwhelmed by the theoretical mechanisms Acemoglu and Robinson outline. I wrote up my complaints in a couple of blog posts:

  • Autocrats can accelerate growth through cooperation: Institutions as a fundamental cause of long-run growth claims that inclusive societies should, ceteris paribus, have greater economic growth than authoritarian ones, in part because autocrats can’t credibly commit to upholding property rights after productive investment has occurred. If we formalize this argument as a game, we see that the single shot case supports this claim. But once we turn to the (more plausible) repeated game, we see that mutual cooperation is an equilibrium.
  • Inclusive and extractive societies each have structural advantages: Acemoglu and Robinson claim that extractive societies are at an economic disadvantage because elites will block economic improvements in the name of self-interested stability. But majorities in inclusive societies might also block economic improvements in the name of self-interest. Furthermore, we might expect inclusive societies to be more disadvantaged by problems of collective action.

(These posts are still a bit drafty so apologies for typos, errors, etc.)

Comment by cole_haus on The Happy Culture: A Theoretical, Meta-Analytic, and Empirical Review of the Relationship Between Culture and Wealth and Subjective Well-Being · 2019-07-16T22:25:06.224Z · score: 3 (3 votes) · EA · GW

Nothing especially insightful to add. Just wanted to link to The French Unhappiness Puzzle: The Cultural Dimension of Happiness which is on a related topic and reasonably good.

Comment by cole_haus on Do we know how many big asteroids could impact Earth? · 2019-07-11T17:47:47.089Z · score: 3 (2 votes) · EA · GW

No idea really. The chapter reports "The best chance for discovery of such [dark Damocloid] bodies would be through their thermal radiation around perihelion, using infrared instrumentation on the ground (Rivkin et al., 2005) or in satellites." Rivkin et al. 2005 is here.

Comment by cole_haus on Do we know how many big asteroids could impact Earth? · 2019-07-10T22:33:16.778Z · score: 8 (2 votes) · EA · GW

Global Catastrophic Risks (now slightly outdated with a 2008 publication date) has a chapter on comets and asteroids.

It estimates that an impactor with a diameter of 1 or 2 kilometers would be "civilization-disrupting" and 10 kilometers would "have a good chance of causing the extinction of the human species". So that's what the "big" means in this context.

We can estimate the population of possible impactors via impact craters, telescopic searches and dynamical analysis. Using these techniques, "[i]t is generally thought that the total population of near-Earth asteroids over a kilometre across is about 1100." But there are other classes of impactors with greater uncertainty: comets and Damocloids. "Whether small, dark Damocloids, of, for example, 1 km diameter exist in abundance is unknown - they are in essence undiscoverable with current search programmes."

This sounds like a plausible reconciliation of the apparently conflicting claims. OpenPhil is specifically talking about near-Earth asteroids, where we do indeed have fairly accurate estimates. The NASA employee referenced by MacAskill may be referring to the larger class of all possible impactors, where uncertainty is much greater.

Comment by cole_haus on "Moral Bias and Corrective Practices" and the possibility of an ongoing moral catastrophe · 2019-07-02T00:35:26.624Z · score: 3 (2 votes) · EA · GW

Yup, I agree that she draws strong conclusions from weak evidence. I wish it were more careful, but I posted it anyway since this is really the only analysis I have seen along these lines.

Comment by cole_haus on Announcing the launch of the Happier Lives Institute · 2019-06-24T23:25:30.795Z · score: 3 (3 votes) · EA · GW

Contemporary Metaethics delineates the field as being about:

(a)  Meaning: what is the semantic function of moral discourse? Is the function of moral discourse to state facts, or does it have some other non-fact-stating role?

(b)  Metaphysics: do moral facts (or properties) exist? If so, what are they like? Are they identical or reducible to natural facts (or properties) or are they irreducible and sui generis?

(c)  Epistemology and justification: is there such a thing as moral knowledge? How can we know whether our moral judgements are true or false? How can we ever justify our claims to moral knowledge?

(d)  Phenomenology: how are moral qualities represented in the experience of an agent making a moral judgement? Do they appear to be ‘out there’ in the world?

(e)  Moral psychology: what can we say about the motivational state of someone making a moral judgement? What sort of connection is there between making a moral judgement and being motivated to act as that judgement prescribes?

(f)  Objectivity: can moral judgements really be correct or incorrect? Can we work towards finding out the moral truth?

It doesn't quite seem to me like the original claim fits neatly into any of these categories.

Comment by cole_haus on EA Forum: Footnotes are live, and other updates · 2019-05-21T18:41:08.074Z · score: 6 (4 votes) · EA · GW


Perhaps you all have considered this already, but I think there's a lot to like about sidenotes over footnotes, especially on the web (e.g. footnotes aren't always in sight at the bottom of a physical page).

Comment by cole_haus on Structure EA organizations as WSDNs? · 2019-05-10T22:02:59.574Z · score: 4 (3 votes) · EA · GW

How would you expect EA WSDNs to differ from current EA orgs concretely?

When it comes to worker cooperatives, I see the differences as all flowing from reducing conflicting interests. That is, in standard firms, owners are ultimately interested in profits and only instrumentally interested in working conditions while workers are ultimately interested in working conditions (broadly construed) and only instrumentally interested in profits. Worker cooperatives resolve this tension by making agents principals and principals agents.

This is an idealization, but it seems like the interests of all relevant actors in EA orgs (and nonprofits more generally?) are more aligned. The board and the workers are (at least in theory) largely (if not solely) motivated by the same do-gooding goal.

Comment by cole_haus on What is the current best estimate of the cumulative elasticity of chicken? · 2019-05-04T04:10:01.464Z · score: 3 (3 votes) · EA · GW

Consideration 1: Economists often consider small actors in competitive markets to be price-takers, meaning that they cannot influence prices on their own. This seems like a pretty plausible characterization of any individual food buyer.

Consideration 2: "He reasoned that economics says a drop in demand for some commodity should cause prices to fall for that commodity, and overall consumption remains the same." This is not correct. An inward shift in the demand curve ("a drop in demand") causes both equilibrium price and quantity to decrease (for ordinary downward-sloping demand curves and upward-sloping supply curves). I'd guess the thing he's trying to get at is that, for a good which is unit elastic, a small drop in price is offset by a small increase in quantity, which leads to total revenue being unchanged.

So our first option is to regard individual actors as too small to influence the price. If we reject this and think they do have an effect, their effect would be to shift the demand curve in---dropping equilibrium price and quantity.
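A small worked example of that second option, assuming linear supply and demand curves. The functional forms and all the numbers are hypothetical, chosen only to show the direction of the effect.

```python
def equilibrium(demand_intercept, demand_slope, supply_intercept, supply_slope):
    """Solve for the equilibrium of linear demand and supply curves.

    Demand: Qd = demand_intercept - demand_slope * P  (downward sloping)
    Supply: Qs = supply_intercept + supply_slope * P  (upward sloping)
    """
    price = (demand_intercept - supply_intercept) / (demand_slope + supply_slope)
    quantity = supply_intercept + supply_slope * price
    return price, quantity


# Hypothetical market before and after an inward shift of the demand curve.
price_before, quantity_before = equilibrium(100, 2, 10, 1)
price_after, quantity_after = equilibrium(94, 2, 10, 1)

print(f"before: price={price_before:.2f}, quantity={quantity_before:.2f}")
print(f"after:  price={price_after:.2f}, quantity={quantity_after:.2f}")
```

In this sketch the demand curve shifts in by 6 units, and both the equilibrium price and the equilibrium quantity fall. Quantity falls by less than the full 6 units because the lower price induces some extra purchases along the new demand curve.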

Aside: I'm reasonably well-informed about economics and don't recall having ever heard the term "cumulative elasticity" before.

Comment by cole_haus on Why does EA use QALYs instead of experience sampling? · 2019-04-24T02:00:02.793Z · score: 8 (6 votes) · EA · GW

I don't really see ESM as being in opposition to QALYs. It seems like it's a method that you would use as an input in QALY weight determinations. Wikipedia lists some of the current methods for deriving QALY weights as:

Time-trade-off (TTO): Respondents are asked to choose between remaining in a state of ill health for a period of time, or being restored to perfect health but having a shorter life expectancy.
Standard gamble (SG): Respondents are asked to choose between remaining in a state of ill health for a period of time, or choosing a medical intervention which has a chance of either restoring them to perfect health, or killing them.
Visual analogue scale (VAS): Respondents are asked to rate a state of ill health on a scale from 0 to 100, with 0 representing being dead and 100 representing perfect health. This method has the advantage of being the easiest to ask, but is the most subjective.
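As a rough sketch of how responses from these three methods are conventionally turned into weights (the function names and the example numbers are made up for illustration, and real studies involve more careful elicitation and adjustment):

```python
def tto_weight(years_full_health, years_ill_health):
    """Time-trade-off: the respondent is indifferent between living
    `years_ill_health` in the ill state and `years_full_health` in
    perfect health (the shorter life)."""
    return years_full_health / years_ill_health


def sg_weight(indifference_probability):
    """Standard gamble: the probability of full recovery at which the
    respondent is indifferent between the gamble and staying ill."""
    return indifference_probability


def vas_weight(rating):
    """Visual analogue scale: rating from 0 (dead) to 100 (perfect health)."""
    return rating / 100


def qalys(weight, years):
    """Quality-adjusted life years for `years` spent at a given weight."""
    return weight * years


# Hypothetical respondent: indifferent between 10 years with the condition
# and 7 years in perfect health.
weight = tto_weight(years_full_health=7, years_ill_health=10)
print(weight, qalys(weight, years=20))
```

An experience-sampling measure would presumably enter at the same point, as another way of producing the `weight` input.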

There's also the "day reconstruction method" (DRM). The Oxford Handbook of Happiness talks about ESM, DRM and other relevant measurement approaches at various points.

I'd guess the trouble with using ESM, DRM and some other methods like them for QALY weights is it's hard to isolate the causal effect of particular conditions using these methods.

Comment by cole_haus on Thoughts on 80,000 Hours’ research that might help with job-search frustrations · 2019-04-18T23:29:47.218Z · score: 7 (6 votes) · EA · GW

Ah, I see that now. Thanks.

FWIW, I was specifically looking for a disclaimer and it didn't quickly come to my attention. It looks like a few other people in these subthreads may have also missed the disclaimer.