Comment by Dan_Keys on Announcing a contest: EA Criticism and Red Teaming · 2022-10-01T19:43:59.797Z · EA · GW

The Less Wrong posts Politics as Charity from 2010 and Voting is like donating thousands of dollars to charity from November 2012 have similar analyses to the 2020 80k article.

Comment by Dan_Keys on An experiment eliciting relative estimates for Open Philanthropy’s 2018 AI safety grants · 2022-09-12T21:51:49.619Z · EA · GW

Agreed that there are some contexts where there's more value in getting distributions, like with the Fermi paradox.

Or, before the grants are given out, you could ask people to give an ex ante distribution for "what will be your ex post point estimate of the value of this grant?" That feeds directly into VOI calculations, and it is clearly defined what the distribution represents. But note that it requires focusing on point estimates ex post.

Comment by Dan_Keys on An experiment eliciting relative estimates for Open Philanthropy’s 2018 AI safety grants · 2022-09-12T21:14:20.996Z · EA · GW

I think it would've been better to just elicit point estimates of the grants' expected value, rather than distributions. Using distributions adds complexity, for not much benefit, and it's somewhat unclear what the distributions even represent.

Added complexity: for researchers giving their elicitations, for the data analysis, and for readers trying to interpret the results. This can make the process slower, lead to errors, and lead to different people interpreting things differently, e.g. in how to handle distributions that include both positive & negative numbers.

Not much benefit: at least, when I read this report I mostly looked at the point estimates, except for the section showing that researchers' confidence intervals for the two elicitation methods didn't overlap.

Unclear what the distribution represents: The distribution is basically a probability distribution over a probability (p(x-risk)), and it's not obvious which uncertainties should be represented in the distribution and which are part of p(x-risk). e.g., If someone thinks that there's an 80% chance that a research direction is misguided & useless and a 20% chance that it's meaningful & relevant, should they just multiply their distribution by 0.2 (relative to research that is definitely in a meaningful & relevant direction), or should this give a more spread-out distribution with most of the probability mass near zero, or something in between?
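A quick Monte Carlo sketch of how those two interpretations diverge (all numbers here are made up for illustration, not taken from the report). Scaling by 0.2 and mixing with a point mass near zero give the same expected value but very different shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical value of the research *if* the direction is meaningful:
# lognormal with median 10 (arbitrary units).
value_if_meaningful = rng.lognormal(mean=np.log(10), sigma=0.5, size=n)

# Interpretation 1: scale the whole distribution by p(meaningful) = 0.2.
scaled = 0.2 * value_if_meaningful

# Interpretation 2: a mixture -- 20% chance the research is meaningful,
# 80% chance it is useless (value ~ 0).
meaningful = rng.random(n) < 0.2
mixture = np.where(meaningful, value_if_meaningful, 0.0)

print(f"means:   scaled={scaled.mean():.2f}  mixture={mixture.mean():.2f}")
print(f"medians: scaled={np.median(scaled):.2f}  mixture={np.median(mixture):.2f}")
```

The means agree, but the mixture's median is zero while the scaled distribution's is not, which is exactly the ambiguity: it's unclear which shape the elicited distribution is supposed to have.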

Comment by Dan_Keys on An experiment eliciting relative estimates for Open Philanthropy’s 2018 AI safety grants · 2022-09-12T20:50:13.334Z · EA · GW

In the table with post-discussion distributions, how is the lower bound of the aggregate distribution for the Open Phil AI Fellowship -73, when the lowest lower bound for an individual researcher is -2.4? Also in that row, Researcher 3's distribution is given as "250 to 320", which doesn't include their median (35) and is too large for a scale that's normalized to 100.

Comment by Dan_Keys on How accurate are Open Phil's predictions? · 2022-06-18T01:52:19.080Z · EA · GW

I haven't seen a rigorous analysis of this, but I like looking at the slope, and I expect that it's best to include each resolved prediction as a separate data point. So there would be 743 data points, each with a y value of either 0 or 1.
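A sketch of what I have in mind, using simulated stand-in data (the resolved predictions themselves aren't reproduced here): regress the 0/1 outcomes on the stated probabilities; a slope near 1 with an intercept near 0 indicates good calibration, while a slope below 1 indicates overconfidence.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the 743 resolved predictions:
# each data point is (stated probability, 0/1 outcome).
probs = rng.uniform(0.05, 0.95, size=743)
outcomes = (rng.random(743) < probs).astype(int)  # simulates good calibration

# OLS of outcome on stated probability.
slope, intercept = np.polyfit(probs, outcomes, deg=1)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
# Well-calibrated forecasts give slope near 1 and intercept near 0;
# slope < 1 means stated probabilities are too extreme.
```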

Comment by Dan_Keys on How accurate are Open Phil's predictions? · 2022-06-18T01:50:40.304Z · EA · GW

There are several different sorts of systematic errors that you could look for in this kind of data, although checking for them requires including more features of each prediction than the ones that are here.

For example, to check for optimism bias you'd want to code whether each prediction is of the form "good thing will happen", "bad thing will happen", or neither. Then you can check if probabilities were too high for "good thing will happen" predictions and too low for "bad thing will happen" predictions. (Most of the example predictions were "good thing will happen" predictions, and it looks like probabilities were not generally too high, so probably optimism bias was not a major issue.)
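A sketch of that check, with a hypothetical coding of each prediction's direction (these rows are invented for illustration):

```python
# Hypothetical resolved predictions: (direction, stated probability, outcome).
predictions = [
    ("good", 0.80, 1), ("good", 0.65, 1), ("good", 0.90, 0),
    ("bad", 0.10, 0), ("bad", 0.30, 1),
    ("neither", 0.50, 1),
]

# Optimism bias would show up as mean probability > base rate for
# "good thing will happen" predictions, and the reverse for "bad" ones.
gaps = {}
for direction in ("good", "bad"):
    rows = [(p, o) for d, p, o in predictions if d == direction]
    mean_p = sum(p for p, _ in rows) / len(rows)
    base_rate = sum(o for _, o in rows) / len(rows)
    gaps[direction] = mean_p - base_rate
    print(f"{direction}: mean probability {mean_p:.2f}, base rate {base_rate:.2f}")
```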

Some other things you could check for:

  • tracking what the "default outcome" would be, or whether there is a natural base rate, to see if there has been a systematic tendency to overestimate the chances of a non-default outcome (or to underestimate it)
  • dividing predictions up into different types, such as predictions about outcomes in the world (e.g. >20 new global cage-free commitments), predictions about inputs / changes within the organization (e.g. will hire a comms person within 9 months), and predictions about people's opinions (e.g. [expert] will think [the grantee’s] work is ‘very good’), to check for calibration & accuracy on each type of prediction
  • trying to distinguish the relative accuracy of different forecasters. If there are too few predictions per forecaster, you could check if any forecaster-level features are correlated with overconfidence or with Brier score (e.g., experience within the org, experience making these predictions, some measure of quantitative skills). The aggregate pattern of overconfidence in the >80% and <20% bins can show up even if most forecasters are well-calibrated and only (say) 25% are overconfident, as overconfident predictions are averaged with well-calibrated predictions. And those 25% influence results graphs like these more than it might seem, because well-calibrated forecasters use the extreme bins less often. Even if only 25% of all predictions are made by overconfident forecasters, half of the predictions in the >80% bins might be from overconfident forecasters.
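A quick simulation of that last point, with made-up parameters: a 25% overconfident minority supplies a much larger than 25% share of the predictions in the extreme bins.

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_forecaster, n_calibrated, n_overconfident = 200, 75, 25

def stated_probs(n, overconfident):
    # True probabilities drawn from a common pool.
    true_p = rng.uniform(0.05, 0.95, size=n)
    if overconfident:
        # Overconfident forecasters push probabilities toward the extremes.
        return np.clip(0.5 + 1.8 * (true_p - 0.5), 0.01, 0.99)
    return true_p

calibrated = np.concatenate(
    [stated_probs(n_per_forecaster, False) for _ in range(n_calibrated)])
overconf = np.concatenate(
    [stated_probs(n_per_forecaster, True) for _ in range(n_overconfident)])

def extreme(p):
    return (p > 0.8) | (p < 0.2)

share = extreme(overconf).sum() / (extreme(overconf).sum() + extreme(calibrated).sum())
print(f"share of extreme-bin predictions from overconfident forecasters: {share:.0%}")
```

Even though overconfident forecasters make only 25% of all predictions here, they account for well over 25% of the extreme-bin predictions, so the extreme bins in a calibration graph mostly reflect them.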
Comment by Dan_Keys on Announcing Impact Island: A New EA Reality TV Show · 2022-04-01T23:39:17.875Z · EA · GW

Pardon my negativity, but I get the impression that you haven't thought through your impact model very carefully.

In particular, the structure where

Every week, an anonymous team of grantmakers rank all participants, and whoever accomplished the least morally impactful work that week will be kicked off the island. 

is selecting for mediocrity.

Given fat tails, I expect more impact to come from the single highest impact week than from 36 weeks of not-last-place impact.

Perhaps for the season finale you could bring back the contestant who had the highest impact week of the season and have them face off against the last survivor. That could also make for more exciting television than whatever you had planned for the 36th episode.

Comment by Dan_Keys on Announcing What The Future Owes Us · 2022-04-01T20:31:26.720Z · EA · GW

How much overlap is there between this book & Singer's forthcoming What We Owe The Past?

Comment by Dan_Keys on The State of the World — and Why Monkeys are Smarter than You · 2021-05-04T01:22:18.065Z · EA · GW

I got 13/13.

q11 (endangered species) was basically a guess. I thought that an extreme answer was more likely given how the quiz was set up to be counterintuitive/surprising. Also relevant: my sense is that we've done pretty well at protecting charismatic megafauna; the fact that I've heard about a particular species being at risk doesn't provide much information either way about whether things have gotten worse for it (me hearing about it is related to things being bad for it, and it's also related to successful efforts to protect it).

On q6 (age distribution of population increase) I figured that most people are age 15-74 and that group would increase roughly proportionally with the overall increase, which gives them the majority of the increase. The increase among the elderly will be disproportionately large, but that's not enough for it to be the biggest in absolute terms since they're only like 10% of the population.

On q7 (deaths from natural disaster) I wouldn't have been surprised if the drop in death rate was balanced out by the increase in population, but I had an inkling that it was faster. And the tenor of the quiz was that the surprisingly good answer was correct, so if population growth had balanced it out then probably it would've asked about deaths per capita rather than total deaths.

Comment by Dan_Keys on Getting money out of politics and into charity · 2020-10-16T02:58:48.654Z · EA · GW

For example: If there are diminishing returns to campaign spending, then taking equal amounts of money away from both campaigns would help the side which has more money.
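A toy illustration of that point, with a made-up square-root returns curve: the same dollar cut costs the poorer campaign more effectiveness, so equal removals favor the richer side.

```python
import math

def effectiveness(spend):
    # Toy diminishing-returns curve, invented for illustration.
    return math.sqrt(spend)

rich, poor, removed = 100.0, 25.0, 9.0

rich_loss = effectiveness(rich) - effectiveness(rich - removed)
poor_loss = effectiveness(poor) - effectiveness(poor - removed)
print(f"rich campaign loses {rich_loss:.2f}, poor campaign loses {poor_loss:.2f}")
# The poorer campaign loses more effectiveness from the same dollar cut,
# so removing equal amounts from both helps the richer side.
```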

Comment by Dan_Keys on Michael_Wiebe's Shortform · 2020-09-29T19:37:19.736Z · EA · GW

If humanity goes extinct this century, that drastically reduces the likelihood that there are humans in our solar system 1000 years from now. So at least in some cases, looking at the effects 1000+ years in the future is pretty straightforward (conditional on the effects over the coming decades).

In order to act for the benefit of the far future (1000+ years away), you don't need to be able to track the far future effects of every possible action. You just need to find at least one course of action whose far future effects are sufficiently predictable to guide you (and good in expectation).

Comment by Dan_Keys on The Web of Prevention · 2020-02-20T00:49:19.957Z · EA · GW

The initial post by Eliezer on security mindset explicitly cites Bruce Schneier as the source of the term, and quotes extensively from this piece by Schneier.

Comment by Dan_Keys on [Link] Aiming for Moral Mediocrity | Eric Schwitzgebel · 2020-01-04T04:22:57.233Z · EA · GW

In most of his piece, by “aiming to be mediocre”, Schwitzgebel means that people’s behavior regresses to the actual moral middle of a reference class, even though they believe the moral middle is even lower.

This skirts close to a tautology. People's average moral behavior equals people's average moral behavior. The output that people's moral processes actually produce is the observed distribution of moral behavior.

The "aiming" part of Schwitzgebel's hypothesis that people aim for moral mediocrity gives it empirical content. It gets harder to pick out the empirical content when interpreting aim in the objective sense.

Comment by Dan_Keys on Public Opinion about Existential Risk · 2018-08-26T22:30:07.623Z · EA · GW

Unless a study is done with participants who are selected heavily for numeracy and fluency in probabilities, I would not interpret stated probabilities literally as a numerical representation of their beliefs, especially near the extremes of the scale. People are giving an answer that vaguely feels like it matches the degree of unlikeliness that they feel, but they don't have that clear a sense of what (e.g.) a probability of 1/100 means. That's why studies can get such drastically different answers depending on the response format, and why (I predict) effects like scope insensitivity are likely to show up.

I wouldn't expect the confidence question to pick up on this. e.g., Suppose that experts think that something has a 1 in a million chance and a person basically agrees with the experts' viewpoint but hasn't heard/remembered that number. So they indicate "that's very unlikely" by entering "1%" which feels like it's basically the bottom of the scale. Then on the confidence question they say that they're very confident of that answer because they feel sure that it's very unlikely.

Comment by Dan_Keys on Public Opinion about Existential Risk · 2018-08-26T22:18:24.104Z · EA · GW

That can be tested on these data, just by looking at the first of the 3 questions that each participant got, since the post says that "Participants were asked about the likelihood of humans going extinct in 50, 100, and 500 years (presented in a random order)."

I expect that there was a fair amount of scope insensitivity. e.g., That people who got the "probability of extinction within 50 years" question first gave larger answers to the other questions than people who got the "probability of extinction within 500 years" question first.
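A sketch of that order-effect test, assuming a hypothetical data layout (one row per participant, with the first-seen horizon recorded; the numbers are invented):

```python
# Hypothetical layout: one row per participant, with the horizon of the
# question they saw first and their stated extinction probabilities.
participants = [
    # (first_horizon, p_50yr, p_100yr, p_500yr) -- made-up illustrative rows
    (50, 0.05, 0.08, 0.10),
    (500, 0.01, 0.02, 0.05),
    (50, 0.10, 0.10, 0.10),
    (500, 0.02, 0.03, 0.06),
]

# Scope insensitivity predicts that people anchored on the 50-year question
# give higher answers across the board than people anchored on 500 years.
def mean_500yr_answer(first_horizon):
    rows = [p500 for fh, _, _, p500 in participants if fh == first_horizon]
    return sum(rows) / len(rows)

print(mean_500yr_answer(50), mean_500yr_answer(500))
```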

Comment by Dan_Keys on EA Survey 2017 Series: Donation Data · 2017-09-12T16:03:08.217Z · EA · GW

I agree that asking about 2016 donations in early 2017 is an improvement for this. If future surveys are just going to ask about one year of donations then that's pretty much all you can do with the timing of the survey.

In the meantime, it is pretty easy to filter the data accordingly -- if you look only at donations made by EAs who stated that they joined on 2014 or before, the median donation is $1280.20 for 2015 and $1500 for 2016.

This seems like a better way to do the analyses. I think that the post would be more informative & easier to interpret if all of the analyses used this kind of filter. (For 2016 donations you could also include people who became involved in EA in 2015.)

For example, someone who hears a number for the median non-student donation in 2016 will by default assume that this refers to people who were non-student EAs throughout 2016. If possible, it's good to give the number which matches the scenario that they're imagining rather than needing to give caveats about how 35% of the people weren't EAs yet at the start of 2016. When people hear a non-intuitive analysis with a caveat then they're fairly likely to either a) forget about the caveat and mistakenly think that the number refers to the thing that they initially assumed that it meant or b) not know what to make of the caveated analysis and therefore not learn anything.
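The filter I have in mind can be sketched like this (the rows and amounts are hypothetical; the real analysis would use the survey's join-year field):

```python
from statistics import median

# Hypothetical rows: (year first involved in EA, 2016 donation in USD).
respondents = [(2013, 2000), (2014, 1500), (2015, 300), (2016, 0), (2014, 800)]

# Count 2016 donations only from people who were already EAs at the
# start of 2016 (i.e., joined in 2015 or earlier).
donations = [amount for joined, amount in respondents if joined <= 2015]
print(median(donations))
```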

Comment by Dan_Keys on EA Survey 2017 Series: Donation Data · 2017-09-12T03:20:13.229Z · EA · GW

It is also worth noting that the survey was asking people who identify as EA in 2017 how much they donated in 2015 and 2016. These people weren't necessarily EAs in 2015 or 2016.

Looking at the raw data of when respondents said that they first became involved in EA, I'm getting that:

7% became EAs in 2017
28% became EAs in 2016
24% became EAs in 2015
41% became EAs in 2014 or earlier

(assuming that everyone who took the "Donations Only" survey became an EA before 2015, and leaving out everyone else who didn't answer the question about when they became an EA.)

So if we're looking at donations made in 2015, 35% of the people weren't EAs then and another 24% had only just become EAs that year. For 2016, 35% of the people weren't EAs yet at the start of the year and 7% weren't EAs at the end of the year.

(There were similar issues with the 2015 survey.)

These not-yet-EAs can have a large influence on the median, and to a lesser extent on the percentiles and the mean. They would also tend to create an upward trend in the longitudinal analysis (e.g., if many of the 184 individuals became EAs in 2015).

Comment by Dan_Keys on EA Survey 2017 Series: Distribution and Analysis Methodology · 2017-08-30T06:23:25.297Z · EA · GW

This year, a “Donations Only” version of the survey was created for respondents who had filled out the survey in prior years. This version was shorter and could be linked to responses from prior years if the respondent provided the same email address each year.

Are these data from prior surveys included in the raw data file, for people who did the Donations Only version this year? At the bottom of the raw data file I see a bunch of entries which appear not to have any data besides income & donations - my guess is that those are either all the people who took the Donations Only version, or maybe just the ones who didn't provide an email address that could link their responses.

Comment by Dan_Keys on High Time For Drug Policy Reform. Part 1/4: Introduction and Cause Summary · 2017-08-09T17:31:51.078Z · EA · GW

It might be possible to fix in a not-too-tedious way, by using find-replace in the source code to edit all of the broken links (and anchors?) at once.

Comment by Dan_Keys on Contra the Giving What We Can pledge · 2016-12-05T01:14:55.362Z · EA · GW

It appears that this analysis did not account for when people became EAs. It looked at donations in 2014, among people who in November 2015 were nonstudent EAs on an earning to give path. But less than half of those people were nonstudent EAs on an earning to give path at the start of 2014.

In fact, less than half of the people who took the Nov 2015 survey were EAs at the start of 2014. I've taken a look at the dataset, and among the 1171 EAs who answered the question about 2014 donations:
40% first got involved in EA in 2013 or earlier
21% first got involved in EA in 2014
28% first got involved in EA in 2015
11% did not answer the question about when they got involved in EA

This makes all of the analyses of median 2014 donation extremely misleading, unless they're limited to pre-2014 EAs (which they generally have not been).

I'm hoping that the next EA survey will do better with this issue. I believe the plan is to wait until January in order to ask about 2016 donations, which is a good start. Hopefully they will also focus on pre-2016 EAs when looking at typical donation size, since the survey will include a bunch of new EAs who we wouldn't necessarily expect to see donating within their first few months as an EA.

(Also speaking for myself only, not my employer.)

Comment by Dan_Keys on Altruistic Organizations Should Consider Counterfactuals When Hiring · 2016-09-12T23:17:20.626Z · EA · GW

If the prospective employee is an EA, then they are presumably already paying lots of attention to the question "How much good would I do in this job, compared with the amount of good I would do if I did something else instead?" And the prospective employee has better information than the employer about what that alternative would be and how much good it would do. So it's not clear how much is added by having the employer also consider this.

Comment by Dan_Keys on The 2015 Survey of Effective Altruists: Results and Analysis · 2016-07-30T23:08:58.947Z · EA · GW

Thanks for looking this up quickly, and good point about the selection effect due to attrition.

I do think that it would be informative to see the numbers when also limited to nonstudents (or to people above a certain income, or to people above a certain age). I wouldn't expect to see much donated from young low- (or no-) income students.

Comment by Dan_Keys on The 2015 Survey of Effective Altruists: Results and Analysis · 2016-07-30T04:00:40.849Z · EA · GW

For the analysis of donations, which asked about donations in 2014, I'd like to see the numbers for people who became EAs in 2013 or earlier (including the breakdowns for non-students and for donations as % of income for those with income of $10,000 or more).

37% of respondents first got involved with EA in 2015, so their 2014 donations do not tell us much about the donation behavior of EAs. Another 24% first got involved with EA in 2014, and it's unclear how much their 2014 donations tell us given that they only began to be involved in EA midyear.

Comment by Dan_Keys on How Should a Large Donor Prioritize Cause Areas? · 2016-04-26T08:05:55.777Z · EA · GW

My guess (which, like Michael's, is based on speculation and not on actual information from relevant decision-makers) is that the founders of Open Phil thought about institutional philosophy before they looked in-depth at particular cause areas. They asked themselves questions like:

How can we create a Cause Agnostic Foundation, dedicated to directing money wherever it will do the most good, without having it collapse into a Foundation For Cause X as soon as its investigations conclude that currently the highest EV projects are in cause area x?

Do we want to create a Cause Agnostic Foundation? Would it be a bad thing if a Cause Agnostic Foundation quickly picked the best cause and then transformed into the Foundation For Cause X?

Apparently they concluded that it was worth creating a (stable) Cause Agnostic Foundation, and that this would work better if they directed significant amounts of resources towards several different cause areas. I can think of several arguments for this conclusion:

  1. Spreading EA Ideas. It's easier to spread the ideas behind effective altruism (and to create a world where more resources are devoted to attempts at effective altruism) if there is a prominent foundation which is known for the methodology that it uses to choose causes rather than for its support of particular causes. And that works best if the foundation gives to several different cause areas.

  2. Diminishing Returns to Prestige. Donations can provide value by conferring prestige, not just by transferring money, and prestige can have sharply diminishing returns to amount donated. e.g., Giving to your alma mater, whether it's $10 or $10,000, lets them say that a higher percentage of alumni are donors. One might hope that this prestige benefit (with diminishing returns) would apply to many of the grants from a Cause Agnostic Foundation, and that it will be well-regarded enough to bring other people's attention to the causes & organizations that it supports.

  3. Ability to Pivot. If a foundation focuses on just one or two cause areas (and hires people to work on those cause areas, publicizes its reasons for supporting those cause areas, builds connections with other organizations in those cause areas, etc.) that can make it hard for it to keep an open mind about cause areas and potentially pivot to a different cause area which starts looking more promising a few years later.

  4. Learning. We can learn more if we pursue several different cause areas than if we just focus on one or two. This can include things like: getting better at cause prioritization by doing it a lot, getting better at evaluating organizations by dealing with some organizations that are in cause areas where progress is relatively easy to track, and learning how to interact with governments in the context of criminal justice reform and then being better able to pursue projects involving government in other cause areas.

  5. Hits. A foundation which practices hits-based-giving can tolerate a lot of risk, but they may need to have at least some visible hits over the years in order to remain institutionally strong. Diversifying across cause areas can help that happen.

My sense is that this is an incomplete list; there are other arguments like these.

It's worth noting that many of these lines of reasoning are specific to a foundation like Open Phil, and would not apply to a single wealthy donor looking to donate his or her own money.

Comment by Dan_Keys on Independent re-analysis of MFA veg ads RCT data · 2016-02-20T06:50:27.005Z · EA · GW

I can't tell what's being done in that calculation.

I'm getting a p-value of 0.108 from a Pearson chi-square test (with cell values 55, 809; 78, 856). A chi-square test and a two-tailed t-test should give very similar results with these data, so I agree with Michael that it looks like your p=0.053 comes from a one-tailed test.
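For reference, a self-contained version of that calculation (Pearson chi-square without the continuity correction), using the cell counts quoted above:

```python
from math import erfc, sqrt

# Contingency table from the comment: 55 of 864 vs 78 of 934.
observed = [[55, 809], [78, 856]]
row = [sum(r) for r in observed]
col = [sum(c) for c in zip(*observed)]
total = sum(row)

# Pearson chi-square statistic (no continuity correction).
chi2 = sum(
    (observed[i][j] - row[i] * col[j] / total) ** 2 / (row[i] * col[j] / total)
    for i in range(2) for j in range(2)
)
# Survival function of the chi-square distribution with 1 degree of freedom.
p = erfc(sqrt(chi2 / 2))
print(f"chi2={chi2:.3f}, p={p:.3f}")  # chi2=2.583, p=0.108
```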

Comment by Dan_Keys on The most persuasive writing neutrally surveys both sides of an argument · 2016-02-19T02:09:45.811Z · EA · GW

A quick search of the academic research on this topic turned up results that roughly match the claims in this post.

Meta-analyses by Allen (1991) (pdf, blog post summary) and O'Keefe (1999) (pdf, blog post summary) defined "refutational two-sided arguments" as arguments that include 1) arguments in favor of the preferred conclusion, 2) arguments against the preferred conclusion, and 3) arguments which attempt to refute the arguments against the preferred conclusion. Both meta-analyses found that refutational two-sided arguments were more persuasive than one-sided arguments (which include only the first of those 3 types of arguments), which in turn were more persuasive than nonrefutational two-sided arguments (which include the first 2 of those 3 types of arguments).

So: surveying both sides of the argument, and making the case for why one side holds more weight than the other, does seem to lead to more convincing writing.

These results are at a fairly broad level of generality. I don't know if any research has looked at questions like whether it matters if you include the strongest arguments against the preferred conclusion (vs. only including straw man arguments) or if it matters if you act as if the arguments against the preferred conclusion have been completely refuted (vs. somewhat outweighed by the arguments in favor of the preferred conclusion).

A quick skim through the list of articles citing Allen and O'Keefe's papers turned up some studies which look for additional sources of variability which might moderate this effect, but I didn't notice any that challenge the general pattern or which get into really good detail on whether normatively good arguments (e.g., non-straw-man, measured conclusions) are more convincing.

Comment by Dan_Keys on TLYCS Pamphleting Pilot Plan · 2015-01-30T18:20:27.221Z · EA · GW

Have you looked at the history of your 4 metrics (Visitors, Subscribers, Donors, Pledgers) to see how much noise there is in the baseline rates? The noisier they are, the more uncertainty you'll have in the effect size of your intervention.

Could you have the pamphlets only give a url that no one else goes to, and then directly track how many new subscribers/donors/pledgers have been to that url?