re: footnote 1

The paper The Standard Errors of Persistence, which you cite as a criticism, says the following about the robustness of the Peruvian study:

This study examines differences in household consumption and child stunting on either side of Peru’s Mita boundary. It finds that areas which traditionally had to provide conscripted mine labour have household consumption almost 30 per cent lower than on the other side of the boundary. We examine the regression in column 1 of Table 2, which compares equivalent household consumption in a hundred kilometre strip on either side of the boundary with controls for distance to the boundary, elevation, slope and household characteristics. The variable of interest is a dummy for being inside the boundary. We examine here how well the regression explains arbitrary patterns of consumption generated as spatial noise. To do this we take the locations where households live and simulate consumption levels based on median consumption at the points. The original study found a 28 per cent difference in consumption levels across the historic boundary. If we normalize the noise variables to have the same mean and standard deviation as the original consumption data, we get a difference of at least 28 per cent (positive or negative) in 70 per cent of cases.

What do you think of that? In general, it seems that your justification for relative robustness doesn't engage with the critiques at all. My understanding of their major point is that spatial autocorrelation of residuals is unaccounted for and might make noise look significant. The simpler example of a common spurious relationship was, AFAIK, first described in Spurious regressions in econometrics (see this decent-looking blogpost for relevant intuitions).
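
To make the spurious-regression intuition concrete, here is a minimal sketch (my own illustration, not code from either paper). It assumes two independent Gaussian random walks of 100 steps each; the function names, the 1.96 cutoff, and the number of trials are illustrative choices. It covers only the time-series analogue, not the spatial-noise simulation quoted above.

```python
import math
import random


def random_walk(n, rng):
    """Cumulative sum of n independent standard normal steps."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def slope_t_stat(x, y):
    """t-statistic of the slope in an OLS fit of y on x with an intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return slope / std_err


def false_positive_rate(n_obs=100, n_trials=500, seed=0):
    """Share of trials where two unrelated random walks look 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        t = slope_t_stat(random_walk(n_obs, rng), random_walk(n_obs, rng))
        hits += abs(t) > 1.96  # naive two-sided test at the nominal 5% level
    return hits / n_trials


rate = false_positive_rate()
print(f"nominal 5% test rejects in {rate:.0%} of trials")
```

The two series are unrelated by construction, so a well-behaved test should reject about 5% of the time; the naive t-test rejects far more often because the residuals are strongly autocorrelated.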

I endorse Nuño's comment re: 0.00000000001%.

While it's pretty easy to agree that the probability of a stupid mistake/typo is greater than 0.00000000001%, it is sometimes hard to follow in practice. I think Yudkowsky communicates it well on a more visceral level in his Infinite Certainty essay. I got to another level of appreciation of this point after doing a calibration exercise for mental arithmetic: all errors were unpredictable "oops" moments like misreading plus for minus or selecting the wrong answer after making correct calculations.

For social technology, I think we have been consistently disappointed by various attempts to reform education. Specifically, think about interventions like direct instruction, investigated under the Follow Through project, and, maybe, interventions tested by the Gates Foundation.

I would be happy if people started more specialized newsletters.

Think of Nuño's Forecasting Newsletter but for other areas of EA/longtermist interests (e.g. broad topics from various research agendas). This seems to be straightforwardly positive per the neoliberal movement growth playbook. On top of doing the community service of keeping peers up to date with relevant research, a newsletter is a pretty nice way to get relevant connections outside of the EA movement, to do low-risk high-fidelity outreach to a target demographic of people interested in high-priority causes, to gain reputation and status, and to practice the core research skill of writing well.

If newsletters bloom, it might be worth hiring someone to help with production. Would be kinda cool to call this network The Republic of Newsletters :)

Great! I broadly endorse the above virtues and can't say much on the object level. On the meta level, I am curious about how you think about the impact of this paper. I have certain guesses:

• The paper's conclusion says: "We hope that it should inspire a debate among philosophers and psychologists about what virtues utilitarians should prioritize the most." Is that it?
• Or are you aiming at figuring out recommendations for EAs to follow (akin to CEA's Guiding principles and Lucius Caviola's talk Against naive effective altruism)?
• Or maybe you want to re-associate utilitarianism with nice/warm virtues because it appears cold to some (like Bleeding Heart Libertarians was reframing libertarianism)?

I think David Nash does something similar with his EA Updates (here is the most recent one). While most of the links are focused on EA Forum and posts by EA/EA-adj orgs, he features occasional links from other venues.

You might be interested in checking out Ingredients for creating disruptive research teams e.g. on vision, autonomy, spaces for interaction.

I would be happy to hear stories of people becoming significantly less longtermist. What changed their minds?

Mission drift at the Gates Foundation makes me somewhat more skeptical of patient longtermism. I mean, maybe a patient philanthropist's discounting/expropriation rate shouldn't be too low.

I guess (p=.75) Nuño would say that the following interpretation is mostly reasonable: "inside view" here means that Nuño presents his impressions which rely a lot on stories he tells himself about various research directions being valuable or not, which others might reasonably disagree with him about.

I am thinking that because Nuño uses a simple model to estimate a fraction of researchers doing "valuable" work, the subjectivity is rooted in his takes on how valuable their individual research directions are.

[Phrasing this kinda weirdly as I want to get a visceral update on my belief in "when thinking is clearly described, I can guess what the author means by inside/outside view." I also think that (p=.33) Nuño was just not very careful and will say something like "I have no idea what I really meant at the time of writing it."]

Thank you for a speedy reply, Markus! Jade makes three major points (see the attached slide). I would appreciate your high-level impressions on these (if you lack time, reporting one-liners like "mostly agree" or "too much nuance to settle on a one-liner" would still be valuable).

If you'd take time to elaborate on any of these, I would prefer the last one. Specifically on:

What are the reasons why them preemptively engaging is likely to lead to prosocial regulation? [emphasis mine] Two reasons why. One: the rationale for a firm would be something like, "We should be doing the thing that governance will want us to do, so that they don't then go in and put in regulation that is not good for us." And if you assume that governance has that incentive structure to deliver on public goods, then firms, at the very least, will converge on the idea that they should be mitigating their externalities and delivering on prosocial outcomes in the same way that the state regulation probably would. The more salient one in the case of AI is that public opinion actually plays a fairly large role in dictating what firms think are prosocial. [...]

(Hey Max, consider reposting this to goodreads if you are on the platform.)

Do people at Gov AI generally agree with the message/messaging of the talk 2–3 years later?

The answer would be a nice data point for "are we clueless to give advice on AI policy" debate/sentiments. And I am curious about how beneficial corporations/financiers can be for ~selfish reasons (cf. BlackRock on environmental sustainability and coronavirus cure/vaccine).

Thinking along these lines, joining the Effective Altruism movement can be seen as a way to “get in at the ground floor”: if the movement is eventually successful in changing the status quo, you will get brownie points for having been right all along, and the Effective Altruist area you’ve built a career in will get a large prestige boost when everyone agrees that it is indeed effectively altruistic.

Joining EA seems like a very suboptimal way to get brownie points from society at large and even from groups which EA represents the best (students/graduates of elite colleges). Isn't getting into social justice a better investment? What are the subgroups you think EAs try hard to impress?

Yes, I think that the wording of the forum questions is reasonable. The problem is that I expect that your nuance will get lost in the two layers of communication: commenters recommending intros into X or even specific books; readers adding titles to their Goodreads.

Alexey Guzey wrote a convincing critique of Are ideas getting harder to find?'s methodology. It's still a draft but you can PM him about it.

While it's hard to disagree that people should be familiar with the basics of economics, statistics, &c. I am not excited by the spirit of the question and the references to 101 materials.

First, I expect research scholars (and forum readers) to have quite a bit of shared knowledge about topics useful for EA. But more importantly, such advice doesn't make use of distributed coordination [1] and doesn't expand our aggregated knowledge. So I would be much more excited for general recommendations with "randomness," which decorrelates the individual decisions.

For example: read a bunch about a few historical periods of your interest. No hurry with that; maybe, don't solely read about the West; maybe, read something by anthropologists or econ historians. Doing so will expand the group's intuitions about social movements, technological development, causes of war/conflict, power &c.

[1] A similar problem arises with a naive interpretation of 80K career advice (before stronger emphasis on personal fit): people would concentrate on the explicitly outlined paths without accounting for others doing the same.

Thank you for your work and doing AMA! I have two somewhat related questions:

• Do you think that psychedelics have the potential to improve the lives/wellbeing of people not suffering from any mental illnesses? Very anecdotally and only in the context of non-assisted/recreational use, one person I know claims that taking LSD substantially improved their default mood and wellbeing. Where "substantially" means that the contrast between past and present is obvious, "x2" improvement in their own words. While the reports of most other users I know were much more "yeah, fine past time" or at most "I was able to disentangle one emotional issue I was facing at the time."

• Similarly, I am curious about the psychedelic-induced transformative experience and other outlier events. Do you encounter these in your research? How often are these relative to outlier events from conventional treatments?

Previously, I was wondering what rather high rates of mental health issues imply about philosophical positions EAs are attracted to. Now I know that according to The psychology of philosophy: Associating philosophical views with psychological traits in professional philosophers "lower levels of well-being and higher levels of mental illness predicted hard determinism." I am highly uncertain but it doesn't seem that significant to me.

h/t Gavin

(Author of the blog post is a good friend of mine, which makes me biased. I am very unlikely to engage in a discussion of the book's or post's pitfalls and merits here.)

I think your comment (and particularly the first point) has much more to do with the difficulty of defining causality than with x-risks.

It seems natural to talk about force causing the mass to accelerate: when I push a sofa, I cause it to start moving. But Newtonian mechanics can't capture causality, basically because the equality sign in $F = ma$ lacks direction. Similarly, it's hard to capture causality in probability spaces.

Following Pearl, I have come to think that causality arises from the manipulator/manipulated distinction.

So I think it's fair to speak about factors only with relation to some framing:

• If you are focusing on bio policy, you are likely to take great-power conflict as an external factor.
• Similarly, if you are focusing on preventing nuclear war between India and Pakistan, you are likely to take bioterrorism as an external factor.

Usually, there are multiple external factors in your x-risk modeling. The most salient and undesirable are important enough to care about them (and give them a name).

Calling bio-risks an x-factor makes sense formally; but doesn't make sense pragmatically because bio-risks are very salient (in our community) on their own because they are a canonical x-risk. So for me, part of the difference is that I started to care about x-risks first; and that I started to care about x-risk factors because of their relationship to x-risk.

People shared so many bad experiences with debate…

I had a great time debating (BP style) in Russia a few years ago. I clearly remember some moments which helped me to become better at thinking/speaking and world modeling:

• The initial feedback I got during the practice session is basically don't be a guy from the terrible video you shared :-). Make it easy for a judge to understand your arguments: improve the structure and speak slower. Focus on one core argument during your speech: don't squeeze multiple half-baked ideas in; deliver one but prove it fully.

• At my first tournament for newbies, an experienced debater gave a lecture on playing something-something resolutions and concluded with strongly recommending reading up on game theory (IIRC, The Strategy of Conflict and Governing the Commons).

• My second tournament was in Jedi format: I, an inexperienced Padawan, played with a skilled Jedi. I matched with a person because both of us liked LessWrong. I think we even managed to use "belief should pay rent" as part of an argument in a debate on the tyranny of the majority. I think it's plausible that we referred to Moloch at least once.

• Later on, improvement came from managing inferential distances during speeches; and grounding arguments in reality: being specific about harms and benefits, delivering appropriate ~examples to support intermediate claims.

I think the experience was worth it. It helped me to think more in-depth and about many more issues than I would have otherwise (kinda like forecasting now). I quit because (a) tournaments are time-consuming; (b) I got bored playing social issues & identity politics.

While competitive debating is not about collaborative truth-seeking, in my experience, debaters are high cognitive decouplers. Arguing with them (outside of the game) felt good, and we were able to touch topics far outside of the default Overton window (like taking the perspective of ISIS).

The culture was healthy because most people were just passionate about debating/grokking complex issues (like investor-state dispute settlements), and their incentives were not screwed up because the only upside to winning debate tournaments in Russia is internet points.

Upd: I feel that one of your main concerns is Goodharting. I think the BP system as we played it basically encouraged maximizing the expected utility of impacts of arguments you brought to the table, i.e. harm/benefit to individual × scale × probability of occurring × how well you proved it (which can be seen as the probability that your reasoning is correct). It's a bit harder to fit the importance of framing the issue and principled arguments into my simplification. But the first can be seen as prioritizing based on relative tractability (e.g. in almost all debates, arguing that "we will save money by not implementing a policy" is a bad move because there are multiple other ways to save money and the benefits of the policy might be unique). The second is about the importance of metagame, incentive structures, commitments, and so on.
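
The simplification above can be written down as a tiny scoring function. This is only a sketch of my simplification, not how judges actually score; the `Argument` class, its field names, and all the numbers are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Argument:
    name: str
    impact_per_person: float  # harm (< 0) or benefit (> 0) to one individual
    scale: float              # number of people affected
    p_occurs: float           # probability that the impact occurs
    p_proved: float           # how well it was proved, read as P(reasoning is correct)

    def weight(self) -> float:
        """Expected utility of the argument under the simplification above."""
        return self.impact_per_person * self.scale * self.p_occurs * self.p_proved


def rank(arguments):
    """Order arguments by absolute weight, strongest first."""
    return sorted(arguments, key=lambda a: abs(a.weight()), reverse=True)


# Hypothetical numbers, for illustration only.
case = [
    Argument("saves money", 1.0, 1_000_000, 0.9, 0.9),
    Argument("unique benefit of the policy", 50.0, 100_000, 0.6, 0.7),
]
for arg in rank(case):
    print(f"{arg.name}: {arg.weight():,.0f}")
```

With these made-up inputs the "unique benefit" argument outranks "saves money" even though it is less certain and less well proved, which is the tractability point in miniature.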

Thank you for engaging!

• First, "note that this [misha: Shapley value of evaluator] is just the counterfactual value divided by a fraction [misha: by two]." Right, this is exactly the same in my comment. I further divide by total impact to calculate the Shapley multiplier.
• Do you think we disagree?
• Why doesn't my conclusion follow?
• Second, you conclude "And the Shapley value multiplier would be 1/(some estimates of how many players there are)", while your estimate is "0.3 to 0.5". There have been like 30 participants over two lotteries that year, so you should have ended up with something an order of magnitude less, like "3% to 10%".
• Am I missing something?
• Third, for the model with more than two players, it's unclear to me who the players are. If these are funders + evaluators. You indeed will end up with because
• Shapley multipliers should add up to $1$, and
• Shapley value of the funders is easy to calculate (any coalition without them lacks any impact).
• Please note that is from the comment above.
• (Note that this model ignores that the beneficiary might win the lottery and no donations will be made.)

In the end,

• I think that it is necessary to estimate X in "shallowly evaluated giving is as impactful as X times of in-depth evaluated giving". Because if $X$ is close to $1$, the impact of the evaluator is close to nil.
• I might not understand how you model impact here, please, be more specific about the modeling setup and assumptions.
• I don't think that you should split evaluators. Well, basically because you want to disentangle the impact of evaluation and funding provision and not to calculate Adam's personal impact.
• Like, take it to the extreme: it would be pretty absurd to say that the overwhelmingly successful (e.g. seeding a new ACE Top Charity in yet unknown but highly tractable area of animal welfare and e.g. discovering AI alignment prodigy) donor lottery had an impact less than an average comment because there have been too many people (100K) contributing a dollar to participate in it.

Recently Nuño asked me to do similar (but shallower) forecasting for ~150 project ideas. It took me about 5 hours. I think I could have done the evaluation faster but I left ~paragraph-long comments on like to projects and sentence-long comments on most others; I haven't done any advanced modeling or guesstimating.

Comment by Misha_Yagudin on Relative Impact of the First 10 EA Forum Prize Winners · 2021-03-17T16:56:15.477Z · EA · GW

Thank you, Nuno!

• Do I understand correctly that the Shapley value multiplier (0.3 to 0.5) is responsible for preventing double counting?
• If so, why don't you apply it to Positive status effects? The effect was also partially enabled by the funding providers (maybe less so).
• Huh! I am surprised that your Shapley value calculation is not explicit but is reasonable.
• Let's limit ourselves to two players (= funding providers who are only capable of shallow evaluations and grantmakers who are capable of in-depth evaluation but don't have their own funds). You get a Shapley multiplier of $(1 - X)/2$ for the grantmakers, where $X$ is the impact of shallowly evaluated giving as a fraction of in-depth evaluated giving. Your estimate of "0.3 to 0.5" then implies that shallowly evaluated giving is as impactful as "0 to 0.4" of in-depth evaluated giving.
• This x2.5..∞ multiplier is reasonable but doesn't feel quite right to put 10% on above ∞ :)
• This makes me further confused about the gap between the donor lottery and the alignment review.
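
The two-player calculation in the bullets above can be checked with a short sketch. It assumes my reading of the setup: the funder alone achieves a fraction `x` of the in-depth evaluated impact, the evaluator alone achieves nothing, and together they achieve 1. The names `shapley_values`, `two_player_game`, and `x` are mine, not from the post.

```python
from itertools import permutations


def shapley_values(players, value):
    """Average marginal contribution of each player over all join orders."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for player in order:
            joined = coalition | {player}
            totals[player] += value(joined) - value(coalition)
            coalition = joined
    return {p: total / len(orders) for p, total in totals.items()}


def two_player_game(x):
    """Funder alone achieves x (shallow giving); with the evaluator, 1."""
    def value(coalition):
        if coalition == {"funder", "evaluator"}:
            return 1.0
        if coalition == {"funder"}:
            return x
        return 0.0  # the evaluator has no funds of their own
    return value


for x in (0.0, 0.4, 1.0):
    phi = shapley_values(["funder", "evaluator"], two_player_game(x))
    # Total impact is normalized to 1, so each value is also the multiplier.
    print(f"x={x}: evaluator multiplier {phi['evaluator']:.2f}")
```

Under these assumptions the evaluator's multiplier is (1 - x) / 2, so a multiplier of "0.3 to 0.5" corresponds to x between 0.4 and 0, matching the "0 to 0.4" range above.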

There are a lot of things I like about this post. From small (e.g. the summary on top of it; and the table at the end) to large (e.g. it's a good thing to do given a desire to understand how to quantify/estimate impact better).

Here are some things I am perplexed about or disagree with:

• EAF hiring round estimate misses the enormous realized value of information. As far as I can see, EAF decided to move to London (partly) because of that.
• > We moved to London (Primrose Hill) to better attract and retain staff and collaborate with other researchers in London and Oxford.
• > Budget 2020: $994,000 (7.4 expected full-time equivalent employees). Our per-staff expenses have increased compared with 2019 because we do not have access to free office space anymore, and the cost of living in London is significantly higher than in Berlin.
• The donor lottery evaluation seems to miss that $100K would have been donated otherwise.
• Further, I would suggest another decomposition.
• Impact = impact of running donor lottery as a tool (as opposed to donating without ~aggregation) + the counterfactual impact of particular grants (as opposed to ~expected grants) + misc. side-effects (like a grantmaker joining LTFF).
• I can understand why you added the first two terms. But it seems to me that
• we can get a principled estimate about the first one based on arguments for donor lotteries (e.g. epistemic advantage coming from spending more time per dollar donated; and freed time of donors);
• One can get more empirical and have a quick survey here.
• estimating the second term is trickier because you need to make a guess about the impact of an average epistemically advantaged donation (as opposed to an average donation of 100K, which I think is missing from your estimate)
• Both of these are doable because we saw how other donor lottery winners gave their money and how wealthy/invested donors give their money.
• A good proxy for the impact of an average donation might come from (a) EA survey donation data, (b) a quick survey of lottery participants. The latter seems superior because participating in an early donor lottery suggests a higher engagement with EA ideas &c.
• After thinking a bit longer the choice of decomposition depends on what you want to understand better. It seems like your choice is better if you want to empirically understand whether the donor lottery is valuable.

• I think it might come down to you not subtracting the counterfactual impact of donating 100K w/o lottery from donors' lottery impact estimate.

While I think that quantification is in general good, my forecasting experience taught me that quantitative estimates without a robust track record and/or reasoning are quite unsatisfactory. I am a bit worried that misunderstanding of the Aumann agreement theorem might lead to overpraising communication of pure probabilities (which are often unhelpful).

Comment by Misha_Yagudin on Chi's Shortform · 2021-01-22T14:57:37.020Z · EA · GW

I agree that the mechanisms proposed in my comment are quite costly sometimes. But I think higher-effort downstream activities only need to be invoked occasionally (e.g. not everyone who downvotes needs to explain why but it's good that someone will occasionally); if they are invoked consistently they will be picked up by people.

Right, I think I see how this can backfire now. Maybe upvoting "ugh, I still think that this is likely but am uncomfortable about betting" might still encourage using qualifiers for reasons 1–3 while acknowledging vulnerability and reducing pressure on commenters?

Comment by Misha_Yagudin on Chi's Shortform · 2021-01-21T11:11:11.568Z · EA · GW

I mostly wanted to highlight that there is a confident but uncertain mode of communication. And that displaying uncertainty or lack of knowledge sometimes helps me be more relaxed.

People surely pick up bits of style from others they respect; so aspiring EAs are likely to adopt the manners of respected members of our community. It seems plausible to me that this will lead to the negative consequences you mentioned in the fifth paragraph (e.g. there is too much deference to authority for the amounts of cluelessness and uncertainty we have).

I think a solution might be not in discouraging display of uncertainty but in encouraging positive downstream activities like betting, quantification, acknowledging that arguments changed your mind &c; likely this will make cargo culting less probable (a tangential example is encouraging people to make predictions when they say "my model is…").

I agree underconfidence and anxiety could be confused on the forum. But not in real life as people leak clues about their inner state all the time.

Comment by Misha_Yagudin on Chi's Shortform · 2021-01-20T23:28:00.798Z · EA · GW

Hey Chi, let me report my personal experience: uncertainty and putting qualifiers feel quite different to me than anxious social signaling. The conversation in the beginning of Confidence all the way up points to the difference. You can be uncertain or potentially wrong, and be chill about it. Acknowledging uncertainty helps with (fear of) saying "oops, was wrong" and hence makes one more at ease.

Same about predicting existential risk (because if it occurs, one won't be able to claim a prize). The US collapse will be devastating for the financial markets (plausible to me unless the USA will gradually lose power and importance, in which case interventions are less crucial). The incentives assumption seems plausible to me as well. So the market might not be a reliable predictor of it.

Comment by Misha_Yagudin on Open and Welcome Thread: January 2021 · 2021-01-08T01:18:11.729Z · EA · GW

It seems Considering Considerateness: Why communities of do-gooders should be exceptionally considerate is not as visible now because CEA removed "Our current thinking" (or something) from their webpage and the essay is not linked e.g. at https://www.effectivealtruism.org/resources/. So I want to highlight it as I liked it a lot a few years ago.

Comment by Misha_Yagudin on CHOICE - Creating a memorable acronym for EA principles · 2021-01-08T01:12:20.862Z · EA · GW

I weakly downvoted.

Resulting intuitions still suffer from "correlation ≠ causation" and all curses of self-reported data (which, in my opinion, makes such measurements close to useless) but are a step forward. See this tweet and the whole thread https://twitter.com/JessieSunPsych/status/1333086463232258049 h/t Guzey

Comment by Misha_Yagudin on What are some low-information priors that you find practically useful for thinking about the world? · 2020-11-28T13:30:13.888Z · EA · GW

Here is a Wikipedia reference:

> The Lindy effect is a theory that the future life expectancy of some non-perishable things like a technology or an idea is proportional to their current age, so that every additional period of survival implies a longer remaining life expectancy. Where the Lindy effect applies, mortality rate decreases with time.

Consider looking into Stoic literature and exercises (say, negative visualizations) to come to peace with that possibility.

• An internal part is being exposed to dangers associated with them. Internal locus of control means that you can take action to mitigate the risks. Consider having a plan to temporarily move to a likely peaceful area within your country or to another country.

Comment by Misha_Yagudin on AMA: Markus Anderljung (PM at GovAI, FHI) · 2020-09-21T22:15:42.287Z · EA · GW

Any insights into what constitutes good research management on the levels of (a) a facilitator helping a lab to succeed, and (b) an individual researcher managing himself (and occasional collaborators)?

Comment by Misha_Yagudin on New book: Moral Uncertainty by MacAskill, Ord & Bykvist · 2020-09-20T16:11:30.465Z · EA · GW

What is moral uncertainty?

I use Emacs for my personal forecasts because it is convenient: the questions are in the todo-list, I can resolve the question with a few keystrokes, TODO-states make questions look beautiful, a small python script gives me a calibration chart…
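
For anyone who wants something similar without Emacs, here is a rough sketch of the kind of calculation such a script might do. It is not my actual script: it assumes resolved questions are already available as (probability, outcome) pairs, and the function names, the ten-bin default, and the sample data are all made up for illustration.

```python
from collections import defaultdict


def calibration_table(forecasts, n_bins=10):
    """Group (probability, outcome) pairs into bins and compare
    the mean forecast with the observed frequency in each bin."""
    bins = defaultdict(list)
    for prob, outcome in forecasts:
        index = min(int(prob * n_bins), n_bins - 1)  # put p = 1.0 in the top bin
        bins[index].append((prob, outcome))
    rows = []
    for index in sorted(bins):
        pairs = bins[index]
        mean_prob = sum(p for p, _ in pairs) / len(pairs)
        hit_rate = sum(1 for _, o in pairs if o) / len(pairs)
        rows.append((mean_prob, hit_rate, len(pairs)))
    return rows


def brier_score(forecasts):
    """Mean squared error between forecasts and 0/1 outcomes."""
    return sum((p - float(o)) ** 2 for p, o in forecasts) / len(forecasts)


# Hypothetical resolved questions: (forecast probability, did it happen?)
resolved = [(0.9, True), (0.9, True), (0.8, False),
            (0.3, False), (0.2, True), (0.1, False)]
for mean_prob, hit_rate, count in calibration_table(resolved):
    print(f"forecast {mean_prob:.2f}  observed {hit_rate:.2f}  n={count}")
print(f"Brier score: {brier_score(resolved):.3f}")
```

Plotting the first two columns against each other gives the calibration chart; a well-calibrated forecaster's points sit near the diagonal.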

To be honest, all major forecasting platforms have quite bad UX for small personal things: it always takes too many clicks to make a forecasting question and so on. I wish they'd popularize personal predictions by having a sort of "very quick capture" like many todo-list apps have [e.g. Amazing Marvin].

I forecast far fewer questions on GJ Open and found Tab Snooze to be an easy way to remind me that I wanted to make updates/take a look at new data.