Posts

Misha_Yagudin's Shortform 2020-01-08T14:23:19.002Z

Comments

Comment by Misha_Yagudin on Jade Leung: Why companies should be leading on AI governance · 2021-06-08T15:57:27.367Z · EA · GW

Thank you for a speedy reply, Markus! Jade makes three major points (see the attached slide). I would appreciate your high-level impressions of these (if you lack time, reporting one-liners like "mostly agree" or "too much nuance to settle on a one-liner" would still be valuable).

If you'd take the time to elaborate on any of these, I would prefer the last one. Specifically on:

> What are the reasons why them preemptively engaging is likely to lead to prosocial regulation? [emphasis mine] Two reasons why. One: the rationale for a firm would be something like, "We should be doing the thing that governance will want us to do, so that they don't then go in and put in regulation that is not good for us." And if you assume that governance has that incentive structure to deliver on public goods, then firms, at the very least, will converge on the idea that they should be mitigating their externalities and delivering on prosocial outcomes in the same way that the state regulation probably would. The more salient one in the case of AI is that public opinion actually plays a fairly large role in dictating what firms think are prosocial. [...]

Comment by Misha_Yagudin on EA Infrastructure Fund: May 2021 grant recommendations · 2021-06-08T12:28:59.245Z · EA · GW

(Hey Max, consider reposting this to Goodreads if you are on the platform.)

Comment by Misha_Yagudin on Jade Leung: Why companies should be leading on AI governance · 2021-06-08T11:26:46.844Z · EA · GW

Do people at Gov AI generally agree with the message/messaging of the talk 2–3 years later?

The answer would be a nice data point for the "are we clueless to give advice on AI policy" debate/sentiments. And I am curious about how beneficial corporations/financiers can be for ~selfish reasons (cf. BlackRock on environmental sustainability and the coronavirus cure/vaccine).

Comment by Misha_Yagudin on Ben_Snodin's Shortform · 2021-05-19T13:04:54.704Z · EA · GW

> Thinking along these lines, joining the Effective Altruism movement can be seen as a way to “get in at the ground floor”: if the movement is eventually successful in changing the status quo, you will get brownie points for having been right all along, and the Effective Altruist area you’ve built a career in will get a large prestige boost when everyone agrees that it is indeed effectively altruistic.

Joining EA seems like a very suboptimal way to get brownie points from society at large, and even from the groups which EA represents best (students/graduates of elite colleges). Isn't getting into social justice a better investment? Which subgroups do you think EAs try hard to impress?

Comment by Misha_Yagudin on What are things everyone here should (maybe) read? · 2021-05-19T12:18:27.050Z · EA · GW

Yes, I think that the wording of the forum questions is reasonable. The problem is that I expect your nuance to get lost in the two layers of communication: commenters recommending intros to X or even specific books; readers adding titles to their Goodreads.

I think this is kinda fine for wellbeing/adulting bits of your advice, which I liked.

Comment by Misha_Yagudin on What are things everyone here should (maybe) read? · 2021-05-19T11:39:33.197Z · EA · GW

Alexey Guzey wrote a convincing critique of the methodology of Are ideas getting harder to find?. It's still a draft, but you can PM him about it.

Comment by Misha_Yagudin on What are things everyone here should (maybe) read? · 2021-05-19T11:16:11.328Z · EA · GW

While it's hard to disagree that people should be familiar with the basics of economics, statistics, &c., I am not excited by the spirit of the question and the references to 101 materials.

First, I expect research scholars (and forum readers) to already have quite a bit of shared knowledge about topics useful for EA. But more importantly, such advice doesn't make use of distributed coordination [1] and doesn't expand our aggregated knowledge. So I would be much more excited about general recommendations with "randomness," which decorrelate individual decisions.

For example: read a bunch about a few historical periods of your interest. No hurry with that; maybe don't solely read about the West; maybe read something by anthropologists or economic historians. Doing so will expand the group's intuitions about social movements, technological development, causes of war/conflict, power, &c.

[1] A similar problem arises with a naive interpretation of 80K career advice (before the stronger emphasis on personal fit): people would concentrate on the explicitly outlined paths without accounting for others doing the same.

Comment by Misha_Yagudin on AMA: Tim Ferriss, Michael Pollan, and Dr. Matthew W. Johnson on psychedelics research and philanthropy · 2021-05-17T11:50:06.797Z · EA · GW

Thank you for your work and for doing this AMA! I have two somewhat related questions:

  • Do you think that psychedelics have the potential to improve the lives/wellbeing of people not suffering from any mental illnesses? Very anecdotally, and only in the context of non-assisted/recreational use, one person I know claims that taking LSD substantially improved their default mood and wellbeing. "Substantially" meaning the contrast between past and present is obvious: an "x2" improvement in their own words. Meanwhile, the reports of most other users I know were much more "yeah, a fine pastime" or at most "I was able to disentangle one emotional issue I was facing at the time."

  • Similarly, I am curious about psychedelic-induced transformative experiences and other outlier events. Do you encounter these in your research? How frequent are they relative to outlier events from conventional treatments?

Comment by Misha_Yagudin on Misha_Yagudin's Shortform · 2021-04-30T17:03:58.206Z · EA · GW

Previously, I wondered what the rather high rates of mental health issues imply about the philosophical positions EAs are attracted to. Now I know that, according to The psychology of philosophy: Associating philosophical views with psychological traits in professional philosophers, "lower levels of well-being and higher levels of mental illness predicted hard determinism." I am highly uncertain, but the effect doesn't seem that significant to me.

h/t Gavin

Comment by Misha_Yagudin on On Sleep Procrastination: Going To Bed At A Reasonable Hour · 2021-04-19T09:51:33.826Z · EA · GW

> Reading Why We Sleep.

Mandatory link: https://guzey.com/books/why-we-sleep/

(The author of the blog post is a good friend of mine, which makes me biased. I am very unlikely to engage in a discussion of the book's or the post's pitfalls and merits here.)

Comment by Misha_Yagudin on evelynciara's Shortform · 2021-04-15T16:08:48.846Z · EA · GW

I think your comment (and particularly the first point) has much more to do with the difficulty of defining causality than with x-risks.

It seems natural to talk about a force causing a mass to accelerate: when I push a sofa, I cause it to start moving. But Newtonian mechanics can't capture causality, basically because the equality sign in $F = ma$ lacks direction. Similarly, it's hard to capture causality in probability spaces.

Following Pearl, I have come to think that causality arises from the manipulator/manipulated distinction.

So I think it's fair to speak about factors only in relation to some framing:

  • If you are focusing on bio policy, you are likely to take great-power conflict as an external factor.
  • Similarly, if you are focusing on preventing nuclear war between India and Pakistan, you are likely to take bioterrorism as an external factor.

Usually, there are multiple external factors in your x-risk modeling. The most salient and undesirable ones are important enough to care about (and to give a name).

Calling bio-risks an x-factor makes sense formally, but not pragmatically: bio-risks are already very salient (in our community) on their own, as a canonical x-risk. So for me, part of the difference is that I started to care about x-risks first, and came to care about x-risk factors because of their relationship to x-risks.

Comment by Misha_Yagudin on EA Debate Championship & Lecture Series · 2021-04-09T13:29:54.907Z · EA · GW

People shared so many bad experiences with debate…

I had a great time debating (BP style) in Russia a few years ago. I clearly remember some moments which helped me to become better at thinking/speaking and world modeling:

  • The initial feedback I got during the practice session was basically "don't be the guy from the terrible video you shared" :-). Make it easy for a judge to understand your arguments: improve the structure and speak slower. Focus on one core argument per speech: don't squeeze in multiple half-baked ideas; deliver one but prove it fully.

  • At my first tournament for newbies, an experienced debater gave a lecture on playing something-something resolutions and concluded by strongly recommending reading up on game theory (IIRC, The Strategy of Conflict and Governing the Commons).

  • My second tournament was in Jedi format: I, an inexperienced Padawan, played with a skilled Jedi. We matched because both of us liked LessWrong. I think we even managed to use "beliefs should pay rent" as part of an argument in a debate on the tyranny of the majority. It's plausible that we referred to Moloch at least once.

  • Later on, improvement came from managing inferential distances during speeches and from grounding arguments in reality: being specific about harms and benefits, delivering appropriate ~examples to support intermediate claims.

I think the experience was worth it. It helped me think more in-depth, and about many more issues than I would have otherwise (kinda like forecasting now). I quit because (a) tournaments are time-consuming; (b) I got bored playing social issues & identity politics.

While competitive debating is not about collaborative truth-seeking, in my experience debaters are high cognitive decouplers. Arguing with them (outside of the game) felt good, and we were able to touch on topics far outside the default Overton window (like taking the perspective of ISIS).

The culture was healthy because most people were just passionate about debating/grokking complex issues (like investor-state dispute settlements), and their incentives were not screwed up: the only upside to winning debate tournaments in Russia is internet points.

Upd: I feel that one of your main concerns is Goodharting. I think the BP system, as we played it, basically encouraged maximizing the expected utility of the impacts of the arguments you brought to the table, i.e. harm/benefit to an individual × scale × probability of occurring × how well you proved it (which can be seen as the probability that your reasoning is correct). It's a bit harder to fit the importance of framing the issue and of principled arguments into my simplification. But the first can be seen as prioritizing based on relative tractability (e.g. in almost all debates, arguing that "we will save money by not implementing a policy" is a bad move because there are multiple other ways to save money, while the benefits of the policy might be unique). The second is about the importance of the metagame, incentive structures, commitments, and so on.
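
A toy numeric illustration of this simplification (the scoring function is the one sketched above; all numbers are hypothetical):

```python
# Score an argument as: benefit × scale × P(it occurs) × P(your reasoning is correct).
def argument_value(benefit: float, scale: float, p_occurs: float, p_proven: float) -> float:
    return benefit * scale * p_occurs * p_proven

# A big but speculative harm can lose to a modest, well-proven benefit:
print(argument_value(benefit=10, scale=0.9, p_occurs=0.1, p_proven=0.5))  # 0.45
print(argument_value(benefit=2, scale=0.5, p_occurs=0.9, p_proven=0.9))   # 0.81
```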

Comment by Misha_Yagudin on Relative Impact of the First 10 EA Forum Prize Winners · 2021-03-18T14:39:06.841Z · EA · GW

Thank you for engaging!

  • First, "note that this [misha: Shapley value of evaluator] is just the counterfactual value divided by a fraction [misha: by two]." Right, this is exactly the same in my comment. I further divide by total impact to calculate the Shapley multiplier.
    • Do you think we disagree?
    • Why doesn't my conclusion follow?
  • Second, you conclude "And the Shapley value multiplier would be 1/(some estimates of how many players there are)", while your estimate is "0.3 to 0.5". There have been around 30 participants over the two lotteries that year, so you should have ended up with something an order of magnitude smaller, like "3% to 10%".
    • Am I missing something?
  • Third, for the model with more than two players, it's unclear to me who the players are. If these are funders + evaluators, you will indeed end up with ½ because
    • Shapley multipliers should add up to 1, and
    • the Shapley value of the funders is easy to calculate (any coalition without them lacks any impact).
    • Please note that the ½ is from the comment above.
  • (Note that this model ignores that the beneficiary might win the lottery, in which case no donations will be made.)

In the end,

  • I think that it is necessary to estimate X in "shallowly evaluated giving is as impactful as X times in-depth evaluated giving", because if X is close to 1, the impact of the evaluator is close to nil.
    • I might not understand how you model impact here; please be more specific about the modeling setup and assumptions.
  • I don't think that you should split evaluators, basically because you want to disentangle the impact of evaluation from the provision of funding, not to calculate Adam's personal impact.
  • Like, take it to the extreme: it would be pretty absurd to say that an overwhelmingly successful donor lottery (e.g. one seeding a new ACE Top Charity in a yet-unknown but highly tractable area of animal welfare, and discovering an AI alignment prodigy) had less impact than an average comment just because too many people (100K) each contributed a dollar to participate in it.

Comment by Misha_Yagudin on Relative Impact of the First 10 EA Forum Prize Winners · 2021-03-18T12:19:31.242Z · EA · GW

Recently Nuño asked me to do similar (but shallower) forecasting for ~150 project ideas. It took me about 5 hours. I could have done the evaluation faster, but I left ~paragraph-long comments on something like two projects and sentence-long comments on most others; I haven't done any advanced modeling or guesstimating.

Comment by Misha_Yagudin on Relative Impact of the First 10 EA Forum Prize Winners · 2021-03-17T16:56:15.477Z · EA · GW

Thank you, Nuño!

  • Do I understand correctly that the Shapley value multiplier (0.3 to 0.5) is responsible for preventing double-counting?
    • If so, why don't you apply it to Positive status effects? That effect was also partially enabled by the funding providers (maybe less so).
    • Huh! I am surprised that your Shapley value calculation is not explicit, but it is reasonable.
      • Let's limit ourselves to two players (= funding providers who are only capable of shallow evaluations, and grantmakers who are capable of in-depth evaluation but don't have their own funds). You get a Shapley multiplier of (1 − X)/2, where X is the impact of shallowly evaluated giving relative to in-depth evaluated giving. Your estimate of "0.3 to 0.5" then implies that shallowly evaluated giving is as impactful as "0 to 0.4" of in-depth evaluated giving (see the code sketch after this list).
      • This ×2.5..∞ multiplier is reasonable, but it doesn't feel quite right to put 10% on "above ∞" :)
  • This makes me further confused about the gap between the donor lottery and the alignment review.
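
To make the two-player model concrete, here is a minimal code sketch of the Shapley computation (my reconstruction; the characteristic-function values are assumptions, not from the post):

```python
# Players: funders (only shallow evaluation) and a grantmaker (in-depth
# evaluation, no funds of their own). Normalize in-depth-evaluated giving
# to impact 1 and let x be the relative impact of shallowly evaluated giving.
# Assumed characteristic function:
#   v({}) = 0, v({grantmaker}) = 0, v({funders}) = x, v({funders, grantmaker}) = 1.

def grantmaker_shapley_multiplier(x: float) -> float:
    """Shapley value of the grantmaker as a share of total impact."""
    # Average of the grantmaker's marginal contributions over both join orders:
    # ((v(both) - v(funders)) + (v(grantmaker) - v({}))) / 2
    return ((1 - x) + (0 - 0)) / 2

for x in (0.0, 0.2, 0.4):
    print(f"x = {x:.1f} -> multiplier = {grantmaker_shapley_multiplier(x):.2f}")
# x in [0, 0.4] reproduces the "0.3 to 0.5" multiplier range discussed above.
```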

Comment by Misha_Yagudin on Relative Impact of the First 10 EA Forum Prize Winners · 2021-03-17T16:56:15.477Z · EA · GW

There are a lot of things I like about this post, from small (e.g. the summary at the top and the table at the end) to large (e.g. it's a good thing to do given a desire to understand how to quantify/estimate impact better).

Here are some things I am perplexed about or disagree with:
 

  • The EAF hiring round estimate misses the enormous realized value of information. As far as I can see, EAF decided to move to London (partly) because of it.
    • > We moved to London (Primrose Hill) to better attract and retain staff and collaborate with other researchers in London and Oxford.
    • > Budget 2020: $994,000 (7.4 expected full-time equivalent employees). Our per-staff expenses have increased compared with 2019 because we do not have access to free office space anymore, and the cost of living in London is significantly higher than in Berlin.

 

  • The donor lottery evaluation seems to miss that $100K would have been donated otherwise.
  • Further, I would suggest another decomposition.
    • Impact = the impact of running a donor lottery as a tool (as opposed to donating without ~aggregation) + the counterfactual impact of the particular grants (as opposed to ~expected grants) + misc. side effects (like a grantmaker joining LTFF).
    • I can understand why you added the first two terms. But it seems to me that
      • we can get a principled estimate of the first term based on the arguments for donor lotteries (e.g. the epistemic advantage coming from spending more time per dollar donated, and the freed-up time of donors);
        • One can get more empirical and have a quick survey here.
      • estimating the second term is trickier because you need to make a guess about the impact of an average epistemically advantaged donation (as opposed to an average donation of $100K, which I think is missing from your estimate)
        • Both of these are doable because we saw how other donor lottery winners gave their money and how wealthy/invested donors give their money.
        • A good proxy for the impact of an average donation might come from (a) EA survey donation data, or (b) a quick survey of lottery participants. The latter seems superior because participating in an early donor lottery suggests higher engagement with EA ideas &c.
    • After thinking a bit longer: the choice of decomposition depends on what you want to understand better. It seems like your choice is better if you want to empirically understand whether the donor lottery is valuable.

 

  • Another weird thing is to see the 2017 Donor Lottery Grant having x5..10 higher impact than the 2018 AI Alignment Literature Review and Charity Comparison.
    • I think it might come down to you not subtracting the counterfactual impact of donating $100K without a lottery from the donor lottery impact estimate.
    • The basic source of impact for both the donor lottery and the charity review is an epistemic advantage (someone dedicating more time to thinking about/evaluating donations; people being better informed about the charities they are likely to donate to). Given how well received the literature review is, it seems (quite likely) helpful to individual donors, and given that it (according to your guess) impacted $100K..1M, it should be about as impactful as an abstract donor lottery, or more so.
      • And it's hard to see this particular donor lottery as overwhelmingly more impactful than an average one.

Comment by Misha_Yagudin on Chi's Shortform · 2021-01-22T15:10:09.363Z · EA · GW

Chi, I appreciate the depth of your engagement! I mostly agree with your comments.

Comment by Misha_Yagudin on Chi's Shortform · 2021-01-22T15:07:31.471Z · EA · GW

I like your 1–5 list.

 

Tangentially, I just want to push back a bit on 1 and 2 being obviously good. While I think that quantification is in general good, my forecasting experience taught me that quantitative estimates without a robust track record and/or explicit reasoning are quite unsatisfactory. I am a bit worried that misunderstanding of Aumann's agreement theorem might lead to overpraising the communication of bare probabilities (which are often unhelpful).

Comment by Misha_Yagudin on Chi's Shortform · 2021-01-22T14:57:37.020Z · EA · GW

I agree that the mechanisms proposed in my comment are sometimes quite costly. But I think higher-effort downstream activities only need to be invoked occasionally (e.g. not everyone who downvotes needs to explain why, but it's good that someone occasionally will); if they are invoked consistently, they will be picked up by people.

Right, I think I see how this can backfire now. Maybe upvoting "ugh, I still think that this is likely but am uncomfortable about betting" might still encourage using qualifiers for reasons 1–3, while acknowledging vulnerability and reducing the pressure on commenters?

Comment by Misha_Yagudin on Chi's Shortform · 2021-01-21T11:11:11.568Z · EA · GW

I mostly wanted to highlight that there is a confident but uncertain mode of communication. And that displaying uncertainty or lack of knowledge sometimes helps me be more relaxed. 

People surely pick up bits of style from others they respect, so aspiring EAs are likely to adopt the manners of respected members of our community. It seems plausible to me that this will lead to the negative consequences you mentioned in the fifth paragraph (e.g. there is too much deference to authority for the amount of cluelessness and uncertainty we have). I think a solution might lie not in discouraging displays of uncertainty but in encouraging positive downstream activities like betting, quantification, acknowledging that arguments changed your mind, &c. Likely this will make cargo-culting less probable (a tangential example is encouraging people to make predictions when they say "my model is…").

I agree that underconfidence and anxiety could be confused on the forum, but not in real life, as people leak clues about their inner state all the time.

Comment by Misha_Yagudin on Chi's Shortform · 2021-01-20T23:28:00.798Z · EA · GW

Hey Chi, let me report my personal experience: uncertainty and putting in qualifiers feel quite different to me from anxious social signaling. The conversation at the beginning of Confidence all the way up points to the difference. You can be uncertain or potentially wrong and be chill about it. Acknowledging uncertainty helps with the fear of saying "oops, I was wrong" and hence makes one more at ease.

Comment by Misha_Yagudin on 2020: Forecasting in Review · 2021-01-10T17:16:39.827Z · EA · GW

> I'm back to being the #1 forecaster there, after having momentarily lost the position to user @Hinterhunter.

This happened in 2021 :P

Comment by Misha_Yagudin on EA and the Possible Decline of the US: Very Rough Thoughts · 2021-01-08T15:11:45.922Z · EA · GW

I am aware of two (short-term) questions related to civil war scenarios on Metaculus:

Comment by Misha_Yagudin on EA and the Possible Decline of the US: Very Rough Thoughts · 2021-01-08T15:08:27.920Z · EA · GW

I think the evidence from the financial markets is a bit weaker.

First, let's imagine predicting that a forecasting platform will stop operating, and assume that forecasting is only incentivized by points on this platform. The reasonable prediction is that the platform will continue to operate, because otherwise the points would become meaningless. The same goes for predicting existential risk (because if it occurs, one won't be able to claim a prize).

A US collapse would be devastating for the financial markets (plausible to me, unless the USA gradually loses power and importance, in which case interventions are less crucial). The incentives assumption seems plausible to me as well. So the market might not be a reliable predictor of it.

Comment by Misha_Yagudin on Open and Welcome Thread: January 2021 · 2021-01-08T01:18:11.729Z · EA · GW

It seems Considering Considerateness: Why communities of do-gooders should be exceptionally considerate is not as visible now because CEA removed "Our current thinking" (or something like that) from their webpage, and the essay is not linked e.g. at https://www.effectivealtruism.org/resources/. So I want to highlight it, as I liked it a lot a few years ago.

Comment by Misha_Yagudin on CHOICE - Creating a memorable acronym for EA principles · 2021-01-08T01:12:20.862Z · EA · GW

I weakly downvoted. I felt meh about coming up with better acronyms because

  • it feels low-fidelity, and I would rather have people forget/rephrase EA principles than learn them by heart;
  • guiding principles should not be changed frequently or without great need.

Also, I disliked the proposed acronym because

  • pro-life/pro-choice associations;
  • while "choice" is a generic word, it is associated with the choice/obligation debate within the community, which makes it not a very good choice.

Comment by Misha_Yagudin on Prize: Interesting Examples of Evaluations · 2020-12-12T17:45:02.942Z · EA · GW

Huh! The thread I linked to and David Manheim's winning comment cite the same paper :)

Comment by Misha_Yagudin on Prize: Interesting Examples of Evaluations · 2020-12-12T17:41:10.806Z · EA · GW

Correlating subjective metrics with objective outcomes can provide better intuitions about what an additional point on a scale might mean. The resulting intuitions still suffer from "correlation ≠ causation" and all the curses of self-reported data (which, in my opinion, make such measurements close to useless), but it is a step forward.

See this tweet and the whole thread: https://twitter.com/JessieSunPsych/status/1333086463232258049 (h/t Guzey)

Comment by Misha_Yagudin on What are some low-information priors that you find practically useful for thinking about the world? · 2020-11-28T13:30:13.888Z · EA · GW

Here is a Wikipedia reference:

> The Lindy effect is a theory that the future life expectancy of some non-perishable things like a technology or an idea is proportional to their current age, so that every additional period of survival implies a longer remaining life expectancy. Where the Lindy effect applies, mortality rate decreases with time.

Comment by Misha_Yagudin on Please Take the 2020 EA Survey · 2020-11-14T02:00:34.000Z · EA · GW

Well, I am far from an expert, but my understanding is that differential privacy operates on queries, as opposed to individual data points. There are, however, tools such as randomized response which provide plausible deniability for individual responses.
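
For intuition, here is a minimal sketch of randomized response (the classic technique, not a claim about how the EA Survey is implemented; all numbers are hypothetical): each respondent privately flips a coin, answers honestly on heads, and answers uniformly at random on tails, so any individual "yes" is deniable while the population rate stays recoverable.

```python
import random

random.seed(1)
true_rate = 0.30  # hypothetical share of genuine "yes" respondents

def respond(truth: bool) -> bool:
    """One randomized response: honest on heads, random on tails."""
    if random.random() < 0.5:        # heads: answer truthfully
        return truth
    return random.random() < 0.5     # tails: answer yes/no at random

n = 100_000
answers = [respond(random.random() < true_rate) for _ in range(n)]
observed = sum(answers) / n

# P(yes) = 0.5 * true_rate + 0.25, so invert to recover the population rate:
recovered = 2 * observed - 0.5
print(f"observed yes-rate: {observed:.3f}, recovered true rate: {recovered:.3f}")
```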

Comment by Misha_Yagudin on Thoughts on whether we're living at the most influential time in history · 2020-11-03T07:23:56.064Z · EA · GW

re: "This post has a lot of very small numbers in it. I might have missed a zero or two somewhere."

Hey Buck, consider using scientific notation instead of decimal: "0.00000009%" is hard to read, and 9e-10 is less prone to typos.

Comment by Misha_Yagudin on Aligning Recommender Systems as Cause Area · 2020-10-07T19:52:39.257Z · EA · GW

Partnership on AI now has a paper on the topic: What are you optimizing for? Aligning Recommender Systems with Human Values.

Comment by Misha_Yagudin on Introducing LEEP: Lead Exposure Elimination Project · 2020-10-06T17:40:48.711Z · EA · GW

I am curious: which other countries did you identify as promising?

Listing them might be beneficial: I can imagine that finding an experienced and well-connected candidate for a target location could change the outcome of the cost-effectiveness calculation by increasing tractability. On the other hand, good candidates might not be hard to find, or might be especially likely to be discovered via the EA network.

Comment by Misha_Yagudin on Singapore’s Technical AI Alignment Research Career Guide · 2020-10-03T12:47:04.352Z · EA · GW

This forecast suggests that extreme reputational risks are non-negligible.

Comment by Misha_Yagudin on Singapore’s Technical AI Alignment Research Career Guide · 2020-10-03T12:45:00.940Z · EA · GW

Working for SenseTime might be associated with reputational risks, according to FT:

> The US blacklisted Megvii and SenseTime in October, along with voice recognition company iFlytek and AI unicorn Yitu, accusing the companies of aiding the “repression, mass arbitrary detention and high-technology surveillance” in the western Chinese region of Xinjiang.

At the same time, someone working for them might provide our community with cultural knowledge relevant to surveillance and robust totalitarianism.

Comment by Misha_Yagudin on Linch's Shortform · 2020-09-29T10:50:35.580Z · EA · GW

I think it is useful to separately deal with the parts of a disturbing event over which you have an internal or external locus of control. Let's take a look at riots:

  • An external part is them happening in your country. An external locus of control means that you need to accept the situation. Consider looking into Stoic literature and exercises (say, negative visualization) to come to peace with that possibility.
  • An internal part is being exposed to the dangers associated with them. An internal locus of control means that you can take action to mitigate the risks. Consider having a plan to temporarily move to a likely peaceful area within your country or to another country.

Comment by Misha_Yagudin on AMA: Markus Anderljung (PM at GovAI, FHI) · 2020-09-21T22:15:42.287Z · EA · GW

Any insights into what constitutes good research management at the level of (a) a facilitator helping a lab to succeed, and (b) an individual researcher managing themselves (and occasional collaborators)?

Comment by Misha_Yagudin on The case for building more and better epistemic institutions in the effective altruism community · 2020-09-20T21:06:00.665Z · EA · GW

Roam Research is

> starting a fellowship program where we are giving grants to researchers to explore the space of Tools for Thought, Collective Intelligence, Augmenting The Human Intellect.

They recently raised $9M at a $200M seed valuation and previously received two grants from the EA LTFF.

Comment by Misha_Yagudin on New book: Moral Uncertainty by MacAskill, Ord & Bykvist · 2020-09-20T16:11:30.465Z · EA · GW

Now a thread from Toby Ord:

What is moral uncertainty? 

https://twitter.com/tobyordoxford/status/1306965360009187328

Comment by Misha_Yagudin on Pablo Stafforini’s Forecasting System · 2020-09-17T12:46:07.561Z · EA · GW

I use Emacs for my personal forecasts because it is convenient: the questions live in my todo list, I can resolve a question with a few keystrokes, TODO states make questions look beautiful, a small Python script gives me a calibration chart…

To be honest, all major forecasting platforms have quite bad UX for small personal things; it always takes too many clicks to create a forecasting question, and so on. I wish they'd popularize personal predictions by having a sort of "very quick capture" like many todo-list apps have [e.g. Amazing Marvin].

I forecast far fewer questions on GJ Open and found Tab Snooze to be an easy way to remind myself that I wanted to make updates or take a look at new data.

Comment by Misha_Yagudin on Judgement as a key need in EA · 2020-09-12T16:57:39.437Z · EA · GW

I like the list of resources you put together; another laconic source of wisdom is What can someone do now to become a stronger fit for future Open Philanthropy generalist RA openings?

Comment by Misha_Yagudin on Judgement as a key need in EA · 2020-09-12T16:54:31.419Z · EA · GW

Hey Ben, what makes you think that judgment can be generally improved?

 

When Owen posted "Good judgement" and its components, I briefly reviewed the literature on the transfer of cognitive skills:

This makes me think that general training (e.g. calibration and, to a lesser extent, forecasting) might not translate into an overall improvement in judgment. OTOH, surely, acquiring skills broadly useful for decision-making (e.g. spreadsheets, probabilistic reasoning, clear writing) should be good.

 

A bit of a tangent: Hanson's Reality TV MBAs is an interesting idea. Gaining experience by being a personal assistant to someone else seems to be beneficial [2], so maybe this could be scaled up by having a reality TV show. Maybe it is a good idea to invite people with good judgment/research taste to stream some of their working sessions, and so on?

[1]: According to Wikipedia: Near transfer occurs when many elements overlap between the conditions in which the learner obtained the knowledge or skill and the new situation. Far transfer occurs when the new situation is very different from that in which learning occurred.

[2]: Moreover, it is one of 80K's paths that may turn out to be very promising.

Comment by Misha_Yagudin on Challenges in evaluating forecaster performance · 2020-09-12T15:40:17.296Z · EA · GW

Thanks for challenging me :) I wrote my takes after this discussion above.

Comment by Misha_Yagudin on Challenges in evaluating forecaster performance · 2020-09-12T15:37:16.488Z · EA · GW

This example is somewhat flawed (because forecasting only once breaks the assumption I am making) but might challenge your intuitions a bit :)

Comment by Misha_Yagudin on Challenges in evaluating forecaster performance · 2020-09-12T15:35:46.236Z · EA · GW

Thanks, everyone, for engaging with me. I will summarize my thoughts and would likely not actively comment here anymore:

  • I think the argument holds given the assumptions I made: (a) the probabilities of forecasting on each day are proportional across forecasters (previously we assumed uniformity), and (b) forecasts have an equal expected number of active days.
    • > I think the intuition to use here is that the sample mean is an unbiased estimator of the expectation (this doesn't depend on the frequency/number of samples). One complication here is that we are weighing samples potentially unequally, but if we expect each forecast to be active for an equal number of days, this doesn't matter.
  • The second assumption seems to be approximately correct assuming uniformity, but it stops working near the edge [around the resolution date], which impacts the average score on the order of (expected number of active days)/(total number of days).
    • This effect could be noticeable; this is an update.
  • Overall, given the setup, I think that forecasting weekly vs. daily shouldn't differ much for forecasts with a resolution date in 1y.
  • I intended to use this toy model to emphasize that the important difference between active and semi-active forecasters is the distribution of days they forecast on.
  • This difference, in my opinion, is mostly driven by 'information gain' (e.g. breaking news, a poll being published, etc.).
    • This makes me skeptical about features such as automatic decay and so on.
    • This makes me curious about ways to integrate information sources automatically.
    • And less so about notifications that community/followers' forecasts have significantly changed. [It is already possible to sort by the magnitude of the crowd update since your last forecast on GJO.]

On a meta-level, I am

  • Glad I had the discussion and wrote this comment :)
  • Confused about people's intuitions about the linearity of EV.
    • I would encourage people to think more carefully through my argument.
  • This makes me doubt I am correct, but still, I am quite certain. I undervalued the corner cases in my initial reasoning. I think I might be undervaluing other phenomena where models don't capture reality well and hence trigger people's intuitions:
    • E.g. the randomness of the resolution day might magnify the effect of the second assumption not holding, but it seems like it shouldn't, given that in expectation one resolves the question exactly once.
  • Confused about not being able to communicate my intuitions effectively.
    • I would appreciate any feedback [not necessarily on communication]; I have a way to submit it anonymously: https://admonymous.co/misha

Comment by Misha_Yagudin on Challenges in evaluating forecaster performance · 2020-09-12T11:39:12.138Z · EA · GW

I mildly disagree. I think the intuition to use here is that the sample mean is an unbiased estimator of the expectation (this doesn't depend on the frequency/number of samples). One complication here is that we are weighing samples potentially unequally, but if we expect each forecast to be active for an equal number of days, this doesn't matter.

 

ETA: I think the assumption of "forecasts have an equal expected number of active days" breaks around the closing date, which impacts things in the monotonic example (this effect is linear in the expected number of active days and could be quite big in extremes).

Comment by Misha_Yagudin on Challenges in evaluating forecaster performance · 2020-09-12T10:43:52.737Z · EA · GW

re: limit — a nice example. Please notice that Bob makes a forecast on a (uniformly) random day, so when you take an expectation over the days he might forecast on, you get the average of the scores for all days, as if he forecasted every day.

Let $T$ be the total number of days, $p_i$ the probability that Bob forecasts on day $i$, and $b_i$ the Brier score of the forecast made on day $i$. Since the day is uniformly random, $p_i = 1/T$, so

$E[\text{score}] = \sum_{i=1}^{T} p_i b_i = \frac{1}{T} \sum_{i=1}^{T} b_i,$

which is the average of the scores for all days.

I am a bit surprised that it worked out here, because it breaks the assumption of the equality of the expected number of days a forecast will be active. The lack of this assumption will play out when aggregating over multiple questions [weighted by the number of active days]. Still, I hope this example gives helpful intuitions.

Comment by Misha_Yagudin on Challenges in evaluating forecaster performance · 2020-09-11T23:16:27.744Z · EA · GW

Here is a sketch of a formal argument, which will show that freshness doesn't matter much.

Let's calculate the average Brier score of a forecaster. The contribution of a hypothetical forecast made on day $i$ toward the sum is $b_i \cdot d_i$, where $b_i$ is its Brier score and $d_i$ is the number of days it stays active. If forecasts are made on sufficiently random days, the expected number of days each forecast is active should be equal: $E[d_i] = d$. Because $E\left[\frac{1}{T} \sum_i b_i d_i\right] = \frac{d}{T} \sum_i b_i$, the expected average Brier score is equal to the average of the Brier scores for all days.
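
A minimal Monte Carlo sketch of this argument (my own toy setup: synthetic daily Brier scores and carry-forward scoring; all numbers are hypothetical):

```python
import random

random.seed(0)
T = 365  # days the question is open

def avg_score(daily, p_update):
    """Time-averaged Brier score when the last forecast carries forward."""
    total, current = 0.0, daily[0]
    for day in range(T):
        if day == 0 or random.random() < p_update:
            current = daily[day]   # the forecaster updates today
        total += current           # the active forecast is scored every day
    return total / T

flat = [random.uniform(0.0, 0.5) for _ in range(T)]   # scores with no trend
trend = [0.5 * (1 - d / T) for d in range(T)]         # scores improving toward resolution

for name, daily in (("flat", flat), ("trend", trend)):
    ideal = sum(daily) / T  # a forecaster who updates every day
    for p in (1.0, 0.1, 0.02):
        est = sum(avg_score(daily, p) for _ in range(2000)) / 2000
        print(f"{name}, p_update={p:.2f}: {est:.4f} vs daily forecaster {ideal:.4f}")
# flat: rough equality regardless of update frequency (the main claim);
# trend: a gap opens up for rare updates, the boundary effect near the
# closing date that the ETA above points at.
```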

Comment by Misha_Yagudin on New book: Moral Uncertainty by MacAskill, Ord & Bykvist · 2020-09-11T18:57:40.450Z · EA · GW

https://twitter.com/willmacaskill/status/1304075463455838209

Here’s an informal history and summary, in tweet form.

Comment by Misha_Yagudin on Challenges in evaluating forecaster performance · 2020-09-11T15:28:22.599Z · EA · GW

Aha, off the top of my head one might go in the directions of (a) TD-learning-type rewards; (b) variance reduction for policy evaluation.

After thinking for a few more minutes, it seems that forecasting more often but at random moments shouldn't impact the expected Brier score. But in practice, frequent forecasters are evaluated with respect to a different distribution (one which favors information gain / "something relevant just happened") — so maybe some sort of importance sampling might help to equalize these two groups?
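
One way such a correction could look: a sketch of self-normalized importance sampling (my construction, not a worked-out proposal; the day distributions are assumptions):

```python
def importance_weighted_score(scores, q, p):
    """Re-weight daily Brier scores observed under a forecaster's own
    forecasting-day distribution q so they estimate performance under a
    common reference distribution p over days."""
    weights = [p_i / q_i for p_i, q_i in zip(p, q)]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

scores = [0.10, 0.30, 0.20]   # Brier scores on three days
q = [0.70, 0.20, 0.10]        # this forecaster mostly forecasts early
p = [1 / 3, 1 / 3, 1 / 3]     # reference: uniform over days
print(importance_weighted_score(scores, q, p))  # ≈ 0.22 vs naive mean 0.20
```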