Posts

Forecasting Newsletter: July 2021 2021-08-01T15:07:00.985Z
Forecasting Newsletter: June 2021 2021-07-01T20:59:28.864Z
Shallow evaluations of longtermist organizations 2021-06-24T15:31:24.693Z
What should the norms around privacy and evaluation in the EA community be? 2021-06-16T17:31:59.174Z
2018-2019 Long Term Future Fund Grantees: How did they do? 2021-06-16T17:31:36.048Z
Forecasting Newsletter: May 2021 2021-06-01T15:51:16.532Z
Forecasting Newsletter: April 2021 2021-05-01T15:58:16.948Z
Forecasting Newsletter: March 2021 2021-04-01T17:01:15.831Z
Relative Impact of the First 10 EA Forum Prize Winners 2021-03-16T17:11:29.172Z
Introducing Metaforecast: A Forecast Aggregator and Search Tool 2021-03-07T19:03:44.627Z
Forecasting Newsletter: February 2021 2021-03-01T20:29:24.094Z
Forecasting Prize Results 2021-02-19T19:07:11.379Z
Forecasting Newsletter: January 2021 2021-02-01T22:53:54.819Z
A Funnel for Cause Candidates 2021-01-13T19:45:52.508Z
2020: Forecasting in Review 2021-01-10T16:05:37.106Z
Forecasting Newsletter: December 2020 2021-01-01T16:07:36.000Z
Big List of Cause Candidates 2020-12-25T16:34:38.352Z
What are good rubrics or rubric elements to evaluate and predict impact? 2020-12-03T21:52:27.802Z
Forecasting Newsletter: November 2020. 2020-12-01T17:00:40.460Z
An experiment to evaluate the value of one researcher's work 2020-12-01T09:01:49.034Z
Predicting the Value of Small Altruistic Projects: A Proof of Concept Experiment. 2020-11-22T20:07:57.499Z
Announcing the Forecasting Innovation Prize 2020-11-15T21:21:52.151Z
Incentive Problems With Current Forecasting Competitions. 2020-11-10T21:40:46.317Z
Forecasting Newsletter: October 2020. 2020-11-01T13:00:04.440Z
Forecasting Newsletter: September 2020. 2020-10-01T11:00:02.405Z
Forecasting Newsletter: August 2020. 2020-09-01T11:35:19.279Z
Forecasting Newsletter: July 2020. 2020-08-01T16:56:41.600Z
Forecasting Newsletter: June 2020. 2020-07-01T09:32:57.248Z
Forecasting Newsletter: May 2020. 2020-05-31T12:35:36.863Z
Forecasting Newsletter: April 2020 2020-04-30T16:41:38.630Z
New Cause Proposal: International Supply Chain Accountability 2020-04-01T07:56:17.225Z
NunoSempere's Shortform 2020-03-22T19:58:54.830Z
Shapley Values II: Philantropic Coordination Theory & other miscellanea. 2020-03-10T17:36:54.114Z
A review of two books on survey-making 2020-03-01T19:11:13.828Z
A review of two free online MIT Global Poverty courses 2020-01-15T11:40:41.519Z
[Part 1] Amplifying generalist research via forecasting – models of impact and challenges 2019-12-19T18:16:04.299Z
[Part 2] Amplifying generalist research via forecasting – results from a preliminary exploration 2019-12-19T16:36:10.564Z
Shapley values: Better than counterfactuals 2019-10-10T10:26:24.220Z
Why do social movements fail: Two concrete examples. 2019-10-04T19:56:02.028Z
EA Mental Health Survey: Results and Analysis. 2019-06-13T19:55:37.127Z

Comments

Comment by NunoSempere on Forecasting Newsletter: July 2021 · 2021-08-02T11:38:57.304Z · EA · GW

Thanks

Comment by NunoSempere on DeepMind: Generally capable agents emerge from open-ended play · 2021-07-27T18:42:56.189Z · EA · GW

My hot take: This seems like a somewhat big deal to me. It's what I would have predicted, but that's scary, given my timelines

Might be confirmation bias. But is it?

Comment by NunoSempere on Buck's Shortform · 2021-07-23T10:55:50.163Z · EA · GW

But if you already have this coalition value function, you've already solved the coordination problem and there’s no reason to actually calculate the Shapley value! If you know how much total value would be produced if everyone worked together, in realistic situations you must also know an optimal allocation of everyone’s effort. And so everyone can just do what that optimal allocation recommended.

This seems correct


A related claim is that the Shapley value is no better than any other solution to the bargaining problem. For example, instead of allocating credit according to the Shapley value, we could allocate credit according to the rule “we give everyone just barely enough credit that it’s worth it for them to participate in the globally optimal plan instead of doing something worse, and then all the leftover credit gets allocated to Buck”, and this would always produce the same real-life decisions as the Shapley value.

This misses some considerations around cost-efficiency/prioritization. If you look at your distorted "Buck values", you come away with the impression that Buck is super cost-effective: responsible for a large fraction of the optimal plan using just one salary. If we didn't have a mechanistic understanding of why that was, trying to get more Buck would become an EA cause area.

In contrast, if credit was allocated according to Shapley values, we could look at the groups whose Shapley value is the highest, and try to see if they can be scaled.


The section about "purely local" Shapley values might be pointing to something, but I don't quite know what it is, because the example is just Shapley values but missing a term? I don't know. You also say "by symmetry...", and then break that symmetry by saying that one of the parts would have been able to create $6,000 in value and the other $0. Needs a crisper example.


Re: coordination between people who have different values using SVs, I have some stuff here, but looking back the writing seems too corny.


Lastly, to some extent, Shapley values are a reaction to people calculating their impact as their counterfactual impact. This leads to double/triple counting impact for some organizations/opportunities, but not others, which makes comparison between them more tricky. Shapley values solve that by allocating impact such that it sums to the total impact (among other nice properties). Then someone like Open Philanthropy or some EA fund can come and see which groups have the highest Shapley value (perhaps the highest Shapley value per unit of money/resources) and then try to replicate or scale them. People might also make better decisions if they compare Shapley instead of counterfactual values (because Shapley values mostly give a more accurate impression of the impact of a position).

So I see the benefits of Shapley values as fixing some common mistakes arising from using counterfactual values. This would make impact accounting slightly better, and coordination slightly better to the extent it relies on impact accounting for prioritization (which tbh might not be much.)
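
To make the double-counting point concrete, here is a toy sketch in node, with made-up numbers: a funder and a direct-work org are both necessary for a project worth 100 units of impact, and neither produces anything alone.

// Characteristic function over coalitions (illustrative numbers, not real data).
const v = {
  "": 0,
  "funder": 0,
  "org": 0,
  "funder,org": 100,
};

const key = (coalition) => [...coalition].sort().join(",");

// Counterfactual impact: value with everyone minus value without that player.
const counterfactual = (player, players) =>
  v[key(players)] - v[key(players.filter((p) => p !== player))];

// Shapley value: average marginal contribution over all orderings of the players.
function shapley(player, players) {
  const permutations = (arr) =>
    arr.length <= 1
      ? [arr]
      : arr.flatMap((x, i) =>
          permutations([...arr.slice(0, i), ...arr.slice(i + 1)]).map((rest) => [x, ...rest])
        );
  const orderings = permutations(players);
  let total = 0;
  for (const order of orderings) {
    const before = order.slice(0, order.indexOf(player));
    total += v[key([...before, player])] - v[key(before)];
  }
  return total / orderings.length;
}

const players = ["funder", "org"];
for (const p of players) {
  console.log(p, "counterfactual:", counterfactual(p, players), "Shapley:", shapley(p, players));
}
// Counterfactual impacts are 100 + 100 = 200, which double-counts the 100 units;
// Shapley values are 50 + 50 = 100, which sums to the total impact.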

I'm not sure to what extent I agree with the claim that people are overhyping/misunderstanding Shapley values. It seems plausible.

Comment by NunoSempere on A Sequence Against Strong Longtermism · 2021-07-22T23:14:36.222Z · EA · GW

I think that some of your anti-expected-value beef can be addressed by considering stochastic dominance as a backup decision theory in cases where expected value fails.

For instance, maybe I think that a donation to ALLFED in expectation leads to more lives saved than a donation to a GiveWell charity. But you could point out that the expected value is undefined, because maybe the future contains infinite amount of both flourishing and suffering. Then donating to ALLFED can still be the superior option if I think that it's stochastically dominant.

There are probably also tweaks to make to stochastic dominance, e.g., if you have two "games",

  • Game 1: Get X expected value in the next K years, then play game 3
  • Game 2: Get Y expected value in the next K years, then play game 3
  • Game 3: Some Pasadena-like game with undefined value

then one could also have a principle where Game 1 is preferable to Game 2 if X > Y, and this also sidesteps some more expected value problems.
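
For concreteness, here is a minimal sketch (in node, with made-up outcome distributions) of the kind of first-order stochastic dominance check I have in mind; the criterion only compares tail probabilities, so it doesn't need the means to exist.

// First-order stochastic dominance check for two discrete outcome distributions.
// "value" could be lives saved, utilons, etc.; the numbers are illustrative.
const optionA = [
  { value: 0, p: 0.5 },
  { value: 10, p: 0.3 },
  { value: 1000, p: 0.2 },
];
const optionB = [
  { value: 0, p: 0.6 },
  { value: 10, p: 0.3 },
  { value: 1000, p: 0.1 },
];

// P(X >= t) for a discrete distribution.
const survival = (dist, t) =>
  dist.filter((o) => o.value >= t).reduce((acc, o) => acc + o.p, 0);

// A dominates B if, for every threshold, A gives at least as high a probability
// of doing at least that well, with strict inequality somewhere.
function dominates(a, b) {
  const thresholds = [...new Set([...a, ...b].map((o) => o.value))];
  const diffs = thresholds.map((t) => survival(a, t) - survival(b, t));
  return diffs.every((d) => d >= 0) && diffs.some((d) => d > 0);
}

console.log("A dominates B:", dominates(optionA, optionB)); // true
console.log("B dominates A:", dominates(optionB, optionA)); // false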

Comment by NunoSempere on NunoSempere's Shortform · 2021-07-22T22:57:36.419Z · EA · GW

Notes on: A Sequence Against Strong Longtermism

Summary for myself. Note: Pretty stream-of-thought.

Proving too much

  • The set of all possible futures is infinite which somehow breaks some important assumptions longtermists are apparently making.
    • Somehow this fails to actually bother me
  • ...the methodological error of equating made up numbers with real data
    • This seems like a cheap/unjustified shot. In the world where we can calculate the expected values, it would seem fine to compare (wide, uncertain) speculative interventions with hardcore GiveWell data (note that the next step would probably be to get more information, not to stop donating to GiveWell charities).
  • Sometimes, expected utility is undefined (Pasadena game)
    • The Pasadena game also fails to bother me, because the series hasn't (yet) shown that longtermist bets are "Pasadena-like"
    • (Also, note that you can use stochastic dominance to solve many expected value paradoxes, e.g, to decide between two universes with infinite expected value, or with undefined expected value.)
  • ...mention of E.T. Jaynes
    • Yeah, I'm also a fan of E.T. Jaynes, and I think that this is a cheap shot, not an argument.
  • Subject, Object, Instrument
    • This section seems confused/bad. In particular, there is a switch from "credences are subjective" to "we should somehow change our credences if this is useful". No, if one's best guess is that "the future is vast in size", then considering that one can change one's opinions to better attain goals doesn't make it stop being one's best guess

Overall: The core of this section seems to be that expected values are sometimes undefined. I agree, but this doesn't deter me from trying to do the most good by seeking more speculative/longtermist interventions. I can use stochastic dominance when expected utility fails me. 

The post also takes issue with the following paragraph from The Case For Strong Longtermism:

Then, using our figure of one quadrillion lives, the expected good done by Shivani contributing $10,000 to [preventing world domination by a repressive global political regime] would, by the lights of utilitarian axiology, be 100 lives. In contrast, funding for the Against Malaria Foundation, often regarded as the most cost-effective intervention in the area of short-term global health improvements, on average saves one life per $3500. (Nuño: italics and bold from the OP, not from original article)

I agree that the paragraph just intuitively looks pretty bad, so I looked at the context:

Now, the argument we are making is ultimately a quantitative one: that the expected impact one can have on the long-run future is greater than the expected impact one can have on the short run. It's not true, in general, that options that involve low probabilities of high stakes systematically lead to greater expected values than options that involve high probabilities of modest payoffs: everything depends on the numbers. (For instance, not all insurance contracts are worth buying.) So merely pointing out that one might be able to influence the long run, or that one can do so to a nonzero extent (in expectation), isn't enough for our argument. But, we will claim, any reasonable set of credences would allow that for at least one of these pathways, the expected impact is greater for the long-run.

Suppose, for instance, Shivani thinks there's a 1% probability of a transition to a world government in the next century, and that $1 billion of well-targeted grants — aimed (say) at decreasing the chance of great power war, and improving the state of knowledge on optimal institutional design — would increase the well-being in an average future life, under the world government, by 0.1%, with a 0.1% chance of that effect lasting until the end of civilisation, and that the impact of grants in this area is approximately linear with respect to the amount of spending. Then, using our figure of one quadrillion lives to come, the expected good done by Shivani contributing $10,000 to this goal would, by the lights of a utilitarian axiology, be 100 lives. In contrast, funding for Against Malaria Foundation, often regarded as the most cost-effective intervention in the area of short-term global health improvements, on average saves one life per $3500.

Yeah, this is in the context of a thought experiment. I'd still do this with distributions rather than with point estimates, but ok.
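
For reference, the point-estimate arithmetic from the quoted passage, plus the distributional version I'd prefer; the ranges below are my own illustrative choices, not the authors'.

// Point estimate from the quoted passage: 1% chance of a world government
// × 0.1% well-being improvement × 0.1% chance of persistence × 1e15 lives to come
// × ($10,000 / $1e9 of spending) = 100 lives.
const pointEstimate = 0.01 * 0.001 * 0.001 * 1e15 * (1e4 / 1e9);
console.log(pointEstimate); // 100

// The same calculation with (illustrative) distributions instead of point estimates;
// the ranges are placeholders, not the paper's numbers.
const logUniform = (low, high) =>
  Math.exp(Math.log(low) + Math.random() * (Math.log(high) - Math.log(low)));

const samples = Array.from({ length: 100000 }, () => {
  const pTransition = logUniform(0.001, 0.05);  // chance of a world government
  const improvement = logUniform(0.0001, 0.01); // well-being improvement
  const pPersists = logUniform(0.0001, 0.01);   // chance the effect lasts
  const futureLives = logUniform(1e13, 1e16);   // lives to come
  return pTransition * improvement * pPersists * futureLives * (1e4 / 1e9);
});

samples.sort((a, b) => a - b);
const quantile = (p) => samples[Math.floor(p * samples.length)];
console.log({ p10: quantile(0.1), median: quantile(0.5), p90: quantile(0.9) });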

The Credence Assumption

  • Ok, so the OP wants to argue that expected value theory breaks => the tool is not useful => we should abandon credences => longtermism somehow fails.
    • But I think that "My best guess is that I can do more good with more speculative interventions" is fairly robust to that line of criticism; it doesn't stop being my best guess just because credences are subjective.
      • E.g., if my best guess is that ALLFED does "more good" (e.g., more lives saved in expectation) than GiveWell charities, pointing out that actually the expected value is undefined (maybe the future contains both infinite amounts of flourishing and suffering) doesn't necessarily change my conclusion if I still think that donating to ALLFED is stochastically dominant.
  • Cox's theorem requires that probabilities be real numbers
    • The OP doesn't buy that. Sure, a piano is not going to drop on his head, but he might e.g., make worse decisions on account of being overconfident because he has not been keeping track of his (numerical) predictions and thus suffers from more hindsight bias than someone who kept track.
  • But what alternative do we have?
    • One can use e.g., upper and lower bounds on probabilities instead of real valued numbers: Sure, I do that. Longtermism still doesn't break.
  • Some thought experiment which looks like The Whispering Earring.
  • Instead of relying on explicit expected value calculations, we should rely on evolutionary approaches

The Poverty of Longtermism

  • "In 1957, Karl Popper proved it is impossible to predict the future of humanity, but scholars at the Future of Humanity Institute insist on trying anyway"
    • Come on
  • Yeah, this is just fairly bad
  • Lesson of the 20th Century
    • This is going to be an ad-hitlerium, isn't it
      • No, an ad-failures of communism
        • At this point, I stopped reading.
Comment by NunoSempere on Shallow evaluations of longtermist organizations · 2021-07-09T11:07:49.463Z · EA · GW

(Edited to add Centre for the Study of Existential Risk Four Month Report June - September 2020  to the CSER sources)

Comment by NunoSempere on Shallow evaluations of longtermist organizations · 2021-07-06T10:35:46.454Z · EA · GW

Sure, but it was particularly salient to me in this case because the evaluation was so negative

Comment by NunoSempere on Shallow evaluations of longtermist organizations · 2021-07-05T11:03:39.303Z · EA · GW

In what capacity are you asking? I'd be more likely to do so if you were asking as a team member, because the organization right now looks fairly small and I would almost be evaluating individuals.

Comment by NunoSempere on Shallow evaluations of longtermist organizations · 2021-07-05T11:01:35.179Z · EA · GW

So what I specifically meant was: It's interesting that the current leadership probably thinks that CSER is valuable (e.g., valuable enough to keep working at it, rather than directing their efforts somewhere else, and presumably valuable enough to absorb EA funding and talent). This presents a tricky updating problem, where I should probably average my own impressions from my shallow review with their (probably more informed) perspective. But in the review, I didn't do that, hence the "unmitigated inside view" label. 

Comment by NunoSempere on What should we call the other problem of cluelessness? · 2021-07-04T08:27:11.914Z · EA · GW

I like "opaqueness" for the reason that it is gradable.

Comment by NunoSempere on Shallow evaluations of longtermist organizations · 2021-06-28T15:17:26.841Z · EA · GW

This is a good question, and in hindsight, something I should have recorded. For the project as a whole, maybe two weeks to a month, but not of full-time work. I don't remember the times for each organization. 

Comment by NunoSempere on Shallow evaluations of longtermist organizations · 2021-06-26T22:36:52.660Z · EA · GW

Thanks Michael. Going through your options one by one.

  1. Inform decisions about donations that are each in something like the $10-$5000 dollar range. Not an aim I had, but sure, why not.
  2. Inform decisions about donations/grants that are each in something like the >$50,000 dollar range. So rather than inform those directly, inform the kind of research that you can either do or buy with money to inform that donation. $50,000 feels a little bit low for commissioning research to make a decision, though (could a $5k to $10k investment in a better version of this post make a $50k donation more than 10-20% better? Plausibly.)
    • That said,  I'd be curious if any largish donations are changed as a result of this post, and why, and in particular why they didn't defer to the LTF fund.
  3. Inform decisions about which of these orgs (if any) to work for. Not really for myself, but I'd be happy for people to read this post as part of their decisions. Also, 80,000 hours exists.
  4. Provide feedback to these orgs that causes them to improve. Sure, but not a primary aim.
  5. Provide an accountability mechanism for these orgs that causes them to work harder or smarter so that they look better on such evaluations in future. No, not really.
  6. Just see if this sort of evaluation can be done, learn more about how to do that, and share that meta-level info with the EA public. Yep.
  7. [Something else]. Show the kind of thing that an organization like QURI can do! In particular, you can't do this kind of thing using software other than foretold (Metaculus is great, but the questions are too ambiguous; getting them approved takes time & in the case of a tournament, money, and for this post I only needed my own predictions (not that you can't run a tournament on foretold.))
  8. [Something else]. Learn more about the longtermist ecosystem myself
  9. [Something else]. So this was sort of on the edges of this project, but for making large amounts of predictions, one does need a pipeline, and improving that pipeline has been on my mind (and on Ozzie Gooen's). For instance, creating the 27 predictions one by one would be kind of a pain, so instead I use a Google doc script which feeds them to foretold.

I also think that 4. and 5. are too strongly worded. To the extent I'm providing feedback, I imagine it's more of a) the sanity check variety or b) about how a relatively sane person perceives these organizations. For instance, if I don't get pushback about it in the comments, I'll think that it's a good idea for the APPGFG to expand, but I doubt it's something that they themselves haven't thought about.

Comment by NunoSempere on Shallow evaluations of longtermist organizations · 2021-06-26T21:20:33.697Z · EA · GW

Thanks for considering ALLFED. We try to respond to inquiries quickly. We have looked back, and have not been able to locate any such inquiries. We will be finalizing our 2020 report with financial details soon.

This is most likely my fault; I think I got confused between allfed.org and allfed.info

To clarify, the cost of preparation does not include the scale up in a catastrophe

For clarity:
1. Your guesstimate model: a 3% to 50% mitigation of the impact of a war with a $30M to $200M investment, where the war has a probability of 0.02% to 5% per year. You also say that so far, you've already mitigated the impact of such a war by 1% to 20%.
2. My model: a 0% to 15% mitigation (previously 0% to 5%, see below) of the impact of such a war with a $50M to $50B investment, where this is maybe not being fully prepared, but does include some serious paranoid preparation, some factories running, supply chains established, etc.
3. Objection: You're planning to go with the 30M to 200M path; my estimates should be for that path.
4. Answer: I'd have to think about it. Maybe 2x to 10x lower. Essentially I'd expect any preparation to at least fail partially, fail to get implemented, be ignored, not survive in institutional memory, etc.

In this paper, we found that if there were no resilient foods, expenditure on stored foods in a catastrophe would be approximately $90 trillion and about 10% of people would survive. However, if resilient foods could be produced at $2.5 per dry kilogram retail, 97% of people would survive but the total expenditure would only be ~$20 trillion. So one could argue that resilient foods would actually save money in a catastrophe

I'll read the paper. 

Just to make sure we are on the same page, if there were a 10% probability of full-scale nuclear war in the next 30 years and there were a 10% reduction in the long-term future potential of humanity given nuclear war, and if planning and R&D for resilient foods mitigated the far future impact of nuclear war by 50%, then that would improve the long-term potential of humanity by 0.5 percentage points (the product of the three percentages).

I see, thanks, I think I was getting this wrong (I've changed this in the guesstimate, but not in the post). With that in mind, your estimates now seem less high (but still very high). It changes my estimates slightly.

Separately, your numbers still seem fairly high. Suppose that in 1980 you had $100M and knew that there was going to be a pandemic (or another global financial crisis) in the next 100 years, but didn't know the details; it seems unlikely that you could have made the covid pandemic or the 2008 financial crisis more than 10% better.

Comment by NunoSempere on Shallow evaluations of longtermist organizations · 2021-06-26T20:01:38.096Z · EA · GW

Thanks Michael, beautiful comment.

Comment by NunoSempere on Event-driven mission hedging and the 2020 US election · 2021-06-22T18:02:13.352Z · EA · GW

I think you'd have to think about the market equilibrium here. So for instance, if the price of capturing a tCO2e falls to $0.1/tonne, then more people will want to buy them, and the impact of a marginal tonne captured [1] might be lower. More generally, more people would be doing climate related projects, because the administration would be more welcoming of them.

In contrast, at $1/tonne, fewer people might want to buy them, and thus the marginal impact of a tonne captured might be higher. Similarly, perhaps fewer people would choose to carry out climate related projects, so the ones that exist might be more valuable.

Would it be 10x higher? No, probably not, but then again a presidency probably wouldn't make carbon capture 10x more cost-effective (maybe 1.2x? I'm just shooting from the hip here).

[1]: or, closer to my own heart, the Shapley value of each tonne captured.

Comment by NunoSempere on 2018-2019 Long Term Future Fund Grantees: How did they do? · 2021-06-19T07:51:05.599Z · EA · GW

I would also want more bins than the ones I provide, i.e., not considering the total value is probably one of the parts I like less about this post. 

Comment by NunoSempere on 2018-2019 Long Term Future Fund Grantees: How did they do? · 2021-06-19T07:49:46.670Z · EA · GW

Makes sense. In particular, noticing that grants are all particularly legible might lead you to update in the direction of a truncated distribution like the one you consider. So far, the LTFF seems like it has maybe moved a bit in the direction of more legibility, but not that much.

Comment by NunoSempere on What are some key numbers that (almost) every EA should know? · 2021-06-18T07:53:07.712Z · EA · GW

Average yearly donation by EAs (EA survey respondents)

Comment by NunoSempere on 2018-2019 Long Term Future Fund Grantees: How did they do? · 2021-06-17T21:13:46.353Z · EA · GW

Suppose you give initial probability to all three normals. Then you sample an event, and its value is 1. Then you update against the green distribution, and in favor of the red and black distributions. The black distribution has a higher mean, but the red one has a higher standard deviation.
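
For concreteness, a small sketch of that update in node; the three normals below are illustrative stand-ins for the green, red and black curves, not the exact parameters I plotted.

// Updating between three hypotheses about the process generating outcomes,
// each a normal distribution, after observing a single outcome of 1.
const normalPdf = (x, mean, sd) =>
  Math.exp(-((x - mean) ** 2) / (2 * sd * sd)) / (sd * Math.sqrt(2 * Math.PI));

const hypotheses = [
  { name: "green (mean 0, sd 0.3)", mean: 0, sd: 0.3, prior: 1 / 3 },
  { name: "red (mean 0, sd 1.5)", mean: 0, sd: 1.5, prior: 1 / 3 },
  { name: "black (mean 1.5, sd 1)", mean: 1.5, sd: 1, prior: 1 / 3 },
];

const observation = 1;
const unnormalized = hypotheses.map((h) => h.prior * normalPdf(observation, h.mean, h.sd));
const total = unnormalized.reduce((a, b) => a + b, 0);

hypotheses.forEach((h, i) =>
  console.log(h.name, "posterior:", (unnormalized[i] / total).toFixed(2))
);
// Posteriors ≈ 0.01, 0.37, 0.62: starting from 1/3 each, the observation at 1 is
// evidence against green and in favor of both red (higher sd) and black (higher mean).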

Comment by NunoSempere on 2018-2019 Long Term Future Fund Grantees: How did they do? · 2021-06-17T21:03:28.609Z · EA · GW

Well, because a success can be caused by a process which has a high mean, but also by a process which has a lower mean and a higher standard deviation. So for example, if you learn that someone has beaten Magnus Carlsen, it could be someone in the top 10, like Caruana, or it could be someone like Ivanchuk, who has a reputation as an "unreliable genius" and is currently number 56, but who, when he has good days, has extremely good days.

Comment by NunoSempere on 2018-2019 Long Term Future Fund Grantees: How did they do? · 2021-06-17T17:02:54.399Z · EA · GW

are you referring to what it seems the LTFF expected when they made the grant, what you think you would've expected at the time the grant was made, what you expect from EA/longtermist donations in general, or something else

Yes, that's tricky. The problem I have here is that different grants are in different domains and involve different amounts of money. Ideally I'd have something like "utilons per dollar/other resources", but that's impractical. Instead, I judge a grant on its own terms: Did it achieve the purpose in the grant's rationale, or something similarly valuable in case there was a change of plan?

Comment by NunoSempere on 2018-2019 Long Term Future Fund Grantees: How did they do? · 2021-06-17T16:52:06.474Z · EA · GW

Thanks! To answer the questions under the first bullet point: 

  • Individuals performed better than organizations, but there weren't that many organizations. 
  • Individuals pursuing research directions mostly did legibly well, and the ones who didn't do legibly well seem like they had less of a well-defined plan, as one might expect. 
    • But some people with less defined directions also seem like they did well. 
    • Also note that maybe I'm rating research directions which didn't succeed as less well defined.
    • I don't actually have access to the applications, just to the grant blurbs and rationales
  • Grants to organize conferences and workshops generally delivered, and I imagine that they generally had more concrete roadmaps
  • There was only one upskilling grant.

In general, I think that the algorithm of looking at past similar grants and seeing whether they succeeded might be decently predictive for new grants, but that maybe isn't captured by the distinctions above.

Comment by NunoSempere on What should the norms around privacy and evaluation in the EA community be? · 2021-06-17T09:54:22.878Z · EA · GW

Did I imply that I thought it was bad for people to update in this way?

Reading it again, you didn't

Comment by NunoSempere on 2018-2019 Long Term Future Fund Grantees: How did they do? · 2021-06-17T09:01:41.201Z · EA · GW

Yes, for me updating upwards on total success on a lower percentage success rate seems intuitively fairly weird. I'm not saying it's wrong, it's that I have to stop and think about it/use my system 2. 

In particular, you have to have a prior distribution such that more valuable opportunities have a lower success rate. But then you have to have a bag of opportunities such that the worse they do, the more you get excited.

Now, I think this happens if you have a bag with "golden tickets", "sure things", and "duds".  Then not doing well would make you more excited if "sure things" were much less valuable than the weighted average of "duds" and "golden tickets".

But to get that, I think you'd have to have "golden tickets" be a binary thing. But in practice, take something like GovAI. It seems like its theory of impact is robust enough that I would expect to see a long tail of impact or impact proxies, rather than a binary success/failure or a lottery-ticket-shaped impact. Say that I'd expect their impact distribution to be a power law: in that case, I would not get more excited if I saw them fail again and again. Conversely, if I do see them getting some successes, I would update upwards on the mean and the standard deviation of the power law distribution from which their impact is drawn.

Comment by NunoSempere on What should the norms around privacy and evaluation in the EA community be? · 2021-06-17T08:32:10.890Z · EA · GW

I find the simplicity of this appealing.

Comment by NunoSempere on What should the norms around privacy and evaluation in the EA community be? · 2021-06-17T08:31:40.944Z · EA · GW

If the extent of your evaluation is a quick search for public info, and you don't find much, I think the responsible conclusion is "it's unclear what happened" rather than "something went wrong". I think this holds even for projects that obviously should have public outputs if they've gone well.

So to push back against this, suppose that you have four initial probabilities (legibly good, silently good, legibly bad, silently bad). Then you also have a ratio (legibly good + silently good) : (legibly bad + silently bad).

Now if you learn that the project was not legibly good or legibly bad, then you update to (silently good, silently bad). The thing is, I expect this ratio silently good : silently bad to be different from the original (legibly good + silently good) : (legibly bad + silently bad), because I expect that most projects, when they fail, do so silently, but that a large portion of successes have a post written about them.

For an intuition pump, suppose that none of the projects from the LTF had any information to be found online about them. Then this would probably be an update downwards. But what's true about the aggregate seems also true probabilistically about the individual projects.

So overall, because I disagree that the "Bayesian" conclusion is uncertainty, I do see a tension between the thing to do to maintain social harmony and the thing to do if one wants to transmit a maximal amount of information. I think this is particularly the case "for projects that obviously should have public outputs if they've gone well".
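
As a toy version of that update (the four prior probabilities are made up; the only structural assumption is that successes are more likely than failures to leave a legible trace):

// Made-up prior over the four categories.
const prior = {
  legiblyGood: 0.35,
  silentlyGood: 0.15,
  legiblyBad: 0.05,
  silentlyBad: 0.45,
};

const odds = (good, bad) => good / bad;

// Prior odds of good : bad, counting both legible and silent outcomes.
console.log(
  "prior odds good:bad =",
  odds(prior.legiblyGood + prior.silentlyGood, prior.legiblyBad + prior.silentlyBad).toFixed(2)
);

// After a quick search turns up nothing, condition on the project being silent.
console.log("posterior odds good:bad =", odds(prior.silentlyGood, prior.silentlyBad).toFixed(2));

// With these numbers the prior odds are 1.00 and the posterior odds 0.33:
// "no public output" is evidence of failure, not just a blank.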

But then you also have other things, like:

  • Some areas (like independent research on foundational topics) might be much, much more illegible than others (e.g, organizing a conference)
  • Doing this kind of update might incentivize people to go into more legible areas
  • An error rate changes things in complicated ways. In particular, maybe the error rate in the evaluation increases the more negative the evaluation is (though I think that the opposite is perhaps more likely). This would depend on your prior about how good most interventions are.
  • ...
Comment by NunoSempere on What should the norms around privacy and evaluation in the EA community be? · 2021-06-17T07:54:21.652Z · EA · GW

...but I still think that it's appropriate for people to reduce their trust in my conclusions if I'm getting "irrelevant details" wrong. If I notice an author make errors that I happen to notice, I'm going to raise my estimate for how many errors they've made that I didn't notice

This makes sense, but I don't think this is bad. In particular, I'm unsure about my own error rate, and maybe I do want to let people estimate my unknown-error rate as a function of my "irrelevant details" error rate.

Comment by NunoSempere on Forecasting Newsletter: May 2021 · 2021-06-07T10:20:45.720Z · EA · GW

Thanks

Comment by NunoSempere on Matt_Lerner's Shortform · 2021-05-31T15:21:19.919Z · EA · GW

Interesting. You might get more comments as a top-level post.

Comment by NunoSempere on Should someone start a grassroots campaign for USA to recognise the State of Palestine? · 2021-05-31T15:00:16.659Z · EA · GW

I don't agree with the current negative score for this post. It might not be particularly tractable, but just on account of its population (circa 13 million), Palestine is both important and large enough in scale. Further, thinking about the issue might lead one to come up with tractable ways to influence the Israeli-Palestinian conflict, or with high-value actions that a high-ranking US State official might take on other issues.

Comment by NunoSempere on Predict responses to the "existential risk from AI" survey · 2021-05-31T14:31:57.193Z · EA · GW

This is a total nerd-snipe, but I feel like I'm missing information about how strong selection effects are (i.e., did only people sympathetic to AI safety answer the survey? Was it only sent to people within those organizations who are sympathetic?)

That said, I'm guessing an average of around 20% for both questions, both widely spread. For instance, one could have 15% for the first question and 30% for the second question.  I'll be surprised if either question is sub-10% or above 60%.

Time taken to think about this: Less than half an hour. I tried to break this down organization by organization, but then realized uncertainties about respondent affiliation were too wide for that to be very meaningful.

Comment by NunoSempere on Relative Impact of the First 10 EA Forum Prize Winners · 2021-05-24T20:29:27.450Z · EA · GW

So here are the mistakes pointed out in the comments:

  • EAF's hiring round had a high value of information, which I didn't incorporate, per Misha's comment
  • "Why we have over-rated Cool Earth" was more impactful than I thought, per Khorton's comment
  • I likely underestimated the possible negative impact of the 2017 donor lottery report, which was quite positive on ALLFED, per MichaelA's comment.

I think this (a ~30% mistake rate) is quite brutal, and still only a lower bound (because there might be other mistakes which commenters didn't point out). I'm pointing this out here because I want to reference this error rate in a forthcoming post.

Comment by NunoSempere on How to promote effective giving in a mid-sized company? · 2021-05-21T11:56:51.600Z · EA · GW

I'd be interested in seeing what happens to this project (even if it fails, or if you fail to get traction, that would also be useful information)

Comment by NunoSempere on Encouraging employer to set up paycheck-deducted charitable contributions · 2021-05-21T11:08:43.889Z · EA · GW

I haven't seen this particular pathway proposed before, but you might find How we promoted EA at a large tech company relevant.

Comment by NunoSempere on Shapley values: Better than counterfactuals · 2021-05-10T14:59:42.141Z · EA · GW

Roses are redily
counterfactuals sloppily
but I don't thinkily
that we should use Shapily 

Comment by NunoSempere on Base Rates on United States Regime Collapse · 2021-04-28T14:41:50.672Z · EA · GW

I think that your probabilities are too high, because you are not processing enough data, or not processing the data you have thoroughly enough. For example, the new sovereign state prior (3%) would assume something like all countries having the same chance of popping out a state, which seems to clearly not be the case.

You might want to take a look at or contact the authors from the Rulers, Elections and Irregular Governance (REIGN) dataset/CoupCast, which has way more data behind it.

Comment by NunoSempere on [deleted post] 2021-04-23T14:30:25.239Z

Any thoughts about dividing far future into AI and non-AI? Also, I'm surprised to see GPI on "Infrastructure" rather than on "Far future"

Comment by NunoSempere on Getting a feel for changes of karma and controversy in the EA Forum over time · 2021-04-22T14:44:40.868Z · EA · GW

I would be interested in how to circumvent this for future analysis.

You can query by year, and then aggregate the years. From a past project, in nodejs:

/* Imports */
import fs from "fs"
import axios from "axios"

/* Utilities */
let print = console.log;
let sleep = (ms) => new Promise(resolve => setTimeout(resolve, ms))

/* Support function */
let graphQLendpoint = 'https://www.forum.effectivealtruism.org/graphql/'
async function fetchEAForumPosts(start, end){
  let response  = await axios(graphQLendpoint, ({
    method: 'POST',
    headers: ({ 'Content-Type': 'application/json' }),
    data: JSON.stringify(({ query: `
       {
        posts(input: {
          terms: {
          after: "${start}"
          before: "${end}"
          }
          enableTotal: true
        }) {
          totalCount
          results{
            pageUrl
            user {
              slug
              karma
            }
            
          }
        }
      }`
})),
  }))
  .then(res => res?.data?.data?.posts?.results ?? null)
  return response
}

/* Body */
let years = [];
for (var i = 2005; i <= 2021; i++) {
   years.push(i);
}

// Example, getting only 1 year.
let main0 = async () => {
  let data = await fetchEAForumPosts("2005-01-01","2006-01-01")
  console.log(JSON.stringify(data,null,2))
}
//main0()

// Actual body
let main = async () => {
  let results = []
  for(let year of years){
    print(year)
    let firstDayOfYear = `${year}-01-01`
    let firstDayOfNextYear = `${year+1}-01-01`
    let data = await fetchEAForumPosts(firstDayOfYear, firstDayOfNextYear)
    //console.log(JSON.stringify(data,null,2))
    //console.log(data.slice(0,5))
    results.push(...data)
    await sleep(5000)
  }
  print(results)
  fs.writeFileSync("eaforumposts.json", JSON.stringify(results, 0, 2))
}
main()
Comment by NunoSempere on Mundane trouble with EV / utility · 2021-04-04T18:51:41.703Z · EA · GW

So here is something which sometimes breaks people: You're saying that you prefer A = 10% chance of saving 10 people over B = 1 in a million chance of saving a billion lives. Do you still prefer a 10% chance of A over a 10% chance of B?

If so, note how you can be Dutch-booked.
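
For reference, the expected values in the thought experiment, and what happens to them under compounding:

// The numbers in the thought experiment, spelled out.
const EV = (p, lives) => p * lives;
console.log("A = 10% chance of saving 10 people:", EV(0.1, 10), "lives in expectation");
console.log("B = 1-in-a-million chance of 1e9 lives:", EV(1e-6, 1e9), "lives in expectation");
// Wrapping both options in a further 10% chance multiplies both expected values
// by 0.1 and leaves their ratio unchanged.
console.log("10% chance of A:", EV(0.1 * 0.1, 10), "  10% chance of B:", EV(0.1 * 1e-6, 1e9));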

Comment by NunoSempere on Mundane trouble with EV / utility · 2021-04-04T13:12:54.723Z · EA · GW

On Pascal's mugging specifically, Robert Miles has an interesting YouTube video arguing that AI Safety is not a Pascal's mugging, which the OP might be interested in.

Comment by NunoSempere on Mundane trouble with EV / utility · 2021-04-03T08:53:47.254Z · EA · GW

1 & 2 might normally be answered by the Von Neumann–Morgenstern utility theorem*

In the case you mentioned, you can try to calculate the impact of an education throughout the beneficiaries' lives. In this case, I'd expect it to mostly be an increase in future wages, but also some other positive externalities. Then you look at the willingness to trade time for money, or the willingness to trade years of life for money, or the goodness and badness of life at different earning levels, and you come up with a (very uncertain) comparison.
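
A very rough sketch of that kind of calculation, with invented numbers (the realistic versions of these inputs, and the conversion from income to well-being, are where the actual work is):

// Toy cost-effectiveness sketch for an education program; every number is made up.
const costPerStudent = 200;        // $ per student
const extraIncomePerYear = 50;     // $ of extra income per year
const yearsOfBenefit = 30;
const discountRate = 0.04;

// Present value of the extra income stream.
let presentValue = 0;
for (let t = 1; t <= yearsOfBenefit; t++) {
  presentValue += extraIncomePerYear / (1 + discountRate) ** t;
}

// Convert dollars of income for the recipient into a common "value" unit;
// the exchange rate here is a placeholder.
const valuePerDollarToRecipient = 0.01;
const valueCreated = presentValue * valuePerDollarToRecipient;

console.log("PV of extra income per student: $" + presentValue.toFixed(0));
console.log("value per dollar spent:", (valueCreated / costPerStudent).toFixed(4));
// One would then compare this "value per dollar" figure against other interventions,
// e.g. cash transfers or deworming, estimated in the same units.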

If you want to look at an example of this, you might want to look at GiveWell's evaluations in general, or at their evaluation of deworming charities in particular.

I hope that's enough to point you to some directions which might answer your questions.

* But e.g., for negative utilitarians, axioms 3 and 3' wouldn't apply in general (because they prefer to avoid suffering infinitely more than promoting happiness, i.e., consider L = some suffering, M = non-existence, N = some happiness), but they would still apply for the particular case where they're trading off between different quantities of suffering. In any case, even if negative utilitarians would represent the world with two points (total suffering, total happiness), they still have a way of comparing between possible worlds (choose the one with the least suffering, then the one with the most happiness if suffering is equal).

Comment by NunoSempere on Announcing "Naming What We Can"! · 2021-04-01T15:38:48.092Z · EA · GW

Unsong: The Origins. 

Comment by NunoSempere on New Top EA Causes for 2021? · 2021-04-01T08:24:00.329Z · EA · GW

This isn't exactly a proposal for a new cause area, but I've felt that the current names of EA organizations are confusingly named. So I'm proposing  some name-swaps:

  • Probably Good should now be called "80,000 hours". Since 80,000 hours explicitly moved towards a more longtermist direction, it has abandoned some of its initial relationship to its name, and Probably Good seems to be picking up some of that slack.
  • "80,000 hours should be renamed to "Center for Effective Altruism" (CEA). Although technically a subsidiary, 80,000 hours reaches more people than CEA, and produces more research. This change in name would reflect its de-facto leadership position in the EA community.
  • The Center for Effective Altruism should rebrand to "EA Infrastructure Fund", per CEA's strategic focus on events, local groups and the EA forum, and on providing infrastructure for community building more generally.
  • However, this leaves the "EA Infrastructure Fund" without a name. I think the main desideratum for a name is basically prestige, and so I suggest "Future of Humanity Institute", which sounds suitably ominous. Further, the association with Oxford might lead more applicants to apply, and require a lower salary (since status and monetary compensation are fungible), making the fund more cost-effective.
  • Fortunately, the Global Priorities Institute (GPI) recently determined that helping factory farmed animals is the most pressing priority, and that we never cared that much about humans in the first place. This leaves a bunch of researchers at the Future of Humanity Institute and at the Global Priorities Institute, which recently disbanded, unemployed, but Animal Charity Evaluators is offering them paid junior researcher positions. To reflect its status as the indisputable global priority, Animal Charity Evaluators should consider changing their name to "Doing Good Better".
  • To enable this last change and to avoid confusion, Doing Good Better would have to be put out of print.

I estimate that having better names only has a small or medium impact, but that tractability is sky-high. No comment on neglectedness. 

What do you blokes think?

Comment by NunoSempere on Report on Semi-informative Priors for AI timelines (Open Philanthropy) · 2021-03-31T20:18:34.826Z · EA · GW

Random thought on anthropics: 

  • If AGI had been developed early and been highly dangerous, one can't update on not seeing it
  • Anthropic reasoning might also apply to calculating the base rate of AGI; in the worlds where it existed and was beneficial, one might not be trying to calculate its a priori outside view.
Comment by NunoSempere on Report on Semi-informative Priors for AI timelines (Open Philanthropy) · 2021-03-29T17:07:34.814Z · EA · GW

Some notes on the Laplace prior:

  • In footnote 16, you write: "For example, the application of Laplace's law described below implies that there was a 50% chance of AGI being developed in the first year of effort". But historically, participants in the Dartmouth conference were gloriously optimistic:

"We propose that a 2-month, 10-man study of artificial intelligence be carried out during the summer of 1956 at Dartmouth College in Hanover, New Hampshire. The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it. An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves. We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer."

  • When you write "I also find that pr(AGI by 2036) from Laplace’s law is too high," what outside-view consideration are you basing that on? Also, is it really too high?
    • If you rule out AGI until 2028 (as you do in your report), the Laplace prior gives you 1 - (1 - 1/((2028-1956)+1))^(2036-2028) ≈ 10.4% ≈ 10%, which is well within your range of 1% to 18%, and really close to your estimate of 8%.
    • The point that Laplace's prior depends on the unit of time chosen is really interesting, but it ends up not mattering once a bit of time has passed. For example, if we choose to use days instead of years, with (days since June 18 1956 = 23660, days until Mar 29 2028 = 2557, days until Jan 1 2036 = 5391), then Laplace's rule would give for the probability of AGI until 2036: 1 - (1 - 1/(23660+2557+1))^(5391-2557) ≈ 10.2% ≈ 10%, pretty much the same as above (checked numerically in the sketch at the end of this comment).
      • It's fun to see that (1-(1/x))^x converges to 1/e pretty quickly, and that changing from years to days is equivalent to changing from ~(1-(1/x))^(x*r) to ~(1-(1/(365*x)))^(365*x*r), where x is the time passed in years and x*r is the time remaining in years. But both converge pretty quickly to (1/e)^r.
  • It is not clear to me that by adjusting the Laplace prior down when you categorize AGI as a "highly ambitious but feasible technology" you are not updating twice: once on the actual passage of time, and another time given that AGI seems "highly ambitious". But one knows that AGI is "highly ambitious" because it hasn't been solved in the first 65 years.

Given that, I'd still be tempted to go with the Laplace prior for this question, though I haven't really digested the report yet.
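
As a quick numerical check of the years-vs-days point (same dates as above):

// Laplace's rule, conditioning on no success so far.
const laplaceNoSuccess = (trialsSoFar, furtherTrials) =>
  1 - Math.pow(1 - 1 / (trialsSoFar + 1), furtherTrials);

// In years: 1956-2028 without AGI, then ask about 2028-2036.
console.log("years:", laplaceNoSuccess(2028 - 1956, 2036 - 2028).toFixed(3)); // ≈ 0.104

// In days: same dates, counted in days.
const msPerDay = 24 * 60 * 60 * 1000;
const days = (a, b) => Math.round((new Date(b) - new Date(a)) / msPerDay);
console.log(
  "days: ",
  laplaceNoSuccess(days("1956-06-18", "2028-03-29"), days("2028-03-29", "2036-01-01")).toFixed(3)
); // ≈ 0.102

// Both are ≈ 10%: once some time has passed, the choice of time unit washes out.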

Comment by NunoSempere on Don't Be Bycatch · 2021-03-23T16:19:25.906Z · EA · GW

Nemo day, perhaps

Comment by NunoSempere on Want to alleviate developing world poverty? Alleviate price risk.​ (2018) · 2021-03-23T16:18:32.938Z · EA · GW

See also: https://en.wikipedia.org/wiki/Onion_Futures_Act

Comment by NunoSempere on BitBets: A Simple Scoring System for Forecaster Training · 2021-03-22T11:54:17.743Z · EA · GW

The auctioning scheme might not end up being proper, though

Comment by NunoSempere on Relative Impact of the First 10 EA Forum Prize Winners · 2021-03-22T11:48:31.342Z · EA · GW
  1. Yes, we agree
  2. No, we don't agree. I think that Adam did better than other potential donor lottery winners, and so his counterfactual value is higher, and thus his Shapley value is also higher. If all the other donors had been clones of Adam, I agree that you'd just divide by n. Thus, the "In every example here, this will be equivalent to calculating counterfactual value, and dividing by the number of necessary stakeholders" is in fact wrong, and I was implicitly doing both of the following in one step: a) calculating Shapley values with "evaluators" as one agent, and b) thinking of Adam's impact as a high proportion of the SV of the evaluator round.
  3. The rest of our disagreements hinge on 2., and I agree that judging the evaluator step alone would make more sense.
Comment by NunoSempere on BitBets: A Simple Scoring System for Forecaster Training · 2021-03-18T13:09:49.367Z · EA · GW

This has beautiful elements.

I'm also interested in using scoring rules for actual tournaments, so some thoughts on that:

  • This scoring rule incentivizes people to predict on questions for which their credence is closer to the extremes, rather than on questions where their credence is closer to even.
  • The rule is in some ways analogous to an automatic market maker which resets for each participant, which is an interesting idea. You could use a set-up such as this to elicit probabilities from forecasters, and give them points/money in the process.
  • You could start your bits somewhere other than at 50/50 (this would be equivalent to starting your automatic market maker somewhere else).
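
Tangentially, since I mentioned automatic market makers: here is a minimal sketch of Hanson's logarithmic market scoring rule (LMSR) for a binary question, which is the kind of market maker one could reset per participant or start at a different probability. This is standard LMSR, not the BitBets rule from the post.

// LMSR for a binary question.
const b = 100; // liquidity parameter

// Cost function over outstanding shares (qYes, qNo).
const cost = (qYes, qNo) => b * Math.log(Math.exp(qYes / b) + Math.exp(qNo / b));

// Implied probability of YES.
const price = (qYes, qNo) => Math.exp(qYes / b) / (Math.exp(qYes / b) + Math.exp(qNo / b));

// A trader who wants to move the market probability to `target` buys YES shares
// until the price reaches it; their payment is the difference in the cost function.
function moveMarketTo(target, qYes, qNo) {
  // Solve price(qYes + x, qNo) = target for x.
  const x = b * Math.log(target / (1 - target)) - (qYes - qNo);
  return { payment: cost(qYes + x, qNo) - cost(qYes, qNo), newQYes: qYes + x };
}

// Start the market at 50/50 (one could also start it somewhere else).
let qYes = 0, qNo = 0;
console.log("initial probability:", price(qYes, qNo)); // 0.5

const trade = moveMarketTo(0.8, qYes, qNo);
console.log("cost of moving the market to 80%:", trade.payment.toFixed(2));
console.log("new probability:", price(trade.newQYes, qNo).toFixed(2)); // 0.80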