rohinmshah's Shortform 2021-08-25T15:43:46.964Z
[AN #80]: Why AI risk might be solved without additional intervention from longtermists 2020-01-03T07:52:24.981Z
Summary of Stuart Russell's new book, "Human Compatible" 2019-10-19T19:56:52.174Z
Alignment Newsletter One Year Retrospective 2019-04-10T07:00:34.021Z
Thoughts on the "Meta Trap" 2016-12-20T21:36:39.498Z
EA Berkeley Spring 2016 Retrospective 2016-09-11T06:37:02.183Z
EAGxBerkeley 2016 Retrospective 2016-09-11T06:27:16.316Z


Comment by rohinmshah on The motivated reasoning critique of effective altruism · 2021-09-15T22:02:24.750Z · EA · GW

It’s so easy to collapse into the arms of “if there’s even a small chance X will make a very good future more likely …” As with consequentialism, I totally buy the logic of this! The issue is that it’s incredibly easy to hide motivated reasoning in this framework. Figuring out what’s best to do is really hard, and this line of thinking conveniently ends the inquiry (for people who want that).

I have seen something like this happen, so I'm not claiming it doesn't, but it feels pretty confusing to me. The logic pretty clearly doesn't hold up. Even if you accept that "very good future" is all that matters, you still need to optimize for the action that most increases the probability of a very good future, and that's still a hard question, and you can't just end the inquiry with this line of thinking.

Comment by rohinmshah on The motivated reasoning critique of effective altruism · 2021-09-15T21:55:29.164Z · EA · GW

Yeah, I agree that would also count (and as you might expect I also agree that it seems quite hard to do).

Basically with (b) I want to get at "the model does something above and beyond what we already had with verbal arguments"; if it substantially affects the beliefs of people most familiar with the field that seems like it meets that criterion.

Comment by rohinmshah on The motivated reasoning critique of effective altruism · 2021-09-15T13:37:10.371Z · EA · GW

The obvious response here is that I don't think longtermist questions are more amenable to explicit quantitative modeling than global poverty, but I'm even more suspicious of other methodologies here.

Yeah, I'm just way, way more suspicious of quantitative modeling relative to other methodologies for most longtermist questions.

I think we might just be arguing about different things here?

Makes sense, I'm happy to ignore those sorts of methods for the purposes of this discussion.

Medicine is less amenable to empirical testing than physics, but that doesn't mean that clinical intuition is a better source of truth for the outcomes of drugs than RCTs.

You can't run an RCT on arms races between countries, whether or not AGI leads to extinction, whether totalitarian dictatorships are stable, whether civilizational collapse would be a permanent trajectory change vs. a temporary blip, etc.

What's the actual evidence for this?

It just seems super obvious in almost every situation that comes up? I also don't really know how you expect to get evidence; it seems like you can't just "run an RCT" here, when a typical quantitative model for a longtermist question takes ~a year to develop (and that's in situations that are selected for being amenable to quantitative modeling).

For example, here's a subset of the impact-related factors I considered when I was considering where to work:

  1. Lack of non-xrisk-related demands on my time
  2. Freedom to work on what I want
  3. Ability to speak publicly
  4. Career flexibility
  5. Salary

I think incorporating just these factors into a quantitative model is a hell of an ask (and there are others I haven't listed here -- I haven't even included the factors for the academia vs industry question). A selection of challenges:

  1. I need to make an impact calculation for the research I would do by default.
  2. I need to make that impact calculation comparable with donations (so somehow putting them in the same units).
  3. I need to predict the counterfactual research I would do at each of the possible organizations if I didn't have the freedom to work on what I wanted, and quantify its impact, again in similar units.
  4. I need to model the relative importance of technical research that tries to solve the problem vs. communication.
  5. To model the benefits of communication, I need to model field-building benefits, legitimizing benefits, and the benefit of convincing key future decision-makers.
  6. I need to quantify the probability of various kinds of "risks" (the org I work at shuts down, we realize AI risk isn't actually a problem, a different AI lab reveals that they're going to get to AGI in 2 years, unknown unknowns) in order to quantify the importance of career flexibility.

I think just getting a framework that incorporates all of these things is already a Herculean effort that really isn't worth it, and even if you did make such a framework, I would be shocked if you could set the majority of the inputs based on actually good reference classes rather than just "what my gut says". (And that's all assuming I don't notice a bunch more effects I failed to mention initially that my intuitions were taking into account but that I hadn't explicitly verbalized.)

It seems blatantly obvious that the correct choice here is not to try to get to the point of "quantitative model that captures the large majority of the relevant considerations with inputs that have some basis in reference classes / other forms of legible evidence", and I'd be happy to take a 100:1 bet that you wouldn't be able to produce a model that meets that standard (as I evaluate it) in 1000 person-hours.
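
For a sense of scale, the odds in that bet can be turned into an implied confidence. This is a minimal sketch that assumes "100:1" means staking 100 units to win 1 if no such model is produced; the function name is hypothetical.

```python
def break_even_probability(stake: float, payout: float) -> float:
    """Probability of winning at which risking `stake` to win `payout` has zero expected value."""
    # Solve p * payout - (1 - p) * stake = 0 for p.
    return stake / (stake + payout)

# Assumed reading of the bet: risk 100 units to win 1 unit.
p = break_even_probability(stake=100.0, payout=1.0)
print(f"Break-even probability of winning: {p:.4f}")
```

Under that reading, the bet is only attractive to someone who puts the chance of being wrong at under about 1%.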

I have similar reactions for most other cost effectiveness analyses in longtermism. (For quantitative modeling in general, it depends on the question, but I expect I would still often have this reaction.)

Eg, weird to use median staff member's views as a proxy for truth

If you mean that the weighting on saving vs. improving lives comes from the median staff member, note that GiveWell has been funding research that aims to set these weights using more legible evidence, because that evidence didn't exist. In some sense this is my point: if you want legible evidence, you need to put in large amounts of time and money to generate it; this problem is worse in the longtermist space, and generating that evidence is rarely worth it.

Comment by rohinmshah on The motivated reasoning critique of effective altruism · 2021-09-15T08:19:35.058Z · EA · GW

Replied to Linch -- TL;DR: I agree this is true compared to global poverty or animal welfare, and I would defend this as simply the correct way to respond to actual differences in the questions asked in longtermism vs. those asked in global poverty or animal welfare.

You could move me by building an explicit quantitative model for a popular question of interest in longtermism that (a) didn't previously have models (so e.g. patient philanthropy or AI racing doesn't count), (b) has an upshot that we didn't previously know via verbal arguments, (c) doesn't involve subjective personal guesses or averages thereof for important parameters, and (d) I couldn't immediately tear a ton of holes in that would call the upshot into question.

Comment by rohinmshah on The motivated reasoning critique of effective altruism · 2021-09-15T08:12:45.150Z · EA · GW

My guess is that longtermist EAs (like almost all humans) have never been that close to purely quantitative models guiding decisions

I agree with the literal meaning of that, because it is generally a terrible idea to just do what a purely quantitative model tells you (and I'll note that even GiveWell isn't doing this). But imagining the spirit of what you meant, I suspect I disagree.

I don't think you should collapse it into the single dimension of "how much do you use quantitative models in your decisions". It also matters how amenable the decisions are to quantitative modeling. I'm not sure how you're distinguishing between the two hypotheses:

  1. Longtermists don't like quantitative modeling in general.
  2. Longtermist questions are not amenable to quantitative modeling, and so longtermists don't do much quantitative modeling, but they would if they tackled questions that were amenable to quantitative modeling.

(Unless you want to defend the position that longtermist questions are just as easy to model as, say, those in global poverty? That would be... an interesting position.)

Also, just for the sake of actual evidence, here are some attempts at modeling, biased towards AI since that's the space I know. Not all are quantitative, and none of them are cost effectiveness analyses.

  1. Open Phil's reports on AI timelines: Biological anchors, Modeling the Human Trajectory, Semi-informative priors, brain computation, probability of power-seeking x-risk
  2. Races: racing to the precipice, followup
  3. Mapping out arguments: MTAIR and its inspiration

 going from my stereotype of weeatquince(2020)'s views

Fwiw, my understanding is that weeatquince(2020) is very pro modeling, and is only against the negation of the motte. The first piece of advice in that post is to use techniques like assumption based planning, exploratory modeling, and scenario planning, all of which sound to me like "explicit modeling". I think I personally am a little more against modeling than weeatquince(2020).

Comment by rohinmshah on The motivated reasoning critique of effective altruism · 2021-09-15T07:36:14.629Z · EA · GW

Comment by rohinmshah on The motivated reasoning critique of effective altruism · 2021-09-14T22:10:39.367Z · EA · GW

Overall great post, and I broadly agree with the thesis. (I'm not sure the evidence you present is all that strong though, since it too is subject to a lot of selection bias.) One nitpick:

Most of the posts’ comments were critical, but they didn’t positively argue against EV calculations being bad for longtermism. Instead they completely disputed that EV calculations were used in longtermism at all!

I think you're (unintentionally) running a motte-and-bailey here.

Motte: Longtermists don't think you should build explicit quantitative models, take their best guess at the inputs, chug through the math, and do whatever the model says, irrespective of common sense, verbal arguments, model uncertainty, etc.

Bailey: Longtermists don't think you should use numbers or models (and as a corollary don't consider effectiveness).

(My critical comment on that post claimed the motte; later I explicitly denied the bailey.)

Comment by rohinmshah on How to succeed as an early-stage researcher: the “lean startup” approach · 2021-09-12T17:45:36.979Z · EA · GW

In that example, Alice has ~5 min of time to give feedback to Bob; in Toby's case the senior researchers are (in aggregate) spending at least multiple hours providing feedback (where "Bob spent 15 min talking to Alice and seeing what she got excited about" counts as 15 min of feedback from Alice). That's the major difference.

I guess one way you could interpret Toby's advice is to simply get a project idea from a senior person, and then go work on it yourself without feedback from that senior person -- I would disagree with that particular advice. I think it's important to have iterative / continual feedback from senior people.

Comment by rohinmshah on How to succeed as an early-stage researcher: the “lean startup” approach · 2021-09-10T21:02:06.637Z · EA · GW

I agree substituting the question would be bad, and sometimes there aren't any relevant experts in which case you shouldn't defer to people. (Though even then I'd consider doing research in an unrelated area for a couple of years, and then coming back to work on the question of interest.)

I admit I don't really understand how people manage to have a "driving question" overwritten -- I can't really imagine that happening to me and I am confused about how it happens to other people.

(I think sometimes it is justified, e.g. you realize that your question was confused, and the other work you've done has deconfused it, but it does seem like often it's just that they pick up the surrounding culture and just forget about the question they cared about in the first place.)

So I guess this seems like a possible risk. I'd still bet pretty strongly against any particular junior researcher's intuition being better, so I still think this advice is good on net.

(I'm mostly not engaging with the quantum example because it sounds like a very just-so story to me and I don't know enough about the area to evaluate the just-so story.)

Comment by rohinmshah on How to succeed as an early-stage researcher: the “lean startup” approach · 2021-09-10T20:52:29.309Z · EA · GW

so it's e.g. the mesa-optimizers paper or multiple LW posts by John Wentworth. As far as I can tell, none of these seems to be following the proposed 'formula for successful early-career research'.

I think the mesa optimizers paper fits the formula pretty well? My understanding is that the junior authors on that paper interacted a lot with researchers at MIRI (and elsewhere) while writing that paper.

I don't know John Wentworth's history. I think it's plausible that if I did, I wouldn't have thought of him as a junior researcher (even before seeing his posts). If that isn't true, I agree that's a good counterexample.

My impression is PhD students in AI in Berkeley [...]

I agree the advice is particularly suited to this audience, for the reasons you describe.

the actual advice would be more complex and nuanced, something like "update on the idea taste of people who are better/are comparable and have spent more time thinking about something, but be sceptical and picky about your selection of people"

That sounds like the advice in this post? You've added a clause about being picky about the selection of people, which I agree with, but other than that it sounds pretty similar to what Toby is suggesting. If so I'm not sure why a caveat is needed.

Perhaps you think something like "if someone [who is better or who is comparable and has spent more time thinking about something than you] provides feedback, then you should update, but it isn't that important and you don't need to seek it out"?

Relevant comparison is something like "over 80% of them would think they should have spent marginally more time thinking about ideas of more senior AI people at Berkeley, and more time on problems they were given by senior people, and smaller amount of time thinking about their own ideas, and working on projects based on their ideas". Would you guess the answer would still be 80%? 

I agree that's more clearly targeting the right thing, but still not great, for a couple of reasons:

  • The question is getting pretty complicated, which I think makes answers a bit more random.
  • Many students are too deferential throughout their PhD, and might correctly say that they should have explored their own ideas more -- without this implying that the advice in this post is wrong.
  • Lots of people do in fact take an approach that is roughly "do stuff your advisor says, and over time become more independent and opinionated"; idk what they would say.

I do predict though that they mostly won't say things like "my ideas during my first year were good, I would have had more impact had I just followed my instincts and ignored my advisor". (I guess one exception is that if they hated the project their advisor suggested, but slogged through it anyway, then they might say that -- but I feel like that's more about motivation rather than impact.)

Comment by rohinmshah on How to succeed as an early-stage researcher: the “lean startup” approach · 2021-09-10T09:38:13.011Z · EA · GW

My impression from talking to friends working in ML is that usually faculty have ideas that they'd be excited to see their senior grad students work on, senior grad students have research ideas that they'd love for junior grad students to implement, and so forth.

I think this is true if the senior person can supervise the junior person doing the implementation (which is time-expensive). I have lots of project ideas that I expect I could supervise. I have ~no project ideas where I expect I could spend an hour talking to someone, have them go off for a few months and implement it, and then I'd be interested in their results. Something will come up along the way that requires replanning, and if I'm not around to tell them how to replan, they're going to do it in a way that makes me much less excited about the results.

Comment by rohinmshah on How to succeed as an early-stage researcher: the “lean startup” approach · 2021-09-10T09:32:36.665Z · EA · GW

I'm considering three types of advice:

  1. "Always defer to experts"
  2. "Defer to experts for ~3 years, then trust your intuitions"
  3. "Always trust your intuitions"

When you said

But to steelman (steel-alien?) his view a little, I worry that EA is overinvested in outside-view/forecasting types (like myself?), rather than people with strong and true convictions/extremely high-quality initial research taste, which (quality-weighted) may be making up the majority of revolutionary progress.

And if we tell the future Geoffrey Hintons (and Eliezer Yudkowskys) of the world to be more deferential and trust their intuitions less relative to elite consensus or the literature, we're doing the world/our movement a disservice, even if the advice is likely to be individually useful/good for most researchers in terms of expected correctness of beliefs or career advancement. 

I thought you were claiming "maybe 3 > 1", so my response was "don't do 1 or 3, do 2".

If you're instead claiming "maybe 3 > 2", I don't really get the argument. It doesn't seem like advice #2 is obviously worse than advice #3 even for junior Eliezers and Geoffreys. (It's hard to say for those two people: in Eliezer's case, there were no experts to defer to at the time, and I don't know enough details about Geoffrey to evaluate which advice would be good for him.)

I think Geoffrey Hinton's advice was targeted at very junior people.

Oh, I agree that's probably true. I think he's wrong to give that advice. I'm generally pretty okay with ignoring expert advice to amateurs if you have reason to believe it's bad; experts usually don't remember what it was like to be an amateur and so it's not that surprising that their advice on what amateurs should do is not great. (EDIT: Here's a new post that goes into more detail on this.)

Comment by rohinmshah on How to succeed as an early-stage researcher: the “lean startup” approach · 2021-09-09T17:56:57.156Z · EA · GW

What % do you think this is true for, quality-weighted? 

Weighted by quality after graduating? Still > 50%, probably > 80%, but it's really just a lot harder to tell (I don't have enough data). I'd guess that the best people still had "bad ideas" when they were starting out.

(I think a lot of what makes a junior researcher's idea "bad" is that the researcher doesn't know about existing work, or has misinterpreted the goal of the field, or lacks intuitions gained from hands-on experience, etc. It is really hard to compensate for a lack of knowledge with good intuition or strong armchair reasoning, and I think junior researchers should make it a priority to learn this sort of stuff.)

Re: the rest of your comment, I think you're reading more into my comment than I said or meant. I do not think researchers should generally be deferential; I think they should have strong beliefs, that may in fact go against expert consensus. I just don't think this is the right attitude while you are junior. Some quotes from my FAQ:

When selecting research projects, when you’re junior you should generally defer to your advisor. As time passes you should have more conviction. I very rarely see a first year’s research intuitions beat a professor’s; I have seen this happen more often for fourth years and above.


There’s a longstanding debate about whether one should defer to some aggregation of experts (an “outside view”), or try to understand the arguments and come to your own conclusion (an “inside view”). This debate mostly focuses on which method tends to arrive at correct conclusions. I am not taking a stance on this debate; I think it’s mostly irrelevant to the problem of doing good research. Research is typically meant to advance the frontiers of human knowledge; this is not the same goal as arriving at correct conclusions. If you want to advance human knowledge, you’re going to need a detailed inside view.

[followed by a longer example in which the correct thing to do is to ignore the expert]

Comment by rohinmshah on How to succeed as an early-stage researcher: the “lean startup” approach · 2021-09-09T15:47:10.958Z · EA · GW

I'm not going to go into much detail here, but I disagree with all of these caveats. I think this would be a worse post if it included the first and third caveats (less sure about the second).

First caveat: I think > 95% of incoming PhD students in AI at Berkeley have bad ideas (in the way this post uses the phrase). I predict that if you did a survey of people who have finished their PhD in AI at Berkeley, over 80% of them would think their initial ideas were significantly worse than their later ideas. (Note also that AI @ Berkeley is a very selective program.)

Second caveat: I'd say that the post applies to technical AI safety, at the very least, though it's plausible it doesn't generalize further. (That would surprise me though.)

Third caveat: This doesn't seem true to me in AI safety according to my definition of "best", though idk exactly which outputs you're thinking of and why you think they're "best".

Comment by rohinmshah on rohinmshah's Shortform · 2021-08-29T16:37:59.573Z · EA · GW

I'm not objecting to providing the information (I think that is good), I'm objecting to calling it a "conflict of interest".

I'd be much more keen on something like this (source):

For transparency, note that the reports for the latter three rows are all Open Philanthropy analyses, and I am co-CEO of Open Philanthropy.

Comment by rohinmshah on rohinmshah's Shortform · 2021-08-25T15:43:47.112Z · EA · GW

I sometimes see people arguing for people to work in area A, and declaring a conflict of interest that they are personally working on area A.

If they already were working in area A for unrelated reasons, and then they produced these arguments, it seems reasonable to be worried about motivated reasoning.

On the other hand, if because of these arguments they switched to working in area A, this is in some sense a signal of sincerity ("I'm putting my career where my mouth is").

I don't like the norm of declaring your career as a "conflict of interest", because it implies that you are in the former rather than latter category, regardless of which one is actually true. (And the latter is especially common in EA.) However, I don't really have a candidate alternative norm.

Comment by rohinmshah on Phil Torres' article: "The Dangerous Ideas of 'Longtermism' and 'Existential Risk'" · 2021-08-06T10:44:15.125Z · EA · GW

He asserts that "numerous people have come forward, both publicly and privately, over the past few years with stories of being intimidated, silenced, or 'canceled.'"  This doesn't match my experience.

I also have not had this experience, though that doesn't mean it didn't happen, and I'd want to take this seriously if it did happen.

However, Phil Torres has demonstrated that he isn't above bending the truth in service of his goals, so I'm inclined not to believe him. See previous discussion here. Example from the new article:

It’s not difficult to see how this way of thinking could have genocidally catastrophic consequences if political actors were to “[take] Bostrom’s argument to heart,” in Häggström’s words.

My understanding (sorry that the link is probably private) is that Torres is very aware that Häggström generally agrees with longtermism and provides the example as a way not to do longtermism, but that doesn't stop Torres from using it to argue that this is what longtermism implies and therefore all longtermists are horrible.

I should note that even if this were written by someone else, I probably wouldn't have investigated the supposed intimidation, silencing, or canceling even in the absence of this example, because: 

  1. It seems super unlikely for people I know to intimidate / silence / cancel
  2. Claims of "lots of X has happened" without evidence tend to be exaggerated
  3. Haters gonna hate, the hate should not be expected to correlate with truth

But in this case I feel especially justified in not investigating.

Comment by rohinmshah on What is the role of public discussion for hits-based Open Philanthropy causes? · 2021-08-05T07:04:20.089Z · EA · GW

This is my best attempt at summarizing a reasonable outsider's view of the current state of affairs. Before publication, I had this sanity checked (though not necessarily endorsed) by an EA researcher with more context. Apologies in advance if it misrepresents the actual state of affairs, but that's precisely the thing I'm trying to clarify for myself and others.

I just want to note that I think this question is great and does not misrepresent the actual state of affairs.

I do think there's hope for some quantitative estimates even in the speculative cases; for example Open Phil has mentioned that they are investigating the "value of the last dollar" to serve as a benchmark against which to compare current spending (though to my knowledge they haven't said how they're investigating it).

Comment by rohinmshah on Aligning Recommender Systems as Cause Area · 2021-08-03T21:42:48.234Z · EA · GW

Unfortunately I don't really have the time to do this well, and I think it would be a pretty bad post if I wrote the version that would be ~2 hours of effort or less.

The next Alignment Newsletter will include two articles on recommender systems that mostly disagree with the "recommender systems are driving polarization" position; you might be interested in those. (In fact, I did this shallow dive because I wanted to make sure I wasn't neglecting arguments pointing in the opposite direction.)

EDIT: To be clear, I'd be excited for someone else to develop this into a post. The majority of my relevant thoughts are in the comments I already wrote, which anyone should feel free to use :)

Comment by rohinmshah on Aligning Recommender Systems as Cause Area · 2021-08-02T14:34:07.490Z · EA · GW

The result is software that is extremely addictive, with a host of hard-to-measure side effects on users and society including harm to relationships, reduced cognitive capacity, and political radicalization.

As far as I can tell, this is all the evidence given in this post that there is in fact a problem. Two of the four links are news articles, which I ignore on the principle that news articles are roughly uncorrelated with the truth. (On radicalization I've seen specific arguments against the claim.) One seems to be a paper studying what users believe about the Facebook algorithm (I don't see any connection to "harm to relationships"; if anything, the paper talks about how people use Facebook to maintain relationships). The last one is a paper whose abstract does in fact talk about phones reducing cognitive capacity, but (a) most papers are garbage, (b) beware the man of one study, and (c) why blame recommender systems for that, when it could just as easily be (say) email that's the problem?

Overall I feel pretty unconvinced that there even is a major problem with recommender systems. (I'm not convinced that there isn't a problem either.)

You could argue that since recommender systems have huge scale, any changes you make will be impactful, regardless of whether there is a problem or not. However, if there isn't a clear problem that you are trying to fix, I think you are going to have huge sign uncertainty on the impact of any given change, so the EV seems pretty low.
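
To make the sign-uncertainty point concrete, here is a minimal sketch. It assumes a change helps or harms by the same magnitude, and the magnitude and probabilities are made-up numbers for illustration, not estimates of anything in the post.

```python
def expected_value(p_positive: float, magnitude: float) -> float:
    """EV of a change worth +magnitude with probability p_positive, else -magnitude."""
    return p_positive * magnitude - (1.0 - p_positive) * magnitude

# Hypothetical scale of impact, in arbitrary units.
MAGNITUDE = 1_000_000.0

# Hypothetical levels of confidence that the change is net positive.
for p in (0.5, 0.55, 0.9):
    ev = expected_value(p, MAGNITUDE)
    print(f"P(positive) = {p:.2f}: EV is {ev / MAGNITUDE:.0%} of the raw magnitude")
```

With no idea of the sign (probability 0.5) the expected value is zero however large the scale is; the scale only pays off to the extent that you know which direction is good.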


The main argument of this post seems to be that this cause area would have spillover effects into AGI alignment, so maybe I'm being unfair by focusing on whether or not there's a problem. But if that's your primary motivation, I think you should just do whatever seems best to address AGI alignment, which I expect won't be to work on recommender systems. (Note that the skills needed for recommender alignment are also needed for some flavors of AGI alignment research, so personal fit won't usually change the calculus much.)


Before you point me to Tristan Harris, I've engaged with (some of) those arguments too, see my thoughts here.

Comment by rohinmshah on Case studies of self-governance to reduce technology risk · 2021-07-14T21:00:03.039Z · EA · GW

Planned summary for the Alignment Newsletter:

Should we expect AI companies to reduce risk through self-governance? This post investigates six historical cases, of which the two most successful were the Asilomar conference on recombinant DNA, and the actions of Leo Szilard and other physicists in 1939 (around the development of the atomic bomb). It is hard to make any confident conclusions, but the author identifies the following five factors that make self-governance more likely:

1. The risks are salient.
2. If self-governance doesn’t happen, then the government will step in with regulation (which is expected to be poorly designed).
3. The field is small, so that coordination is easier.
4. There is support from gatekeepers (e.g. academic journals).
5. There is support from credentialed scientists.

Comment by rohinmshah on Can money buy happiness? A review of new data · 2021-07-08T21:15:34.234Z · EA · GW

Nice find, thanks!

(For others: note that the linked blog post also considers things like "maybe they just uploaded the wrong data" to be a plausible explanation.)

Comment by rohinmshah on Getting started independently in AI Safety · 2021-07-07T16:57:59.607Z · EA · GW

(See response to rory_greig above)

Comment by rohinmshah on Getting started independently in AI Safety · 2021-07-07T16:56:01.502Z · EA · GW

 you can attempt a deep RL project, realise you are hopelessly out of your depth, then you know you'd better go through Spinning Up in Deep RL before you can continue. 

Tbc, I do generally like the idea of just in time learning. But:

  • You may not realize when you are hopelessly out of your depth ("doesn't everyone say that ML is an art where you just turn knobs until things work?" or "how was I supposed to know that the algorithm was going to silently clip my rewards, making all of my reward shaping useless?")
  • You may not know what you don't know. In the example I gave you probably very well know that you don't know RL, but you may not realize that you don't know the right tooling to use ("what, there's a Tensorboard dashboard I can use to visualize my training curves?")

Both of these are often avoided by taking courses that (try to) present the details to you in the right order.
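
As an illustration of the reward-clipping example in the first bullet, here is a minimal sketch of how clipping can erase reward shaping. The clip range of [-1, 1] and the reward values are assumptions chosen for illustration; this is not a claim about any particular library's behavior.

```python
def clip_reward(reward: float, low: float = -1.0, high: float = 1.0) -> float:
    """Clamp a reward into [low, high], as some training pipelines do before learning."""
    return max(low, min(high, reward))

# Hypothetical shaped rewards meant to rank outcomes by how good they are.
shaped = {"small_progress": 2.0, "big_progress": 10.0, "goal_reached": 100.0}

# What the learner would actually see after clipping.
clipped = {name: clip_reward(value) for name, value in shaped.items()}

for name in shaped:
    print(f"{name}: shaped={shaped[name]}, clipped={clipped[name]}")
```

All three outcomes end up with the same clipped reward, so the carefully chosen differences between them never reach the learner, and nothing in the training loop raises an error to tell you so.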

Comment by rohinmshah on Getting started independently in AI Safety · 2021-07-07T07:28:02.460Z · EA · GW

I think too many people feel held back from doing a project like this on their own.

Absolutely. Also, too many people don't feel held back enough (e.g. maybe it really would have been beneficial to, say, go through Spinning Up in Deep RL before attempting a deep RL project). How do you tell which group you're in? 

(This comment inspired by Reversing Advice)

Comment by rohinmshah on Can money buy happiness? A review of new data · 2021-06-28T06:57:47.950Z · EA · GW

If we change the y-axis to display a linear relationship, this tells a different story. In fact, we see a plateauing of the relationship between income and experienced wellbeing, just as found in Kahneman and Deaton (2010), but just at a later point — about $200,000 per year.

Uhh... that shouldn't happen from just re-plotting the same data. In fact, how is it that in the original graph, there is an increase from $400,000 to $620,000, but in the new linear axis graph, there is a decrease?

A doubling of income is associated with about a 1-point increase on a 0–100 scale, which seems (to our eyes) surprisingly small.

In the context of the previous paragraph ("this doesn't mean go work at Goldman Sachs"), this seems to imply that rich people shouldn't get more money because it barely makes a difference, but the same reasoning applies to poor people, casting doubt on whether we should bother giving money away. I don't know if it was meant to imply the first point, but it gave me vibes of selectively interpreting the data to support a desired conclusion.
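A rough sketch of why the "1 point per doubling" claim does not single out the rich: under a purely logarithmic model (my assumption here, with a made-up baseline income and score), the gain from doubling is the same at every income level.

```python
import math

SLOPE_PER_DOUBLING = 1.0  # from the quoted claim: about 1 point per doubling
BASELINE_INCOME = 10_000  # hypothetical reference income
BASELINE_SCORE = 60.0     # hypothetical score at the reference income


def wellbeing(income):
    """Log-linear wellbeing model (an assumption, not the post's fitted curve)."""
    return BASELINE_SCORE + SLOPE_PER_DOUBLING * math.log2(income / BASELINE_INCOME)


def gain_from_doubling(income):
    """Change in modelled wellbeing when income goes from `income` to 2 * `income`."""
    return wellbeing(2 * income) - wellbeing(income)


gain_low_income = gain_from_doubling(20_000)
gain_high_income = gain_from_doubling(200_000)
print(gain_low_income, gain_high_income)  # both are about 1 point
```

So if a 1-point gain is "surprisingly small" for someone going from $200,000 to $400,000, the same model says it is equally small for someone going from $20,000 to $40,000.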

Comment by rohinmshah on Ben Garfinkel's Shortform · 2021-06-27T10:08:09.391Z · EA · GW

I agree with this general point. I'm not sure if you think this is an interesting point to notice that's useful for building a world-model, and/or a reason to be skeptical of technical alignment work. I'd agree with the former but disagree with the latter.

Comment by rohinmshah on High Impact Careers in Formal Verification: Artificial Intelligence · 2021-06-15T17:02:08.763Z · EA · GW

Planned summary for the Alignment Newsletter:

This post considers the applicability of formal verification techniques to AI alignment. Now in order to “verify” a property, you need a specification of that property against which to verify. The author considers three possibilities:

1. **Formally specifiable safety:** we can write down a specification for safe AI, _and_ we’ll be able to find a computational description or implementation

2. **Informally specifiable safety:** we can write down a specification for safe AI mathematically or philosophically, but we will not be able to produce a computational version

3. **Nonspecifiable safety:** we will never write down a specification for safe AI.

Formal verification techniques are applicable only to the first case. Unfortunately, it seems that no one expects the first case to hold in practice: even CHAI, with its mission of building provably beneficial AI systems, is talking about proofs in the informal specification case (which still includes math), on the basis of comments like these in Human Compatible. In addition, it currently seems particularly hard for experts in formal verification to impact actual practice, and there doesn’t seem to be much reason to expect that to change. As a result, the author is relatively pessimistic about formal verification as a route to reduce existential risk from failures of AI alignment.

Comment by rohinmshah on Progress studies vs. longtermist EA: some differences · 2021-06-03T17:34:35.484Z · EA · GW

I just think there's a much greater chance that we look back on it and realize, too late, that we were focused on entirely the wrong things.

If you mean like 10x greater chance, I think that's plausible (though larger than I would say). If you mean 1000x greater chance, that doesn't seem defensible.

In both fields you basically ~can't experiment with the actual thing you care about (you can't just build a superintelligent AI and check whether it is aligned; you mostly can't run an intervention on the entire world and check whether world GDP went up). You instead have to rely on proxies.

In some ways it is a lot easier to run proxy experiments for AI alignment -- you can train AI systems right now, and run actual proposals in code on those systems, and see what they do; this usually takes somewhere between hours and weeks. It seems a lot harder to do this for "improving GDP growth" (though perhaps there are techniques I don't know about).

I agree that PS has an advantage with historical data (though I don't see why economic theory is particularly better than AI theory), and this is a pretty major difference. Still, I don't think it goes from "good chance of making a difference" to "basically zero chance of making a difference".

The latter is something almost completely in the future, where we don't get any chances to get it wrong and course-correct.

Fwiw, I think AI alignment is relevant to current AI systems with which we have experience even if the catastrophic versions are in the future, and we do get chances to get it wrong and course-correct, but we can set that aside for now, since I'd probably still disagree even if I changed my mind on that. (Like, it is hard to do armchair theory without experimental data, but it's not so hard that you should conclude that you're completely doomed and there's no point in trying.)

Comment by rohinmshah on Help me find the crux between EA/XR and Progress Studies · 2021-06-02T20:11:25.322Z · EA · GW

I've been perceiving a lot of EA/XR folks to be in (3) but maybe you're saying they're more in (2)?


Maybe it turns out that most folks in each community are between (1) and (2) toward the other. That is, we're just disagreeing on relative priority and neglectedness.

That's what I would say.

I can't see it as literally the only thing worth spending any marginal resources on (which is where some XR folks have landed).

If you have opportunity A where you get a benefit of 200 per $ invested, and opportunity B where you get a benefit of 50 per $ invested, you want to invest in A as much as possible, until the opportunity dries up. At a civilizational scale, opportunities dry up quickly (i.e. with millions, maybe billions of dollars), so you see lots of diversity. At EA scales, this is less true.

So I do agree that some XR folks (myself included) would, if given a pot of millions of dollars to distribute, allocate it all to XR; I don't think the same people would do it for e.g. trillions of dollars. (I don't know where in the middle it changes.)
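Here is a minimal sketch of that allocation logic. The benefit-per-dollar figures are the ones from the example above; the capacities (how much money each opportunity can absorb before it dries up) are made-up numbers, and treating returns as constant up to a hard cap is a simplification.

```python
def allocate(budget, opportunities):
    """Fund opportunities in order of benefit per dollar until each one's capacity runs out.

    `opportunities` is a list of (name, benefit_per_dollar, capacity_in_dollars).
    """
    allocation = {}
    remaining = budget
    ranked = sorted(opportunities, key=lambda opp: opp[1], reverse=True)
    for name, _benefit_per_dollar, capacity in ranked:
        spend = min(remaining, capacity)
        allocation[name] = spend
        remaining -= spend
    return allocation


# Hypothetical capacities: A dries up after $1B, B after $50B.
opportunities = [
    ("A", 200, 1_000_000_000),
    ("B", 50, 50_000_000_000),
]

millions_pot = allocate(10_000_000, opportunities)
trillions_pot = allocate(1_000_000_000_000, opportunities)
print(millions_pot)
print(trillions_pot)
```

With a pot of millions, everything goes to A; with a pot of trillions, A is saturated almost immediately and the money spreads out, which is the diversity you see at civilizational scale.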

I think Open Phil, at the billions of dollars range, does in fact invest in lots of opportunities, some of which are arguably about improving progress. (Though note that they are not "fully" XR-focused, see e.g. Worldview Diversification.)

Comment by rohinmshah on Help me find the crux between EA/XR and Progress Studies · 2021-06-02T19:57:10.455Z · EA · GW

I kinda sorta answered Q2 above (I don't really have anything to add to it).

Q3: I'm not too clear on this myself. I'm just an object-level AI alignment researcher :P

Q4: I broadly agree this is a problem, though I think this:

Before PS and EA/XR even resolve our debate, the car might be run off the road—either as an accident caused by fighting groups, or on purpose.

seems pretty unlikely to me, where I'm interpreting it as "civilization stops making any progress and regresses to the lower quality of life from the past, and this is a permanent effect". 

I haven't thought about it much, but my immediate reaction is that it seems a lot harder to influence the world in a good way through the public, and so other actions seem better. That being said, you could search for "raising the sanity waterline" (probably more so on LessWrong than here) for some discussion of approaches to this sort of social progress (though it isn't about educating people about the value of progress in particular).

Comment by rohinmshah on Help me find the crux between EA/XR and Progress Studies · 2021-06-02T19:51:33.504Z · EA · GW

If XR weighs so strongly (1e15 future lives!) that you are, in practice, willing to accept any cost (no matter how large) in order to reduce it by any expected amount (no matter how small), then you are at risk of a Pascal's Mugging.

Sure. I think most longtermists wouldn't endorse this (though a small minority probably would).

But when the proposal becomes: “we should not actually study progress or try to accelerate it”, I get lost.

I don't think this is negative, I think there are better opportunities to affect the future (along the lines of Ben's comment).

I think this is mostly true of other EA / XR folks as well (or at least, if they think it is negative, they aren't confident enough in it to actually say "please stop progress in general"). As I mentioned above, people (including me) might say it is negative in specific areas, such as AGI development, but not more broadly.

And it's unclear to me whether this would even increase or decrease XR, let alone the amount—in any case I think there are very wide error bars on that estimate.

I agree with that (and I think most others would too).

Comment by rohinmshah on Progress studies vs. longtermist EA: some differences · 2021-06-02T19:29:43.942Z · EA · GW

But EA/XR folks don't seem to be primarily advocating for specific safety measures. Instead, what I hear (or think I'm hearing) is a kind of generalized fear of progress. Again, that's where I get lost. I think that (1) progress is too obviously valuable and (2) our ability to actually predict and control future risks is too low.

I think there's a fear of progress in specific areas (e.g. AGI and certain kinds of bio) but not a general one? At least I'm in favor of progress generally and against progress in some specific areas where we have good object-level arguments for why progress in those areas in particular could be very risky.

(I also think EA/XR folks are primarily advocating for the development of specific safety measures, and not for us to stop progress, but I agree there is at least some amount of "stop progress" in the mix.)

Re: (2), I'm somewhat sympathetic to this, but all the ways I'm sympathetic to it seem to also apply to progress studies (i.e. I'd be sympathetic to "our ability to influence the pace of progress is too low"), so I'm not sure how this becomes a crux.

Comment by rohinmshah on Help me find the crux between EA/XR and Progress Studies · 2021-06-02T19:13:38.623Z · EA · GW

If you're willing to accept GCR in order to slightly reduce XR, then OK—but it feels to me that you've fallen for a Pascal's Mugging.

Eliezer has specifically said that he doesn't accept Pascal's Mugging arguments in the x-risk context

I wouldn't agree that this is a Pascal's Mugging. In fact, in a comment on the post you quote, Eliezer says:

If an asteroid were genuinely en route, large enough to wipe out humanity, possibly stoppable, and nobody was doing anything about this 10% probability, I would still be working on FAI but I would be screaming pretty loudly about the asteroid on the side. If the asteroid is just going to wipe out a country, I'll make sure I'm not in that country and then keep working on x-risk.

I usually think of Pascal's Mugging as centrally about cases where you have a tiny probability of affecting the world in a huge way. In contrast, your example seems to be about trading off between uncertain large-sized effects and certain medium-sized effects. ("Medium" is only meant to be relative to "large", obviously both effects are huge on some absolute scale.)

Perhaps your point is that XR can only make a tiny, tiny dent in the probability of extinction; I think most XR folks would have one of two responses:

  1. No, we can make a reasonably large dent. (This would be my response.) Off the top of my head I might say that the AI safety community as a whole could knock off ~3 percentage points from x-risk.
  2. X-risk is so over-determined (i.e. > 90%, maybe > 99%) that even though we can't affect it much, there's no other intervention that's any better (and in particular, progress studies doesn't matter because we die before it has any impact).

The other three questions you mention don't feel cruxy.

The second one (default-good vs. default-bad) doesn't really make sense to me -- I'd say something like "progress tends to increase our scope of action, which can lead to major improvements in quality of life, and also increases the size of possible risks (especially from misuse)".

Comment by rohinmshah on Draft report on existential risk from power-seeking AI · 2021-06-01T22:09:36.248Z · EA · GW

Results are in this post.

Comment by rohinmshah on Final Report of the National Security Commission on Artificial Intelligence (NSCAI, 2021) · 2021-06-01T14:13:29.378Z · EA · GW

A lot of longtermists do pay attention to this sort of stuff, they just tend not to post on the EA Forum / LessWrong. I personally heard about the report from many different people after it was published, and also from a couple of people even before it was published (when there was a chance to provide input on it).

In general I expect that for any sufficiently large object-level thing, the discourse on the EA Forum will lag pretty far behind the discourse of people actively working on that thing (whether that discourse is public or not). I read the EA Forum because (1) I'm interested in EA and (2) I'd like to correct misconceptions about AI alignment in EA. I would not read it as a source of articles relevant to AI alignment (though every once in a while they do come up).

Comment by rohinmshah on Draft report on existential risk from power-seeking AI · 2021-05-08T16:04:38.265Z · EA · GW

If AGI doom were likely, what additional evidence would we expect to see?

  1. Humans are pursuing convergent instrumental subgoals much more. (Related question: will AGIs want to take over the world?)
    1. A lot more anti-aging research is going on.
    2. Children's inheritances are ~always conditional on the child following some sort of rule imposed by the parent, intended to further the parent's goals after their death.
    3. Holidays and vacations are rare; when they are taken it is explicitly a form of rejuvenation before getting back to earning tons of money.
    4. Humans look like they are automatically strategic.
  2. Humans are way worse at coordination. (Related question: can humans coordinate to prevent AI risk?)
    1. Nuclear war happened some time after WW2.
    2. Airplanes crash a lot more.
    3. Unions never worked.
  3. Economic incentives point strongly towards generality rather than specialization. (Related question: how general will AI systems be? Will they be capable of taking over the world?)
    1. Universities don't have "majors", instead they just teach you how to be more generally intelligent.
    2. (Really the entire world would look hugely different if this were the case; I struggle to imagine it.)

There's probably more, I haven't thought very long about it.

(Before responses of the form "what about e.g. the botched COVID response?", let me note that this is about additional evidence; I'm not denying that there is existing evidence.)

Comment by rohinmshah on Draft report on existential risk from power-seeking AI · 2021-04-30T21:32:26.128Z · EA · GW

I think that at least 80% of the AI safety researchers at MIRI, FHI, CHAI, OpenAI, and DeepMind would currently assign a >10% probability to this claim: "The research community will fail to solve one or more technical AI safety problems, and as a consequence there will be a permanent and drastic reduction in the amount of value in our future."

If you're still making this claim now, want to bet on it? (We'd first have to operationalize who counts as an "AI safety researcher".)

I also think it wasn't true in Sep 2017, but I'm less confident about that, and it's not as easy to bet on.

Comment by rohinmshah on A conversation with Rohin Shah · 2021-04-24T17:29:29.672Z · EA · GW

In that sentence I meant "a treacherous turn that leads to an existential catastrophe", so I don't think the example you link updates me strongly on that.

While Luke talks about that scenario as an example of a treacherous turn, you could equally well talk about it as an example of "deception", since the evolved creatures are "artificially" reducing their rates of reproduction to give the supervisor / algorithm a "false belief" that they are bad at reproducing. Another example along these lines is when a robot hand "deceives" its human overseer into thinking that it has grasped a ball, when the hand is in fact just in front of the ball.

I think really though these examples aren't that informative because it doesn't seem reasonable to say that the AI system is "trying" to do something in these examples, or that it does some things "deliberately". These behaviors were learned through trial and error. An existential catastrophe style treacherous turn would presumably not happen through trial and error. (Even if it did, it seems like there must have been at least some cases where it tried and failed to take over the world, which seems like a clear and obvious warning shot, that we for some reason completely ignored.)

(If it isn't clear, the thing that I care about is something like "will there be some 'warning shot' that greatly increases the level of concern people have about AI systems, before it is too late".)

Comment by rohinmshah on Coherence arguments imply a force for goal-directed behavior · 2021-04-06T22:04:05.123Z · EA · GW

I respond here; TL;DR is that I meant a different thing than the thing Katja is responding to.

Comment by rohinmshah on Layman’s Summary of Resolving Pascallian Decision Problems with Stochastic Dominance · 2021-02-28T03:15:12.846Z · EA · GW

In a working paper, Christian Tarsney comes up with a clever resolution to this conflict

Fwiw, I was expecting that the "resolution" would be an argument for why you shouldn't take the wager.

If you do consider it a resolution: if Alice said she would torture a googol people if you didn't give her $5, would you give her the $5? (And if so, would you keep doing it if she kept upping the price, after you had already paid it?)

Comment by rohinmshah on Is this a good way to bet on short timelines? · 2020-11-30T19:39:13.443Z · EA · GW

Counterfactuals are hard. I wouldn't be committing to donate it. (Also, if I were going to donate it, but it would have been donated anyway, then $4,000 no longer seems worth it if we ignore the other benefits.)

I expect at least one of us to update at least slightly.

I agree with "at least slightly".

I'd be interested to know why you disagree

Idk, empirically when I discuss things with people whose beliefs are sufficiently different from mine, it doesn't seem like their behavior changes much afterwards, even if they say they updated towards X. Similarly, when people talk to me, I often don't see myself making any particular changes to how I think or behave. There's definitely change over the course of a year, but it feels extremely difficult to ascribe that to particular things, and I think it more often comes from reading things that people wrote, rather than talking to them.

Comment by rohinmshah on Is this a good way to bet on short timelines? · 2020-11-29T17:10:00.302Z · EA · GW

I'm happy to sell an hour of my time towards something with no impact at $1,000, so that puts an upper bound of $4,000. (Though currently I've overcommitted myself, so for the next month or two it might be ~2x higher.)

That being said, I do think it's valuable for people working on AI safety to at least understand each other's positions; if you don't think you can do that re: my position, I'd probably be willing to have that conversation without being paid at all (after the next month or two). And I do expect to understand your position better, though I don't expect to update towards it, so that's another benefit.

Comment by rohinmshah on Is this a good way to bet on short timelines? · 2020-11-28T17:15:25.650Z · EA · GW

I'm pretty sure I have longer timelines than you. On each of the bets:

  1. I would take this, but also I like to think if I did update towards your position I would say that anyway (and I would say that you got it right earlier if you asked me to do so, to the extent that I thought you got it right for the right reasons or something).
  2. I probably wouldn't take this (unless X was quite high), because I don't really expect either of us to update to the other's position.
  3. I wouldn't take this; I am very pessimistic about my ability to do research that I'm not inside-view excited about (like, my 50% confidence interval is that I'd have 10-100x less impact even in the case where someone with the same timelines as me is choosing the project, if they don't agree with me on research priorities). It isn't necessary that someone with shorter timelines than me would choose projects I'm not excited about, but from what I know about what you care about working on, I think it would be the case here. Similarly, I am pessimistic about your ability to do research on broad topics that I choose on my inside view. (This isn't specific to you; it applies to anyone who doesn't share most of my views.)

Comment by rohinmshah on Avoiding Munich's Mistakes: Advice for CEA and Local Groups · 2020-10-16T00:20:32.863Z · EA · GW

Yeah, I think I agree with everything you're saying. I think we were probably thinking of different aspects of the situation -- I'm imagining the sorts of crusades that were given as examples in the OP (for which a good faith assumption seems straightforwardly wrong, and a bad faith assumption seems straightforwardly correct), whereas you're imagining other situations like a university withdrawing affiliation (where it seems far more murky and hard to label as good or bad faith).

Also, I realize this wasn't clear before, but I emphatically don't think that making threats is necessarily immoral or even bad; it depends on the context (as you've been elucidating).

Comment by rohinmshah on Avoiding Munich's Mistakes: Advice for CEA and Local Groups · 2020-10-15T16:06:35.491Z · EA · GW

I agree with parts of this and disagree with other parts.

First off:

First, if she is acting in good faith, pre-committing to refuse any compromise for 'do not give in to bullying' reasons means one always ends up at one's respective BATNAs even if there were mutually beneficial compromises to be struck.

Definitely agree that pre-committing seems like a bad idea (as you could probably guess from my previous comment).

Second, wrongly presuming bad faith for Alice seems apt to induce her to make a symmetrical mistake presuming bad faith for you. To Alice, malice explains well why you were unwilling to even contemplate compromise, why you considered yourself obliged out of principle to persist with actions that harm her interests, and why you call her desire to combat misogyny bullying and blackmail.

I agree with this in the abstract, but for the specifics of this particular case, do you in fact think that online mobs / cancel culture / groups who show up to protest your event without warning should be engaged with on a good faith assumption? I struggle to imagine any of these groups accepting anything other than full concession to their demands, such that you're stuck with the BATNA regardless.

(I definitely agree that if someone emails you saying "I think this speaker is bad and you shouldn't invite him", and after some discussion they say "I'm sorry but I can't agree with you and if you go through with this event I will protest / criticize you / have the university withdraw affiliation", you should not treat this as a bad faith attack. Afaik this was not the case with EA Munich, though I don't know the details.)


Re: the first five paragraphs: I feel like this is disagreeing on how to use the word "bully" or "threat", rather than anything super important. I'll just make one note:

Alice is still not a bully even if her motivating beliefs re. Bob are both completely mistaken and unreasonable. She's also still not a bully even if Alice's implied second-order norms are wrong (e.g. maybe the public square would be better off if people didn't stridently object to hosting speakers based on their supposed views on topics they are not speaking upon, etc.)

I'd agree with this if you could reasonably expect to convince Alice that she's wrong on these counts, such that she then stops doing things like

(e.g.) protest this event, stridently criticise the group in the student paper for hosting him, petition the university to withdraw affiliation

But otherwise, given that she's taking actions that destroy value for Bob without generating value for Alice (except via their impact on Bob's actions), I think it is fine to think of this as a threat. (I am less attached to the bully metaphor -- I meant that as an example of a threat.)

Comment by rohinmshah on Avoiding Munich's Mistakes: Advice for CEA and Local Groups · 2020-10-15T05:51:53.547Z · EA · GW

Yeah, I'm aware that is the emotional response (I feel it too), and I agree the game theoretic reason for not giving in to threats is important. However, it's certainly not a theorem of game theory that you always do better if you don't give in to threats, and sometimes giving in will be the right decision.

we will find you and we will make sure it was not worth it for you, at the cost of our own resources

This is often not an option. (It seems pretty hard to retaliate against an online mob, though I suppose you could randomly select particular members to retaliate against.)

Another good example is bullying. A child has ~no resources to speak of, and bullies will threaten to hurt them unless they do X. Would you really advise this child not to give in to the bully?

(Assume for the sake of the hypothetical the child has already tried to get adults involved and it has done ~nothing, as I am told is in fact often the case. No, the child can't coordinate with other children to fight the bully, because children are not that good at coordinating.)

Comment by rohinmshah on Avoiding Munich's Mistakes: Advice for CEA and Local Groups · 2020-10-14T20:24:03.871Z · EA · GW

It seems like you believe that one's decision of whether or not to disinvite a speaker should depend only on one's beliefs about the speaker's character, intellectual merits, etc. and in particular not on how other people would react.

Suppose that you receive a credible threat that if you let already-invited person X speak at your event, then multiple bombs would be set off, killing hundreds of people. Can we agree that in that situation it is correct to cancel the event?

If so, then it seems like at least in extreme cases, you agree that the decision of whether or not to hold an event can depend on how other people react. I don't see why you seem to assume that in the EA Munich case, the consequences are not bad enough that EA Munich's decision is reasonable.

Some plausible (though not probable) consequences of hosting the talk:

  • Protests disrupting the event (this has previously happened to a local EA group)
  • Organizers themselves get cancelled
  • Most members of the club leave due to risk of the above or disagreements with the club's priorities

At least the first two seem quite bad; there's room for debate on the third.

In addition, while I agree that the extremes of cancel culture are in fact very harmful for EA, it's hard to argue that disinviting a speaker is anywhere near the level of any of the examples you give. Notably, they are not calling for a mob to e.g. remove Robin Hanson from his post, they are simply cancelling one particular talk that he was going to give at their venue. This definitely does have a negative impact on norms, but it doesn't seem obvious to me that the impact is very large.

Separately, I think it is also reasonable for a random person to come to believe that Robin Hanson is not arguing in good faith.

(Note: I'm still undecided on whether the decision itself was good.)

Comment by rohinmshah on Getting money out of politics and into charity · 2020-10-06T18:22:45.544Z · EA · GW

I'm super excited that you're doing this! It's something I've wanted to exist for a long time, and I considered doing it myself a few years ago. It definitely seems like the legal issues are the biggest hurdle. Perhaps I'm being naively optimistic, but I was at least somewhat hopeful that you could get the political parties to not hate you, by phrasing it as "we're taking away money from the other party".

I'm happy to chat about implementation details; unfortunately, I'm pretty busy and can't actually commit enough time to help with, you know, actual implementation. Also unfortunately, it seems I have a similar background to you, and so wouldn't really complement your knowledge very well.

If I were to donate to politics (which I could see happening), I would very likely use this service if it existed.

Comment by rohinmshah on Getting money out of politics and into charity · 2020-10-06T18:14:50.145Z · EA · GW

The obvious criticism, I think, is: "couldn't they benefit more from keeping the money?"

You want people to not have the money any more, otherwise e.g. a single Democrat with a $1K budget could donate repeatedly to match ten Republicans donating $1K each.

I'm not sure what the equilibrium would be, but it seems likely it would evolve towards all money being exactly matched, being returned to the users, and then being donated to the parties the normal way. Or perhaps people would stop using it altogether.
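A toy sketch of the exploit, under simplifying assumptions of my own: each Republican donates exactly once, matches are processed one at a time, and the function and parameter names are hypothetical rather than part of any real platform.

```python
def republican_dollars_neutralized(democrat_budget, republican_donations, refund_matched):
    """Count how many Republican dollars a single Democrat can cancel out via matching.

    If `refund_matched` is True, matched money is returned to the donor,
    so the Democrat can reuse the same budget for the next match.
    """
    cash = democrat_budget
    neutralized = 0
    for donation in republican_donations:
        if cash < donation:
            break  # the Democrat can no longer cover a match
        cash -= donation  # Democrat puts up the matching money
        neutralized += donation
        if refund_matched:
            cash += donation  # matched money comes straight back
    return neutralized


ten_republicans = [1_000] * 10

with_refund = republican_dollars_neutralized(1_000, ten_republicans, refund_matched=True)
to_charity = republican_dollars_neutralized(1_000, ten_republicans, refund_matched=False)
print(with_refund, to_charity)
```

When matched money is refunded, a $1K budget cancels all $10K of opposing donations; when it goes to charity instead, the same budget cancels only $1K.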

Another important detail here is which charities the money goes to -- the Republican donor may not feel great if, after matching, the Democrat's donation goes to e.g. Planned Parenthood. In the long run, I'd probably try to do surveys of users to find out which charities they'd object to the other side giving to, and not include those. But initially it could just be GiveWell charities for simplicity.

Re choice of charities

It seems pretty important for this sort of venture to build trust with users and have a lot of legitimacy. So, I think it is probably better to let people choose their own charities (excluding political ones for the reasons mentioned above).

You can still sway donations quite a lot based on the default behavior of the platform. In the long run, I'd probably have GiveWell charities as defaults (where you can point to GiveWell's analysis for legitimacy, and you mostly don't have to worry about room for more funding), and (if you wanted to be longtermist) maybe also a section of "our recommended charities" that is more longtermist with explanations of why those charities were selected.