Some thoughts on deference and inside-view models 2020-05-28T05:37:14.979Z · score: 109 (45 votes)
My personal cruxes for working on AI safety 2020-02-13T07:11:46.803Z · score: 118 (50 votes)
Thoughts on doing good through non-standard EA career pathways 2019-12-30T02:06:03.032Z · score: 140 (65 votes)
"EA residencies" as an outreach activity 2019-11-17T05:08:42.119Z · score: 86 (41 votes)
I'm Buck Shlegeris, I do research and outreach at MIRI, AMA 2019-11-15T22:44:17.606Z · score: 120 (62 votes)
A way of thinking about saving vs improving lives 2015-08-08T19:57:30.985Z · score: 2 (4 votes)


Comment by buck on The academic contribution to AI safety seems large · 2020-07-31T16:20:15.536Z · score: 27 (10 votes) · EA · GW

Thanks for writing this post--it was useful to see the argument written out so I could see exactly where I agreed and disagreed. I think lots of people agree with this but I've never seen it written up clearly before.

I think I place substantial weight (30% or something) on you being roughly right about the relative contributions of EA safety and non-EA safety. But I think it's more likely that the penalty on non-EA safety work is larger than you think. 

I think the crux here is that I think AI alignment probably requires really focused attention, and research done by people who are trying to do something else will probably end up not being very helpful for some of the core problems.

It's a little hard to evaluate the counterfactuals here, but I'd much rather have the contributions from EA safety than from non-EA safety over the last ten years.

I think that it might be easier to assign a value to the discount factor by assessing the total contributions of EA safety and non-EA safety. I think that EA safety does something like 70% of the value-weighted work, which suggests a much bigger discount factor than 80%.
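To make the arithmetic behind that inference concrete, here is a rough sketch. It assumes that a "discount factor" of d means each unit of non-EA work keeps (1 - d) of its raw value, and the raw work amounts (1 unit of EA work against 10 units of non-EA work) are purely hypothetical placeholders, not figures from the post or its model.

```python
def ea_share(ea_work: float, other_work: float, discount: float) -> float:
    """Value-weighted share of EA work, assuming non-EA work keeps
    (1 - discount) of its raw value."""
    return ea_work / (ea_work + (1.0 - discount) * other_work)


def implied_discount(ea_work: float, other_work: float, target_share: float) -> float:
    """Discount at which EA work makes up `target_share` of the
    value-weighted total (the inverse of ea_share)."""
    kept = ea_work * (1.0 - target_share) / (target_share * other_work)
    return 1.0 - kept


# Hypothetical raw amounts of work, in arbitrary units (assumptions for illustration).
EA_WORK = 1.0
OTHER_WORK = 10.0

if __name__ == "__main__":
    print("EA share at an 80% discount:", ea_share(EA_WORK, OTHER_WORK, 0.80))
    print("Discount implied by a 70% EA share:", implied_discount(EA_WORK, OTHER_WORK, 0.70))
```

Under these made-up amounts, an 80% discount leaves EA safety with a minority of the value-weighted work, so a 70% share needs a discount well above 80%. The exact number depends entirely on the assumed raw amounts.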


Assorted minor comments:

But this is only half of the ledger. One of the big advantages of academic work is the much better distribution of senior researchers: EA Safety seems bottlenecked on people able to guide and train juniors

Yes, but those senior researchers won't necessarily have useful things to say about how to do safety research. (In fact, my impression is that most people doing safety research in academia have advisors who don't have very smart thoughts on long term AI alignment.)

None of those parameters is obvious, but I make an attempt in the model (bottom-left corner).

I think the link is to the wrong model?

A cursory check of the model

In this section you count nine safety-relevant things done by academia over two decades, and then note that there were two things from within EA safety last year that seem more important. This doesn't seem to mesh with your claim about their relative productivity.

Comment by buck on The academic contribution to AI safety seems large · 2020-07-31T15:57:43.901Z · score: 8 (5 votes) · EA · GW

MIRI is not optimistic about prosaic AGI alignment and doesn't put much time into it.

Comment by buck on How strong is the evidence of unaligned AI systems causing harm? · 2020-07-23T03:15:29.565Z · score: 11 (6 votes) · EA · GW

I don’t think the evidence is very good; I haven’t found it more than slightly convincing. I don’t think that the harms of current systems are a very good line of argument for potential dangers of much more powerful systems.

Comment by buck on Intellectual Diversity in AI Safety · 2020-07-22T22:22:42.374Z · score: 10 (7 votes) · EA · GW

I'm curious what your experience was like when you started talking to AI safety people after already coming to some of your own conclusions. Eg I'm curious if you think that you missed major points that the AI safety people had spotted which felt obvious in hindsight, or if you had topics on which you disagreed with the AI safety people and think you turned out right.

Comment by buck on Are there lists of causes (that seemed promising but are) known to be ineffective? · 2020-07-09T05:25:08.184Z · score: 8 (5 votes) · EA · GW

In an old post, Michael Dickens writes:

The closest thing we can make to a hedonium shockwave with current technology is a farm of many small animals that are made as happy as possible. Presumably the animals are cared for by people who know a lot about their psychology and welfare and can make sure they’re happy. One plausible species choice is rats, because rats are small (and therefore easy to take care of and don’t consume a lot of resources), definitively sentient, and we have a reasonable idea of how to make them happy.
Thus creating 1 rat QALY costs $120 per year, which is $240 per human QALY per year.
This is just a rough back-of-the-envelope calculation so it should not be taken literally, but I’m still surprised by how cost-inefficient this looks. I expected rat farms to be highly cost-effective based on the fact that most people don’t care about rats, and generally the less people care about some group, the easier it is to help that group. (It’s easier to help developing-world humans than developed-world humans, and easier still to help factory-farmed animals.) Again, I could be completely wrong about these calculations, but rat farms look less promising than I had expected.

I think this is a good example of something seeming like a plausible idea for making the world better, but which turned out to seem pretty ineffective.

Comment by buck on Concern, and hope · 2020-07-07T16:48:26.251Z · score: 11 (4 votes) · EA · GW

What current controversy are you saying might make moderate pro-SJ EAs more wary of SSC?

Comment by buck on Concern, and hope · 2020-07-07T16:14:34.863Z · score: 6 (7 votes) · EA · GW

I have two complaints: linking to a post which I think was made in bad faith in an attempt to harm EA, and seeming to endorse it by using it as an example of a perspective that some EAs have.

I think you shouldn't update much on what EAs think based on that post, because I think it was probably written in an attempt to harm EA by starting flamewars.

EDIT: Also, I kind of think of that post as trying to start nasty rumors about someone; I think we should generally avoid signal boosting that type of thing.

Comment by buck on KR's Shortform · 2020-07-07T16:05:28.629Z · score: 2 (1 votes) · EA · GW

I'd be interested to see a list of what kinds of systematic mistakes previous attempts at long-term forecasting made.

Also, I think that many longtermists (eg me) think it's much more plausible to successfully influence the long run future now than in the 1920s, because of the hinge of history argument.

Comment by buck on Concern, and hope · 2020-07-06T02:41:46.375Z · score: 34 (16 votes) · EA · GW

Many other people who are personally connected to the Chinese Cultural Revolution are the people making the comparisons, though. Eg the EA who I see posting the most about this (who I don't think would want to be named here) is Chinese.

Comment by buck on Concern, and hope · 2020-07-06T02:40:04.579Z · score: 26 (10 votes) · EA · GW

I think that both the Cultural Revolution comparisons and the complaints about Cultural Revolution comparisons are way less bad than that post.

Comment by buck on Concern, and hope · 2020-07-05T18:50:00.477Z · score: 21 (11 votes) · EA · GW
culminating in the Slate Star Codex controversy of the past two weeks

I don't think that the SSC kerfuffle is that related to the events that have caused people to worry about cultural revolutions. In particular, most of the complaints about the NYT plan haven't been related to the particular opinions Scott has written about.

Comment by buck on Concern, and hope · 2020-07-05T18:11:41.303Z · score: 23 (14 votes) · EA · GW

Edit: the OP has removed the link I’m complaining about.

I think it's quite bad to link to that piece. The piece makes extremely aggressive accusations and presents very little evidence to back them up; it was extensively criticised in the comments. I think that piece isn't an example of people being legitimately concerned; it's an example of someone behaving extremely badly.

Another edit: I am 80% confident that the author of that piece is not actually a current member of the EA community, and I am more than 50% confident that the piece was written mostly with an intention of harming EA. This is a lot of why I think it's bad to link to it. I didn't say this in my initial comment, sorry.

Comment by buck on - A Petition · 2020-06-29T16:15:18.253Z · score: 22 (14 votes) · EA · GW

In addition to what Aaron said, I’d guess Scott is responsible for probably 10% of EA recruiting over the last few years.

Comment by buck on KR's Shortform · 2020-06-19T04:56:15.357Z · score: 14 (6 votes) · EA · GW

I think there are many examples of EAs thinking about the possibility that AI might be sentient by default. Some examples I can think of off the top of my head:

I don't think people are disputing that it would be theoretically possible for AIs to be conscious, I think that they're making the claim that AI systems we find won't be.

Comment by buck on Some thoughts on deference and inside-view models · 2020-06-03T14:50:41.952Z · score: 6 (4 votes) · EA · GW
My view is roughly that EAs were equally disposed to be deferential then as they are now (if there were a clear EA consensus then, most of these EAs would have deferred to it, as they do now), but that "because the 'official EA consensus' (i.e. longtermism) is more readily apparent" now, people's disposition to defer is more apparent.

This is an interesting possibility. I still think there's a difference. For example, there's a lot of disagreement within AI safety about what kind of problems are important and how to work on them, and most EAs (and AI safety people) seem much less inclined to try to argue with each other about this than I think we were at Stanford EA.

Agreed, but I can't remember the last time I saw someone try to argue that you should donate to AMF rather than longtermism.

I think this is probably a mixture of longtermism winning over most people who'd write this kind of post, and also that people are less enthusiastic about arguing about cause prio these days for whatever reason. I think the post would be received well inasmuch as it was good. Maybe we're agreeing here?

Whenever I do see near-termism come up, people don't seem afraid to communicate that they think that it is obviously indefensible, or that they think even a third-rate longtermist intervention is probably incomparably better than AMF because at least it's longtermist.

I don't see people say that very often. Eg I almost never see people say this in response to posts about neartermism on the EA Facebook group, or on posts here.

Comment by buck on Some thoughts on deference and inside-view models · 2020-06-03T05:14:15.783Z · score: 9 (5 votes) · EA · GW

I just looked up the proof of Fermat's Last Theorem, and it came about from Andrew Wiles spotting that someone else had recently proven something which could plausibly be turned into a proof, and then working on it for seven years. This seems like a data point in favor of the end-to-end models approach.

Comment by buck on Some thoughts on deference and inside-view models · 2020-06-03T05:12:33.646Z · score: 8 (4 votes) · EA · GW
I think the history of maths also provides some suggestive examples of the dangers of requiring end-to-end stories. E.g., consider some famous open questions in Ancient mathematics that were phrased in the language of geometric constructions with ruler and compass, such as whether it's possible to 'square the circle'. It was solved 2,000 years after it was posed using modern number theory. But if you had insisted that everyone working on it has an end-to-end story for how what they're doing contributes to solving that problem, I think there would have been a real risk that people continue thinking purely in ruler-and-compass terms and we never develop modern number theory in the first place.

I think you're interpreting me to say that people ought to have an externally validated end-to-end story; I'm actually just saying that they should have an approach which they think might be useful, which is weaker.

Comment by buck on Some thoughts on deference and inside-view models · 2020-06-03T05:00:41.387Z · score: 13 (5 votes) · EA · GW
I've heard this impression from several people, but it's unclear to me whether EAs have become more deferential, although it is my impression that many EAs are currently highly deferential

Here's what leads me to think EA seems more deferential now.

I spent a lot of time with the Stanford EA club in 2015 and 2016, and was close friends with many of the people there. We related to EA very differently to how I relate to EA now, and how most newer/younger EAs I talk to seem to relate to it.

The common attitude was something like "we're utilitarians, and we want to do as much good as we can. EA has some interesting people and interesting ideas in it. However, it's not clear who we can trust; there's lots of fiery debate about cause prioritization, and we just don't at all know whether we should donate to AMF or the Humane League or MIRI. There are EA orgs like CEA, 80K, MIRI, GiveWell, but it's not clear which of those people we should trust, given that the things they say don't always make sense to us, and they have different enough bottom line beliefs that some of them must be wrong."

It's much rarer nowadays for me to hear people have an attitude where they're wholeheartedly excited about utilitarianism but openly skeptical of the EA "establishment".

Part of this is that I think the arguments around cause prioritization are much better understood and less contentious now.

I think it's never been clearer or more acceptable to communicate implicitly or explicitly, that you think that people who support AMF (or other near-termist) probably just 'don't get' longtermism and aren't worth engaging with.

I feel like there are many fewer EA forum posts and Facebook posts where people argue back and forth about whether to donate to AMF or more speculative things than there used to be.

Comment by buck on Some thoughts on deference and inside-view models · 2020-06-03T04:47:50.358Z · score: 23 (7 votes) · EA · GW

This comment is a general reply to this whole thread.

Some clarifications:

  • I don't think that we should require that people working in AI safety have arguments for their research which are persuasive to anyone else. I'm saying I think they should have arguments which are persuasive to them.
  • I think that good plans involve doing things like playing around with ideas that excite you, and learning subjects which are only plausibly related if you have a hunch it could be helpful; I do these things a lot myself.
  • I think there's a distinction between having an end-to-end story for your solution strategy vs the problem you're trying to solve--I think it's much more tractable to choose unusually important problems than to choose unusually effective research strategies.
    • In most fields, the reason you can pick more socially important problems is that people aren't trying very hard to do useful work. It's a little more surprising that you can beat the average in AI safety by trying intentionally to do useful work, but my anecdotal impression is that people who choose what problems to work on based on a model of what problems would be important to solve are still noticeably more effective.

Here's my summary of my position here:

  • I think that being goal directed is very helpful to making progress on problems on a week-by-week or month-by-month scale.
  • I think that within most fields, some directions are much more promising than others, and backchaining is required in order to work on the promising directions. AI safety is a field like this. Math is another--if I decided to try to do good by going into math, I'd end up doing research which was really different from what normal mathematicians do. I agree with Paul Christiano's old post about this.
  • If I wanted to maximize my probability of solving the Riemann hypothesis, I'd probably try to pursue some crazy plan involving weird strengths of mine and my impression of blind spots of the field. However, I don't think this is actually that relevant, because I think that the important work in AI safety (and most other fields of relevance to EA) is less competitive than solving the Riemann hypothesis, and also a less challenging mathematical problem.
  • I think that in my experience, people who do the best work on AI safety generally have a clear end-to-end picture of the story for what work they need to do, and people who don't have such a clear picture rarely do work I'm very excited about. Eg I think Nate Soares and Paul Christiano are both really good AI safety researchers, and both choose their research directions very carefully based on their sense of what problems are important to solve.

Sometimes I talk to people who are skeptical of EA because they have a stronger version of the position you're presenting here--they think that nothing useful ever comes of people intentionally pursuing research that they think is important, and the right strategy is to pursue what you're most interested in.

One way of thinking about this is to imagine that there are different problems in a field, and different researchers have different comparative advantages at the problems. In one extreme case, the problems vary wildly in importance, and so the comparative advantage basically doesn't matter and you should work on what's most important. In the other extreme, it's really hard to get a sense of which things are likely to be more useful than other things, and your choices should be dominated by comparative advantage.

(Incidentally, you could also apply this to the more general problem of deciding what to work on as an EA. My personal sense is that the differences in values between different cause areas are big enough to basically dwarf comparative advantage arguments, but within a cause area comparative advantage is the dominant consideration.)

I would love to see a high quality investigation of historical examples here.

Comment by buck on Some thoughts on deference and inside-view models · 2020-05-29T02:15:17.892Z · score: 10 (3 votes) · EA · GW

I mean on average; obviously you're right that our opinions are correlated. Do you think there's anything important about this correlation?

Comment by buck on The best places to donate for COVID-19 · 2020-03-28T02:11:06.977Z · score: 18 (7 votes) · EA · GW

You say that the impact/scale of COVID is "huge". I think this might mislead people who are used to thinking about the problems EAs think about. Here's why.

I think COVID is probably going to cause on the order of 100 million DALYs this year, based on predictions like this; I think that 50-95% of the damage ever done by COVID will be done this year. On the scale that 80000 Hours uses to assess the scale of problems, this would be ranked as importance level 11 or so.

I think this is lower than most things EAs consider working on or funding. For example:

This is a logarithmic scale, so for example, according to this scale, health in poor countries is 100 times more important than COVID.

So given that COVID seems likely to be between 100x and 10000x less important than the main other cause areas EAs think about, I think it's misleading to describe its scale as "huge".
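For readers not used to logarithmic scales, here is a small sketch of how a ratio of importance converts to a gap in scale points and back. The `points_per_tenfold` parameter is an assumption: the comment only says the scale is logarithmic, and the value 2.0 used below is an illustrative choice, not something stated here.

```python
import math


def level_gap(ratio: float, points_per_tenfold: float) -> float:
    """Gap in scale points between two problems whose importance differs by `ratio`."""
    return points_per_tenfold * math.log10(ratio)


def ratio_from_gap(gap: float, points_per_tenfold: float) -> float:
    """Importance ratio implied by a gap of `gap` points on the scale."""
    return 10.0 ** (gap / points_per_tenfold)


# Assumed calibration of the scale (hypothetical, for illustration only).
POINTS_PER_TENFOLD = 2.0

if __name__ == "__main__":
    for ratio in (100.0, 10_000.0):
        gap = level_gap(ratio, POINTS_PER_TENFOLD)
        print(f"{ratio:g}x more important -> {gap:g} points higher on the scale")
```

The point of the sketch is only that a few points on a logarithmic scale correspond to orders of magnitude: 100x and 10000x are two and four factors of ten respectively, whatever the scale's exact calibration.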

Comment by buck on What are the key ongoing debates in EA? · 2020-03-10T04:13:21.733Z · score: 14 (6 votes) · EA · GW

I'm interested in betting about whether 20% of EAs think psychedelics are a plausible top EA cause area. Eg we could sample 20 EAs from some group and ask them. Perhaps we could ask random attendees from last year's EAG. Or we could do a poll in EA Hangout.
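A sample of 20 is quite noisy for settling a bet like this, which a quick binomial calculation can illustrate. This is only a sketch: it assumes respondents are sampled independently from a population with a fixed true rate, and the function name and example rates are made up for illustration.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """Probability that at least k of n independent respondents say yes,
    when each says yes with probability p (binomial upper tail)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    n = 20
    threshold = 4  # 4 of 20 respondents is 20% of the sample
    for true_rate in (0.10, 0.20, 0.30):  # hypothetical true rates
        chance = prob_at_least(threshold, n, true_rate)
        print(f"true rate {true_rate:.0%}: P(at least {threshold}/{n} say yes) = {chance:.3f}")
```

Running it for a few candidate true rates shows how often a sample of 20 would land on either side of the 20% line, which is useful for deciding on the sample size and the odds before agreeing to the bet.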

Comment by buck on On Becoming World-Class · 2020-02-26T05:05:38.619Z · score: 9 (8 votes) · EA · GW

I think that it's important for EA to have a space where we can communicate efficiently, rather than phrase everything for the benefit of newcomers who might be reading, so I think that this is bad advice.

Comment by buck on My personal cruxes for working on AI safety · 2020-02-25T18:08:56.430Z · score: 6 (4 votes) · EA · GW

I'd prefer something like the weaker and less clear statement "we **can** think ahead, and it's potentially valuable to do so even given the fact that people might try to figure this all out later".

Comment by buck on My personal cruxes for working on AI safety · 2020-02-25T16:25:34.040Z · score: 2 (1 votes) · EA · GW

I think your summary of crux three is slightly wrong: I didn’t say that we need to think about it ahead of time, I just said that we can.

Comment by buck on My personal cruxes for working on AI safety · 2020-02-25T16:24:18.068Z · score: 7 (5 votes) · EA · GW

Yeah, for the record I also think those are pretty plausible and important sources of impact for AI safety research.

I think that either way, it’s useful for people to think about which of these paths to impact they’re going for with their research.

Comment by buck on Max_Daniel's Shortform · 2020-02-23T01:00:51.794Z · score: 14 (7 votes) · EA · GW
My guess is I consider the activities you mentioned less valuable than you do. Probably the difference is largest for programming at MIRI and smallest for Hubinger-style AI safety research. (This would probably be a bigger discussion.)

I don't think that peculiarities of what kinds of EA work we're most enthusiastic about lead to much of the disagreement. When I imagine myself taking on various different people's views about what work would be most helpful, most of the time I end up thinking that valuable contributions could be made to that work by sufficiently talented undergrads.

Independent of this, my guess would be that EA does have a decent number of unidentified people who would be about as good as people you've identified. E.g., I can think of ~5 people off the top of my head of whom I think they might be great at one of the things you listed, and if I had your view on their value I'd probably think they should stop doing what they're doing now and switch to trying one of these things. And I suspect if I thought hard about it, I could come up with 5-10 more people - and then there is the large number of people neither of us has any information about.

I am pretty skeptical of this. Eg I suspect that people like Evan (sorry Evan if you're reading this for using you as a running example) are extremely unlikely to remain unidentified, because one of the things that they do is think about things in their own time and put the results online. Could you name a profile of such a person, and which of the types of work I named you think they'd maybe be as good at as the people I named?

It might be quite relevant if "great people" refers only to talent or also to beliefs and values/preferences

I am not intending to include beliefs and preferences in my definition of "great person", except for preferences/beliefs like being not very altruistic, which I do count.

E.g. my guess is that there are several people who could be great at functional programming who either don't want to work for MIRI, or don't believe that this would be valuable. (This includes e.g. myself.)

I think my definition of great might be a higher bar than yours, based on the proportion of people who I think meet it? (To be clear I have no idea how good you'd be at programming for MIRI because I barely know you, and so I'm just talking about priors rather than specific guesses about you.)


For what it's worth, I think that you're not credulous enough of the possibility that the person you talked to actually disagreed with you--I think you might be doing that thing whose name I forget where you steelman someone into saying the thing you think instead of the thing they think.

Comment by buck on My personal cruxes for working on AI safety · 2020-02-21T05:55:30.655Z · score: 7 (5 votes) · EA · GW
For the problems-that-solve-themselves arguments, I feel like your examples have very "good" qualities for solving themselves: both personal and economic incentives are against them, they are obvious when one is confronted with the situation, and at the point where the problems becomes obvious, you can still solve them. I would argue that not all these properties holds for AGI. What are your thoughts about that?

I agree that it's an important question whether AGI has the right qualities to "solve itself". To go through the ones you named:

  • "Personal and economic incentives are aligned against them"--I think AI safety has somewhat good properties here. Basically no-one wants to kill everyone, and AI systems that aren't aligned with their users are much less useful. On the other hand, it might be the case that people are strongly incentivised to be reckless and deploy things quickly.
  • "they are obvious when one is confronted with the situation"--I think that alignment problems might be fairly obvious, especially if there's a long process of continuous AI progress where unaligned non-superintelligent AI systems do non-catastrophic damage. So this comes down to questions about how rapid AI progress will be.
  • "at the point where the problems become obvious, you can still solve them"--If the problems become obvious because non-superintelligent AI systems are behaving badly, then we can still maybe put more effort into aligning increasingly powerful AI systems after that and hopefully we won't lose that much of the value of the future.
Comment by buck on Max_Daniel's Shortform · 2020-02-21T05:36:33.990Z · score: 21 (7 votes) · EA · GW

I'm not quite sure how high your bar is for "experience", but many of the tasks that I'm most enthusiastic about in EA are ones which could plausibly be done by someone in their early 20s who eg just graduated university. Various tasks of this type:

  • Work at MIRI on various programming tasks which require being really smart and good at math and programming and able to work with type theory and Haskell. Eg we recently hired Seraphina Nix to do this right out of college. There are other people who are recent college graduates who we offered this job to who didn't accept. These people are unusually good programmers for their age, but they're not unique. I'm more enthusiastic about hiring older and more experienced people, but that's not a hard requirement. We could probably hire several more of these people before we became bottlenecked on management capacity.
  • Generalist AI safety research that Evan Hubinger does--he led the writing of "Risks from Learned Optimization" during a summer internship at MIRI; before that internship he hadn't had much contact with the AI safety community in person (though he'd read stuff online).
    • Richard Ngo is another young AI safety researcher doing lots of great self-directed stuff; I don't think he consumed an enormous amount of outside resources while becoming good at thinking about this stuff.
  • I think that there are inexperienced people who could do really helpful work with me on EA movement building; to be good at this you need to have read a lot about EA and be friendly and know how to talk to lots of people.

My guess is that EA does not have a lot of unidentified people who are as good at these things as the people I've identified.

I think that the "EA doesn't have enough great people" problem feels more important to me than the "EA has trouble using the people we have" problem.

Comment by buck on My personal cruxes for working on AI safety · 2020-02-20T17:25:14.538Z · score: 5 (3 votes) · EA · GW
One underlying hypothesis that was not explicitly pointed out, I think, was that you are looking for priority arguments. That is, part of your argument is about whether AI safety research is the most important thing you could do (It might be so obvious in an EA meeting or the EA forum that it's not worth exploring, but I like expliciting the obvious hypotheses).

This is a good point.

Whereas you could argue that without pure mathematics, almost all the positive technological progress we have now (from quantum mechanics to computer science) would not exist.

I feel pretty unsure on this point; for a contradictory perspective you might enjoy this article.

Comment by buck on Do impact certificates help if you're not sure your work is effective? · 2020-02-14T06:47:46.405Z · score: 6 (3 votes) · EA · GW

[for context, I've talked to Eli about this in person]

I'm interpreting you as having two concerns here.

Firstly, you're asking why this is different than you deferring to people about the impact of the two orgs.

From my perspective, the nice thing about the impact certificate setup is that if you get paid in org B impact certificates, you're making the people at orgs A and B put their money where their mouth is. Analogously, suppose Google is trying to hire me, but I'm actually unsure about Google's long term profitability, and I'd rather be paid in Facebook stock than Google stock. If Google pays me in Facebook stock, I'm not deferring to them about the relative values of these stocks, I'm just getting paid in Facebook stock, such that if Google is overvalued it's no longer my problem, it's the problem of whoever traded their Facebook stock for Google stock.

The reason why I think that the policy of maximizing impact certificates is better for the world in this case is that I think that people are more likely to give careful answers to the question "how relatively valuable is the work orgs A and B are doing" if they're thinking about it in terms of trying to make trades than if some random EA is asking for their quick advice.


Secondly, you're worrying that people might end up seeming like they're endorsing an org that they don't endorse, and that this might harm community epistemics. This is an interesting objection that I haven't thought much about. A few possible responses:

  • It's already currently an issue that people have different amounts of optimism about their workplaces, and people don't very often publicly state how much they agree and disagree with their employer (though I personally try to be clear about this). It's unlikely that impact equity trades will exacerbate this problem much.
  • Also, people often work at places for reasons that aren't "I think this is literally the best org", eg:
    • comparative advantage
    • thinking that the job is fun
    • the job paying them a high salary (this is exactly analogous to them paying in impact equity of a different org)
    • thinking that the job will give you useful experience
    • random fluke of who happened to offer you a job at a particular point
    • thinking the org is particularly flawed and so you can do unusual amounts of good by pushing it in a good direction
  • Also, if there were liquid markets in the impact equity of different orgs, then we'd have access to much higher-quality information about the community's guess about the relative promisingness of different orgs. So pushing in this direction would probably be overall helpful.
Comment by buck on My personal cruxes for working on AI safety · 2020-02-13T19:27:26.015Z · score: 15 (9 votes) · EA · GW
This was nice to read, because I'm not sure I've ever seen anyone actually admit this before.

Not everyone agrees with me on this point. Many safety researchers think that their path to impact is by establishing a strong research community around safety, which seems more plausible as a mechanism to affect the world 50 years out than the "my work is actually relevant" plan. (And partially for this reason, these people tend to do different research to me.)

You say you think there's a 70% chance of AGI in the next 50 years. How low would that probability have to be before you'd say, "Okay, we've got a reasonable number of people to work on this risk, we don't really need to recruit new people into AI safety"?

I don't know how large the AI safety field would have to be for marginal effort to be better spent elsewhere. Presumably this is a continuous thing rather than a discrete thing. Eg it seems to me that compared to five years ago, there are now way more people in AI safety, and so if your comparative advantage is in some other way of positively influencing the future, you should more strongly consider that other thing.

Comment by buck on Thoughts on doing good through non-standard EA career pathways · 2020-01-09T08:05:54.014Z · score: 6 (4 votes) · EA · GW
What do you think about participating in a forecasting platform, e.g. Good Judgement Open or Metaculus? It seems to cover all ingredients, and even be a good signal for others to evaluate your judgement quality.

Seems pretty good for predicting things about the world that get resolved on short timescales. Sadly it seems less helpful for practicing judgement about things like the following:

  • judging arguments about things like the moral importance of wild animal suffering, plausibility of AI existential risk, and existence of mental illness
  • long-term predictions
  • predictions about small-scale things like how a project should be organized (though you can train calibration on this kind of question)

Re my own judgement: I appreciate your confidence in me. I spend a lot of time talking to people who have IMO better judgement than me; most of the things I say in this post (and a reasonable chunk of things I say other places) are my rephrasings of their ideas. I think that people whose judgement I trust would agree with my assessment of my judgement quality as "good in some ways" (this was the assessment of one person I asked about this in response to your comment).

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-22T05:24:25.466Z · score: 10 (4 votes) · EA · GW
It seems that your current strategy is to focus on training, hiring and outreaching to the most promising talented individuals.

This seems like a pretty good summary of the strategy I work on, and it's the strategy that I'm most optimistic about.

Other alternatives might include more engagement with amateurs, and providing more assistance for groups and individuals that want to learn and conduct independent research.

I think that it would be quite costly and difficult for more experienced AI safety researchers to try to cause more good research to happen by engaging more with amateurs or providing more assistance to independent research. So I think that experienced AI safety researchers are probably going to do more good by spending more time on their own research than by trying to help other people with theirs. This is because I think that experienced and skilled AI safety researchers are much more productive than other people, and because I think that a reasonably large number of very talented math/CS people become interested in AI safety every year, so we can set a pretty high bar for which people to spend a lot of time with.

Also, what would change if you had 10 times the amount of management and mentorship capacity?

If I had ten times as many copies of various top AI safety researchers and I could only use them for management and mentorship capacity, I'd try to get them to talk to many more AI safety researchers, through things like weekly hour-long calls with PhD students, or running more workshops like MSFP.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-22T05:16:19.716Z · score: 18 (7 votes) · EA · GW
I’m a fairly good ML student who wants to decide on a research direction for AI Safety.

I'm not actually sure whether I think it's a good idea for ML students to try to work on AI safety. I am pretty skeptical of most of the research done by pretty good ML students who try to make their research relevant to AI safety--it usually feels to me like their work ends up not contributing to one of the core difficulties, and I think that they might have been better off if they'd instead spent their effort trying to become really good at ML, so that they're better skilled up for working on AI safety later.

I don't have very much better advice for how to get started on AI safety; I think the "recommend to apply to AIRCS and point at 80K and maybe the Alignment Newsletter" path is pretty reasonable.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-22T05:12:48.818Z · score: 11 (5 votes) · EA · GW

It was a good time; I appreciate all the thoughtful questions.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-22T05:08:39.326Z · score: 12 (6 votes) · EA · GW

Most of them are related to AI alignment problems, but it's possible that I should work specifically on them rather than other parts of AI alignment.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-22T05:07:53.499Z · score: 4 (2 votes) · EA · GW
I suppose that the latter goes a long way towards explaining the former.

Yeah, I suspect you're right.

Personally, there are few technologies that I think are likely to radically change the world within the next 100 years (assuming that your definition of radical is similar to mine). Maybe the only ones that would really qualify are bioengineering and nanotech. Even in those fields, though, I expect the pace of change to be fairly slow if AI isn't heavily involved.

I think there are a couple more radically transformative technologies which are reasonably likely over the next hundred years, eg whole brain emulation. And I suspect we disagree about the expected pace of change with bioengineering and maybe nanotech.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-21T21:12:37.844Z · score: 4 (2 votes) · EA · GW

Yeah, makes sense; I didn’t mean “unintentional” by “incidental”.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-21T02:16:18.724Z · score: 31 (11 votes) · EA · GW

I think of myself as making a lot of gambles with my career choices. And I suspect that regardless of which way the propositions turn out, I'll have an inclination to think that I was an idiot for not realizing them sooner. For example, I often have both the following thoughts:

  • "I have a bunch of comparative advantage at helping MIRI with their stuff, and I'm not going to be able to quickly reduce my confidence in their research directions. So I should stop worrying about it and just do as much as I can."
  • "I am not sure whether the MIRI research directions are good. Maybe I should spend more time evaluating whether I should do a different thing instead."

But even if it feels obvious in hindsight, it sure doesn't feel obvious now.

So I have big gambles that I'm making, which might turn out to be wrong, but which feel now like they will have been reasonable-in-hindsight gambles either way. The main two such gambles are thinking AI alignment might be really important in the next couple decades and working on MIRI's approaches to AI alignment instead of some other approach.

When I ask myself "what things have I not really considered as much as I should have", I get answers that change over time (because I ask myself that question pretty often and then try to consider the things that are important). At the moment, my answers are:

  • Maybe I should think about/work on s-risks much more
  • Maybe I spend too much time inventing my own ways of solving design problems in Haskell and I should study other people's more.
  • Maybe I am much more productive working on outreach stuff and I should do that full time.
  • (This one is only on my mind this week and will probably go away pretty soon) Maybe I'm not seriously enough engaging with questions about whether the world will look really different in a hundred years from how it looks today; perhaps I'm subject to some bias towards sensationalism and actually the world will look similar in 100 years.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-21T01:34:26.972Z · score: 10 (3 votes) · EA · GW

I hadn't actually noticed that.

One factor here is that a lot of AI safety research seems to need ML expertise, which is one of my least favorite types of CS/engineering.

Another is that compared to many EAs I think I have a comparative advantage at roles which require technical knowledge but not doing technical research day-to-day.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-21T01:32:41.002Z · score: 11 (4 votes) · EA · GW

I'm emphasizing strategy 1 because I think that there are EA jobs for software engineers where the skill ceiling is extremely high, so if you're really good it's still worth it for you to try to become much better. For example, AI safety research needs really great engineers at AI safety research orgs.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-21T01:24:26.602Z · score: 24 (9 votes) · EA · GW

I worry very little about losing the opportunity to get external criticism from people who wouldn't engage very deeply with our work if they did have access to it. I worry more about us doing worse research because it's harder for extremely engaged outsiders to contribute to our work.

A few years ago, Holden had a great post where he wrote:

For nearly a decade now, we've been putting a huge amount of work into putting the details of our reasoning out in public, and yet I am hard-pressed to think of cases (especially in more recent years) where a public comment from an unexpected source raised novel important considerations, leading to a change in views. This isn't because nobody has raised novel important considerations, and it certainly isn't because we haven't changed our views. Rather, it seems to be the case that we get a large amount of valuable and important criticism from a relatively small number of highly engaged, highly informed people. Such people tend to spend a lot of time reading, thinking and writing about relevant topics, to follow our work closely, and to have a great deal of context. They also tend to be people who form relationships of some sort with us beyond public discourse.
The feedback and questions we get from outside of this set of people are often reasonable but familiar, seemingly unreasonable, or difficult for us to make sense of. In many cases, it may be that we're wrong and our external critics are right; our lack of learning from these external critics may reflect our own flaws, or difficulties inherent to a situation where people who have thought about a topic at length, forming their own intellectual frameworks and presuppositions, try to learn from people who bring very different communication styles and presuppositions.
The dynamic seems quite similar to that of academia: academics tend to get very deep into their topics and intellectual frameworks, and it is quite unusual for them to be moved by the arguments of those unfamiliar with their field. I think it is sometimes justified and sometimes unjustified to be so unmoved by arguments from outsiders.
Regardless of the underlying reasons, we have put a lot of effort over a long period of time into public discourse, and have reaped very little of this particular kind of benefit (though we have reaped other benefits - more below). I'm aware that this claim may strike some as unlikely and/or disappointing, but it is my lived experience, and I think at this point it would be hard to argue that it is simply explained by a lack of effort or interest in public discourse.

My sense is pretty similar to Holden's, though we've put much less effort into explaining ourselves publicly. When we're thinking about topics like decision theory which have a whole academic field, we seem to get very little out of interacting with the field. This might be because we're actually interested in different questions and academic decision theory doesn't have much to offer us (eg see this Paul Christiano quote and this comment).

I think that MIRI also empirically doesn't change its strategy much as a result of talking to highly engaged people who have very different world views (eg Paul Christiano), though individual researchers (eg me) often change their minds from talking to these people. (Personally, I also change my mind from talking to non-very-engaged people.)

Maybe talking to outsiders doesn't shift MIRI strategy because we're totally confused about how to think about all of this. But I'd be surprised if we figured this out soon given that we haven't figured it out so far. So I'm pretty willing to say "look, either MIRI's onto something or not; if we're onto something, we should go for it wholeheartedly, and I don't seriously think that we're going to update our beliefs much from more public discourse, so it doesn't seem that costly to have our public discourse become costlier".

I guess I generally don't feel that convinced that external criticism is very helpful for situations like ours where there isn't an established research community with taste that is relevant to our work. Physicists have had a lot of time to develop a reasonably healthy research culture where they notice what kinds of arguments are wrong; I don't think AI alignment has that resource to draw on. And in cases where you don't have an established base of knowledge about what kinds of arguments are helpful (sometimes people call this "being in a preparadigmatic field"; I don't know if that's correct usage), I think it's plausible that people with different intuitions should do divergent work for a while and hope that eventually some of them make progress that's persuasive to the others.

By not engaging with critics as much as we could, I think MIRI is probably increasing the probability that we're barking completely up the wrong tree. I just think that this gamble is worth taking.

I'm more concerned about costs incurred because we're more careful about sharing research with highly engaged outsiders who could help us with it. Eg Paul has made some significant contributions to MIRI's research, and it's a shame to have less access to his ideas about our problems.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-21T01:00:31.176Z · score: 17 (7 votes) · EA · GW

I think it's plausible that "solving the alignment problem" isn't a very clear way of phrasing the goal of technical AI safety research. Consider the question "will we solve the rocket alignment problem before we launch the first rocket to the moon"--to me the interesting question is whether the first rocket to the moon will indeed get there. The problem isn't really "solved" or "not solved", the rocket just gets to the moon or not. And it's not even obvious whether the goal is to align the first AGI; maybe the question is "what proportion of resources controlled by AI systems end up being used for human purposes", where we care about a weighted proportion of AI systems which are aligned.

I am not sure whether I'd bet for or against the proposition that humans will go extinct for AGI-misalignment-related reasons within the next 100 years.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-20T07:19:30.685Z · score: 8 (4 votes) · EA · GW

It's getting late and it feels hard to answer this question, so I'm only going to say briefly:

  • for something MIRI wrote re this, see the "strategic background" section here
  • I think there are cases where alignment is non-trivial but prosaic AI alignment is possible, and some people who are cautious about AGI alignment are influential in the groups that are working on AGI development and cause them to put lots of effort into alignment (eg maybe the only way to align the thing involves spending an extra billion dollars on human feedback). Because of these cases, I am excited for the leading AI orgs having many people in important positions who are concerned about and knowledgeable about these issues.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-20T07:02:55.651Z · score: 12 (5 votes) · EA · GW

I don't think you can prep that effectively for x-risk-level AI outcomes, obviously.

I think you can prep for various transformative technologies; you could for example buy shares of computer hardware manufacturers if you think that they'll be worth more due to increased value of computation as AI productivity increases. I haven't thought much about this, and I'm sure this is dumb for some reason, but maybe you could try to buy land in cheap places in the hope that in a transhuman utopia the land will be extremely valuable (the property rights might not carry through, but it might be worth the gamble for sufficiently cheap land).

I think it's probably at least slightly worthwhile to do good and hope that you can sell some of your impact certificates after good AI outcomes.

You should ask Carl Shulman, I'm sure he'd have a good answer.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-20T06:30:45.650Z · score: 10 (5 votes) · EA · GW

"Do you have any advice for people who want to be involved in EA, but do not think that they are smart or committed enough to be engaging at your level?"--I just want to say that I wouldn't have phrased it quite like that.

One role that I've been excited about recently is making local groups be good. I think that having better local EA communities might be really helpful for outreach, and lots of different people can do great work with this.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-20T06:02:54.573Z · score: 9 (4 votes) · EA · GW

(I've spent a few hours talking to people about the LTFF, but I'm not sure about things like "what order of magnitude of funding did they allocate last year" (my guess without looking it up was $1M, which turns out to be correct!), so take all this with a grain of salt.)

Re Q1: I don't know, I don't think that we coordinate very carefully.

Re Q2: I don't really know. When I look at the list of things the LTFF funded in August or April (excluding regrants to orgs like MIRI, CFAR, and Ought), about 40% look meh (~0.5x MIRI), about 40% look like things which I'm reasonably glad someone funded (~1x MIRI), about 7% are things that I'm really glad someone funded (~3x MIRI), and 3% are things that I wish that they hadn't funded (-1x MIRI). Note that my mean outcomes for the meh, good, and great categories are much higher than the median outcomes--a lot of them are "I think this is probably useless but seems worth trying for value of information". Apparently this adds up to thinking that they're 78% as good as MIRI.
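
As a quick check of the arithmetic, here is a minimal sketch in Python of the weighted sum described above. The shares and multipliers are the rough figures from the comment; the category labels and variable names are only illustrative. Note that the stated shares sum to 90%, so this sketch assumes the remaining 10% is implicitly valued at zero.

```python
# Rough figures from the comment: (share of grants, value relative to MIRI).
# Labels and names are illustrative, not part of the original estimate.
grants = {
    "meh": (0.40, 0.5),
    "reasonably glad someone funded": (0.40, 1.0),
    "really glad someone funded": (0.07, 3.0),
    "wish they hadn't funded": (0.03, -1.0),
}

# The stated shares cover only 90% of grants; the unlisted 10% is
# assumed here to contribute zero value.
total_share = sum(share for share, _ in grants.values())

# Expected value per grant dollar, measured in "MIRI units".
weighted_value = sum(share * multiplier for share, multiplier in grants.values())

print(f"share covered: {total_share:.2f}")
print(f"value relative to MIRI: {weighted_value:.2f}")
```

Under these assumptions the weighted sum comes out to the 78% figure quoted above.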

Re Q3: I don't really know. My median outcome is that they turn out to do less well than my estimate above, but I think there's a reasonable probability that they turn out to be much better than my estimate above, and I'm excited to see them try to do good. This isn't really tied up with AI capability or safety progressing though.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-20T05:41:19.773Z · score: 3 (2 votes) · EA · GW

Idk. A couple percent? I'm very unsure about this.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-20T05:37:07.529Z · score: 18 (9 votes) · EA · GW

I think your sense is correct. I think that plenty of people have short docs on why their approach is good; I think basically no-one has long docs engaging thoroughly with the criticisms of their paths. (I don't think Paul's published arguments defending his perspective count as complete; Paul has arguments that I hear him make in person that I haven't seen written up.)

My guess is that it's developed because various groups decided that it was pretty unlikely that they were going to be able to convince other groups of their work, and so they decided to just go their own ways. This is exacerbated by the fact that several AI safety groups have beliefs which are based on arguments which they're reluctant to share with each other.

(I was having a conversation with an AI safety researcher at a different org recently, and they couldn't tell me about some things that they knew from their job, and I couldn't tell them about things from my job. We were reflecting on the situation, and then one of us proposed the metaphor that we're like two people who were sliding on ice next to each other and then pushed away and have now chosen our paths and can't interact anymore to course correct.)

Should we be concerned? Idk, seems kind of concerning. I kind of agree with MIRI that it's not clearly worth it for MIRI leadership to spend time talking to people like Paul who disagree with them a lot.

Also, sometimes fields should fracture a bit while they work on their own stuff; maybe we'll develop our own separate ideas for the next five years, and then come talk to each other more when we have clearer ideas.

I suspect that things like the Alignment Newsletter are causing AI safety researchers to understand and engage with each other's work more; this seems good.