Posts

My personal cruxes for working on AI safety 2020-02-13T07:11:46.803Z · score: 100 (42 votes)
Thoughts on doing good through non-standard EA career pathways 2019-12-30T02:06:03.032Z · score: 130 (60 votes)
"EA residencies" as an outreach activity 2019-11-17T05:08:42.119Z · score: 85 (40 votes)
I'm Buck Shlegeris, I do research and outreach at MIRI, AMA 2019-11-15T22:44:17.606Z · score: 118 (60 votes)
A way of thinking about saving vs improving lives 2015-08-08T19:57:30.985Z · score: 2 (4 votes)

Comments

Comment by buck on Max_Daniel's Shortform · 2020-02-23T01:00:51.794Z · score: 9 (4 votes) · EA · GW
My guess is I consider the activities you mentioned less valuable than you do. Probably the difference is largest for programming at MIRI and smallest for Hubinger-style AI safety research. (This would probably be a bigger discussion.)

I don't think that peculiarities of what kinds of EA work we're most enthusiastic about lead to much of the disagreement. When I imagine myself taking on various different people's views about what work would be most helpful, most of the time I end up thinking that valuable contributions could be made to that work by sufficiently talented undergrads.

Independent of this, my guess would be that EA does have a decent number of unidentified people who would be about as good as the people you've identified. E.g., I can think of ~5 people off the top of my head who I think might be great at one of the things you listed, and if I had your view on their value I'd probably think they should stop doing what they're doing now and switch to trying one of these things. And I suspect that if I thought hard about it, I could come up with 5-10 more people - and then there is the large number of people neither of us has any information about.

I am pretty skeptical of this. Eg I suspect that people like Evan (sorry, Evan, if you're reading this, for using you as a running example) are extremely unlikely to remain unidentified, because one of the things that they do is think about things in their own time and put the results online. Could you describe the profile of such a person, and say which of the types of work I named you think they might be as good at as the people I named?

It might be quite relevant whether "great people" refers only to talent or also to beliefs and values/preferences.

I am not intending to include beliefs and preferences in my definition of "great person", except for preferences/beliefs like being not very altruistic, which I do count.

E.g. my guess is that there are several people who could be great at functional programming who either don't want to work for MIRI, or don't believe that this would be valuable. (This includes e.g. myself.)

I think my definition of great might be a higher bar than yours, based on the proportion of people who I think meet it? (To be clear I have no idea how good you'd be at programming for MIRI because I barely know you, and so I'm just talking about priors rather than specific guesses about you.)

---

For what it's worth, I think that you're not giving enough credence to the possibility that the person you talked to actually disagreed with you--I think you might be doing that thing whose name I forget where you steelman someone into saying the thing you think instead of the thing they think.

Comment by buck on My personal cruxes for working on AI safety · 2020-02-21T05:55:30.655Z · score: 5 (3 votes) · EA · GW
For the problems-that-solve-themselves arguments, I feel like your examples have very "good" qualities for solving themselves: both personal and economic incentives are against them, they are obvious when one is confronted with the situation, and at the point where the problems become obvious, you can still solve them. I would argue that not all of these properties hold for AGI. What are your thoughts about that?

I agree that it's an important question whether AGI has the right qualities to "solve itself". To go through the ones you named:

  • "Personal and economic incentives are aligned against them"--I think AI safety has somewhat good properties here. Basically no-one wants to kill everyone, and AI systems that aren't aligned with their users are much less useful. On the other hand, it might be the case that people are strongly incentivised to be reckless and deploy things quickly.
  • "they are obvious when one is confronted with the situation"--I think that alignment problems might be fairly obvious, especially if there's a long process of continuous AI progress where unaligned non-superintelligent AI systems do non-catastrophic damage. So this comes down to questions about how rapid AI progress will be.
  • "at the point where the problems become obvious, you can still solve them"--If the problems become obvious because non-superintelligent AI systems are behaving badly, then we can still maybe put more effort into aligning increasingly powerful AI systems after that and hopefully we won't lose that much of the value of the future.
Comment by buck on Max_Daniel's Shortform · 2020-02-21T05:36:33.990Z · score: 15 (6 votes) · EA · GW

I'm not quite sure how high your bar is for "experience", but many of the tasks that I'm most enthusiastic about in EA are ones which could plausibly be done by someone in their early 20s who eg just graduated university. Various tasks of this type:

  • Work at MIRI on various programming tasks which require being really smart and good at math and programming and able to work with type theory and Haskell. Eg we recently hired Seraphina Nix to do this right out of college. There are other people who are recent college graduates who we offered this job to who didn't accept. These people are unusually good programmers for their age, but they're not unique. I'm more enthusiastic about hiring older and more experienced people, but that's not a hard requirement. We could probably hire several more of these people before we became bottlenecked on management capacity.
  • Generalist AI safety research that Evan Hubinger does--he led the writing of "Risks from Learned Optimization" during a summer internship at MIRI; before that internship he hadn't had much contact with the AI safety community in person (though he'd read stuff online).
    • Richard Ngo is another young AI safety researcher doing lots of great self-directed stuff; I don't think he consumed an enormous amount of outside resources while becoming good at thinking about this stuff.
  • I think that there are inexperienced people who could do really helpful work with me on EA movement building; to be good at this you need to have read a lot about EA and be friendly and know how to talk to lots of people.

My guess is that EA does not have a lot of unidentified people who are as good at these things as the people I've identified.

I think that the "EA doesn't have enough great people" problem feels more important to me than the "EA has trouble using the people we have" problem.

Comment by buck on My personal cruxes for working on AI safety · 2020-02-20T17:25:14.538Z · score: 5 (3 votes) · EA · GW
One underlying hypothesis that was not explicitly pointed out, I think, was that you are looking for priority arguments. That is, part of your argument is about whether AI safety research is the most important thing you could do. (It might be so obvious in an EA meeting or on the EA Forum that it's not worth exploring, but I like making the obvious hypotheses explicit.)

This is a good point.

Whereas you could argue that without pure mathematics, almost all the positive technological progress we have now (from quantum mechanics to computer science) would not exist.

I feel pretty unsure on this point; for a contradictory perspective you might enjoy this article.

Comment by buck on Do impact certificates help if you're not sure your work is effective? · 2020-02-14T06:47:46.405Z · score: 6 (3 votes) · EA · GW

[for context, I've talked to Eli about this in person]

I'm interpreting you as having two concerns here.

Firstly, you're asking why this is different than you deferring to people about the impact of the two orgs.

From my perspective, the nice thing about the impact certificate setup is that if you get paid in org B impact certificates, you're making the people at orgs A and B put their money where their mouths are. Analogously, suppose Google is trying to hire me, but I'm actually unsure about Google's long term profitability, and I'd rather be paid in Facebook stock than Google stock. If Google pays me in Facebook stock, I'm not deferring to them about the relative values of these stocks, I'm just getting paid in Facebook stock, such that if Google is overvalued it's no longer my problem, it's the problem of whoever traded their Facebook stock for Google stock.

The reason why I think that the policy of maximizing impact certificates is better for the world in this case is that I think that people are more likely to give careful answers to the question "how relatively valuable is the work orgs A and B are doing" if they're thinking about it in terms of trying to make trades than if some random EA is asking for their quick advice.

---

Secondly, you're worrying that people might end up seeming like they're endorsing an org that they don't endorse, and that this might harm community epistemics. This is an interesting objection that I haven't thought much about. A few possible responses:

  • It's already currently an issue that people have different amounts of optimism about their workplaces, and people don't very often publicly state how much they agree and disagree with their employer (though I personally try to be clear about this). It's unlikely that impact equity trades will exacerbate this problem much.
  • Also, people often work at places for reasons that aren't "I think this is literally the best org", eg:
    • comparative advantage
    • thinking that the job is fun
    • the job paying them a high salary (this is exactly analogous to them paying you in impact equity of a different org)
    • thinking that the job will give you useful experience
    • random fluke of who happened to offer you a job at a particular point
    • thinking the org is particularly flawed and so you can do unusual amounts of good by pushing it in a good direction
  • Also, if there were liquid markets in the impact equity of different orgs, then we'd have access to much higher-quality information about the community's guess about the relative promisingness of different orgs. So pushing in this direction would probably be overall helpful.
Comment by buck on My personal cruxes for working on AI safety · 2020-02-13T19:27:26.015Z · score: 14 (8 votes) · EA · GW
This was nice to read, because I'm not sure I've ever seen anyone actually admit this before.

Not everyone agrees with me on this point. Many safety researchers think that their path to impact is by establishing a strong research community around safety, which seems more plausible as a mechanism to affect the world 50 years out than the "my work is actually relevant" plan. (And partially for this reason, these people tend to do different research to me.)

You say you think there's a 70% chance of AGI in the next 50 years. How low would that probability have to be before you'd say, "Okay, we've got a reasonable number of people to work on this risk, we don't really need to recruit new people into AI safety"?

I don't know at what size of the AI safety field marginal effort becomes better spent elsewhere. Presumably this is a continuous thing rather than a discrete thing. Eg it seems to me that now, compared to five years ago, there are way more people in AI safety, and so if your comparative advantage is in some other way of positively influencing the future, you should more strongly consider that other thing.

Comment by buck on Thoughts on doing good through non-standard EA career pathways · 2020-01-09T08:05:54.014Z · score: 6 (4 votes) · EA · GW
What do you think about participating in a forecasting platform, e.g. Good Judgement Open or Metaculus? It seems to cover all ingredients, and even be a good signal for others to evaluate your judgement quality.

Seems pretty good for predicting things about the world that get resolved on short timescales. Sadly it seems less helpful for practicing judgement about things like the following:

  • judging arguments about things like the moral importance of wild animal suffering, plausibility of AI existential risk, and existence of mental illness
  • long-term predictions
  • predictions about small-scale things like how a project should be organized (though you can train calibration on this kind of question)

Re my own judgement: I appreciate your confidence in me. I spend a lot of time talking to people who have IMO better judgement than me; most of the things I say in this post (and a reasonable chunk of things I say other places) are my rephrasings of their ideas. I think that people whose judgement I trust would agree with my assessment of my judgement quality as "good in some ways" (this was the assessment of one person I asked about this in response to your comment).

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-22T05:24:25.466Z · score: 10 (4 votes) · EA · GW
It seems that your current strategy is to focus on training, hiring, and doing outreach to the most promising talented individuals.

This seems like a pretty good summary of the strategy I work on, and it's the strategy that I'm most optimistic about.

Other alternatives might include more engagement with amateurs, and providing more assistance for groups and individuals that want to learn and conduct independent research.

I think that it would be quite costly and difficult for more experienced AI safety researchers to try to cause more good research to happen by engaging more with amateurs or providing more assistance to independent research. So I think that experienced AI safety researchers are probably going to do more good by spending more time on their own research than by trying to help other people with theirs. This is because I think that experienced and skilled AI safety researchers are much more productive than other people, and because I think that a reasonably large number of very talented math/CS people become interested in AI safety every year, so we can set a pretty high bar for which people to spend a lot of time with.

Also, what would change if you had 10 times the amount of management and mentorship capacity?

If I had ten times as many copies of various top AI safety researchers and I could only use them for management and mentorship capacity, I'd try to get them to talk to many more AI safety researchers, through things like weekly hour-long calls with PhD students, or running more workshops like MSFP.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-22T05:16:19.716Z · score: 18 (7 votes) · EA · GW
I’m a fairly good ML student who wants to decide on a research direction for AI Safety.

I'm not actually sure whether I think it's a good idea for ML students to try to work on AI safety. I am pretty skeptical of most of the research done by pretty good ML students who try to make their research relevant to AI safety--it usually feels to me like their work ends up not contributing to one of the core difficulties, and I think that they might have been better off if they'd instead spent their effort trying to become really good at ML, so that they'd be better skilled up to work on AI safety later.

I don't have very much better advice for how to get started on AI safety; I think the "recommend to apply to AIRCS and point at 80K and maybe the Alignment Newsletter" path is pretty reasonable.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-22T05:12:48.818Z · score: 11 (5 votes) · EA · GW

It was a good time; I appreciate all the thoughtful questions.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-22T05:08:39.326Z · score: 12 (6 votes) · EA · GW

Most of them are related to AI alignment problems, but it's possible that I should work specifically on them rather than other parts of AI alignment.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-22T05:07:53.499Z · score: 4 (2 votes) · EA · GW
I suppose that the latter goes a long way towards explaining the former.

Yeah, I suspect you're right.

Personally, there are few technologies that I think are likely to radically change the world within the next 100 years (assuming that your definition of radical is similar to mine). Maybe the only ones that would really qualify are bioengineering and nanotech. Even in those fields, though, I expect the pace of change to be fairly slow if AI isn't heavily involved.

I think there are a couple more radically transformative technologies which are reasonably likely over the next hundred years, eg whole brain emulation. And I suspect we disagree about the expected pace of change with bioengineering and maybe nanotech.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-21T21:12:37.844Z · score: 4 (2 votes) · EA · GW

Yeah, makes sense; I didn’t mean “unintentional” by “incidental”.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-21T02:16:18.724Z · score: 31 (11 votes) · EA · GW

I think of myself as making a lot of gambles with my career choices. And I suspect that regardless of which way the propositions turn out, I'll have an inclination to think that I was an idiot for not realizing them sooner. For example, I often have both the following thoughts:

  • "I have a bunch of comparative advantage at helping MIRI with their stuff, and I'm not going to be able to quickly reduce my confidence in their research directions. So I should stop worrying about it and just do as much as I can."
  • "I am not sure whether the MIRI research directions are good. Maybe I should spend more time evaluating whether I should do a different thing instead."

But even if it feels obvious in hindsight, it sure doesn't feel obvious now.

So I have big gambles that I'm making, which might turn out to be wrong, but which feel now like they will have been reasonable-in-hindsight gambles either way. The main two such gambles are thinking AI alignment might be really important in the next couple decades and working on MIRI's approaches to AI alignment instead of some other approach.

When I ask myself "what things have I not really considered as much as I should have", I get answers that change over time (because I ask myself that question pretty often and then try to consider the things that are important). At the moment, my answers are:

  • Maybe I should think about/work on s-risks much more
  • Maybe I spend too much time inventing my own ways of solving design problems in Haskell and I should study other people's more.
  • Maybe I am much more productive working on outreach stuff and I should do that full time.
  • (This one is only on my mind this week and will probably go away pretty soon) Maybe I'm not seriously enough engaging with questions about whether the world will look really different in a hundred years from how it looks today; perhaps I'm subject to some bias towards sensationalism and actually the world will look similar in 100 years.
Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-21T01:34:26.972Z · score: 10 (3 votes) · EA · GW

I hadn't actually noticed that.

One factor here is that a lot of AI safety research seems to need ML expertise, which is one of my least favorite types of CS/engineering.

Another is that compared to many EAs I think I have a comparative advantage at roles which require technical knowledge but not doing technical research day-to-day.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-21T01:32:41.002Z · score: 11 (4 votes) · EA · GW

I'm emphasizing strategy 1 because I think that there are EA jobs for software engineers where the skill ceiling is extremely high, so if you're really good it's still worth it for you to try to become much better. For example, AI safety research needs really great engineers at AI safety research orgs.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-21T01:24:26.602Z · score: 24 (9 votes) · EA · GW

I worry very little about losing the opportunity to get external criticism from people who wouldn't engage very deeply with our work if they did have access to it. I worry more about us doing worse research because it's harder for extremely engaged outsiders to contribute to our work.

A few years ago, Holden had a great post where he wrote:


For nearly a decade now, we've been putting a huge amount of work into putting the details of our reasoning out in public, and yet I am hard-pressed to think of cases (especially in more recent years) where a public comment from an unexpected source raised novel important considerations, leading to a change in views. This isn't because nobody has raised novel important considerations, and it certainly isn't because we haven't changed our views. Rather, it seems to be the case that we get a large amount of valuable and important criticism from a relatively small number of highly engaged, highly informed people. Such people tend to spend a lot of time reading, thinking and writing about relevant topics, to follow our work closely, and to have a great deal of context. They also tend to be people who form relationships of some sort with us beyond public discourse.
The feedback and questions we get from outside of this set of people are often reasonable but familiar, seemingly unreasonable, or difficult for us to make sense of. In many cases, it may be that we're wrong and our external critics are right; our lack of learning from these external critics may reflect our own flaws, or difficulties inherent to a situation where people who have thought about a topic at length, forming their own intellectual frameworks and presuppositions, try to learn from people who bring very different communication styles and presuppositions.
The dynamic seems quite similar to that of academia: academics tend to get very deep into their topics and intellectual frameworks, and it is quite unusual for them to be moved by the arguments of those unfamiliar with their field. I think it is sometimes justified and sometimes unjustified to be so unmoved by arguments from outsiders.
Regardless of the underlying reasons, we have put a lot of effort over a long period of time into public discourse, and have reaped very little of this particular kind of benefit (though we have reaped other benefits - more below). I'm aware that this claim may strike some as unlikely and/or disappointing, but it is my lived experience, and I think at this point it would be hard to argue that it is simply explained by a lack of effort or interest in public discourse.

My sense is pretty similar to Holden's, though we've put much less effort into explaining ourselves publicly. When we're thinking about topics like decision theory which have a whole academic field, we seem to get very little out of interacting with the field. This might be because we're actually interested in different questions and academic decision theory doesn't have much to offer us (eg see this Paul Christiano quote and this comment).

I think that MIRI also empirically doesn't change its strategy much as a result of talking to highly engaged people who have very different world views (eg Paul Christiano), though individual researchers (eg me) often change their minds from talking to these people. (Personally, I also change my mind from talking to non-very-engaged people.)

Maybe talking to outsiders doesn't shift MIRI strategy because we're totally confused about how to think about all of this. But I'd be surprised if we figured this out soon, given that we haven't figured it out so far. So I'm pretty willing to say "look, either MIRI's onto something or not; if we're onto something, we should go for it wholeheartedly, and I don't seriously think that we're going to update our beliefs much from more public discourse, so it doesn't seem that costly to have our public discourse become costlier".

I guess I generally don't feel that convinced that external criticism is very helpful for situations like ours where there isn't an established research community with taste that is relevant to our work. Physicists have had a lot of time to develop a reasonably healthy research culture where they notice what kinds of arguments are wrong; I don't think AI alignment has that resource to draw on. And in cases where you don't have an established base of knowledge about what kinds of arguments are helpful (sometimes people call this "being in a preparadigmatic field"; I don't know if that's correct usage), I think it's plausible that people with different intuitions should do divergent work for a while and hope that eventually some of them make progress that's persuasive to the others.

By not engaging with critics as much as we could, I think MIRI is probably increasing the probability that we're barking completely up the wrong tree. I just think that this gamble is worth taking.

I'm more concerned about costs incurred because we're more careful about sharing research with highly engaged outsiders who could help us with it. Eg Paul has made some significant contributions to MIRI's research, and it's a shame to have less access to his ideas about our problems.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-21T01:00:31.176Z · score: 17 (7 votes) · EA · GW

I think it's plausible that "solving the alignment problem" isn't a very clear way of phrasing the goal of technical AI safety research. Consider the question "will we solve the rocket alignment problem before we launch the first rocket to the moon"--to me the interesting question is whether the first rocket to the moon will indeed get there. The problem isn't really "solved" or "not solved", the rocket just gets to the moon or not. And it's not even obvious whether the goal is to align the first AGI; maybe the question is "what proportion of resources controlled by AI systems end up being used for human purposes", where we care about a weighted proportion of AI systems which are aligned.
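
One hypothetical way to make that "weighted proportion" framing precise: if AI system $i$ controls resources $r_i$ and $a_i \in [0, 1]$ is the fraction of those resources that ends up used for human purposes, then the quantity of interest is something like

$$\frac{\sum_i r_i \, a_i}{\sum_i r_i}$$

rather than a binary "solved" or "not solved".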

I am not sure whether I'd bet for or against the proposition that humans will go extinct for AGI-misalignment-related-reasons within the next 100 years.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-20T07:19:30.685Z · score: 8 (4 votes) · EA · GW

It's getting late and it feels hard to answer this question, so I'm only going to say briefly:

  • for something MIRI wrote re this, see the "strategic background" section here
  • I think there are cases where alignment is non-trivial but prosaic AI alignment is possible, and some people who are cautious about AGI alignment are influential in the groups that are working on AGI development and cause them to put lots of effort into alignment (eg maybe the only way to align the thing involves spending an extra billion dollars on human feedback). Because of these cases, I am excited about the leading AI orgs having many people in important positions who are concerned about and knowledgeable about these issues.
Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-20T07:02:55.651Z · score: 12 (5 votes) · EA · GW

I don't think you can prep that effectively for x-risk-level AI outcomes, obviously.

I think you can prep for various transformative technologies; you could for example buy shares of computer hardware manufacturers if you think that they'll be worth more due to increased value of computation as AI productivity increases. I haven't thought much about this, and I'm sure this is dumb for some reason, but maybe you could try to buy land in cheap places in the hope that in a transhuman utopia the land will be extremely valuable (the property rights might not carry through, but it might be worth the gamble for sufficiently cheap land).

I think it's probably at least slightly worthwhile to do good and hope that you can sell some of your impact certificates after good AI outcomes.

You should ask Carl Shulman, I'm sure he'd have a good answer.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-20T06:30:45.650Z · score: 10 (5 votes) · EA · GW

"Do you have any advice for people who want to be involved in EA, but do not think that they are smart or committed enough to be engaging at your level?"--I just want to say that I wouldn't have phrased it quite like that.

One role that I've been excited about recently is making local groups be good. I think that having better local EA communities might be really helpful for outreach, and lots of different people can do great work with this.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-20T06:02:54.573Z · score: 9 (4 votes) · EA · GW

(I've spent a few hours talking to people about the LTFF, but I'm not sure about things like "what order of magnitude of funding did they allocate last year" (my guess without looking it up is $1M, which turns out to be correct!), so take all this with a grain of salt.)

Re Q1: I don't know, I don't think that we coordinate very carefully.

Re Q2: I don't really know. When I look at the list of things the LTFF funded in August or April (excluding regrants to orgs like MIRI, CFAR, and Ought), about 40% look meh (~0.5x MIRI), about 40% look like things which I'm reasonably glad someone funded (~1x MIRI), about 7% are things that I'm really glad someone funded (~3x MIRI), and 3% are things that I wish they hadn't funded (-1x MIRI). Note that the mean outcomes of the meh, good, and great categories are much higher than their medians--a lot of them are "I think this is probably useless but seems worth trying for value of information". Apparently this adds up to thinking that they're 78% as good as MIRI.
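
Spelling out the arithmetic behind that 78% figure (treating those rough percentages directly as weights, without renormalizing; as listed they only total about 90%):

$$0.40 \times 0.5 + 0.40 \times 1 + 0.07 \times 3 + 0.03 \times (-1) = 0.78$$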

Q3: I don't really know. My median outcome is that they turn out to do less well than my estimation above, but I think there's a reasonable probability that they turn out to be much better than my estimate above, and I'm excited to see them try to do good. This isn't really tied up with AI capability or safety progressing though.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-20T05:41:19.773Z · score: 3 (2 votes) · EA · GW

Idk. A couple percent? I'm very unsure about this.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-20T05:37:07.529Z · score: 18 (9 votes) · EA · GW

I think your sense is correct. I think that plenty of people have short docs on why their approach is good; I think basically no-one has long docs engaging thoroughly with the criticisms of their paths (I don't think Paul's published arguments defending his perspective count as complete; Paul has arguments that I hear him make in person that I haven't seen written up.)

My guess is that it's developed because various groups decided that it was pretty unlikely that they were going to be able to convince other groups of their work, and so they decided to just go their own ways. This is exacerbated by the fact that several AI safety groups have beliefs which are based on arguments which they're reluctant to share with each other.

(I was having a conversation with an AI safety researcher at a different org recently, and they couldn't tell me about some things that they knew from their job, and I couldn't tell them about things from my job. We were reflecting on the situation, and then one of us proposed the metaphor that we're like two people who were sliding on ice next to each other and then pushed away and have now chosen our paths and can't interact anymore to course correct.)

Should we be concerned? Idk, seems kind of concerning. I kind of agree with MIRI that it's not clearly worth it for MIRI leadership to spend time talking to people like Paul who disagree with them a lot.

Also, sometimes fields should fracture a bit while they work on their own stuff; maybe we'll develop our own separate ideas for the next five years, and then come talk to each other more when we have clearer ideas.

I suspect that things like the Alignment Newsletter are causing AI safety researchers to understand and engage with each other's work more; this seems good.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-20T05:13:25.387Z · score: 7 (2 votes) · EA · GW

I'm not sure; my guess is that it's somewhat harder, because we're enthusiastic about our new research directions and have moved some management capacity towards those, and those directions have relatively more room for engineering skillsets vs pure math skillsets.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-20T05:07:38.865Z · score: 13 (7 votes) · EA · GW

I think they're OK. I think some CFAR staff are really great. I think that their incidental effect of causing people to have more social ties to the rationalist/EA Bay Area community is probably pretty good.

I've done CFAR-esque exercises at AIRCS workshops which were very helpful to me. I think my general sense is that a bunch of CFAR material has a "true form" which is pretty great, but I didn't get the true form from my CFAR workshop, I got it from talking to Anna Salamon (and somewhat from working with other CFAR staff).

I think that for (possibly dumb) personal reasons I get more annoyed by them than some people, which prevents me from getting as much value out of them.

I generally am glad to hear that an EA has done a CFAR workshop, and normally recommend that EAs do them, especially if they don’t have as much social connection to the EA/rationalist scene, or if they don’t have high opportunity cost to their time.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-20T05:02:38.856Z · score: 9 (4 votes) · EA · GW

I still feel this way, and I've been trying to think of ways to reduce this problem. I think the AIRCS workshops help a bit, I think that my SSC trip was helpful and EA residencies might be helpful.

A few helpful conversations that I've had recently with people who are strongly connected to the professional EA community, which I think would be harder to have without information gained from these strong connections:

  • I enjoyed a document about AI timelines that someone from another org shared me on.
  • Discussions about how EA outreach should go--how rapidly should we try to grow, what proportion of people who are good fits for EA are we already reaching, what types of people are we going to be able to reach with what outreach mechanisms.
Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-20T02:40:12.279Z · score: 5 (4 votes) · EA · GW

Probably less than two.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-19T22:24:15.719Z · score: 11 (1 votes) · EA · GW

I think Paul is probably right about the causes of the disagreement between him and many researchers, and the summary of his beliefs in the AI Impacts interview you linked matches my impression of his beliefs about this.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-19T22:19:54.800Z · score: 16 (9 votes) · EA · GW

The most common reason that someone who I would be excited to work with at MIRI chooses not to work on AI alignment is that they decide to work on some other important thing instead, eg other x-risk or other EA stuff.

But here are some anonymized recent stories of talented people who decided to do non-EA work instead of taking opportunities to do important technical work related to x-risk (for context, I think all of these people are more technically competent than me):

  • One was very comfortable in a cushy, highly paid job which they already had, and thought it would be too inconvenient to move to an EA job (which would have also been highly paid).
  • One felt that AGI timelines are probably relatively long (eg they thought that the probability of AGI in the next 30 years felt pretty small to them), which made AI safety feel not very urgent. So they decided to take an opportunity which they thought would be really fun and exciting, rather than working at MIRI, which they thought would be less of a good fit for a particular skill set which they'd been developing for years; this person thinks that they might come back and work on x-risk after they've had another job for a few years.
  • One was in the middle of a PhD and didn't want to leave.
  • One felt unsure about whether it's reasonable to believe all the unusual things that the EA community believes, and didn't believe the arguments enough that they felt morally compelled to leave their current lucrative job.

I feel sympathetic to the last three but not to the first.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-19T22:15:29.706Z · score: 11 (6 votes) · EA · GW

Here are several, together with the percentage of my productivity which I think they cost me over the last year:

  • I’ve lost a couple months of productivity over the last year due to some weird health issues--I was really fatigued and couldn’t think properly for several months. This was terrible, but the problem seems to have gone away for now. This had a 25% cost.
  • I am a worse researcher because I spend half my time doing other things than research. It’s unclear to me how much efficiency this costs me. Potential considerations:
    • When I don’t feel like doing technical work, I can do other work. This should increase my productivity. But maybe it lets me procrastinate on important work.
    • I remember less of the context of what I’m working on, because my memory is spaced out.
    • My nontechnical work often feels like social drama and is really attention-grabbing and distracting.
    • Overall this costs me maybe 15%; I’m really unsure about this though.
  • I’d be a better researcher if I were smarter and more knowledgeable. I’m working on the knowledgeableness problem with the help of tutors and by spending some of my time studying. It’s unclear how to figure out how costly this is. If I’d spent a year working as a Haskell programmer in industry, I’d probably be like 15% more effective now.
Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-19T22:14:05.045Z · score: 3 (2 votes) · EA · GW

Yeah, I am not targeting that kind of person. Someone who is excited about ML and skeptical of AI safety but interested in engaging a lot with AI safety arguments for a few months might be a good fit.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-19T21:18:12.014Z · score: 23 (9 votes) · EA · GW

As an appendix to the above, some of my best learning experiences as a programmer were the following (starting from when I started programming properly as a freshman in 2012). (Many of these aren’t that objectively hard (and would fit in well as projects in a CS undergrad course); they were much harder for me because I didn’t have the structure of a university course to tell me what design decisions were reasonable and when I was going down blind alleys. I think that this difficulty created some great learning experiences for me.)

  • I translated the proof of equivalence between regular expressions and finite state machines from “Introduction to Automata Theory, Languages, and Computation” into Haskell.
  • I wrote a program which would take a graph describing a circuit built from resistors and batteries and then solve for the currents and potential drops.
  • I wrote a GUI for a certain subset of physics problems; this involved a lot of deconfusion-style thinking as well as learning how to write GUIs.
  • I went to App Academy and learned to write full stack web applications.
  • I wrote a compiler from C to assembly in Scala. It took a long time for me to figure out that I should eg separate out the compiling to an intermediate output that didn’t have registers allocated.
  • I wrote the first version of my data structure searcher. (Not linking because I’m embarrassed by how much worse it was than my second attempt.)
  • I wrote the second version of my data structure searcher, which involved putting a lot of effort into deconfusing myself about what data structures are and how they connect to each other.
    • One example of something I mean by deconfusion here: when you have a composite data structure (eg a binary search tree and a heap representing the same data, with pointers into each other), then when you're figuring out how quickly your reads happen, you take the union of all the operations your component structures support--you can read from whichever structure handles that query best. But when you want to do a write, you need to do it in all of the structures, so you take the maximum write time (see the sketch after this list). This feels obvious when I write it now, but wasn't obvious until I'd figured out exactly what question was interesting to me. And it's somewhat more complicated--for example, we actually want to take the least upper bound rather than the maximum.
  • At Triplebyte, I wrote a variety of web app features. The one I learned the most from was building a UI for composing structured emails quickly--the UI concept was original to me, and it was great practice at designing front end web widgets and building things to solve business problems. My original design kind of sucked so I had to rewrite it; this was also very helpful. I learned the most from trying to design complicated frontend UI components for business logic, because that involves more design work than backend Rails programming does.
  • I wrote another version of my physics problem GUI, which taught me about designing UIs in front of complicated functional backends.
  • I then shifted my programming efforts to MIRI work; my learning here has mostly been a result of learning more Haskell and trying to design good abstractions for some of the things we’re doing; I’ve also recently had to think about the performance of my code, which has been interesting.
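
To illustrate the composite data structure point above, here is a minimal Haskell sketch; the names and the choice of two maps as the components are made up for illustration. Reads get the union of what the two indexes support, while a write has to update both indexes, so its cost is the least upper bound of the two write costs.

```haskell
import qualified Data.Map.Strict as Map

-- A composite structure: the same users stored under two indexes.
data UserStore = UserStore
  { byId   :: Map.Map Int String   -- id -> name
  , byName :: Map.Map String Int   -- name -> id
  }

emptyStore :: UserStore
emptyStore = UserStore Map.empty Map.empty

-- Reads: the composite supports the union of the components' queries,
-- each at the cost of whichever index answers it directly (O(log n)).
lookupById :: Int -> UserStore -> Maybe String
lookupById uid store = Map.lookup uid (byId store)

lookupByName :: String -> UserStore -> Maybe Int
lookupByName name store = Map.lookup name (byName store)

-- Writes: every component must be updated, so the cost is the least
-- upper bound of the components' write costs (here O(log n) either way).
insertUser :: Int -> String -> UserStore -> UserStore
insertUser uid name store = UserStore
  { byId   = Map.insert uid name (byId store)
  , byName = Map.insert name uid (byName store)
  }

main :: IO ()
main = do
  let store = insertUser 1 "alice" (insertUser 2 "bob" emptyStore)
  print (lookupById 1 store)        -- Just "alice"
  print (lookupByName "bob" store)  -- Just 2
```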

I learned basically nothing useful from my year at PayPal.

I have opinions about how to choose jobs in order to maximize how much programming you learn and I might write them up at some point.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-19T20:46:42.734Z · score: 10 (4 votes) · EA · GW

I don't know much about what you need to know in order to do agent foundations research, and I'd trust Scott's answer on that.

If you're seriously considering autodidacting to prepare for a non-agent-foundations job at MIRI, you should email me (buck@intelligence.org) about your particular situation and I'll try to give you personal advice. If too many people email me asking about this, I'll end up writing something publicly.

In general, I'd rather that people talk to me before they study a lot for a MIRI job rather than after, so that I can point them in the right direction and they don't waste effort learning things that aren't going to make the difference to whether we want to hire them.

And if you want to autodidact to work on agent foundations at MIRI, consider emailing someone on the agent foundations team. Or you could try emailing me and I can try to help.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-19T20:43:19.685Z · score: 4 (3 votes) · EA · GW

I feel reluctant to answer this question because it feels like it would involve casting judgement on lots of people publicly. I think that there are a bunch of different orgs and people doing good work on AI safety.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-19T20:41:29.388Z · score: 25 (10 votes) · EA · GW

For the record, I think that I had mediocre judgement in the past and did not reliably believe true things, and I sometimes made really foolish decisions. I think my experience is mostly that I felt extremely alienated from society, which meant that I looked more critically on many common beliefs than most people do. This meant I was weird in lots of ways, many of which were bad and some of which were good. And in some cases this meant that I believed some weird things that feel like easy wins, eg thinking that people were absurdly callous about causing animal suffering.

My judgement improved a lot from spending a lot of time in places with people with good judgement who I could learn from, eg Stanford EA, Triplebyte, the more general EA and rationalist community, and now MIRI.

I feel pretty unqualified to give advice on critical thinking, but here are some possible ideas, which probably aren't actually good:

  • Try to learn simple models of the world and practice applying them to claims you hear, and then being confused when they don't match. Eg learn introductory microeconomics and then whenever you hear a claim about the world that intro micro has an opinion on, try to figure out what the simple intro micro model would claim, and then inasmuch as the world doesn't seem to look like intro micro would predict, think "hmm this is confusing" and then try to figure out what about the world might have caused this. When I developed this habit, I started noticing that lots of claims people make about the world are extremely implausible, and when I looked into the facts more I found that intro micro seemed to back me up. To learn intro economics, I enjoyed the Cowen and Tabarrok textbook.
    • I think Katja Grace is a master of the "make simple models and then get confused when the world doesn't match them" technique. See her novel opinions page for many examples.
    • Another subject where I've been doing this recently is evolutionary biology--I've learned to feel confused whenever anyone makes any claims about group selection, and I plan to learn how group selection works, so that when people make claims about it I can assess them accurately.
  • Try to find the simplest questions whose answers you don't know, in order to practice noticing when you believe things for bad reasons.
    • For example, some of my favorite physics questions:
      • Why isn't the Sun blurry?
      • What is the fundamental physical difference between blue and green objects? Like, what equations do I solve to find out that an object is blue?
      • If energy is conserved, why do we so often make predictions about the world by assuming that energy is minimized?
    • I think reading Thinking Physics might be helpful at practicing noticing your own ignorance, but I'm not sure.
  • Try to learn a lot about specific subjects sometimes, so that you learn what it's like to have detailed domain knowledge.
Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-19T20:28:44.051Z · score: 12 (7 votes) · EA · GW

I no longer feel annoyed about this. I'm not quite sure why. Part of it is probably that I'm a lot more sympathetic when EAs don't know things about AI safety than when they don't know about global poverty, because learning about AI safety seems much harder, and I think I hear relatively more discussion of AI safety now compared to three years ago.

One hypothesis is that 80000 Hours has made various EA ideas more accessible and well-known within the community, via their podcast and maybe their articles.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-19T20:23:53.161Z · score: 25 (15 votes) · EA · GW

I feel really sad about it. I think EA should probably have a communication strategy where we say relatively simple messages like "we think talented college graduates should do X and Y", but this causes collateral damage where people who don't succeed at doing X and Y feel bad about themselves. I don't know what to do about this, except to say that I have the utmost respect in my heart for people who really want to do the right thing and are trying their best.

I don't think I have very coherent or reasoned thoughts on how we should handle this, and I try to defer to people I trust whose judgement on these topics I think is better than mine.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-19T20:17:32.491Z · score: 4 (3 votes) · EA · GW

I don't consume information online very intentionally.

Blogs I often read:

  • Slate Star Codex
  • The Unit of Caring
  • Bryan Caplan (despite disagreeing with him a lot, obviously)
  • Meteuphoric (Katja Grace)
  • Paul Christiano's various blogs

I often read the Alignment Newsletter. I mostly learn things from hearing about them from friends.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-19T19:27:18.356Z · score: 23 (10 votes) · EA · GW

(This is true of all my answers but feels particularly relevant for this one: I’m speaking just for myself, not for MIRI as a whole)

We’ve actually made around five engineer hires since then; we’ll announce some of them in a few weeks. So I was off by a factor of two.

Before you read my more detailed thoughts: please don’t read the below and then get put off from applying to MIRI. I think that many people who are in fact good MIRI fits might not realize they’re good fits. If you’re unsure whether it’s worth your time to apply to MIRI, you can email me at buck@intelligence.org and I’ll (eventually) reply telling you whether I think you might plausibly be a fit. Even if it doesn't go further than that, there is great honor in applying to jobs from which you get rejected, and I feel warmly towards almost everyone I reject.

With that said, here are some of my thoughts on the discrepancy between my prediction and how much we’ve hired:

  • Since I started doing recruiting work for MIRI in late 2017, I’ve updated towards thinking that we need to be pickier with the technical caliber of engineering hires than I originally thought. I’ve updated towards thinking that we’re working in a domain where relatively small increases in competence translate into relatively large differences in output.
    • A few reasons for this:
      • Our work involves dealing a lot with pretty abstract computer science and software engineering considerations; this increases variance in productivity.
      • We use a lot of crazy abstractions (eg various Haskell stuff) that are good for correctness and productivity for programmers who understand them and bad for people who have more trouble learning and understanding all that stuff.
      • We have a pretty flat management structure where engineers need to make a lot of judgement calls for themselves about what to work on and how to do it. As a result, it’s more important for people doing programming work to have a good understanding of everything we’re doing and how their work fits into this.
    • I think it’s plausible that if we increased our management capacity, we’d be able to hire engineers who are great in many ways but who don’t happen to be good in some of the specific ways we require at the moment.
  • Our recruiting process involves a reasonable amount of lag between meeting people and hiring them, because we often want to get to know people fairly well before offering them a job. So I think it’s plausible that the number of full time offers we make is somewhat of a trailing indicator. Over time I’ve updated towards thinking that it’s worth it to take more time before giving people full time offers, by offering internships or trials, so the number of engineering hires lags more than I expected given the number of candidates I’d met who I was reasonably excited by.
  • I’ve also updated towards the importance of being selective on culture fit, eg wanting people who I’m more confident will do well in the relatively self-directed MIRI research environment.

A few notes based on this that are relevant to people who might want to work at MIRI:

  • As I said, our engineering requirements might change in the future, and when that happens I’d like to have a list of people who might be good fits. So please feel free to apply even if you think you’re a bad fit right now.
  • We think about who to hire on an extremely flexible, case-by-case basis. Eg we hire people who know no Haskell at all.
  • If you want to be more likely to be a good fit for MIRI, learning Haskell and some dependent type theory is pretty helpful. I think that it might even be worth it to try to get a job programming in Haskell just in case you get really good at it and then MIRI wants to hire you, though I feel slightly awkward giving this advice because I feel like it's pushing a pretty specific strategy which is very targeted at only a single opportunity. As I said earlier, if you're considering taking a Haskell job based on this, please feel free to email me to talk about it.
Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-19T18:29:32.478Z · score: 21 (11 votes) · EA · GW

Yeah, I think that a lot of EAs working on AI safety feel similarly to me about this.

I expect the world to change pretty radically over the next 100 years, and I probably want to work on the radical change that's going to matter first. So compared to the average educated American I have shorter AI timelines but also shorter timelines to the world becoming radically different for other reasons.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-19T18:01:09.602Z · score: 6 (5 votes) · EA · GW

I see this and appreciate it; the problem is that I want to bet on something like "your overall theory is wrong", but I don't know enough neuroscience to know whether the claims you're making are things that are probably true for reasons unrelated to your overall theory. If you could find someone who I trusted who knew neuroscience and who thought your predictions seemed unlikely, then I'd bet with them against you.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-19T17:54:14.981Z · score: 17 (19 votes) · EA · GW

Most things that look crankish are crankish.

I think that MIRI looks kind of crankish from the outside, and this should indeed make people initially more skeptical of us. I think that we have a few other external markers of legitimacy now, such as the fact that MIRI people were thinking and writing about AI safety from the early 2000s and many smart people have now been persuaded that this is indeed an issue to be concerned with. (It's not totally obvious to me that these markers of legitimacy mean that anyone should take us seriously on the question "what AI safety research is promising".) When I first ran across MIRI, I was kind of skeptical because of the signs of crankery; I updated towards them substantially because I found their arguments and ideas compelling, and people whose judgement I respected also found them compelling.

I think that the signs of crankery in QRI are somewhat worse than 2008 MIRI's signs of crankery.

I also think that I'm somewhat qualified to assess QRI's work (as someone who's spent ~100 paid hours thinking about philosophy of mind in the last few years), and when I look at it, I think it looks pretty crankish and wrong.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-19T07:40:49.158Z · score: 12 (7 votes) · EA · GW

It’s not clear what effect this has had, if any. I am personally somewhat surprised by this--I would have expected more people to stop donating to us.

I asked Rob Bensinger about this; he summarized it as “We announced nondisclosed-by-default in April 2017, and we suspected that this would make fundraising harder. In fact, though, we received significantly more funding in 2017 (https://intelligence.org/2019/05/31/2018-in-review/#2018-finances), and have continued to receive strong support since then. I don't know that there's any causal relationship between those two facts; e.g., the obvious thing to look at in understanding the 2017 spike was the cryptocurrency price spike that year. And there are other factors that changed around the same time too, e.g., Colm [who works at MIRI on fundraising among other things] joining MIRI in late 2016.“

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-19T07:37:35.937Z · score: 35 (16 votes) · EA · GW

The obvious answer is “by working on important things at orgs which need software engineers”. To name specific examples that are somewhat biased towards the orgs I know well:

  • MIRI needs software engineers who can learn functional programming and some math
    • I think that if you’re an engineer who likes functional programming, it might be worth your time to take a Haskell job and gamble on MIRI wanting to hire you one day once you’re really good at it. One person who currently works at MIRI is an EA who worked in Haskell for a few years; that professional experience has been really helpful in his work here. If you’re interested in doing this, feel free to email me asking whether I think it’s a good idea for you.
  • OpenAI’s safety team needs software engineers who can work as research engineers (which might require only a few months of training if you’re coming from a software engineering background; MIRI has a retraining program for software engineers who want to try this, so if you’re interested, you should email me).
  • Ought needs an engineering lead
  • The 80000 Hours job board lists positions

I have two main thoughts on how talented software engineers should try to do good.

Strategy 1: become a great software engineer

I think that it’s worth considering a path where you try to become an extremely good software engineer/computer scientist. (I’m going to lump those two disciplines together in the rest of this answer.)

Here are some properties which really good engineers tend to have. I’m going to give examples which are true of a friend of mine who I think is an exceptional engineer.

  • Extremely broad knowledge of computer science (and other quantitative fields). Eg knowledge of ML, raytracing-based rendering, SMT solvers, formal verification, cryptography, physics and biology and math. This means that when he’s in one discipline (like game programming) he can notice that the problem he’s solving could be done more effectively using something from a totally different discipline. Edward Kmett is another extremely knowledgeable programmer who uses his breadth of knowledge to spot connections and write great programs.
  • Experience with a wide variety of programming settings -- web programming, data science, distributed systems, GPU programming.
  • Experience solving problems in a wide variety of computer science disciplines -- designing data structures, doing automata theory reductions, designing distributed systems.
  • Experience taking an ill-defined computer science problem, searching around for a crisp understanding of what’s happening, turning that understanding into code, and then figuring out what fundamental, foolish mistakes you’d made in your attempt at a crisp understanding.

I am not as good as this friend of mine, but I’m a lot better at my job because I’m able to solve problems like my data structure search problem, and I got much better at that kind of problem by attempting many of them.

How do I think you should try to be a great programmer? I don’t really know, but here are some ideas:

  • Try to write many different types of programs from scratch. The other day I spent a couple hours trying to write the core of an engine for a real-time strategy game from scratch; I think I learned something useful from this experience. One problem with working as a professional programmer is that you relatively rarely have to build things from scratch; I think it’s worth doing that in your spare time. (Of course, there’s very useful experience that you get from building large projects, too; your work can be a good place to get that experience.)
  • Learn many different disciplines. I think I got a lot out of learning to program web frontends.
  • Have coworkers who are good programmers.
  • Try to learn about subjects and then investigate questions about them by programming. For example, I’ve been learning about population genetics recently, and I’ve been thinking about trying to write a library which does calculations about coalescent theory; I think that this will involve an interesting set of design questions as well as involving designing and using some interesting algorithms, and it’s good practice for finding the key facts in a subject that I’m learning.
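
To give a concrete sense of the kind of calculation such a library might start with, here’s a minimal sketch (my own illustration, using two standard results from Kingman’s coalescent): the expected time to the most recent common ancestor of n sampled lineages, and the expected total branch length of their genealogy, both in coalescent time units.

```haskell
-- While k lineages remain, the time to the next coalescence is exponentially
-- distributed with rate (k choose 2) in coalescent units, so the expected
-- time spent at that stage is 1 / (k choose 2) = 2 / (k * (k - 1)).

-- Expected time until all n lineages have found a common ancestor:
-- sum_{k=2}^{n} 2 / (k * (k - 1)) = 2 * (1 - 1/n).
expectedTMRCA :: Int -> Double
expectedTMRCA n = sum [ 2 / (k * (k - 1)) | k <- map fromIntegral [2 .. n] ]

-- Expected total branch length of the genealogy (relevant to the expected
-- number of mutations): sum_{k=2}^{n} k / (k choose 2) = 2 * sum_{j=1}^{n-1} 1/j.
expectedTotalBranchLength :: Int -> Double
expectedTotalBranchLength n = sum [ 2 / (k - 1) | k <- map fromIntegral [2 .. n] ]
```

A real library would quickly run into exactly the design questions mentioned above: choosing units (coalescent time vs. generations), representing genealogies as data structures, and deciding which quantities to compute exactly versus by simulation.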

It’s hard to know what advice is actually useful here. I guess I want to say that (especially early in your career, eg when you’re an undergrad) it might be worth following your passions and interests within computer science, and that you should plausibly do the kinds of programming you’re most excited by, rather than the kinds of programming that feel most directly relevant to EA.

Strategy 2: become really useful

Here’s part of my more general theory of how to do good as a software engineer, a lot of which generalizes to other skillsets:

I think it’s helpful to think about the question “why can’t EA orgs just hire non-EAs to do software engineering work for them?”. Some sample answers:

  • Non-EAs are often unwilling to join weird small orgs, because they're very risk-averse and don't want to join a project that might fold after four months.
  • Non-EAs aren't as willing to do random generalist tasks, or scrappy product-focused work like building the 80K career quiz, analytics dashboards, or web apps to increase efficiency of various internal tasks.
  • It's easier to trust EAs than non-EAs to do a task when it's hard to supervise them, because the EAs might be more intrinsically motivated by the task, and they might also have more context on what the org is trying to do.
  • Non-EAs aren’t as willing to do risky things like spending a few months learning some topic (eg ML, or a particular type of computer science, or how to build frontend apps on top of Airtable) which might not translate into a job.
  • Non-EAs can be disruptive to various aspects of the EA culture that the org wants to preserve. For example, in my experience EA orgs often have a culture that involves people being pretty transparent with their managers about the weaknesses in the work they've done, and hiring people who have less of that attitude can screw up the whole culture.

I think EA software engineers should try to translate those answers into ways of making themselves more useful for EA work. For example, I think EAs should do the following (these pieces of advice are ranked roughly from most to least important):

  • Try to maintain flexibility in your work situation, so that you can quickly take opportunities which arise for which you’d be a good fit. In order to do this, it’s good to have some runway and general life flexibility.
  • Be willing to take jobs that aren’t entirely software engineering, or which involve scrappy product-focused work. Consider taking non-EA jobs which are going to help you learn these generalist or product-focused-engineering skills. (For example, working for Triplebyte was great for me because it involved a lot of non-engineering tasks and a lot of scrappy, product-focused engineering.)
  • Practice doing work in a setting where you have independent control over your work and where you need to get context on a particular industry. For example, it might be good to take a job at a startup where you have a relatively large amount of freedom to work on projects that you think will help the company become more efficient, and where the startup deals with a specific domain, like spoon manufacturing, so that you have to learn about that domain in order to be maximally productive.
  • Be willing to take time off to learn skills that might be useful. (In particular, you should be relatively enthusiastic to do this in cases where some EA or EA org is willing to fund you to do it.) Also, compared to most software engineers you should be more inclined to take jobs that will teach you more varied things but which are worse in other ways.
  • Practice working in an environment which rewards transparency and collaborative truth-seeking. I am very unconfident about the following point, but: perhaps you should be wary of working in companies where there’s a lot of office politics or where you have to make up a lot of bullshit, because perhaps that trains you in unhealthy epistemic practices.

I think the point about flexibility is extremely important. I think that if you set your life up so that most of the time you can leave your current non-EA job and move to an EA job within two months, you’re much more likely to get jobs which are very high impact.

A point that’s related to flexibility but distinct: Sometimes I talk to EAs about their careers and they seem to have concrete plans that we can talk about directly, and they’re able to talk about the advantages and disadvantages of various paths they could take, and it overall feels like we’re working together to help them figure out what the best thing for them to do is. When conversations go like this, it’s much easier to do things like figure out what they’d have to change their minds about in order to think they should drop out of their PhD. I think that when people have a mindset like this, it’s much easier for them to be persuaded of opportunities which are actually worth them inconveniencing themselves to access. In contrast, some people seem to treat direct work as something you're 'supposed' to consider, so they put a token effort into it, but their heart isn't in it and they aren't putting real cognitive effort into thinking about different possibilities, ways to overcome initial obstacles, etc.

I think these two points are really important; I think that when I meet someone who is flexible in those ways, my forecast of their impact is about twice as high as it would have been if they weren’t.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-19T07:02:04.498Z · score: 25 (9 votes) · EA · GW

I think of hard takeoff as meaning that AI systems suddenly come to control many more resources. (Paul suggests operationalizing this as “there is a one-year doubling of the world economy before there’s been a four-year doubling”.)

Unless I'm very mistaken, the point Paul is making here is that in a world where AI systems in aggregate gradually become more powerful, there might still come a turning point at which the systems suddenly stop being controlled by humans. By analogy, imagine a country where the military wants to stage a coup against the president, and its power increases gradually day by day, until one day it decides it has enough power to stage the coup. The military's power increased continuously and gradually, but the president's control over the situation fell suddenly at a single point.
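
To make the shape of that claim a bit more concrete, here is a toy sketch (entirely my own illustration, not Paul's model or anyone's forecast): the challenger's power grows smoothly, while the incumbent's control is a step function of whether that power has crossed a threshold.

```haskell
-- Toy illustration: a continuously growing input can still produce a
-- discontinuous outcome. The functional forms and numbers are arbitrary.
challengerPower :: Double -> Double
challengerPower t = exp (0.1 * t)            -- grows smoothly with time t

incumbentControl :: Double -> Double
incumbentControl t
  | challengerPower t < coupThreshold = 1.0  -- incumbent retains control
  | otherwise                         = 0.0  -- control collapses all at once
  where
    coupThreshold = 5.0                      -- arbitrary coup threshold
```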

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-19T06:55:55.050Z · score: 8 (3 votes) · EA · GW

I don't think there's any other public information.

To apply, people should email me asking about it (buck@intelligence.org). The three people who've received one of these grants were all people who I ran across in my MIRI recruiting efforts.

Two grants have been completed and a third is ongoing. Both of the people who completed grants successfully replicated several deep RL papers, and one of them ended up getting a job working on AI safety (the other took a data science job and hopes to work on AI safety at some point in the future).

I'm happy to answer more questions about this.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-19T03:25:58.342Z · score: 20 (10 votes) · EA · GW

I’m speaking very much for myself and not for MIRI here. But, here goes (this is pretty similar to the view described here):

If we build AI systems out of business-as-usual ML, we’re probably going to end up with systems trained via some kind of meta-learning (as described in Risks from Learned Optimization); they’re going to be completely uninterpretable, and we’re not going to be able to fix their inner alignment problems. And by default our ML systems won’t be able to handle the strain of radical self-improvement, and they’ll accidentally allow their goals to shift as they self-improve (in the same way that if you tried to make a physicist by giving a ten-year-old access to a whole bunch of crazy mind-altering/enhancing drugs and the ability to do brain surgery on themselves, you might get unstable results). We can’t fix this with things like ML transparency, adversarial training, or ML robustness. The only hope of building aligned really-powerful-AI-systems is having a much clearer picture of what we’re doing when we try to build these systems.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-19T03:25:27.332Z · score: 38 (18 votes) · EA · GW

I don’t really remember what was discussed at the Q&A, but I can try to name important things about AI safety which I think aren’t as well known as they should be. Here are some:

----

I think the ideas described in the paper Risks from Learned Optimization are extremely important; they’re less underrated now that the paper has been released, but I still wish that more people who are interested in AI safety understood those ideas better. In particular, the distinction between inner and outer alignment makes my concerns about aligning powerful ML systems much crisper.

----

On a meta note: Different people who work on AI alignment have radically different pictures of what the development of AI will look like, what the alignment problem is, and what solutions might look like.

----

Compared to people who are relatively new to the field, skilled and experienced AI safety researchers seem to have a much more holistic and much more concrete mindset when they’re talking about plans to align AGI.

For example, here are some of my beliefs about AI alignment (none of which are original ideas of mine):

--

I think it’s pretty plausible that meta-learning systems are going to be a bunch more powerful than non-meta-learning systems at tasks like solving math problems. I’m concerned that by default meta-learning systems are going to exhibit alignment problems, for example deceptive misalignment. You could solve this with some combination of adversarial training and transparency techniques. In particular, I think that to avoid deceptive misalignment you could use a combination of the following components:

  • Some restriction of what ML techniques you use
  • Some kind of regularization of your model to push it towards increased transparency
  • Neural net interpretability techniques
  • Some adversarial setup, where you’re using your system to answer questions about whether there exist inputs that would cause it to behave unacceptably.

Each of these components can be stronger or weaker, where by stronger I mean “more restrictive but having more nice properties”.

The stronger you can build one of those components, the weaker the others can be. For example, if you have some kind of regularization that you can do to increase transparency, you don’t have to have neural net interpretability techniques that are as powerful. And if you have a more powerful and reliable adversarial setup, you don’t need to have as much restriction on what ML techniques you can use.

And I think you can get the adversarial setup to be powerful enough to catch non-deceptive mesa-optimizer misalignment, but I don’t think you can prevent deceptive misalignment without interpretability techniques powerful enough to get around things like the RSA-2048 problem.
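
One deliberately schematic way to render the trade-off claim above (this is purely my own illustration of the shape of the argument, not anything resembling a real alignment scheme): treat each component as having a strength, and treat a proposal as adequate when the combined strengths clear some bar, so that strengthening one component lets you get away with weakening another.

```haskell
-- Schematic only: the additive combination and any particular numbers are
-- placeholders for the qualitative claim that the components trade off.
data Proposal = Proposal
  { techniqueRestriction    :: Double  -- how much you restrict the ML techniques used
  , transparencyRegularizer :: Double  -- how strongly you regularize toward transparency
  , interpretabilityTools   :: Double  -- how powerful your interpretability techniques are
  , adversarialSetup        :: Double  -- how strong your adversarial setup is
  }

-- A proposal clears the bar if its components are jointly strong enough, so a
-- stronger adversarial setup can compensate for weaker interpretability, etc.
adequate :: Double -> Proposal -> Bool
adequate bar p =
  techniqueRestriction p + transparencyRegularizer p
    + interpretabilityTools p + adversarialSetup p >= bar
```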

--

In the above arguments, I’m looking at the space of possible solutions to a problem and trying to narrow the possibility space, by spotting better solutions to subproblems or reducing subproblems to one another, and by arguing that it’s impossible to come up with a solution of a particular type.

The key thing I didn’t use to do is think of the alignment problem as having components which can be attacked separately, and think of solutions to subproblems as composed of combinations of technologies which can be considered independently. I used to think of AI alignment as being more about looking for a single overall story for everything, as opposed to looking for a combination of technologies which together allow you to build an aligned AGI.

You can see examples of this style of reasoning in Eliezer’s objections to capability amplification, or Paul on worst-case guarantees, or many other places.

Comment by buck on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-19T03:23:41.801Z · score: 34 (14 votes) · EA · GW

I’m going to instead answer the question “What evidence would persuade you that further work on AI safety is low value compared to other things?”

Note that a lot of my beliefs here disagree substantially with my coworkers.

I’m going to split the answer into two steps: what situations could we be in such that I thought we should deprioritize AI safety work, and for each of those, what could I learn that would persuade me we were in them.

Situations in which AI safety work looks much less valuable:

  • We’ve already built superintelligence, in which case the problem is moot
    • Seems like this would be pretty obvious if it happened
  • We have clear plans for how to align AI that work even when it’s superintelligent, and we don’t think that we need to do more work in order to make these plans more competitive or easier for leading AGI projects to adopt.
    • What would persuade me of this:
      • I’m not sure what evidence would be required for me to be inside-view persuaded of this. I find it kind of hard to be inside-view persuaded, for the same reason that I find it hard to imagine being persuaded that an operating system is secure.
      • But I can imagine what it might feel like to hear some “solutions to the alignment problem” which I feel pretty persuaded by.
        • I can imagine someone explaining a theory of AGI/intelligence/optimization that felt really persuasive and elegant and easy-to-understand, and then building alignment within this theory.
        • Thinking about alignment of ML systems, it’s much easier for me to imagine being persuaded that we’d solved outer alignment than inner alignment.
      • More generally, I feel like it’s hard to know what kinds of knowledge could exist in a field, so it’s hard to know what kind of result could persuade me here, but I think it’s plausible that the result might exist.
      • If a sufficient set of people whose opinions I respected all thought that alignment was solved, that would convince me to stop working on it. Eg Eliezer, Paul Christiano, Nate Soares, and Dario Amodei would be sufficient (that list is biased towards people I know; it isn’t my list of the best AI safety people).
  • Humans no longer have a comparative advantage at doing AI safety work (compared to AI or whole brain emulations or something else)
    • Seems like this would be pretty obvious if it happened.
  • For some reason, the world is going to do enough AI alignment research on its own.
    • Possible reasons:
      • It turns out that AI alignment is really easy
      • It turns out that you naturally end up needing to solve alignment problems as you try to improve AI capabilities, and so all the companies working on AI are going to do all the safety work that they’d need to
      • The world is generally more reasonable than I think it is
      • AI development is such that before we could build an AGI that would kill everyone, we would have had lots of warning shots where misaligned AI systems did things that were pretty bad but not GCR level.
    • What would persuade me of this:
      • Some combination of developments in the field of AGI and developments in the field of alignment
  • It looks like the world is going to be radically transformed somehow before AGI has a chance to radically transform it. Possible contenders here: whole brain emulation, other x-risks, maybe major GCRs which seem like they’ll mess up the structure of the world a lot.
    • What would persuade me of this:
      • Arguments that AGI timelines are much longer than I think. A big slowdown in ML would be a strong argument for longer timelines. If I thought there was a <30% chance of AGI within 50 years, I'd probably not be working on AI safety.
      • Arguments that one of these other things is much more imminent than I think.

I can also imagine being persuaded that AI alignment research is as important as I think but something else is even more important, like maybe s-risks or some kind of AI coordination thing.