Posts

EA Infrastructure Fund: September–December 2021 grant recommendations 2022-07-12T15:24:31.256Z
The case for becoming a black-box investigator of language models 2022-05-06T14:37:13.853Z
Apply to the second ML for Alignment Bootcamp (MLAB 2) in Berkeley [Aug 15 - Fri Sept 2] 2022-05-06T00:19:02.345Z
EA Infrastructure Fund: May–August 2021 grant recommendations 2021-12-24T10:42:08.969Z
Apply to the ML for Alignment Bootcamp (MLAB) in Berkeley [Jan 3 - Jan 22] 2021-11-03T18:20:38.019Z
We're Redwood Research, we do applied alignment research, AMA 2021-10-05T05:04:38.983Z
Funds are available to fund non-EA-branded groups 2021-07-21T01:08:10.308Z
EA Infrastructure Fund: Ask us anything! 2021-06-03T01:06:19.360Z
EA Infrastructure Fund: May 2021 grant recommendations 2021-06-03T01:01:01.202Z
Thoughts on whether we're living at the most influential time in history 2020-11-03T04:07:52.186Z
Some thoughts on EA outreach to high schoolers 2020-09-13T22:51:24.200Z
Buck's Shortform 2020-09-13T17:29:42.117Z
Some thoughts on deference and inside-view models 2020-05-28T05:37:14.979Z
My personal cruxes for working on AI safety 2020-02-13T07:11:46.803Z
Thoughts on doing good through non-standard EA career pathways 2019-12-30T02:06:03.032Z
"EA residencies" as an outreach activity 2019-11-17T05:08:42.119Z
I'm Buck Shlegeris, I do research and outreach at MIRI, AMA 2019-11-15T22:44:17.606Z
A way of thinking about saving vs improving lives 2015-08-08T19:57:30.985Z

Comments

Comment by Buck on How would a language model become goal-directed? · 2022-07-17T14:44:25.177Z · EA · GW

as long as you imitate someone aligned then it doesn't pose much safety risk.

Also, this kind of imitation doesn't result in the model taking superhumanly clever actions, even if you imitate someone unaligned.

Comment by Buck on Senior EA 'ops' roles: if you want to undo the bottleneck, hire differently · 2022-07-17T14:40:18.079Z · EA · GW

I don't normally think you should select for speaking fluent LessWrong jargon, and I have advocated for hiring senior ops staff who have read relatively little LessWrong.

Comment by Buck on Senior EA 'ops' roles: if you want to undo the bottleneck, hire differently · 2022-07-13T14:51:27.482Z · EA · GW

I think we might have fundamental disagreements about 'the value of outside perspectives' Vs. 'the need for context to add value'; or put another way 'the risk of an echo chamber from too-like-minded people' Vs. 'the risk of fracture and bad decision-making from not-like-minded-enough people'. 

I agree that this is probably the crux.

Comment by Buck on EA for dumb people? · 2022-07-11T19:16:32.313Z · EA · GW

(I'm flattered by the inclusion in the list but would fwiw describe myself as "hoping to accomplish great things eventually after much more hard work", rather than "accomplished".)

FWIW I went to the Australian National University, which is about as good as universities in Australia get. In Australia there's way less stratification of students into different qualities of universities--university admissions are determined almost entirely by high school grades, and if you graduate in the top 10% of high school graduates (which I barely did) you can attend basically any university you want to. So it's pretty different from eg America, where you have to do pretty well in high school to get into top universities. I believe that Europe is more like Australia in this regard.

Comment by Buck on EA for dumb people? · 2022-07-11T19:10:30.002Z · EA · GW

This is correct: she graduated, but had a hard time doing so due to health problems. (I hear that Stanford makes it really hard to fail to graduate, because university rankings care about completion rates.)

Note that Kelsey is absurdly smart though, and struggled with school for reasons other than inherently having trouble learning or thinking about things.

Comment by Buck on Senior EA 'ops' roles: if you want to undo the bottleneck, hire differently · 2022-07-11T18:46:20.247Z · EA · GW

(Writing quickly, sorry if I'm unclear)

Since you asked, here are my agreements and disagreements, mostly presented without argument:

  • As someone who is roughly in the target audience (I am involved in hiring for senior ops roles, though it's someone else's core responsibility), I think I disagree with much of this post (eg I think this isn't as big a problem as you think, and the arguments around hiring from outside EA are weak), but in my experience it's somewhat costly and quite low value to publicly disagree with posts like this, so I didn't write anything.
    • It's costly because people get annoyed at me.
    • It's low value because inasmuch as I think your advice is bad, I don't really need to persuade you that you're wrong, I just need to persuade the people this article is aimed at that you're wrong. It's generally much easier to persuade third parties than people who already have a strong opinion. And I don't think that it's that useful for the counterarguments to be provided publicly.
      • And if someone was running an org and strongly agreed with you, I'd probably shrug and say "to each their own" rather than trying that hard to talk them out of it: if a leader really feels passionate about shaping org culture a particular way, that's a reasonable argument for them making the culture be that way.
  • For some of the things you talk about in this post (e.g. "The priority tasks are often mundane, not challenging", 'The role is mostly positioned as "enabling the existing leadership team" to the extent that it seems like "do all the tasks that we don't like"') I agree that it is bad inasmuch as EA orgs do this as egregiously as you're describing. I've never seen this happen in an EA org as blatantly as you're describing, but find it easy to believe that it happens.
    • However, if we talked through the details I think there's a reasonable chance that I'd end up thinking that you were being unfair in your description.
    • I think one factor here is that some candidates are actually IMO pretty unreasonably opposed to ever doing grunt work. Sometimes jobs involve doing repetitive things for a while when they're important. For example, I spoke to 60 people or so when I was picking applicants for the first MLAB, which was kind of repetitive but also seemed crucial. It's extremely costly to accidentally hire someone who isn't willing to do this kind of work, and it's tricky to correctly communicate both "we'd like you to not mostly do repetitive work" and "we need you to sometimes do repetitive work, as we all do, because the most important tasks are sometimes repetitive".
  • I think our main disagreement is that you're more optimistic about getting people who "aren't signed up to all EA/long-termist ideas" to help out with high level strategy decisions than I am. In my experience, people who don't have a lot of the LTist context often have strong opinions about what orgs should do that don't really make sense given more context.
    • For example, some strategic decisions I currently face are:
      • Should I try to hire more junior vs more senior researchers?
      • Who is the audience of our research?
      • Should I implicitly encourage or discourage working on weekends?
    • I think that people who don't have experience in a highly analogous setting will often not have the context required to assess this, because these decisions are based on idiosyncrasies of our context and our goals. Senior people without relevant experience will have various potentially analogous experience, and I really appreciate the advice that I get from senior people who don't have the context, but I definitely have to assess all of their advice for myself rather than just following their best practices (except on really obvious things).
    • If I was considering hiring a senior person who didn't have analogous experience and also wanted to have a lot of input into org strategy, I'd be pretty scared if they didn't seem really on board with the org leadership sometimes going against their advice, and I would want to communicate this extremely clearly to the candidate, to prevent mismatched expectations.
    • I think that the decisions that LTist orgs make are often predicated on LTist beliefs (obviously), and people who don't agree with LTist beliefs are going to systematically disagree about what to do, and so if the org hires such a person, they need that person to be okay with getting overruled a bunch on high level strategy. I don't really see how you could avoid this.
  • In general, I think that a lot of your concerns might be a result of orgs trying to underpromise and overdeliver: the orgs are afraid that you will come in expecting to have a bunch more strategic input than they feel comfortable promising you, and much less mundane work than you might occasionally have. (But probably some also comes from orgs making bad decisions.)
Comment by Buck on (Even) More Early-Career EAs Should Try AI Safety Technical Research · 2022-07-01T15:16:34.299Z · EA · GW

I agree with others that these numbers were way high two years ago and are still way high.

Comment by Buck on Power dynamics between people in EA · 2022-06-02T03:23:11.033Z · EA · GW

Unfortunately, reciprocity.io is currently down (as of a few hours ago). I think it will hopefully be back in <24 hours.

 

EDIT: now back up.

Comment by Buck on Some unfun lessons I learned as a junior grantmaker · 2022-05-31T04:58:00.143Z · EA · GW

If you come across as insulting, someone might say you're an asshole to everyone they talk to for the next five years, which might make it harder for you to do other things you'd hoped to do.

Comment by Buck on Some unfun lessons I learned as a junior grantmaker · 2022-05-28T00:30:00.097Z · EA · GW

The problem with saying things like this isn't that they're time-consuming to say, but that they open you up to some risk of the applicant getting really mad at you, and have various other risks like this. These costs can be mitigated by being careful (eg picking phrasings very intentionally, running your proposed feedback by other people), but being careful is time-consuming.

Comment by Buck on EA and the current funding situation · 2022-05-11T19:16:46.000Z · EA · GW

I massively disagree re the business class point. In particular, many people (e.g. me) can sleep in business class seats that let you lie flat, when they would otherwise not have slept and would have been quite sad and unproductive.

not worth the 2x or 3x ticket price

As a general point, the ratio between prices is irrelevant to the purchasing choice if you're only buying something once--you only care about the difference in price and the difference in value.
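
To make that concrete with invented numbers: the same 3x ratio can point to opposite decisions, because all that matters is the absolute price gap and how much the upgrade is worth to you.

```python
# Toy illustration with invented numbers: the decision depends on the price
# *difference*, not the price *ratio*.
def worth_upgrading(econ_price, biz_price, value_of_upgrade):
    return value_of_upgrade > (biz_price - econ_price)

print(worth_upgrading(1000, 3000, 1500))  # 3x ratio, $2000 gap -> False
print(worth_upgrading(200, 600, 1500))    # 3x ratio, $400 gap  -> True
```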

Comment by Buck on The case for becoming a black-box investigator of language models · 2022-05-06T21:44:40.794Z · EA · GW

I think that knowing a bit about ML is probably somewhat helpful for this but not very important.

Comment by Buck on A tale of 2.75 orthogonality theses · 2022-05-02T01:11:26.286Z · EA · GW

What do you mean by “uniform prior” here?

Comment by Buck on Longtermist EA needs more Phase 2 work · 2022-04-25T15:44:50.436Z · EA · GW

FWIW I think that compared to Chris Olah's old interpretability work, Redwood's adversarial training work feels more like phase 2 work, and our current interpretability work is similarly phase 2.

Comment by Buck on Are AGI labs building up important intangibles? · 2022-04-10T20:45:44.984Z · EA · GW

One problem with this estimate is that you don’t end up learning how long the authors spent on the project, or how important their contributions were. My sense is that contributors to industry publications often spent relatively little time on the project compared to academic contributors.

Comment by Buck on Are AGI labs building up important intangibles? · 2022-04-10T20:43:26.604Z · EA · GW

Anthropic took less than a year to set up large model training infrastructure from scratch but with the benefit of experience. This indicates that infrastructure isn’t currently extremely hard to replicate.

EleutherAI has succeeded at training some fairly large models (the biggest has like 20B params, compared to 540B in PaLM) while basically just being talented amateurs (and also not really having money). These models introduced a simple but novel tweak to the transformer architecture (parallel attention and MLP layers) that PaLM also used. This suggests that experience also isn't totally crucial.
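
For concreteness, here's a minimal PyTorch-style sketch of that tweak (illustrative module shapes and names, not any lab's actual implementation): a "parallel" block runs attention and the MLP off one shared layernorm and adds both to the residual stream, rather than applying them sequentially.

```python
import torch.nn as nn

class SequentialBlock(nn.Module):
    """Standard transformer block: attention sublayer, then MLP sublayer."""
    def __init__(self, d_model, n_heads):
        super().__init__()
        self.ln1, self.ln2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))

    def forward(self, x):
        h = self.ln1(x)
        a, _ = self.attn(h, h, h)
        x = x + a                          # attention first...
        return x + self.mlp(self.ln2(x))   # ...then the MLP

class ParallelBlock(SequentialBlock):
    """Parallel variant: one shared layernorm feeds both attention and MLP."""
    def forward(self, x):
        h = self.ln1(x)
        a, _ = self.attn(h, h, h)
        return x + a + self.mlp(h)         # both added to the residual together
```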

I think that the importance of ML experience for success is kind of low compared to other domains of software engineering.

My guess is that entrenched labs will have bigger advantages as time goes on and as ML gets more complicated.

Comment by Buck on Are there any AI Safety labs that will hire self-taught ML engineers? · 2022-04-08T05:19:00.615Z · EA · GW

As I understand it, DeepMind doesn’t hire people without PhDs as research scientists, and places more restrictions on what research engineers can do than other places.

Comment by Buck on "Long-Termism" vs. "Existential Risk" · 2022-04-06T23:43:18.949Z · EA · GW

I think that the longtermist EA community mostly acts as if we're close to the hinge of history, because most influential longtermists disagree with Will on this. If Will's take was more influential, I think we'd do quite different things than we're currently doing.

Comment by Buck on Are there any AI Safety labs that will hire self-taught ML engineers? · 2022-04-06T23:39:27.606Z · EA · GW

I'm not sure what you mean by "AI safety labs", but Redwood Research, Anthropic, and the OpenAI safety team have all hired self-taught ML engineers. DeepMind has a reputation for being more focused on credentials. Other AI labs don't do as much research that's clearly focused on AI takeover risk.

Comment by Buck on How might a herd of interns help with AI or biosecurity research tasks/questions? · 2022-03-21T17:14:53.808Z · EA · GW

I'm running Redwood Research's interpretability research.

I've considered running an "interpretability mine"--we get 50 interns, put them through a three week training course on transformers and our interpretability tools, and then put them to work on building mechanistic explanations of parts of some model like GPT-2 for the rest of their internship.

My usual joke is "GPT-2 has 12 attention heads per layer and 48 layers. If we had 50 interns and gave them each a different attention head every day, we'd have an intern-day of analysis of each attention head in 11 days."
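
A quick sanity check on the arithmetic in that joke, taking the head and layer counts at face value:

```python
# Numbers taken from the joke above, not double-checked against any model spec.
heads_per_layer, layers, interns = 12, 48, 50
total_heads = heads_per_layer * layers     # 576 attention heads
days_to_cover_all = total_heads / interns  # ~11.5 working days
print(total_heads, days_to_cover_all)      # 576, 11.52
```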

This is bottlenecked on various things:

  • having a good operationalization of what it means to interpret an attention head, and having some way to do quality analysis of explanations produced by the interns. This could also be phrased as "having more of a paradigm for interpretability work".
  • having organizational structures that would make this work
  • building various interpretability tools to make it so that it's relatively easy to do this work if you're a smart CS/math undergrad who has done our three week course

I think there's a 30% chance that in July, we'll wish that we had 50 interns to do something like this. Unfortunately this is too low a probability for it to make sense for us to organize the internship.

Comment by Buck on Native languages in the EA community (and issues with assessing promisingness) · 2021-12-28T16:37:40.609Z · EA · GW

I have some sympathy to this perspective, and suspect you’re totally right about some parts of this.

They misuse jargon like “updating” and “outside view” in an attempt to get their point across, and their interlocutors decide that talking with them is not worth their time.

However, I totally don’t buy this. IMO the concepts of “updating” and “outside view” are important enough and non-quantitative enough that if someone can’t use that jargon correctly after learning it, I’m very skeptical of their ability to contribute intellectually to EA. (Of course, we should explain what those terms mean the first time they run across them.)

Comment by Buck on Linch's Shortform · 2021-12-03T18:28:30.625Z · EA · GW

How do you know whether you're happy with the results?

Comment by Buck on Linch's Shortform · 2021-12-02T17:14:32.506Z · EA · GW

This argument for the proposition "AI doesn't have an advantage over us at solving the alignment problem" doesn't work for outer alignment—some goals are easier to measure than others, and agents that are lucky enough to have easy-to-measure goals can train AGIs more easily.

Comment by Buck on What are the bad EA memes? How could we reframe them? · 2021-11-16T18:35:23.763Z · EA · GW

Unfortunately this isn’t a very good description of the concern about AI, and so even if it “polls better” I’d be reluctant to use it.

Comment by Buck on Apply to the ML for Alignment Bootcamp (MLAB) in Berkeley [Jan 3 - Jan 22] · 2021-11-04T15:35:22.332Z · EA · GW

No, the previous application will work fine. Thanks for applying :)

Comment by Buck on Buck's Shortform · 2021-11-01T18:53:06.157Z · EA · GW

I think it's bad when people who've been around EA for less than a year sign the GWWC pledge. I care a lot about this.

I would prefer groups to strongly discourage new people from signing it.

I can imagine boycotting groups that encouraged signing the GWWC pledge (though I'd probably first want to post about why I feel so strongly about this, and warn them that I was going to do so).

I regret taking the pledge, and the fact that the EA community didn't discourage me from taking it is by far my biggest complaint about how the EA movement has treated me. (EDIT: TBC, I don't think anyone senior in the movement actively encouraged me to do it, but I am annoyed at them for not actively discouraging it.)

(writing this short post now because I don't have time to write the full post right now)

Comment by Buck on We're Redwood Research, we do applied alignment research, AMA · 2021-10-08T23:39:44.486Z · EA · GW

Yeah basically.

Comment by Buck on We're Redwood Research, we do applied alignment research, AMA · 2021-10-08T23:36:19.158Z · EA · GW

Additionally, what are/how strong are the track records of Redwood's researchers/advisors?


The people we seek advice from on our research most often are Paul Christiano and Ajeya Cotra. Paul is a somewhat experienced ML researcher, who among other things led some of the applied alignment research projects that I am most excited about.

On our team, the people with the most relevant ML experience are probably Daniel Ziegler, who was involved with GPT-3 and also several OpenAI alignment research projects, and Peter Schmidt-Nielsen. Many of our other staff have research backgrounds (including publishing ML papers) that make me feel pretty optimistic about our ability to have good ML ideas and execute on the research.

How important do you think it is to have ML research projects be led by researchers who have had a lot of previous success in ML?
 

I think it kind of depends on what kind of ML research you’re trying to do. I think our projects require pretty similar types of expertise to eg Learning to Summarize with Human Feedback, and I think we have pretty analogous expertise to the team that did that research (and we’re advised by Paul, who led it).

I think that there are particular types of research that would be hard for us to do, due to not having certain types of expertise.

Maybe it's the case that the most useful ML research is done by the top ML researchers

I think that a lot of the research we are most interested in doing is not super bottlenecked on having the top ML researchers, in the same way that Learning to Summarize with Human Feedback doesn’t seem super bottlenecked on having the top ML researchers. I feel like the expertise we end up needing is some mixture of ML stuff like “how do we go about getting this transformer to do better on this classification task”, reasoning about the analogy to the AGI alignment problem, and lots of random stuff like making decisions about how to give feedback to our labellers.

or that the ML community won't take Redwood very seriously (e.g. won't consider using your algorithms) if the research projects aren't lead by people with strong track records in ML.

I don’t feel very concerned about this; in my experience, ML researchers are usually pretty willing to consider research on its merits, and we have had good interactions with people from various AI labs about our research.
 

Comment by Buck on We're Redwood Research, we do applied alignment research, AMA · 2021-10-08T01:45:45.049Z · EA · GW

So one thing to note is that I think that there are varying degrees of solving the technical alignment problem. In particular, you’ve solved the alignment problem more if you’ve made it really convenient for labs to use the alignment techniques you know about. If next week some theory people told me “hey we think we’ve solved the alignment problem, you just need to use IDA, imitative generalization, and this new crazy thing we just invented”, then I’d think that the main focus of the applied alignment community should be trying to apply these alignment techniques to the most capable currently available ML systems, in the hope of working out all the kinks in these techniques, and then repeat this every year, so that whenever it comes time to actually build the AGI with these techniques, the relevant lab can just hire all the applied alignment people who are experts on these techniques and get them to apply them. (You might call this fire drills for AI safety, or having an “anytime alignment plan” (someone else invented this latter term, I don’t remember who).)

 

Assuming that it's taking too long to solve the technical alignment problem, what might be some of our other best interventions to reduce x-risk from AI? E.g., regulation, institutions for fostering cooperation and coordination between AI labs, public pressure on AI labs/other actors to slow deployment, …

I normally focus my effort on the question “how do we solve the technical alignment problem and make it as convenient as possible to build aligned systems, and then ensure that the relevant capabilities labs put effort into using these alignment techniques”, rather than this question, because it seems relatively tractable, compared to causing things to go well in worlds like those you describe.

One way of thinking about your question is to ask how many years the deployment of existentially risky AI could be delayed (which might buy time to solve the alignment problem). I don’t have super strong takes on this question. I think that there are many reasonable-seeming interventions, such as all of those that you describe. I guess I’m more optimistic about regulation and voluntary coordination between AI labs (eg, I’m happy about “Therefore, if a value-aligned, safety-conscious project comes close to building AGI before we do, we commit to stop competing with and start assisting this project.” from the OpenAI Charter) than about public pressure, but I’m not confident.

If we solve the technical alignment problem in time, what do you think are the other major sources of AI-related x-risk that remain? How likely do you think these are, compared to x-risk from not solving the technical alignment problem in time?

Again, I think that maybe 30% of AI accident risk comes from situations where we sort of solved the alignment problem in time but the relevant labs don't use the known solutions. Excluding that, I think that misuse risk is serious and worth worrying about. I don't know how much value I think is destroyed in expectation by AI misuse compared to AI accident. I can also imagine various x-risks arising from narrow AI in various ways.

Comment by Buck on We're Redwood Research, we do applied alignment research, AMA · 2021-10-08T01:12:00.778Z · EA · GW

We could operationalize this as “How does P(doom) vary as a function of the total amount of quality-adjusted x-risk-motivated AI alignment output?” (A related question is “Of the quality-adjusted AI alignment research, how much will be motivated by x-risk concerns?” This second question feels less well defined.)

I’m pretty unsure here. Today, my guess is like 25% chance of x-risk from AI this century, and maybe I imagine that being 15% if we doubled the quantity of quality-adjusted x-risk-motivated AI alignment output, and 35% if we halved that quantity. But I don’t have explicit models here and just made these second two numbers up right now; I wouldn’t be surprised to hear that they moved noticeably after two hours of thought. I guess that one thing you might learn from these numbers is that I think that x-risk-motivated AI alignment output is really important.
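
(Purely as an illustration of what those made-up numbers imply if you connect them smoothly; this isn't a fitted model, just one way of reading the three guesses above.)

```python
import math

# Illustrative reading of the rough guesses above: ~25% at the current level
# of output, ~15% at 2x, ~35% at 0.5x -- i.e. roughly 10 percentage points
# per doubling or halving of quality-adjusted output.
def p_doom_guess(output_multiplier, baseline=0.25, change_per_doubling=-0.10):
    return baseline + change_per_doubling * math.log2(output_multiplier)

for m in (0.5, 1, 2):
    print(m, round(p_doom_guess(m), 2))  # 0.35, 0.25, 0.15
```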

What are the main factors you expect will influence this? (e.g. the occurrence of medium-scale alignment failures as warning shots)

I definitely think that AI x-risk seems lower in worlds where we expect medium-scale alignment failure warning shots. I don’t know whether I think that x-risk-motivated alignment research seems less important in those worlds or not--even if everyone thinks that AI is potentially dangerous, we have to have scalable solutions to alignment problems, and I don’t see a reliable route that takes us directly from “people are concerned” to “people solve the problem”.

I think the main factor that affects the importance of x-risk-motivated alignment research is whether it turns out that most of the alignment problem occurs in miniature in sub-AGI systems. If so, much more of the work required for aligning AGI will be done by people who aren’t thinking about how to reduce x-risk.

Comment by Buck on We're Redwood Research, we do applied alignment research, AMA · 2021-10-07T21:23:53.932Z · EA · GW

Here are some things I think are fairly likely:

  • I think that there might be a bunch of progress on theoretical alignment, with various consequences:
    • More projects that look like “do applied research on various strategies to make imitative generalization work in practice” -- that is, projects where the theory researchers have specific proposals for ML training schemes that have attractive alignment properties, but which have practical implementation questions that might require a bunch of effort to work out. I think that a lot of the impact from applied alignment research comes from making it easier for capabilities labs to adopt alignment schemes, and so I’m particularly excited for this kind of work.
    • More well-scoped narrow theoretical problems, so that there’s more gains from parallelism among theory researchers.
    • A better sense of what kinds of practical research is useful.
    • I think I will probably be noticeably more optimistic or pessimistic -- either there will be some plan for solving the problem that seems pretty legit to me, or else I’ll have updated substantially against such a plan existing.
  • We might have a clearer picture of AGI timelines. We might have better guesses about how early AGI will be trained. We might know more about empirical ML phenomena like scaling laws (which I think are somewhat relevant for alignment).
  • There will probably be a lot more industry interest in problems like “our pretrained model obviously knows a lot about topic X, but we don’t know how to elicit this knowledge from it.” I expect more interest in this because this becomes an increasingly important problem as your pretrained models become more knowledgeable. I think that this problem is pretty closely related to the alignment problem, so e.g. I expect that most research along the lines of Learning to Summarize with Human Feedback will be done by people who need this research for practical purposes, rather than alignment researchers interested in the analogy to AGI alignment problems.
  • Hopefully we’ll have more large applied alignment projects, as various x-risk-motivated orgs like Redwood scale up.
  • Plausibly large funders like Open Philanthropy will start spending large amounts of money on funding alignment-relevant research through RFPs or other mechanisms.
  • Probably we’ll have way better resources for onboarding new people into cutting edge thinking on alignment. I think that resources are way better than they were two years ago, and I expect this trend to continue.
  • Similarly, I think that there are a bunch of arguments about futurism and technical alignment that have been written up much more clearly and carefully now than they had been a few years ago. Eg Joe Carlsmith’s report on x-risk from power-seeking AGI and Ajeya Cotra on AGI timelines. I expect this trend to continue.
Comment by Buck on We're Redwood Research, we do applied alignment research, AMA · 2021-10-07T21:20:17.847Z · EA · GW

I think this is a great question.

We are researching simpler precursors to the adversarial training techniques that seem most likely to work if you assume that it’s possible to build systems that are performance-competitive and training-competitive, and do well on average on their training distribution.

There are a variety of reasons to worry that this assumption won’t hold. In particular, it seems plausible that the only AGIs humanity will be able to produce are ones that will collude with each other whenever it’s possible for them to do so. This seems especially likely if it’s only affordable to train your AGI from scratch a few times, because then all the systems you’re using are similar to each other and will find collusion easier. (It’s not training-competitive to assume you’re able to train the AGI from scratch multiple times, if you believe that there’s a way of building an unaligned powerful system that only involves training it from scratch once.) But even if we train all our systems from scratch separately, it’s pretty plausible to me that models will collude, either via acausal trade or because the systems need to be able to communicate with each other for some competitiveness reason.

So our research is most useful if we’re able to assume a lack of such collusion.

I think that some people think you might be able to apply these techniques even in cases where you don’t have an a priori reason to be confident that the models won’t collude; I don’t have a strong opinion on this.

Comment by Buck on We're Redwood Research, we do applied alignment research, AMA · 2021-10-07T21:05:15.362Z · EA · GW

I think our work is aimed at reducing the theory-practice gap of any alignment schemes that attempt to improve worst-case performance by training the model on data that was selected in the hope of eliciting bad behavior from the model. For example, one of the main ingredients of our project is paying people to try to find inputs that trick the model, then training the model on these adversarial examples.


Many different alignment schemes involve some type of adversarial training. The kind of adversarial training we’re doing, where we just rely on human ingenuity, isn’t going to work for ensuring good behavior from superhuman models. But getting good at the simple, manual version of adversarial training seems like plausibly a prerequisite for being able to do research on the more complicated techniques that might actually scale.
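
As a rough sketch of that loop (illustrative code, not our actual training pipeline): contractors hunt for inputs the classifier gets wrong, those inputs get corrected labels, and the model is fine-tuned on the combined data.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader

def adversarial_training_round(model, base_dataset, human_found_dataset,
                               loss_fn, lr=1e-5, epochs=1, batch_size=16):
    """Fine-tune `model` on the original training data plus adversarial
    examples that humans found (inputs the model misclassified, now paired
    with corrected labels)."""
    data = ConcatDataset([base_dataset, human_found_dataset])
    loader = DataLoader(data, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for inputs, labels in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), labels)
            loss.backward()
            optimizer.step()
    return model
```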

Comment by Buck on We're Redwood Research, we do applied alignment research, AMA · 2021-10-07T20:56:01.180Z · EA · GW

So there’s this core question: "how are the results of this project going to help with the superintelligence alignment problem?" My claim can be broken down as follows:

  • "The problem is relevant": There's a part of the superintelligence alignment problem that is analogous to this problem. I think the problem is relevant for reasons I already tried to spell out here.
  • "The solution is relevant": There's something helpful about getting better at solving this problem. This is what I think you’re asking about, and I haven’t talked as much about why I think the solution is relevant, so I’ll do that here.

I don’t think that the process we develop will generalize, in the sense that I don’t think that we’ll be able to actually apply it to solving the problems we actually care about, but I think it’s still likely to be a useful step.

There are more advanced techniques that have been proposed for ensuring models don’t do bad things. For example, relaxed adversarial training, or adversarial training where the humans have access to powerful tools that help them find examples where the model does bad things (eg as in proposal 2 here). But it seems easier to research those things once we’ve done this research, for a few reasons:

  • It’s nice to have baselines. In general, when you’re doing ML, if you’re trying to develop some new technique that you think will get around fundamental weaknesses of a previous technique, it’s important to start out by getting a clear understanding of how good existing techniques are. ML research often has a problem where people publish papers that claim that some technique is better than the existing technique, and then it turns out that the existing technique is actually just as good if you use it properly (which of course the researchers are incentivized not to do). This kind of problem makes it harder to understand where your improvements are coming from. And so it seems good to try pretty hard to apply the naive adversarial training scheme before moving on to more complicated things.
  • There are some shared subproblems between the techniques we’re using and the more advanced techniques. For example, there are more advanced techniques where you try to build powerful ML-based tools to help humans generate adversarial examples. There’s kind of a smooth continuum between the techniques we’re trying out and techniques where the humans have access to tools to help them. And so many of the practical details we’re sorting out with our current work will make it easier to test out these more advanced techniques later, if we want to.

I often think of our project as being kind of analogous to Learning to summarize with human feedback. That paper isn’t claiming that if we know how to train models by getting humans to choose which of two options they prefer, we’ll have solved the whole alignment problem. But it’s still probably the case that it’s helpful for us to have sorted out some of the basic questions about how to do training from human feedback, before trying to move on to more advanced techniques (like training using human feedback where the humans have access to ML tools to help them provide better feedback).

Comment by Buck on We're Redwood Research, we do applied alignment research, AMA · 2021-10-07T16:36:54.856Z · EA · GW

So to start with, I want to note that I imagine something a lot more like “the alignment community as a whole develops promising techniques, probably with substantial collaboration between research organizations” than “Redwood does all the work themselves”. Among other things, we don’t have active plans to do much theoretical alignment work, and I’d be fairly surprised if it was possible to find techniques I was confident in without more theoretical progress--our current plan is to collaborate with theory researchers elsewhere.

In this comment, I mentioned the simple model of “labs align their AGI if the amount of pressure on them to use sufficiently reliable alignment techniques is greater than the inconvenience associated with using those techniques.” The kind of applied alignment work we’re doing is targeted at reducing the cost of using these techniques, rather than increasing the pressure--we’re hoping to make it cheaper and easier for capabilities labs to apply alignment techniques that they’re already fairly motivated to use, eg by ensuring that these techniques have been tried out in miniature, and so the labs feel pretty optimistic that their practical kinks have been worked out, and there are people who have implemented the techniques before who can help them.

Organizations grow and change over time, and I wouldn’t be shocked to hear that Redwood eventually ended up engaging in various kinds of efforts to get capabilities labs to put more work into alignment. We don’t currently have plans to do so.

Do you hope for your techniques to be useful enough to AGI research that labs adopt them anyway? 

That would be great, and seems plausible.

Do you want to heavily evangelize your techniques in publications/the press/etc.?

I don’t imagine wanting to heavily evangelize techniques in the press. I think that getting prominent publications about alignment research is probably useful.

Comment by Buck on We're Redwood Research, we do applied alignment research, AMA · 2021-10-07T00:34:11.609Z · EA · GW

It seems definitely good on the margin if we had ways of harnessing academia to do useful work on alignment. Two reasons for this are that 1. perhaps non-x-risk-motivated researchers would produce valuable contributions, and 2. it would mean that x-risk-motivated researchers inside academia would be less constrained and so more able to do useful work.

Three versions of this:

  • Somehow cause academia to intrinsically care about reducing x-risk, and also ensure that the power structures in academia have a good understanding of the problem, so that its own quality control mechanisms cause academics to do useful work. I feel pretty pessimistic about the viability of convincing large swathes of academia to care about the right thing for the right reasons. Historically, basically the only way that people have ended up thinking about alignment research in a way that I’m excited about is that they spent a really long time thinking about AI x-risk and talking about it with other interested people. And so I’m not very optimistic about the first of these.
  • Just get academics to do useful work on specific problems that seem relevant to x-risk. For example, I’m fairly excited about some work on interpretability and some techniques for adversarial robustness. On the other hand, my sense is that EA funders have on many occasions tried to get academics to do useful work on topics of EA interest, and have generally found it quite difficult; this makes me pessimistic about this. Perhaps an analogy here is: Suppose you’re Google, and there’s some problem you need solved, and there’s an academic field that has some relevant expertise. How hard should you try to get academics in that field excited about working on the problem? Seems plausible to me that you shouldn’t try that hard--you’d be better off trying to have a higher-touch relationship where you employ researchers or make specific grants, rather than trying to convince the field to care about the subproblem intrinsically (even if they in some sense should care about the subproblem).
  • Get academics to feel generally positively towards x-risk-motivated alignment research, even if they don’t try to work on it themselves. This seems useful and more tractable.
Comment by Buck on We're Redwood Research, we do applied alignment research, AMA · 2021-10-06T23:43:55.780Z · EA · GW

This is a great question and I don't have a good answer.

Comment by Buck on We're Redwood Research, we do applied alignment research, AMA · 2021-10-06T23:42:30.699Z · EA · GW

One simple model for this is: labs build aligned models if the amount of pressure on them to use sufficiently reliable alignment techniques is greater than the inconvenience associated with using those techniques.

Here are various sources of pressure:

  • Lab leadership
  • Employees of the lab
  • Investors
  • Regulators
  • Customers

In practice, all of these sources of pressure are involved in companies spending resources on, eg, improving animal welfare standards, reducing environmental costs, or DEI (diversity, equity, and inclusion).

And here are various sources of inconvenience that could be associated with using particular techniques, even assuming they’re in principle competitive (in both the performance-competitive and training-competitive senses).

  • Perhaps they require using substantially different algorithms or technologies, even if these aren’t fundamentally worse. As a dumb example, imagine that building an aligned AGI requires building your training code in some language that is much less bug-prone than Python, eg Haskell. It’s not really fundamentally harder to do ML in Haskell than Python, but all the ML libraries are in Python and in practice it would require a whole lot of annoying work that an org would be extremely reluctant to do.
  • Perhaps they require more complicated processes with more moving parts.
  • Perhaps they require the org to do things that are different from the things it’s good at doing. For example, I get the sense that ML researchers are averse to interacting with human labellers (because it is pretty annoying) and so underutilize techniques that involve eg having humans in the loop. Organizations that will be at the cutting edge of AI research will probably have organizational structures that are optimized for the core competencies related to their work. I expect these core competencies to include ML research, distributed systems engineering (for training gargantuan models), fundraising (because these projects will likely be extremely capital intensive), perhaps interfacing with regulators, and various work related to commercializing these large models. I think it’s plausible that alignment will require organizational capacities quite different from these. 
  • Perhaps they require you to have capable and independent red teams whose concerns are taken seriously.

And so when I’m thinking about labs not using excellent alignment strategies that had already been developed, I imagine the failures differently depending on how much inconvenience there was:

  • “They just didn’t care”: The amount of pressure on them to use these techniques was extremely low. I’d be kind of surprised by this failure: I feel like if it really came down to it, and especially if EA was willing to spend a substantial fraction of its total resources on affecting some small number of decisions, basically all existing labs could be persuaded to do fairly easy things for the sake of reducing AI x-risk.
  • “They cared somewhat, but it was too inconvenient to use them”. I think that a lot of the point of applied alignment research is reducing the probability of failures like this.
  • “The techniques were not competitive”. In this case, even large amounts of pressure might not suffice (though presumably, sufficiently large amounts of pressure could cause the whole world to use these techniques even if they weren’t that competitive.)
Comment by Buck on We're Redwood Research, we do applied alignment research, AMA · 2021-10-06T22:07:37.778Z · EA · GW

I think that most questions we care about are either technical or related to alignment. Maybe my coworkers will think of some questions that fit your description. Were you thinking of anything in particular?

Comment by Buck on We're Redwood Research, we do applied alignment research, AMA · 2021-10-06T21:40:28.965Z · EA · GW

GPT-3 suggests: "We will post the AMA with a disclaimer that the answers are coming from Redwood staff. We will also be sure to include a link to our website in the body of the AMA, with contact information if someone wants to verify with us that an individual is staff."

Comment by Buck on We're Redwood Research, we do applied alignment research, AMA · 2021-10-06T21:36:45.345Z · EA · GW

I think the main skillsets required to set up organizations like this are: 

  • Generic competence related to setting up any organization--you need to talk to funders, find office space, fill out lots of IRS forms, decide on a compensation policy, make a website, and so on.
  • Ability to lead relevant research. This requires knowledge of running ML research, knowledge of alignment, and management aptitude.
  • Some way of getting a team, unless you want to start the org out pretty small (which is potentially the right strategy).
  • It’s really helpful to have a bunch of contacts in EA. For example, I think it’s been really helpful for EA that I spent a few years doing lots of outreach stuff for MIRI, because it means I know a bunch of people who can potentially be recruited or give us advice.

Of course, if you had some of these properties but not the others, many people in EA (eg me) would be very motivated to help you out, by perhaps introducing you to cofounders or helping you with parts you were less experienced with.

People who wanted to start a Redwood competitor should plausibly consider working on an alignment research team somewhere (preferably leading it) and then leaving to start their own team. We’d certainly be happy to host people who had that aspiration (though we’d think that such people should consider the possibility of continuing to host their research inside Redwood instead of leaving).

Comment by Buck on We're Redwood Research, we do applied alignment research, AMA · 2021-10-06T19:31:10.896Z · EA · GW

Thanks for the kind words!

Our biggest bottlenecks are probably going to be some combination of:

  • Difficulty hiring people who are good at some combination of leading ML research projects, executing on ML research, and reasoning through questions about how to best attack prosaic alignment problems with applied research.
  • A lack of sufficiently compelling applied research available, as a result of theory not being well developed enough.
  • Difficulty with making the organization remain functional and coordinated as it scales.
Comment by Buck on We're Redwood Research, we do applied alignment research, AMA · 2021-10-06T18:48:03.352Z · EA · GW

In most worlds where we fail to produce value, I think we fail before we spend a hundred researcher-years. So I’m also going to include possibilities for wasting 30 researcher-years in this answer.

Here’s some reasons we might have failed to produce useful research: 

  • We failed to execute well on research. For example, maybe we were incompetent at organizing research projects, or maybe our infrastructure was forever bad, or maybe we couldn’t hire a certain type of person who was required to make the work go well.
  • We executed well on research, but failed on our projects anyway. For example, perhaps we tried to implement imitative generalization, but then it turned out to be really hard and we failed to do it. I’m unsure whether to count this as a failure or not, since null results can be helpful. This seems most like a failure if the reason that the project failed was knowable ahead of time.
  • We succeeded on our projects, but they turned out not to be useful. Perhaps we were confused about how to think about the alignment problem. This feels like a big risk to me. 

Some of the value of Redwood comes from building capacity to do more good research in the future (including building up this capacity for other orgs, eg by them being able to poach our employees). So you also have to imagine that this also didn’t work out.

Comment by Buck on We're Redwood Research, we do applied alignment research, AMA · 2021-10-06T18:47:46.950Z · EA · GW

Re 1:

It’s probably going to be easier to get good at the infrastructure engineering side of things than the ML side of things, so I’ll assume that that’s what you’re going for.

For our infra engineering role, we want to hire people who are really productive and competent at engineering various web systems quickly. (See the bulleted list of engineering responsibilities on the job page.) There are some people who are qualified for this role without having much professional experience, because they’ve done a lot of Python programming and web programming as hobbyists. Most people who want to become more qualified for this work should seek out a job that’s going to involve practicing these skills. For example, being a generalist backend engineer at a startup, especially if you’re going to be working with ML, is likely to teach you a bunch of the skills that are valuable to us. You’re more likely to learn these skills quickly if you take your job really seriously and try hard to be very good at it--you should try to take on more responsibilities when you get the opportunity to do so, and generally practice the skill of understanding the current technical situation and business needs and coming up with plans to quickly and effectively produce value.

Re 2:

Currently our compensation packages are usually entirely salary. We don’t have equity because we’re a nonprofit. We’re currently unsure how to think about compensation policy--we’d like to be able to offer competitive salaries so that we can hire non-EA talent for appropriate roles (because almost all the talent is non-EA), but there are a bunch of complexities associated with this.

Comment by Buck on We're Redwood Research, we do applied alignment research, AMA · 2021-10-06T16:23:38.013Z · EA · GW

I think the best examples would be if we tried to practically implement various schemes that seem theoretically doable and potentially helpful, but quite complicated to do in practice. For example, imitative generalization or the two-head proposal here. I can imagine that it might be quite hard to get industry labs to put in the work of getting imitative generalization to work in practice, and so doing that work (which labs could perhaps then adopt) might have a lot of impact.

Comment by Buck on Buck's Shortform · 2021-09-24T01:41:34.434Z · EA · GW

Redwood Research is looking for people to help us find flaws in our injury-detecting model. We'll pay $30/hour for this, for up to 2 hours; after that, if you’ve found interesting stuff, we’ll pay you for more of this work at the same rate. I expect our demand for this to last for maybe a month (though we'll probably need more in future).

If you’re interested, please email adam@rdwrs.com so he can add you to a Slack or Discord channel with other people who are working on this. This might be a fun task for people who like being creative, being tricky, and figuring out how language models understand language.

You can try out the interface here. The task is to find things that the model classifies as non-injurious that are actually injurious according to our definition. Full instructions here.

This is in service of this research project.

EDIT: updated wage from $20/hour to $30/hour.

Comment by Buck on Why AI alignment could be hard with modern deep learning · 2021-09-23T17:57:21.581Z · EA · GW

In other words, if the disagreement was "bottom-up", then you'd expect that at least some people who are optimistic about misalignment risk would be pessimistic about other kinds of AI risk, such as what I call "human safety problems" (see examples here and here) but in fact I don't seem to see anyone whose position is something like, "AI alignment will be easy or likely solved by default, therefore we should focus our efforts on these other kinds of AI-related x-risks that are much more worrying."

 

 

FWIW I know some people who explicitly think this. And I think there are also a bunch of people who think something like "the alignment problem will probably be pretty technically easy, so we should be focusing on the problems arising from humanity sometimes being really bad at technically easy problems".

Comment by Buck on Linch's Shortform · 2021-09-06T04:09:13.354Z · EA · GW

What kinds of things do you think it would be helpful to do cost effectiveness analyses of? Are you looking for cost effectiveness analyses of problem areas or specific interventions?

Comment by Buck on Buck's Shortform · 2021-09-01T01:11:36.833Z · EA · GW

When I was 19, I moved to San Francisco to do a coding bootcamp. I got a bunch better at Ruby programming and also learned a bunch of web technologies (SQL, Rails, JavaScript, etc).

It was a great experience for me, for a bunch of reasons.

  • I got a bunch better at programming and web development.
    • It was a great learning environment for me. We spent basically all day pair programming, which makes it really easy to stay motivated and engaged. And we had homework and readings in the evenings and weekends. I was living in the office at the time, with a bunch of the other students, and it was super easy for me to spend most of my waking hours programming and learning about web development. I think that it was very healthy for me to practice working really long hours in a supportive environment.
    • The basic way the course worked is that every day you’d be given a project with step-by-step instructions, and you’d try to implement the instructions with your partner. I think it was really healthy for me to repeatedly practice the skill of reading the description of a project, then reading the step-by-step breakdown, and then figuring out how to code everything.
    • Because we pair programmed every day, tips and tricks quickly percolated through the cohort. We were programming in Ruby, which has lots of neat little language features that are hard to pick up all on your own; these were transmitted very naturally. I was also pushed to learn my text editor better.
    • The specific content that I learned was sometimes kind of fiddly; it was helpful to have more experienced people around to give advice when things went wrong.
    • I think that this was probably a better learning experience than most tech or research internships I could have gotten. If I’d had access to the best tech/research internships, maybe that would have been better. I think that this was probably a much better learning experience than eg most Google internships seem to be.
  • I met rationalists and EAs in the Bay.
  • I spent a bunch of time with real adults who had had real jobs before. The median age of students was like 25. Most of the people had had jobs before and felt dissatisfied with them and wanted to make a career transition. I think that spending this time with them helped me grow up faster.
  • I somehow convinced my university that this coding bootcamp was a semester abroad (thanks to my friend Andrew Donnellan for suggesting this to me; that suggestion plausibly accelerated my career by six months), which meant that I graduated on schedule even though I then spent six months working for App Academy as a TA (which incidentally was also a good experience.)

Some ways in which my experience was unusual:

  • I was a much stronger programmer on the way in to the program than most of my peers.
  • I am deeply extroverted and am fine with pair programming every day.

It seems plausible to me that more undergrad EAs should do something like this, especially if they can get college credit for it (which I imagine might be hard for most students—I think I only got away with it because my university didn’t really know what was going on). The basic argument here is that it might be good for them the same way it was good for me.

More specifically, I think that there are a bunch of EAs who want to do technical AI alignment work and who are reasonably strong but not stellar coders. I think that if they did a coding bootcamp between, say, freshman and sophomore year, they might come back to school and be a bunch stronger. The bootcamp I did was focused on web app programming with Ruby and Rails and JavaScript. I think that these skills are pretty generically useful to software engineers. I often am glad to be better than my coworkers at quickly building web apps, and I learned those skills at App Academy (though being a professional web developer for a while also helped). Eg in our current research, even aside from the web app we use for getting our contractors to label data, we have to deal with a bunch of different computers that are sending data back and forth and storing it in databases or Redis queues or whatever. A reasonable fraction of undergrad EAs would seem like much more attractive candidates to me if they’d done a bootcamp. (They’d probably seem very marginally less attractive to normal employers than if they’d done something more prestigious-seeming with that summer, but most people don’t do very prestigious-seeming things in their first summer anyway. And the skills they had learned would probably be fairly attractive to some employers.)

This is just a speculative idea, rather than a promise, but I’d be interested in considering funding people to do bootcamps over the summer—they often cost maybe $15k. I am most interested in funding people to do bootcamps if they are already successful students at prestigious schools, or have other indicators of talent and conscientiousness, and have evidence that they’re EA aligned.

Another thing I like about this is that a coding bootcamp seems like a fairly healthy excuse to hang out in the Bay Area for a summer. I like that they involve working hard and being really focused on a concrete skill that relates to the outside world.

I am not sure whether I’d recommend someone do a web programming bootcamp or a data science bootcamp—though data science might seem more relevant, I think the practical programming stuff in the web programming bootcamp might actually be more helpful on the margin. (Especially for people who are already doing ML courses in school.)

I don’t think there are really any bootcamps focused on ML research and engineering. I think it’s plausible that we could make one happen. Eg I know someone competent and experienced who might run a bootcamp like this over a summer if we paid them a reasonable salary.

Comment by Buck on Buck's Shortform · 2021-08-26T17:06:00.930Z · EA · GW

Doing lots of good vs getting really rich

Here in the EA community, we’re trying to do lots of good. Recently I’ve been thinking about the similarities and differences between a community focused on doing lots of good and a community focused on getting really rich.

I think this is interesting for a few reasons:

  • I found it clarifying to articulate the main differences between how we should behave and how the wealth-seeking community should behave.
  • I think that EAs make mistakes that you can notice by thinking about how the wealth-seeking community would behave, and then thinking about whether there’s a good reason for us behaving differently.

——

Here are some things that I think the wealth-seeking community would do.

  • There are some types of people who should try to get rich by following some obvious career path that’s a good fit for them. For example, if you’re a not-particularly-entrepreneurial person who won math competitions in high school, it seems pretty plausible that you should work as a quant trader. If you think you’d succeed at being a really high-powered lawyer, maybe you should do that.
  • But a lot of people should probably try to become entrepreneurs. In college, they should start small businesses, develop appropriate skills (eg building web apps), start trying to make various plans about how they might develop some expertise that they could turn into a startup, and otherwise practice skills that would help them with this. These people should be thinking about what risks to take, what jobs to maybe take to develop skills that they’ll need later, and so on.

I often think about EA careers somewhat similarly:

  • Some people are natural good fits for particular cookie-cutter roles that give them an opportunity to have a lot of impact. For example, if you are an excellent programmer and ML researcher, I (and many other people) would love to hire you to work on applied alignment research; basically all you have to do to get these roles is to obtain those skills and then apply for a job.
  • But for most people, the way they will have impact is much more bespoke and relies much more on them trying to be strategic and spot good opportunities to do good things that other people wouldn’t have otherwise done.

I feel like many EAs don’t take this distinction as seriously as they should. I fear that EAs see that there exist roles of the first type—you basically just have to learn some stuff, show up, and do what you’re told, and you have a bunch of impact—and then they don’t realize that the strategy they should be following is going to involve being much more strategic and making many more hard decisions about what risks to take. Like, I want to say something like “Imagine you suddenly decided that your goal was to make ten million dollars in the next ten years. You’d be like, damn, that seems hard, I’m going to have to do something really smart in order to do that, I’d better start scheming. I want you to have more of that attitude to EA.”

Important differences:

  • Members of the EA community are much more aligned with each other than wealth-seeking people are. (Maybe we’re supposed to be imagining a community of people who wanted to maximize total wealth of the community for some reason.)
  • Opportunities for high impact are biased to be earlier in your career than opportunities for high income. (For example, running great student groups at top universities is pretty high up there in impact-per-year according to me; there isn’t really a similarly good moneymaking opportunity for which students are unusually well suited.)
  • The space of opportunities to do very large amounts of good seems much narrower than the space of opportunities to make money. So you end up with EAs wanting to work with each other much more than the wealth-maximizing people want to work with each other.
  • It seems harder to make lots of money in a weird, bespoke, non-entrepreneurial role than it is to have lots of impact. There are many EAs who have particular roles which are great fits for them and which allow them to produce a whole bunch of value. I know of relatively fewer cases where someone gets a job which seems weirdly tailored to them and is really highly paid.
    • I think this is mostly because, in the for-profit world, it’s hard to get people to be productive in weird jobs, and you’re mostly only able to hire people for roles that everyone involved understands very well already. And so even if someone would be able to produce a huge amount of value in some particular role, it’s hard for them to get paid commensurately, because the employer will be skeptical that they’ll actually produce all that value, and other potential employers will also be skeptical and so won’t bid their price up.