Important, actionable research questions for the most important century

post by Holden Karnofsky (HoldenKarnofsky) · 2022-02-24T16:34:29.061Z · 14 comments

Contents

  Intro
  A high-level list of important, actionable questions for the most important century
    Questions about AI alignment (more)
    Questions about AI strategy (more)
    Questions about AI “takeoff dynamics” (more)
  How to know whether you can do this work
  Comparison with seeking “crucial considerations” / “Cause X” / “unknown unknowns”
  Comparison with more incremental work
  The unusual sort of people we need
  Notes

Intro

To a significant degree, I see progress on the cause I consider highest-priority (essentially, making the best of the most important century) as bottlenecked by high-quality research.

There are a number of questions such that well-argued, best-guess answers could help us get a lot more done (including productively spending a lot more money) to improve humanity’s prospects. Examples would be “How hard should we expect the AI alignment problem to be?” and “What will be the first transformative application of AI?” - more below.

Unfortunately, I consider most of these questions extremely hard to investigate productively. They’re vague, open-ended, and generally intimidating. I don’t think they are for everyone - but I think that people with the ability to make progress on them should seriously consider working on them.

In particular, I hope with this post to dispel what I see as a few misconceptions common in the EA community:

Below, I will:

In the future, I hope to:

A high-level list of important, actionable questions for the most important century

This is a high-level list of the questions that seem most important and actionable to me. There’s more detail on how these could be relevant, what examples of working on them might look like, and who works on them today in Appendix 1.

Questions about AI alignment (more)

I would characterize most AI alignment research as being something like: “Pushing forward a particular line of research and/or set of questions; following one’s intuitions about what’s worth working on.” I think this is enormously valuable work, but for purposes of this post, I’m talking about something distinct: understanding the motivations, pros and cons of a variety of approaches to AI alignment, with the aim of gaining strategic clarity and/or changing how talent and resources are allocated.

To work on any of the below questions, I think the first step is gaining that background knowledge. I give thoughts on how to do so (and how much of an investment it would be) in Appendix 2.

How difficult should we expect AI alignment to be? In this post from the Most Important Century series, I argue that this broad sort of question is of central strategic importance.

More detail on what it would look like to work on this sort of question, how it could matter, and who’s working on it today

What experimental results could give us important updates about the likely difficulty of AI alignment? If we could articulate particular experiments whose results would be informative, we could:

More detail on what it would look like to work on this sort of question, how it could matter, and who’s working on it today

What relatively well-scoped research activities are particularly likely to be useful for longtermism-oriented AI alignment?

I think there is very little AI alignment research today that has both of the following properties: (a) it’s likely to be relevant for the hardest and most important parts of the problem; (b) it’s also the sort of thing that researchers can get up to speed on and contribute to relatively straightforwardly (without having to take on an unusual worldview, match other researchers’ unarticulated intuitions, etc.)

Working on this question could mean arguing that a particular AI alignment agenda has both properties, or coming up with a new way of thinking about AI alignment that offers research with both properties. Anything we can clearly identify as having these properties unlocks the potential to pour money and talent toward a relatively straightforward (but valuable) research goal - via prizes, grant programs, fellowships, conditional investments in AI companies (though I think today’s leading AI labs would be excited to do more of this work without needing any special incentive), etc.

More detail on what it would look like to work on this sort of question, how it could matter, and who’s working on it today

What’s an AI alignment result or product that would make sense to offer a $1 billion prize for?

I think longtermist funders would be excited to fund and launch such a prize if it were well-designed. I’d expect scoping the prize (to prevent spurious wins but also give maximum clarity as to the goals), promoting the prize, giving guidance to entrants, and judging entries to be a lot of work, so I’d be most excited to do it with a backdrop of having done the hard intellectual work to figure out what’s worth rewarding.

More detail on what it would look like to work on this sort of question, how it could matter, and who’s working on it today

Questions about AI strategy (more)

These could require an interesting mix of philosophical reasoning and more “worldly” reasoning about institutions, geopolitics, etc. I think the ideal researcher would also be highly informed on, and comfortable with, the general state of AI research and AI alignment research, though they need not be as informed on these as for the previous section.

How should we value various possible long-run outcomes relative to each other? E.g., how should we value “utopia” (a nearly optimal outcome) vs. “dystopia” (an outcome nearly as bad as possible) vs. “paperclipping” (a world run by misaligned AI) vs. more middling outcomes?

Most of the thinking I’ve seen on this topic to date has a general flavor: “I personally am all-in on some ethical system that says the odds of utopia [or, in some cases, dystopia] are approximately all that matters; others may disagree, which simply means we have different goals.” I think we can do better than that, as elaborated in the relevant section of Appendix 1.

This is a sprawling topic with a lot of potential applications. Some examples:

I think that reasoning about moral uncertainty and acausal trade could be important here.

More detail on what it would look like to work on this sort of question, how it could matter, and who’s working on it today

How should we value various possible medium-run outcomes relative to each other? E.g., how much should one value “transformative AI is first developed in country A” vs. “transformative AI is first developed in country B”, or “transformative AI is first developed by company A vs. company B”, or “transformative AI is developed 5 years sooner/later than it would have been otherwise?”

If we were ready to make a bet on any particular intermediate outcome in this category being significantly net positive for the expected value of the long-run future, this could unlock a major push toward making that outcome more likely. I’d guess that many of these sorts of “intermediate outcomes” are such that one could spend billions of dollars productively toward increasing the odds of achieving them, but first one would want to feel that doing so was at least a somewhat robustly good bet.[3]

More detail on what it would look like to work on this sort of question, how it could matter, and who’s working on it today

What does a “realistic best case transition to transformative AI” look like?

I think that major AI labs with aspirations toward transformative AI want to “do the right thing” if they develop it and are able to align it, but currently have very little to say about what this would mean. They also seem to make pessimistic assumptions about what others would do if they developed transformative AI (even assuming it was aligned).

I think there’s a big vacuum when it comes to well-thought-through visions of what a good outcome could look like, and such a vision could quickly receive wide endorsement from AI labs (and, potentially, from key people in government). I think such an outcome would be easily worth billions of dollars of longtermist capital.

More detail on what it would look like to work on this sort of question, how it could matter, and who’s working on it today

How do we hope an AI lab - or government - would handle various hypothetical situations in which they are nearing the development of transformative AI, and what does that mean for what they should be doing today?

Luke Muehlhauser and I sometimes refer to this general sort of question as the “AI deployment problem”: the question of how and when to build and deploy powerful AI systems, under conditions of uncertainty about how safe they are and how close others are to deploying powerful AI of their own.

My guess is that thinking through questions in this category can shed light on important, non-obvious actions that both AI labs and governments should be taking to make these sorts of future scenarios less daunting. This could, in turn, unlock interventions to encourage these actions.

More detail on what it would look like to work on this sort of question, how it could matter, and who’s working on it today

Questions about AI “takeoff dynamics” (more)

I think these are especially well-suited to people with economics-ish interests.

What are the most likely early super-significant applications of AI? It seems possible that AI applied in some narrow domain - such as chemical modeling, persuasion or law enforcement and the military - will be super-significant, and will massively change the world and the strategic picture before highly general AI is developed. I don’t think longtermists have done much to imagine how such developments could change key strategic considerations around transformative AI, and what we could be doing today to get ahead of such possibilities.

If we could identify a few areas that seem particularly likely to see huge impact from AI advances, this could significantly affect a number of other strategic considerations, as well as highlighting some additional ways for longtermists to have impact (e.g., by working in key industries, understanding them well and getting ahead of key potential AI-driven challenges).

More detail on what it would look like to work on this sort of question, how it could matter, and who’s working on it today

To what extent should we expect a “fast” vs. “slow” takeoff? There are a few ways to think about what this means and why it matters; one abbreviation might be “We want to know whether we should expect the massive importance and key challenges of AI to be clear common knowledge while there is still significant time for people to work toward solutions, or whether we should expect developments that largely ‘take the world by surprise’ and are conducive to extreme power imbalances.”

I think this question feeds importantly into a number of questions about strategy, particularly about (from the previous section) what medium-run outcomes we should value and what sorts of things labs and governments should be prepared to do. Meaningful updates on likely takeoff dynamics could end up steering a lot of money and talent away from some interventions and towards others.

More detail on what it would look like to work on this sort of question, how it could matter, and who’s working on it today

How should longtermist funders change their investment portfolios? There are a number of ways in which a longtermist funder should arguably diverge from standard best practices in investing. These could include having a different attitude towards risk (which depends crucially on the question of how fast money becomes less valuable as there is more of it), doing “mission hedging” (e.g., investing in companies that are likely to do particularly well if transformative AI is developed sooner than expected), and betting on key views about the future.

A well-argued case for making particular kinds of investments would likely be influential for major longtermist funders collectively accounting for tens of billions of dollars in capital. On a 10-year time frame, an investment change that causes an extra percentage point of returns per year could easily be worth over $1 billion.
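
As a rough illustration of the arithmetic behind that estimate, here is a minimal sketch. The starting capital of $10 billion (the low end of "tens of billions"), the 5% baseline return, annual compounding, and the function name `extra_value` are all assumptions made for the example and do not come from the post.

```python
def extra_value(capital, base_return, extra_return, years):
    """Extra end-of-period wealth from earning `extra_return` more per year.

    Assumes annual compounding and constant returns (a simplification).
    """
    with_boost = capital * (1 + base_return + extra_return) ** years
    without_boost = capital * (1 + base_return) ** years
    return with_boost - without_boost


# Hypothetical inputs: $10 billion of capital, 5% baseline annual return,
# one extra percentage point of return per year, 10-year horizon.
capital = 10e9
gain = extra_value(capital, base_return=0.05, extra_return=0.01, years=10)

# Same calculation with a 0% baseline, to show the result does not hinge
# on the assumed baseline return.
gain_zero_base = extra_value(capital, base_return=0.0, extra_return=0.01, years=10)

print(f"Extra value (5% baseline): ${gain / 1e9:.2f} billion")
print(f"Extra value (0% baseline): ${gain_zero_base / 1e9:.2f} billion")
```

Under these assumptions the extra value comes out above $1 billion in both cases, and it scales linearly with the amount of capital.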

To the extent that other questions covered in this piece feed into investment decisions, this increases those questions’ action-relevance and potential impact.

More detail on what it would look like to work on this sort of question, how it could matter, and who’s working on it today

How to know whether you can do this work

I think the vast majority of people aren’t a fit for the sort of work outlined in this post. But I don’t think the main blocker is experience, and I fear that some of the few people who could be a fit are “ruling themselves out” based on takes like “I don’t have outlier mathematical ability.” or “I’ve never thought about AI or AI alignment before, and it would take me too long to catch up.”

While I think it’s extraordinarily hard to make progress on the sort of questions listed above, I think it’s pretty straightforward (and not inordinately time-consuming) to explore whether one might be able to make progress. Here’s roughly what I have in mind, for assessing your fit for working on a particular question along the lines above (this assumes you are able to get enough time freed up; as I noted before, Open Philanthropy may offer support for this in the future):

  1. Read a question (I suggest the more detailed versions in Appendix 1) and try to picture yourself working on it. If you had read enough to have basic background on existing approaches to the question, and you had a full day off tomorrow, can you imagine yourself getting up in the morning, sitting down to work on the question, and working on the question for most of the day? Or do you find yourself thinking “I just have no idea what steps I would even take, the whole thing feels bizarre and impossible to picture?” If the latter, it seems reasonable to stop there.
  2. Next, “get up to speed” on relevant topics (e.g., AI alignment) such that you feel you have enough basic background that you can get yourself to start thinking/writing about the question directly. I don’t think this means getting “fully up to speed” or anything like it, and it definitely doesn’t mean reading everything (or even most things) relevant to the key topics. The goal here is to “get started” with a highly imperfect approach, not to be fully informed. If you find “getting up to speed” aversive, feel you’ve made no progress in several tries on understanding key readings, or find yourself pouring more and more time into readings without seeming to get closer to feeling ready to start on #3, I think it’s reasonable to stop there.
  3. Free up a day to work on the question. Do you put in at least two hours of solid, focused work during that day, such that you feel like you ended the day a little further along than you started? If not, and another couple of tries yield the same results, it seems reasonable to stop there.
  4. Find a way to free up four weeks to mostly devote to the question (this part is more challenging and risky from a logistical and funding standpoint, and we’re interested in hearing from people who need help going from #3 to #4). During that time, do you put in at least, say, 60 hours[4] of solid, focused, forward-moving work on the question? Are you starting to think, “This is work it feels like I can do, and I think I’m making progress?” If not, it seems reasonable to stop there.
  5. At this point I think it’s worth trying to find a way to spend several months on the question. I think it’s reasonable to aim for “a rough draft of something that could go on the EA Forum” - it doesn’t need to be a full answer, but should be some nontrivial point you feel you can argue for that seems useful - within a few months.
  6. From there, I think feedback from others in the community can give you some evidence about your potential fit.
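
The hour target in step 4 can be checked against the schedules described in note 4 (4 weeks, 5 days per week, 3 hours per day, or a back-loaded alternative). The sketch below is only that arithmetic; the function name `focused_hours` is made up for the example.

```python
def focused_hours(weeks, days_per_week, hours_per_day):
    """Total focused hours for a steady schedule."""
    return weeks * days_per_week * hours_per_day


# Steady schedule from note 4: 4 weeks, 5 days per week, 3 hours per day.
steady = focused_hours(weeks=4, days_per_week=5, hours_per_day=3)

# Back-loaded schedule from note 4: 4 hours a week for the first 3 weeks,
# then 7 hours a day for 7 straight days in the fourth week.
backloaded = 4 * 3 + 7 * 7

print(steady, backloaded)
```

Both schedules land at about the same four-week total, which is why note 4 treats either as a success.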

I expect most people to tap out somewhere in the #1-#3 zone, and I don’t think any of those steps require a massive investment of time or leaving one’s job. (#2 is the most time-consuming of the steps, but doesn’t require dedicated blocks of time and is probably useful whether or not things end up working out.) I think it makes more sense to assess one’s fit via steps like these than via making assumptions about needed credentials and experience.

(One might want to explore their fit for several questions before tapping out on the whole endeavor.)

I say more about the general process of working on questions like these in Learning by Writing, and I plan to write more on this topic soon.

Comparison with seeking “crucial considerations” / “Cause X” / “unknown unknowns”

I think working on the questions above loads heavily on things like creativity, independent thinking, mental flexibility, ability to rescope and reimagine a question as one goes, etc. I think one needs a great deal of these qualities to make any progress, and it seems like the sky’s the limit in terms of how much these qualities could help one come up with crucial, game-changing angles on the questions above.

I want to address an impression I think many in the EA community have, which is that the people who stand out most on these properties (creativity, independent thinking, etc.) should not be working on questions like the above, but instead should be trying to identify questions and topics that currently aren’t on anyone’s radar at all. For example:

It’s of course possible that looking for such things is the highest-value activity (if done well); there’s basically no way to rule this out. However, I want to note that:

The effective altruism community’s top causes and “crucial considerations” seem to have (exclusively?) been identified very early.

There are some reasons to think that future “revolutionary crucial considerations” will be much harder to find, if they exist.

Working on the above questions might be a promising route to identifying new crucial considerations, anyway. For example, this question presents an opportunity to reason about anthropics and acausal trade with a tangible purpose in mind; questions about AI “takeoff dynamics” could change our picture of the next few decades in a way somewhat analogous to (though less dramatic than) thinking of transformative AI in the first place.

I certainly don’t intend to say that the hunt for “crucial considerations” or “Cause X” isn’t valuable. But my overall guess is that at least some of the above questions have higher value (in expectation), while being comparably challenging and neglected.

Comparison with more incremental work

I think there are a number of people today who aspire to clarify the strategic situation for the most important century (in a similar spirit to this post), but prefer a strategy of working on “bite-sized chunks” of research, rather than trying to directly tackle crucial but overwhelming questions like the ones above.

They might write a report entirely on some particular historical case study, or clarification of terms, or relatively narrow subquestion; the report isn’t intended or designed to cause a significant update of the kind that would unlock a lot of money and talent toward some goal, but rather to serve as one potential building block for others toward such an update.

I think this approach is completely reasonable, especially for purposes of getting practice with doing investigations and reports. But I think there is also something to be said for directly tackling the question you most want the all-things-considered answer to (or at least a significant update on). I think the latter is more conducive to skipping rabbit holes that aren’t necessary for the big-picture goal, and the skill of skipping such rabbit holes and focusing on what might update us is (IMO) one of the most crucial ones to get practice with.

Furthermore, a big-picture report that skips a lot of steps and has a lot of imperfections can help create more opportunities to do narrower work that fits in in a clear way. For example, the biological anchors report goes right at the question of transformative AI timelines rather than simply addressing a piece of the picture (e.g., trends in compute costs or thoughts on animal cognitive capabilities) that might be relevant. A lot of the things it handles imperfectly, or badly, have since become the subject of other reports and debates. There’s plenty of room for debate on whether that report - broad, sweeping, directly hypothesizing an answer to the question we care about most, while giving only short treatment to many important subquestions - is a better use of time than something narrower and more focused; I personally feel it is, at least with the present strategic landscape as it stands.

The unusual sort of people we need

A long time ago, I was struck by this post by Nate Soares. At that time, I had a pretty low opinion of the rationality community and of MIRI ... and yet, it struck me as at least really *unusual* that someone would turn on a dime from a ten-year quest to reform the fundamentals of how governments are structured to, at least as far as I could tell at the time, focusing on donating to (and then being a researcher for) MIRI. In my experience, the kind of person who hatches and pursues their own vision like this is not the kind of person who then jumps on board with someone else's. In this respect at least, Nate seemed unusual.

I think that kind of unusualness is what we need more of right now. I think the kind of person who can make serious headway on the above questions is the kind of person who is naturally motivated by creating their own frameworks and narratives, and will naturally be drawn to hatching and promoting some worldview that is recognizably "their own" (such that it might e.g. be named after them) - rather than working on problems that neatly fit within a pretty significant and well-funded community's existing conventional wisdom.

And maybe there's more value in the former than in the latter - but that's not my best guess. My guess is that we need people who break that mold: people who very much have the potential to build a plausible worldview all their own, but choose not to because they instead want to do as much good as possible.

Notes

  1. It’s harder to know what the situation looks like for people doing this work on their own. 

  2. To be a bit more precise, I think that a year’s study could result in someone having a level of knowledge that would put them in the 25 most knowledgeable people today. Hopefully, by the time that year is up, the bar for being in the top 25 will be quite a bit higher, though. 

  3. By this I mean not “has no downside” but rather “looks like a good bet based on the best reasoning, analysis and discussion that can reasonably be done.” I wouldn’t want to start down a path of spending billions of dollars on X while I still felt there were good extant arguments against doing so that hadn’t been well considered. 

  4. For someone new to this work, I think of “four hours of focused work” as a pretty good day, and “two hours of focused work” as a lower-end day (though there are certainly days that go outside of this range in both directions). So here I’m assuming 4 weeks, 5 days per week, 3 hours per day on average. Some people might do something more like “4 hours a week for the first 3 weeks, then 7 hours a day for 7 straight days in the fourth week” and I’d consider that a success as well. 

14 comments

comment by Mauricio · 2022-02-24T21:12:50.386Z

Thanks for this! For the more governance-oriented questions (specifically, the 2nd-4th questions under AI strategy, and the 1st question about takeoff dynamics), how useful do you (or others) think deep experience with relevant governance organizations is? I wonder what explains the apparent difference between the approach suggested by this post (which I read as not emphasizing gaining relevant experience, and instead suggesting "just start trying to figure this stuff out") and the approach suggested by this other post:

If you want to try this kind of work [contributing to research that may provide greater strategic clarity in the future, in the context of AI governance], in most cases I recommend that you [among other things] gain experience working in relevant parts of key governments and/or a top AI lab (ideally both) so that you acquire a detailed picture of the opportunities and constraints those actors operate with.

(Maybe it's that people can test their fit without much experience, but would get lots of value out of that experience for actually doing this work?)

Replies from: HoldenKarnofsky
comment by Holden Karnofsky (HoldenKarnofsky) · 2022-03-31T23:08:51.228Z

I think "people can test their fit without much experience, but would get lots of value out of that experience for actually doing this work" is pretty valid, though I'll also comment that I think there are diminishing returns to direct experience - I think getting some experience (or at least exposure, e.g. via conversation with insiders) is important, but I don't think one necessarily needs several years inside key institutions in order to be helpful on problems like these.

comment by Mauricio · 2022-03-03T08:40:21.554Z

I'd tentatively suggest an additional question for the post's list of research questions (in the context of the idea that we may only get narrow/minimalist versions of alignment):

Assuming that transformative AI will be aligned, how good will the future be?

My (not very confident) sense is that opinions on this question are highly varied, and that it's another important strategic question. After all,

  • Some people seem to think that, if transformative AI will be aligned, then the future will be amazing.
    • A common justification for this view seems to be: AI will be aligned to people/groups who on reflection would have good values (because most people/institutions have such values, or because people/groups with good values are on track to influence), and AI-assisted deliberation & coordination will be enough to bootstrap them from that starting point to an amazing future.
    • If we had good arguments for this, the community could focus on alignment.
  • Some people seem to think that, even if transformative AI will be aligned, the future won't be all that amazing.
    • Common justifications for this view seem to be: AI will be aligned to individuals or (coordinated) groups with lame or bad values, either because they are already on track to influence or because inadequate cooperation will erode value during or after the development of transformative AI.
    • If we had good arguments for this, the community could dedicate a large fraction of its resources to addressing whatever may cause a future with aligned AI to not be great (e.g., by boosting certain organizational or individual actors, improving institutions, forming "cooperation-compatible" plans for using aligned AI, or otherwise improving cooperation).
Replies from: Mauricio, Mauricio
comment by Mauricio · 2022-03-03T09:13:48.229Z

Some existing work on these topics, as potential starting points for people interested in looking into this (updated March 11, 2022):

comment by Mauricio · 2022-03-03T09:34:20.853Z

(This could be accurately seen as an implicit sub-question of "How should we value various possible medium-run outcomes relative to each other?", since it's asking "How should we value a medium-term outcome of transformative AI being aligned?". But I think this question is important enough that it's useful to give it more emphasis than that--if not for making progress on it, at least for clarifying where people's disagreements come from.)

comment by Benjamin_Todd · 2022-03-05T18:07:30.242Z

This is really useful, thank you! We'll be linking to it in a couple of places on 80k.

comment by Evan R. Murphy · 2022-03-15T21:46:55.857Z

What relatively well-scoped research activities are particularly likely to be useful for longtermism-oriented AI alignment?
[...]
(3) Activity that is likely to be relevant for the hardest and most important parts of the problem, while also being the sort of thing that researchers can get up to speed on and contribute to relatively straightforwardly (without having to take on an unusual worldview, match other researchers’ unarticulated intuitions to too great a degree, etc.)

 

I'm planning to spend some time working on this question, or rather part of it. In particular I'm going to explore the argument that interpretability research falls into this category, with some attention to which specific aspects or angles of interpretability research seem most useful.

Since I don't plan to spend much time thoroughly examining other research directions besides interpretability, I don't expect to have a complete comparative answer to the question. But by answering the question for interpretability, I hope to at least put together a fairly comprehensive argument for (or perhaps against, we'll see after I look at the evidence!) interpretability research that could be used by those considering it as a target for their funding or their time. I also hope that then someone trying to answer the larger question could use my work on interpretability as part of a comparative analysis across different research activities.

If someone is already working on this particular question and I'm duplicating effort, please let me know and perhaps we can sync up. Otherwise, I hope to have something to show on this question in a few/several weeks!

Replies from: Evan R. Murphy
comment by matthewp · 2022-02-26T22:50:55.418Z

> How difficult should we expect AI alignment to be?

With many of the AI questions, one needs to reason backwards rather than pose the general question.

Suppose we all die because of unaligned AI. What form did the unaligned AI take? How did it work? Which things that exist now were progenitors of it, and what changed to make it dangerous? How could those problems have been avoided, technically? Organisationally?

I don't see how useful alignment research can be done quite separately from capabilities research. Otherwise what we'll get will be people coming in at the wrong time with a bunch of ideas that lack technical purchase.

Similarly, the questions about what applications we'll see first are already hinted at in capabilities research.

That being the case, it will take way more energy than 1 year for someone to upskill because they actually need to understand something about capabilities work.

comment by Mauricio · 2022-03-03T07:50:01.180Z

If we had good arguments that alignment will be very hard and require “heroic coordination,” the EA funders and the EA community could focus on spreading these arguments and pushing for coordination/cooperation measures. [...] If we had good arguments that it won’t be, we could focus more on speeding/boosting the countries, labs and/or people that seem likely to make wise decisions about deploying transformative AI.

My intuition is to also flag a potential intermediate outcome: maybe we'll have good enough arguments that alignment will be moderately difficult--difficult enough that it's not very likely to happen by default, yet not so difficult that it requires "heroic coordination." In that case, the community could spend a large fraction of its resources on increasing investment into high-quality alignment research (e.g., by funding/doing such research, convincing others to fund/do it, increasing others' incentives to fund it, and preventing a race to the bottom (since that may undermine all of the previous things in this list)).

comment by jacobpfau · 2022-02-27T20:45:43.428Z

Re: feasibility of AI alignment research, Metaculus already has "Control Problem solved before AGI invented". Do you have a sense of what further questions would be valuable?

Replies from: HoldenKarnofsky
comment by Holden Karnofsky (HoldenKarnofsky) · 2022-03-31T23:08:28.215Z

I don't have anything available for this offhand - I'd have to put serious thought into what questions are at the most productive intersection of "resolvable", "a good fit for Metaculus" and "capturing something important." Something about warning signs ("will an AI system steal at least $10 million?") could be good.

comment by brb243 · 2022-05-07T22:50:31.871Z

Would you consider breaking down your questions into sub-(sub-sub-...) questions that readers can answer and coordinating the discourse? I made a (fun) synthesis sheet for 216 participants (1 central EA-related question broken down into 3 layers of 6 questions). For each end-question, I included a resource which I vetted for EA-related thinking stimulation and (some) idea non-repetition. Feel free to also review a rough draft of breaking down the questions that you introduce in this piece.

I would argue that any intro to EA fellowship participant who was not discouraged from involvement can answer these questions. First, the broader ones should be skimmed and then the end-one selected.

This would result in efficient thought development, defining exclusivity by participation.