New series of posts answering one of Holden's "Important, actionable research questions"

post by Evan R. Murphy · 2022-05-12T21:22:33.705Z · EA · GW

In February, Holden Karnofsky published Important, actionable research questions for the most important century [EA · GW].

For the past couple of months, I've been working on answering one of Holden's questions:

"What relatively well-scoped research activities are particularly likely to be useful for longtermism-oriented AI alignment?"

To answer this question, I've started a series of posts exploring the argument that interpretability - that is, research into better understanding what is happening inside machine learning systems - is a high-leverage research activity for solving the AI alignment problem.

I just published the first two posts on Alignment Forum/LessWrong:

1. Introduction to the sequence: Interpretability Research for the Most Important Century [? · GW]
2. (main post) Interpretability’s Alignment-Solving Potential: Analysis of 7 Scenarios [? · GW]

There will be at least one more post in the series; post #2 in particular contains a substantial amount of my research on this topic.
