New series of posts answering one of Holden's "Important, actionable research questions" — by Evan R. Murphy · 2022-05-12
In February, Holden Karnofsky published the post Important, actionable research questions for the most important century [EA · GW].
For the past couple of months, I've been working on answering one of Holden's questions:
"What relatively well-scoped research activities are particularly likely to be useful for longtermism-oriented AI alignment?"
To answer this question, I've started a series of posts exploring the argument that interpretability research (that is, research into better understanding what is happening inside machine learning systems) is a high-leverage research activity for solving the AI alignment problem.
I just published the first two posts on Alignment Forum/LessWrong:
There will be at least one more post in the series; post #2 in particular contains a substantial amount of my research on this topic.