Posts

When reporting AI timelines, be clear who you're (not) deferring to 2022-10-10T14:24:14.428Z
Classifying sources of AI x-risk 2022-08-08T18:18:50.806Z
A Survey of the Potential Long-term Impacts of AI 2022-07-18T09:48:30.175Z
CSER is hiring for a senior research associate on longterm AI risk and governance 2022-01-24T13:24:22.817Z
The longtermist AI governance landscape: a basic overview 2022-01-18T12:58:34.426Z
Clarifications about structural risk from AI 2022-01-18T12:57:42.829Z
What is most confusing to you about AI stuff? 2021-11-23T16:00:00.556Z
CGP Grey: The Fable of the Dragon-Tyrant 2021-11-11T09:01:49.213Z
General vs specific arguments for the longtermist importance of shaping AI development 2021-10-15T14:43:37.694Z
Lessons learned running the Survey on AI existential risk scenarios 2021-10-13T11:33:51.807Z
SamClarke's Shortform 2021-10-04T08:10:44.178Z
[Link post] How plausible are AI Takeover scenarios? 2021-09-27T13:03:54.369Z
Survey on AI existential risk scenarios 2021-06-08T17:12:29.810Z

Comments

Comment by Sam Clarke (SamClarke) on When reporting AI timelines, be clear who you're (not) deferring to · 2022-10-10T17:36:46.398Z · EA · GW

Unfortunately, when someone tells you "AI is N years away because XYZ technical reasons," you may think you're updating on the technical reasons, but your brain was actually just using XYZ as excuses to defer to them.

I really like this point. I'm guilty of having done something like this loads myself.

When someone gives you gears-level evidence, and you update on their opinion because of that, that still constitutes deferring. What you think of as gears-level evidence is nearly always disguised testimonial evidence. At least to some, usually damning, degree. And unless you're unusually socioepistemologically astute, you're just lost to the process.

If it's easy, could you try to put this another way? I'm having trouble making sense of what exactly you mean, and it seems like an important point if true.

Comment by Sam Clarke (SamClarke) on When reporting AI timelines, be clear who you're (not) deferring to · 2022-10-10T17:31:16.168Z · EA · GW

Thanks for your comment! I agree that the concept of deference used in this community is somewhat unclear, and a separate comment exchange on this post further convinced me of this. It's interesting to know how the word is used in formal epistemology.

Here is the EA Forum topic entry on epistemic deference. I think it most closely resembles your (c). I agree there's the complicated question of what your priors should be, before you do any deference, which leads to the (b) / (c) distinction.

Comment by Sam Clarke (SamClarke) on When reporting AI timelines, be clear who you're (not) deferring to · 2022-10-10T17:19:58.030Z · EA · GW

Thanks for your comment!

Asking "who do you defer to?" feels like a simplification

Agreed! I'm not going to make any changes to the survey at this stage, but I like the suggestion and if I had more time I'd try to clarify things along these lines.

I like the distinction between deference to people/groups and deference to processes.

deference to good ideas

[This is a bit of a semantic point, but seems important enough to mention] I think "deference to good ideas" wouldn't count as "deference", in the way that this community has ended up using it. As per the forum topic entry on epistemic deference:

Epistemic deference is the process of updating one's beliefs in response to what others appear to believe, even if one ignores the reasons for those beliefs or does not find those reasons persuasive. (emphasis mine)

If you find an argument persuasive and incorporate it into your views, I think that doesn't qualify as "deference". Your independent impressions aren't (and in most cases won't be) the views you formed in isolation. When forming your independent impressions, you can and should take other people's arguments into account, to the extent that you find them convincing. Deference occurs when you take into account knowledge about what other people believe, and how trustworthy you find them, without engaging with their object-level arguments.

non-defensible original ideas

A similar point applies to this one, I think.

(All of the above makes me think that the concept of deference is even less clear in the community than I thought it was -- thanks for making me aware of this!)

Comment by Sam Clarke (SamClarke) on Assessing SERI/CHERI/CERI summer program impact by surveying fellows · 2022-10-10T17:00:01.428Z · EA · GW

Cool, makes sense.

The main way to answer this seems to be getting a non-self-rated measure of research skill change.

Agreed. Asking mentors seems like the easiest thing to do here, in the first instance.

Comment by Sam Clarke (SamClarke) on Assessing SERI/CHERI/CERI summer program impact by surveying fellows · 2022-10-05T11:09:02.707Z · EA · GW

Somewhat related comment: next time, I think it could be better to ask "What percentage of the value of the fellowship came from these different components?"* instead of "What do you think were the most valuable parts of the programme?". This would give a bit more fine-grained data, which could be really important.

E.g. if it's true that most of the value of ERIs comes from networking, this would suggest that people who want to scale ERIs should do pretty different things (e.g. lots of retreats optimised for networking).

*and give them several buckets to select from, e.g. <3%, 3-10%, 10-25%, etc.

Comment by Sam Clarke (SamClarke) on Assessing SERI/CHERI/CERI summer program impact by surveying fellows · 2022-10-05T11:01:07.375Z · EA · GW

Thanks for putting this together!

I'm surprised by the combination of the following two survey results:

Fellows' estimate of how comfortable they would be pursuing a research project remains effectively constant. Many start out very comfortable with research. A few decline.

and

Networking, learning to do research, and becoming a stronger candidate for academic (but not industry) jobs top the list of what participants found most valuable about the programs. (emphasis mine)

That is: on average, fellows claim they learned to do better research, but became no more comfortable pursuing a research project.

Do you think this is mostly explained by most fellows already being pretty comfortable with research?

A scatter plot of comfort against improvement in research skill could be helpful for examining different hypotheses (though this won't be possible with the current data, given how the "greatest value adds" question was phrased).

Comment by Sam Clarke (SamClarke) on Survey on AI existential risk scenarios · 2022-09-21T15:12:26.157Z · EA · GW

Re (1) See When Will AI Exceed Human Performance? Evidence from AI Experts (2016) and the 2022 updated version. These surveys don't ask about x-risk scenarios in detail, but do ask about the overall probability of very bad outcomes and other relevant factors.

Re (1) and (3), you might be interested in various bits of research that GovAI has done on the American public and AI researchers.

You also might want to get in touch with Noemi Dreksler, who is working on surveys at GovAI.

Comment by Sam Clarke (SamClarke) on Strategic Perspectives on Transformative AI Governance: Introduction · 2022-09-05T17:32:53.628Z · EA · GW

A potentially useful subsection for each perspective could be: evidence that should change your mind about how plausible this perspective is (including things you might observe over the coming years/decades). This would be kinda like the future-looking version of the "historical analogies" subsection.

Comment by Sam Clarke (SamClarke) on Feedback I've been giving to junior x-risk researchers · 2022-08-17T00:53:07.341Z · EA · GW

Another random thought: a bunch of these lessons seem like the kind of things that general writing and research coaching can teach. Maybe summer fellows and similar should be provided with that? (Freeing up time for you/other people in your reference class to play to your comparative advantage.)

(Though some of these lessons are specific to EA research and so seem harder to outsource.)

Comment by Sam Clarke (SamClarke) on Feedback I've been giving to junior x-risk researchers · 2022-08-17T00:43:44.362Z · EA · GW

Love it, thanks for the post!

"Reading 'too much' is possibly the optimal strategy if you're mainly trying to skill up (e.g., through increased domain knowledge), rather than have direct impact now. But also bear in mind that becoming more efficient at direct impact is itself a form of skilling up, and this pushes back toward 'writing early' as the better extreme."

Two thoughts on this section:

  1. Additional (obvious) arguments for writing early: producing stuff builds career capital, and is often a better way to learn than just reading.

  2. I want to disentangle 'aiming for direct impact' and 'writing early'. You can write without optimising hard for direct impact, and I claim that more junior people should do so (on the current margin). There's a failure mode (which I fell into myself) where junior researchers try to solve hugely important problems, because they really want to have direct impact. But this leads them to work on problems that are wicked, poorly scoped, or methodologically fraught, which ends with them getting stuck, demoralised, and not producing anything.

Often, I think it's better for junior researchers to still aim to write/produce stuff (because of the arguments above/in your piece), but not to optimise hard for direct impact with that writing: picking more tractable, if less important, problems.

Comment by Sam Clarke (SamClarke) on General vs specific arguments for the longtermist importance of shaping AI development · 2022-08-17T00:13:03.913Z · EA · GW

Thanks, I agree with all of this

Comment by Sam Clarke (SamClarke) on Slowing down AI progress? · 2022-08-10T16:18:22.700Z · EA · GW

Relevant discussion from a couple of days ago: https://astralcodexten.substack.com/p/why-not-slow-ai-progress

Comment by Sam Clarke (SamClarke) on Why does no one care about AI? · 2022-08-09T10:49:21.295Z · EA · GW

The question was:

Assume for the purpose of this question that HLMI* will at some point exist. How positive or negative do you expect the overall impact of this to be on humanity, in the long run?

So it doesn't presuppose some agentic form of AGI—but rather asks about the same type of technology that the median respondent gave a 50% chance of arriving within 45 years.

*HLMI was defined in the survey as:

“High-level machine intelligence” (HLMI) is achieved when unaided machines can accomplish every task better and more cheaply than human workers.

Comment by Sam Clarke (SamClarke) on Why does no one care about AI? · 2022-08-09T10:30:32.916Z · EA · GW

Right, I just wanted to point out that the average AI researcher who dismisses AI x-risk doesn't do so because they think AGI is very unlikely. But I admit to often being confused about why they do dismiss AI x-risk.

The same survey asked AI researchers about the outcome they expect from AGI:

The median probability was 25% for a “good” outcome and 20% for an “extremely good” outcome. By contrast, the probability was 10% for a bad outcome and 5% for an outcome described as “Extremely Bad (e.g., human extinction).”

If I learned that there was some scientific field where the median researcher assigned a 5% probability that we all die due to advances in their field, I'd be incredibly worried. Going off this data alone, it seems hard to make a case that x-risk from AI is some niche thing that almost no AI researchers think is real.

The median researcher does think it's somewhat unlikely, but 5% extinction risk is more than enough to take it very seriously and motivate a huge research and policy effort.

Comment by Sam Clarke (SamClarke) on Classifying sources of AI x-risk · 2022-08-09T10:15:03.352Z · EA · GW

Thanks, I agree with most of these suggestions.

"Other (AI-enabled) dangerous tech" feels to me like it clearly falls under "exacerbating other x-risk factors"

I was trying to stipulate that the dangerous tech was a source of x-risk in itself, not just a risk factor (admittedly the boundary is fuzzy). The wording was "AI leads to deployment of technology that causes extinction or unrecoverable collapse" and the examples (which could have been clearer) were intended to be "a pathogen kills everyone" or "full scale nuclear war leads to unrecoverable collapse"

Comment by Sam Clarke (SamClarke) on Why does no one care about AI? · 2022-08-09T10:00:53.834Z · EA · GW

they basically see AGI as very unlikely

Certainly some people you talk to in the fairness/bias crowd think AGI is very unlikely, but that's definitely not a consensus view among AI researchers. E.g. see this survey of AI researchers (at top conferences in 2015, not selecting for AI safety folk), which finds that:

Researchers believe there is a 50% chance of AI outperforming humans in all tasks in 45 years and of automating all human jobs in 120 years

Comment by Sam Clarke (SamClarke) on A Survey of the Potential Long-term Impacts of AI · 2022-07-21T15:29:15.425Z · EA · GW

Thanks, I'm glad this was helpful to you!

Comment by Sam Clarke (SamClarke) on How would a language model become goal-directed? · 2022-07-18T10:21:20.880Z · EA · GW

I'm also still a bit confused about what exactly this concept refers to. Is a 'consequentialist' basically just an 'optimiser' in the sense that Yudkowsky uses in the sequences (e.g. here), that has later been refined by posts like this one (where it's called 'selection') and this one?

In other words, roughly speaking, is a system a consequentialist to the extent that it's trying to take actions that push its environment towards a certain goal state?

Comment by Sam Clarke (SamClarke) on Slowing down AI progress is an underexplored alignment strategy · 2022-07-13T16:04:43.504Z · EA · GW

Another (unoriginal) way that heavy AI regulation could be counterproductive for safety: AGI alignment research probably increases in productivity as you get close to AGI. So, regulation in jurisdictions with the actors who are closest to AGI (currently, US/UK) would give those actors less time to do high-productivity AGI alignment research before the second-place actor catches up.

And within a jurisdiction, you might think that responsible actors are the most likely to comply with regulation, differentially slowing them down.

Comment by Sam Clarke (SamClarke) on SamClarke's Shortform · 2022-04-26T16:03:54.812Z · EA · GW

Agreed, thanks for the pushback!

Comment by Sam Clarke (SamClarke) on SamClarke's Shortform · 2022-04-25T15:08:48.337Z · EA · GW

Ways of framing EA that (extremely anecdotally*) make it seem less ick to newcomers. These are all obvious/boring; I'm mostly recording them here for my own consolidation

  • EA as a bet on a general way of approaching how to do good, that is almost certainly wrong in at least some ways—rather than a claim that we've "figured out" how to do the most good (like, probably no one claims the latter, but sometimes newcomers tend to get this vibe). Different people in the community have different degrees of belief in the bet, and (like all bets) it can make sense to take it even if you still have a lot of uncertainty.
  • EA as about doing good on the current margin. That is, we're not trying to work out the optimal allocation of altruistic resources in general, but rather: given how the rest of the world is spending its money and time to do good, which approaches could do with more attention? Corollary: you should expect to see EA behaviour changing over time (for this and other reasons). This is a feature not a bug.
  • EA as diverse in its ways of approaching how to do good. Some people work on global health and wellbeing. Others on animal welfare. Others on risks from climate change and advanced technology.

These frames can also apply to any specific cause area.

*like, I remember talking to a few people who became more sympathetic when I used these frames.

Comment by Sam Clarke (SamClarke) on How I Formed My Own Views About AI Safety · 2022-03-14T16:19:49.850Z · EA · GW

I'm still confused about the distinction you have in mind between inside view and independent impression (which also have the property that they feel true to me)?

Or do you have no distinction in mind, but just think that the phrase "inside view" captures the sentiment better?

Comment by Sam Clarke (SamClarke) on How I Formed My Own Views About AI Safety · 2022-03-14T16:12:49.128Z · EA · GW

Thanks - good points, I'm not very confident either way now

Comment by Sam Clarke (SamClarke) on On presenting the case for AI risk · 2022-03-14T16:05:50.407Z · EA · GW

Thanks, I appreciate this post a lot!

Playing the devil's advocate for a minute, I think one main challenge to this way of presenting the case is something like "yeah, and this is exactly what you'd expect to see for a field in its early stages. Can you tell a story for how these kinds of failures end up killing literally everyone, rather than getting fixed along the way, well before they're deployed widely enough to do so?"

And there, it seems you do need to start talking about agents with misaligned goals, and the reasons to expect misalignment that we don't manage to fix?

Comment by Sam Clarke (SamClarke) on AI Risk is like Terminator; Stop Saying it's Not · 2022-03-14T15:48:43.159Z · EA · GW

Thanks for writing this!

There are yet other views about what exactly AI catastrophe will look like, but I think it is fair to say that the combined views of Yudkowsky and Christiano provide a fairly good representation of the field as a whole.

I disagree with this.

We ran a survey of prominent AI safety and governance researchers, where we asked them to estimate the probability of five different AI x-risk scenarios.

Arguably, the "terminator-like" scenarios are the "Superintelligence" scenario, and part 2 of "What failure looks like" (as you suggest in your post).[1]

Conditional on an x-catastrophe due to AI occurring, the median respondent gave those scenarios 10% and 12% probability (mean 16% each). The other three scenarios[2] got median 12.5%, 10% and 10% (means 18%, 17% and 15%).

So I don't think that the "field as a whole" thinks terminator-like x-risk scenarios are the most likely. Accordingly, I'd prefer if the central claim of this post was "AI risk could actually be like terminator; stop saying it's not".


  1. Part 1 of "What failure looks like" probably doesn't look that much like Terminator (disaster unfolds more slowly and is caused by AI systems just doing their jobs really well) ↩︎

  2. That is, the following three scenarios: Part 1 of "What failure looks like", existentially catastrophic AI misuse, and existentially catastrophic war between humans exacerbated by AI. See the post for full scenario descriptions. ↩︎

Comment by Sam Clarke (SamClarke) on You are probably underestimating how good self-love can be · 2022-03-03T11:56:57.373Z · EA · GW

After practising some self-love I am now noticeably less stressed about work in general. I sleep better, have more consistent energy, enjoy having conversations about work-related stuff more (so I just talk about EA and AI risk more than I used to, which was a big win on my previous margin). I think I maybe work fewer hours than I used to because before it felt like there was a bear chasing me and if I wasn't always working then it was going to eat me, whereas now that isn't the case. But my working patterns feel healthy and sustainable now; before, I was going through cycles of half-burning out every 3 months or so (which was bad enough for my near-term productivity, not to mention long-term productivity and health). I also spend relatively less time just turning the handle on my mainline tasks (vs zooming out, having random conversations that feel useful but won't pay off immediately, reading more widely), which again I think was a win on my previous margin (maybe reduced it from ~90% to ~80% of my research hours).

I'm confused about how this happened. My model is that before there were two parts of me that strongly disagreed about whether work is good, and that these parts have now basically resolved (they agree that doing sensible amounts of work is good), because both feel understood and loved. Basically the part that didn't think work was good just needed its needs to be understood and taken into account.

I think this model is quite different from Charlie's main model of what happens (which is to do with memory consolidation), so I'm especially confused.

I haven't attained persistent self-love of the sort described here.

Comment by Sam Clarke (SamClarke) on You are probably underestimating how good self-love can be · 2022-03-03T11:36:36.225Z · EA · GW

I found this helpful and am excited to try it - thanks for sharing!

Comment by Sam Clarke (SamClarke) on How I Formed My Own Views About AI Safety · 2022-03-02T14:29:08.481Z · EA · GW

Also, a nitpick: I find "inside view" a more confusing and jargony way of just saying "independent impressions" (okay, also jargon to some extent, but closer to plain English). The latter also avoids the problem you point out, that "inside view" is not the opposite of "outside view" in the Tetlockian sense (and the other ambiguities with "outside view" that another commenter pointed out).

Comment by Sam Clarke (SamClarke) on How I Formed My Own Views About AI Safety · 2022-03-02T10:35:50.734Z · EA · GW

Nice post! I agree with ~everything here. Parts that felt particularly helpful:

  • There are even more reasons why paraphrasing is great than I thought - good reminder to be doing this more often
  • The way you put this point was v crisp and helpful: "Empirically, there’s a lot of smart people who believe different and contradictory things! It’s impossible for all of them to be right, so you must disagree with some of them. Internalising that you can do this is really important for being able to think clearly"
  • The importance of "how much feedback do they get from the world" in deferring intelligently

One thing I disagree with: the importance of forming inside views for community epistemic health. I think it's pretty important. E.g. I think that ~2 years ago, the arguments for the longterm importance of AGI safety were pretty underdeveloped; that since then lots more people have come out with their inside views about it; and that now the arguments are in much better shape.

Comment by Sam Clarke (SamClarke) on CSER is hiring for a senior research associate on longterm AI risk and governance · 2022-02-03T16:30:22.949Z · EA · GW

Note: the deadline has been extended to 27 February 2022

Comment by Sam Clarke (SamClarke) on CSER is hiring for a senior research associate on longterm AI risk and governance · 2022-01-24T16:05:50.682Z · EA · GW

Yes that would be helpful, thanks!

Comment by Sam Clarke (SamClarke) on The longtermist AI governance landscape: a basic overview · 2022-01-18T20:03:47.927Z · EA · GW

Maybe this process generalises and so longtermist AI governance can learn from other communities?

In some sense, this post explains how the longtermist AI governance community is trying to go from “no one understands this issue well”, to actually improving concrete decisions that affect the issue.

It seems plausible that the process described here is pretty general (i.e. not specific to AI governance). If that’s true, then there could be opportunities for AI governance to learn from how this process has been implemented in other communities/fields and vice-versa.

Comment by Sam Clarke (SamClarke) on The longtermist AI governance landscape: a basic overview · 2022-01-18T20:02:13.506Z · EA · GW

Something that would improve this post but I didn’t have time for:

For each kind of work, give a sense of:

  • The amount of effort currently going into it
  • What the biggest gaps/bottlenecks/open questions are
  • What kinds of people might be well-suited to it

Comment by Sam Clarke (SamClarke) on The longtermist AI governance landscape: a basic overview · 2022-01-18T20:00:08.328Z · EA · GW

Thanks!

I agree with your quibble. Other than the examples you list here, I'm curious for any other favourite reports/topics in the broader space of AI governance - esp. ones that you think are at least as relevant to longtermist AI governance as the average example I give in this post?

Comment by Sam Clarke (SamClarke) on How to use the Forum · 2022-01-14T18:18:13.808Z · EA · GW

Note: "If you want to add one or more co-authors to your post, you’ll need to contact the Forum team..." is no longer the easiest way to add co-authors, so might want to be updated accordingly.

And by the way, thanks for adding this new feature!

Comment by Sam Clarke (SamClarke) on EU AI Act now has a section on general purpose AI systems · 2021-12-13T19:13:30.805Z · EA · GW

I think equally important for longtermists is the new requirement for the Commission to consider updating the definition of AI, and the list of high-risk systems, every 1 year. If you buy that adaptive/flexible/future-proof governance will be important for regulating AGI, then this looks good.

(The basic argument for this instance of adaptive governance is something like: AI progress is fast and will only get faster, so having relevant sections of regulation come up for mandatory review every so often is a good idea, especially since policymakers are busy so this doesn't tend to happen by default.)

Relevant part of the doc:

  1. As regards the modalities for updates of Annexes I and III, the changes in Article 84 introduce a new reporting obligation for the Commission whereby it will be obliged to assess the need for amendment of the lists in these two annexes every 24 months following the entry into force of the AIA.

Comment by Sam Clarke (SamClarke) on What is most confusing to you about AI stuff? · 2021-11-24T11:20:00.503Z · EA · GW

On the margin, should donors prioritize AI safety above other existential risks and broad longtermist interventions?

To the extent that this question overlaps with Mauricio's question 1.2 (i.e. A bunch of people seem to argue for "AI stuff is important" but believe / act as if "AI stuff is overwhelmingly important"--what are arguments for the latter view?), you might find his answer helpful.

other x-risks and longtermist areas seem rather unexplored and neglected, like s-risks

Only a partial answer, but worth noting that I think the most plausible source of s-risk is messing up on AI stuff

Comment by Sam Clarke (SamClarke) on What is most confusing to you about AI stuff? · 2021-11-24T11:15:13.142Z · EA · GW

Is "intelligence" ... really enough to make an AI system more powerful than humans (individuals, groups, or all of humanity combined)?

Some discussion of this question here: https://www.alignmentforum.org/posts/eGihD5jnD6LFzgDZA/agi-safety-from-first-principles-control

Comment by Sam Clarke (SamClarke) on What is most confusing to you about AI stuff? · 2021-11-24T11:12:59.473Z · EA · GW

Do we need to decide on a moral principle(s) first? How would it be possible to develop beneficial AI without first 'solving' ethics/morality?

Good question! The answer is no: 'solving' ethics/morality is something we probably eventually need to do, but we could first solve a narrower, simpler form of AI alignment, and use those aligned systems to help us solve ethics/morality and the other trickier problems (like the control problem for more general, capable systems). This is more or less what is discussed in ambitious vs narrow value learning. Narrow value learning is one narrower, simpler form of AI alignment. There are others, discussed here under the heading "Alternative solutions".

Comment by Sam Clarke (SamClarke) on What is most confusing to you about AI stuff? · 2021-11-24T10:52:08.376Z · EA · GW

A timely post: https://forum.effectivealtruism.org/posts/DDDyTvuZxoKStm92M/ai-safety-needs-great-engineers

(The focus is software engineering not development, but should still be informative.)

Comment by Sam Clarke (SamClarke) on Why AI alignment could be hard with modern deep learning · 2021-11-05T14:44:10.652Z · EA · GW

I love this post and also expect it to be something that I point people towards in the future!

I was wondering about what kind of alignment failure - i.e. outer or inner alignment - you had in mind when describing sycophant models (for schemer models, it's obviously an inner alignment failure).

It seems you could get sycophant models via inner alignment failure, because you could train them on a sensible, well-specified objective function, and yet the model learns to pursue human approval anyway (because "pursuing human approval" turned out to be more easily discovered by SGD).

It also seems you could get sycophant models via outer alignment failure, because e.g. a model trained using naive reward modelling (which would be an obviously terrible objective) seems very likely to yield a model that is pursuing approval from the humans whose feedback is used in training the reward model.

Does this seem right to you, and if so, which kind of alignment failure did you have in mind?

(Paul has written most explicitly about what a world full of advanced sycophants looks like/how it leads to existential catastrophe, and his stories are about outer alignment, so I'd be especially curious if you disagreed with that.)

Comment by Sam Clarke (SamClarke) on General vs specific arguments for the longtermist importance of shaping AI development · 2021-11-03T15:27:44.449Z · EA · GW

(Apologies for my very slow reply.)

I feel like something has gone wrong in this conversation; you have tricked Bob into working on learning from human feedback, rather than convincing him to do so.

I agree with this. If people become convinced to work on AI stuff by specific argument X, then they should definitely go and try to fix X, not something else (e.g. what other people tell them needs doing in AI safety/governance).

I think when I said I wanted a more general argument to be the "default", I meant something very general that doesn't clearly imply any particular intervention - like the one in the most important century series, or the "AI is a big deal" argument (I especially like Max Daniel's version of this).

Then, it's very important to think clearly about what will actually go wrong, and how to actually fix that. But I think it's fine to do this once you're already convinced that you should work on AI, by some general argument.

I'd be really curious if you still disagree with this?

Comment by Sam Clarke (SamClarke) on General vs specific arguments for the longtermist importance of shaping AI development · 2021-10-18T13:07:17.495Z · EA · GW

The problem with general arguments is that they tell you very little about how to solve the problem

Agreed!

If I were producing key EA content/fellowships/etc, I would be primarily interested in getting people to solve the problem

I think this is true for some kinds of content/fellowships/etc, but not all. For those targeted at people who aren't already convinced that AI safety/governance should be prioritised (which is probably the majority), it seems more important to present them with the strongest arguments for caring about AI safety/governance in the first place. This suggests presenting more general arguments.

Then, I agree that you want to get people to help solve the problem, which requires talking about specific failure modes. But I think that doing this prematurely can lead people to dismiss the case for shaping AI development for bad reasons.

Another way of saying this: for AI-related EA content/fellowships/etc, it seems worth separating motivation ("why should I care?") and action ("if I do care, what should I do?"). This would get you the best of both worlds: people are presented with the strongest arguments, allowing them to make an informed decision about how much AI stuff should be prioritised, and then also the chance to start to explore specific ways to solve the problem.

I think this maybe applies to longtermism in general. We don't yet have that many great ideas of what to do if longtermism is true, and I think that people sometimes (incorrectly) dismiss longtermism for this reason.

Comment by Sam Clarke (SamClarke) on Lessons learned running the Survey on AI existential risk scenarios · 2021-10-15T15:34:06.045Z · EA · GW

Thanks for the detailed reply, all of this makes sense!

I added a caveat to the final section mentioning your disagreements with some of the points in the "Other small lessons about survey design" section

Comment by Sam Clarke (SamClarke) on We're Redwood Research, we do applied alignment research, AMA · 2021-10-06T15:02:04.702Z · EA · GW

What might be an example of a "much better weird, theory-motivated alignment research" project, as mentioned in your intro doc? (It might be hard to say at this point, but perhaps you could point to something in that direction?)

Comment by Sam Clarke (SamClarke) on We're Redwood Research, we do applied alignment research, AMA · 2021-10-06T15:00:44.616Z · EA · GW

How crucial a role do you expect x-risk-motivated AI alignment will play in making things go well? What are the main factors you expect will influence this? (e.g. the occurrence of medium-scale alignment failures as warning shots)

Comment by Sam Clarke (SamClarke) on We're Redwood Research, we do applied alignment research, AMA · 2021-10-06T13:37:05.799Z · EA · GW

Some questions that aren't super related to Redwood/applied ML AI safety, so feel free to ignore if not your priority:

  1. Assuming that it's taking too long to solve the technical alignment problem, what might be some of our other best interventions to reduce x-risk from AI? E.g., regulation, institutions for fostering cooperation and coordination between AI labs, public pressure on AI labs/other actors to slow deployment, ...

  2. If we solve the technical alignment problem in time, what do you think are the other major sources of AI-related x-risk that remain? How likely do you think these are, compared to x-risk from not solving the technical alignment problem in time?

Comment by Sam Clarke (SamClarke) on How to succeed as an early-stage researcher: the “lean startup” approach · 2021-10-06T13:28:38.982Z · EA · GW

The 'lean startup' approach reminds me of Jacob Steinhardt's post about his approach to research, of which the key takeaways are:

  • When working on a research project, you should basically either be in "de-risking mode" (determining if the project is promising as quickly as possible) OR "execution mode" (assuming the project is promising and trying to do it quickly). This probably looks like trying to do an MVP version of the project quickly, and then iterating on that if it's promising.
  • If a project doesn't work out, ask why. That way you:
    • avoid trying similar things that will fail for the same reasons.
    • will find out whether it didn't work because your implementation was broken, or the high-level approach you were taking isn't promising.
  • Try hard, early on, to show that your project won't solve the problem.

Comment by Sam Clarke (SamClarke) on How to succeed as an early-stage researcher: the “lean startup” approach · 2021-10-06T13:27:58.237Z · EA · GW

Write out at least 10 project ideas and ask somebody more senior to rank the best few

For bonus points, try to understand how they did the ranking. That way, you can start building up a model of how senior researchers think about evaluating project ideas, and refining your own research taste explicitly.

Comment by Sam Clarke (SamClarke) on How to succeed as an early-stage researcher: the “lean startup” approach · 2021-10-06T13:26:47.626Z · EA · GW

Thanks for writing this, I found it helpful and really clearly written!

One reaction: if you're testing research as a career (rather than having committed and now aiming to maximise your chances of success), your goal isn't exactly to succeed as an early stage researcher. It might be that trying your best to succeed is approximately the best way to test your fit - but it seems like there are a few differences:

  • "Going where there's supervision" might be especially important, since a supervisor who comes to know you very well is a big and reliable source of information about your fit for research - which seems esp. important given that feedback in the form of "how much other people like your ideas" is often biased (e.g. because most of your early ideas are bad) or noisy (e.g. because some factors that influence the success of your research aren't under your control).
  • It might be important to test your fit for different fields or flavours (e.g. quantitative vs qualitative, empirical vs theoretical) of research. This can come apart from the goal of trying to succeed as an early-stage researcher - since moving into unfamiliar territory might mean your outputs are less good in the short term.
  • Relatedly, it might be important to select at least some of your projects based on the skills or knowledge gaps they help you fill. Again, this goal might come apart from short term success (e.g. you pick a forecasting project to improve those skills, despite not expecting it to generate interesting findings)
  • Probably you want to spend less energy marketing your work, except to the extent that it's helpful in getting more people to give you feedback on your fit for a research career.
  • [most uncertain] "Someone senior tells you what to work on" might actually not be the ideal solution to your problem 1. If the skills of research execution and research planning are importantly different, then you might fail to get enough info about your competence/enjoyment/fit for research planning skills (but I'm pretty uncertain if they are importantly different).

I'd be curious how much you agree with any of these points :)