Concrete Advice for Forming Inside Views on AI Safety 2022-08-17T23:26:57.156Z
Things That Make Me Enjoy Giving Career Advice 2022-06-17T20:49:46.274Z
How I Formed My Own Views About AI Safety 2022-02-27T18:52:20.222Z
Simplify EA Pitches to "Holy Shit, X-Risk" 2022-02-11T01:57:52.086Z
My Overview of the AI Alignment Landscape: A Bird’s Eye View 2021-12-15T23:46:59.200Z
Optimisation-focused introduction to EA podcast episode 2021-01-15T09:59:29.416Z
Retrospective on Teaching Rationality Workshops 2021-01-03T17:15:06.154Z
Local Group Event Idea: EA Community Talks 2020-12-20T17:12:29.251Z
Make a Public Commitment to Writing EA Forum Posts 2020-11-18T18:23:11.468Z
Helping each other become more effective 2020-10-30T21:33:47.382Z
What altruism means to me 2020-08-15T08:25:28.386Z
The world is full of wasted motion 2020-08-05T20:41:23.710Z


Comment by Neel Nanda on Punching Utilitarians in the Face · 2022-07-15T01:49:59.539Z · EA · GW

OK, that seems like a pretty reasonable position. Though if we're restricting ourselves to everyday situations it feels a bit messy - naive utilitarianism implies things like lying a bunch or killing people in contrived situations, and I think the utility-maximising decision is actually to be somewhat deontological.

More importantly though, people do use utilitarianism in contexts with very large amounts of utility and small probabilities - see strong longtermism and the astronomical waste arguments. I think this is an important and action-relevant thing, influencing a bunch of people in EA, and that criticising this is a meaningful critique of utilitarianism, not a weird contrived thought experiment.

Comment by Neel Nanda on Punching Utilitarians in the Face · 2022-07-13T21:14:27.025Z · EA · GW

I'm pretty confused about the argument made by this post. Pascal's Mugging seems like a legitimately important objection to expected-value-based decision theory, and all of these thought experiments are basically flavours of that. This post feels like it's just pouring scorn on that idea without making an actual argument?

I think "utilitarianism says seemingly weird shit when given large utilities and tiny probabilities" is one of the most important objections. 

Is your complaint that this is an isolated demand for rigor? 

Comment by Neel Nanda on Did OpenPhil ever publish their in-depth review of their three-year OpenAI grant? · 2022-07-04T15:25:15.635Z · EA · GW

Note that OpenAI became a capped-profit company in 2019 (2 years into this grant), which I presume made it a much less cost-effective thing to invest in, since it had much better alternative funding sources.

Comment by Neel Nanda on Community Builders Spend Too Much Time Community Building · 2022-07-01T06:02:34.399Z · EA · GW

If you're ever running an event that you are not excited to be part of, something has gone wrong

This seems way too strong to me. E.g., reasonable and effective intro talks feel like they wouldn't be much fun for me to do, yet seem likely to be high value.

Comment by Neel Nanda on How to pursue a career in technical AI alignment · 2022-06-04T22:10:49.884Z · EA · GW

Really excited to see this post come out! I think this is a really helpful guide for people who want to work on AI Alignment, and it would have been pretty useful to me in the past.

Comment by Neel Nanda on Stuff I buy and use: a listicle to boost your consumer surplus and productivity · 2022-06-01T09:34:01.605Z · EA · GW

This felt like an unusually high-quality post in the genre of 'stuff I buy and use', thanks for writing it! I particularly appreciate the nutrition advice, plus the actual discussion of your reasoning and epistemic confidence.

Comment by Neel Nanda on What are the coolest topics in AI safety, to a hopelessly pure mathematician? · 2022-05-07T20:50:14.520Z · EA · GW

I did a pure maths undergrad and recently switched to doing mechanistic interpretability work - my day job isn't exactly doing maths, but I find it has a strong aesthetic appeal in a similar way. My job is not to train an ML model (with all the mess and frustration that involves), it's to take a model someone else has trained, and try to rigorously understand what is going on with it. I want to take some behaviour I know it's capable of and understand how it does that, and ideally try to decompile the operations it's running into something human-understandable. And, fundamentally, a neural network is just a stack of matrix multiplications. So I'm trying to build tools and lenses for analysing this stack of matrices, and converting it into something understandable. Day-to-day, this looks like having ideas for experiments, writing code and running them, getting feedback and iterating, but there have been a handful of times where having good intuitions around linear algebra or how gradients work, and spending some time working through the algebra, has been really useful and clarifying.
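A toy illustration of the "stack of matrix multiplications" point above, in pure Python with made-up weights (this is not the author's tooling, and `W1`, `W2`, `forward` and `forward_linear` are hypothetical names). It runs a two-layer network as matrix multiply, elementwise ReLU, matrix multiply, and shows that if the nonlinearity is removed the two weight matrices compose into a single matrix, one small example of the kind of algebraic simplification that can make a stack of matrices easier to reason about.

```python
def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both given as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def matvec(w, x):
    """Multiply matrix w (given as a list of rows) by vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]

def relu(v):
    """Elementwise ReLU nonlinearity."""
    return [max(0.0, vi) for vi in v]

# Hypothetical toy weights for a 2 -> 3 -> 2 network
W1 = [[1.0, -1.0],
      [0.5, 2.0],
      [-1.0, 0.0]]
W2 = [[1.0, 0.0, 1.0],
      [0.0, 1.0, -1.0]]

def forward(x):
    """Forward pass: matrix multiply, elementwise ReLU, matrix multiply."""
    return matvec(W2, relu(matvec(W1, x)))

def forward_linear(x):
    """The same network with the ReLU removed."""
    return matvec(W2, matvec(W1, x))

# Without the nonlinearity, the two layers collapse into one matrix W2 @ W1.
W_combined = matmul(W2, W1)

x = [1.0, 2.0]
print(forward(x), forward_linear(x), matvec(W_combined, x))
```

With the ReLU in place no such single-matrix shortcut exists in general, which is part of why understanding real networks takes more than multiplying their weights together.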

If you're interested in learning more, Zoom In is a good overview of a particular agenda for mechanistic interpretability in vision models (which I personally find super inspiring!), and my team wrote a pretty mathsy paper giving a framework to break down and understand small, attention-only transformers (I expect the paper to only make sense after reading an overview of autoregressive transformers like this one). If you're interested in working on this, there are currently teams at Anthropic, Redwood Research, DeepMind and Conjecture doing work along these lines!

Comment by Neel Nanda on Can we agree on a better name than 'near-termist'? "Not-longermist"? "Not-full-longtermist"? · 2022-04-19T20:32:05.541Z · EA · GW

the reason the "longtermists working on AI risk" care about the total doom in 15 years is because it could cause extinction and preclude the possibility of a trillion-happy-sentient-beings in the long term. Not because it will be bad for people alive today.

As a personal example, I work on AI risk and care a lot about harm to people alive today! I can't speak for the rest of the field, but I think the argument for working on AI risk goes through if you just care about people alive today and hold beliefs which are common in the field - see this post I wrote on the topic, and a post by Scott Alexander on the same theme.

Comment by Neel Nanda on [Book rec] The War with the Newts as “EA fiction” · 2022-04-09T13:47:39.055Z · EA · GW

Thanks for the recommendation! I've just finished reading it and really enjoyed it. Note for future readers that the titular "war" only really happens towards the end of the book, and most of it is about setting up and exploring the idea of introducing newts to society.

Comment by Neel Nanda on "Long-Termism" vs. "Existential Risk" · 2022-04-07T11:42:46.309Z · EA · GW

No worries, I'm excited to see more people saying this! (Though I did have some eerie deja vu when reading your post initially...)

I'd be curious if you have any easy-to-articulate feedback re why my post didn't feel like it was saying the same thing, or how to edit it to be better? 

(EDIT: I guess the easiest object-level fix is to edit in a link at the top to yours, and say that I consider you to be making substantially the same point...)

Comment by Neel Nanda on How I Formed My Own Views About AI Safety · 2022-03-16T05:12:20.303Z · EA · GW

Inside view feels deeply emotional and tied to how I feel the world to be; independent impression feels cold and abstract.

Comment by Neel Nanda on What is the new EA question? · 2022-03-03T14:09:28.115Z · EA · GW

How can we best allocate our limited resources to improve the world? Sub-question: Which resources are worth the effort to optimise the allocation of, and which are not, given that we all have limited time, effort and willpower?

I find this framing most helpful. In particular, for young people, the most valuable resource they have is their future labor. Initially, converting this to money and the money to donations was very effective, but now this is often outcompeted by working directly on high-priority paths. But the underlying question remains. And I'd argue we often reach the point where optimising our use of money, as it manifests as frugality and thrift, is not worth the willpower and opportunity costs, given that there's a lot more money than vetting capacity or labor. (Implicit assumption: thrift has a cost and is the non-default option. This feels true for me but may not generalise.)

Comment by Neel Nanda on How I Formed My Own Views About AI Safety · 2022-03-02T16:52:50.236Z · EA · GW

The complaint that it's confusing jargon is fair. Though I do think the Tetlock sense + the phrase "inside view" captures something important - my inside view is what feels true to me, according to my personal best guess and internal impressions. Deferring doesn't feel true in the same way; it feels like I'm overriding my beliefs, not like how the world is.

This mostly comes under the motivation point - maybe, for motivation, inside views matter but independent impressions don't? And people differ on how they feel about the two?

Comment by Neel Nanda on How I Formed My Own Views About AI Safety · 2022-03-02T16:51:08.110Z · EA · GW

One thing I disagree with: the importance of forming inside views for community epistemic health. I think it's pretty important. E.g. I think that ~2 years ago, the arguments for the longterm importance of AGI safety were pretty underdeveloped; that since then lots more people have come out with their inside views about it; and that now the arguments are in much better shape.

I want to push back against this. The aggregate benefit may have been high, but when you divide it by all the people trying, I'm not convinced it's all that high.

Further, that's an overestimate - the actual question is more like 'if the people who are least enthusiastic about it stop trying to form inside views, how bad is that?'. And I'd guess both that impact is fairly heavy-tailed, and that the people most willing to give up are the least likely to have a major positive impact.

I'm not confident in the above, but it's definitely not obvious.

Comment by Neel Nanda on Simplify EA Pitches to "Holy Shit, X-Risk" · 2022-02-27T22:36:50.431Z · EA · GW

Fair point re tractability

What argument do you think works on people who already think they're working on important and neglected problems? I can't think of any argument that doesn't just boil down to one of those

Comment by Neel Nanda on The value of small donations from a longtermist perspective · 2022-02-27T00:32:30.330Z · EA · GW

Thanks for the post! I broadly agree with the arguments you give, though I think you understate the tensions between promoting earning to give vs direct work.

Personal example: I'm currently doing AI Safety work, and I expect it to be fairly impactful. But I came fairly close to going into finance as it was a safe, stable path I was confident I'd enjoy. And part of this motivation was a fuzzy feeling that donations were still somewhat good, which made it harder to internalise just how much higher the value from direct work was. Anecdotally, a lot of smart mathematicians I know are tempted by finance and have a similar problem. And in cases like this, I think that promoting longtermist donations is actively in tension with high-impact career advice.

Comment by Neel Nanda on Simplify EA Pitches to "Holy Shit, X-Risk" · 2022-02-26T11:12:23.137Z · EA · GW

But we want to make sure that the "truth-seeking" norms of this movement stay really really high.

I think there are two similar but different things here - truth-seeking and cause neutrality. Truth-seeking is the general point of 'it's really important to find truth, look past biases, care about evidence, etc', and cause neutrality is the specific form of truth-seeking that recognises that impact can differ enormously between causes, and that it's worth looking past cached thoughts and the sunk cost fallacy to be open to moving to other causes.

I think truth-seeking can be conveyed well without cause neutrality - if you don't truth-seek, you will be a much less effective person working on global development. I think this is pretty obvious, and the point can be made with any of the classic examples (PlayPumps, Scared Straight, etc).

People may absorb the idea of truth-seeking without cause neutrality. And I think I feel kinda fine about this? Like, I want the EA movement to still retain cause neutrality. And I'd be pro talking about it. But I'd be happy with intro materials getting people who want to work on AI and bio without grokking cause neutrality. 

In particular, I want to distinguish between 'cause switching because another cause is even more important' and 'cause switching because my cause is way less important than I thought'. I don't really expect to see another cause way more important than AI or bio? Something comparably important, or maybe 2-5x more important, maybe? But my fair value for AI extinction within my lifetime is 10-20%. This is really high!!! I don't expect any future cause to turn out way more important than that. And, IMO, the idea of truth-seeking, if conveyed well, should be sufficient to get people to notice if their cause is way less important than they thought in absolute terms (eg, work on AI is not at all tractable).

Comment by Neel Nanda on Simplify EA Pitches to "Holy Shit, X-Risk" · 2022-02-26T11:06:00.277Z · EA · GW
  • I think there's a lot that goes into deciding which people are correct on this, and only saying "AI x-risk and bio x-risk are really important" is missing a bunch of stuff that feels pretty essential to my beliefs that x-risk is the best thing to work on

Can you say more about what you mean by this? To me, 'there's a 1% chance of extinction in my lifetime from a problem that fewer than 500 people worldwide are working on' feels totally sufficient

Comment by Neel Nanda on Simplify EA Pitches to "Holy Shit, X-Risk" · 2022-02-26T11:05:09.765Z · EA · GW

This is a fair criticism! My short answer is that, as I perceive it, most people writing new EA pitches, designing fellowship curricula, giving EA career advice, etc, are longtermists and give pitches optimised for producing more people working on important longtermist stuff. And this post was a reaction to what I perceive as a failure of such pitches: focusing on moral philosophy. I'm not really trying to engage with the broader question of whether this is a problem in the EA movement. Now that OpenPhil is planning to fund neartermist EA movement building, maybe this'll change?


Personally, I'm not really a longtermist, but think it's way more important to get people working on AI/bio stuff from a neartermist lens, so I'm pretty OK with optimising my outreach for producing more AI and bio people. Though I'd be fine with low cost ways to also mention 'and by the way, global health and animal welfare are also things some EAs care about, here's how to find the relevant people and communities'.

Comment by Neel Nanda on Announcing Alvea—An EA COVID Vaccine Project · 2022-02-23T12:17:23.395Z · EA · GW

This seems like a really exciting project, I look forward to seeing where it goes!

As I understand it, a lot of the difficulty with new medical technology is running big and expensive clinical trials, and going through the process of getting approved by regulators. What's Alvea's plan for getting the capital and expertise necessary to do this?

Comment by Neel Nanda on Simplify EA Pitches to "Holy Shit, X-Risk" · 2022-02-12T12:48:54.308Z · EA · GW

Ah gotcha. So you're specifically objecting to people who say 'even if there's a 1% chance' based on vague intuition, and not to people who think carefully about AI risk, conclude that there's a 1% chance, and then act upon it?

Comment by Neel Nanda on Simplify EA Pitches to "Holy Shit, X-Risk" · 2022-02-12T09:20:36.755Z · EA · GW

Ah sorry, the original thing was badly phrased. I meant, a valid objection to x-risk work might be "I think that factory farming is really really bad right now, and prioritise this over dealing with x-risk". And if you don't care about the distant future, that argument seems pretty legit from some moral perspectives? While if you do care about the distant future, you need to answer the question of what the future distribution of animal welfare looks like, and it's not obviously positive. So to convince these people you'd need to convince them that the distribution is positive.

Comment by Neel Nanda on Simplify EA Pitches to "Holy Shit, X-Risk" · 2022-02-12T09:18:21.523Z · EA · GW

I haven't met anyone who's working on this stuff and says they're deferring on the philosophy (while I feel like I've often heard that people feel iffy/confused about the empirical claims).

Fair - maybe I feel that people mostly buy 'future people have non-zero worth and extinction sure is bad', but may be more uncertain on a totalising view like 'almost all value is in the far future, stuff today doesn't really matter, moral worth is the total number of future people and could easily get to >=10^20'.

I'm sympathetic to something along these lines. But I think that's a great case (from longtermists' lights) for keeping longtermism in the curriculum. If one week of readings has a decent chance of boosting already-impactful people's impact by, say, 10x (by convincing them to switch to 10x more impactful interventions), that seems like an extremely strong reason for keeping that week in the curriculum.

Agreed! (Well, by the lights of longtermism at least - I'm at least convinced that extinction is 10x worse than a temporary civilisational collapse, but maybe not 10^10x worse). At this point I feel like we mostly agree - keeping a fraction of the content on longtermism, after x-risks, and making it clear that it's totally legit to work on x-risk without buying longtermism would make me happy.

Comment by Neel Nanda on Simplify EA Pitches to "Holy Shit, X-Risk" · 2022-02-12T09:14:08.962Z · EA · GW

That's fair pushback. My personal guess is that it's actually pretty tractable to decrease it to eg 0.9x of the original risk, with the collective effort and resources of the movement? To me it feels quite different to think about reducing something where the total risk is (prob=10^-10)  x (magnitude = 10^big), vs having (prob of risk=10^-3 ) x (prob of each marginal person making a decrease = 10^-6) x (total number of people working on it = 10^4) x (magnitude = 10^10)

(Where obviously all of those numbers are pulled out of my ass)
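To make the contrast in the comment above concrete, here is a small sketch of the second product, using exactly the comment's own made-up numbers (they are placeholders, not estimates). The first product's "10^big" magnitude is left unspecified in the comment, so it is not computed here. The point it illustrates: the tiny per-person probability gets multiplied back up by the number of people, so no factor in the collective chain is anywhere near 10^-10. Treating contributions as simply additive is an assumption that only makes sense while the collective probability stays well below 1.

```python
import math

# The comment's own illustrative guesses, not real estimates.
RISK_PROB = 1e-3        # probability the catastrophe happens at all
PER_PERSON_PROB = 1e-6  # chance each marginal person averts it
NUM_PEOPLE = 1e4        # people working on the problem
MAGNITUDE = 1e10        # badness of the catastrophe, arbitrary units

# Probability that the collective effort averts the catastrophe,
# assuming each person's small contribution simply adds up.
collective_prob = PER_PERSON_PROB * NUM_PEOPLE

# Expected value of the whole field's work, in the same arbitrary units.
expected_value = RISK_PROB * collective_prob * MAGNITUDE

# Smallest probability in the collective chain, to compare with a 1e-10 mugging.
smallest_factor = min(RISK_PROB, collective_prob)

print(collective_prob, expected_value, smallest_factor / 1e-10)
```

Under these placeholder numbers the smallest probability in the collective chain is about 10^-3, roughly seven orders of magnitude away from a Pascal's-Mugging-style 10^-10.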

Comment by Neel Nanda on Simplify EA Pitches to "Holy Shit, X-Risk" · 2022-02-12T09:10:54.085Z · EA · GW

These arguments appeal to phenomenal stakes implying that, using expected value reasoning, even a very small probability of the bad thing happening means we should try to reduce the risk, provided there is some degree of tractability in doing so.

To be clear, the argument in my post is that we only need the argument to work when "very small" means 1% or 0.1%, not eg 10^-10. I am much more skeptical about arguments involving probabilities like 10^-10.

Comment by Neel Nanda on Simplify EA Pitches to "Holy Shit, X-Risk" · 2022-02-12T06:58:29.278Z · EA · GW

I'm curious, do you actually agree with the two empirical claims I make in this post? (1% risk of AI x-risk, 0.1% for bio, within my lifetime)

Comment by Neel Nanda on Simplify EA Pitches to "Holy Shit, X-Risk" · 2022-02-12T06:31:53.893Z · EA · GW

Re your final point, I mostly just think they miss the mark by not really addressing the question of what the long-term distribution of animal welfare looks like (I'm personally pretty surprised by the comparative lack of discussion about how likely our Lightcone is to be net bad by the lights of people who put significant weight on animal welfare)

Comment by Neel Nanda on Simplify EA Pitches to "Holy Shit, X-Risk" · 2022-02-12T06:29:08.614Z · EA · GW

Thanks, this is some great pushback. Strongly upvoted.

Re long-termists will think hard about x-risk, that's a good point. Implicitly I think I'm following the intuition that people don't really evaluate a moral claim in isolation. And that when someone considers how convinced to be by long-termism, they're asking questions like "does this moral system imply important things about my actions?" And that it's much easier to convince them of the moral claim once you can point to tractable action relevant conclusions.

Re target audiences, I think we are imagining different settings. My read from running intro fellowships is that lots of people find long-termism weird, and I implicitly think that many people who ultimately end up identifying as long-termist still have a fair amount of doubt but are deferring to their perception of the EA consensus. Plus, even if your claim IS true, to me that would imply that we're selecting intro fellows wrong! 

Implicit model: People have two hidden variables - 'capacity to be convinced of long-termism' and 'capacity to be convinced of x-risk'. These are not fully correlated, and I'd rather only condition on the second one, to maximise the set of reachable people (I say as someone identifying with the second category much more than the first!)

This also addresses your third point - I expect the current framing is losing a bunch of people who buy x-risk but not long-termism, or who are eg suspicious of highly totalising arguments like Astronomical Waste that imply 'it is practically worthless to do things that just help people alive today'.

Though it's fair to say that there are people who CAN be reached by long-termism much more easily than x-risk. I'd be pro giving them the argument for long-termism and some intuition pumps and seeing if it grabs people, so long as we also ensure that the message doesn't implicitly feel like "and if you don't agree with long-termism you also shouldn't prioritise x-risk". The latter is the main thing I'm protecting here 

Re your fourth point, yeah that's totally fair, point mostly conceded. By the lights of long-termism I guess I'd argue that the distinction between work to prevent major disasters and work to ruthlessly focus on x-risk isn't that strong? It seems highly likely that work to prevent natural pandemics is somewhat helpful to prevent engineered pandemics, or that work to prevent mild engineered pandemics is useful to help prevent major ones. I think that work to reduce near-term problems in AI systems is on average somewhat helpful for long-term safety. It is likely less efficient, but maybe only by 3-30x? And I think we should often be confused and uncertain about our stories for how to prevent just the very worst disasters, and this kind of portfolio is more robust to mistakes about the magnitude of different disasters. Plus, I expect a GCBR to heavily destabilise the world and to be an x-risk increaser, by making x-risks that could be averted with good coordination more likely.

Comment by Neel Nanda on Simplify EA Pitches to "Holy Shit, X-Risk" · 2022-02-12T06:11:11.664Z · EA · GW

Estimates can be massively off in both directions. Why do you jump to the conclusion of inaction rather than action?

(My guess is that it's sufficiently easy to generate plausible but wrong ideas at the 1% level that you should have SOME amount of inaction bias, but not to take it too far)

Comment by Neel Nanda on Simplify EA Pitches to "Holy Shit, X-Risk" · 2022-02-11T18:29:13.530Z · EA · GW

To articulate my worries, I suppose it's that this implies a very reductionist and potentially exclusionary idea of doing good; it's sort of "Holy shit, X-risks matters (and nothing else does)". On any plausible conception of EA, we want people doing a whole bunch of stuff to make things better.

I'd actually hoped that this framing is less reductionist and exclusionary. Under total utilitarianism + strong longtermism, averting extinction is the only thing that matters, everything else is irrelevant. Under this framing, averting extinction from AI is, say, maybe 100x better than totally solving climate change. And AI is comparatively much more neglected and so likely much more tractable. And so it's clearly the better thing to work on. But it's only a few orders of magnitude, coming from empirical details of the problem, rather than a crazy, overwhelming argument that requires estimating the number of future people, the moral value of digital minds, etc.

The other bit that irks me is that it does not follow, from the mere fact that's there's a small chance of something bad happening, that preventing that bad thing is the most good you can do. I basically stop listening to the rest of any sentence that starts with "but if there's even a 1% chance that ..."

I agree with the first sentence, but your second sentence seems way too strong - it seems bad to devote all your efforts to averting some tiny tail risk, but I feel pretty convinced that averting a 1% chance of a really bad thing is more important than averting a certainty of a kinda bad thing (operationalising this as 1000x less bad, though it's fuzzy). But I agree that the preference ordering of (1% chance of really bad thing) vs (certainty of maybe bad thing) is unclear, and that it's reasonable to reject eg naive attempts to calculate expected utility.
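A naive expected-value version of the comparison above, using the comment's own fuzzy operationalisation: the "kinda bad" outcome is 1000x less bad than the "really bad" one. The units are arbitrary and only the ratio matters; `expected_badness` is just an illustrative helper, and, as the comment itself says, it's reasonable to reject this kind of naive calculation.

```python
def expected_badness(probability: float, badness: float) -> float:
    """Naive expected badness: probability times magnitude."""
    return probability * badness

REALLY_BAD = 1000.0            # arbitrary units; only the ratio matters
KINDA_BAD = REALLY_BAD / 1000  # "1000x less bad", per the comment's operationalisation

tail_risk = expected_badness(0.01, REALLY_BAD)  # 1% chance of the really bad thing
certain_harm = expected_badness(1.0, KINDA_BAD) # certainty of the kinda bad thing

ratio = tail_risk / certain_harm
print(f"Averting the 1% tail risk is worth {ratio:.1f}x averting the certain harm")
```

Under these assumptions the 1% tail risk carries about 10x the expected badness of the certain harm, which is the sense in which averting it looks more important.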

Comment by Neel Nanda on Simplify EA Pitches to "Holy Shit, X-Risk" · 2022-02-11T18:22:49.995Z · EA · GW

It's not at all clear under this view that it would be worthwhile to pivot your career to AI safety or biorisk, instead of taking the more straightforward route of earning to give to standard near-term interventions.

I'd disagree with this. I think the conversion of money to labour is super inefficient for longtermist work, and so this analogy breaks down. Sure, maybe I should donate to the Maximum Impact Fund rather than LTFF. But it's really hard to convert billions of dollars into useful labour on longtermist stuff. So, as someone who can work on AI Safety, there's a major inefficiency factor if I pivot to ETG. I think the consensus basically already is that ETG for longtermism is rarely worth it, unless you're incredibly good at ETG.

Comment by Neel Nanda on Simplify EA Pitches to "Holy Shit, X-Risk" · 2022-02-11T18:20:17.347Z · EA · GW

Yep! I think this phenomenon of 'things that are technically all-or-nothing, but it's most useful to think of them as a continuous thing' is really common. Eg, if you want to reduce the number of chickens killed for meat, it helps to stop buying chicken. This lowers demand, which will on average lower the number of chickens killed. But the underlying mechanism is meat companies noticing and reducing production, which is pretty discrete and chunky and hard to predict well (though not literally all-or-nothing).

Basically any kind of campaign to change minds or achieve social change with some political goal also comes under this. I think AI Safety is about as much a Pascal's Mugging as any of these other things
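A minimal sketch of the chicken example above: chunky in reality, continuous in expectation. The threshold model is an assumption, not something from the comment: a producer cuts production by `chunk_size` birds each time demand falls past a threshold, and since nobody knows how close demand currently is to the next threshold, that distance is taken as uniform over 1..chunk_size. All function names are hypothetical.

```python
from fractions import Fraction

def cuts_triggered(forgone: int, distance: int, chunk_size: int) -> int:
    """Number of production cuts when demand drops by `forgone` birds, given the
    next threshold is `distance` birds away and later ones every `chunk_size`."""
    if forgone < distance:
        return 0
    return 1 + (forgone - distance) // chunk_size

def probability_of_any_cut(forgone: int, chunk_size: int) -> Fraction:
    """Chance that forgoing `forgone` birds triggers at least one cut."""
    hits = sum(1 for d in range(1, chunk_size + 1)
               if cuts_triggered(forgone, d, chunk_size) > 0)
    return Fraction(hits, chunk_size)

def expected_reduction(forgone: int, chunk_size: int) -> Fraction:
    """Expected birds not produced, averaging over the unknown distance to the
    next threshold (assumed uniform over 1..chunk_size)."""
    total_cuts = sum(cuts_triggered(forgone, d, chunk_size)
                     for d in range(1, chunk_size + 1))
    # Each cut removes chunk_size birds; each distance has probability 1/chunk_size.
    return Fraction(total_cuts, chunk_size) * chunk_size

print(probability_of_any_cut(1, 1000), expected_reduction(1, 1000))
```

Under this model, forgoing one bird almost never changes anything (probability 1/chunk_size), but when it does, the cut is chunk_size birds, so the expected reduction is exactly one bird. The same holds for any number of birds forgone, which is why treating the effect as continuous works well despite the underlying chunkiness.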

Comment by Neel Nanda on Simplify EA Pitches to "Holy Shit, X-Risk" · 2022-02-11T12:24:19.463Z · EA · GW

Hmm, what would this perspective say to people working on climate change? 

Comment by Neel Nanda on Simplify EA Pitches to "Holy Shit, X-Risk" · 2022-02-11T11:56:31.286Z · EA · GW

If this were to actually be delivered as a pitch I would suggest putting more focus on cognitive biases that lead to inaction

Thanks for the thoughts! Definitely agreed that this could be compelling for some people. IMO this works best on people whose crux is "if this was actually such a big deal, why isn't it common knowledge? Given that it's not common knowledge, this is too weird for me and I am probably missing something". 

I mostly make this argument in practice by talking about COVID - IMO COVID clearly demonstrates basically all of these biases, through the various ways we under-prepared for and bungled the response.

Comment by Neel Nanda on Simplify EA Pitches to "Holy Shit, X-Risk" · 2022-02-11T11:54:10.671Z · EA · GW

Thanks for the feedback! Yep, it's pretty hard to judge this kind of thing given survivorship bias. I expect this kind of pitch would have worked best on me, though I got into EA long enough ago that I was most grabbed by global health pitches. Which maybe got past my weirdness filter in a way that this one didn't. 

I'd love to see what happens if someone tries an intro fellowship based around reading the Most Important Century series!

Comment by Neel Nanda on Simplify EA Pitches to "Holy Shit, X-Risk" · 2022-02-11T11:51:11.299Z · EA · GW

TL;DR I think that in practice most of these disagreements boil down to empirical cruxes not moral ones. I'm not saying that moral cruxes are literally irrelevant, but that they're second order, only relevant to some people, and only matter if people buy the empirical cruxes, and so should not be near the start of the outreach funnel but should be brought up eventually

Hmm, I see your point, but want to push back against this. My core argument essentially stems from an intuition that you have a limited budget to convince people of weird ideas, and that if you can only convince them of one weird idea it should be the empirical claims about the probability of x-risk, not the moral claims about future people. My guess is that most people who genuinely believe these empirical claims about x-risk will be on board with most of the action-relevant EA recommendations. While people who buy the moral claims but NOT the empirical claims will massively disagree with most EA recommendations.

And, IMO, the empirical claims are much more objective than the moral claims, and are an easier case to make. I just don't think you can make moral philosophy arguments that are objectively convincing.

I'm not arguing that it's literally useless to make the moral arguments - once you've convinced someone of the first weird idea, they're probably willing to listen to the second weird idea! But if you fail to convince them that the first weird idea is worth taking seriously they probably aren't. And I agree that once you get into actually working on a field there may be subtle differences re trading off short-term disasters against long-term disasters, which can really matter for the work you do. But IMO most intro material is just trying to convey an insight like "try to work on bio/AI", and subtle disagreements about which research agendas and subfields matter most are things that can be hashed out later. In the same way, I wouldn't want intro fellowships to involve a detailed discussion of the worth of MIRI's vs DeepMind Safety's research agendas.

Also, if the failure mode of this advice is a bunch of people trying to prevent biorisks that kill billions of people but don't actually permanently derail civilisation, I'm pretty fine with that? That feels like a great outcome to me.

Further, I think that prioritising AI or bio over these other problems is kinda obviously the right thing to do from just the perspective of ensuring the next 200 years go well, and probably from the perspective of ensuring the next 50 go well. To the degree that people disagree, IMO it tends to come from empirical disagreements, not moral ones. Eg people who think that climate change is definitely an x-risk - I think this is an incorrect belief, but that you resolve it by empirically discussing how bad climate change is, not by discussing future generations. This may just be my biased experience, but I often meet people who have different cause prio and think that eg AI Safety is delusional, but very rarely meet people with different cause prio who agree with me about the absolute importance of AI and bio.

One exception might be people who significantly prioritise animal welfare, and think that the current world is majorly net bad due to factory farming? But that the future world will likely contain far less factory farming and many more happy humans. But if your goal is to address that objection, IMO current intro materials still majorly miss the mark.

Comment by Neel Nanda on Introduction to Effective Altruism (Ajeya Cotra) · 2022-01-21T06:41:36.403Z · EA · GW

+1, I think this is my current favourite intro to EA

Comment by Neel Nanda on Meme Review, the first - My mistakes on the path to impact · 2022-01-18T06:59:21.753Z · EA · GW

Strongly downvoted. I think a meme review is in fairly poor taste for this post. I took Denise's post as an honest, serious and somewhat sad account of her path to having an impact. Tonally, memes feel non-serious, focused on humour and making light of things. This clashes a lot with the tone of Denise's post, in a way that feels inappropriate to me.

I found the meme review of Aaron Gertler's retirement fun though!

Comment by Neel Nanda on A huge opportunity for impact: movement building at top universities · 2022-01-02T16:09:43.066Z · EA · GW

I think this proposal fixes a lot of the problems I'd seen in the earlier CBG program, and I'm incredibly excited to see where it goes. Nice work! EA Stanford and EA Cambridge seem like some of the current groups we have that are closest to Campus Centres, and I've been REALLY impressed with both of their work and all the exciting projects that are coming out of them. I'm very keen to see this scaled to more places!

Comment by Neel Nanda on My Overview of the AI Alignment Landscape: A Bird’s Eye View · 2021-12-26T17:56:37.431Z · EA · GW
  • Why is it useful to think of AI-influenced coordination failures as a major threat model in the alignment landscape? My intuition would be to think of it as falling under capabilities (since the worry, if I understand it, is that--even if AI systems are aligned with their users--bad things will still happen because coordination is hard?).

This may be a disagreement about semantics. As I see it, my goal as an alignment researcher is to do whatever I can to reduce x-risk from powerful AI. And given my skillset, I mostly focus on how I can do this with technical research. So if there are ways to shape the technical development of AI that lead to better cooperation, and this reduces x-risk, I count that as part of the alignment landscape.

Another take is Critch's description of extending alignment to groups of systems and agents, giving the multi-multi alignment problem of ensuring alignment between groups of humans and groups of AIs who all need to coordinate. I discuss this a bit more in the next post.

Comment by Neel Nanda on My Overview of the AI Alignment Landscape: A Bird’s Eye View · 2021-12-26T17:51:56.937Z · EA · GW

Thanks for the feedback! Really glad to hear it was helpful de-confusion for people who've already engaged somewhat with AI Alignment but aren't actively researching in the field; that's part of what I was aiming for.


I didn't get much feedback on my categorisation; I was mostly trying to absorb other people's inside views on their specific strand of alignment, and most of the feedback on the doc was object-level discussion of each section. I didn't get feedback suggesting this was wrong in some substantial way, but I'd also expect it to be considered 'reasonable but controversial' rather than widely accepted.

If it helps, I'm most uncertain about the following parts of this conceptualisation:

  • Separating power-seeking AI and inner misalignment, rather than merging them - inner misalignment seems like the most likely way this happens
  • Having assistance games as an agenda, rather than as a way to address the power-seeking AI or you get what you measure threat models
  • Not having recursive reward modelling as a fully fledged agenda (this may just be because I haven't read enough about it to really have my head around it properly)
  • Putting reinforcement learning from human feedback under you get what you measure - this seems like a pretty big fraction of current alignment effort, and might be better put under a category like 'narrowly aligning superhuman models'


It's hard to be precise, but there's definitely not an even distribution. And it depends a lot on which resources you care about.

A lot of the safety work at industry labs revolves around trying to align large language models, mostly with tools like reinforcement learning from human feedback. I mostly categorise this under you get what you measure, though I'm open to pushback there. This is very resource-intensive, especially if you include the costs of training those large language models in the first place, and consumes a lot of capital, engineer time, and researcher time. Though much of the money comes from companies like Google, rather than philanthropic sources.

The other large collections of researchers are at MIRI, who mostly do deconfusion work, and CHAI, who do a lot of things, including a bunch of good field-building. My guess is that CHAI's modal type of work is on training AIs with assistance games, though this is more speculative.

Most of the remaining areas are fairly small, though these are definitely not clear-cut distinctions.

It's unclear which of these resources are most important to track - training large models is very capital intensive, and doing anything with them is fairly labour intensive and needs good engineers. But as eg OpenPhil's recent RFPs show, there's a lot of philanthropic dollars available for researchers who have a credible case for being able to do good alignment research, suggesting we're more bottlenecked by researcher time? And there we're much more bottlenecked by senior researcher time than junior researcher time.


Very hard to say, sorry! Personally, I'm most excited about inner alignment and interpretability and really want to see those having more resources. Generally, I'd also want to see a more even distribution of resources for exploration, diversification and value of information reasons. I expect different people would give wildly varying opinions.

Comment by Neel Nanda on EA Infrastructure Fund: May–August 2021 grant recommendations · 2021-12-24T15:08:38.617Z · EA · GW

This seems like a really exciting set of grants! It's great to see EAIF scaling up so rapidly.

Comment by Neel Nanda on What are some success stories of grantmakers beating the wider EA community? · 2021-12-12T07:17:41.468Z · EA · GW

Sure. But I think the story there was that Open Phil intentionally split off to pursue this much more aggressive approach, while GiveWell is more focused on traditional charities and requires high standards of evidence. And I think having prominent orgs doing each strategy is actually pretty great? They just fit into different niches.

Comment by Neel Nanda on What are some success stories of grantmakers beating the wider EA community? · 2021-12-08T07:49:24.737Z · EA · GW

I had planned to write a whole post on this and on how to do active grant-making well as a small donor – not sure if I will have time but maybe

I would love to read this post (especially any insights that might transfer to someone with AI Safety expertise, but not much in other areas of EA!). Do you think there's much value in small donors giving to areas they don't know much about? Especially in areas with potential high downside risk like policy. Eg, is the average value of the marginal "not fully funded" policy project obviously positive or negative?

Comment by Neel Nanda on What are some success stories of grantmakers beating the wider EA community? · 2021-12-08T07:44:29.419Z · EA · GW

So of course the community collectively gets credit because OpenPhil identifies as EA, but it's worth noting that their "hits based giving" approach diverges substantially from more conventional EA-style (quantitative QALY/cost-effectiveness) analysis and asking what that should mean for the movement more generally.

My impression is that most major EA funding bodies, bar GiveWell, are mostly following a hits-based giving approach nowadays. Eg EA Funds are pretty explicit about this. I definitely agree with the underlying point about weaknesses of traditional EA methods, but I'm not sure this raises a deep open question for the movement, rather than one that's already fairly internalised.

Comment by Neel Nanda on Biblical advice for people with short AI timelines · 2021-12-07T08:15:10.390Z · EA · GW

Though maybe "quitting their job and not getting a pension" is meant as a metaphor for "take very big life risks,"

That's fair pushback - a lot of that really doesn't seem that risky if you're young and have a very employable skillset. I endorse this rephrasing of my view, thanks.

I guess you're still exposed to SOME increased risk, eg that the tech industry in general becomes much smaller/harder to get into/less well paying, but you're still exposed to risks like "the US pension system collapses" anyway, so this seems reasonable to mostly ignore. (Unless there's a good way of buying insurance against this?)

Comment by Neel Nanda on Biblical advice for people with short AI timelines · 2021-12-06T19:46:52.301Z · EA · GW

I think if it turns out that short AI timelines are wrong, those with short timelines should acknowledge it and the EA as a whole should seek to understand why we got it so wrong. I will think it odd if those who make repeatedly wrong predictions continue to be taken seriously.

I think this only applies to people who are VERY confident in short timelines. Say you have a distribution over possible timelines that puts 50% probability on <20 years, and 20% probability on >60 years. This would be a really big deal! It's a 50% chance of the world wildly changing in 20 years. But having no AGI within 60 years is only a 5x update against this model (the model gave that outcome 20% probability, so even a model that predicted it with certainty is favoured by at most a factor of 1/0.2 = 5), hardly a major sin of bad prediction.

Though if someone is eg quitting their job and not getting a pension they probably have a much more extreme distribution, so your point is pretty valid there.
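To make the "5x update" arithmetic in the timelines example concrete, here is a minimal Python sketch of the Bayes factor and the resulting posterior. It assumes a hypothetical alternative model that predicted "no AGI within 60 years" with certainty, and a hypothetical 50/50 prior between the two models; neither assumption comes from the comment itself.

```python
def bayes_factor(p_obs_given_model: float, p_obs_given_alt: float = 1.0) -> float:
    """How many times more likely the observation is under the alternative than under the model."""
    return p_obs_given_alt / p_obs_given_model


def posterior(prior_model: float, p_obs_given_model: float, p_obs_given_alt: float = 1.0) -> float:
    """Posterior probability of the model after the observation (Bayes' rule, two hypotheses)."""
    joint_model = prior_model * p_obs_given_model
    joint_alt = (1.0 - prior_model) * p_obs_given_alt
    return joint_model / (joint_model + joint_alt)


# Short-timelines model from the comment: 20% probability on no AGI within 60 years.
P_NO_AGI_60 = 0.20

# Hypothetical: the alternative predicts "no AGI within 60 years" with certainty
# (likelihood 1.0), and we start 50/50 between the two models.
print(f"Bayes factor: {bayes_factor(P_NO_AGI_60):.1f}")
print(f"Posterior on short-timelines model: {posterior(0.5, P_NO_AGI_60):.3f}")
```

Under these assumptions the short-timelines model drops from 50% to about a sixth (0.1 / 0.6), which is a real update but not a refutation. A less extreme alternative (likelihood below 1.0) gives a smaller Bayes factor still.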

Comment by Neel Nanda on Donations tripled to GiveDirectly through 12/5 (crypto gifts only) · 2021-11-29T19:00:12.197Z · EA · GW

Haseeb Qureshi and FTX are both EA aligned donors. I'm fairly skeptical that this is a counterfactual donation match.

Comment by Neel Nanda on Despite billions of extra funding, small donors can still have a significant impact · 2021-11-24T15:37:30.127Z · EA · GW

There might still be some leverage in some cases, but less than 1:1.

If they have a rule of providing 66% of a charity's budget, surely donations are even more leveraged? If OpenPhil covers roughly two thirds of the budget and other donors cover the remaining third, $1 to the charity unlocks $2.

Of course, this assumes that additional small donations to the charity will counterfactually unlock further donations from OpenPhil, which is a strong assumption about their decision-making.
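A minimal sketch of the matching arithmetic, assuming (hypothetically) that the funder always tops up to exactly a fixed share of the total budget and responds to every marginal dollar. The caveat above about OpenPhil's actual decision-making still applies; `unlocked_by_matcher` is an illustrative helper, not anything from OpenPhil.

```python
from fractions import Fraction


def unlocked_by_matcher(donation: Fraction, matcher_share: Fraction) -> Fraction:
    """Extra money a matcher adds if it always supplies `matcher_share` of the total budget.

    Others give D and the matcher supplies share s of the total T, so D = (1 - s) * T
    and the matcher gives s * T = D * s / (1 - s).
    """
    if not 0 <= matcher_share < 1:
        raise ValueError("matcher_share must be in [0, 1)")
    return donation * matcher_share / (1 - matcher_share)


# Two-thirds rule: $1 from other donors unlocks $2 from the matcher.
print(unlocked_by_matcher(Fraction(1), Fraction(2, 3)))
# Taking "66%" literally gives slightly less, about $1.94 per $1.
print(float(unlocked_by_matcher(Fraction(1), Fraction(66, 100))))
```

Using `Fraction` keeps the two-thirds case exact, so the 1:2 leverage comes out as exactly 2 rather than a rounded float.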

Comment by Neel Nanda on December 2021 monthly meme post · 2021-11-24T15:34:44.746Z · EA · GW

This post brought me joy, and I would enjoy it being a monthly thing. I'm weakly pro it being on the Front Page, though I expect I'd see it even if it was a Personal Blog. I'd feel sad if memes became common in the rest of the Forum though - having a monthly post feels like a good compromise to me.