Posts

[AN #80]: Why AI risk might be solved without additional intervention from longtermists 2020-01-03T07:52:24.981Z · score: 58 (25 votes)
Summary of Stuart Russell's new book, "Human Compatible" 2019-10-19T19:56:52.174Z · score: 30 (14 votes)
Alignment Newsletter One Year Retrospective 2019-04-10T07:00:34.021Z · score: 61 (23 votes)
Thoughts on the "Meta Trap" 2016-12-20T21:36:39.498Z · score: 8 (12 votes)
EA Berkeley Spring 2016 Retrospective 2016-09-11T06:37:02.183Z · score: 6 (6 votes)
EAGxBerkeley 2016 Retrospective 2016-09-11T06:27:16.316Z · score: 17 (7 votes)

Comments

Comment by rohinmshah on Long-Term Future Fund: April 2020 grants and recommendations · 2020-09-20T01:48:40.253Z · score: 2 (1 votes) · EA · GW

I do mean CS and not just ML. (E.g. PLDI and OSDI are top conferences with acceptance rates of 27% and 18% respectively according to the first Google result, and Berkeley students do publish there.)

Comment by rohinmshah on Long-Term Future Fund: April 2020 grants and recommendations · 2020-09-19T16:41:29.713Z · score: 3 (2 votes) · EA · GW

I don't know for sure, but at least in most areas of Computer Science it is pretty typical for at least Berkeley PhD students to publish in the top conferences in their area. (And they could publish in top journals; that just happens not to be as incentivized in CS.)

I generally dislike using acceptance rates -- I don't see strong reasons that they should correlate strongly with quality or difficulty -- but top CS conferences have maybe ~25% acceptance rates, suggesting this journal would be 5x "harder". This is more than I thought, but I don't think it brings me to the point of thinking it should be a significant point in favor in an outside evaluation, given the size of the organization and the time period over which we're talking.

Comment by rohinmshah on Long-Term Future Fund: April 2020 grants and recommendations · 2020-09-18T17:11:01.963Z · score: 18 (8 votes) · EA · GW
some promising signs about their ability to produce work that well-established external reviewers consider to be very high-quality—most notably, the acceptance of one of their decision theory papers to a top philosophy journal, The Journal of Philosophy.

I get that this is not the main case for the grant, and that MIRI generally avoids dealing with academia so this is not a great criterion to evaluate them on, but getting a paper accepted does not imply "very high-quality", and having a single paper accepted in (I assume) a couple of years is an extremely low bar (e.g. many PhD students exceed this in terms of their solo output).

Comment by rohinmshah on How to think about an uncertain future: lessons from other sectors & mistakes of longtermist EAs · 2020-09-15T06:40:11.445Z · score: 8 (4 votes) · EA · GW
I think showing that longtermism is plausible is also an understatement of the goal of the paper

Yeah, that's a fair point, sorry for the bad argument.

Comment by rohinmshah on How to think about an uncertain future: lessons from other sectors & mistakes of longtermist EAs · 2020-09-14T21:22:59.091Z · score: 2 (1 votes) · EA · GW

I feel like it's misleading to take a paper that explicitly says "we show that strong longtermism is plausible" and does so via robust arguments, and to conclude that longtermist EAs are basing their conclusions on speculative arguments.

If you want robust arguments for interventions you should look at those interventions. I believe there are robust arguments for work on e.g. AI risk, such as Human Compatible. (Personally, I prefer a different argument, but I think the one in HC is pretty robust and only depends on the assumption that we will build intelligent AI systems in the near-ish future, say by 2100.)

Yes, and I'm also not willing to commit to any specific degree of confidence, since I haven't seen any in particular justified. This is also for future impact. Why shouldn't my prior for success be < 1%? Can I rule out a negative expected impact?

Idk what's happening with GFI, so I'm going to bow out of this discussion. (Though one obvious hypothesis is that GFI's main funders have more information than you do.)

Hits-based funding shouldn't be taken for granted.

I mean, of course, but it's not like people just throw money randomly in the air. They use the sorts of arguments you're complaining about to figure out where to try for a hit. What should they do instead? Can you show examples of that working for startups, VC funding, scientific R&D, etc? You mention two things:

  • Developing reasonable probability distributions
  • Diversification

It seems to me that longtermists are very obviously trying to do both of these things. (Also, the first one seems like the use of "explicit calculations" that you seem to be against.)

Comment by rohinmshah on How to think about an uncertain future: lessons from other sectors & mistakes of longtermist EAs · 2020-09-14T18:00:11.677Z · score: 2 (1 votes) · EA · GW
Are there any particular articles/texts you would recommend?

Sorry, on what topic?

Imo, the Greaves and MacAskill paper relies primarily on explicit calculations and speculative plausibility arguments for its positive case for strong longtermism.

I see the core case of the paper as this:

... putting together the assumption that the expected size of the future is vast and the assumption that all consequences matter equally, it becomes at least plausible that the amount of ex ante good we can generate by influencing the expected course of the very long-run future exceeds the amount of ex ante good we can generate via influencing the expected course of short-run events, even after taking into account the greater uncertainty of further-future events.

They do illustrate claims like "the expected size of the future is vast" with calculations, but those are clearly illustrative; the argument is just "there's a decent chance that humanity continues for a long time with similar or higher population levels". I don't think you can claim that this relies on explicit calculations except inasmuch as any reasoning that involves claims about things being "large" or "small" depends on calculations.
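(To make the structure concrete with toy numbers of my own, not the paper's: even a 1% chance that humanity survives into a future containing 10^16 people gives an expected 10^14 people, which dwarfs anything we could affect in the short run; the qualitative conclusion doesn't depend on the particular numbers.)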

I also don't see how this argument is speculative: it seems really hard to me to argue that any of the assumptions or inferences are false.

Note it is explicitly talking about the expected size of the future, and so is taking as a normative assumption that you want to maximize actual expected values. I suppose you could argue that the argument is "speculative" in that it depends on this normative assumption, but in the same way AMF is "speculative" in that it depends on the normative assumption that saving human lives is good (an assumption that may not be shared by e.g. anti-natalists or negative utilitarians).

Animal Charity Evaluators has been criticized a few times for this, see here and here.

I haven't been following animal advocacy recently, but I remember reading "The Actual Number is Almost Surely Higher" when it was released and feeling pretty meh about it. (I'm not going to read it now, it's too much of a time sink.)

GiveWell has even been criticized for relying too much on quantitative models in practice, too, despite Holden's own stated concerns with this.

Yeah I also didn't agree with this post. The optimizer's curse tells you that you should expect your estimates to be inflated, but it does not change the actual decisions you should make. I agree somewhat more with the wrong-way reductions part, but I feel like that says "don't treat your models as objective fact"; GiveWell frequently talks about how the cost-effectiveness model is only one input into their decision making.

More generally, I don't think you should look at the prevalence of critiques as an indicator for how bad a thing is. Anything sufficiently important will eventually be critiqued. The question is how correct or valid those critiques are.

I'm still personally not convinced the Good Food Institute has much impact at all, since I'm not aware of a proper evaluation that didn't depend a lot on speculation

I'm interpreting this as "I don't have >90% confidence that GFI has actually had non-trivial impact so far (i.e. an ex-post evaluation)". I don't have a strong view myself since I haven't been following GFI, but I expect even if I read a lot about GFI I'd agree with that statement.

However, if you think this should be society's bar for investing millions of dollars, you would also have to be against many startups, nearly all VCs and angel funding, the vast majority of scientific R&D, some government megaprojects, etc. This bar seems clearly too stringent to me. You need some way of doing something like hits-based funding.

Comment by rohinmshah on Does Economic History Point Toward a Singularity? · 2020-09-13T15:54:29.448Z · score: 2 (1 votes) · EA · GW

Yeah in hindsight that was confusing. I meant that growth rates have been increasing since the Industrial Revolution, and have only become constant in the last few decades.

Comment by rohinmshah on How to think about an uncertain future: lessons from other sectors & mistakes of longtermist EAs · 2020-09-08T21:23:20.658Z · score: 4 (2 votes) · EA · GW
Yes. I would have been happy to say that, in general, I expect work of this type is less likely to be useful than other research work that does not try to predict the long-run future of humanity.

Sorry, I think I wasn't clear. Let me make the case for the ex ante value of the Open Phil report in more detail:

1. Ex ante, it was plausible that the report would have concluded "we should not expect lots of growth in the near future".

2. If the report had this conclusion, then we should update that AI risk is much less important than we currently think. (I am not arguing that "lots of growth => transformative AI", I am arguing that "not much growth => no transformative AI".)

3. This would be a very significant and important update (especially for Open Phil). It would presumably lead them to put less money into AI and more money into other areas.

4. Therefore, the report was ex ante quite valuable since it had a non-trivial chance of leading to major changes in cause prioritization.

Presumably you disagree with 1, 2, 3 or 4; I'm not sure which one.

Comment by rohinmshah on Does Economic History Point Toward a Singularity? · 2020-09-08T15:31:45.101Z · score: 3 (2 votes) · EA · GW

Ah, fair point, I'll change "explosive" to "accelerating" everywhere.

Comment by rohinmshah on Does Economic History Point Toward a Singularity? · 2020-09-08T15:31:14.969Z · score: 4 (2 votes) · EA · GW

I agree with this, but it seems irrelevant to Asya's point? If it turned out to be the case that we would just resume the trend of accelerating growth, and AI was the cause of that, I would still call that transformative AI and I would still be worried about AI risks, to about the same degree as I would if that same acceleration was instead trend-breaking.

Comment by rohinmshah on Does Economic History Point Toward a Singularity? · 2020-09-08T00:50:24.615Z · score: 2 (1 votes) · EA · GW

On my read of this doc, everyone agrees that the industrial revolution led to explosive growth, and the question is primarily about whether we should interpret this as a one-off event, or as something that is likely to happen again in the future, so for all viewpoints it seems like transformative AI would still require explosive growth. Does that seem right to you?

Comment by rohinmshah on How to think about an uncertain future: lessons from other sectors & mistakes of longtermist EAs · 2020-09-08T00:30:41.740Z · score: 16 (6 votes) · EA · GW

Hmm, I should note that I am in strong support of quantitative models as a tool for aiding decision-making -- I am only against committing ahead of time to do whatever the model tells you to do. If the post is against the use of quantitative models in general, then I do in fact disagree with the post.

Some things that feel like quantitative models that are merely "aiding" rather than "doing" decision-making:

  • This model for the Global Priorities Project
  • The case for strong longtermism by Greaves and MacAskill illustrates with some back-of-the-envelope estimates and cites others' estimates (GiveWell's, Matheny's).
  • Patient philanthropy is being justified on account of EV estimates
Comment by rohinmshah on Does Economic History Point Toward a Singularity? · 2020-09-07T19:47:58.351Z · score: 4 (2 votes) · EA · GW

Planned summary for the Alignment Newsletter:

One important question for the long-term future is whether we can expect accelerating growth in the near future (see e.g. this <@recent report@>(@Modeling the Human Trajectory@)). For AI alignment in particular, the answer to this question could have a significant impact on AI timelines: if some arguments suggested that it would be very unlikely for us to have accelerating growth soon, we should probably be more skeptical that we will develop transformative AI soon.
So far, the case for accelerating growth relies on one main argument that the author calls the _Hyperbolic Growth Hypothesis_ (HGH). This hypothesis posits that the growth _rate_ rises in tandem with the population size (intuitively, a higher population means more ideas for technological progress which means higher growth rates). This document explores the _empirical_ support for this hypothesis.
I’ll skip the messy empirical details and jump straight to the conclusion: while the author agrees that growth rates have been increasing in the modern era (roughly, the Industrial Revolution and everything after), he does not see much support for the HGH prior to the modern era. The data seems very noisy and hard to interpret, and even when using this noisy data it seems that models with constant growth rates fit the pre-modern era better than hyperbolic models. Thus, we should be uncertain between the HGH and the hypothesis that the Industrial Revolution triggered a one-off transition to increasing growth rates that have now stabilized.

Planned opinion:

I’m glad to know that the empirical support for the HGH seems mostly limited to the modern era, and may be weakly disconfirmed by data from the pre-modern era. I’m not entirely sure how I should update -- it seems that both hypotheses would be consistent with future accelerating growth, though HGH predicts it more strongly. It also seems plausible to me that we should still assign more credence to HGH because of its theoretical support and relative simplicity -- it doesn’t seem like there is strong evidence suggesting that HGH is false, just that the empirical evidence for it is weaker than we might have thought. See also Paul Christiano’s response.
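(For concreteness, the functional forms being compared look roughly like this -- the parameterization is my gloss, not the author's:

$$\text{HGH: } \dot{P} = a P^{\,1+\beta},\ \beta > 0 \;\Rightarrow\; \dot{P}/P = a P^{\beta} \text{ rises with } P \text{ (finite-time singularity)}$$

$$\text{Constant-rate: } \dot{P} = a P \;\Rightarrow\; \dot{P}/P = a \text{ (ordinary exponential growth)}$$

Here P is population or output; the empirical question is which of these better fits the pre-modern data.)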
Comment by rohinmshah on How to think about an uncertain future: lessons from other sectors & mistakes of longtermist EAs · 2020-09-06T03:39:14.773Z · score: 29 (11 votes) · EA · GW
Does that help?

I buy that using explicit EV calculations is not a great way to reason. My main uncertainty is whether longtermists actually rely a lot on EV calculations -- e.g. Open Phil has explicitly argued against it (posts are from GiveWell before Open Phil existed; note they were written by Holden).

Examples: https://globalprioritiesinstitute.org/christian-tarsney-the-epistemic-challenge-to-longtermism/ and https://www.emerald.com/insight/content/doi/10.1108/FS-04-2018-0037/full/html (the latter of which I have not read)

I haven't read these so will avoid commenting on them.

I don’t see the OpenPhil article as that useful – it is interesting but I would not think it has a big impact on how we should approach AI risk.

I mean, the report ended up agreeing with our prior beliefs, so yes it probably doesn't change much. (Though idk, maybe it does influence Open Phil.) But it seems somewhat wrong to evaluate the value of conducting research after the fact -- would you have been confident that the conclusion would have agreed with our prior beliefs before the report was done? I wouldn't have been.

Comment by rohinmshah on How to think about an uncertain future: lessons from other sectors & mistakes of longtermist EAs · 2020-09-05T20:15:54.100Z · score: 62 (20 votes) · EA · GW

I find this hard to engage with -- you point out lots of problems that a straw longtermist might have, but it's hard for me to tell whether actual longtermists fall prey to these problems. For most of them my response is "I don't see this problem, I don't know why you have this impression".

Responding to the examples you give:

(GPI and CLR and to some degree OpenPhil have done research like this)

I'm not sure which of GPI's and CLR's research you're referring to (and there's a good chance I haven't read it), but the Open Phil research you link to seems obviously relevant to cause prioritization. If it's very unlikely that there's explosive growth this century, then transformative AI is quite unlikely and we would want to place correspondingly more weight on other areas like biosecurity -- this would presumably directly change Open Phil's funding decisions.

For example, I expect that the longtermism community could benefit from looking at business planning strategies. It is notable in the world that organisations, even those with long term goals, do not make concrete plans more than 30 years ahead

... I assume from the phrasing of this sentence that you believe longtermists have concrete plans more than 30 years ahead, which I find confusing. I would be thrilled to have a concrete plan for 5 years in the future (currently I'm at ~2 years). I'd be pretty surprised if Open Phil had a >30 year concrete plan (unless you count reasoning about the "last dollar").

Comment by rohinmshah on Singapore’s Technical AI Alignment Research Career Guide · 2020-08-28T02:08:47.729Z · score: 5 (3 votes) · EA · GW
When you mentioned "I estimated a very rough 50% chance of AGI within 20 years, and 30-40% chance that it would be using 'essentially current techniques'", I took it as prosaic AGI too, but you might mean something else.

Oh yeah, that sounds correct to me. I think the issue was that I thought you meant something different from "prosaic AGI" when you were talking about "short term AI capabilities". I do think it is very impactful to work on prosaic AGI alignment; that's what I work on.

Your rephrasing sounds good to me -- I think you can make it stronger; it is true that many researchers including me endorse working on prosaic AI alignment.

Comment by rohinmshah on Singapore’s Technical AI Alignment Research Career Guide · 2020-08-26T18:29:52.220Z · score: 7 (3 votes) · EA · GW
However, such research on short term AI capabilities is potentially impactful in the long term too, according to some AI researchers like Paul Christiano, Ian Goodfellow, and Rohin Shah.

Huh, I don't see where I said anything that implied that? (I just reread the summary that you linked.)

I'm not entirely sure what you mean by "short term AI capabilities". The context suggests you mean "AI-related problems that will arise soon that aren't about x-risk". If so, under a longtermist perspective, I think that work addressing such problems is better than nothing, but I expect that focusing on x-risk in particular will lead to orders of magnitude more (expected) impact.

(I also don't think the post you linked for Paul implies the statement you made either, unless I'm misunderstanding something.)

Comment by rohinmshah on We're (surprisingly) more positive about tackling bio risks: outcomes of a survey · 2020-08-25T18:03:39.980Z · score: 11 (7 votes) · EA · GW

In case anyone else wanted this sorted by topic and then by person, here you go:

  • Do you think that the world will handle future pandemics and bio risks better as a result of having gone through the current coronavirus pandemic?
    • Joan: Would like to be able to say that as humans, we will be able to adapt. But there’s not a lot of good evidence at the moment that we will take the right steps. We’re seeing states retreat, build high walls, and become less globalised. And signs of anti-globalisation trends and anti-science trends are negative indicators when global cooperation is exactly what’s needed to handle these issues better in the future. Good leadership is key to our future preparedness.
    • Catherine: Slightly pessimistic
    • Catherine: Probably not
    • Catherine: Depends when the next one hits
    • Catherine: If it happens within 5-10 years, we would have boosted ability
    • Catherine: In the scientific side, the huge rush to innovate probably leaves a legacy
    • Catherine: There’s about a 20% probability that we might get better, probably not going to get worse
    • Megan: We should expect organisational learning and prioritisation
    • Megan: But on balance we should likely expect over-indexing/over-fitting based on what’s happened previously, and not enough planning and preparation relating to biological risks that don’t look like what’s come most recently
    • Anita: In general yes, we should expect there to be some learning and improvement. Countries have often struggled with getting sufficient attention and resources to outbreak preparedness
    • Anita: There will probably be more money going into public health and maybe the military as well
    • Anita: Asian countries certainly seem better prepared as a result of their past experience
  • Do you think future bio risks will be more likely, less likely, or unchanged in likelihood after the current pandemic? (it may help to split between deliberate man-made risks, accidental man-made, and natural)
    • Joan: It is pretty clear that bio risks will become more likely. This is because of general trends that predated coronavirus such as technological developments, climate change, population growth, urbanisation and global travel. Already this century we’ve had four major global disease outbreaks (Swine Flu, MERS, Ebola, COVID-19) -- almost double the rate of previous centuries. In terms of whether COVID-19 actually causes future bio risks to be more likely, NTI preferred not to make a strong comment either way on this point.
    • Catherine: More likely, in that that was the trend already
    • Catherine: Unchanged by covid
    • Catherine: Our economies will go back to interconnectedness
    • Catherine: We will have more contact with wildlife as we encroach further into their habitats
    • Catherine: Deliberate bio risks could go two ways. Potential users of bio weapons might see that this is really disruptive, which might make them more appealing. Conversely, they might see that there is really no way of making sure that bio risks are contained and won’t affect your own people
    • Catherine: Increased research utilising dangerous pathogens is a source of risk requiring greater attention to biosafety
    • Megan: We should expect some organisational learning and prioritisation as a result of COVID-19
    • Megan: We need more work on understanding and modeling origins of biological risks, without which it’s hard to give definitive answers
    • Megan: We may well see extra work happening to increase our overall understanding of SARS-CoV-2 in particular and viruses in general as a result of the current pandemic. But does understanding viruses increase or decrease our risk? The extra knowledge may well be valuable, but accidents can happen as a result of people doing scientific work which is intended to tackle a pathogen, especially when the rush of people tackling the problem means that people without experience of working with infectious pathogens are involved.
    • Megan: There are state and non-state actors who may have not been interested in or otherwise discounted biological weapons who now may become interested. It’s still not fully understood how terrorists get inspired about ways to use biology as a weapon, and socialising threats can cause information hazards
    • Megan: Also as research develops, it makes it easier for low skilled or medium skilled actors to generate pathogens
    • Anita: Hard to say at this moment
    • Anita: Need to secure biosafety practices in labs; hopefully more people will appreciate that this is really important. Tentatively optimistic about this, however I don’t think I’ve seen as much as I’d like to see about the importance of this.
    • Anita: Could inspire those who want to do harm to see the power of a released pathogen in the community. E.g. an independent group or some state actors who have an interest in the development of bio weapons might feel encouraged. Hopefully, these groups will see that it’s hard to control a pandemic once it starts, so this may also act as a deterrent. But overall, we’re not expecting this pandemic will turn people away from bioweapons.
  • How do you think the willingness of key actors such as governments (but excluding donors) to tackle bio risks will change in light of the current pandemic?
    • Joan: In the near term, we’ll have a higher degree of attention on better preparing for pandemics
    • Joan: It’s unclear whether we will see the right levels of political competence and focused engagement to facilitate the right investments for enduring improvements and attention that last into the future, but we have a unique opportunity to work for lasting change.
    • Catherine: Short window of opportunity in which things might change
    • Catherine: Might be c 3-5 years window, perhaps
    • Catherine: Huge economic damage means that the appetite for thinking further ahead might not be there because governments will be focusing on immediate economic recovery needs
    • Catherine: It’s not the case that the world didn’t know that pandemics could cause huge damage and that coronavirus has now educated us. It was clear that this sort of event was going to happen. The World Bank has been putting out warnings; see, for example, the World Bank paper “From Panic to Neglect”
    • Megan: People are now socialised to the risk, so will take the risks more seriously, but this will differ by risk types.
    • Megan: We have seen a long history of over-indexing on the most recent high profile incidents and environments, including before they are fully understood. For example there was over-indexing on outsider threats in the midst of the anthrax response. Based on past experience, it seems likely that there might be longer term neglect of certain types of risks.
    • Megan: We may see general build-up of capabilities around pandemic response, which will likely be helpful for naturally occurring infectious disease. But there may be less attention on deliberate and accidental bio risks that may look very different.
    • Anita: I expect there will be some additional investment in this area, although there could also be a funding fatigue once we get through this pandemic. A large and enduring investment in biosecurity may be difficult to achieve, especially at the moment when governments are spending so much on COVID
    • Anita: Standard public health budgets are different line items, and you could just up the budgets to, say, something similar to or more than what they were in 2003, when they were much higher than today.
    • Anita: However it’s worrying that existential threats look likely to remain underinvested in
  • Have you seen signs that donor interest in tackling bio risks has changed or will change in light of the current pandemic?
    • Joan: There is now lots of attention on biological risks. And several donors such as Bill Gates and Jack Dorsey have been pledging substantial amounts.
    • Joan: The risk is that donors overly focus on naturally occurring biological risks like COVID, without considering that other things also constitute existential risks, like manmade pathogens or nuclear war that also deserve attention.
    • Catherine: Not seen much indication at the moment
    • Catherine: However a small number of specific funders are starting to think about existential bio risks a bit more
    • Megan: We have not seen a noticeable uptick in donations because of COVID but have tried not to be opportunistic.
    • Megan: To a certain extent this is also a function of us spending time talking to senior politicians and others in government and the commercial sector on immediate response and not having the time to broadcast this value to the outside world.
    • Megan: Many of our colleagues working in adjacent areas have seen some donor interest on secondary effects (e.g. the impact of COVID-19 on geopolitics).
    • Megan: This may also be another example of over-indexing -- everyone is focused on the immediate response efforts (contact tracing etc) but not a lot of what will happen if a worse biological risk hits us in the future. We’ve been focused on this longer term strategy.
    • Anita: Have seen some modest uptick in donors who want to give to COVID response. Not sure that that will translate to a longer-term interest or commitment to the health space going forward. We are so used to the panic-neglect cycle. Uptick mostly (but not entirely) from people in the Effective Altruism community.
Comment by rohinmshah on What organizational practices do you use (un)successfully to improve culture? · 2020-08-16T05:22:09.763Z · score: 13 (5 votes) · EA · GW

Some maybe-related posts (not vouching for them):

Team Cohesion and Exclusionary Egalitarianism

Deliberate Performance in People Management

Burnout: What is it and how to Treat it

Comment by rohinmshah on The emerging school of patient longtermism · 2020-08-16T04:59:49.390Z · score: 17 (5 votes) · EA · GW
Arguments pushing back against the Bostrom-Yudkowsky view of AI by Ben Garfinkel.

I don't know to what extent this is dependent on the fact that researchers like me argue for alignment by default, but I want to note that at least my views do not argue for patient longtermism according to my understanding. (Though I have not read e.g. Phil Trammel's paper.)

As the post notes, it's a spectrum: I would not argue that Open Phil should spend a billion dollars on AI safety this year, but I would probably not argue for Open Phil to take fewer opportunities than they currently do, nor would I recommend that individuals not donate to x-risk orgs and save the money instead.

Comment by rohinmshah on EA reading list: longtermism and existential risks · 2020-08-03T18:31:21.142Z · score: 20 (9 votes) · EA · GW

What about The Precipice?

Comment by rohinmshah on The academic contribution to AI safety seems large · 2020-08-02T17:20:38.363Z · score: 11 (4 votes) · EA · GW

Was going to write a longer comment but I basically agree with Buck's take here.

It's a little hard to evaluate the counterfactuals here, but I'd much rather have the contributions from EA safety than from non EA safety over the last ten years.

I wanted to endorse this in particular.

On the actual argument:

1. EA safety is small, even relative to a single academic subfield.
2. There is overlap between capabilities and short-term safety work.
3. There is overlap between short-term safety work and long-term safety work.
4. So AI safety is less neglected than the opening quotes imply.
5. Also, on present trends, there’s a good chance that academia will do more safety over time, eventually dwarfing the contribution of EA.

I agree with 1, 2, and 3 (though perhaps disagree with the magnitude of 2 and 3, e.g. you list a bunch of related areas and for most of them I'd be surprised if they mattered much for AGI alignment).

I agree 4 is literally true, but I'm not sure it necessarily matters, as this sort of thing can be said for ~any field (as Ben Todd notes). It would be weird to say that animal welfare is not neglected because of the huge field of academia studying animals, even though those fields are relevant to questions of e.g. sentience or farmed animal welfare.

I strongly agree with 5 (if we replace "academia" with "academia + industry", it's plausible to me academia never gets involved while industry does), and when I argue that "work will be done by non-EAs", I'm talking about future work, not current work.

Comment by rohinmshah on Objections to Value-Alignment between Effective Altruists · 2020-08-02T07:00:58.722Z · score: 6 (4 votes) · EA · GW
It seems like an overstatement that the topics of EA are completely disjoint with topics of interest to various established academic disciplines.

I didn't mean to say this, there's certainly overlap. My claim is that (at least in AI safety, and I would guess in other EA areas as well) the reasons we do the research we do are different from those of most academics. It's certainly possible to repackage the research in a format more suited to academia -- but it must be repackaged, which leads to

rewrite your paper so that regular academics understand it whereas other EAs who actually care about it don't

I agree that the things you list have a lot of benefits, but they seem quite hard to me to do. I do still think publishing with peer review is worth it despite the difficulty.

Comment by rohinmshah on Objections to Value-Alignment between Effective Altruists · 2020-07-30T18:53:24.031Z · score: 6 (4 votes) · EA · GW
Most of this was about very large documents on AI safety and strategy issues allegedly existing within OpenAI and MIRI.

I agree people trust MIRI's conclusions a bunch based on supposed good internal reasoning / the fact that they are smart, and I think this is bad. However, I think this is pretty limited to MIRI.

I haven't seen anything similar with OpenAI though of course it is possible.

I agree with all the other things you write.

Comment by rohinmshah on Objections to Value-Alignment between Effective Altruists · 2020-07-29T16:21:51.756Z · score: 23 (11 votes) · EA · GW

This is a good post, I'm glad you wrote it :)

On the abstract level, I think I see EA as less grand / ambitious than you do (in practice, if not in theory) -- the biggest focus of the longtermist community is reducing x-risk, which is good by basically any ethical theory that people subscribe to (exceptions being negative utilitarianism and nihilism, but nihilism cares about nothing and very few people are negative utilitarian and most of those people seem to be EAs). So I see the longtermist section of EA more as the "interest group" in humanity that advocates for the future, as opposed to one that's going to determine what will and won't happen in the future. I agree that if we were going to determine the entire future of humanity, we would want to be way more diverse than we are now. But if we're more like an interest group, efficiency seems good.

On the concrete level -- you mention not being happy about these things:

EAs give high credence to non-expert investigations written by their peers

Agreed this happens and is bad

they rarely publish in peer-review journals and become increasingly dismissive of academia

Idk, academia doesn't care about the things we care about, and as a result it is hard to publish there. It seems like long-term we want to make a branch of academia that cares about what we care about, but before that it seems pretty bad to subject yourself to peer reviews that argue that your work is useless because they don't care about the future, and/or to rewrite your paper so that regular academics understand it whereas other EAs who actually care about it don't. (I think this is the situation of AI safety.)

show an increasingly certain and judgmental stance towards projects they deem ineffective

Agreed this happens and is bad (though you should get more certain as you get more evidence, so maybe I think it's less bad than you do)

defer to EA leaders as epistemic superiors without verifying the leaders' epistemic superiority

Agreed this happens and is bad

trust that secret google documents which are circulated between leaders contain the information that justifies EA’s priorities and talent allocation

Agreed this would be bad if it happened, I'm not actually sure that people trust this? I do hear comments like "maybe it was in one of those secret google docs" but I wouldn't really say that those people trust that process.

let central institutions recommend where to donate and follow advice to donate to central EA organisations

Kinda bad, but I think this is more a fact about "regular" EAs not wanting to think about where to donate? (Or maybe they have more trust in central institutions than they "should".)

let individuals move from a donating institution to a recipient institution and vice versa

Seems really hard to prevent this -- my understanding is it happens in all fields, because expertise is rare and in high demand. I agree that it's a bad thing, but it seems worse to ban it.

strategically channel EAs into the US government

I don't see why this is bad. I think it might be bad if other interest groups didn't do this, but they do. (Though I might just be totally wrong about that.)

adjust probability assessments of extreme events to include extreme predictions because they were predictions by other members

That seems somewhat bad but not obviously so? Like, it seems like you want to predict an average of people's opinions weighted by expertise; since EA cares a lot more about x-risk it often is the case that EAs are the experts on extreme events.

Comment by rohinmshah on AMA or discuss my 80K podcast episode: Ben Garfinkel, FHI researcher · 2020-07-22T20:07:37.341Z · score: 5 (3 votes) · EA · GW

My experience matches Ben's more than yours.

My impression is that there hasn't so much been a shift in views within individual people as the influx of a younger generation who tends to have an ML background and roughly speaking tends to agree more with Paul Christiano than MIRI. Some of them are now somewhat prominent themselves (e.g. Rohin Shah, Adam Gleave, you), and so the distribution of views among the set of perceived "AI risk thought leaders" has changed.

None of the people you named had an ML background. Adam and I have CS backgrounds (before we joined CHAI, I was a PhD student in programming languages, while Adam worked in distributed systems iirc). Ben is in international relations. If you were counting Paul, he did a CS theory PhD. I suspect all of us chose the "ML track" because we disagreed with MIRI's approach and thought that the "ML track" would be more impactful.

(I make a point out of this because I sometimes hear "well if you started out liking math then you join MIRI and if you started out liking ML you join CHAI / OpenAI / DeepMind and that explains the disagreement" and I think that's not true.)

I don't recall anyone seriously suggesting there might not be enough time to finish a PhD before AGI appears.

I've heard this (might be a Bay Area vs. Europe thing).

Comment by rohinmshah on Antitrust-Compliant AI Industry Self-Regulation · 2020-07-12T17:11:59.364Z · score: 3 (2 votes) · EA · GW

Planned summary for the Alignment Newsletter:

One way to reduce the risk of unsafe AI systems is to have agreements between corporations that promote risk reduction measures. However, such agreements may run afoul of antitrust laws. This paper suggests that this sort of self-regulation could be done under the “Rule of Reason”, in which a learned profession (such as “AI engineering”) may self-regulate in order to correct a market failure, as long as the effects of such a regulation promote rather than harm competition.

In the case of AI, if AI engineers self-regulate, this could be argued as correcting the information asymmetry between the AI engineers (who know about risks) and the users of the AI system (who don’t). In addition, since AI engineers arguably do not have a monetary incentive, the self-regulation need not be anticompetitive. Thus, this seems like a plausible method by which AI self-regulation could occur without running afoul of antitrust law, and so is worthy of more investigation.
Comment by rohinmshah on Some promising career ideas beyond 80,000 Hours' priority paths · 2020-06-28T20:10:31.938Z · score: 18 (7 votes) · EA · GW
I would have thought that it would sometimes be important for making safe and beneficial AI to be able to prove that systems actually exhibit certain properties when implemented.

We can decompose this into two parts:

1. Proving that the system that we design has certain properties

2. Proving that the system that we implement matches the design (and so has the same properties)

1 is usually done by math-style proofs, which are several orders of magnitude easier to do than direct formal verification of the system in a proof assistant without having first done the math-style proof.

2 is done by formal verification, where for complex enough systems the specification for the formal verification often comes from the output of a math proof.

I guess I think this first because bugs seem capable of being big deals in this context

I'm arguing that after you've done 1, even if there's a failure from not having done 2, it's very unlikely to cause x-risk via the usual mechanism of an AI system adversarially optimizing against humans. (Maybe it causes x-risk in that due to a bug the computer system says "call Russia" and that gets translated to "launch all the nukes", or something like that, but that's not specific to AI alignment, and I think it's pretty unlikely.)

Like, idk. I struggle to actually think of a bug in implementation that would lead to a powerful AI system optimizing against us, when without that bug it would have been fine. Even if you accidentally put a negative sign on a reward function, I expect that this would be caught long before the AI system was a threat.

I realize this isn't a super compelling response, but it's hard to argue against this because it's hard to prove a negative.

there could be some instances where it's more feasible to use proof assistants than math to prove that a system has a property.

Proof assistants are based on math. Any time a proof assistant proves something, it can produce a "transcript" that is a formal math proof of that thing.

Now you might hope that proof assistants can do things faster than humans, because they're automated. This isn't true -- usually the automation is things like "please just prove for me that 2*x is larger than x, I don't want to have to write the details myself", or "please fill out and prove the base case of this induction argument", where a standard math proof wouldn't even note the detail.
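To illustrate the flavor of that automation, here's a minimal sketch (assuming a recent Lean 4 where the `omega` linear-arithmetic tactic is available; I'm using the non-strict version of the inequality, which holds for all naturals):

```lean
-- The kind of step you ask the assistant to discharge rather than write out:
-- a trivial linear-arithmetic fact, closed by a single automation tactic.
example (x : Nat) : x ≤ 2 * x := by
  omega
```

A standard math proof wouldn't even remark on this step; in a proof assistant you still have to discharge it, and automation like this is what keeps that to one line.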

Sometimes a proof assistant can do better than humans, when the proof of a fact is small but deeply unintuitive, such that brute force search is actually better than finetuned human intuition. I know of one such case, which I'm failing to find a link for. But this is by far the exception, not the rule.

(There are some proofs, most famously the map-coloring theorem, where part of the proof was done by a special-purpose computer program searching over a space of possibilities. I'm not counting these, as this feels like mathematicians doing a math proof and finding a subpart that they delegated to a machine.)

EDIT: I should note that one use case that seems plausible to me is to use formal verification techniques to verify learned specifications, or specifications that change based on the weights of some neural net, but I'd be pretty surprised if this was done using proof assistants (as opposed to other techniques in formal verification).

Comment by rohinmshah on Some promising career ideas beyond 80,000 Hours' priority paths · 2020-06-26T16:51:16.983Z · score: 22 (8 votes) · EA · GW
For example, it might be possible to use proof assistants to help solve the AI ‘alignment problem’ by creating AI systems that we can prove have certain properties we think are required for the AI system to reliably do what we want it to do.

I don't think this is particularly impactful, primarily because I don't see a path by which it has an impact, and I haven't seen anyone make a good case for this particular path to impact.

(It's hard to argue a negative, but if I had to try, I'd point out that if we want proofs, we would probably do those via math, which works at a much higher level of abstraction and so takes much less work / effort; formal verification seems good for catching bugs in your implementations of ideas, which is not the core of the AI risk problem.)

However, it is plausibly still worthwhile becoming an expert on formal verification because of the potential applications to cybersecurity. (Though it seems like in that case you should just become an expert on cybersecurity.)

Comment by rohinmshah on Critical Review of 'The Precipice': A Reassessment of the Risks of AI and Pandemics · 2020-05-19T16:20:25.051Z · score: 5 (3 votes) · EA · GW
This suggests that we should schedule a time to talk in person, and/or an adversarial collaboration trying to write a version of the argument that you're thinking of.

Sounds good, I'll just clarify my position in this response, rather than arguing against your claims.

So then I guess your response is something like "But everyone forgetting to eat food is a crazy scenario, whereas the naive extrapolation of the thing we're currently doing is the default scenario".

It's more like "there isn't any intellectual work to be done / field building to do / actors to coordinate to get everyone to eat".

Whereas in the AI case, I don't know how we're going to fix the problem I outlined; and as far as I can tell nor does anyone else in the AI community, and therefore there is intellectual work to be done.

We are already at significantly-better-than-human optimisation

Sorry, by optimization there I meant something more like "intelligence". I don't really care whether it comes from better SGD, some hardcoded planning algorithm, or a mesa optimizer; the question is whether it is significantly more capable than humans at pursuing goals.

I thought our opinions were much more similar.

I think our predictions of how the world will go concretely are similar; but I'd guess that I'm happier with abstract arguments that depend on fuzzy intuitive concepts than you are, and find them more compelling than more concrete ones that depend on a lot of specific details.

Comment by rohinmshah on Critical Review of 'The Precipice': A Reassessment of the Risks of AI and Pandemics · 2020-05-18T18:36:10.192Z · score: 9 (6 votes) · EA · GW
I agree that perfect optimisers are pathological. But we are not going to train anything that is within light-years of perfect optimisation. Perfect optimisation is a totally different type of thing to what we're doing.

If you replace "perfect optimization" with "significantly-better-than-human optimization" in all of my claims, I'd continue to agree with them.

This argument feels to me like saying "We shouldn't keep building bigger and bigger bombs because in the limit of size they'll form a black hole and destroy the Earth."

If somehow I knew that this fact were true, but I didn't know at what size the bombs form a black hole and destroy us all, I would in fact see this as a valid and motivating argument for not building bigger bombs, and for trying to figure out how to build bombs that don't destroy the Earth (or coordinate to not build them at all).

Firstly because it feels reminiscent of the utility-maximisation arguments made by Yudkowsky - in both cases the arguments are based on theoretical claims which are literally true but in practice irrelevant or vacuous.

I strongly disagree with this.

The utility-maximization argument that I disagree with is something like:

"AI is superintelligent" implies "AI is EU-maximizing" implies "AI has convergent instrumental subgoals".

This claim is not true even theoretically. It's not a question of what's happening in practice.

There is a separate argument which goes

"Superintelligent AI is built by humans" implies "AI is goal-directed" implies "AI has convergent instrumental subgoals"

And I place non-trivial weight on this claim, even though it is a conceptual, fuzzy claim that we're not sure yet will be relevant in practice, and one of the implications doesn't apply in the case where the AI is pursuing some "meta" goal that refers to the human's goals.

(You might disagree with this analysis as well, but I'd guess you'd be in the minority amongst AI safety researchers.)

The argument I gave is much more like the second kind -- a conceptual claim that depends on fuzzy categories like "certain specifications".

Secondly [...]

Sorry, I don't understand your point here. It sounds like "the last time we made an argument, we were wrong, therefore we shouldn't make more arguments", but that can't be what you're saying.

Maybe your point is that ML researchers are more competent than we give them credit for, and so we should lower our probability of x-risk? If so, I mostly just want to ignore this; I'm really not making a probabilistic argument. I'm making an argument "from the perspective of humanity / the full AI community".

I think spreading the argument "if we don't do X, then we are in trouble because of problem Y" seems better than spreading something like "there is a p% of having problem Y, where I've taken into account the fact that people will try to solve Y, and that won't be sufficient because of Z; therefore we need to put more effort into X". The former is easier to understand and more likely to be true / correctly reasoned.

(I would also defend "the chance is not so low that EAs should ignore it", but that's a separate conversation, and seems not very relevant to what arguments we should spread amongst the AI community.)

Thirdly, because I am epistemically paranoid about giving arguments which aren't actually the main reason to believe in a thing. [...] I suspect that the same is not really the case for you and the argument you give.

It totally is. I have basically two main concerns with AI alignment:

  • We're aiming for the wrong thing (outer alignment)
  • Even if we aim for the right thing, we might generalize poorly (inner alignment)

If you told me that inner alignment was magically not a problem -- we always generalize in the way that the reward function would have incentivized -- I would still be worried; though it would make a significant dent in my AI risk estimate.

If you told me that outer alignment was magically not a problem (we're actually aiming for the right thing), that would make a smaller but still significant dent in my estimate of AI risk. It's only smaller because I expect the work to solve this problem to be done by default, whereas I feel less confident about that for inner alignment.

it doesn't establish that AI safety work needs to be done by someone, it just establishes that AI researchers have to avoid naively extrapolating their current work.

Why is "not naively extrapolating their current work" not an example of AI safety work? Like, presumably they need to extrapolate in some as-yet-unknown way, figuring out that way sounds like a central example of AI safety work.

It seems analogous to "biologists just have to not publish infohazards, therefore there's no need to work on the malicious use category of biorisk".

Secondly because the argument is also true for image classifiers, since under perfect optimisation they could hack their loss functions. So insofar as we're not worried about them, then the actual work is being done by some other argument.

I'm not worried about them because there are riskier systems that will be built first, and because there isn't much economic value in having strongly superintelligent image classifiers. If we really tried to build strongly superintelligent image classifiers, I would be somewhat worried (though less so, since the restricted action space provides some safety).

(You might also think that image classifiers are safe because they are myopic, but in this world I'm imagining that we make non-myopic image classifiers, because they will be better at classifying images than myopic ones.)

Thirdly because I do think that counterfactual impact is the important bit, not "AI safety work needs to be done by someone."

I do think that there is counterfactual impact in expectation. I don't know why you think there isn't counterfactual impact. So far it sounds to me like "we should give the benefit of the doubt to ML researchers and assume they'll solve outer alignment", which sounds like a claim about norms, not a claim about the world.

I think the better argument against counterfactual impact is "there will be a strong economic incentive to solve these problems" (see e.g. here), and that might reduce it by an order of magnitude, but that still leaves a lot of possible impact. But also, I think this argument applies to inner alignment as well (though less strongly).

Comment by rohinmshah on Critical Review of 'The Precipice': A Reassessment of the Risks of AI and Pandemics · 2020-05-16T16:30:15.686Z · score: 10 (6 votes) · EA · GW
What is a "certain specification"?

I agree this is a fuzzy concept, in the same way that "human" is a fuzzy concept.

Is training an AI to follow instructions, giving it strong negative rewards every time it misinterprets us, then telling it to do X, a "certain specification" of X?

No, the specification there is to follow instructions. I am optimistic about these sorts of "meta" specifications; CIRL / assistance games can also be thought of as a "meta" specification to assist the human. But like, afaict this sort of idea has only recently become common in the AI community; I would guess partly because of people pointing out problems with the regular method of writing down specifications.

Broadly speaking, think of certain specifications as things that you plug in to hardcoded optimization algorithms (not learned ones which can have "common sense" and interpret you correctly).
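As a minimal sketch of what I mean (a toy example of my own, not anything from the thread): the specification below is a hand-written objective handed directly to a hardcoded search procedure, with no step at which the system interprets what we meant.

```python
# Toy illustration of a "certain specification": the objective is taken
# literally by the planner; anything we failed to write down simply doesn't count.

def reward(state: dict) -> float:
    # Hand-coded objective: number of widgets produced in this state.
    return state["widgets_produced"]

def plan(candidate_plans, simulate):
    # Hardcoded optimization: score each plan by summing the literal reward
    # over the states it visits, and pick the best one -- side effects the
    # designer didn't encode in `reward` are invisible to this search.
    def score(p):
        return sum(reward(s) for s in simulate(p))
    return max(candidate_plans, key=score)
```

A "meta" specification like instruction-following would instead make the objective depend on a model of what the human asked for, which is the distinction I'm pointing at.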

I just don't think this concept makes sense in modern ML, because it's the optimiser, not the AI, that is given the specification.

If you use a perfect optimizer and train in the real world with what you would intuitively call a "certain specification", an existential catastrophe almost certainly happens. Given agreement on this fact, I'm just saying that I want a better argument for safety than "it's fine because we have a less-than-perfect optimizer", which as far as I can tell is ~the argument we have right now, especially since in the future we will presumably have better optimizers (where more compute during training is a type of better optimization).

More constructively, I just put this post online. It's far from comprehensive, but it points at what I'm concerned about more specifically than anything else.

I also find that the most plausible route by which you actually get to extinction, but it's way more speculative (to me) than the arguments I'm using above.

So this observed fact doesn't help us distinguish between "everyone in AI thinks that making AIs which intend to do what we want is an integral part of their mission, but that the 'intend' bit will be easy" vs "everyone in AI is just trying to build machines that can achieve hardcoded literal objectives even if it's very difficult to hardcode what we actually want".

??? I agree that you can't literally rule the first position out, but I've talked to many people in AI, and the closest people get to this position is saying "well maybe the 'intend' bit will be easy"; I haven't seen anyone argue for it.

I feel like you're equivocating between what AI researchers want (obviously they don't want extinction) and what they actually do (things that, if extrapolated naively, would lead to extinction).

I agree that they will start (and have started) working on the 'intend' bit once it's important, but to my mind that means at that point they will have started working on the category of work that we call "AI safety". This is consistent with my statement above:

Therefore, if we want superintelligent AI systems that don't have these problems, we need to change how AI is done.

(We in that statement was meant to refer to humanity as a whole.)

And without distinguishing them, then the "stated goal of AI" has no predictive power (if it even exists).

I specifically said this was not a prediction for this reason:

This doesn't tell you the probability with which superintelligent AI has convergent instrumental subgoals, since maybe we were always going to change how AI is done

Nonetheless, it still establishes "AI safety work needs to be done by someone", which seems like the important bit.

Perhaps you think that to motivate work by EAs on AI safety, you need to robustly demonstrate that a) there is a problem AND b) the problem won't be solved by default. I think this standard eliminates basically all x-risk prevention efforts, because you can always say "but if it's so important, someone else will probably solve it" (a thing that I think is approximately true).

(I don't think this is actually your position though, because the same critique could be applied to your new post.)

Comment by rohinmshah on Critical Review of 'The Precipice': A Reassessment of the Risks of AI and Pandemics · 2020-05-14T22:45:22.677Z · score: 9 (5 votes) · EA · GW

We have discussed this, so I'll just give brief responses so that others know what my position is. (My response to you is mostly in the last section; the others are primarily explanation for other readers.)

Convergent instrumental subgoals aren't the problem. Large-scale misaligned goals (instrumental or not) are the problem.

I'm not entirely sure what you mean by "large-scale", but misaligned goals simply argue for "the agent doesn't do what you want". To get to "the agent kills everyone", you need to bring in convergent instrumental subgoals.

Once you describe in more detail what it actually means for an AI system to "have some specification", the "certain" bit also stops seeming like a problem.

The model of "there is a POMDP, it has a reward function, the specification is to maximize expected reward" is fully formal and precise (once you spell out the MDP and reward), and the optimal solution usually involves convergent instrumental subgoals.
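
To spell out the formal objective in standard notation (just an illustration of the kind of specification I mean, not any particular system):

$$\pi^* \in \arg\max_{\pi} \; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^t \, R(s_t, a_t)\right]$$

where the reward function $R$ and discount $\gamma$ are fixed up front; the claim above is that the optimal $\pi^*$ usually involves convergent instrumental subgoals.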

Whether or not a predefined specification gives rise to those sorts of goals depends on the AI architecture and training process in a complicated way.

I'm assuming you agree with:

1. The stated goal of AI research would very likely lead to human extinction

I agree that it is unclear whether AI systems actually get anywhere close to optimal for the tasks we train them for. However, if you think that we will get AGI and be fine, but we'll continue to give certain specifications of what we want, it seems like you also have to believe:

2. We will build AGI without changing the stated goal of AI research

3. AI research will not achieve its stated goal

The combination of 2 + 3 seems like a strange set of beliefs to have. (Not impossible, but unlikely.)

Comment by rohinmshah on Critical Review of 'The Precipice': A Reassessment of the Risks of AI and Pandemics · 2020-05-11T15:58:59.940Z · score: 65 (35 votes) · EA · GW
Though far from perfect, I believe this process is far more transparent than the estimates provided by Ord, for which no explanation is offered as to how they were derived. This means that it is effectively impossible to subject them to critical scrutiny.

I want to note that I agree with this, and I think it's good for people to write down their explicit reasoning.

That said, I disagree pretty strongly with the section on AI.

More generally, it is unclear why we should even expect AI researchers to have any particular knowledge about the future trajectories of AI capabilities. Such researchers study and develop particular statistical and computational techniques to solve specific types of problems. I am not aware of any focus of their training on extrapolating technological trends, or on investigating historical case studies of technological change.

I don't see why people keep saying this. Given the inconsistent expert responses to surveys, I think it makes sense to say that AI researchers probably aren't great at predicting future trajectories of AI capabilities. Nonetheless, if I had no inside-view knowledge and I wanted to get a guess at AI timelines, I'd ask experts in AI. (Even now, after people have spent a significant amount of time thinking about AI timelines, I would not ask experts in trend extrapolation; I seriously doubt that they would know which trends to extrapolate without talking to AI researchers.)

I suppose you could defend a position of the form "we can't know AI timelines", but it seems ridiculous to say "we can't know AI timelines, therefore AGI risk is low".

However such current methods, in particular deep learning, are known to be subject to a wide range of limitations. [...] at present they represent deep theoretical limitations of current methods

I disagree. So do many of the researchers at OpenAI and DeepMind, who are explicitly trying to build AGI using deep learning, reinforcement learning, and similar techniques. Meanwhile, academics tend to agree with the quoted claim. I think from an outside view this should be maybe a 2x hit to the probability of developing AGI soon (if you start from the OpenAI / DeepMind position).

Atari games are highly simplified environments with comparatively few degrees of freedom, the number of possible actions is highly limited, and where a clear measure of success (score) is available. Real-world environments are extremely complicated, with a vast number of possible actions, and often no clear measure of success. Uncertainty also plays little direct role in Atari games, since a complete picture of the current gamespace is available to the agent. In the real world, all information gained from the environment is subject to error, and must be carefully integrated to provide an approximate model of the environment.

All of these except for the "clear measure of success" have already been surmounted (see OpenAI Five or AlphaStar for example). I'd bet that we'll see AI systems based on deep imitation learning and related techniques that work well in domains without a clear measure of success within the next 5 years. There definitely are several obstacles to general AI systems, but these aren't the obstacles.

These skills may be regarded as a subset of a very broad notion of intelligence, but do not seem to correspond very closely at all to the way we normally use the word ‘intelligence’, nor do they seem likely to be the sorts of things AIs would be very good at doing.

... Why wouldn't AIs be good at doing these things? It seems like your main point is that AI will lack a physical body and so will be bad at social interactions, but I don't see why an AI couldn't have social interactions from a laptop screen (just like the rest of us in the era of COVID-19).

More broadly, if you object to the implication "superintelligence implies ability to dominate the world", then just take whatever mental property P you think does allow an agent to dominate the world; I suspect both Toby and I would agree with "there is a non-trivial chance that future AI systems will be superhuman at P and so would be able to dominate the world".

While this seems plausible in the case of a reinforcement learning agent, it seems far less clear that it would apply to another form of AI. In particular, it is not even clear if humans actually possess anything that corresponds to a ‘reward function’, nor is it clear that such a thing is immutable with experience or over the lifespan. To assume that an AI would have such a thing therefore is to make specific assumptions about the form such an AI would take.

I agree with this critique of Toby's argument; I personally prefer the argument given in Human Compatible, which roughly goes:

  • Almost every AI system we've created so far (not just deep RL systems) has some predefined, hardcoded, certain specification that the AI is trying to optimize for.
  • A superintelligent agent pursuing a known specification has convergent instrumental subgoals (the thing that Toby is worried about).
  • Therefore, if we want superintelligent AI systems that don't have these problems, we need to change how AI is done.

This doesn't tell you the probability with which superintelligent AI has convergent instrumental subgoals, since maybe we were always going to change how AI is done, but it does show why you might expect the "default assumption" to be an AI system that has convergent instrumental subgoals, instead of one that is more satisficing like humans are.

the fate of a humanity dominated by an AI would be in the hands of that AI (or collective of AIs that share control)

This seems true to me, but if an AI system was so misaligned as to subjugate humans, I don't see why you should be hopeful that future changes in its motivations lead to it not subjugating humans. It's possible, but seems very unlikely (< 1%).

I regard 1) as roughly as likely as not

Isn't this exactly the same as Toby's estimate? (I actually don't know, I have a vague sense that this is true and was stated in The Precipice.)

Probability of unaligned artificial intelligence

Here are my own estimates for your causal pathway:

1: 0.8

2 conditioned on 1: 0.05 (I expect that there will be an ecosystem of AI systems, not a single AI system that can achieve a decisive strategic advantage)

3 conditioned on 1+2: 0.3 (If there is a single AI system that has a DSA, probably it took us by surprise, so it seems less likely that we solved the problem in that world)

4 conditioned on 1+2+3: 0.99

Which gives in total ~0.012, or about 1%.
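
(Spelling out the multiplication, just the product of the four estimates above:)

$$0.8 \times 0.05 \times 0.3 \times 0.99 \approx 0.012$$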

But really, the causal pathway I would want involves a change to 2 and 3:

2+3: Some large fraction of the AI systems in the world have reason / motivation to usurp power, and by coordinating they are able to do it.

Then:

1: 0.8

2+3 conditioned on 1: 0.1 (with ~10% on "has the motivation to usurp power" and ~95% on "can usurp power")

4: 0.99

Which comes out to ~0.08, or 8%.
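
(Again, just the product of the estimates above:)

$$0.8 \times 0.1 \times 0.99 \approx 0.079$$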

Comment by rohinmshah on How do you talk about AI safety? · 2020-04-19T23:21:15.687Z · score: 3 (3 votes) · EA · GW
While I haven't read the book, Slate Star Codex has a great review on Human Compatible. Scott says it speaks of AI safety, especially in the long-term future, in a very professional sounding, and not weird way. So I suggest reading that book, or that review.

Was going to recommend this as well (and I have read the book).

Comment by rohinmshah on COVID-19 response as XRisk intervention · 2020-04-18T16:21:02.390Z · score: 8 (5 votes) · EA · GW
Note: part of what impressed Scott here was being early to raise the alarm, and that boat has already sailed, so it could be that future COVID-19 work won't do much to impress people like him.

I think that's crucial -- I'm generally supportive of EAs / rationalists doing things like COVID-19 work when they have a comparative advantage at doing so, which is a factor in why I support forecasting / meta work even now, and I'd certainly want biosecurity people to at least be thinking about how they could help with COVID-19 (as they in fact are). But the OP isn't arguing that, and whether or not it was intended, I could see readers thinking that they should be actively trying to work on COVID even if they don't have an obvious comparative advantage at it, and that seems wrong to me.

This point about comparative advantage is also why I wrote:

I'd probably change my mind if I thought that these other longtermists could actually make a large impact on the COVID-19 response, but that seems quite unlikely to me.
Comment by rohinmshah on COVID-19 response as XRisk intervention · 2020-04-11T17:28:07.210Z · score: 24 (14 votes) · EA · GW

I can't tell whether you're arguing "some small subset of EAs/rationalists are in a great position to fight COVID-19 and they should do so" vs. "if an arbitrary EA/rationalist wants to fight COVID-19, they shouldn't worry that they are doing less because they aren't reducing x-risk" vs. "COVID-19 is such an opportunity for x-risk reduction that nearly all longtermists should be focusing on it now".

I agree with the first (in particular for people who work on forecasting / "meta" stuff), but not with the latter two. To the extent you're arguing for the latter two, I don't find the arguments very convincing, because they aren't comparing against counterfactuals. Taking each point in turn:

Training Ourselves

I agree that COVID-19 is particularly good for training the general bucket of forecasting / applied epistemology / scenario-planning.

However, for coordination, persuasive argumentation, networking, and project management, I don't see why COVID-19 is particularly better than other projects you could be working on. For example, I think I practiced all of those skills by organizing a local EA group; it also seems like ~any project that involves advocacy would likely require / train all of these skills.

Forging alliances

Presumably for most goals there are more direct ways to forge alliances than by working on COVID-19. E.g. you mentioned AI safety -- if I wanted to forge alliances with people at OSTP, I'd focus on current AI issues like interpretability and fairness.

Establishing credibility

I agree that this is important for the more "meta" parts of x-risk, such as forecasting. But for those of us who are working closer to the object level (e.g. technical AI safety, nuclear war, climate change), I don't really see how this is going to help establish credibility that's used in the future.

Growing the global risk movement

You talk about field-building here, which in fact seems like an important thing to be doing, but seems basically unrelated to the COVID-19 response. I'd guess that field-building has ~zero effect on how many people die from COVID-19 this year.

Creating XRisk infrastructure

Agreed that this is good.

Overall take: It does seem like anyone working on "meta" approaches to x-risk reduction probably should be thinking very seriously about how they can contribute to the COVID-19 response, but I'd guess that for most other longtermists the argument "it is just a distraction" is basically right.

I'd probably change my mind if I thought that these other longtermists could actually make a large impact on the COVID-19 response, but that seems quite unlikely to me.

Comment by rohinmshah on The case for building more and better epistemic institutions in the effective altruism community · 2020-03-30T06:44:27.632Z · score: 51 (22 votes) · EA · GW

I really like the general class of improving community epistemics :)

That being said, I feel pretty pessimistic about having dedicated "community builders" come in to create good institutions that would then improve the epistemics of the field: in my experience, most such attempts fail, because they don't actually solve a problem in a way that works for the people in the field (and in addition, they "poison the well", in that they make it harder for someone else to build an actually-functioning version of the solution, because everyone in the field now expects it to fail and so doesn't buy in to it).

I feel much better about people within the field figuring out ways to improve the epistemics of the community they're in, trialing them out themselves, and if they seem to work well only then attempting to formalize them into an institution.

Take me as an example. I've done a lot of work that could be characterized as "trying to improve the epistemics of a community", such as:

The first five couldn't have been done by a person without the relevant expertise (in AI alignment for the first four, and in EA group organizing for the fifth). If they were trying to build institutions that would lead to any of these six things happening, I think they might have succeeded, but it probably would have taken multiple years, as opposed to it taking ~a month each for me. (Here I'm assuming that an institution is "built" once it operates through the effort of people within the field, with no or very little ongoing effort from the person who started the institution.) It's just quite hard to build institutions for a field without significant buy-in from people in the field, and creating that buy-in is hard.

I think people who find the general approach in this post interesting should probably be becoming very knowledgeable about a particular field (both the technical contents of the field, as well as the landscape of people who work on it), and then trying to improve the field from within.

It's also of course fine to think of ideas for better institutions and pitch them to people in the field; what I want to avoid is coming up with a clever idea and then trying to cause it to exist without already having a lot of buy in from people in the field.

Comment by rohinmshah on What are the challenges and problems with programming law-breaking constraints into AGI? · 2020-02-28T16:33:45.293Z · score: 4 (3 votes) · EA · GW

Yeah, I certainly feel better about learning law relative to learning the One True Set of Human Values That Shall Then Be Optimized Forevermore.

Comment by rohinmshah on What are the challenges and problems with programming law-breaking constraints into AGI? · 2020-02-28T16:31:52.554Z · score: 2 (2 votes) · EA · GW
I want (and I suspect you also want) AI systems to have such incentivization.

Not obviously. My point is just that if the AI is aligned with a human principal, and that human principal can be held accountable for the AI's actions, then that automatically disincentivizes AI systems from breaking the law.

(I'm not particularly opposed to AI systems being disincentivized directly, e.g. by making it possible to hold AI systems accountable for their actions. It just doesn't seem necessary in the world where we've solved alignment.)

I don't see why (from a societal perspective) we shouldn't just do that on the actor's side and not the "police's" side.

I agree that doing it on the actor's side is better if you can ensure it for all actors, but you have to also prevent the human principal from getting a different actor that isn't bound by law.

E.g. if you have a chauffeur who refuses to exceed the speed limit (in a country where the speed limit that's actually enforced is 10mph higher), you fire that chauffeur and find a different one.

(Also, I'm assuming you're teaching the agent to follow the law via something like case 2 above, where you have it read the law and understand it using its existing abilities, and then train it somehow to not break the law. If you were instead thinking something like case 1, I'd make the second argument that it isn't likely to work.)

Comment by rohinmshah on What are the challenges and problems with programming law-breaking constraints into AGI? · 2020-02-26T16:48:11.342Z · score: 2 (2 votes) · EA · GW
Couldn't the AI end up misaligned with the owners by accident, even if they're aligned with the rest of humanity?

Yes, but as I said earlier, I'm assuming the alignment problem has already been solved when talking about enforcement. I am not proposing enforcement as a solution to alignment.

If you haven't solved the alignment problem, enforcement doesn't help much, because you can't rely on your AI-enabled police to help catch the AI-enabled criminals, because the police AI itself may not be aligned with the police.

The question is whether 1 or 2 is better at aligning the AI in cases where enforcement is impossible or explicitly prevented.

Case 2 is assuming that you already have an intelligent agent with motivations, and then trying to deal with that after the fact. I agree this is not going to work for alignment. If for some reason I could only do 1 or 2 for alignment, I would try 1. (But there are in fact a bunch of other things that you can do.)

Comment by rohinmshah on My personal cruxes for working on AI safety · 2020-02-26T16:33:16.230Z · score: 2 (2 votes) · EA · GW

I broadly agree with this, but I feel like this is mostly skepticism of crux 3 and not crux 2. I think to switch my position on crux 2 using only timeline arguments, you'd have to argue something like <10% chance of transformative AI in 50 years.

Comment by rohinmshah on My personal cruxes for working on AI safety · 2020-02-25T17:44:30.324Z · score: 1 (1 votes) · EA · GW

My interpretation was that the crux was

We can do good by thinking ahead

One thing this leaves implicit is the counterfactual: in particular, I thought the point of the "Problems solve themselves" section was that if problems would be solved by default, then you can't do good by thinking ahead. I wanted to make that clearer, which led to

we both **can** and **need to** think ahead in order to solve [the alignment problem].

Where "can" talks about feasibility, and "need to" talks about the counterfactual.

I can remove the "and **need to**" if you think this is wrong.

Comment by rohinmshah on What are the challenges and problems with programming law-breaking constraints into AGI? · 2020-02-25T16:22:44.114Z · score: 1 (1 votes) · EA · GW
What if they also have access to nukes or other weapons that could prevent them or their owners from being held accountable if they're used?

I'm going to interpret this as:

  • Assume that the owners are misaligned w.r.t the rest of humanity (controversial, to me at least).
  • Assume that enforcement is impossible.

Under these assumptions, I feel better about 1 than 2, in the sense that case 1 feels like a ~5% chance of success while case 2 feels like a ~0% chance of success. (Numbers made up of course.)

But this seems like a pretty low-probability way the world could be (I would bet against both assumptions), and the increase in EV from work on it seems pretty low (since you only get 5% chance of success), so it doesn't seem like a strong argument to focus on case 1.

Comment by rohinmshah on What are the challenges and problems with programming law-breaking constraints into AGI? · 2020-02-25T14:39:44.157Z · score: 2 (2 votes) · EA · GW
Part of the reason that enforcement works, though, is that human agents have an independent incentive not to break the law (or, e.g., report legal violations) since they are legally accountable for their actions.

Certainly you still need legal accountability -- why wouldn't we have that? If we solve alignment, then we can just have the AI's owner be accountable for any law-breaking actions the AI takes.

This seems to require the same type of fundamental ML research that I am proposing: mapping AI actions onto laws.

Imagine trying to make teenagers law-abiding. You could have two strategies:

1. Rewire the neurons or learning algorithm in their brain such that you can say "the computation done to produce the output of neuron X reliably tracks whether a law has been violated, and because of its connection via neuron Y to neuron Z, if an action is predicted to violate a law, the teenager won't take it".

2. Explain to them what the laws are (relying on their existing ability to understand English, albeit fuzzily), and give them incentives to follow them.

I feel much better about 2 than 1.

When you say "programming AI to follow law" I imagine case 1 above (but for AI systems instead of humans). Certainly the OP seemed to be arguing for this case. This is the thing I think is extremely difficult.

I am much happier about AI systems learning about the law via case 2 above, which would enable the AI police applications I mentioned above.

However, some ML people I have talked about this with have given positive feedback, so I think you might be overestimating the difficulty.

I suspect they are thinking about case 2 above? Or they might be thinking of self-driving car type applications where you have an in-code representation of the world? Idk, I feel confident enough of this that I'd predict that there is a miscommunication somewhere, rather than an actual strong difference of opinion between me and them.

Comment by rohinmshah on What are the challenges and problems with programming law-breaking constraints into AGI? · 2020-02-25T14:21:56.856Z · score: 1 (1 votes) · EA · GW
My intuition is that more formal systems will be easier for AI to understand earlier in the "evolution" of SOTA AI intelligence than less-formal systems.

I agree for fully formal systems (e.g. solving SAT problems), but don't agree for "more formal" systems like law.

Mostly I'm thinking that understanding law would require you to understand language, but once you've understood language you also understand "what humans want". You could imagine a world in which AI systems understand the literal meaning of language but don't grasp the figurative / pedagogic / Gricean aspects of language, and in that world I think AI systems will understand law earlier than normal English, but that doesn't seem to be the world we live in:

  • GPT-2 and other language models don't seem particularly literal.
  • We have way more training data about natural language as it is normally used (most of the Internet), relative to natural language meant to be interpreted mostly literally.
  • Humans find it easier / more "native" to interpret language in the figurative / pedagogic way than to interpret it in the literal way.

My point was that I think that making a law-following AI that can follow (A) all enumerated laws is not much harder than one that can be made to follow (B) any given law.

Makes sense, that seems true to me.

Comment by rohinmshah on My personal cruxes for working on AI safety · 2020-02-25T14:11:14.892Z · score: 4 (3 votes) · EA · GW

Planned summary for the Alignment Newsletter:

This post describes how Buck's cause prioritization within an effective altruism framework leads him to work on AI risk. The case can be broken down into a conjunction of five cruxes. Specifically, the story for impact is that 1) AGI would be a big deal if it were created, 2) has a decent chance of being created soon, before any other "big deal" technology is created, and 3) poses an alignment problem that we both **can** and **need to** think ahead in order to solve. His research 4) would be put into practice if it solved the problem and 5) makes progress on solving the problem.

Planned opinion:

I enjoyed this post, and recommend reading it in full if you are interested in AI risk because of effective altruism. (I've kept the summary relatively short because not all of my readers care about effective altruism.) My personal cruxes and story of impact are actually fairly different: in particular, while this post sees the impact of research as coming from solving the technical alignment problem, I care about other sources of impact as well. See this comment for details.
Comment by rohinmshah on My personal cruxes for working on AI safety · 2020-02-25T14:09:41.701Z · score: 11 (9 votes) · EA · GW

I enjoyed this post, it was good to see this all laid out in a single essay, rather than floating around as a bunch of separate ideas.

That said, my personal cruxes and story of impact are actually fairly different: in particular, while this post sees the impact of research as coming from solving the technical alignment problem, I care about other sources of impact as well, including:

1. Field building: Research done now can help train people who will be able to analyze problems and find solutions in the future, when we have more evidence about what powerful AI systems will look like.

2. Credibility building: It does you no good to know how to align AI systems if the people who build AI systems don't use your solutions. Research done now helps establish the AI safety field as the people to talk to in order to keep advanced AI systems safe.

3. Influencing AI strategy: This is a catch all category meant to include the ways that technical research influences the probability that we deploy unsafe AI systems in the future. For example, if technical research provides more clarity on exactly which systems are risky and which ones are fine, it becomes less likely that people build the risky systems (nobody _wants_ an unsafe AI system), even though this research doesn't solve the alignment problem.

As a result, cruxes 3-5 in this post would not actually be cruxes for me (though 1 and 2 would be).

Comment by rohinmshah on What are the best arguments that AGI is on the horizon? · 2020-02-20T16:58:13.754Z · score: 4 (3 votes) · EA · GW

Just wanted to note that while I am quoted as being optimistic, I am still working on it specifically to cover the x-risk case and not the value lock-in case. (But certainly some people are working on the value lock-in case.)

(Also I think several people would disagree that I am optimistic, and would instead think I'm too pessimistic, e.g. I get the sense that I would be on the pessimistic side at FHI.)

Comment by rohinmshah on What are the challenges and problems with programming law-breaking constraints into AGI? · 2020-02-16T18:29:45.948Z · score: 3 (3 votes) · EA · GW

Cullen's argument was "alignment may not be enough, even if you solve alignment you might still want to program your AI to follow the law because <reasons>." So in my responses I've been assuming that we have solved alignment; I'm arguing that after solving alignment, AI-powered enforcement will probably be enough to handle the problems Cullen is talking about. Some quotes from Cullen's comment (emphasis mine):

Reasons other than directly getting value alignment from law that you might want to program AI to follow the law

We will presumably want organizations with AI to be bound by law.

We don't want to rely on the incentives of human principals to ensure their agents advance their goals in purely legal ways

Some responses to your comments:

if we want to automate "detect bad behavior", wouldn't that require AI alignment, too?

Yes, I'm assuming we've solved alignment here.

Isn't most of this after a crime has already been committed?

Good enforcement is also a deterrent against crime (someone without any qualms about murder will still usually not murder because of the harsh penalties and chance of being caught).

Furthermore, AIs may be able to learn new ways of hiding things from the police, so there could be gaps where the police are trying to catch up.

Remember that the police are also AI-enabled, and can find new ways of detecting things. Even so, this is possible; but it's also possible today without AI: criminals presumably constantly find new ways of hiding things from the police.