Posts

[AN #80]: Why AI risk might be solved without additional intervention from longtermists 2020-01-03T07:52:24.981Z · score: 58 (25 votes)
Summary of Stuart Russell's new book, "Human Compatible" 2019-10-19T19:56:52.174Z · score: 29 (13 votes)
Alignment Newsletter One Year Retrospective 2019-04-10T07:00:34.021Z · score: 61 (23 votes)
Thoughts on the "Meta Trap" 2016-12-20T21:36:39.498Z · score: 8 (12 votes)
EA Berkeley Spring 2016 Retrospective 2016-09-11T06:37:02.183Z · score: 6 (6 votes)
EAGxBerkeley 2016 Retrospective 2016-09-11T06:27:16.316Z · score: 17 (7 votes)

Comments

Comment by rohinmshah on The case for building more and better epistemic institutions in the effective altruism community · 2020-03-30T06:44:27.632Z · score: 5 (4 votes) · EA · GW

I really like the general class of improving community epistemics :)

That being said, I feel pretty pessimistic about having dedicated "community builders" come in to create good institutions that would then improve the epistemics of the field. In my experience, most such attempts fail because they don't actually solve a problem in a way that works for the people in the field. In addition, they "poison the well": a failed attempt makes it harder for someone else to build an actually-functioning version of the solution, because everyone in the field now expects it to fail and so doesn't buy into it.

I feel much better about people within the field figuring out ways to improve the epistemics of the community they're in, trialing them out themselves, and if they seem to work well only then attempting to formalize them into an institution.

Take me as an example. I've done a lot of work that could be characterized as "trying to improve the epistemics of a community", such as:

The first five couldn't have been done by a person without the relevant expertise (in AI alignment for the first four, and in EA group organizing for the fifth). If someone without that expertise were trying to build institutions that would lead to any of these six things happening, I think they might have succeeded, but it probably would have taken multiple years, as opposed to ~a month each for me. (Here I'm assuming that an institution is "built" once it operates through the effort of people within the field, with no or very little ongoing effort from the person who started the institution.) It's just quite hard to build institutions for a field without significant buy-in from people in the field, and creating that buy-in is hard.

I think people who find the general approach in this post interesting should probably be becoming very knowledgeable about a particular field (both the technical contents of the field, as well as the landscape of people who work on it), and then trying to improve the field from within.

It's also of course fine to think of ideas for better institutions and pitch them to people in the field; what I want to avoid is coming up with a clever idea and then trying to cause it to exist without already having a lot of buy in from people in the field.

Comment by rohinmshah on What are the challenges and problems with programming law-breaking constraints into AGI? · 2020-02-28T16:33:45.293Z · score: 4 (3 votes) · EA · GW

Yeah, I certainly feel better about learning law relative to learning the One True Set of Human Values That Shall Then Be Optimized Forevermore.

Comment by rohinmshah on What are the challenges and problems with programming law-breaking constraints into AGI? · 2020-02-28T16:31:52.554Z · score: 2 (2 votes) · EA · GW
I want (and I suspect you also want) AI systems to have such incentivization.

Not obviously. My point is just that if the AI is aligned with a human principal, and that human principal can be held accountable for the AI's actions, then that automatically disincentivizes AI systems from breaking the law.

(I'm not particularly opposed to AI systems being disincentivized directly, e.g. by making it possible to hold AI systems accountable for their actions. It just doesn't seem necessary in the world where we've solved alignment.)

I don't see why (from a societal perspective) we shouldn't just do that on the actor's side and not the "police's" side.

I agree that doing it on the actor's side is better if you can ensure it for all actors, but you have to also prevent the human principal from getting a different actor that isn't bound by law.

E.g. if you have a chauffeur who refuses to exceed the speed limit (in a country where the speed limit that's actually enforced is 10mph higher), you fire that chauffeur and find a different one.

(Also, I'm assuming you're teaching the agent to follow the law via something like case 2 above, where you have it read the law and understand it using its existing abilities, and then train it somehow to not break the law. If you were instead thinking something like case 1, I'd make the second argument that it isn't likely to work.)

Comment by rohinmshah on What are the challenges and problems with programming law-breaking constraints into AGI? · 2020-02-26T16:48:11.342Z · score: 2 (2 votes) · EA · GW
Couldn't the AI end up misaligned with the owners by accident, even if they're aligned with the rest of humanity?

Yes, but as I said earlier, I'm assuming the alignment problem has already been solved when talking about enforcement. I am not proposing enforcement as a solution to alignment.

If you haven't solved the alignment problem, enforcement doesn't help much, because you can't rely on your AI-enabled police to help catch the AI-enabled criminals: the police AI itself may not be aligned with the police.

The question is whether 1 or 2 is better at aligning the AI in cases where enforcement is impossible or explicitly prevented.

Case 2 is assuming that you already have an intelligent agent with motivations, and then trying to deal with that after the fact. I agree this is not going to work for alignment. If for some reason I could only do 1 or 2 for alignment, I would try 1. (But there are in fact a bunch of other things that you can do.)

Comment by rohinmshah on My personal cruxes for working on AI safety · 2020-02-26T16:33:16.230Z · score: 2 (2 votes) · EA · GW

I broadly agree with this, but I feel like this is mostly skepticism of crux 3 and not crux 2. I think to switch my position on crux 2 using only timeline arguments, you'd have to argue something like <10% chance of transformative AI in 50 years.

Comment by rohinmshah on My personal cruxes for working on AI safety · 2020-02-25T17:44:30.324Z · score: 1 (1 votes) · EA · GW

My interpretation was that the crux was

We can do good by thinking ahead

One thing this leaves implicit is the counterfactual: in particular, I thought the point of the "Problems solve themselves" section was that if problems would be solved by default, then you can't do good by thinking ahead. I wanted to make that clearer, which led to

we both **can** and **need to** think ahead in order to solve [the alignment problem].

Where "can" talks about feasibility, and "need to" talks about the counterfactual.

I can remove the "and **need to**" if you think this is wrong.

Comment by rohinmshah on What are the challenges and problems with programming law-breaking constraints into AGI? · 2020-02-25T16:22:44.114Z · score: 1 (1 votes) · EA · GW
What if they also have access to nukes or other weapons that could prevent them or their owners from being held accountable if they're used?

I'm going to interpret this as:

  • Assume that the owners are misaligned w.r.t. the rest of humanity (controversial, to me at least).
  • Assume that enforcement is impossible.

Under these assumptions, I feel better about 1 than 2, in the sense that case 1 feels like a ~5% chance of success while case 2 feels like a ~0% chance of success. (Numbers made up of course.)

But this seems like a pretty low-probability way the world could be (I would bet against both assumptions), and the increase in EV from work on it seems pretty low (since you only get 5% chance of success), so it doesn't seem like a strong argument to focus on case 1.

Comment by rohinmshah on What are the challenges and problems with programming law-breaking constraints into AGI? · 2020-02-25T14:39:44.157Z · score: 2 (2 votes) · EA · GW
Part of the reason that enforcement works, though, is that human agents have an independent incentive not to break the law (or, e.g., report legal violations) since they are legally accountable for their actions.

Certainly you still need legal accountability -- why wouldn't we have that? If we solve alignment, then we can just have the AI's owner be accountable for any law-breaking actions the AI takes.

This seems to require the same type of fundamental ML research that I am proposing: mapping AI actions onto laws.

Imagine trying to make teenagers law-abiding. You could have two strategies:

1. Rewire the neurons or learning algorithm in their brain such that you can say "the computation done to produce the output of neuron X reliably tracks whether a law has been violated, and because of its connection via neuron Y to neuron Z, if an action is predicted to violate a law, the teenager won't take it".

2. Explain to them what the laws are (relying on their existing ability to understand English, albeit fuzzily), and give them incentives to follow them.

I feel much better about 2 than 1.

When you say "programming AI to follow law" I imagine case 1 above (but for AI systems instead of humans). Certainly the OP seemed to be arguing for this case. This is the thing I think is extremely difficult.

I am much happier about AI systems learning about the law via case 2 above, which would enable the AI police applications I mentioned above.

However, some ML people I have talked about this with have given positive feedback, so I think you might be overestimating the difficulty.

I suspect they are thinking about case 2 above? Or they might be thinking of self-driving car type applications where you have an in-code representation of the world? Idk, I feel confident enough of this that I'd predict that there is a miscommunication somewhere, rather than an actual strong difference of opinion between me and them.

Comment by rohinmshah on What are the challenges and problems with programming law-breaking constraints into AGI? · 2020-02-25T14:21:56.856Z · score: 1 (1 votes) · EA · GW
My intuition is that more formal systems will be easier for AI to understand earlier in the "evolution" of SOTA AI intelligence than less-formal systems.

I agree for fully formal systems (e.g. solving SAT problems), but don't agree for "more formal" systems like law.

Mostly I'm thinking that understanding law would require you to understand language, but once you've understood language you also understand "what humans want". You could imagine a world in which AI systems understand the literal meaning of language but don't grasp the figurative / pedagogic / Gricean aspects of language, and in that world I think AI systems will understand law earlier than normal English, but that doesn't seem to be the world we live in:

  • GPT-2 and other language models don't seem particularly literal.
  • We have way more training data about natural language as it is normally used (most of the Internet), relative to natural language meant to be interpreted mostly literally.
  • Humans find it easier / more "native" to interpret language in the figurative / pedagogic way than to interpret it in the literal way.

My point was that I think that making a law-following AI that can follow (A) all enumerated laws is not much harder than one that can be made to follow (B) any given law.

Makes sense, that seems true to me.

Comment by rohinmshah on My personal cruxes for working on AI safety · 2020-02-25T14:11:14.892Z · score: 4 (3 votes) · EA · GW

Planned summary for the Alignment Newsletter:

This post describes how Buck's cause prioritization within an effective altruism framework leads him to work on AI risk. The case can be broken down into a conjunction of five cruxes. Specifically, the story for impact is that AGI 1) would be a big deal if it were created, 2) has a decent chance of being created soon, before any other "big deal" technology is created, and 3) poses an alignment problem that we both **can** and **need to** think ahead in order to solve. His research 4) would be put into practice if it solved the problem and 5) makes progress on solving the problem.

Planned opinion:

I enjoyed this post, and recommend reading it in full if you are interested in AI risk because of effective altruism. (I've kept the summary relatively short because not all of my readers care about effective altruism.) My personal cruxes and story of impact are actually fairly different: in particular, while this post sees the impact of research as coming from solving the technical alignment problem, I care about other sources of impact as well. See this comment for details.

Comment by rohinmshah on My personal cruxes for working on AI safety · 2020-02-25T14:09:41.701Z · score: 9 (7 votes) · EA · GW

I enjoyed this post, it was good to see this all laid out in a single essay, rather than floating around as a bunch of separate ideas.

That said, my personal cruxes and story of impact are actually fairly different: in particular, while this post sees the impact of research as coming from solving the technical alignment problem, I care about other sources of impact as well, including:

1. Field building: Research done now can help train people who will be able to analyze problems and find solutions in the future, when we have more evidence about what powerful AI systems will look like.

2. Credibility building: It does you no good to know how to align AI systems if the people who build AI systems don't use your solutions. Research done now helps establish the AI safety field as the people to talk to in order to keep advanced AI systems safe.

3. Influencing AI strategy: This is a catch-all category meant to include the ways that technical research influences the probability that we deploy unsafe AI systems in the future. For example, if technical research provides more clarity on exactly which systems are risky and which ones are fine, it becomes less likely that people build the risky systems (nobody _wants_ an unsafe AI system), even though this research doesn't solve the alignment problem.

As a result, cruxes 3-5 in this post would not actually be cruxes for me (though 1 and 2 would be).

Comment by rohinmshah on What are the best arguments that AGI is on the horizon? · 2020-02-20T16:58:13.754Z · score: 4 (3 votes) · EA · GW

Just wanted to note that while I am quoted as being optimistic, I am still working on AI safety specifically to cover the x-risk case and not the value lock-in case. (But certainly some people are working on the value lock-in case.)

(Also I think several people would disagree that I am optimistic, and would instead think I'm too pessimistic, e.g. I get the sense that I would be on the pessimistic side at FHI.)

Comment by rohinmshah on What are the challenges and problems with programming law-breaking constraints into AGI? · 2020-02-16T18:29:45.948Z · score: 3 (3 votes) · EA · GW

Cullen's argument was "alignment may not be enough, even if you solve alignment you might still want to program your AI to follow the law because <reasons>." So in my responses I've been assuming that we have solved alignment; I'm arguing that after solving alignment, AI-powered enforcement will probably be enough to handle the problems Cullen is talking about. Some quotes from Cullen's comment (emphasis mine):

Reasons other than directly getting value alignment from law that you might want to program AI to follow the law

We will presumably want organizations with AI to be bound by law.

We don't want to rely on the incentives of human principals to ensure their agents advance their goals in purely legal ways

Some responses to your comments:

if we want to automate "detect bad behavior", wouldn't that require AI alignment, too?

Yes, I'm assuming we've solved alignment here.

Isn't most of this after a crime has already been committed?

Good enforcement is also a deterrent against crime (someone without any qualms about murder will still usually not murder because of the harsh penalties and chance of being caught).

Furthermore, AIs may be able to learn new ways of hiding things from the police, so there could be gaps where the police are trying to catch up.

Remember that the police are also AI-enabled, and can find new ways of detecting things. Even so, such gaps are possible; but they're also possible today, without AI: criminals presumably constantly find new ways of hiding things from the police.

Comment by rohinmshah on What are the challenges and problems with programming law-breaking constraints into AGI? · 2020-02-15T12:59:42.636Z · score: 2 (2 votes) · EA · GW
(Most) real laws have huge bodies of interpretative text surrounding them and examples of real-world applications of them to real-world facts.

Right, I was trying to factor this part out, because it seemed to me that the hope was "the law is explicit and therefore can be programmed in". But if you want to include all of the interpretative text and examples of real-world application, it starts looking more like "here is a crap ton of data about this law, please understand what this law means and then act in accordance with it", as opposed to directly hardcoding in the law.

Under this interpretation (which may not be what you meant) this becomes a claim that laws have a lot more data that pinpoints what exactly they mean, relative to something like "what humans want", and so an AI system will more easily pinpoint it. I'm somewhat sympathetic to this claim, though I think there is a lot of data about "what humans want" in everyday life that the AI can learn from. But my real reason for not caring too much about this is that in this story we rely on the AI's "intelligence" to "understand" laws, as opposed to "programming it in"; given that we're worried about superintelligent AI it should be "intelligent" enough to "understand" what humans want as well (given that humans seem to be able to do that).

Lawyers approximate generalists: they can take arbitrary written laws and give advice on how to conform behavior to those laws. So a lawyerlike AI might be able to learn general interpretative principles and research skills and be able to simulate legal adjudications of proposed actions.

I'm not sure what you're trying to imply with this -- does this make the AI's task easier? Harder? The generality somehow implies that the AI is safer?

Like, I don't get why this point has any bearing on whether it is better to train "lawyerlike AI" or "AI that tries to do what humans want". If anything, I think it pushes in the "do what humans want" direction, since historically it has been very difficult to create generalist AIs, and easier to create specialist AIs.

(Though I'm not sure I think "AI that tries to do what humans want" is less "general" than lawyerlike AI.)

Comment by rohinmshah on What are the challenges and problems with programming law-breaking constraints into AGI? · 2020-02-15T12:51:37.258Z · score: 3 (3 votes) · EA · GW

I agree that getting a guarantee of following the law is (probably) better than trying to ensure it through enforcement, all else equal. I also agree that in principle programming the AI to follow the law could give such a guarantee. So in some normative sense, I agree that it would be better if it were programmed to follow the law.

My main argument here is that it is not worth the effort. This factors into two claims:

First, it would be hard to do. I am a programmer / ML researcher and I have no idea how to program an AI to follow the law in some guaranteed way. I also have an intuitive sense that it would be very difficult. I think the vast majority of programmers / ML researchers would agree with me on this.

Second, it doesn't provide much value, because you can get most of the benefits via enforcement, which has the virtue of being the solution we currently use.

It will also probably be able to hide its actions, obscure its motives, and/or evade detection better than humans could.

But AI-enabled police would be able to probe actions, infer motives, and detect bad behavior better than humans could. In addition, AI systems could have fewer rights than humans, and could be designed to be more transparent than humans, making the police's job easier.

Comment by rohinmshah on What are the challenges and problems with programming law-breaking constraints into AGI? · 2020-02-07T06:03:20.982Z · score: 6 (2 votes) · EA · GW
Law is less indeterminate than you might think, and probably more definite than human values

Agreed that "human values" is harder and more indeterminate, because it's a tricky philosophical problem that may not even have a solution.

I don't think "alignment" is harder or more indeterminate, where "alignment" means something like "I have in mind something I want the AI system to do, it does that thing, without trying to manipulate me / deceive me etc."

Like, idk, imagine there was a law that said "All AI systems must not deceive their users, and must do what they believe their users want". A real law would probably only be slightly more explicit than that? If so, just creating an AI system that followed only this law would lead to something that meets the criterion I'm imagining. Creating an AI system that follows all laws seems a lot harder.

Due to the formality of parts of law and the legal process, an AI can be made to have higher confidence that an action is (2) than (1).

I think this would probably have been true of expert systems but not so true of deep learning-based systems.

Also, personally I find it easier to tell when my actions are unaligned with <person X whom I know> than when my actions are illegal.

Comment by rohinmshah on What are the challenges and problems with programming law-breaking constraints into AGI? · 2020-02-07T05:53:11.761Z · score: 8 (3 votes) · EA · GW

Agree with all of these, but they don't require you to program your AI to follow the law (sounds horrendously difficult), they require that you can enforce the law on AI systems. If you've solved alignment to arbitrary tasks/preferences, then I'd expect you can solve the enforcement problem too -- if you're worried about criminals having powerful AI systems, you can give powerful AI systems to the police / judicial system / whatever else you think is important.

Comment by rohinmshah on What are the challenges and problems with programming law-breaking constraints into AGI? · 2020-02-07T05:49:02.767Z · score: 1 (1 votes) · EA · GW
These are things it would be trained to learn. It would learn to read and could read biology textbooks and papers or things online, and it would also see pictures of people, brains, etc..

It really sounds like this sort of training is going to require it to be able to interpret English the way we interpret English (e.g. to read biology textbooks); if you're going to rely on that, I don't see why you wouldn't also rely on that ability when we are giving it instructions.

This could be an explicit output we train the AI to predict (possibly part of responses in language).

That... is ambitious, if you want to do this for every term that exists in laws. But I agree that if you did this, you could try to "translate" laws into code in a literal fashion. I'm fairly confident that this would still be pretty far from what you wanted, because laws aren't meant to be literal, but I'm not going to try to argue that here.

(Also, it probably wouldn't be computationally efficient -- that "don't kill a person" law, to be implemented literally in code, would require you to loop over all people, and make a prediction for each one: extremely expensive.)

I "named" a particular person in that sentence.

Ah, I see. In that case I take back my objection about butterfly effects.

Comment by rohinmshah on What are the challenges and problems with programming law-breaking constraints into AGI? · 2020-02-05T20:23:16.215Z · score: 2 (2 votes) · EA · GW
Once we add caveats like “what we would want / intend after sufficient rational reflection,” my sense is that “values” just captures that more intuitively.

I in fact don't want to add in those caveats here: I'm suggesting that we tell our AI system to do what we short-term want. (Of course, we can then "short-term want" to do more rational reflection, or to be informed of true and useful things that help us make moral progress, etc.)

I agree that "values" more intuitively captures the thing with all the caveats added in.

Comment by rohinmshah on What are the challenges and problems with programming law-breaking constraints into AGI? · 2020-02-05T20:17:01.689Z · score: 1 (1 votes) · EA · GW
Can we just define them as we normally do, e.g. biologically with a functioning brain?

How do you define "biological" and "brain"? Again, your input is a camera image, so you have to build this up starting from sentences of the form "the pixel in the top left corner is this shade of grey".

(Or you can choose some other input, as long as we actually have existing technology that can create that input.)

The AI would do this. Are AIs that aren't good at estimating probabilities of events smart enough to worry about?

Powerful AIs will certainly behave in ways that make it look like they are estimating probabilities.

Let's take AIs trained by deep reinforcement learning as an example. If you want to encode something like "Any particular person dies at least x earlier with probability > p than they would have by inaction" explicitly and literally in code, you will need functions like getAllPeople() and getProbability(event). AIs do not usually come equipped with such functions, so you either have to say how to use the AI system to implement those functions, or you have to implement them yourself. I am claiming that the second option is hard, and any solution you have for the first option will probably also work for something like telling the AI system to "do what the user wants".
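
To make this concrete, here's a minimal Python sketch of what the literal encoding would have to look like. The function names and thresholds are hypothetical stand-ins for the getAllPeople() / getProbability(event) helpers mentioned above; the stubs are left unimplemented because implementing them is exactly the hard part:

```python
# Hypothetical sketch of literally encoding the rule
# "no action may make any particular person die at least X years earlier
# with probability > P than they would have by inaction".
# The two stubs below are precisely the functions a deep RL agent doesn't come with.

X_YEARS = 1.0   # assumed threshold for "dies at least x earlier"
P_MAX = 0.01    # assumed probability threshold p

def get_all_people(world_state):
    """Enumerate every person the action could affect.
    Unsolved: 'person' has to be defined in terms of raw inputs like camera pixels."""
    raise NotImplementedError

def probability_dies_earlier_by(person, action, world_state, years):
    """P(person dies at least `years` earlier under `action` than under inaction).
    Unsolved: requires a calibrated model of long-run consequences, plus a
    well-defined notion of 'inaction' to compare against."""
    raise NotImplementedError

def action_is_permitted(action, world_state):
    # Even granting the stubs, this loops over every person for every
    # candidate action, which is extremely expensive.
    for person in get_all_people(world_state):
        if probability_dies_earlier_by(person, action, world_state, X_YEARS) > P_MAX:
            return False
    return True
```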

The AI waits for the next request, turns off or some other inconsequential default action.

If you're a self-driving car, it's very unclear what an inconsequential default action is. (Though I agree in general there's often some default action that is fine.)

Maybe my wording didn't capture this well, but my intention was a presentist/necessitarian person-affecting approach (not that I agree with the ethical position).

I mean, the existence part was not the main point -- my point was that if butterfly effects are real, then the AI system must always do nothing (even if it can't predict what the butterfly effects would be). If you want to avoid debates about population ethics, you could imagine butterfly effects that affect current people: e.g. you slightly change who talks to whom, which changes whether a person gets hit by a car later in the day or not.

I'm not arguing that these sorts of butterfly effects are real -- I'm not sure -- but it seems bad for the behavior of our AI system to depend so strongly on whether butterfly effects are real.

Comment by rohinmshah on What are the challenges and problems with programming law-breaking constraints into AGI? · 2020-02-03T22:56:10.392Z · score: 4 (4 votes) · EA · GW

If you want literal interpretations, specificity, and explicitness, I think you're in for a bad time:

"Any particular person dies at least x earlier with probability > p than they would have by inaction"

How do you intend to define "person" in terms of the inputs to an AI system (let's assume a camera image)? How do you compute the "probability" of an event? What is "inaction"?

(There's also the problem that all actions probably change who does and doesn't exists, so this law would require the AI system to always take inaction, making it useless.)

Comment by rohinmshah on What are the challenges and problems with programming law-breaking constraints into AGI? · 2020-02-03T16:36:12.930Z · score: 3 (3 votes) · EA · GW
To illustrate, "Maximize paperclips without killing anyone" is not an interpretation of "Maximize paperclips"

Huh? If I ask someone to manage my paperclip factory, I certainly do expect them to interpret that request to include "and also don't kill anyone".

This seems like it could be a problem of reasoning and understanding language, instead of the problem of understanding and acting in line with human values.

I feel like the word "values" makes this sound more complex than it is, and I'd say we instead want the agent to understand and act in line with what the human wants / intends.

This is then also a problem of reasoning and understanding language: when I say "please help me write good education policy laws", if it understands language and reason, and acts based on that, that seems pretty aligned to me.

Isn't interpreting statements (e.g. laws) and checking if they apply to a given action a narrower, more structured and better-defined problem than getting AI to do what we want it to do?

I am not a law expert, but my impression is that there is a lot of common sense + human judgment in the application of laws, just as there is a lot of common sense + human judgment in interpreting requests.

Comment by rohinmshah on What are the challenges and problems with programming law-breaking constraints into AGI? · 2020-02-03T00:18:41.688Z · score: 22 (10 votes) · EA · GW

You're focusing on the issue that current laws don't capture everything we care about, which is definitely a problem.

However, the bigger problem is that there isn't a clear definition of what does and doesn't break the law that you can write down in a program.

You might say that we could train an AI system to learn what is and isn't breaking the law; but then you might as well train an AI system to learn what is and isn't the thing you want it to do. It's not clear why training to follow laws would be easier than training it to do what you want; the latter would be a much more useful AI system.

Comment by rohinmshah on [AN #80]: Why AI risk might be solved without additional intervention from longtermists · 2020-01-19T21:03:51.241Z · score: 1 (1 votes) · EA · GW

See this comment thread.

Comment by rohinmshah on [AN #80]: Why AI risk might be solved without additional intervention from longtermists · 2020-01-19T21:03:20.250Z · score: 8 (3 votes) · EA · GW
Sends me the message that longtermists should care less about AI risk.

I do believe that, and so does Robin. I don't know about Paul and Adam, but I wouldn't be surprised if they thought so too.

Though, the people in the "conversations" all support AI safety research.

Well, it's unclear if Robin supports AI safety research, but yes, the other three of us do. This is because:

10% chance of existential risk from AI sounds like a problem of catastrophic proportions to me.

(Though I'll note that I don't think the 10% figure is robust.)

I'm not arguing "AI will definitely go well by default, so no one should work on it". I'm arguing "Longtermists currently overestimate the magnitude of AI risk".

I also broadly agree with reallyeli:

However I really think we ought to be able to discuss guesses about what's true merely on the level of what's true, without thinking about secondary messages being sent by some statement or another. It seems to me that if we're unable to do so, that will make the difficult task of finding truth even more difficult.

And this really does have important implications: if you believe "non-robust 10% chance of AI accident risk", maybe you'll find that biosecurity, global governance, etc. are more important problems to work on. I haven't checked myself -- for me personally, it seems quite clear that AI safety is my comparative advantage -- but I wouldn't be surprised if on reflection I thought one of those areas was more important for EA to work on than AI safety.

Comment by rohinmshah on [AN #80]: Why AI risk might be solved without additional intervention from longtermists · 2020-01-19T20:54:04.122Z · score: 2 (2 votes) · EA · GW

See this comment thread.

Comment by rohinmshah on Learning to ask action-relevant questions · 2019-12-29T22:23:31.538Z · score: 2 (2 votes) · EA · GW
I hope it's sufficiently clear that I'm not trying to claim that action-relevance is *all* you should think about as a fledgling researcher?

I didn't think that you thought that; I think the post is fine as is. I wasn't trying to critique this post; it's an important concept and I can certainly think of some people who I think should take this advice.

Comment by rohinmshah on Learning to ask action-relevant questions · 2019-12-29T02:54:01.392Z · score: 13 (12 votes) · EA · GW

In the spirit of reversing advice, the very short case for not asking yourself whether something is action-relevant is that curiosity is an incredibly valuable tool for motivation and for directing your learning toward where there is something important to be learned. Justifying every question on decision-relevance replaces curiosity with (semi-)explicit reasoning; it is not clear to me that this is a good trade (many of the best thinkers of the past seem to me to have been extremely curious, and in my experience, explicit reasoning is not very powerful).

I don't have a strong opinion on whether the median EA interested in research should be taking this advice or its opposite.

Comment by rohinmshah on I'm Buck Shlegeris, I do research and outreach at MIRI, AMA · 2019-11-20T19:09:25.842Z · score: 19 (9 votes) · EA · GW
I suspect that things like the Alignment Newsletter are causing AI safety researchers to understand and engage with each other's work more; this seems good.

This is the goal, but it's unclear that it's having much of an effect. I feel like I relatively often have conversations with AI safety researchers where I mention something I highlighted in the newsletter, and the other person hasn't heard of it, or has a very superficial / wrong understanding of it (one that I think would be corrected by reading just the summary in the newsletter).

This is very anecdotal; even if there are times when I talk to people and they do know the paper that I'm talking about because of the newsletter, I probably wouldn't notice / learn that fact.

(In contrast, junior researchers are often more informed than I would expect, at least about the landscape, even if not the underlying reasons / arguments.)

Comment by rohinmshah on AGI safety and losing electricity/industry resilience cost-effectiveness · 2019-11-18T00:51:23.359Z · score: 1 (1 votes) · EA · GW

Thanks!

Comment by rohinmshah on AGI safety and losing electricity/industry resilience cost-effectiveness · 2019-11-18T00:49:52.891Z · score: 6 (4 votes) · EA · GW

I mostly meant phrasing it as "the model result"; the "99-100%" is fine if it's clear that it's from a model and not your considered belief.

Comment by rohinmshah on AGI safety and losing electricity/industry resilience cost-effectiveness · 2019-11-17T17:37:20.602Z · score: 5 (3 votes) · EA · GW

I would have liked to see the models and graphs (presumably the most important part of the paper), but the images don't load and the links to the models don't work:

Table 1 shows the key input parameters for Model 1 (largely Denkenberger and conference poll of effective altruists)(D. Denkenberger, Cotton-Barrat, Dewey, & Li, 2019a) and Model 2 (D. Denkenberger, Cotton-Barratt, Dewey, & Li, 2019) (Sandberg inputs)(3).

Also:

However, it can be said with 99%-100% confidence that funding interventions for losing industry now is more cost effective than additional funding for AGI safety beyond the expected $3 billion.

If you don't actually mean such confidence (which I assume you don't because 1. it's crazy and 2. you mention model uncertainty elsewhere), can you please not say it?

Comment by rohinmshah on A conversation with Rohin Shah · 2019-11-14T01:11:42.987Z · score: 2 (2 votes) · EA · GW
perhaps babies develop a sense of "hierarchy" which then gets applied to language, explaining how children learn languages so fast.

Though if we are to believe this paper at face value (I haven't evaluated it), babies start learning in the womb. (The paper claims that the biases depend on which language is spoken around the pregnant mother, which suggests that it must be learned, rather than being "built-in".)

Comment by rohinmshah on On AI Weapons · 2019-11-14T01:09:33.801Z · score: 2 (2 votes) · EA · GW

Ah, somehow I missed that, thanks!

Comment by rohinmshah on On AI Weapons · 2019-11-13T20:41:00.352Z · score: 1 (1 votes) · EA · GW

While I'm broadly uncertain about the overall effects of LAWs within the categories you've identified, and it seems plausible that LAWs are more likely to be good given those particular consequences, one major consideration against LAWs for me is that they plausibly would differentially benefit small misaligned groups such as terrorists. This is the main point of the Slaughterbots video. I don't know how big this effect is, especially since I don't know how much terrorism there is or how competent terrorists are; I'm just claiming that it is plausibly big enough to make a ban on LAWs desirable.

Comment by rohinmshah on A conversation with Rohin Shah · 2019-11-13T02:28:05.613Z · score: 9 (4 votes) · EA · GW
(Not sure how much of this Shah already knows.)

Not much, sadly. I don't actually intend to learn about it in the near future, because I don't think timelines are particularly decision-relevant to me (though they are to others, especially funders). Thanks for the links!

Tooby and Cosmides are big advocates for the "massive modularity" view--a huge amount of human cognition takes place in specialized, task-tailored modules rather than on one big, domain-general "computer".

On my view, babies would learn a huge amount about the structure of the world simply by interacting with it (pushing over an object can in principle teach you a lot about objects, causality, intuitive physics, etc), and this leads to general patterns that we later call "inductive biases" for more complex tasks. For example, hierarchy is a very useful way to understand basically any environment we are ever in; perhaps babies develop a sense of "hierarchy" which then gets applied to language, explaining how children learn languages so fast.

From the Wikipedia page you linked, challenges to a "rationality" based view:

1. Evolutionary theories using the idea of numerous domain-specific adaptions have produced testable predictions that have been empirically confirmed; the theory of domain-general rational thought has produced no such predictions or confirmations.

I wish they said what these predictions were. I'm not going to chase down this reference.

2. The rapidity of responses such as jealousy due to infidelity indicates a domain-specific dedicated module rather than a general, deliberate, rational calculation of consequences.

This is a good point; in general emotions are probably not learned, for the most part. I'm not sure what's going on there.

3. Reactions may occur instinctively (consistent with innate knowledge) even if a person has not learned such knowledge.

I agree that reflexes are "built-in" and not learned; reflexes are also pretty different from e.g. language. Obviously not everything our bodies do is "learned": reflexes, breathing, digestion, etc. all fall into the "built-in" category. I don't think this says much about what leads humans to be good at chess, language, plumbing, soccer, gardening, etc., which is what I'm more interested in.

It seems likely to me that you might need the equivalent of reflexes, breathing, digestion, etc. if you want to design a fully autonomous agent that learns without any human support whatsoever, but we will probably instead design an agent that (initially) depends on us to keep the electricity flowing, to fix any wiring issues, to keep up the Internet connection, etc. (In contrast, human parents can't ensure that the child keeps breathing, so you need an automatic, built-in system for that.)

Comment by rohinmshah on Does 80,000 Hours focus too much on AI risk? · 2019-11-03T17:14:45.567Z · score: 54 (18 votes) · EA · GW
Top AI safety researchers are now saying that they expect AI to be safe by default, without further intervention from EA. See here and here.

Two points:

  • "Probably safe by default" doesn't mean "we shouldn't work on it". My estimate of 90% that you quote still leaves a 10% chance of catastrophe, which is worth reducing. (Though the 10% is very non-robust.) It also is my opinion before updating on other people's views.
  • Those posts were published because AI Impacts was looking to have conversations with people who had safe-by-default views, so there's a strong selection bias. If you looked for people with doom-by-default views, you could find them.

Comment by rohinmshah on Publication of Stuart Russell’s new book on AI safety - reviews needed · 2019-10-24T19:40:33.094Z · score: 3 (2 votes) · EA · GW

Amusingly, I use my own Amazon account so infrequently that they refuse to let me write a review. I didn't think about GoodReads, I might do that.

Comment by rohinmshah on Summary of Stuart Russell's new book, "Human Compatible" · 2019-10-21T21:52:57.636Z · score: 7 (2 votes) · EA · GW

I also added a bunch of comments with some other less polished thoughts on the book on the Alignment Forum version of this post.

Comment by rohinmshah on Why do you reject negative utilitarianism? · 2019-10-18T17:10:51.805Z · score: 2 (2 votes) · EA · GW

Yes, that's correct.

Comment by rohinmshah on Why were people skeptical about RAISE? · 2019-09-04T22:03:47.475Z · score: 3 (3 votes) · EA · GW

Mathematical knowledge would be knowing that the Pythagorean theorem states that a^2 + b^2 = c^2; mathematical thinking would be the ability to prove that theorem from first principles.

The way I use the phrase, mathematical thinking doesn't only encompass proofs. It would also count as "mathematical reasoning" if you figure out that means are affected by outliers more than medians are, even if you don't write down any formulas, equations, or proofs.
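
As a toy illustration of that outlier point (made-up numbers, just to show the effect of a single extreme value):

```python
from statistics import mean, median

values = [3, 4, 5, 6, 7]
with_outlier = values + [1000]  # add one extreme value

print(mean(values), median(values))              # 5, 5
print(mean(with_outlier), median(with_outlier))  # ~170.8, 5.5  (mean moves a lot, median barely)
```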

Comment by rohinmshah on Why were people skeptical about RAISE? · 2019-09-04T16:22:53.326Z · score: 15 (6 votes) · EA · GW

Depends what you call the "goal".

If you mean "make it easier for new people to get up to speed", I'm all for that goal. That goal encompasses a significant chunk of the value of the Alignment Newsletter.

If you mean "create courses that allow new people to get the required mathematical maturity", I'm less excited. Such courses already exist, and while mathematical thinking is extremely useful, mathematical knowledge mostly isn't. (Mathematical knowledge is more useful for MIRI-style work, but I'd guess it's still not that useful.)

Comment by rohinmshah on Debrief: "cash prizes for the best arguments against psychedelics" · 2019-07-17T05:13:39.082Z · score: 3 (2 votes) · EA · GW

I don't know of any such stats, but I also don't know much about CFAR.

Comment by rohinmshah on How Europe might matter for AI governance · 2019-07-17T05:12:50.902Z · score: 4 (3 votes) · EA · GW

I was excluding governance papers, because it seems like the relevant question is "will AI development happen in Europe or elsewhere", and governance papers provide ~no evidence for or against that.

Comment by rohinmshah on How Europe might matter for AI governance · 2019-07-15T20:10:38.757Z · score: 23 (9 votes) · EA · GW

My lived experience is that most of the papers I care about (even excluding safety-related papers) come from the US. There are lots of reasons that both of these could be true, but for the sake of improving AGI-related governance, I think my lived experience is a much better measure of the thing we actually care about (which is something like "which region does good AGI-related thinking").

Comment by rohinmshah on Debrief: "cash prizes for the best arguments against psychedelics" · 2019-07-15T17:11:25.099Z · score: 5 (4 votes) · EA · GW
From my current read, psychedelics have a stronger evidence base than rationality training programs

I agree if for CFAR you are looking at the metric of how rational their alumni are. If you instead look at CFAR as a funnel for people working on AI risk, the "evidence base" seems clearer. (Similarly to how we can be quite confident that 80K is having an impact, despite there not being any RCTs of 80K's "intervention".)

Comment by rohinmshah on Charity Vouchers [public policy idea] · 2019-07-11T04:51:05.090Z · score: 2 (2 votes) · EA · GW

Sorry, I'm claiming government is supposed to spend money to achieve outcomes the public wants. (That felt self-evident to me, but maybe you disagree with it?) Given that, it's weird to say that it is better to give the money to the public than to let the government spend it.

I think the claim "philanthropic spending can do more good than typical government spending" usually works because we agree with the philanthropist's values more so than "government's values". But I wouldn't expect that "public's values" would be better than "government's values", and I do expect that "government's competence" would be better than "public's competence".

Comment by rohinmshah on Charity Vouchers [public policy idea] · 2019-07-10T19:19:18.493Z · score: 2 (2 votes) · EA · GW

Not necessarily disagreeing, but I wanted to point out that this relies on a perhaps-controversial claim:

Claim: Even though government is supposed to spend money to achieve outcomes the public wants, it is better to give the money to the public so that they can achieve outcomes that they want.

Comment by rohinmshah on Please May I Have Reading Suggestions on Consistency in Ethical Frameworks · 2019-07-08T17:24:22.521Z · score: 1 (1 votes) · EA · GW

To me, the most relevant of these impossibility theorems is the Arrhenius paradox (relevant to population ethics). Unfortunately, I don't know of any good public explanation of it.

Comment by rohinmshah on Not getting carried away with reducing extinction risk? · 2019-06-04T20:37:11.612Z · score: 3 (2 votes) · EA · GW

Even with the astronomical waste argument, which is the most extreme version of this argument, at some point you have astronomical numbers of people living, and the rest of the future isn't tremendously large in comparison, and so focusing on flourishing at that point makes more sense. Of course, this would be quite far in the future.

In practice, I expect the bar comes well before that point, because if everyone is focusing on x-risks, it will become harder and harder to reduce x-risks further, while staying equally as easy to focus on flourishing.

Note that in practice many more people in the world focus on flourishing than on x-risks, so maybe the few long-term focused people might end up always prioritizing x-risks because everyone else picks the low-hanging fruit in flourishing. But that's different from saying "it's never important to work on animal suffering", it's saying "someone else will fix animal suffering, and so I should do the other important thing of reducing x-risk".