Comment by tobias_baumann on Effective animal advocacy movement building: a neglected opportunity? · 2019-06-12T09:42:26.181Z · score: 4 (4 votes) · EA · GW

Great post – thanks for writing this up!

How can we influence the long-term future?

2019-03-06T15:31:43.683Z · score: 9 (11 votes)
Comment by tobias_baumann on Risk factors for s-risks · 2019-02-17T21:18:51.280Z · score: 1 (1 votes) · EA · GW

Thank you – great to hear that you've found it useful!

Risk factors for s-risks

2019-02-13T17:51:37.632Z · score: 31 (12 votes)
Comment by tobias_baumann on Why I expect successful (narrow) alignment · 2019-01-02T12:31:36.379Z · score: 1 (1 votes) · EA · GW

Thanks for the detailed comments!

(Also, BTW, I would have preferred the word "narrow" or something like it in the post title, because some people use "alignment" in a broad sense and as a result may misinterpret you as being more optimistic than you actually are.)

Good point – changed the title.

Also, distributed emergence of AI is likely not safer than centralized AI, because an "economy" of AIs would be even harder to control and harness towards human values than a single or small number of AI agents.

As long as we consider only narrow alignment, it does seem safer to me in that local misalignment or safety issues in individual systems would not immediately cause everything to break down, because such a system would (arguably) not be able to obtain a decisive strategic advantage and take over the world. So there'd be time to react.

But I agree with you that an economy-like scenario entails other safety issues, and aligning the entire "economy" with human (compromise) values might be very difficult. So I don't think this is safer overall, or at least it's not obvious. (From my suffering-focused perspective, distributed emergence of AI actually seems worse than a scenario of the form "a single system quickly takes over and forms a singleton", as the latter seems less likely to lead to conflict-related disvalue.)

This assumes that alignment work is highly parallelizable. If it's not, then doing more alignment work now can shift the whole alignment timeline forward, instead of just adding to the total amount of alignment work in a marginal way.

Yeah, I do think that alignment work is fairly parallelizable, and future work also has a (potentially very big) information advantage over current work because researchers will then know more about what AI techniques look like. Is there any precedent of a new technology where work on safety issues was highly serial and where it was therefore crucial to start working on safety a long time in advance?

This only applies to short-term "alignment" and not to long-term / scalable alignment. That is, I have an economic incentive to build an AI that I can harness to give me short-term profits, even if that's at the expense of the long term value of the universe to humanity or human values. This could be done for example by creating an AI that is not at all aligned with my values and just giving it rewards/punishments so that it has a near-term instrumental reason to help me (similar to how other humans are useful to us even if they are not value aligned to us).

I think there are two different cases:

  • If the human actually cares only about short-term selfish gain, possibly at the expense of others, then this isn't a narrow alignment failure, it's a cooperation problem. (But I agree that it could be a serious issue.)
  • If the human actually cares about the long term, then it appears that she's making a mistake by buying an AI system that is only aligned in the short term. So it comes down to human inadequacy – given sufficient information she'd buy a long-term aligned AI system instead, and AI companies would have an incentive to provide long-term aligned AI systems. Though of course the "sufficient information" part is crucial, and is a fairly strong assumption, as it may be hard to distinguish between "short-term alignment" and "real" alignment. I agree that this is another potentially serious problem.

I think we ourselves don't know how to reliably distinguish between "attempts to manipulate" and "attempts to help" so it would be hard for AIs to learn this. One problem is, our own manipulate/help classifier was trained on a narrow set of inputs (i.e., of other humans manipulating/helping) and will likely fail when applied to AIs due to distributional shift.

Interesting point. I think I still have an intuition that there's a fairly simple core to it, but I'm not sure how to best articulate this intuition.

Comment by tobias_baumann on Why I expect successful (narrow) alignment · 2018-12-30T12:14:02.976Z · score: 2 (2 votes) · EA · GW

Working on these problems makes a lot of sense, and I'm not saying that the philosophical issues around what "human values" means will likely be solved by default.

I think increasing philosophical sophistication (or "moral uncertainty expansion") is a very good idea from many perspectives. (A direct comparison to moral circle expansion would also need to take relative tractability and importance into account, which seems unclear to me.)

Comment by tobias_baumann on Why I expect successful (narrow) alignment · 2018-12-30T11:51:46.672Z · score: 3 (3 votes) · EA · GW

Great point – I agree that it would be valuable to have a common scale.

I'm a bit surprised by the 1-10% estimate. This seems very low, especially given that "serious catastrophe caused by machine intelligence" is broader than narrow alignment failure. If we include possibilities like serious value drift as new technologies emerge, or difficult AI-related cooperation and security problems, or economic dynamics riding roughshod over human values, then I'd put much more than 10% (plausibly more than 50%) on something not going well.

Regarding the "other thoughtful people" in my 80% estimate: I think it's very unclear who exactly one should update towards. What I had in mind is that many EAs who have thought about this appear to not have high confidence in successful narrow alignment (not clear if the median is >50%?), judging based on my impressions from interacting with people (which is obviously not representative). I felt that my opinion is quite contrarian relative to this, which is why I felt that I should be less confident than the inside view suggests, although as you say it's quite hard to grasp what people's opinions actually are.

On the other hand, one possible interpretation (but not the only one) of the relatively low level of concern for AI risk among the larger AI community and societal elites is that people are quite optimistic that "we'll know how to cross that bridge once we get to it".

Why I expect successful (narrow) alignment

2018-12-29T15:46:04.947Z · score: 18 (17 votes)

A typology of s-risks

2018-12-21T18:23:05.249Z · score: 15 (11 votes)

Thoughts on short timelines

2018-10-23T15:59:41.415Z · score: 22 (24 votes)
Comment by tobias_baumann on Problems with EA representativeness and how to solve it · 2018-08-06T08:53:16.535Z · score: 7 (9 votes) · EA · GW

Agreed. As someone who prioritises s-risk reduction, I find it odd that long-termism is sometimes considered equivalent to x-risk reduction. It is legitimate if people think that x-risk reduction is the best way to improve the long-term future, but it should be made clear that this is based on additional beliefs about ethics (rejecting suffering-focused views and not being very concerned about value drift), about how likely x-risks in this century are, and about how tractable it is to reduce them, relative to other ways of improving the long-term future. I for one think that none of these points is obvious.

So I feel that there is a representativeness problem between x-risk reduction and other ways of improving the long-term future (not necessarily only s-risk reduction), in addition to an underrepresentation of near-term causes.

Comment by tobias_baumann on Multiverse-wide cooperation in a nutshell · 2017-11-02T16:25:27.427Z · score: 3 (3 votes) · EA · GW

Thanks for writing this up!

I think the idea is intriguing, and I agree that this is possible in principle, but I'm not convinced of your take on its practical implications. Apart from heuristic reasons to be sceptical of a new idea on this level of abstractness and speculativeness, my main objection is that a high degree of similarity with respect to reasoning (which is required for the decisions to be entangled) probably goes along with at least some degree of similarity with respect to values. (And if the values of the agents that correlate with me are similar to mine, then the result of taking them into account is also closer to my own values than the compromise value system of all agents.)

You write:

Superrationality only motivates cooperation if one has good reason to believe that another party’s decision algorithm is indeed extremely similar to one’s own. Human reasoning processes differ in many ways, and sympathy towards superrationality represents only one small dimension of one’s reasoning process. It may very well be extremely rare that two people’s reasoning is sufficiently similar that, having common knowledge of this similarity, they should rationally cooperate in a prisoner’s dilemma.

Conditional on this extremely high degree of similarity to me, isn't it more likely that their values are also similar to mine? For instance, if my reasoning is shaped by the experiences I've had, my genetic makeup, or the set of all ideas I've read about over the course of my life, then an agent with identical or highly similar reasoning would also share a lot of these characteristics. But of course, my experiences, genes, etc. also determine my values, so similarity with respect to these factors implies similarity with respect to values.

This is not the same as claiming that a given characteristic X that's relevant to decision-making is generally linked to values, in the sense that people with X have systematically different values. It's a subtle difference: I'm not saying that certain aspects of reasoning generally go along with certain values across the entire population; I'm saying that a high degree of similarity regarding reasoning goes along with similarity regarding values.

S-risk FAQ

2017-09-18T08:05:39.850Z · score: 13 (15 votes)
Comment by tobias_baumann on An Argument for Why the Future May Be Good · 2017-07-20T08:40:43.714Z · score: 12 (18 votes) · EA · GW

Thanks for writing this up! I agree that this is a relevant argument, even though many steps of the argument are (as you say yourself) not airtight. For example, consciousness or suffering may be related to learning, in which case point 3) is much less clear.

Also, the future may contain vastly larger populations (e.g. because of space colonization), which, all else being equal, may imply (vastly) more suffering. Even if your argument is valid and the fraction of suffering decreases, it's not clear whether the absolute amount will be higher or lower (as you claim in 7.).

Finally, I would argue we should focus on the bad scenarios anyway – given sufficient uncertainty – because there's not much to do if the future will "automatically" be good. If s-risks are likely, my actions matter much more.

(This is from a suffering-focused perspective. Other value systems may arrive at different conclusions.)

Comment by tobias_baumann on My current thoughts on MIRI's "highly reliable agent design" work · 2017-07-08T08:31:50.583Z · score: 0 (0 votes) · EA · GW

Do you mean more promising than other technical safety research (e.g. concrete problems, Paul's directions, MIRI's non-HRAD research)?

Yeah, and also (differentially) more promising than AI strategy or AI policy work. But I'm not sure how strong the effect is.

If so, I'd be interested in hearing why you think hard / unexpected takeoff differentially favors HRAD.

In a hard / unexpected takeoff scenario, it's more plausible that we need to get everything more or less exactly right to ensure alignment, and that we have only one shot at it. This might favor HRAD because a less principled approach makes it comparatively unlikely that we get all the fundamentals right when we build the first advanced AI system.

In contrast, if we think there's no such discontinuity and AI development will be gradual, then AI control may be at least somewhat more similar (but surely not entirely comparable) to how we "align" contemporary software systems. That is, it would be more plausible that we could test advanced AI systems extensively without risking catastrophic failure or that we could iteratively try a variety of safety approaches to see what works best.

It would also be more likely that we'd get warning signs of potential failure modes, so that it's comparatively more viable to work on concrete problems whenever they arise, or to focus on making the solutions to such problems scalable – which, to my understanding, is a key component of Paul's approach. In this picture, successful alignment without understanding the theoretical fundamentals is more likely, which makes non-HRAD approaches more promising.

Personally, I find a hard and unexpected takeoff unlikely, and accordingly favor approaches other than HRAD, but of course I can't justify high confidence in this given expert disagreement. Similarly, I'm not highly confident that the above distinction is actually meaningful.

I'd be interested in hearing your thoughts on this!

Comment by tobias_baumann on My current thoughts on MIRI's "highly reliable agent design" work · 2017-07-07T14:49:05.074Z · score: 1 (1 votes) · EA · GW

Great post! I agree with your overall assessment that other approaches may be more promising than HRAD.

I'd like to add that this may (in part) depend on our outlook on which AI scenarios are likely. Conditional on MIRI's view that a hard or unexpected takeoff is likely, HRAD may be more promising (though it's still unclear). If the takeoff is soft or AI will be more like the economy, then I personally think HRAD is unlikely to be the best way to shape advanced AI.

(I wrote a related piece on strategic implications of AI scenarios.)

Strategic implications of AI scenarios

2017-06-29T07:31:27.891Z · score: 6 (6 votes)
Comment by tobias_baumann on The asymmetry and the far future · 2017-03-10T10:00:15.600Z · score: 11 (11 votes) · EA · GW

Thanks for your post! I agree that work on preventing risks of future suffering is highly valuable.

It’s tempting to say that it implies that the expected value of a minuscule increase in existential risk to all sentient life is astronomical.

Even if the future is negative according to your values, there are strong reasons not to increase existential risk. This would be extremely uncooperative towards other value systems, and there are many good reasons to be nice to other value systems. It is better to pull the rope sideways by working to improve the future (i.e. reducing risks of astronomical suffering) conditional on there being a future.

In addition, I think it makes sense for utilitarians to adopt a quasi-deontological rule against using violence, regardless of whether one is a classical utilitarian or suffering-focused. This obviously prohibits something like increasing risks of extinction.

Comment by tobias_baumann on Students for High Impact Charity: Review and $10K Grant · 2016-10-17T00:35:42.334Z · score: 3 (3 votes) · EA · GW

Thanks a lot, Peter, for taking the time to evaluate SHIC! I agree that their work seems to be very promising.

In particular, it seems that students and future leaders are one of the most important target groups of effective altruism.

Comment by tobias_baumann on The map of organizations, sites and people involved in x-risks prevention · 2016-10-17T00:31:09.147Z · score: 3 (3 votes) · EA · GW

A minor detail: It's a bit inaccurate to say that the Foundational Research Institute works on general x-risks. This text explains that FRI focuses on reducing risks of astronomical suffering, which is related to, but not the same as, x-risk reduction.

Comment by tobias_baumann on The map of organizations, sites and people involved in x-risks prevention · 2016-10-17T00:27:52.363Z · score: 1 (1 votes) · EA · GW

Thanks for this great map!