Comment by wei_dai on How should large donors coordinate with small donors? · 2019-01-10T05:07:00.922Z · score: 1 (1 votes) · EA · GW

by the way, I think your link is broken

Thanks, looks like that's actually caused by a bug in EA Forum. I'll do a workaround and notify the admins. And thanks for the Holden quote, which I'll reply to below:

but the dollars spent by others would’ve done less good (and we think we have a good sense of the counterfactual for most of those dollars).

A possible solution here is to let people donate to a fund controlled by OpenPhil or GiveWell, and then they can coordinate amongst themselves to maximize good done per dollar.

We guessed that the latter outweighed the former.

Given the huge amounts of money/value involved here, I think detailed analysis, empirical investigations, and creative solutions are called for, not just a guess.

(test link for the admins, please ignore)

Comment by wei_dai on List of possible EA meta-charities and projects · 2019-01-09T20:12:50.459Z · score: 17 (8 votes) · EA · GW

In a recent post, I suggested that the current Good Ventures "splitting" policy may be making it much harder than it perhaps should be to start up and get funding for new effective charities, because the top charities that OpenPhil / Good Ventures have identified but not fully funded are unnecessarily absorbing most of the donations from individual donors who might otherwise fund these riskier new opportunities. The policy also perhaps incorrectly encourages people to contribute money instead of time to EA causes. I think it might be really high leverage for someone to investigate this question so that the policy could be changed if it is in fact suboptimal. (To be clear, I don't think that person should be me, because my comparative advantage probably lies elsewhere.)

How should large donors coordinate with small donors?

2019-01-08T22:47:56.661Z · score: 53 (25 votes)
Comment by wei_dai on GiveWell and the problem of partial funding · 2019-01-06T22:50:15.668Z · score: 1 (1 votes) · EA · GW

if A and B both like X (and have the same desired funding level for it), but have different second choices of Y and Z, the fully cooperative solution would not involve either A or B funding X alone.

I'm not sure this is right. What if A and B both commit to fully funding their top charities as soon as they find such opportunities (i.e., without taking other people's reactions into consideration)? That seems like a fully cooperative solution that in expectation would work as well as A and B trying to negotiate a "fair division" of funding for X. Also, I'm not sure this analogy applies to the situation where A is a single big donor and B is a bunch of small donors, since in that case A and B can't actually negotiate, so A unilaterally deciding on a split would seem to lead to some deadweight loss (e.g., missed funding opportunities).

BTW, are you aware of a fully thought-out analysis of Good Ventures' "splitting" policy (whether such a policy is a good idea, and what the optimal split is)? For such an important question, I'm surprised how little apparent deliberation and empirical investigation has been done on it. Even if the value of information here is just 1% of the total funding, that would amount to about $100,000,000. (Not to mention that the analysis could be applied to other analogous situations with large and small donors.)

Comment by wei_dai on Altruistic Motivations · 2019-01-05T08:28:32.911Z · score: 2 (2 votes) · EA · GW

Now I feel bad for naming one of the sections of a recent post "AI design as opportunity and obligation to address human safety problems". I wonder what the Nate-approved way of saying that would be. :)

Comment by wei_dai on Why I expect successful (narrow) alignment · 2019-01-02T00:43:18.776Z · score: 19 (5 votes) · EA · GW

In this comment I engage with many of the object-level arguments in the post. I upvoted this post because I think it's useful to write down these arguments, but we should also consider the counterarguments.

(Also, BTW, I would have preferred the word "narrow" or something like it in the post title, because some people use "alignment" in a broad sense and as a result may misinterpret you as being more optimistic than you actually are.)

If the emergence of AI is gradual or distributed, then it is more plausible that safety issues can adequately be handled “as usual”, by reacting to issues as they arise, by extensive testing and engineering, and by incrementally designing systems to satisfy multiple constraints.

If the emergence of AI is gradual enough, it does seem that safety issues can be handled adequately, but even many people who think "soft takeoff" is likely don't seem to think that AI will come that slowly. And to the extent that AI does emerge that slowly, that consideration cuts against many other AI-related problem areas as well, including ones mentioned in the Summary as alternatives to narrow alignment.

Also, distributed emergence of AI is likely not safer than centralized AI, because an "economy" of AIs would be even harder to control and harness towards human values than a single or small number of AI agents. An argument can be made that AI alignment work is valuable in part so that unified AI agents can be safely built, thereby heading off such a less controllable AI economy.

So it does not seem like "distributed" by itself buys any safety. I think our intuition that it does probably comes from a sense that "distributed" is correlated with "gradual". If you consider a fast and distributed rise of AI, does that really seem safer than a fast and centralized rise of AI?

While alignment looks neglected now, we should also take into account that huge amounts of resources will likely be invested if it becomes apparent that this is a serious problem (see also here).

This assumes that alignment work is highly parallelizable. If it's not, then doing more alignment work now can shift the whole alignment timeline forward, instead of just adding to the total amount of alignment work in a marginal way.

Strong economic incentives will push towards alignment: it’s not economically useful to have a powerful AI system that doesn’t reliably do what you want.

This only applies to short-term "alignment" and not to long-term / scalable alignment. That is, I have an economic incentive to build an AI that I can harness to give me short-term profits, even if that's at the expense of the long term value of the universe to humanity or human values. This could be done for example by creating an AI that is not at all aligned with my values and just giving it rewards/punishments so that it has a near-term instrumental reason to help me (similar to how other humans are useful to us even if they are not value aligned to us).

Existing approaches hold some promise

I have an issue with "approaches" (plural) here because as far as I can tell, everyone is converging to Paul Christiano's iterated amplification approach (except for MIRI which is doing more theoretical research). ETA: To be fair, perhaps iterated amplification should be viewed as a cluster of related approaches.

But the crux is that the notion of human values doesn’t need to be perfect to understand that humans do not approve of lock-ins, that humans would not approve of attempts to manipulate them, and so on.

I think we ourselves don't know how to reliably distinguish between "attempts to manipulate" and "attempts to help", so it would be hard for AIs to learn this. One problem is that our own manipulate/help classifier was trained on a narrow set of inputs (i.e., of other humans manipulating/helping) and will likely fail when applied to AIs due to distributional shift.

Again, it’s not that hard to understand what it means to be a helpful assistant to somebody.

Same problem here, our own understanding of what it means to be a helpful assistant to somebody likely isn't robust to distributional shifts. I think this means we actually need to gain a broad/theoretical understanding of "corrigibility" or "helping" instead of being able to have AIs just learn it from humans.

Comment by wei_dai on Why I expect successful (narrow) alignment · 2018-12-31T22:52:02.083Z · score: 3 (3 votes) · EA · GW

I’m a bit surprised by the 1-10% estimate. This seems very low, especially given that “serious catastrophe caused by machine intelligence” is broader than narrow alignment failure.

Yeah, it's also much lower than my inside view, as well as what I thought a group of such interviewees would say. Aside from Lukas's explanation, I think maybe 1) the interviewees did not want to appear too alarmist (either personally or for EA as a whole) or 2) they weren't reporting their inside views but instead giving their estimates after updating towards others who have much lower risk estimates. Hopefully Robert Wiblin will see my email at some point and chime in with details of how the 1-10% figure was arrived at.

Comment by wei_dai on Why I expect successful (narrow) alignment · 2018-12-30T04:38:41.340Z · score: 5 (7 votes) · EA · GW

Good point, I'll send a message to Robert Wiblin asking for clarification.

Comment by wei_dai on Why I expect successful (narrow) alignment · 2018-12-30T03:32:37.098Z · score: 4 (4 votes) · EA · GW

What do you think about technical interventions on these problems, and "moral uncertainty expansion" as a more cooperative alternative to "moral circle expansion"?

Comment by wei_dai on Why I expect successful (narrow) alignment · 2018-12-30T03:24:35.449Z · score: 35 (16 votes) · EA · GW

I find it unfortunate that people aren't using a common scale for estimating AI risk, which makes it hard to integrate different people's estimates, or even figure out who is relatively more optimistic or pessimistic. For example here's you (Tobias):

My inside view puts ~90% probability on successful alignment (by which I mean narrow alignment as defined below). Factoring in the views of other thoughtful people, some of which think alignment is far less likely, that number comes down to ~80%.

Robert Wiblin, based on interviews with Nick Bostrom, an anonymous leading professor of computer science, Jaan Tallinn, Jan Leike, Miles Brundage, Nate Soares, Daniel Dewey:

We estimate that the risk of a serious catastrophe caused by machine intelligence within the next 100 years is between 1 and 10%.

Paul Christiano:

I think there is a >1/3 chance that AI will be solidly superhuman within 20 subjective years, and that in those scenarios alignment destroys maybe 20% of the total value of the future

It seems to me that Robert's estimate is low relative to your inside view and Paul's, since you're both talking about failures of narrow alignment ("intent alignment" in Paul's current language), while Robert's "serious catastrophe caused by machine intelligence" seems much broader. But you update towards much higher risk based on "other thoughtful people" which makes me think that either your "other thoughtful people" or Robert's interviewees are not representative, or I'm confused about who is actually more optimistic or pessimistic. Either way it seems like there's some very valuable work to be done in coming up with a standard measure of AI risk and clarifying people's actual opinions.

Beyond Astronomical Waste

2018-12-27T09:27:26.728Z · score: 22 (7 votes)
Comment by wei_dai on Which five books would you recommend to an 18 year old? · 2017-09-07T08:10:51.575Z · score: 3 (3 votes) · EA · GW

This is a bit tangential, but do you know if anyone has done an assessment of the impact of HPMoR? Cousin_it (Vladimir Slepnev) recently wrote:

The question then becomes, how do we set up a status economy that will encourage research? Peer review is one way, because publications and citations are a status badge desired by many people. Participating in a forum like LW when it's "hot" and frequented by high status folks is another way, but unfortunately we don't have that anymore. From that perspective it's easy to see why the massively popular HPMOR didn't attract many new researchers to AI risk, but attracted people to HPMOR speculation and rational fic writing. People do follow their interests sometimes, but mostly they try to find venues to show off.

Taking this one step further, it seems to me that HPMoR may have done harm by directing people's attentions (including Eliezer's own) away from doing the hard work of making philosophical and practical progress in AI alignment and rationality, towards discussion/speculation of the book and rational fic writing, thereby contributing to the decline of LW. Of course it also helped bring new people into the rationalist/EA communities. What would be a fair assessment of its net impact?

Comment by wei_dai on Ideological engineering and social control: A neglected topic in AI safety research? · 2017-09-04T23:37:52.207Z · score: 4 (4 votes) · EA · GW

I'm also worried about the related danger of AI persuasion technology being "democratically" deployed upon open societies (i.e., by anyone with an agenda, not necessarily just governments and big corporations), with the possible effect that, in the words of Paul Christiano, "we’ll live to see a world where it’s considered dicey for your browser to uncritically display sentences written by an untrusted party." This is arguably already true today for those especially vulnerable to conspiracy theories, but eventually it will affect more and more people as the technology improves. How will we solve our collective problems when the safety of discussions is degraded to such an extent?

Comment by wei_dai on Why I think the Foundational Research Institute should rethink its approach · 2017-07-22T10:06:39.448Z · score: 9 (9 votes) · EA · GW

The one view that seems unusually prevalent within FRI, apart from people self-identifying with suffering-focused values, is a particular anti-realist perspective on morality and moral reasoning where valuing open-ended moral reflection is not always regarded as the by default "prudent" thing to do.

Thanks for pointing this out. I've noticed this myself in some of FRI's writings, and I'd say this, along with the high amount of certainty on various object-level philosophical questions that presumably cause the disvaluing of reflection about them, are what most "turns me off" about FRI. I worry a lot about potential failures of goal preservation (i.e., value drift) too, but because I'm highly uncertain about just about every meta-ethical and normative question, I see no choice but to try to design some sort of reflection procedure that I can trust enough to hand off control to. In other words, I have nothing I'd want to "lock in" at this point and since I'm by default constantly handing off control to my future self with few safeguards against value drift, doing something better than that default is one of my highest priorities. If other people are also uncertain and place high value on (safe/correct) reflection as a result, that helps with my goal (because we can then pool resources together to work out what safe/correct reflection is), so it's regrettable to see FRI people sometimes argue for more certainty than I think is warranted and especially to see them argue against reflection.

Comment by wei_dai on Why I think the Foundational Research Institute should rethink its approach · 2017-07-21T21:07:49.467Z · score: 4 (6 votes) · EA · GW

I'm a bit surprised to find that Brian Tomasik attributes his current views on consciousness to his conversations with Carl Shulman, since in my experience Carl is a very careful thinker and the case for accepting anti-realism as the answer to the problem of consciousness seems pretty weak, at least as explained by Brian. I'm very curious to read Carl's own explanation of his views, if he has written one down. I scanned Carl Shulman's list of writings but was unable to find anything that addressed this.

Comment by wei_dai on Why I think the Foundational Research Institute should rethink its approach · 2017-07-21T15:39:04.734Z · score: 7 (7 votes) · EA · GW

What would you say are the philosophical or other premises that FRI does accept (or tends to assume in its work), which distinguishes it from other people/organizations working in a similar space such as MIRI, OpenAI, and QRI? Is it just something like "preventing suffering is the most important thing to work on (and the disjunction of assumptions that can lead to this conclusion)"?

It seems to me that a belief in anti-realism about consciousness explains a lot of Brian's (near) certainty about his values and hence his focus on suffering. People who are not so sure about consciousness anti-realism tend to be less certain about their values as a result, and hence don't focus on suffering as much. Does this seem right, and if so, can you explain what premises led you to work for FRI?

Comment by wei_dai on An Argument for Why the Future May Be Good · 2017-07-20T19:50:49.075Z · score: 20 (17 votes) · EA · GW

What lazy solutions will look like seems unpredictable to me. Suppose someone in the future wants to realistically roleplay a historical or fantasy character. The lazy solution might be to simulate a game world with conscious NPCs. The universe contains so much potential for computing power (which presumably can be turned into conscious experiences), that even if a very small fraction of people do this (or other things whose lazy solutions happen to involve suffering), that could create an astronomical amount of suffering.

Comment by wei_dai on My current thoughts on MIRI's "highly reliable agent design" work · 2017-07-15T15:41:19.591Z · score: 2 (2 votes) · EA · GW

Daniel, while re-reading one of Paul's posts from March 2016, I just noticed the following:

[ETA: By the end of 2016 this problem no longer seems like the most serious.] ... [ETA: while robust learning remains a traditional AI challenge, it is not at all clear that it is possible. And meta-execution actually seems like the ingredient furthest from existing ML practice, as well as having non-obvious feasibility.]

My interpretation of this is that between March 2016 and the end of 2016, Paul updated the difficulty of his approach upwards. (I think given the context, he means that other problems, namely robust learning and meta-execution, are harder, not that informed oversight has become easier.) I wanted to point this out to make sure you updated on his update. Clearly Paul still thinks his approach is more promising than HRAD, but perhaps not by as much as before.

Comment by wei_dai on My current thoughts on MIRI's "highly reliable agent design" work · 2017-07-13T17:06:54.027Z · score: 6 (6 votes) · EA · GW

I can't really think of anyone more familiar with MIRI's work than Paul who isn't already at MIRI (note that Paul started out pursuing MIRI's approach and shifted in an ML direction over time).

The Agent Foundations Forum would have been a good place to look for more people familiar with MIRI's work. Aside from Paul, I see Stuart Armstrong, Abram Demski, Vadim Kosoy, Tsvi Benson-Tilsen, Sam Eisenstat, Vladimir Slepnev, Janos Kramar, Alex Mennen, and many others. (Abram, Tsvi, and Sam have since joined MIRI, but weren't employees of it at the time of the Open Phil grant.)

That being said, I agree that the public write-up on the OpenAI grant doesn't reflect that well on OpenPhil, and it seems correct for people like you to demand better moving forward

I had previously seen some complaints about the way the OpenAI grant was made, but until your comment, hadn't thought of a possible group blind spot due to a common ML perspective. If you have any further insights on this and related issues (like why you're critical of deep learning but still think the grant to OpenAI was a pretty good idea, what are your objections to Paul's AI alignment approach, how could Open Phil have done better), would you please write them down somewhere?

Comment by wei_dai on My current thoughts on MIRI's "highly reliable agent design" work · 2017-07-13T11:37:14.017Z · score: 7 (6 votes) · EA · GW

That actually didn't cross my mind before, so thanks for pointing it out. After reading your comment, I decided to look into Open Phil's recent grants to MIRI and OpenAI, and noticed that of the 4 technical advisors Open Phil used for the MIRI grant investigation (Paul Christiano, Jacob Steinhardt, Christopher Olah, and Dario Amodei), all either have an ML background or currently advocate an ML-based approach to AI alignment. For the OpenAI grant, however, Open Phil didn't seem to have similarly engaged technical advisors who might be predisposed to be critical of the potential grantee (e.g., HRAD researchers), and in fact two of the Open Phil technical advisors are also employees of OpenAI (Paul Christiano and Dario Amodei). I have to say this doesn't look very good for Open Phil in terms of making an effort to avoid potential blind spots and bias.

Comment by wei_dai on My current thoughts on MIRI's "highly reliable agent design" work · 2017-07-11T17:45:15.332Z · score: 3 (3 votes) · EA · GW

I can talk in more detail about the reduction from (capability amplification --> agent foundations) if it's not clear whether it is possible and it would have an effect on your view.

Yeah, this is still not clear. Suppose we had a solution to agent foundations, I don't see how that necessarily helps me figure out what to do as H in capability amplification. For example the agent foundations solution could say, use (some approximation of) exhaustive search in the following way, with your utility function as the objective function, but that doesn't help me because I don't have a utility function.

When comparing difficulty of two approaches you should presumably compare the difficulty of achieving a fixed goal with one approach or the other.

My point was that HRAD potentially enables the strategy of pushing mainstream AI research away from opaque designs (which are hard to compete with while maintaining alignment, because you don't understand how they work and you can't just blindly copy the computation that they do without risking safety), whereas in your approach you always have to worry about "how do I compete with an AI that doesn't have an overseer, or has an overseer who doesn't care about safety and just lets the AI use whatever opaque and potentially dangerous technique it wants".

On the agent foundations side, it seems like plausible approaches involve figuring out how to peer inside the previously-opaque hypotheses, or understanding what characteristic of hypotheses can lead to catastrophic generalization failures and then excluding those from induction.

Oh I see. In my mind the problems with Solomonoff Induction means that it's probably not the right way to define how induction should be done as an ideal, so we should look for something kind of like Solomonoff Induction but better, not try to patch it by doing additional things on top of it. (Like instead of trying to figure out exactly when CDT would make wrong decisions and add more complexity on top of it to handle those cases, replace it with UDT.)

Comment by wei_dai on My current thoughts on MIRI's "highly reliable agent design" work · 2017-07-11T08:46:40.560Z · score: 8 (8 votes) · EA · GW

Shouldn't this cut both ways? Paul has also spent far fewer words justifying his approach to others, compared to MIRI.

The fact that Paul hasn't had a chance to hear from many of his (would-be) critics and answer them means we don't have a lot of information about how promising his approach is, hence my "too early to call it more promising than HRAD" conclusion.

I actually do have some objections to it, but I feel it is likely to be significantly useful even if (as I, obviously, expect) my objections end up having teeth.

Have you written down these objections somewhere? My worry is basically that different people looked at Paul's approach and each thought of a different set of objections, and they think, "that's not so bad", without knowing that there's actually a whole bunch of other objections out there, including additional ones that people would find if they thought and talked about Paul's ideas more.

Comment by wei_dai on My current thoughts on MIRI's "highly reliable agent design" work · 2017-07-11T08:42:59.253Z · score: 5 (5 votes) · EA · GW

And as far as I can tell, the most promising approaches to this problem apply both to MIRI's version and the mainstream ML version.

I'm not sure which approaches you're referring to. Can you link to some details on this?

Capability amplification requires breaking cognitive work down into smaller steps. MIRI's approach also requires such a breakdown. Capability amplification is easier in a simple formal sense (that if you solve the agent foundations you will definitely solve capability amplification, but not the other way around).

I don't understand how this is true. I can see how solving FAI implies solving capability amplification (just emulate the FAI at a low level *), but if all you had was a solution that allows a specific kind of agent (e.g., with values well-defined apart from its implementation details) keep those values as it self-modifies, how does that help a group of short-lived humans who don't know their own values break down an arbitrary cognitive task and perform it safely and as well as an arbitrary competitor?

(* Actually, even this isn't really true. In MIRI's approach, an FAI does not need to be competitive in performance with every AI design in every domain. I think the idea is to either convert mainstream AI research into using the same FAI design, or gain a decisive strategic advantage via superiority in some set of particularly important domains.)

My understanding is, MIRI's approach is to figure out how to safely increase capability by designing a base agent that can make safe use of arbitrary amounts of computing power and can safely improve itself by modifying its own design/code. The capability amplification approach is to figure out how to safely increase capability by taking a short-lived human as the given base agent, making copies of it, and organizing how the copies work together. These seem like very different problems with their own difficulties.

I think CEV has avoided those criticisms not because it solves the problem, but because it is sufficiently vague that it's hard to criticize along these lines (and there are sufficiently many other problems that this one isn't even at the top of the list).

I agree that in this area MIRI's approach and yours face similar difficulties. People (including me) have criticized CEV for being vague and likely very difficult to define/implement though, so MIRI is not exactly getting a free pass by being vague. (I.e., I assume Daniel already took this into account.)

But I'm not sure there are fewer such problems than for the MIRI agenda, since I think that being closer to concreteness may more than outweigh the smaller amount of discussion.

This seems like a fair point, and I'm not sure how to weight these factors either. Given that discussion isn't particularly costly relative to the potential benefits, an obvious solution is just to encourage more of it. Someone ought to hold a workshop to talk about your ideas, for example.

I think it would also be a good reason to focus on the difficulties that are common to both approaches

This makes sense.

Comment by wei_dai on My current thoughts on MIRI's "highly reliable agent design" work · 2017-07-11T08:42:52.523Z · score: 1 (3 votes) · EA · GW

the gap seems much smaller to me when it comes to the justification for thinking HRAD is promising vs justification for Paul's approach being promising

This seems wrong to me. For example, in the "learning to reason from humans" approaches, the goal isn't just to learn to reason from humans, but to do it in a way that maintains competitiveness with unaligned AIs. Suppose a human overseer disapproves of their AI using some set of potentially dangerous techniques; how can we then ensure that the resulting AI is still competitive? Once someone points this out, proponents of the approach, to continue thinking their approach is promising, would need to give some details about how they intend to solve this problem. At that point, the justification for thinking the approach is promising becomes more subtle and harder to understand. I think conversations like this have occurred for MIRI's approach far more than for Paul's, which may be a large part of why you find Paul's justifications easier to understand.

Comment by wei_dai on My current thoughts on MIRI's "highly reliable agent design" work · 2017-07-09T08:53:55.155Z · score: 18 (20 votes) · EA · GW

3c. Other research, especially "learning to reason from humans," looks more promising than HRAD (75%?)

From the perspective of an observer who can only judge from what's published online, I'm worried that Paul's approach only looks more promising than MIRI's because it's less "mature", having received less scrutiny and criticism from others. I'm not sure what's happening internally in various research groups, but the amount of online discussion about Paul's approach has to be at least an order of magnitude less than what MIRI's approach has received.

(Looking at the thread cited by Rob Bensinger, various people including MIRI people have apparently looked into Paul's approach but have not written down their criticisms. I've been trying to better understand Paul's ideas myself and point out some difficulties that others may have overlooked, but this is hampered by the fact that Paul seems to be the only person who is working on the approach and can participate on the other side of the discussion.)

I think Paul's approach is certainly one of the most promising approaches we currently have, and I wish people paid more attention to it (and/or wrote down their thoughts about it more), but it seems much too early to cite it as an example of an approach that is more promising than HRAD and therefore makes MIRI's work less valuable.

Comment by wei_dai on I am Nate Soares, AMA! · 2015-06-12T00:51:18.769Z · score: 6 (6 votes) · EA · GW

It seems easy to imagine scenarios where MIRI's work is either irrelevant (e.g., mainstream AI research keeps going in a neuromorphic or heuristic trial-and-error direction and eventually "succeeds" that way) or actively harmful (e.g., publishes ideas that eventually help others to build UFAIs). I don't know how to tell whether MIRI's current strategy overall has positive expected impact. What's your approach to this problem?