Altruistic Motivations

2019-01-04T20:38:24.711Z · score: 30 (18 votes)
Comment by so8res on My current thoughts on MIRI's "highly reliable agent design" work · 2017-07-10T22:46:05.973Z · score: 5 (7 votes) · EA · GW

Can you give an example or two of failure modes or "categories of failure modes that are easy to foresee" that you think are addressed by some HRAD topic? I'd thought previously that thinking in terms of failure modes wasn't a good way to understand HRAD research.

I want to steer clear of language that might make it sound like we’re saying:

  • X 'We can't make broad-strokes predictions about likely ways that AGI could go wrong.'

  • X 'To the extent we can make such predictions, they aren't important for informing research directions.'

  • X 'The best way to address AGI risk is just to try to advance our understanding of AGI in a general and fairly undirected way.'

The things I do want to communicate are:

  • All of MIRI's research decisions are heavily informed by a background view in which there are many important categories of predictable failure, e.g., 'the system is steering toward edges of the solution space', 'the function the system is optimizing correlates with the intended function at lower capability levels but becomes uncorrelated at high capability levels', 'the system has incentives to obfuscate and mislead programmers to the extent it models its programmers’ beliefs and expects false programmer beliefs to result in it better-optimizing its objective function'.

  • The main case for HRAD problems is that we expect them to help in a gestalt way with many different known failure modes (and, plausibly, unknown ones). E.g., 'developing a basic understanding of counterfactual reasoning improves our ability to understand the first AGI systems in a general way, and if we understand AGI better it's likelier we can build systems to address deception, edge instantiation, goal instability, and a number of other problems'.

  • There usually isn't a simple relationship between a particular open problem and a particular failure mode, but if we thought there were no way to predict in advance any of the ways AGI systems can go wrong, or if we thought a very different set of failures were likely instead, we'd have different research priorities.

Comment by so8res on My current thoughts on MIRI's "highly reliable agent design" work · 2017-07-08T21:10:42.917Z · score: 26 (21 votes) · EA · GW

Thanks for this solid summary of your views, Daniel. For others’ benefit: MIRI and Open Philanthropy Project staff are in ongoing discussion about various points in this document, among other topics. Hopefully some portion of those conversations will be made public at a later date. In the meantime, a few quick public responses to some of the points above:

2) If we fundamentally "don't know what we're doing" because we don't have a satisfying description of how an AI system should reason and make decisions, then we will probably make lots of mistakes in the design of an advanced AI system.

3) Even minor mistakes in an advanced AI system's design are likely to cause catastrophic misalignment.

I think this is a decent summary of why we prioritize HRAD research. I would rephrase 3 as "There are many intuitively small mistakes one can make early in the design process that cause resultant systems to be extremely difficult to align with operators’ intentions." I’d compare these mistakes to the “small” decision in the early 1970s to use null-terminated instead of length-prefixed strings in the C programming language, which continues to be a major source of software vulnerabilities decades later.
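
To make the string-representation comparison concrete, here is a minimal sketch in Python (not C) that simulates both layouts over a flat byte buffer. The function names and buffer contents are hypothetical and chosen only for illustration; real C vulnerabilities involve raw memory rather than Python `bytes`, so this shows the shape of the problem, not an actual exploit.

```python
def read_null_terminated(memory: bytes, start: int) -> bytes:
    """Read bytes from `start` up to (not including) the next 0x00 byte."""
    end = memory.find(b"\x00", start)
    if end == -1:  # no terminator anywhere: the read runs to the end of the buffer
        end = len(memory)
    return memory[start:end]


def read_length_prefixed(memory: bytes, start: int) -> bytes:
    """Read a string whose first byte stores its length (at most 255)."""
    length = memory[start]
    return memory[start + 1 : start + 1 + length]


# Hypothetical layout: a 4-byte name field sits directly before other data.
# The writer filled all 4 bytes and left no room for a terminator.
c_style = b"abcd" + b"hunter2\x00"
prefixed = b"\x04abcd" + b"\x07hunter2"

leaked = read_null_terminated(c_style, 0)    # runs past the name field
bounded = read_length_prefixed(prefixed, 0)  # stops after the recorded 4 bytes
```

In this sketch the null-terminated reader returns the name together with the adjacent bytes, because nothing in the data records where the name ends, while the length-prefixed reader cannot run past the recorded length.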

I’d also clarify that I expect any large software product to exhibit plenty of actually-trivial flaws, and that I don’t expect that AGI code needs to be literally bug-free or literally proven-safe in order to be worth running. Furthermore, if an AGI design has an actually-serious flaw, the likeliest consequence that I expect is not catastrophe; it’s just that the system doesn’t work. Another likely consequence is that the system is misaligned, but in obvious ways that make it easy for developers to recognize that deployment is a very bad idea. The end goal is to prevent global catastrophes, but if a safety-conscious AGI team asked how we’d expect their project to fail, the two likeliest scenarios we’d point to are "your team runs into a capabilities roadblock and can't achieve AGI" or "your team runs into an alignment roadblock and can easily tell that the system is currently misaligned, but can’t figure out how to achieve alignment in any reasonable amount of time."

This case does not revolve around any specific claims about specific potential failure modes, or their relationship to specific HRAD subproblems. This case revolves around the value of fundamental understanding for avoiding "unknown unknown" problems.

We worry about "unknown unknowns", but I’d probably give them less emphasis here. We often focus on categories of failure modes that we think are easy to foresee. As a rule of thumb, when we prioritize a basic research problem, it’s because we expect it to help in a general way with understanding AGI systems and make it easier to address many different failure modes (both foreseen and unforeseen), rather than because of a one-to-one correspondence between particular basic research problems and particular failure modes.

As an example, the reason we work on logical uncertainty isn’t that we’re visualizing a concrete failure that we think is highly likely to occur if developers don't understand logical uncertainty. We work on this problem because any system reasoning in a realistic way about the physical world will need to reason under both logical and empirical uncertainty, and because we expect broadly understanding how the system is reasoning about the world to be important for ensuring that the optimization processes inside the system are aligned with the intended objectives of the operators.

A big intuition behind prioritizing HRAD is that solutions to “how do we ensure the system’s cognitive work is being directed at solving the right problems, and at solving them in the desired way?” are likely to be particularly difficult to hack together from scratch late in development. An incomplete (empirical-side-only) understanding of what it means to optimize objectives in realistic environments seems like it will force designers to rely more on guesswork and trial-and-error in a lot of key design decisions.

I haven't found any instances of complete axiomatic descriptions of AI systems being used to mitigate problems in those systems (e.g. to predict, postdict, explain, or fix them) or to design those systems in a way that avoids problems they'd otherwise face.

This seems reasonable to me in general. I’d say that AIXI has had limited influence in part because it’s combining several different theoretical insights that the field was already using (e.g., complexity penalties and backtracking tree search), and the synthesis doesn’t add all that much once you know about the parts. Sections 3 and 4 of MIRI's Approach provide some clearer examples of what I have in mind by useful basic theory: Shannon, Turing, Bayes, etc.

My perspective on this is a combination of “basic theory is often necessary for knowing what the right formal tools to apply to a problem are, and for evaluating whether you're making progress toward a solution” and “the applicability of Bayes, Pearl, etc. to AI suggests that AI is the kind of problem that admits of basic theory.” An example of how this relates to HRAD is that I think that Bayesian justifications are useful in ML, and that a good formal model of rationality in the face of logical uncertainty is likely to be useful in analogous ways. When I speak of foundational understanding making it easy to design the right systems, I’m trying to point at things like the usefulness of Bayesian justifications in modern ML. (I’m unclear on whether we miscommunicated about what sort of thing I mean by “basic insights”, or whether we have a disagreement about how useful principled justifications are in modern practice when designing high-reliability systems.)

Intro to caring about AI alignment as an EA cause

2017-04-14T00:42:16.065Z · score: 24 (24 votes)
Comment by so8res on MIRI Update and Fundraising Case · 2016-10-29T19:00:23.722Z · score: 10 (10 votes) · EA · GW

Under whatever constraints Open Phil provided, I'd have sent the 'best by academic lights' papers I had.

We originally sent Nick Beckstead what we considered our four most important 2015 results, at his request; these were (1) the incompatibility of the "Inductive Coherence" framework and the "Asymptotic Convergence in Online Learning with Unbounded Delays" framework; (2) the demonstration in "Proof-Producing Reflection for HOL" that a non-pathological form of self-referential reasoning is possible in a certain class of theorem-provers; (3) the reflective oracles result presented in "A Formal Solution to the Grain of Truth Problem," "Reflective Variants of Solomonoff Induction and AIXI," and "Reflective Oracles"; and (4) Vadim Kosoy's "Optimal Predictors" work. The papers we listed under 1, 2, and 4 then got used in an external review process they probably weren't very well-suited for.

I think this was more or less just an honest miscommunication. I told Nick in advance that I only assigned an 8% probability to external reviewers thinking the “Asymptotic Convergence…” result was "good" on its own (and only a 20% probability for "Inductive Coherence"). My impression of what happened is that Open Phil staff interpreted my pushback as saying that I thought the external reviews wouldn’t carry much Bayesian evidence (but that the internal reviews still would), where what I was trying to communicate was that I thought the papers didn’t carry very much Bayesian evidence about our technical output (and that I thought the internal reviewers would need to speak to us about technical specifics in order to understand why we thought they were important). Thus, we were surprised when their grant decision and write-up put significant weight on the internal reviews of those papers (and they were surprised that we were surprised). This is obviously really unfortunate, and another good sign that I should have committed more time and care to clearly communicating my thinking from the outset.

Regarding picking better papers for external review: We only put out 10 papers directly related to our technical agendas between Jan 2015 and Mar 2016, so the option space is pretty limited, especially given the multiple constraints Open Phil wanted to meet. Optimizing for technical impressiveness and non-obviousness as a stand-alone result, I might have instead gone with Critch's bounded Löb paper and the grain of truth problem paper over the AC/IC results. We did submit the grain of truth problem paper to Open Phil, but they decided not to review it because it didn't meet other criteria they were interested in.

If MIRI is unable to convince someone like Dewey, the prospects of it making the necessary collaborations or partnerships with the wider AI community look grim.

I’m less pessimistic about building collaborations and partnerships, in part because we’re already on pretty good terms with other folks in the community, and in part because I think we have different models of how technical ideas spread. Regardless, I expect that with more and better communication, we can (upon re-evaluation) raise the probability that Open Phil staff assign to the work we’re doing being important.

More generally, though, I expect this task to get easier over time as we get better at communicating about our research. There's already a body of AI alignment research (and, perhaps, methodology) that requires the equivalent of multiple university courses to understand, but there aren't curricula or textbooks for teaching it. If we can convince a small pool of researchers to care about the research problems we think are important, this will let us bootstrap to the point where we have more resources for communicating information that requires a lot of background and sustained scholarship, as well as more of the institutional signals that this stuff warrants a time investment.

I can maybe make the time expenditure thus far less mysterious if I mention a couple more ways I erred in trying to communicate my model of MIRI's research agenda:

  1. My early discussion with Daniel was framed around questions like "What specific failure mode do you expect to be exhibited by advanced AI systems iff their programmers don't understand logical uncertainty?" I made the mistake of attempting to give straight/non-evasive answers to those sorts of questions and let the discussion focus on that evaluation criterion, rather than promptly saying “MIRI's research directions mostly aren't chosen to directly address a specific failure mode in a notional software system” and “I don't think that's a good heuristic for identifying research that's likely to be relevant to long-run AI safety.”
  2. I fell prey to the transparency illusion pretty hard, and that was completely my fault. Mid-way through the process, Daniel made a write-up of what he had gathered so far; this write-up revealed a large number of miscommunications and places where I thought I had transmitted a concept of mine but Daniel had come away with a very different concept. It’s clear in retrospect that we should have spent a lot more time with me having Daniel try to explain what he thought I meant, and I had all the tools to predict this in foresight; but I foolishly assumed that wouldn’t be necessary in this case.

(I plan to blog more about the details of these later.)

I think these are important mistakes that show I hadn't sufficiently clarified several concepts in my own head, or spent enough time understanding Daniel's position. My hope is that I can do a much better job of avoiding these sorts of failures in the next round of discussion, now that I have a better model of where Open Phil’s staff and advisors are coming from and what the review process looks like.

(I am correct in that Yuan previously worked for you, right?)

Yeah, though that was before my time. He did an unpaid internship with us in the summer of 2013, and we’ve occasionally contracted him to tutor MIRI staff. Qiaochu's also a lot socially closer to MIRI; he attended three of our early research workshops.

Unless and until then, I remain sceptical about MIRI's value.

I think that's a reasonable stance to take, and that there are other possible reasonable stances here too. Some of the variables I expect EAs to vary on include “level of starting confidence in MIRI's mathematical intuitions about complicated formal questions” and “general risk tolerance.” A relatively risk-intolerant donor is right to wait until we have clearer demonstrations of success; and a relatively risk-tolerant donor who starts without a very high confidence in MIRI's intuitions about formal systems might be pushed under a donation threshold by learning that an important disagreement has opened up between us and Daniel Dewey (or between us and other people at Open Phil).

Also, thanks for laying out your thinking in so much detail -- I suspect there are other people who had more or less the same reaction to Open Phil's grant write-up but haven't spoken up about it. I'd be happy to talk more about this over email, too, including answering Qs from anyone else who wants more of my thoughts on this.

Comment by so8res on MIRI Update and Fundraising Case · 2016-10-27T16:26:22.002Z · score: 4 (4 votes) · EA · GW

Thanks for the response, Gregory. I was hoping to see more questions along these lines in the AMA, so I'm glad you followed up.

Open Phil's grant write-up is definitely quite critical, and not an endorsement. One of Open Phil's main criticisms of MIRI is that they don't think our agent foundations agenda is likely to be useful for AI alignment; but their reasoning behind this is complicated, and neither Open Phil nor MIRI has had time yet to write up our thoughts in any detail. I suggest pinging me to say more about this once MIRI and Open Phil have put up more write-ups on this topic, since the hope is that the write-ups will also help third parties better evaluate our research methods on their merits.

I think Open Phil's assessment that the papers they reviewed were ‘technically unimpressive’ is mainly based on the papers "Asymptotic Convergence in Online Learning with Unbounded Delays" and (to a lesser extent) "Inductive Coherence." These are technically unimpressive, in the sense that they're pretty easy results to get once you're looking for them. (The proof in "Asymptotic Convergence..." was finished in less than a week.) From my perspective the impressive step is Scott Garrabrant (the papers’ primary author) getting from the epistemic state (1) ‘I notice AIXI fails in reflection tasks, and that this failure is deep and can't be easily patched’ to:

  • (2) ‘I notice that one candidate for “the ability AIXI is missing that would fix these deep defects” is “learning mathematical theorems while respecting patterns in whether a given theorem can be used to (dis)prove other theorems.”’
  • (3) ‘I notice that another candidate for “the ability AIXI is missing that would fix these deep defects” is “learning mathematical theorems while respecting empirical patterns in whether a claim looks similar to a set of claims that turned out to be theorems.”’
  • (4) ‘I notice that the two most obvious and straightforward ways to formalize these two abilities don't let you get the other ability for free; in fact, the obvious and straightforward algorithm for the first ability precludes possessing the second ability, and vice versa.’

In contrast, I think the reviewers were mostly assessing how difficult it would be to get from 2/3/4 to a formal demonstration that there’s at least one real (albeit impractical) algorithm that can actually exhibit ability 2, and one that can exhibit ability 3. This is a reasonable question to look at, since it's a lot harder to retrospectively assess how difficult it is to come up with a semiformal insight than how difficult it is to formalize the insight; but those two papers weren't really chosen for being technically challenging or counter-intuitive. They were chosen because they help illustrate two distinct easy/straightforward approaches to LU that turned out to be hard to reconcile, and also because (speaking with the benefit of hindsight) conceptually disentangling these two kinds of approaches turned out to be one of the key insights leading to "Logical Induction."

I confess scepticism at this degree of inferential distance, particularly given the Open Phil staff involved in this report involved several people who previously worked with MIRI.

I wasn't surprised that there's a big inferential gap for most of Open Phil's technical advisors -- we haven't talked much with Chris/Dario/Jacob about the reasoning behind our research agenda. I was surprised by how big the gap was for Daniel Dewey, Open Phil's AI risk program officer. Daniel's worked with us before and has a lot of background in alignment research at FHI, and we spent significant time trying to understand each other’s views, so this was a genuine update for me about how non-obvious our heuristics are to high-caliber researchers in the field, and about how much background researchers at MIRI and FHI have in common. This led to a lot of wasted time: I did a poor job addressing Daniel's questions until late in the review process.

I'm not sure what prior probability you should have assigned to ‘the case for MIRI's research agenda is too complex to be reliably communicated in the relevant timeframe.’ Evaluating how promising basic research is for affecting the long-run trajectory of the field of AI is inherently a lot more complicated than evaluating whether AI risk is a serious issue, for example. I don't have as much experience communicating the former, so the arguments are still rough. There are a couple of other reasons MIRI's research focus might have more inferential distance than the typical alignment research project:

  • (a) We've been thinking about these problems for over a decade, so we've had time to arrive at epistemic states that depend on longer chains of reasoning. Similarly, we've had time to explore and rule out various obvious paths (that turn out to be dead ends).
  • (b) Our focus is on topics we don't expect to jibe well with academia and industry, often because they look relatively intractable and unimportant from standard POVs.
  • (c) ‘High-quality nonstandard formal intuitions’ are what we do. This is what put us ahead of the curve on understanding the AI alignment problem, and the basic case for MIRI (from the perspective of people like Holden who see our early analysis and promotion of the alignment problem as our clearest accomplishment) is that our nonstandard formal intuitions may continue to churn out correct and useful insights about AI alignment when we zero in on subproblems. MIRI and FHI were unusual enough to come up with the idea of AI alignment research in the first place, so they're likely to come up with relatively unusual approaches within AI alignment.

Based on the above, I think the lack of mutual understanding is moderately surprising rather than extremely surprising. Regardless, it’s clear that we need to do a better job communicating how we think about choosing open problems to work on.

I note the blogging is by people already in MIRI's sphere of influence/former staff, and MIRI's previous 'blockbuster result' in decision theory has thus far underwhelmed)

I don't think we've ever worked with Scott Aaronson, though we're obviously on good terms with him. Also, our approach to decision theory stirred up a lot of interest from professional decision theorists at last year's Cambridge conference; expect more about this in the next few months.

is not a promissory note that easily justifies an organization with a turnover of $2M/year, nor fundraising for over a million dollars more.

I think this is a reasonable criticism, and I'm hoping our upcoming write-ups will help address this. If your main concern is that Open Phil doesn't think our work on logical uncertainty, reflection, and decision-theoretic counterfactuals is likely to be safety-relevant, keep in mind that Open Phil gave us $500k expecting this to raise our 2016 revenue from $1.6-2 million (the amount of 2016 revenue we projected absent Open Phil's support) to $2.1-2.5 million, in part to observe the ROI of the added $500k. We've received around $384k in our fundraiser so far (with four days to go), which is maybe 35-60% of what we'd expect based on past fundraiser performance. (E.g., we received $597k in our 2014 fundraisers and $955k in our 2015 ones.) Combined with our other non-Open-Phil funding sources, that means we've so far received around $1.02M in 2016 revenue outside Open Phil, which is solidly outside the $1.6-2M range we've been planning around.
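
The revenue figures above can be laid out as a short worked calculation. This is only a sketch of the arithmetic using the numbers stated in the comment (in thousands of USD, as of the date it was written, with the fundraiser still running); the variable names are illustrative.

```python
# All figures in thousands of USD, taken from the comment above.
projected_without_open_phil = (1600, 2000)  # 2016 revenue projected absent Open Phil
open_phil_grant = 500

# Expected 2016 revenue range once the grant is added.
projected_with_grant = tuple(
    amount + open_phil_grant for amount in projected_without_open_phil
)

# Revenue received so far from all non-Open-Phil sources.
received_outside_open_phil = 1020

# Gap between what has come in and each end of the planned range.
gap_to_low_end = projected_without_open_phil[0] - received_outside_open_phil
gap_to_high_end = projected_without_open_phil[1] - received_outside_open_phil

print(projected_with_grant)             # range including the grant
print(gap_to_low_end, gap_to_high_end)  # remaining gap to the planned range
```

Under these numbers the grant moves the expected range to $2.1-2.5 million, and non-Open-Phil revenue sits $580k below the low end of the range MIRI had been planning around.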

There are a lot of reasons donors might be retracting; I’d be concerned if the reason is that they're expecting Open Phil to handle MIRI's funding on their own, or that they're interpreting some action of Open Phil's as a signal that Open Phil wants broadly Open-Phil-aligned donors to scale back support for MIRI.

(In all of the above, I’m speaking only for myself; Open Phil staff and advisors don’t necessarily agree with the above, and might frame things differently.)

Comment by so8res on Ask MIRI Anything (AMA) · 2016-10-13T17:43:03.390Z · score: 9 (9 votes) · EA · GW

Posts or comments on personal Twitter accounts, Facebook walls, etc. should not be assumed to represent any official or consensus MIRI position, unless noted otherwise. I'll echo Rob's comment here that "a good safety approach should be robust to the fact that the designers don’t have all the answers". If an AI project hinges on the research team being completely free from epistemic shortcomings and moral failings, then the project is doomed (and should change how it's doing alignment research).

I suspect we're on the same page about it being important to err in the direction of system designs that don't encourage arms races or other zero-sum conflicts between parties with different object-level beliefs or preferences. See also the CEV discussion above.

Comment by so8res on Ask MIRI Anything (AMA) · 2016-10-13T00:03:00.008Z · score: 10 (10 votes) · EA · GW

In short: there’s a big difference between building a system that follows the letter of the law (but not the spirit), and a system that follows the intent behind a large body of law. I agree that the legal system is a large corpus of data containing information about human values and how humans currently want their civilization organized. In order to use that corpus, we need to be able to design systems that reliably act as intended, and I’m not sure how the legal corpus helps with that technical problem (aside from providing lots of training data, which I agree is useful).

In colloquial terms, MIRI is more focused on questions like “if we had a big corpus of information about human values, how could we design a system to learn from that corpus how to act as intended”, and less focused on the lack of corpus.

The reason that we have to work on corrigibility ourselves is that we need advanced learning systems to be corrigible before they’ve finished learning how to behave correctly from a large training corpus. In other words, there are lots of different training corpuses and goal systems where, if the system is fully trained and working correctly, we get corrigibility for free; the difficult part is getting the system to behave corrigibly before it’s smart enough to be doing corrigibility for the “right reasons”.

Comment by so8res on Ask MIRI Anything (AMA) · 2016-10-12T23:52:03.061Z · score: 3 (3 votes) · EA · GW

Thanks, Benito. With regard to the second half of this question, I suspect that either you’ve misunderstood some of the arguments I’ve made about why our work doesn’t tend to fit into standard academic journals and conferences, or (alternatively) someone has given arguments for why our work doesn’t tend to fit into standard academic venues that I personally disagree with. My view is that our work doesn’t tend to fit into standard journals etc. because (a) we deliberately focus on research that we think academia and industry are unlikely to work on for one reason or another, and (b) we approach problems from a very different angle than the research communities that are closest to those problems.

One example of (b) is that we often approach decision theory not by following the standard philosophical approach of thinking about what decision sounds intuitively reasonable in the first person, but instead by asking “how could a deterministic robot actually be programmed to reliably solve these problems”, which doesn’t fit super well into the surrounding literature on causal vs. evidential decision theory. For a few other examples, see my response to (8) in my comments on the Open Philanthropy Project’s internal and external reviews of some recent MIRI papers.

Comment by so8res on Ask MIRI Anything (AMA) · 2016-10-12T23:40:44.151Z · score: 0 (0 votes) · EA · GW

I'm not sure I understand the hypothetical -- most of the actions that I deem necessary are aimed at affecting the trajectory of the AI field in one way or another.

Comment by so8res on Ask MIRI Anything (AMA) · 2016-10-12T23:38:07.284Z · score: 4 (4 votes) · EA · GW

I think this has been changing in recent years, yes. A number of AI researchers (some of them quite prominent) have told me that they have largely agreed with AI safety concerns for some time, but have felt uncomfortable expressing those concerns until very recently. I do think that the tides are changing here, with the Concrete Problems in AI Safety paper (by Amodei, Olah, et al.) perhaps marking the inflection point. I think that the 2015 FLI conference also helped quite a bit.

Comment by so8res on Ask MIRI Anything (AMA) · 2016-10-12T23:34:08.349Z · score: 2 (2 votes) · EA · GW

I'm not exactly sure what venue it will show up in, but it will very likely be mentioned on the MIRI blog (or perhaps just posted there directly): intelligence.org/blog.

Comment by so8res on Ask MIRI Anything (AMA) · 2016-10-12T23:20:19.216Z · score: 5 (5 votes) · EA · GW

A question from Topher Halquist, on Facebook:

Has MIRI considered hiring a more senior math-Ph.D., to serve in a "postdoc supervisor"-type role?

We considered it, but decided against it because supervision doesn’t seem like a key bottleneck on our research progress. Our priority is just to find people who have the right kinds of math/CS intuitions to formalize the mostly-informal problems we’re working on, and I haven’t found that this correlates with seniority. That said, I'm happy to hire senior mathematicians if we find ones who want to work with us and have the relevant skills.

One thing that is currently a key bottleneck for us is technical writing capability -- if you are interested in MIRI’s research and you’re good at technical exposition, let us know.

Comment by so8res on Ask MIRI Anything (AMA) · 2016-10-12T22:46:07.839Z · score: 5 (5 votes) · EA · GW

Re: 1, "what are the main points of disagreement?" is itself currently one of the points of disagreement :) A lot of our disagreements (I think) come down to diverging inchoate mathematical intuitions, which makes it hard to precisely state why we think different problems are worth prioritizing (or to resolve the disagreements).

Also, I think that different Open Phil technical advisors have different disagreements with us. As an example, Paul Christiano and I seem to have an important disagreement about how difficult it will be to align AI systems if we don’t have a correct theoretically principled understanding of how the system performs its abstract reasoning. But while the disagreement seems to me and Paul to be one of the central reasons the two of us prioritize different projects, I think some other Open Phil advisors don’t see this as a core reason to accept/reject MIRI’s research directions.

Discussions are still ongoing, but Open Phil and MIRI are both pretty time-constrained organizations, so it may take a while for us to publish details on where and why we disagree. My own attempts to gesture at possible points of divergence have been very preliminary so far, and represent my perspective rather than any kind of MIRI / Open Phil consensus summary.

Re: 4, I think we probably spent too much time this year writing up results and research proposals. The ML agenda and “Logical Induction,” for example, were both important to get right, but in retrospect I think we could have gotten away with writing less, and writing it faster. Another candidate mistake is the set of communication errors I made when I was trying to explain the reasoning behind MIRI’s research agenda to Open Phil. I currently attribute the problem to my overestimating how many concepts we shared, and falling prey to the illusion of transparency, in a way that burned a lot of time (though I’m not entirely confident in this analysis).

Comment by so8res on Ask MIRI Anything (AMA) · 2016-10-12T20:56:02.378Z · score: 6 (6 votes) · EA · GW

As Tsvi mentioned, and as Luke has talked about before, we’re not really researching “provable AI”. (I’m not even quite sure what that term would mean.) We are trying to push towards AI systems where the way they reason is principled and understandable. We suspect that that will involve having a good understanding ourselves of how the system performs its reasoning, and when we study different types of reasoning systems we sometimes build models of systems that are trying to prove things as part of how they reason; but that’s very different from trying to make an AI that is “provably X” for some value of X. I personally doubt AGI teams will be able to literally prove anything substantial about how well the system will work in practice, though I expect that they will be able to get some decent statistical guarantees.

There are some big difficulties related to the problem of choosing the right objective to optimize, but currently, that’s not where my biggest concerns are. I’m much more concerned with scenarios where AI scientists figure out how to build misaligned AGI systems well before they figure out how to build aligned AGI systems, as that would be a dangerous regime. My top priority is making it the case that the first AGI designs humanity develops are the kinds of system it’s technologically possible to align with operator intentions in practice. (I’ll write more on this subject later.)

Comment by so8res on MIRI Update and Fundraising Case · 2016-10-12T20:52:06.460Z · score: 2 (2 votes) · EA · GW

There’s nothing very public on this yet. Some of my writing over the coming months will bear on this topic, and some of the questions in Jessica’s agenda are more obviously applicable in “less optimistic” scenarios, but this is definitely a place where public output lags behind our private research.

As an aside, one of our main bottlenecks is technical writing capability: if you have technical writing skill and you’re interested in MIRI research, let us know.

Comment by so8res on Ask MIRI Anything (AMA) · 2016-10-12T20:25:16.307Z · score: 10 (10 votes) · EA · GW

I don’t think of our strategy as having changed much in the last year. For example, in the last AMA I said that the plan was to work on some big open problems (I named 5 here: asymptotically good reasoning under logical uncertainty, identifying the best available decision with respect to a predictive world-model and utility function, performing induction from inside an environment, identifying the referents of goals in realistic world-models, and reasoning about the behavior of smarter reasoners), and that I’d be thrilled if we could make serious progress on any of these problems within 5 years. Scott Garrabrant then promptly developed logical induction, which represents serious progress on two (maybe three) of the big open problems. I consider this to be a good sign of progress, and that set of research priorities remains largely unchanged.

Jessica Taylor is now leading a new research program, and we're splitting our research time between this agenda and our 2014 agenda. I see this as a natural consequence of us bringing on new researchers with their own perspectives on various alignment problems, rather than as a shift in organizational strategy. Eliezer, Benya, and I drafted the agent foundations agenda when we were MIRI’s only full-time researchers; Jessica, Patrick, and Critch co-wrote a new agenda with their take once they were added to the team. The new agenda reflects a number of small changes: some updates that we’ve all made in response to evidence over the last couple of years, some writing-up of problems that we’d been thinking about for some time but which hadn’t made the cut into the previous agenda, and some legitimate differences in intuition and perspective brought to the table by Jessica, Patrick, and Critch. The overall strategy is still “do research that we think others won’t do,” and the research methods and intuitions we rely on continue to have a MIRI-ish character.

Regarding success probability, I think MIRI has a decent chance of success compared to other potential AI risk interventions, but AI risk is a hard problem. I’d guess that humanity as a whole has a fairly low probability of success, with wide error bars.

Unless I’m missing context, I think the “medium probability of success” language comes from old discussions on LessWrong about how to respond to Pascal’s mugging. (See Rob’s note about Pascalian reasoning here.) In that context, I think the main dichotomy Eliezer had in mind was “tiny” probabilities (that can be practically ignored, like gambling in the powerball) and strategically relevant probabilities like 1% or 10%. See Eliezer’s post here. I’m fine with calling the latter probabilities “medium-sized” in the context of lottery-style errors, and calling them “small” in other contexts. With respect to ensuring that the first AGI designs developed by AI scientists are easy to align, I don’t think MIRI’s odds are stellar, though I do feel comfortable saying that they’re higher than 1%. Let me know if I’ve misunderstood the question you had in mind here.

Comment by so8res on Ask MIRI Anything (AMA) · 2016-10-12T20:17:09.561Z · score: 4 (4 votes) · EA · GW

Yep, we often have a number of non-MIRI folks checking the proofs, math, and citations. I’m still personally fairly involved in the writing process (because I write fast, and because I do what I can to free up the researchers’ time to do other work); this is something I’m working to reduce. Technical writing talent is one of our key bottlenecks; if you like technical writing and are interested in MIRI’s research, get in touch.

Comment by so8res on Ask MIRI Anything (AMA) · 2016-10-12T19:19:15.724Z · score: 8 (8 votes) · EA · GW

I largely endorse Jessica’s comment. I’ll add that I think the ideal MIRI researcher has their own set of big-picture views about what’s required to design aligned AI systems, and that their vision holds up well under scrutiny. (I have a number of heuristics for what makes me more or less excited about a given roadmap.)

That is, the ideal researcher isn’t just working on whatever problems catch their eye or look interesting; they’re working toward a solution of the whole alignment problem, and that vision regularly affects their research priorities.

Comment by so8res on Ask MIRI Anything (AMA) · 2016-10-12T18:47:48.155Z · score: 6 (8 votes) · EA · GW

I’ll interpret this question as “what are the most plausible ways for you to lose confidence in MIRI’s effectiveness and/or leave MIRI?” Here are a few ways that could happen for me:

  1. I could be convinced that I was wrong about the type and quality of AI alignment research that the external community is able to do. There’s some inferential distance here, so I'm not expecting to explain my model in full, but in brief, I currently expect that there are a few types of important research that academia and industry won’t do by default. If I was convinced that either (a) there are no such gaps or (b) they will be filled by academia and industry as a matter of course, then I would downgrade my assessment of the importance of MIRI accordingly.
  2. I could learn that our research path was doomed, for one reason or another, and simultaneously learn that repurposing our skill/experience/etc. for other purposes was not worth the opportunity cost of all our time and effort.

Comment by so8res on Ask MIRI Anything (AMA) · 2016-10-12T17:45:34.911Z · score: 10 (10 votes) · EA · GW

I endorse Tsvi's comment above. I'll add that it’s hard to say how close we are to closing basic gaps in understanding of things like “good reasoning”, because mathematical insight is notoriously difficult to predict. All I can say is that logical induction does seem like progress to me, and we're taking various different approaches on the remaining problems. Also, yeah, one of those avenues is a follow-up to PPRHOL. (One experiment we’re running now is an attempt to implement a cellular automaton in HOL that implements a reflective reasoner with access to the source code of the world, where the reasoner uses HOL to reason about the world and itself. The idea is to see whether we can get the whole stack to work simultaneously, and to smoke out all the implementation difficulties that arise in practice when you try to use a language like HOL for reasoning about HOL.)

Comment by so8res on Ask MIRI Anything (AMA) · 2016-10-12T17:26:12.369Z · score: 9 (9 votes) · EA · GW

Good question. The main effect is that I’ve increased my confidence in the vague MIRI mathematical intuitions being good, and the MIRI methodology for approaching big vague problems actually working. This doesn’t constitute a very large strategic shift, for a few reasons. One reason is that my strategy was already predicated on the idea that our mathematical intuitions and methodology are up to the task. As I said in last year’s AMA, visible progress on problems like logical uncertainty (and four other problems) was one of the key indicators of success that I was tracking; and as I said in February, failure to achieve results of this caliber in a 5-year timeframe would have caused me to lose confidence in our approach. (As of last year, that seemed like a real possibility.) The logical induction result increases my confidence in our current course, but it doesn't shift it much.

Another reason logical induction doesn’t affect my strategy too much is that it isn’t that big a result. It’s one step on a path, and it’s definitely mathematically exciting, and it gives answers to a bunch of longstanding philosophical problems, but it’s not a tool for aligning AI systems on the object level. We’re building towards a better understanding of “good reasoning”, and we expect this to be valuable for AI alignment, and logical induction is a step in that direction, but it's only one step. It’s not terribly useful in isolation, and so it doesn’t call for much change in course.

MIRI Update and Fundraising Case

2016-10-09T22:05:50.211Z · score: 18 (18 votes)
Comment by so8res on Let's conduct a survey on the quality of MIRI's implementation · 2016-02-20T03:32:29.862Z · score: 20 (22 votes) · EA · GW

Thanks for the write-up, Rob. OpenPhil actually decided to evaluate our technical agenda last summer, and Holden put Daniel Dewey on the job. The report isn't done yet, in part because it has proven very time-intensive to fully communicate the reasoning behind our research priorities, even to someone with as much understanding of the AI landscape as Daniel Dewey. Separately, we have plans to get an independent evaluation of our organizational efficacy started later in 2016, which I expect to be useful for our admin team as well as prospective donors.

FYI, when it comes to evaluating our research progress, I doubt that the methods you propose would get you much Bayesian evidence. Our published output will look like round pegs shoved into square holes regardless of whether we're doing our jobs well or poorly, because we're doing research that doesn't fit neatly into an existing academic niche. Our objective is to make direct progress on what appear to us to be the main neglected technical obstacles to developing reliable AI systems in the long term, with a goal of shifting the direction of AI research in a big way once we hit certain key research targets; and we're specifically targeting research that isn't compatible with industry's economic incentives or academia's publish-or-perish incentives. To get information about how well we're doing our jobs, I think the key questions to investigate are (1) whether we've chosen good research targets; and (2) whether we're making good progress towards them.

We've been focusing our communication efforts mainly on helping people evaluate (1): I've been working on explaining our approach and agenda, and OpenPhil is also on the job. To investigate (2), we'd need to spend a sizable chunk of time with mathematically adept evaluators — we still haven't hit any of our key research targets, which means that evaluating our progress requires understanding our smaller results and why we think they're progress towards the big results. In practice, we've found that explaining this usually requires explaining why we think the big targets are vital, as this informs (e.g.) which shortcuts are and are not acceptable. I plan to wait until after the OpenPhil report is finished before taking on another time-intensive eval.

Fortunately, (2) will become much easier to evaluate as we achieve (or persistently fail to achieve) those key targets. This also provides us with an opportunity to test our approach and methodology. People who understand our approach and find it uncompelling often predict that some of the results we're shooting for cannot be achieved. This means we'll get some evidence about (1) as we learn more about (2). For example, last year I mentioned "naturalized AIXI" as an ambitious 5-year research target. If we are not able to make concrete progress towards that goal, then over the next four years, I will lose confidence in our approach and eventually change our course dramatically. Conversely, if we make discoveries that are important pieces of that puzzle, I'll update in favor of us being onto something, especially if we find puzzle pieces that knowledgeable critics predicted we wouldn’t find. This data will hopefully start rolling in soon, now that our research team is getting up to size.

("Concrete progress" / "important puzzle pieces" in this case are satisfactory asymptotic algorithms for any of: (1) reasoning under logical uncertainty; (2) identifying the best available decision with respect to a utility function; (3) performing induction from inside an environment; (4) identifying the referents of goals in realistic world-models; and (5) reasoning about the behavior of smarter reasoners; the last of which is hopefully a subset of 1 and 2. The linked papers give rough descriptions of what counts as 'satisfactory' in each case; I'll work to make the desiderata more explicit as time goes on.)

Comment by so8res on Peter Hurford thinks that a large proportion of people should earn to give long term · 2015-08-20T22:48:15.996Z · score: 11 (13 votes) · EA · GW

I want to push back a bit against point #1 ("Let's divide problems into 'funding constrained' and 'talent constrained'."). In my experience recruiting for MIRI, these constraints are tightly intertwined. To hire talent, you need money (and to get money, you often need results, which requires talent).

I think the "are they funding constrained or talent constrained?" model is incorrect, and potentially harmful. In the case of MIRI, imagine we're trying to hire a world-class researcher for $50k/year, and can't find one. Are we talent constrained, or funding constrained? (Our actual researcher salaries are higher than this, but they weren't last year, and they still aren't anywhere near competitive with industry rates.)

Furthermore, there are all sorts of things I could be doing to loosen the talent bottleneck, but only if I knew the money was going to be there. I could be setting up a researcher stewardship program, having seminars run at Berkeley and Stanford, and hiring dedicated recruiting-focused researchers who know the technical work very well and spend a lot of time practicing getting people excited -- but I can only do this if I know we're going to have the money to sustain that program alongside our core research team, and if I know we're going to have the money to make hires. If we reliably bring in only enough funding to sustain modest growth, I'm going to have a very hard time breaking the talent constraint.

And that's ignoring the opportunity costs of being under-funded, which I think are substantial. For example, at MIRI there are numerous additional programs we could be setting up, such as a visiting professor + postdoc program, or a separate team that is dedicated to working closely with all the major industry leaders, or a dedicated team that's taking a different research approach, or any number of other projects that I'd be able to start if I knew the funding would appear. All those things would lead to new and different job openings, letting us draw from a wider pool of talented people (rather than the hyper-narrow pool we currently draw from), and so this too would loosen the talent constraint -- but again, only if the funding was there.

Right now, we have more trouble finding top-notch math talent excited about our approach to technical AI alignment problems than we have raising money, but don't let this fool you -- the talent constraint would be much, much easier to address with more money, and there are many things we aren't doing (for lack of funding) that I think would be high impact.

2015 MIRI Summer Fundraiser: How We Could Scale

2015-07-28T02:43:02.036Z · score: 7 (9 votes)
Comment by so8res on I am Nate Soares, AMA! · 2015-06-12T02:15:14.344Z · score: 9 (9 votes) · EA · GW

All right, I'll come back for one more question. Thanks, Wei. Tough question. Briefly,

(1) I can't see that many paths to victory. The only ones I can see go through either (a) aligned de-novo AGI (which needs to be at least powerful enough to safely prevent maligned systems from undergoing intelligence explosions) or (b) very large amounts of global coordination (which would be necessary to either take our time & go cautiously, or to leap all the way to WBE without someone creating a neuromorph first). Both paths look pretty hard to walk, but in short, (a) looks slightly more promising to me. (Though I strongly support any attempts to widen path (b)!)

(2) It seems to me that the default path leads almost entirely to UFAI: insofar as MIRI research makes it easier for others to create UFAI, most of that effect isn't replacing wins with losses, it's just making the losses happen sooner. By contrast, this sort of work seems necessary in order to keep path (a) open. I don't see many other options. (In other words, I think it's net positive because it creates some wins and moves some losses sooner, and that seems like a fair trade to me.)

To make that a bit more concrete, consider logical uncertainty: if we attain a good formal understanding of logically uncertain reasoning, that's quite likely to shorten AI timelines. But I think I'd rather have a 10-year time horizon and be dealing with practical systems built upon solid foundations that come from a decade's worth of formally understanding what good logically uncertain reasoning looks like, than a 20-year time horizon where we have to deal with systems built using 19 years of hacks and 1 year of patches bolted on at the end.

(In other words, the possibility of improving AI capabilities is the price you have to pay to keep path (a) open.)

A bunch of other factors also play into my considerations (including a heuristic which says "the best way to figure out which problems are the real problems is to start solving the things that appear to be the problems," and another heuristic which says "if you see a big fire, try to put it out, and don't spend too much time worrying about whether putting it out might actually start worse fires elsewhere", and a bunch of others), but those are the big considerations, I think.

Comment by so8res on I am Nate Soares, AMA! · 2015-06-12T00:48:11.704Z · score: 3 (3 votes) · EA · GW

Kinda. The current approach is more like "Pretend you're trying to solve a much easier version of the problem, e.g. where you have a ton of computing power and you're trying to maximize diamond instead of hard-to-describe values. What parts of the problem would you still not know how to solve? Try to figure out how to solve those first."

(1) If we manage to (a) generate a theory of advanced agents under many simplifying assumptions, and then (b) generate a theory of bounded rational agents under far fewer simplifying assumptions, and then (c) figure out how to make highly reliable practical generally intelligent systems, all before anyone else gets remotely close to AGI, then we might consider teching up towards designing AI systems ourselves. I currently find this scenario unlikely.

(2) We're currently far enough away from knowing what the actual architectures will look like that I don't think it's useful to try to build AI components intended for use in an actual AGI at this juncture.

(3) I think that making theorem provers easier to use is an important task and a worthy goal. I'm not optimistic about attempts to merge natural language with Martin-Lof type theory. If you're interested in improving theorem-proving tools in ways that might make it easier to design safe reflective systems in the future, I'd point you more towards trying to implement (e.g.) Marcello's Waterfall in a dependently typed language (which may well involve occasionally patching the language, at this stage).

Comment by so8res on I am Nate Soares, AMA! · 2015-06-12T00:44:30.476Z · score: 6 (6 votes) · EA · GW

You could call it a kind of moral relativism if you want, though it's not a term I would use. I tend to disagree with many self-proclaimed moral relativists: for example, I think it's quite possible for one to be wrong about what they value, and I am not generally willing to concede that Alice thinks murder is OK just because Alice says Alice thinks murder is OK.

Another place I depart from most moral relativists I've met is by mixing in a healthy dose of "you don't get to just make things up." Analogy: we do get to make up the rules of arithmetic, but once we do, we don't get to decide whether 7+2=9. This despite the fact that a "7" is a human concept rather than a physical object (if you grind up the universe and pass it through the finest sieve, you will find no particle of 7). Similarly, if you grind up the universe you'll find no particle of Justice, and value-laden concepts are human concoctions, but that doesn't necessarily mean they bend to our will.
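The arithmetic half of that analogy can be checked mechanically. As a small illustration (Lean 4 is used here only as an example proof checker; nothing above depends on it), once the usual definitions of the natural numbers and addition are fixed, the checker accepts 7+2=9 and leaves no room to decide otherwise:

```lean
-- With the standard natural numbers and addition fixed,
-- the equation holds by computation alone.
example : 7 + 2 = 9 := rfl

-- Any other answer is refutable under the same rules.
example : 7 + 2 ≠ 10 := by decide
```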

My stance can roughly be summarized as "there are facts about what you value, but they aren't facts about the stars or the void, they're facts about you." (The devil's in the details, of course.)

Comment by so8res on I am Nate Soares, AMA! · 2015-06-12T00:35:27.156Z · score: 3 (3 votes) · EA · GW

First, I think that civilization had better be really dang mature before it considers handing over the reins to something like CEV. (Luke has written a bit about civilizational maturity in the past.)

Second, I think that the CEV paper (which is currently 11 years old) is fairly out of date, and I don't necessarily endorse the particulars of it. I do hope, though, that if humanity (or posthumanity) ever builds a singleton, that they build it with a goal of something like taking into account the extrapolated preferences of all sentients and fulfilling some superposition of those in a non-atrocious way. (I don't claim to know how to fill in the gaps there.)

Comment by so8res on I am Nate Soares, AMA! · 2015-06-12T00:31:39.589Z · score: 1 (1 votes) · EA · GW

(1) I suspect it's possible to create an artificial system that exhibits what many people would call "intelligent behavior," and which poses an existential threat, but which is not sentient or conscious. (In the same way that Deep Blue wasn't sentient: it seems to me like optimization power may well be separable from sentience/consciousness.) That's no guarantee, of course, and if we do create a sentient artificial mind, then it will have moral weight in its own right, and that will make our job quite a bit more difficult.

(2) The goal is not to build a sentient mind that wants to destroy humanity but can't. (That's both morally reprehensible and doomed to failure! :-p) Rather, the goal is to successfully transmit the complicated values of humanity into a powerful optimizer.

Have you read Bostrom's The Superintelligent Will? Short version is, it looks possible to build powerful optimizers that pursue goals we might think are valueless (such as an artificial system that, via very clever long-term plans, produces extremely large amounts of diamond, or computes lots and lots of digits of pi). We'd rather not build that sort of system (especially if it's powerful enough to strip the Earth of resources and turn them into diamonds / computing power): most people would rather build something that shares some of our notion of "value," such as respect for truth and beauty and wonder and so on.

It looks like this isn't something you get for free. (In fact, it looks very hard to get: it seems likely that most minds would by default have incentives to manipulate & deceive in order to acquire resources.) We'd rather not build minds that try to turn everything they can into a giant computer for computing digits of pi, so the question is: how do we design the sort of mind that has things like respect for truth and beauty and wonder?

In Hollywood movies, you can just build something that looks cute and fluffy and then it will magically acquire a spark of human-esque curiosity and regard for other sentient life, but in the real world, you've got to figure out how to program in those capabilities yourself (or program something that will reliably acquire them), and that's hard :-)

Comment by so8res on I am Nate Soares, AMA! · 2015-06-12T00:20:37.812Z · score: 3 (3 votes) · EA · GW

The most reliable strategy to date is "ask me" :-)

Comment by so8res on I am Nate Soares, AMA! · 2015-06-12T00:15:08.253Z · score: 3 (3 votes) · EA · GW

Luke talks about the pros and cons of various terms here. Then, long story short, we asked Stuart Russell for some thoughts and settled on "AI alignment" (his suggestion, IIRC).

Comment by so8res on I am Nate Soares, AMA! · 2015-06-12T00:09:38.655Z · score: 4 (4 votes) · EA · GW

Couldn't it be that the returns on intelligence tend to not be very high for a self-improving agent around the human area?

Seems unlikely to me, given my experience as an agent at roughly the human level of intelligence. If you gave me a human-readable version of my source code, the ability to use money to speed up my cognition, and the ability to spawn many copies of myself (both to parallelize effort and to perform experiments with) then I think I'd be "superintelligent" pretty quickly. (In order for the self-improvement landscape to be shallow around the human level, you'd need systems to be very hardware-limited, and hardware currently doesn't look like the bottleneck.)

(I'm also not convinced it's meaningful to talk about "the human level" except in a very broad sense of "having that super powerful domain generality that humans seem to possess", so I'm fairly uncomfortable with terminology such as "20x the human level.")

Comment by so8res on I am Nate Soares, AMA! · 2015-06-12T00:03:04.437Z · score: 2 (2 votes) · EA · GW

Great question! I suggest checking out either our research guide or our technical agenda. The former is geared towards students who are wondering what to study in order to eventually gain the skills to be an AI alignment researcher; the latter is geared more towards professionals who already have the skills and are wondering what the current open problems are.

In your case, I'd guess maybe (1) get some solid foundations via either set theory or type theory, (2) get solid foundations on AI, perhaps via AI: A Modern Approach, (3) brush up on probability theory, formal logic, and causal graphical models, and then (4) dive into the technical agenda and figure out which open problems pique your interest.

Comment by so8res on I am Nate Soares, AMA! · 2015-06-12T00:00:36.425Z · score: 7 (7 votes) · EA · GW

1) The things we have no idea how to do aren't the implicit assumptions in the technical agenda, they're the explicit subject headings: decision theory, logical uncertainty, Vingean reflection, corrigibility, etc :-)

We've tried to make it very clear in various papers that we're dealing with very limited toy models that capture only a small part of the problem (see, e.g., basically all of section 6 in the corrigibility paper).

Right now, we basically have a bunch of big gaps in our knowledge, and we're trying to make mathematical models that capture at least part of the actual problem -- simplifying assumptions are the norm, not the exception. All I can easily say is that common simplifying assumptions include: you have lots of computing power, there is lots of time between actions, you know the action set, you're trying to maximize a given utility function, etc. Assumptions tend to be listed in the paper where the model is described.
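To give a feel for what those simplifying assumptions buy you, here is a minimal sketch of a toy setting of this kind. It is an illustration only, not a model from any of the papers: the function `best_action`, the action names, and the numbers are all hypothetical. The action set is known and finite, the utility function is given, and brute-force evaluation is affordable.

```python
def best_action(actions, outcomes, utility):
    """Pick the action with the highest expected utility.

    Assumes (as the toy setting does) a known, finite action set, a given
    utility function, and enough computing power to evaluate every action.
    outcomes(action) returns a list of (probability, outcome) pairs.
    """
    def expected_utility(action):
        return sum(p * utility(o) for p, o in outcomes(action))

    return max(actions, key=expected_utility)


# Hypothetical example: outcomes are amounts of diamond produced.
def toy_outcomes(action):
    table = {
        "mine": [(0.5, 10), (0.5, 0)],  # risky: expected utility 5
        "wait": [(1.0, 3)],             # safe: expected utility 3
    }
    return table[action]


choice = best_action(["mine", "wait"], toy_outcomes, utility=lambda o: o)
print(choice)
```

With all of that handed over for free, choosing an action is a one-line maximization; the open problems live in the parts this sketch assumes away.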

2) The FLI folks aren't doing any research; rather, they're administering a grant program. Most FHI folks are focused more on high-level strategic questions (What might the path to AI look like? What methods might be used to mitigate xrisk? etc.) rather than object-level AI alignment research. And remember that they look at a bunch of other X-risks as well, and that they're also thinking about policy interventions and so on. Thus, the comparison can't easily be made. (Eric Drexler's been doing some thinking about the object-level FAI questions recently, but I'll let his latest tech report fill you in on the details there. Stuart Armstrong is doing AI alignment work in the same vein as ours. Owain Evans might also be doing object-level AI alignment work, but he's new there, and I haven't spoken to him recently enough to know.)

Insofar as FHI folks would say we're making assumptions, I doubt they'd be pointing to assumptions like "UDT knows the policy set" or "assume we have lots of computing power" (which are obviously simplifying assumptions on toy models), but rather assumptions like "doing research on logical uncertainty now will actually improve our odds of having a working theory of logical uncertainty before it's needed."

3) I think most of the FHI folks & FLI folks would agree that it's important to have someone hacking away at the technical problems, but just to make the arguments more explicit, I think that there are a number of problems that it's hard to even see unless you have your "try to solve FAI" goggles on. Consider: people have been working on some of these problems for decades (logical uncertainty) or even centuries (decision theory) without solving the AI-alignment-relevant parts.

We're still very much trying to work out the initial theory of highly reliable advanced agents. This involves taking various vague philosophical problems ("what even is logical uncertainty?") and turning them into concrete mathematical models (akin to the concrete model of probability theory attained by Kolmogorov & co).

We're still in the preformal stage, and if we can get this theory to the formal stage, I expect we may be able to get a lot more eyes on the problem, because the ever-crawling feelers of academia seem to be much better at exploring formalized problems than they are at formalizing preformal problems.

Then of course there's the heuristic of "it's fine to shout 'model uncertainty!' and hover on the sidelines, but it wasn't the armchair philosophers who did away with the epicycles, it was Kepler, who was up to his elbows in epicycle data." One of the big ways that you identify the things that need working on is by trying to solve the problem yourself. By asking how to actually build an aligned superintelligence, MIRI has generated a whole host of open technical problems, and I predict that that host will be a very valuable asset now that more and more people are turning their gaze towards AI alignment.

Comment by so8res on I am Nate Soares, AMA! · 2015-06-11T23:51:14.803Z · score: 1 (1 votes) · EA · GW

Than a slow takeoff? Yes :-)

Comment by so8res on I am Nate Soares, AMA! · 2015-06-11T23:50:10.596Z · score: 4 (4 votes) · EA · GW

(1) Eventually. Predicting the future is hard. My 90% confidence interval conditioned on no global catastrophes is maybe 5 to 80 years. That is to say, I don't know.

(2) I fairly strongly expect a fast takeoff. (Interesting aside: I was recently at a dinner full of AI scientists, some of them very skeptical about the whole long-term safety problem, who unanimously professed that they expect a fast takeoff -- I'm not sure yet how to square this with the fact that Bostrom's survey showed fast takeoff was a minority position.)

It seems hard (but not impossible) to build something that's better than humans at designing AI systems & has access to its own software and new hardware, which does not self-improve rapidly. Scenarios where this doesn't occur include (a) scenarios where the top AI systems are strongly hardware limited; (b) scenarios where all operators of all AI systems successfully remove all incentives to self-improve; or (c) scenarios where the first AI system is strong enough to prevent all intelligence explosions, but is also constructed such that it does not itself self-improve. The first two scenarios seem unlikely from here; the third is more plausible (if the frontrunners explicitly try to achieve it) but still seems like a difficult target to hit.

(3) I think we're pretty likely to eventually get a singleton: in order to get a multi-polar outcome, you need to have a lot of systems that are roughly at the same level of ability for a long time. That seems difficult but not impossible. (For example, this is much more likely to happen if the early AGI designs are open-sourced and early AGI algorithms are incredibly inefficient such that progress is very slow and all the major players progress in lockstep.)

Remember that history is full of cases where a better way of doing things ends up taking over the world -- humans over the other animals, agriculture dominating hunting & gathering, the Brits, industrialization, etc. (Agriculture and arguably industrialization emerged separately in different places, but in both cases the associated memes still conquered the world.) One plausible outcome is that we get a series of almost-singletons that can't quite wipe out other weaker entities and therefore eventually go into decline (which is also a common pattern throughout history), but I expect superintelligent systems to be much better at "finishing the job" and securing very long-term power than, say, the Romans were. Thus, I expect a singleton outcome in the long run.

The run-up to that may look pretty strange, though.

Comment by so8res on I am Nate Soares, AMA! · 2015-06-11T23:36:32.635Z · score: 3 (3 votes) · EA · GW

We don't have a working definition of "what has intrinsic value." My basic view on these hairy problems ("but what should I value?") is that we really don't want to be coding in the answer by hand. I'm more optimistic about building something that has a few layers of indirection, e.g., something that figures out how to act as intended, rather than trying to transmit your object-level intentions by hand.

In the paper you linked, I think Max is raising a slightly different issue. He's talking about what we would call the ontology identification problem. Roughly, imagine building an AI system that you want to produce lots of diamond. Maybe it starts out with an atomic model of the universe, and you (looking at its model) give it a utility function that scores one point per second for every carbon atom covalently bound to four other carbon atoms (and then time-discounts or something). Later, the system develops a nuclear model of the universe. You do want it to somehow deduce that carbon atoms in the old model map onto six-proton atoms in the new model, and maybe query the user about how to value carbon isotopes in its diamond lattice. You don't want it to conclude that none of these six-proton nuclei pattern-match to "true carbon", and then turn the universe upside down looking for some hidden cache of "true carbon."
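To make the diamond example concrete, here is a toy sketch of the failure mode. Everything in it is hypothetical and invented for illustration (the `Atom` and `Nucleus` classes, the two translation functions, the five-atom "lattice"); it is not taken from any of our papers, and it ignores the per-second scoring and time discounting. It only shows that a utility function written against the old model scores nothing in the new model unless something bridges the two.

```python
from dataclasses import dataclass, field


@dataclass
class Atom:
    """Old ontology: atoms are primitive objects labeled by element name."""
    element: str
    bonds: list = field(default_factory=list)  # indices of covalently bound atoms


@dataclass
class Nucleus:
    """New ontology: there is no "carbon" label, only proton and neutron counts."""
    protons: int
    neutrons: int
    bonds: list = field(default_factory=list)


def diamond_score(atoms):
    """One point per carbon atom covalently bound to four other carbon atoms."""
    return sum(
        1
        for a in atoms
        if a.element == "carbon"
        and len(a.bonds) == 4
        and all(atoms[j].element == "carbon" for j in a.bonds)
    )


def naive_translate(n):
    # Nothing in the new model carries the label "carbon", so nothing matches.
    return Atom(element="unknown", bonds=n.bonds)


def bridged_translate(n):
    # Assumed bridge: six-proton nuclei are what the old model called carbon.
    return Atom(element="carbon" if n.protons == 6 else "other", bonds=n.bonds)


def score_in_new_ontology(nuclei, translate):
    """Evaluate the old utility function on a world described in the new model."""
    return diamond_score([translate(n) for n in nuclei])


def isotopes_to_query(nuclei):
    """Indices of six-proton nuclei that are not carbon-12: cases to ask the user about."""
    return [i for i, n in enumerate(nuclei) if n.protons == 6 and n.neutrons != 6]


def toy_lattice():
    """A center nucleus bound to four neighbors; the last neighbor is carbon-13."""
    center = Nucleus(protons=6, neutrons=6, bonds=[1, 2, 3, 4])
    neighbors = [Nucleus(protons=6, neutrons=6, bonds=[0]) for _ in range(3)]
    neighbors.append(Nucleus(protons=6, neutrons=7, bonds=[0]))
    return [center] + neighbors
```

With the naive translation the lattice contains no "true carbon" and scores zero; with the bridge, the center atom scores, and the carbon-13 neighbor is flagged as something to ask the user about. The hard part of the real problem is that nobody hands the system `bridged_translate`: it has to work out the mapping itself.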

We have a few different papers that mention this problem, albeit shallowly: Ontological Crises in Artificial Agents' Value Systems, The Value Learning Problem, Formalizing Two Problems of Realistic World-Models. There's a lot more work to be done here, and it's definitely on our radar, though also note that work on this problem is at least a little blocked on attaining a better understanding of how to build multi-level maps of the world.

Comment by so8res on I am Nate Soares, AMA! · 2015-06-11T23:32:41.037Z · score: 2 (2 votes) · EA · GW

I mostly agree with Daniel's paper :-)

Comment by so8res on I am Nate Soares, AMA! · 2015-06-11T23:30:11.910Z · score: 4 (4 votes) · EA · GW

Great question! The short version is, writing more & publishing more (and generally engaging with the academic mainstream more) are very high on my priority list.

Mainstream publications have historically been fairly difficult for us, as until last year, AI alignment research was seen as fairly kooky. (We've had a number of papers rejected from various journals due to the "weird AI motivation.") Going forward, it looks like that will be less of an issue.

That said, writing capability is a huge bottleneck right now. Our researchers are currently trying to (a) run workshops, (b) engage with & evaluate promising potential researchers, (c) attend conferences, (d) produce new research, (e) write it up, and (f) get it published. That's a lot of things for a three-person research team to juggle! Priority number 1 is to grow the research team (because otherwise nothing will ever be unblocked), and we're aiming to hire a few new researchers before the year is through. After that, increasing our writing output is likely the next highest priority.

Expect our writing output this year to be similar to last year's (i.e., a small handful of peer-reviewed papers and a larger handful of technical reports that might make it onto the arXiv), and then hopefully we'll have more & higher quality publications starting in 2016 (the publishing pipeline isn't particularly fast).

Comment by so8res on I am Nate Soares, AMA! · 2015-06-11T23:27:09.509Z · score: 3 (3 votes) · EA · GW

Hard to get there. Highly likely that we get to neuromorphic AI along the way. (Low-fidelity images or low-speed partial simulations are likely very useful for learning more about intelligence, and I currently expect that the caches of knowledge unlocked on the way to WBE probably get you to AI before the imaging/hardware supports WBE.)

Comment by so8res on I am Nate Soares, AMA! · 2015-06-11T23:19:23.845Z · score: 6 (6 votes) · EA · GW

Short version: FAI. (You said "hope", not "expect" :-p)

Longer version: Hard question, both because (a) I don't know how you want me to trade off between how nice the advance would be and how likely we are to get it, and (b) my expectations for the next five years are very volatile. In the year since Nick Bostrom released Superintelligence, there has been a huge wave of interest in the future of AI (due in no small part to the efforts of FLI and their wonderful Puerto Rico conference!), and my expectations of where I'll be in five years range all the way from "well that was a nice fad while it lasted" to "oh wow there are billions of dollars flowing into the field".

But I'll do my best to answer. The most obvious Schelling point I'd like to hit in 5 years is "fully naturalized AIXI," that is, a solid theoretical understanding of how we would "brute force" an FAI if we had ungodly amounts of computing power. (AIXI is an equation that Marcus Hutter uses to define an optimal general intelligence under certain simplifying assumptions that don't hold in the real world: AIXI is sufficiently powerful that you could use it to destroy the world while demonstrating something that would surely look like "intelligence" from the outside, but it's not yet clear how you could use it to build a generally intelligent system that maximizes something in the world -- for example, even if you gave me unlimited computing power, I wouldn't yet know how to write the program that stably and reliably pursues the goal of turning as much of the universe as possible into diamond.)
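For readers who haven't seen it, the AIXI equation mentioned above is commonly stated in roughly the following form. The notation is the usual one from Hutter's presentation, not something defined in this comment: $a_i$, $o_i$, $r_i$ are the action, observation, and reward at step $i$; $m$ is the horizon; $U$ is a universal Turing machine; $q$ ranges over programs; and $\ell(q)$ is the length of $q$ in bits.

```latex
a_k \;:=\; \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m}
  \bigl[\, r_k + \cdots + r_m \,\bigr]
  \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
```

Note that the agent only ever maximizes the reward signal $r$ arriving on its input channel, and sits outside the environment programs $q$ it reasons about. That is the sense in which it isn't "naturalized," and why it doesn't directly tell you how to maximize something out in the world, like diamond.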

Formalizing "fully naturalized AIXI" would require a better understanding of decision theory (How do we want advanced systems to reason about counterfactuals? Preferences alone are not enough to determine what counts as a "good action"; that notion also depends on how you evaluate the counterfactual consequences of taking various actions, and we lack a theory of idealized counterfactual reasoning.), logical uncertainty (What does it even mean for a reasoner to reason reliably about something larger than the reasoner? Solomonoff induction basically works by having the reasoner be just friggin' bigger than the environment, and I'd be thrilled if we could get a working theoretical model of "good reasoning" in cases where the reasoner is smaller than the environment), and a whole host of other problems (many of them covered in our technical agenda).

5 years is a pretty wildly optimistic timeline for developing fully naturalized AIXI, though, and I'd be thrilled if we could make concrete progress in any one of the topic areas listed in the technical agenda.

Comment by so8res on I am Nate Soares, AMA! · 2015-06-11T23:12:36.142Z · score: 4 (4 votes) · EA · GW

We're actually going to be hiring a full-time office manager soon: someone who can just Make Stuff Happen and free up a lot of our day-to-day workload. Keep your eyes peeled; we'll be advertising the opening soon.

Additionally, we're hurting for researchers who can write fast & well, and before too long we'll be looking for a person who can stay up to speed on the technical research but spend most of their time doing outreach and stewarding other researchers who are interested in doing AI alignment research. Both of these jobs would require a bit less technical ability than is required to make new breakthroughs in the field.

Comment by so8res on I am Nate Soares, AMA! · 2015-06-11T23:08:59.014Z · score: 11 (11 votes) · EA · GW

That post mixes a bunch of different assertions together. Let me try to distill a few of them out and answer them in turn:


One of Peter's first (implicit) points is that AI alignment is a speculative cause. I tend to disagree.

Imagine it's 1942. The Manhattan project is well under way, Leo Szilard has shown that it's possible to get a neutron chain reaction, and physicists are hard at work figuring out how to make an atom bomb. You suggest that this might be a fine time to start working on nuclear containment, so that, once humans are done bombing the everloving breath out of each other, they can harness nuclear energy for fun and profit. In this scenario, would nuclear containment be a "speculative cause"?

There are currently thousands of person-hours and billions of dollars going towards increasing AI capabilities every year. To call AI alignment a "speculative cause" in an environment such as this one seems fairly silly to me. In what sense is it speculative to work on improving the safety of the tools that other people are currently building as fast as they can? Now, I suppose you could argue that either (a) AI will never work or (b) it will be safe by default, but both those arguments seem pretty flimsy to me.

You might argue that it's a bit weird for people to claim that the most effective place to put charitable dollars is towards some field of scientific study. Aren't charitable dollars supposed to go to starving children? Isn't the NSF supposed to handle scientific funding? And I'd like to agree, but society has kinda been dropping the ball on this one.

If we had strong reason to believe that humans could build strangelets, and society were pouring billions of dollars and thousands of human-years into making strangelets, and almost no money or effort was going towards strangelet containment, and it looked like humanity was likely to create a strangelet sometime in the next hundred years, then yeah, I'd say that "strangelet safety" would be an extremely worthy cause.

How worthy? Hard to say. I agree with Peter that it's hard to figure out how to trade off "safety of potentially-very-highly-impactful technology that is currently under furious development" against "children are dying of malaria", but the only way I know how to trade those things off is to do my best to run the numbers, and my back-of-the-envelope calculations currently say that AI alignment is further behind than the globe is poor.

Now that the EA movement is starting to look more seriously into high-impact interventions on the frontiers of science & mathematics, we're going to need to come up with more sophisticated ways to assess the impacts and tradeoffs. I agree it's hard, but I don't think throwing out everything that doesn't visibly pay off in the extremely short term is the answer.


Alternatively, you could argue that MIRI's approach is unlikely to work. That's one of Peter's explicit arguments: it's very hard to find interventions that reliably affect the future far in advance, especially when there aren't hard objective metrics. I have three disagreements with Peter on this point.

First, I think he picks the wrong reference class: yes, humans have a really hard time generating big social shifts on purpose. But that doesn't necessarily mean humans have a really hard time generating math -- in fact, humans have a surprisingly good track record when it comes to generating math!

Humans actually seem to be pretty good at putting theoretical foundations underneath various fields when they try, and various people have demonstrably succeeded at this task (Church & Turing did this for computing, Shannon did this for information theory, Kolmogorov did a fair bit of this for probability theory, etc.). This suggests to me that humans are much better at producing technical progress in an unexplored field than they are at generating social outcomes in a complex economic environment. (I'd be interested in any attempt to quantitatively evaluate this claim.)

Second, I agree in general that any one individual team isn't all that likely to solve the AI alignment problem on their own. But the correct response to that isn't "stop funding AI alignment teams" -- it's "fund more AI alignment teams"! If you're trying to ensure that nuclear power can be harnessed for the betterment of humankind, and you assign low odds to any particular research group solving the containment problem, then the answer isn't "don't fund any containment groups at all," the answer is "you'd better fund a few different containment groups, then!"

Third, I object to the whole "there's no feedback" claim. Did Kolmogorov have tight feedback when he was developing an early formalization of probability theory? It seems to me like the answer is "yes" -- figuring out what was & wasn't a mathematical model of the properties he was trying to capture served as a very tight feedback loop (mathematical theorems tend to be unambiguous), and indeed, it was sufficiently good feedback that Kolmogorov was successful in putting formal foundations underneath probability theory. We're trying to do something similar with various other confusing aspects of good reasoning (such as logical uncertainty), and you're welcome to raise concerns about whether we need to understand good reasoning under logical uncertainty in order to build an aligned AI, but saying that there's "no feedback loop" seems to just misunderstand the approach.

Comment by so8res on I am Nate Soares, AMA! · 2015-06-11T22:59:31.869Z · score: 5 (5 votes) · EA · GW

(1) That's not quite how I'd characterize the current technical agenda. Rather, I'd say that in order to build an AI aligned with human interests, you need to do three things: (a) understand how to build an AI that's aligned with anything (could you build an AI that reliably builds as much diamond as possible?), (b) understand how to build an AI that assists you in correcting things-you-perceive-as-flaws (this doesn't come for free, but it's pretty important, because humans are bad at getting software right on the first try), and (c) figure out how to build a machine that can safely learn human values & intentions from training data.

We're currently splitting our time between all these problems. It's not that we haven't focused on the value learning problem yet; rather, it's that the value learning problem is only a fraction of the whole problem. We'll keep working on all the parts, and I'm not sure which parts will yield first. I can't give you a timeline on how long various parts will take; scientific progress is very hard to predict.

(2) I wouldn't currently say that "formal logic is the arena in which MIRI's technical work takes place" -- if anything, “math in general” is the arena, and that will probably remain the case until we have a much better understanding of the problems we're trying to solve (and how to solve simplified versions of them), at which point computer programming will become much more essential. Again, it's hard to say how long it will take to get there, because scientific progress is hard to predict.

Formal logic is one of many tools useful in mathematics (alongside probability theory, statistics, linear algebra, etc.) that shows up fairly frequently in our work, but I don’t think of our work as "focused on formal logic." I don't think we'll "move away from formal logic" at a particular time; rather, we'll just use whichever mathematical tools look useful for the problems at hand. That will change as the problems change :-)

Comment by so8res on I am Nate Soares, AMA! · 2015-06-11T22:47:51.830Z · score: 6 (6 votes) · EA · GW

There's a big spectrum, there. Some people think that no matter what the AI does that's fine because it's our progeny (even if it turns as much matter as it can into a giant computer so it can find better YouTube recommendations). Other people think that you can't actually build a superintelligent paperclip maximizer (because maximizing paperclips would be stupid, and we're assuming that it's intelligent). Other people think that yeah, you don't get good behavior by default, but AI is hundreds and hundreds of years off, so we don't need to start worrying now. Other people think that AI alignment is a pressing concern now but that improving our theoretical understanding of what we're trying to do isn't the missing puzzle piece. I interface with each of these different types of people in very different ways.

To actually answer your question, though, the default interface is "publish papers, attend conferences," with a healthy dose of "talk to people in person when they're in town" mixed in :-)

Comment by so8res on I am Nate Soares, AMA! · 2015-06-11T22:41:36.349Z · score: 6 (6 votes) · EA · GW
  1. (a) grow the research team, (b) engage more with mainstream academia. I'd also like to spend some time experimenting to figure out how to structure the research team so as to make it more effective (we have a lot of flexibility here that mainstream academic institutes don't have). Once we have the first team growing steadily and running smoothly, it's not entirely clear whether the next step will be (c.1) grow it faster or (c.2) spin up a second team inside MIRI taking a different approach to AI alignment. I'll punt that question to future-Nate.
  2. So first of all, I'm not convinced that there's less of a role for supporters. If we had just ten people earning-to-give at the (amazing!) level of Ethan Dickinson, Jesse Liptrap, Mike Blume, or Alexei Andreev (note: Alexei recently stopped earning-to-give in order to found a startup), that would bring in as much money per year as the Thiel Foundation. (I think people often vastly overestimate how many people are earning-to-give to MIRI, and underestimate how useful it is: the small donors taken together make a pretty big difference!)

Furthermore, if we successfully execute on (a) above, then we're going to be burning through money quite a bit faster than before. An FLI grant (if we get one) will certainly help, but I expect it's going to be a little while before MIRI can support itself on large donations & grants alone.

As for how I plan to keep supporters engaged & donating, I don't expect it will be that much of a problem: I think that many of our donors are excited to see us publish peer-reviewed papers, attend conferences, and engage in the ongoing global conversation. It's hard for me to say for sure, but it seems quite likely that the last year has been much more exciting for MIRI donors than the previous few years, even though there was no Singularity Summit and most of our output was math.

Comment by so8res on I am Nate Soares, AMA! · 2015-06-11T22:34:26.047Z · score: 7 (7 votes) · EA · GW

(1) I don't want to put words in their mouths. I'm guessing that most of us have fairly broad priors over what may happen, though. The future's hard to predict.

(2) Depends what you mean by "Friendly AI research." Does AI boxing count? Does improving the transparency of ML algorithms count? Once the FLI grants start going through, there will be lots of people doing long-term AI safety research that may well be useful, so if you count that as FAI research, then the answer is "there will be soon." But if by "FAI research" you mean "working towards a theoretical understanding of highly reliable advanced agents," then the answer is "not to my knowledge, no."

Comment by so8res on I am Nate Soares, AMA! · 2015-06-11T22:32:46.640Z · score: 5 (7 votes) · EA · GW

(1) Things Executive!Nate will do differently from Researcher!Nate? Or things Nate!MIRI will do differently from Luke!MIRI? For the former, I'll be thinking lots more about global coordination & engaging with interested academics etc, and lots less about specific math problems. For the latter, the biggest shift is probably going to be something like "more engagement with the academic mainstream," although it's a bit hard to say: Luke probably would have pushed in that direction too, after growing the research team a bit. (I have a lot of opportunities available to me that weren't available to Luke at this time last year.)

(2) The old SIAI definitely made some obvious mistakes; see e.g. Holden Karnofsky’s 2012 critique. Luke tried to transfer a number of the lessons learned to me, but it remains to be seen whether I actually learned them :-) The concrete list includes things like (a) constantly drive to systematize, automate, and outsource the busywork; (b) always attack the biggest constraint (by contrast, most people seem to have a default mode of "try and do everything that meets a certain importance level"); (c) put less emphasis on explicit models that you've built yourself and more emphasis on advice from others who have succeeded in doing something similar to what you're trying to do.

(3) MIRI played a pretty big role in getting long-term AI alignment issues onto the world stage. There are lots and lots of things I've learned from that particular success. Perhaps the biggest is "don't disregard intellectual capital."

Comment by so8res on I am Nate Soares, AMA! · 2015-06-11T22:29:27.449Z · score: 5 (5 votes) · EA · GW

1) Huh, that hasn't been my experience. We have a number of potential donors who ring us up and ask who in AI alignment needs money the most at the moment. (In fact, last year, we directed a number of donors to FHI, who had much more of a funding gap than MIRI did at that time.)

2) If MIRI disappeared and everything else was held constant, then I'd be pretty concerned about the lack of people focused on the object level problems. (I'll talk more about why I think this is so important in a little bit; I'm pretty sure at least one other person asks that question more directly.) There'd still be a few people working on the object level problems (Stuart Russell, Stuart Armstrong), but I'd want lots more. In fact, that statement is also true in the actual world! We only have three people on the research team right now, remember, with a fourth joining in August.

In other words, if you were to find yourself in a world like this one except without a MIRI, then I would strongly suggest building something like a MIRI :-)

Comment by so8res on I am Nate Soares, AMA! · 2015-06-11T22:25:11.739Z · score: 6 (6 votes) · EA · GW

(1) Not great. (2) Not great.

(To be clear, right now, MIRI is not attempting to build an AGI. Rather, we're working towards a better theoretical understanding of the problem.)

Comment by so8res on I am Nate Soares, AMA! · 2015-06-11T22:23:11.430Z · score: 7 (7 votes) · EA · GW

Yep :-)

The official mission statement is just "has a positive impact." I'll encourage people to also use phrasing that's more inclusive to other sentients in future papers/communications.

Comment by so8res on I am Nate Soares, AMA! · 2015-06-11T22:19:44.544Z · score: 1 (1 votes) · EA · GW

I’m not sure how to interpret this question: are you asking how much money I'd like to see dumped on other people? I’d like to see lots of money dumped on lots of other people, and for now I’m going to delegate to the GiveWell, Open Philanthropy Project, and Good Ventures folks to figure out who and how much :-)

I am Nate Soares, AMA!

2015-06-10T15:47:38.362Z · score: 18 (20 votes)

The Value of a Life

2015-02-17T17:33:56.374Z · score: 7 (7 votes)

On Caring

2014-10-07T05:12:16.238Z · score: 18 (18 votes)