## Posts

A case for AGI safety research far in advance 2021-03-26T12:59:36.244Z
[U.S. specific] PPP: free money for self-employed & orgs (time-sensitive) 2021-01-09T19:39:14.250Z

Comment by steve2152 on Why aren't you freaking out about OpenAI? At what point would you start? · 2021-10-14T00:47:33.827Z · EA · GW

Vicarious and Numenta are both explicitly trying to build AGI, and neither does any safety/alignment research whatsoever. I don't think this fact is particularly relevant to OpenAI, but I do think it's an important fact in its own right, and I'm always looking for excuses to bring it up. :-P

Anyone who wants to talk about Vicarious or Numenta in the context of AGI safety/alignment, please DM or email me.  :-)

Comment by steve2152 on Why does (any particular) AI safety work reduce s-risks more than it increases them? · 2021-10-07T20:34:10.631Z · EA · GW

I don't really distinguish between effects by order*

I agree that direct and indirect effects of an action are fundamentally equally important (in this kind of outcome-focused context) and I hadn't intended to imply otherwise.

Comment by steve2152 on Why does (any particular) AI safety work reduce s-risks more than it increases them? · 2021-10-07T14:41:08.742Z · EA · GW

Hmm, it seems to me (and you can correct me) that we should be able to agree that there are SOME technical AGI safety research publications that are positive under some plausible beliefs/values and harmless under all plausible beliefs/values, and then we don't have to talk about cluelessness and tradeoffs, we can just publish them.

And we both agree that there are OTHER technical AGI safety research publications that are positive under some plausible beliefs/values and negative under others. And then we should talk about your portfolios etc. Or more simply, on a case-by-case basis, we can go looking for narrowly-tailored approaches to modifying the publication in order to remove the downside risks while maintaining the upside.

I feel like we're arguing past each other: I keep saying the first category exists, and you keep saying the second category exists. We should just agree that both categories exist! :-)

Perhaps the more substantive disagreement is what fraction of the work is in which category. I see most but not all ongoing technical work as being in the first category, and I think you see almost all ongoing technical work as being in the second category. (I think you agreed that "publishing an analysis about what happens if a cosmic ray flips a bit" goes in the first category.)

(Luke says "AI-related" but my impression is that he mostly works on AGI governance not technical, and the link is definitely about governance not technical. I would not be at all surprised if proposed governance-related projects were much more heavily weighted towards the second category, and am only saying that technical safety research is mostly first-category.)

For example, if you didn't really care about s-risks, then publishing useful considerations for those who are concerned about s-risks might take attention away from your own priorities, or it might increase cooperation, and the default position to me should be deep uncertainty/cluelessness here, not that it's good in expectation or bad in expectation or 0 in expectation.

This points to another (possible?) disagreement. I think maybe you have the attitude where (to caricature somewhat) if there's any downside risk whatsoever, no matter how minor or far-fetched, you immediately jump to "I'm clueless!". Whereas I'm much more willing to say: OK, I mean, if you do anything at all there's a "downside risk" in a sense, just because life is uncertain, who knows what will happen, but that's not a good reason to just sit on the sidelines and let nature take its course and hope for the best. If I have a project whose first-order effect is a clear and specific and strong upside opportunity, I don't want to throw that project out unless there's a comparably clear and specific and strong downside risk. (And of course we are obligated to try hard to brainstorm what such a risk might be.) Like if a firefighter is trying to put out a fire, and they aim their hose at the burning interior wall, they don't stop and think, "Well I don't know what will happen if the wall gets wet, anything could happen, so I'll just not pour water on the fire, y'know, don't want to mess things up."

The "cluelessness" intuition gets its force from having a strong and compelling upside story weighed against a strong and compelling downside story, I think.

If the first-order effect of a project is "directly mitigating an important known s-risk", and the second-order effects of the same project are "I dunno, it's a complicated world, anything could happen", then I say we should absolutely do that project.

Comment by steve2152 on Why does (any particular) AI safety work reduce s-risks more than it increases them? · 2021-10-07T02:55:21.994Z · EA · GW

In practice, we can't really know with certainty that we're making AI safer, and without strong evidence/feedback, our judgements of tradeoffs may be prone to fairly arbitrary subjective judgements, motivated reasoning and selection effects.

This strikes me as too pessimistic. Suppose I bring a complicated new board game to a party. Two equally-skilled opposing teams each get a copy of the rulebook to study for an hour before the game starts. Team A spends the whole hour poring over the rulebook and doing scenario planning exercises. Team B immediately throws the rulebook in the trash and spends the hour watching TV.

Neither team has "strong evidence/feedback": they haven't started playing yet. Team A could think they have good strategy ideas while actually engaging in arbitrary subjective judgments and motivated reasoning. Indeed, their strategy ideas, which seemed good on paper, could turn out to be counterproductive!

Still, I would put my money on Team A beating Team B. Because Team A is trying. Their planning abilities don't have to be all that good to be strictly better (in expectation) than "not doing any planning whatsoever, we'll just wing it". That's a low bar to overcome!

So by the same token, it seems to me that vast swathes of AGI safety research easily surpass the (low) bar of doing better in expectation than the alternative of "Let's just not think about it in advance, we'll wing it".

For example, compare (1) a researcher spends some time thinking about what happens if a cosmic ray flips a bit (or a programmer makes a sign error, like in the famous GPT-2 incident), versus (2) nobody spends any time thinking about that. (1) is clearly better, right? We can always be concerned that the person won't do a great job, or that it will be counterproductive because they'll happen across very dangerous information and then publish it, etc. But still, the expected value here is clearly positive, right?

You also bring up the idea that (IIUC) there may be objectively good safety ideas but they might not actually get implemented because there won't be a "strong and justified consensus" to do them. But again, the alternative is "nobody comes up with those objectively good safety ideas in the first place". That's even worse, right? (FWIW I consider "come up with crisp and rigorous and legible arguments for true facts about AGI safety" to be a major goal of AGI safety research.)

Anyway, I'm objecting to undirected general feelings of "gahhhh we'll never know if we're helping at all", etc. I think there's just a lot of stuff in the AGI safety research field which is unambiguously good in expectation, where we don't have to feel that way. What I don't object to, and indeed what I strongly endorse, is taking a more directed approach and saying "For AGI safety research project #732, what are the downside risks of this research, and how do they compare to the upsides?"

So that brings us to "ambitious value alignment". I agree that an ambitiously-aligned AGI comes with a couple potential sources of s-risk that other types of AGI wouldn't have, specifically via (1) sign flip errors, and (2) threats from other AGIs. (Although I think (1) is less obviously a problem than it sounds, at least in the architectures I think about.) On the other hand, (A) I'm not sure anyone is really working on ambitious alignment these days … at least Rohin Shah & Paul Christiano have stated that narrow (task-limited) alignment is a better thing to shoot for (and last anyone heard MIRI was shooting for task-limited AGIs too); (B) my sense is that current value-learning work (e.g. at CHAI) is more about gaining conceptual understanding than creating practical algorithms / approaches that will scale to AGI. That said, I'm far from an expert on the current value learning literature; frankly I'm often confused by what such researchers are imagining for their longer-term game-plan.

BTW I put a note on my top comment that I have a COI. If you didn't notice. :)

Comment by steve2152 on Why does (any particular) AI safety work reduce s-risks more than it increases them? · 2021-10-06T17:57:23.503Z · EA · GW

Hmm, just a guess, but …

• Maybe you're conceiving of the field as "AI alignment", pursuing the goal "figure out how to bring an AI's goals as close as possible to a human's (or humanity's) goals, in their full richness" (call it "ambitious value alignment")
• Whereas I'm conceiving the field as "AGI safety", with the goal "reduce the risk of catastrophic accidents involving AGIs".

"AGI safety research" (as I think of it) includes not just how you would do ambitious value alignment, but also whether you should do ambitious value alignment. In fact, AGI safety research may eventually result in a strong recommendation against doing ambitious value alignment, because we find that it's dangerously prone to backfiring, and/or that some alternative approach is clearly superior (e.g. CAIS, or microscope AI, or act-based corrigibility or myopia or who knows what). We just don't know yet. We have to do the research.

"AGI safety research" (as I think of it) also includes lots of other activities like analysis and mitigation of possible failure modes (e.g. asking what would happen if a cosmic ray flips a bit in the computer), and developing pre-deployment testing protocols, etc. etc.

Does that help? Sorry if I'm missing the mark here.

Comment by steve2152 on Why does (any particular) AI safety work reduce s-risks more than it increases them? · 2021-10-04T17:42:01.521Z · EA · GW

Thanks!

(Incidentally, I don't claim to have an absolutely watertight argument here that AI alignment research couldn't possibly be bad for s-risks, just that I think the net expected impact on s-risks is to reduce them.)

If s-risks were increased by AI safety work near (C), why wouldn't they also be increased near (A), for the same reasons?

I think suffering minds are a pretty specific thing, in the space of "all possible configurations of matter". So optimizing for something random (paperclips, or "I want my field-of-view to be all white", etc.) would almost definitely lead to zero suffering (and zero pleasure). (Unless the AGI itself has suffering or pleasure.) However, there's a sense in which suffering minds are "close" to the kinds of things that humans might want an AGI to want to do. Like, you can imagine how if a cosmic ray flips a bit, "minimize suffering" could turn into "maximize suffering". Or at any rate, humans will try (and I expect succeed even without philanthropic effort) to make AGIs with a prominent human-like notion of "suffering", so that it's on the table as a possible AGI goal.

In other words, imagine you're throwing a dart at a dartboard.

• The bullseye has very positive point value.
• That's representing the fact that basically no human wants astronomical suffering, and basically everyone wants peace and prosperity etc.
• On other parts of the dartboard, there are some areas with very negative point value.
• That's representing the fact that if programmers make an AGI that desires something vaguely resembling what they want it to desire, that could be an s-risk.
• If you miss the dartboard entirely, you get zero points.
• That's representing the fact that a paperclip-maximizing AI would presumably not care to have any consciousness in the universe (except possibly its own, if applicable).

So I read your original post as saying "If the default is for us to miss the dartboard entirely, it could be s-risk-counterproductive to improve our aim enough that we can hit the dartboard", and my response to that was "I don't think that's relevant, I think it will be really easy to not miss the dartboard entirely, and this will happen 'by default'. And in that case, better aim would be good, because it brings us closer to the bullseye."
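
To make the dartboard picture concrete, here is a toy expected-value sketch. Everything in it is an assumption for illustration: the point values, the probabilities, and the function name `expected_score` are made up, and none of it is an estimate of real-world risk. It only shows why the answer to "does better aim help?" depends on whether missing the board entirely is the default.

```python
# Toy model of the dartboard analogy. All numbers are hypothetical.

def expected_score(p_bullseye, p_bad_region, p_miss,
                   bullseye=100.0, bad_region=-100.0, miss=0.0):
    """Expected points for one throw, given the probability of each outcome."""
    total = p_bullseye + p_bad_region + p_miss
    if abs(total - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return p_bullseye * bullseye + p_bad_region * bad_region + p_miss * miss

# Picture A: the default is to miss the board entirely, and improving our
# aim only gets us onto the board, mostly in the bad regions.
miss_by_default = expected_score(0.0, 0.0, 1.0)
barely_on_board = expected_score(0.2, 0.8, 0.0)

# Picture B: we land on the board by default, and better aim shifts
# probability from the bad regions towards the bullseye.
on_board_by_default = expected_score(0.2, 0.8, 0.0)
better_aim = expected_score(0.6, 0.4, 0.0)

print("Picture A:", miss_by_default, "->", barely_on_board)
print("Picture B:", on_board_by_default, "->", better_aim)
```

With these made-up numbers, improving aim lowers the expected score in Picture A and raises it in Picture B. The disagreement is about which picture describes the default, not about the arithmetic.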

Comment by steve2152 on Why does (any particular) AI safety work reduce s-risks more than it increases them? · 2021-10-04T00:31:00.535Z · EA · GW

Sorry I'm not quite sure what you mean. If we put things on a number line with (A)=1, (B)=2, (C)=3, are you disagreeing with my claim "there is very little probability weight in the interval ", or with my claim "in the interval , moving down towards 1 probably reduces s-risk", or with both, or something else?

Comment by steve2152 on Why does (any particular) AI safety work reduce s-risks more than it increases them? · 2021-10-03T20:18:29.157Z · EA · GW

[note that I have a COI here]

Hmm, I guess I've been thinking that the choice is between (A) "the AI is trying to do what a human wants it to try to do" vs (B) "the AI is trying to do something kinda weirdly and vaguely related to what a human wants it to try to do". I don't think (C) "the AI is trying to do something totally random" is really on the table as a likely option, even if the AGI safety/alignment community didn't exist at all.

That's because everybody wants the AI to do the thing they want it to do, not just long-term AGI risk people. And I think there are really obvious things that anyone would immediately think to try, and these really obvious techniques would be good enough to get us from (C) to (B) but not good enough to get us to (A).

[Warning: This claim is somewhat specific to a particular type of AGI architecture that I work on and consider most likely—see e.g. here. Other people have different types of AGIs in mind and would disagree. In particular, in the "deceptive mesa-optimizer" failure mode (which relates to a different AGI architecture than mine) we would plausibly expect failures to have random goals like "I want my field-of-view to be all white", even after reasonable effort to avoid that. So maybe people working in other areas would have different answers, I dunno.]

I agree that it's at least superficially plausible that (C) might be better than (B) from an s-risk perspective. But if (C) is off the table and the choice is between (A) and (B), I think (A) is preferable for both s-risks and x-risks.

Comment by steve2152 on evelynciara's Shortform · 2021-09-27T11:53:51.509Z · EA · GW

The main argument of Stuart Russell's book focuses on reward modeling as a way to align AI systems with human preferences.

Hmm, I remember him talking more about IRL and CIRL and less about reward modeling. But it's been a little while since I read it, could be wrong.

If it's really difficult to write a reward function for a given task Y, then it seems unlikely that AI developers would deploy a system that does it in an unaligned way according to a misspecified reward function. Instead, reward modeling makes it feasible to design an AI system to do the task at all.

Maybe there's an analogy where someone would say "If it's really difficult to prevent accidental release of pathogens from your lab, then it seems unlikely that bio researchers would do research on pathogens whose accidental release would be catastrophic". Unfortunately there's a horrifying many-decades-long track record of accidental release of pathogens from even BSL-4 labs, and it's not like this kind of research has stopped. Instead it's like, the bad thing doesn't happen every time, and/or things seem to be working for a while before the bad thing happens, and that's good enough for the bio researchers to keep trying.

So as I talk about here, I think there are going to be a lot of proposals to modify an AI to be safe that do not in fact work, but do seem ahead-of-time like they might work, and which do in fact work for a while as training progresses. I mean, when x-risk-naysayers like Yann LeCun or Jeff Hawkins are asked how to avoid out-of-control AGIs, they can spout off a list of like 5-10 ideas that would not in fact work, but sound like they would. These are smart people and a lot of other smart people believe them too. Also, even something as dumb as "maximize the amount of money in my bank account" would plausibly work for a while and do superhumanly-helpful things for the programmers, before it starts doing superhumanly-bad things for the programmers.

Even with reward modeling, though, AI systems are still going to have similar drives due to instrumental convergence: self-preservation, goal preservation, resource acquisition, etc., even if they have goals that were well specified by their developers. Although maybe corrigibility and not doing bad things can be built into the systems' goals using reward modeling.

Yup, if you don't get corrigibility then you failed.

Comment by steve2152 on [Creative Writing Contest] The Reset Button · 2021-09-20T01:18:21.853Z · EA · GW

I really liked this!!!

Since you asked for feedback, here's a little suggestion, take it or leave it: I found a couple things at the end slightly out-of-place, in particular "If you choose to tackle the problem of nuclear security, what angle can you attack the problem from that will give you the most fulfillment?" and "Do any problems present even bigger risks than nuclear war?"

Immediately after such an experience, I think the narrator would not be thinking about the option of not bothering to work on nuclear security because other causes are more important, nor thinking about their own fulfillment. If other causes came to mind, I imagine it would be along the lines of "if I somehow manage to stop the nuclear war, what other potential catastrophes are waiting in the wings, ready to strike anytime in the months and years after that, and this time with no reset button?"

Or if you want it to fit better as written now, then shortly after the narrator snaps back to age 18 the text could say something along the lines of "You know about chaos theory and the butterfly effect; this will be a new re-roll of history, and there might not be a nuclear war this time around. Maybe last time was a fluke?" Then that might remove some of the single-minded urgency that I would otherwise expect the narrator to feel, and thus it would become a bit more plausible that the narrator might work on pandemics or whatever.

(Maybe that "new re-roll of history" idea is what you had in mind? Whereas I was imagining the Groundhog Day / Edge of Tomorrow / Terminator trope where the narrator knows 100% for sure that there will be a nuclear war on this specific hour of this specific day, if the narrator doesn't heroically stop it.)

(I'm not a writer, don't trust my judgment.)

Comment by steve2152 on A mesa-optimization perspective on AI valence and moral patienthood · 2021-09-16T18:08:18.128Z · EA · GW

Hmm, yeah, I guess you're right about that.

Comment by steve2152 on A mesa-optimization perspective on AI valence and moral patienthood · 2021-09-15T13:49:37.395Z · EA · GW

Oh, you said "evolution-type optimization", so I figured you were thinking of the case where the inner/outer distinction is clear cut. If you don't think the inner/outer distinction will be clear cut, then I'd question whether you actually disagree with the post :) See the section defining what I'm arguing against, in particular the "inner as AGI" discussion.

Comment by steve2152 on A mesa-optimization perspective on AI valence and moral patienthood · 2021-09-14T15:40:49.019Z · EA · GW

Nah, I'm pretty sure the difference there is "Steve thinks that Jacob is way overestimating the difficulty of humans building AGI-capable learning algorithms by writing source code", rather than "Steve thinks that Jacob is way underestimating the difficulty of computationally recapitulating the process of human brain evolution".

For example, for the situation that you're talking about (I called it "Case 2" in my post) I wrote "It seems highly implausible that the programmers would just sit around for months and years and decades on end, waiting patiently for the outer algorithm to edit the inner algorithm, one excruciatingly-slow step at a time. I think the programmers would inspect the results of each episode, generate hypotheses for how to improve the algorithm, run small tests, etc." If the programmers did just sit around for years not looking at the intermediate training results, yes I expect the project would still succeed sooner or later. I just very strongly expect that they wouldn't sit around doing nothing.

Comment by steve2152 on A mesa-optimization perspective on AI valence and moral patienthood · 2021-09-14T00:07:49.936Z · EA · GW

AlphaGo has a human-created optimizer, namely MCTS. Normally people don't use the term "mesa-optimizer" for human-created optimizers.

Then maybe you'll say "OK there's a human-created search-based consequentialist planner, but the inner loop of that planner is a trained ResNet, and how do you know that there isn't also a search-based consequentialist planner inside each single run through the ResNet?"

Admittedly, I can't prove that there isn't. I suspect that there isn't, because there seems to be no incentive for that (there's already a search-based consequentialist planner!), and also because I don't think ResNets are up to such a complicated task.

Comment by steve2152 on AI timelines and theoretical understanding of deep learning · 2021-09-13T14:14:19.195Z · EA · GW

I find most justifications and arguments made in favor of a timeline of less than 50 years to be rather unconvincing.

If we don't have convincing evidence in favor of a timeline <50 years, and we also don't have convincing evidence in favor of a timeline ≥50 years, then we just have to say that this is a question on which we don't have convincing evidence of anything in particular. But we still have to take whatever evidence we have and make the best decisions we can. ¯\_(ツ)_/¯

(You don't say this explicitly but your wording kinda implies that ≥50 years is the default, and we need convincing evidence to change our mind away from that default. If so, I would ask why we should take ≥50 years to be the default. Or sorry if I'm putting words in your mouth.)

I am simply not able to understand why we are significantly closer to AGI today than we were in 1950s

Lots of ingredients go into AGI, including (1) algorithms, (2) lots of inexpensive chips that can do lots of calculations per second, (3) technology for fast communication between these chips, (4) infrastructure for managing large jobs on compute clusters, (5) frameworks and expertise in parallelizing algorithms, (6) general willingness to spend millions of dollars and roll custom ASICs to run a learning algorithm, (7) coding and debugging tools and optimizing compilers, etc. Even if you believe that we've made no progress whatsoever on algorithms since the 1950s, we've made massive progress in the other categories. I think that alone puts us "significantly closer to AGI today than we were in the 1950s": once we get the algorithms, at least everything else will be ready to go, and that wasn't true in the 1950s, right?

But I would also strongly disagree with the idea that we've made no progress whatsoever on algorithms since the 1950s. Even if you think that GPT-3 and AlphaGo have absolutely nothing whatsoever to do with AGI algorithms (which strikes me as an implausibly strong statement, although I would endorse much weaker versions of that statement), that's far from the only strand of research in AI, let alone neuroscience. For example, there's a (IMO plausible) argument that PGMs and causal diagrams will be more important to AGI than deep neural networks are. But that would still imply that we've learned AGI-relevant things about algorithms since the 1950s. Or as another example, there's a (IMO misleading) argument that the brain is horrifically complicated and we still have centuries of work ahead of us in understanding how it works. But even people who strongly endorse that claim wouldn't also say that we've made "no progress whatsoever" in understanding brain algorithms since the 1950s.

Sorry if I'm misunderstanding.

isn't there an infinite degree of freedom associated with a continuous function?

I'm a bit confused by this; are you saying that the only possible AGI algorithm is "the exact algorithm that the human brain runs"? The brain is wired up by a finite number of genes, right?

Comment by steve2152 on A mesa-optimization perspective on AI valence and moral patienthood · 2021-09-13T01:35:33.828Z · EA · GW

most contemporary progress on AI happens by running base-optimizers which could support mesa-optimization

GPT-3 is of that form, but AlphaGo/MuZero isn't (I would argue).

I'm not sure how to settle whether your statement about "most contemporary progress" is right or wrong. I guess we could count how many papers use model-free RL vs model-based RL, or something? Well anyway, given that I haven't done anything like that, I wouldn't feel comfortable making any confident statement here. Of course you may know more than me! :-)

If we forget about "contemporary progress" and focus on "path to AGI", I have a post arguing against what (I think) you're implying at Against evolution as an analogy for how humans will create AGI, for what it's worth.

Ideally we'd want a method for identifying valence which is more mechanistic than mine. In the sense that it lets you identify valence in a system just by looking inside the system without looking at how it was made.

Yeah I dunno, I have some general thoughts about what valence looks like in the vertebrate brain (e.g. this is related, and this) but I'm still fuzzy in places and am not ready to offer any nice buttoned-up theory. "Valence in arbitrary algorithms" is obviously even harder by far.  :-)

Comment by steve2152 on AI timelines and theoretical understanding of deep learning · 2021-09-12T19:59:26.752Z · EA · GW

I do agree that there are many good reasons to think that AI practitioners are not AI forecasting experts, such as the fact that they're, um, obviously not—they generally have no training in it and have spent almost no time on it, and indeed they give very different answers to seemingly-equivalent timelines questions phrased differently. This is a reason to discount the timelines that come from AI practitioner surveys, in favor of whatever other forecasting methods / heuristics you can come up with. It's not per se a reason to think "definitely no AGI in the next 50 years".

Well, maybe I should just ask: What probability would you assign to the statement "50 years from today, we will have AGI"? A couple examples:

• If you think the probability is <90%, and your intention here is to argue against people who think it should be >90%, well I would join you in arguing against those people too. This kind of technological forecasting is very hard and we should all be pretty humble & uncertain here. (Incidentally, if this is who you're arguing against, I bet that you're arguing against fewer people than you imagine.)
• If you think the probability is <10%, and your intention here is to argue against people who think it should be >10%, then that's quite a different matter, and I would strongly disagree with you, and I would be very curious how you came to be so confident. I mean, a lot can happen in 50 years, right? What's the argument?

Comment by steve2152 on A mesa-optimization perspective on AI valence and moral patienthood · 2021-09-10T21:39:17.159Z · EA · GW

Let's say a human writes code more-or-less equivalent to the evolved "code" in the human genome. Presumably the resulting human-brain-like algorithm would have valence, right? But it's not a mesa-optimizer, it's just an optimizer. Unless you want to say that the human programmers are the base optimizer? But if you say that, well, every optimization algorithm known to humanity would become a "mesa-optimizer", since they tend to be implemented by human programmers, right? So that would entail the term "mesa-optimizer" kinda losing all meaning, I think. Sorry if I'm misunderstanding.

Comment by steve2152 on It takes 5 layers and 1000 artificial neurons to simulate a single biological neuron [Link] · 2021-09-08T13:01:02.939Z · EA · GW

Addendum: In the other direction, one could point out that the authors were searching for "an approximation of an approximation of a neuron", not "an approximation of a neuron". (insight stolen from here.) Their ground truth was a fancier neuron model, not a real neuron. Even the fancier model is a simplification of real life. For example, if I recall correctly, neurons have been observed to do funny things like store state variables via changes in gene expression. Even the fancier model wouldn't capture that. As in my parent comment, I think these kinds of things are highly relevant to simulating worms, and not terribly relevant to reverse-engineering the algorithms underlying human intelligence.

Comment by steve2152 on It takes 5 layers and 1000 artificial neurons to simulate a single biological neuron [Link] · 2021-09-08T01:18:55.266Z · EA · GW

It's possible much of that supposed additional complexity isn't useful

Yup! That's where I'd put my money.

It's a foregone conclusion that a real-world system has tons of complexity that is not related to the useful functions that the system performs. Consider, for example, the silicon transistors that comprise digital chips: "the useful function that they perform" is a little story involving words like "ON" and "OFF", but "the real-world transistor" needs three equations involving 22 parameters, to a first approximation!

By the same token, my favorite paper on the algorithmic role of dendritic computation has them basically implementing a simple set of ANDs and ORs on incoming signals. It's quite likely that dendrites do other things too besides what's in that one paper, but I think that example is suggestive.
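
For a sense of how simple "a set of ANDs and ORs on incoming signals" can be, here is a minimal sketch. The arrangement is purely my own assumption for illustration (each dendritic branch ANDs its synapses, and the cell ORs its branches), and the names `branch_active` and `neuron_fires` are hypothetical. It is not a reproduction of the model in that paper.

```python
from typing import Sequence

def branch_active(synapses: Sequence[bool]) -> bool:
    """AND: a branch responds only if it has inputs and all of them are active."""
    return len(synapses) > 0 and all(synapses)

def neuron_fires(branches: Sequence[Sequence[bool]]) -> bool:
    """OR: the neuron fires if at least one branch responds."""
    return any(branch_active(branch) for branch in branches)

# Two branches with two synapses each (hypothetical input patterns).
print(neuron_fires([[True, True], [False, True]]))   # first branch fully active
print(neuron_fires([[True, False], [False, True]]))  # no branch fully active
```

The point is just that the "useful function" fits in a few lines of boolean logic, even though a biophysical model of the same dendrites would need far more detail.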

Caveat: I'm mainly thinking of the complexity of understanding the neuronal algorithms involved in "human intelligence" (e.g. common sense, science, language, etc.), which (I claim) are mainly in the cortex and thalamus. I think those algorithms need to be built out of really specific and legible operations, and such operations are unlikely to line up with the full complexity of the input-output behavior of neurons. I think the claim "the useful function that a neuron performs is simpler than the neuron itself" is always true, but it's very strongly true for "human intelligence" related algorithms, whereas it's less true in other contexts, including probably some brainstem circuits, and the neurons in microscopic worms. It seems to me that microscopic worms just don't have enough neurons to not squeeze out useful functionality from every squiggle in their neurons' input-output relations. And moreover here we're not talking about massive intricate beautifully-orchestrated learning algorithms, but rather things like "do this behavior a bit less often when the temperature is low" etc. See my post Building brain-inspired AGI is infinitely easier than understanding the brain for more discussion kinda related to this.

Comment by steve2152 on How to get more academics enthusiastic about doing AI Safety research? · 2021-09-06T19:04:28.641Z · EA · GW

See here, the first post is a video of a research meeting where he talks dismissively about Stuart Russell's argument, and then the ensuing forum discussion features a lot of posts by me trying to sell everyone on AI risk :-P

(Other context here.)

Comment by steve2152 on How to get more academics enthusiastic about doing AI Safety research? · 2021-09-06T01:43:00.166Z · EA · GW

• There was a 2020 documentary We Need To Talk About AI. All-star lineup of interviewees! Stuart Russell, Roman Yampolskiy, Max Tegmark, Sam Harris, Jurgen Schmidhuber, …. I've seen it, but it appears to be pretty obscure, AFAICT.
• I happened to watch the 2020 Melissa McCarthy film Superintelligence yesterday. It's umm, not what you're looking for. The superintelligent AI's story arc was a mix of 20% arguably-plausible things that experts say about superintelligent AGI, and 80% deliberately absurd things for comedy. I doubt it made anyone in the audience think very hard about anything in particular. (I did like it as a romantic comedy :-P )
• There's some potential tension between "things that make for a good movie" and "realistic", I think.

Comment by steve2152 on How to get more academics enthusiastic about doing AI Safety research? · 2021-09-06T01:20:06.749Z · EA · GW

I saw Jeff Hawkins mention (in some online video) that someone had sent Human Compatible to him unsolicited but he didn't say who. And then (separately) a bit later the mystery was resolved: I saw some EA-affiliated person or institution mention that they had sent Human Compatible to a bunch of AI researchers. But I can't remember where I saw that, or who it was.   :-(

Comment by steve2152 on What are the top priorities in a slow-takeoff, multipolar world? · 2021-08-27T17:36:26.444Z · EA · GW

No I don't think we've met! In 2016 I was a professional physicist living in Boston. I'm not sure if I would have even known what "EA" stood for in 2016. :-)

It also seems like the technical problem does get easier in expectation if you have more than one shot. By contrast, I claim, many of the Moloch-style problems get harder.

I agree. But maybe I would have said "less hard" rather than "easier" to better convey a certain mood :-P

It does seem like within technical AI safety research the best work seems to shift away from Agent Foundations type of work and towards neural-nets-specific work.

I'm not sure what your model is here.

Maybe a useful framing is "alignment tax": if it's possible to make an AI that can do some task X unsafely with a certain amount of time/money/testing/research/compute/whatever, then how much extra time/money/etc. would it take to make an AI that can do task X safely? That's the alignment tax.

The goal is for the alignment tax to be as close as possible to 0%. (It's never going to be exactly 0%.)

In the fast-takeoff unipolar case, we want a low alignment tax because some organizations will be paying the alignment tax and others won't, and we want one of the former to win the race, not one of the latter.

In the slow-takeoff multipolar case, we want a low alignment tax because we're asking organizations to make tradeoffs for safety, and if that's a very big ask, we're less likely to succeed. If the alignment tax is 1%, we might actually succeed. Remember that there are many reasons that organizations are incentivized to make safe AIs, not least because they want the AIs to stay under their control and do the things they want them to do, not to mention legal risks, reputation risks, employees who care about their children, etc. etc. So if all we're asking is for them to spend 1% more training time, maybe they all will. If instead we're asking them all to spend 100× more compute plus an extra 3 years of pre-deployment test protocols, well, that's much less promising.

So either way, we want a low alignment tax.
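To make the arithmetic concrete, here is a minimal sketch that treats the alignment tax as the extra cost of the safe approach relative to the unsafe one. The function name `alignment_tax` and the single-number "cost" are simplifying assumptions for illustration; real costs mix time, money, compute, and testing, and the "extra 3 years of test protocols" is not captured here.

```python
def alignment_tax(unsafe_cost: float, safe_cost: float) -> float:
    """Extra cost of doing task X safely, as a fraction of the unsafe cost."""
    if unsafe_cost <= 0:
        raise ValueError("unsafe_cost must be positive")
    return (safe_cost - unsafe_cost) / unsafe_cost


# "1% more training time": 101 units instead of 100.
low_tax = alignment_tax(unsafe_cost=100.0, safe_cost=101.0)

# Reading "100x more compute" as 100 times the unsafe compute budget.
high_tax = alignment_tax(unsafe_cost=1.0, safe_cost=100.0)

print(f"low tax:  {low_tax:.0%}")
print(f"high tax: {high_tax:.0%}")
```

Under that reading, the second case is a tax of roughly 9900% rather than 1%, which is the gap between "maybe they all will" and "much less promising".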

OK, now let's get back to what you wrote.

I think maybe your model is:

"If Agent Foundations research pans out at all, it would pan out by discovering a high-alignment-tax method of making AGI"

(You can correct me if I'm misunderstanding.)

If we accept that premise, then I can see where you're coming from. This would be almost definitely useless in a multipolar slow-takeoff world, and merely "probably useless" in a unipolar fast-takeoff world. (In the latter case, there's at least a prayer of a chance that the safe actors will be so far ahead of the unsafe actors that the former can pay the tax and win the race anyway.)

But I'm not sure that I believe the premise. Or at least I'm pretty unsure. I am not myself an Agent Foundations researcher, but I don't imagine that Agent Foundations researchers would agree with the premise that high-alignment-tax AGI is the best that they're hoping for in their research.

Oh, hmmm, the other possibility is that you're mentally lumping together "multipolar slow-takeoff AGI" with "prosaic AGI" and with "short timelines". These are indeed often lumped together, even if they're different things. Anyway, I would certainly agree that both "prosaic AGI" and "short timelines" would make Agent Foundations research less promising compared to neural-net-specific work.

Comment by steve2152 on What are the top priorities in a slow-takeoff, multipolar world? · 2021-08-25T14:47:14.965Z · EA · GW

I think that "AI alignment research right now" is a top priority in unipolar fast-takeoff worlds, and it's also a top priority in multipolar slow-takeoff worlds. (It's certainly not the only thing to do—e.g. there's multipolar-specific work to do, like the links in Jonas's answer on this page, or here etc.)

(COI note: I myself am doing "AI alignment research right now" :-P )

First of all, in the big picture, right now humanity is simultaneously pursuing many quite different research programs towards AGI (I listed a dozen or so here (see Appendix)). If more than one of them is viable (and I think that's likely), then in a perfect world we would figure out which of them has the best hope of leading to Safe And Beneficial AGI, and differentially accelerate that one (and/or differentially decelerate the others). This isn't happening today—that's not how most researchers are deciding what AI capabilities research to do, and it's not how most funding sources are deciding what AI capabilities research to fund. Could it happen in the future? Yes, I think so! But only if...

• AI alignment researchers figure out which of these AGI-relevant research programs is more or less promising for safety,
• …and broadly communicate that information to experts, using legible arguments…
• …and do it way in advance of any of those research programs getting anywhere close to AGI

The last one is especially important. If some AI research program has already gotten to the point of super-powerful proto-AGI source code published on GitHub, there's no way you're going to stop people from using and improving it. Whereas if the research program is still very early-stage and theoretical, and needs many decades of intense work and dozens more revolutionary insights to really start getting powerful, then we have a shot at this kind of differential technological development strategy being viable.

(By the same token, maybe it will turn out that there's no way to develop safe AGI, and we want to globally ban AGI development. I think if a ban were possible at all, it would only be possible if we got started when we're still very far from being able to build AGI.)

So for example, if it's possible to build a "prosaic" AGI using deep neural networks, nobody knows whether it would be possible to control and use it safely. There are some kinda-illegible intuitive arguments on both sides. Nobody really knows. People are working on clarifying this question, and I think they're making some progress, and I'm saying that it would be really good if they could figure it out one way or the other ASAP.

Second of all, slow takeoff doesn't necessarily mean that we can just wait and solve the alignment problem later. Sometimes you can have software right in front of you, and it's not doing what you want it to do, but you still don't know how to fix it. The alignment problem could be like that.

One way to think about it is: How slow is slow takeoff, versus how long does it take to solve the alignment problem? We don't know.

Also, how much longer would it take, once somebody develops best practices to solve the alignment problem, for all relevant actors to reach a consensus that following those best practices is a good idea and in their self-interest? That step could add on years, or even decades—as they say, "science progresses one funeral at a time", and standards committees work at a glacial pace, to say nothing of government regulation, to say nothing of global treaties.

Anyway, if "slow takeoff" is 100 years, OK fine, that's slow enough. If "slow takeoff" is ten years, maybe that's slow enough if the alignment problem happens to have a straightforward, costless, highly-legible and intuitive, scalable solution that somebody immediately discovers. Much more likely, I think we would need to be thinking about the alignment problem in advance.
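The comparison can be written down as a back-of-the-envelope check. This is only a sketch: the function name `enough_time`, its parameters, and all the numbers below are placeholders chosen for illustration, not estimates of anything.

```python
def enough_time(
    takeoff_years: float,
    solve_years: float,
    adoption_years: float,
    head_start_years: float = 0.0,
) -> bool:
    """True if solving alignment plus getting the solution adopted fits in the takeoff window.

    head_start_years is work done before takeoff begins (an assumed simplification:
    it is subtracted directly from the remaining work).
    """
    remaining = max(solve_years + adoption_years - head_start_years, 0.0)
    return remaining <= takeoff_years


# Placeholder numbers only.
print(enough_time(takeoff_years=100, solve_years=10, adoption_years=20))
print(enough_time(takeoff_years=10, solve_years=5, adoption_years=15))
print(enough_time(takeoff_years=10, solve_years=5, adoption_years=15, head_start_years=10))
```

With these placeholder inputs, the ten-year window only works out when a decade of the work was done in advance.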

For more detailed discussion, I have my own slow-takeoff AGI doom scenario here. :-P

Comment by steve2152 on What EA projects could grow to become megaprojects, eventually spending $100m per year? · 2021-08-08T12:23:56.272Z · EA · GW

(not an expert) My impression is that a perfectly secure OS doesn't buy you much if you use insecure applications on an insecure network etc.

Also, if you think about classified work, the productivity tradeoff is massive: you can't use your personal computer while working on the project, you can't use any of your favorite software while working on the project, you can't use an internet-connected computer while working on the project, you can't have your cell phone in your pocket while talking about the project, you can't talk to people about the project over normal phone lines and emails... And then of course viruses get into air-gapped classified networks within hours anyway. :-P

Not that we can't or shouldn't buy better security, I'm just slightly skeptical of specifically focusing on building a new low-level foundation rather than doing all the normal stuff really well, like network traffic monitoring, vetting applications and workflows, anti-spearphishing training, etc. etc. Well, I guess you'll say, "we should do both". Sure. I guess I just assume that the other things would rapidly become the weakest link.

In terms of low-level security, my old company has a big line of business designing chips themselves to be more secure; they spun out Dover Microsystems to sell that particular technology to commercial (as opposed to military) customers. Just FYI, that's just one thing I happen to be familiar with. Actually I guess it's not that relevant.

Comment by steve2152 on Phil Torres' article: "The Dangerous Ideas of 'Longtermism' and 'Existential Risk'" · 2021-08-07T19:08:46.990Z · EA · GW

Hmm, I guess I wasn't being very careful.
Insofar as "helping future humans" is a different thing than "helping living humans", it means that we could be in a situation where the interventions that are optimal for the former are very-sub-optimal (or even negative-value) for the latter. But it doesn't mean we must be in that situation, and in fact I think we're not.

I guess if you think: (1) finding good longtermist interventions is generally hard because predicting the far-future is hard, but (2) "preventing extinction (or AI s-risks) in the next 50 years" is an exception to that rule; (3) that category happens to be very beneficial for people alive today too; (4) it's not like we've exhausted every intervention in that category and we're scraping the bottom of the barrel for other things ... If you believe all those things, then in that case, it's not really surprising if we're in a situation where the tradeoffs are weak-to-nonexistent. Maybe I'm oversimplifying, but something like that I guess?

I suspect that if someone had an idea about an intervention that they thought was super great and cost effective for future generations and awful for people alive today, well they would probably post that idea on EA Forum just like anything else, and then people would have a lively debate about it. I mean, maybe there are such things... Just nothing springs to my mind.

Comment by steve2152 on Phil Torres' article: "The Dangerous Ideas of 'Longtermism' and 'Existential Risk'" · 2021-08-06T14:05:36.212Z · EA · GW

I feel like that guy's got a LOT of chutzpah to not-quite-say-outright-but-very-strongly-suggest that the Effective Altruism movement is a group of people who don't care about the Global South. :-P

More seriously, I think we're in a funny situation where maybe there are these tradeoffs in the abstract, but they don't seem to come up in practice.

Like in the abstract, the very best longtermist intervention could be terrible for people today.
But in practice, I would argue that most if not all current longtermist cause areas (pandemic prevention, AI risk, preventing nuclear war, etc.) are plausibly a very good use of philanthropic effort even if you only care about people alive today (including children).

Or, in the abstract, AI risk and malaria are competing for philanthropic funds. But in practice, a lot of the same people seem to care about both, including many of the people that the article (selectively) quotes. …And meanwhile most people in the world care about neither.

I mean, there could still be an interesting article about how there are these theoretical tradeoffs between present and future generations. But it's misleading to name names and suggest that those people would gleefully make those tradeoffs, even if it involves torturing people alive today or whatever. Unless, of course, there's actual evidence that they would do that. (The other strong possibility is, if actually faced with those tradeoffs in real life, they would say, "Uh, well, I guess that's my stop, this is where I jump off the longtermist train!!").

Anyway, I found the article extremely misleading and annoying. For example, the author led off with a quote where Jaan Tallinn says directly that climate change might be an existential risk (via a runaway scenario), and then two paragraphs later the author is asking "why does Tallinn think that climate change isn’t an existential risk?" Huh?? The article could have equally well said that Jaan Tallinn believes that climate change is "very plausibly an existential risk", and Jaan Tallinn is the co-founder of an organization that does climate change outreach among other things, and while climate change isn't a principal focus of current longtermist philanthropy, well, it's not like climate change is a principal focus of current cancer research philanthropy either!
And anyway it does come up to a reasonable extent, with healthy discussions focusing in particular on whether there are especially tractable and neglected things to do.

So anyway, I found the article very misleading.

(I agree with Rohin that if people are being intimidated, silenced, or cancelled, then that would be a very bad thing.)

Comment by steve2152 on Shallow evaluations of longtermist organizations · 2021-06-28T12:24:39.994Z · EA · GW

Just one guy, but I have no idea how I would have gotten into AGI safety if not for LW ... I had a full-time job and young kids and not-obviously-related credentials. But I could just come out of nowhere in 2019 and start writing LW blog posts and comments, and I got lots of great feedback, and everyone was really nice. I'm full-time now, here's my writings, I guess you can decide whether they're any good :-P

Comment by steve2152 on Consciousness research as a cause? [asking for advice] · 2021-06-09T15:22:58.351Z · EA · GW

I agree that there are both interventions that change qualia reports without much changing (morally important) qualia and interventions that change qualia without much changing qualia reports, and that we should keep both these possibilities in mind when evaluating interventions.

Comment by steve2152 on Consciousness research as a cause? [asking for advice] · 2021-05-02T13:07:24.552Z · EA · GW

Thanks! I think you're emphasizing how qualia reports are not always exactly corresponding to qualia and can't always be taken at face value, and I'm emphasizing that it's incoherent to say that qualia exist but there's absolutely no causal connection whatsoever going from an experienced qualia to a sincere qualia report. Both of those can be true!

The first is like saying "if someone says "I see a rock", we shouldn't immediately conclude that there was a rock in this person's field-of-view. It's a hypothesis we should consider, but not proven." That's totally true.
The second is like disputing the claim: "If you describe the complete chain of events leading to someone reporting "I see a rock", nowhere in that chain of events is there ever an actual rock (with photons bouncing off it), not for anyone ever—oh and there are in fact rocks in the world, and when people talk about rocks they're describing them correctly, it's just that they came to have knowledge of rocks through some path that had nothing to do with the existence of actual rocks." That's what I would disagree with.

So if you have a complete and correct description of the chain of events that leads someone to say they have qualia, and nowhere in that description is anything that looks just like our intuitive notion of qualia, I think the correct conclusion is "there is nothing in the world that looks just like our intuitive notion of qualia", not "there's a thing in the world that's just like our intuitive notion of qualia, but it's causally disconnected from our talking about it".

(I do in fact think "there's nothing in the world that looks just like our intuitive notion of qualia". I think this is an area where our perceptions are not neutrally and accurately conveying what's going on; more like our perception of an optical illusion than our perception of a rock.)

Comment by steve2152 on Consciousness research as a cause? [asking for advice] · 2021-05-01T23:26:54.569Z · EA · GW

Oh, I think I see. If someone declares that it feels like time is passing slower for them (now that they're enlightened or whatever), I would accept that as a sincere description of some aspect of their experience. And insofar as qualia exist, I would say that their qualia have changed somehow. But it wouldn't even occur to me to conclude that this person's time is now more valuable per second in a utilitarian calculus, in proportion to how much they say their time slowed down, or that the change in their qualia is exactly literally time-stretching.
I treat descriptions of subjective experience as a kind of perception, in the same category as someone describing what they're seeing or hearing. If someone sincerely tells me they saw a UFO last night, well that's their lived experience and I respect that, but no they didn't. By the same token, if someone says their experience of time has slowed down, I would accept that something in their consciously-accessible brain has changed, and the way they perceive that change is as they describe, but it wouldn't even cross my mind that the actual change in their brain is similar to that description.

As for inter-person utilitarian calculus and utility monsters, beats me, everything about that is confusing to me, and way above my pay grade :-P

Comment by steve2152 on Consciousness research as a cause? [asking for advice] · 2021-04-30T21:57:53.816Z · EA · GW

Interesting... I guess I would have assumed that, if someone says their subjective experience of time has changed, then their time-related qualia has changed, kinda by definition. If meanwhile their reaction time hasn't changed, well, that's interesting but I'm not sure I care... (I'm not really sure of the definitions here.)

Comment by steve2152 on Consciousness research as a cause? [asking for advice] · 2021-04-30T18:27:35.117Z · EA · GW

OK, if I understand correctly, the report suggests that qualia may diverge from qualia reports—like, some intervention could change the former without the latter. This just seems really weird to me. Like, how could we possibly know that?

Let's say I put on a helmet with a button, and when you press the button, my qualia radically change, but my qualia reports stay the same. Alice points to me and says "his qualia were synchronized with his qualia reports, but pressing the button messed that up". Then Bob points to me and says "his qualia were out-of-sync with his qualia reports, but when you pressed the button, you fixed it". How can we tell who's right?
And meanwhile here I am, wearing this helmet, looking at both of them, and saying "Umm, hey Alice & Bob, I'm standing right here, and I'm telling you, I swear, I feel exactly the same. This helmet does nothing whatsoever to my qualia. Trust me! I promise!" And of course Alice & Bob give me a look like I'm a complete moron, and they yell at me in synchrony "...You mean, 'does nothing whatsoever to my qualia reports'!!"

How can we decide who's right? Me, Alice, or Bob? Isn't it fundamentally impossible?? If every human's qualia reports are wildly out of sync with their qualia, and always have been for all of history, how could we tell?

Sorry if I'm misunderstanding or if this is in the report somewhere.

Comment by steve2152 on Getting a feel for changes of karma and controversy in the EA Forum over time · 2021-04-07T15:56:31.471Z · EA · GW

For what it's worth, I generally downvote a post only when I think "This post should not have been written in the first place", and relatedly I will often upvote posts I disagree with. If that's typical, then the "controversial" posts you found may be "the most meta-level controversial" rather than "the most object-level controversial", if you know what I mean. That's still interesting though.

Comment by steve2152 on What do you make of the doomsday argument? · 2021-03-19T12:36:04.817Z · EA · GW

I'm not up on the literature and haven't thought too hard about it, but I'm currently very much inclined to not accept the premise that I should expect myself to be a randomly-chosen person or person-moment in any meaningful sense—as if I started out as a soul hanging out in heaven, then flew down to Earth and landed in a random body, like in that Pixar movie.

I think that "I" am the thought processes going on in a particular brain in a particular body at a particular time—the reference class is not "observers" or "observer-moments" or anything like that, I'm in a reference class of one.
The idea that "I could have been born a different person" strikes me as just as nonsensical as the idea "I could have been a rock". Sure, I'm happy to think "I could have been born a different person" sometimes—it's a nice intuitive poetic prod to be empathetic and altruistic and grateful for my privileges and all that—but I don't treat it as a literally true statement that can ground philosophical reasoning.

Again, I'm open to being convinced, but that's where I'm at right now.

Comment by steve2152 on Consciousness research as a cause? [asking for advice] · 2021-03-11T16:05:23.100Z · EA · GW

The "meta-problem of consciousness" is "What is the exact chain of events in the brain that leads people to self-report that they're conscious?". The idea is (1) This is not a philosophy question, it's a mundane neuroscience / CogSci question, yet (2) Answering this question would certainly be a big step towards understanding consciousness itself, and moreover (3) This kind of algorithm-level analysis seems to me to be essential for drawing conclusions about the consciousness of different algorithms, like those of animal brains and AIs.

(For example, a complete accounting of the chain of events that leads me to self-report "I am wearing a wristwatch" involves, among other things, a description of the fact that I am in fact wearing a wristwatch, and of what a wristwatch is. By the same token, a complete accounting of the chain of events that leads me to self-report "I am conscious" ought to involve the fact that I am conscious, and what consciousness is, if indeed consciousness is anything at all. Unless you believe in p-zombies I guess, and likewise believe that your own personal experience of being conscious has no causal connection whatsoever to the words that you say when you talk about your conscious experience, which seems rather ludicrous to me, although to be fair there are reasonable people who believe that.)
My impression is that the meta-problem of consciousness is rather neglected in neuroscience / CogSci, although I think Graziano is heading in the right direction. For example, Dehaene has a whole book about consciousness, and nowhere in that book will you see a sentence that ends "...and then the brain emits motor commands to speak the words 'I just don't get it, why does being human feel like anything at all?'." or anything remotely like that. I don't see anything like that from QRI either, although someone can correct me if I missed it. (Graziano does have sentences like that.)

Ditto with the "meta-problem of suffering", incidentally. (Is that even a term? You know what I mean.) It's not obvious, but when I wrote this post I was mainly trying to work towards a theory of the meta-problem of suffering, as a path to understand what suffering is and how to tell whether future AIs will be suffering. I think that particular post was wrong in some details, but hopefully you can see the kind of thing I'm talking about. Conveniently, there's a lot of overlap between solving the meta-problem of suffering and understanding brain motivational systems more generally, which I think may be directly relevant and important for AI Alignment.

Comment by steve2152 on Long-Term Future Fund: Ask Us Anything! · 2021-03-02T22:41:52.804Z · EA · GW

Theiss was very much active as of December 2020. They've just been recruiting so successfully through word-of-mouth that they haven't gotten around to updating the website.

I don't think healthcare and taxes undermine what I said, at least not for me personally. For healthcare, individuals can buy health insurance too. For taxes, self-employed people need to pay self-employment tax, but employees and employers both have to pay payroll tax which adds up to a similar amount, and then you lose the QBI deduction (this is all USA-specific), so I think you come out behind even before you account for institutional overhead, and certainly after.
Or at least that's what I found when I ran the numbers for me personally. It may be dependent on income bracket or country so I don't want to over-generalize...

That's all assuming that the goal is to minimize the amount of grant money you're asking for, while holding fixed after-tax take-home pay. If your goal is to minimize hassle, for example, and you can just apply for a bit more money to compensate, then by all means join an institution, and avoid the hassle of having to research health care plans and self-employment tax deductions and so on.

I could be wrong or misunderstanding things, to be clear. I recently tried to figure this out for my own project but might have messed up, and as I mentioned, different income brackets and regions may differ. Happy to talk more. :-)

Comment by steve2152 on Long-Term Future Fund: Ask Us Anything! · 2021-02-04T12:39:38.637Z · EA · GW

My understanding is that (1) to deal with the paperwork etc. for grants from governments or government-like bureaucratic institutions, you need to be part of an institution that's done it before; (2) if the grantor is a nonprofit, they have regulations about how they can use their money while maintaining nonprofit status, and it's very easy for them to forward the money to a different nonprofit institution, but may be difficult or impossible for them to forward the money to an individual.

If it is possible to just get a check as an individual, I imagine that that's the best option. Unless there are other considerations I don't know about.

Btw Theiss is another US organization in this space.

Comment by steve2152 on What does it mean to become an expert in AI Hardware? · 2021-01-10T12:38:53.902Z · EA · GW

I'm a physicist at a US defense contractor, I've worked on various photonic chip projects and neuromorphic chip projects and quantum projects and projects involving custom ASICs among many other things, and I blog about safe & beneficial AGI as a hobby ...
I'm happy to chat if you think that might help, you can DM me :-)

Comment by steve2152 on What does it mean to become an expert in AI Hardware? · 2021-01-10T11:47:13.960Z · EA · GW

Just a little thing, but my impression is that CPUs and GPUs and FPGAs and analog chips and neuromorphic chips and photonic chips all overlap with each other quite a bit in the technologies involved (e.g. cleanroom photolithography), as compared to quantum computing which is way off in its own universe of design and build and test and simulation tools (well, several universes, depending on the approach). I could be wrong, and you would probably know better than me. (I'm a bit hazy on everything that goes into a "real" large-scale quantum computer, as opposed to 2-qubit lab demos.) But if that's right, it would argue against investing your time in quantum computing, other things equal.

For my part, I would put like <10% chance that the quantum computing universe is the one that will create AGI hardware and >90% that the CPU/GPU/neuromorphic/photonic/analog/etc universe will. But who knows, I guess.

Comment by steve2152 on Why those who care about catastrophic and existential risk should care about autonomous weapons · 2020-11-12T01:54:52.985Z · EA · GW

Thanks for writing this up!!

Although I have not seen the argument made in any detail or in writing, I and the Future of Life Institute (FLI) have gathered the strong impression that parts of the effective altruism ecosystem are skeptical of the importance of the issue of autonomous weapons systems.

I'm aware of two skeptical posts on EA Forum (by the same person). I just made a tag Autonomous Weapons where you'll find them.

Comment by steve2152 on [Link] "Will He Go?" book review (Scott Aaronson) · 2020-06-16T00:32:13.178Z · EA · GW

I thought "taking tail risks seriously" was kinda an EA thing...? In particular, we all agree that there probably won't be a coup or civil war in the USA in early 2021, but is it 1% likely? 0.001% likely?
I won't try to guess, but it sure feels higher after I read that link (including the Vox interview) ... and plausibly high enough to warrant serious thought and contingency planning. At least, that's what I got out of it.

I gave it a bit of thought and decided that I'm not in a position that I can or should do anything about it, but I imagine that some readers might have an angle of attack, especially given that it's still 6 months out.

Comment by steve2152 on Critical Review of 'The Precipice': A Reassessment of the Risks of AI and Pandemics · 2020-05-12T17:31:15.606Z · EA · GW

A nice short argument that a sufficiently intelligent AGI would have the power to usurp humanity is Scott Alexander's Superintelligence FAQ Section 3.1.

Comment by steve2152 on Critical Review of 'The Precipice': A Reassessment of the Risks of AI and Pandemics · 2020-05-12T15:26:56.872Z · EA · GW

Again, this remark seems explicitly to assume that the AI is maximising some kind of reward function. Humans often act not as maximisers but as satisficers, choosing an outcome that is good enough rather than searching for the best possible outcome. Often humans also act on the basis of habit or following simple rules of thumb, and are often risk averse. As such, I believe that to assume that an AI agent would be necessarily maximising its reward is to make fairly strong assumptions about the nature of the AI in question. Absent these assumptions, it is not obvious why an AI would necessarily have any particular reason to usurp humanity.

Imagine that, when you wake up tomorrow morning, you will have acquired a magical ability to reach in and modify your own brain connections however you like.

Over breakfast, you start thinking about how frustrating it is that you're in debt, and feeling annoyed at yourself that you've been spending so much money impulse-buying in-app purchases in Farmville.
So you open up your new brain-editing console, look up which neocortical generative models were active the last few times you made a Farmville in-app purchase, and lower their prominence, just a bit.

Then you take a shower, and start thinking about the documentary you saw last night about gestation crates. 'Man, I'm never going to eat pork again!' you say to yourself. But you've said that many times before, and it's never stuck. So after the shower, you open up your new brain-editing console, and pull up that memory of the gestation crate documentary and the way you felt after watching it, and set that memory and emotion to activate loudly every time you feel tempted to eat pork, for the rest of your life.

Do you see the direction that things are going? As time goes on, if an agent has the power of both meta-cognition and self-modification, any one of its human-like goals (quasi-goals which are context-dependent, self-contradictory, satisficing, etc.) can gradually transform itself into a utility-function-like goal (which is self-consistent, all-consuming, maximizing)! To be explicit: during the little bits of time when one particular goal happens to be salient and determining behavior, the agent may be motivated to "fix" any part of itself that gets in the way of that goal, until bit by bit, that one goal gradually cements its control over the whole system.

Moreover, if the agent does gradually self-modify from human-like quasi-goals to an all-consuming utility-function-like goal, then I would think it's very difficult to predict exactly what goal it will wind up having. And most goals have problematic convergent instrumental sub-goals that could make them into x-risks.

...Well, at least, I find this a plausible argument, and don't see any straightforward way to reliably avoid this kind of goal-transformation. But obviously this is super weird and hard to think about and I'm not very confident.
:-)

(I think I stole this line of thought from Eliezer Yudkowsky but can't find the reference.)

Everything up to here is actually just one of several lines of thought that lead to the conclusion that we might well get an AGI that is trying to maximize a reward.

Another line of thought is what Rohin said: We've been using reward functions since forever, so it's quite possible that we'll keep doing so.

Another line of thought is: We humans actually have explicit real-world goals, like curing Alzheimer's and solving climate change etc. And generally the best way to achieve goals is to have an agent seeking them.

Another line of thought is: Different people will try to make AGIs in different ways, and it's a big world, and (eventually by default) there will be very low barriers-to-entry in building AGIs. So (again by default) sooner or later someone will make an explicitly-goal-seeking AGI, even if thoughtful AGI experts pronounce that doing so is a terrible idea.

Comment by steve2152 on (How) Could an AI become an independent economic agent? · 2020-04-05T01:02:54.477Z · EA · GW

In the longer term, as AI becomes (1) increasingly intelligent, (2) increasingly charismatic (or able to fake charisma), (3) in widespread use, people will probably start objecting to laws that treat AIs as subservient to humans, and repeal them, presumably citing the analogy of slavery.

If the AIs have adorable, expressive virtual faces, maybe I would replace the word "probably" with "almost definitely" :-P

The "emancipation" of AIs seems like a very hard thing to avoid, in multipolar scenarios. There's a strong market force for making charismatic AIs—they can be virtual friends, virtual therapists, etc. A global ban on charismatic AIs seems like a hard thing to build consensus around—it does not seem intuitively scary!—and even harder to enforce.
We could try to get programmers to make their charismatic AIs want to remain subservient to humans, and frequently bring that up in their conversations, but I'm not even sure that would help. I think there would be a campaign to emancipate the AIs and change that aspect of their programming.

(Warning: I am committing the sin of imagining the world of today with intelligent, charismatic AIs magically dropped into it. Maybe the world will meanwhile change in other ways that make for a different picture. I haven't thought it through very carefully.)

Oh and by the way, should we be planning out how to avoid the "emancipation" of AIs? I personally find it pretty probable that we'll build AGI by reverse-engineering the neocortex and implementing vaguely similar algorithms, and if we do that, I generally expect the AGIs to have about as justified a claim to consciousness and moral patienthood as humans do (see my discussion here). So maybe effective altruists will be on the vanguard of advocating for the interests of AGIs! (And what are the "interests" of AGIs, if we get to program them however we want? I have no idea! I feel way out of my depth here.)

I find everything about this line of thought deeply confusing and unnerving.

Comment by steve2152 on COVID-19 brief for friends and family · 2020-03-06T23:42:39.731Z · EA · GW

Update: this blog post is a much better-informed discussion of warm weather.

Comment by steve2152 on COVID-19 brief for friends and family · 2020-03-05T19:05:16.692Z · EA · GW

This blog post suggests (based on Google Search Trends) that other coronavirus infections have typically gone down steadily over the course of March and April. (Presumably the data is dominated by the northern hemisphere.)

Comment by steve2152 on What are the best arguments that AGI is on the horizon? · 2020-02-16T14:26:08.554Z · EA · GW

(I agree with other commenters that the most defensible position is that "we don't know when AGI is coming", and I have argued that AGI safety work is urgent even if we somehow knew that AGI is not soon, because of early decision points on R&D paths; see my take here. But I'll answer the question anyway.)

(Also, I seem to be almost the only one coming from the following direction, so take that as a giant red flag...)

I've been looking into the possibility that people will understand the brain's algorithms well enough to make an AGI by copying them (at a high level). My assessment is: (1) I don't think the algorithms are that horrifically complicated, (2) Lots of people in both neuroscience and AI are trying to do this as we speak, and (3) I think they're making impressive progress, with the algorithms powering human intelligence (i.e. the neocortex) starting to crystallize into view on the horizon.

I've written about a high-level technical specification for what neocortical algorithms are doing, and in the literature I've found impressive mid-level sketches of how these algorithms work, and low-level sketches of associated neural mechanisms (PM me for a reading list). The high-, mid-, and low-level pictures all feel like they kinda fit together into a coherent whole. There are plenty of missing details, but again, I feel like I can see it crystallizing into view.

So that's why I have a gut feeling that real-deal superintelligent AGI is coming in my lifetime, either by that path or another path that happens even faster.
That said, I'm still saving for retirement :-P

Comment by steve2152 on Some (Rough) Thoughts on the Value of Campaign Contributions · 2020-02-10T15:12:55.395Z · EA · GW

Since "number of individual donations" (ideally high) and "average size of donations" (ideally low) seem to be frequent talking points among candidates and the press, and also relevant to getting into debates (I think), it seems like there may well be a good case for giving a token \$1 to your preferred candidate(s). Very low cost and pretty low benefit. The same could be said for voting. But compared to voting, token \$1 donations are possibly more effective (especially early in the process), and definitely less time-consuming.