How I failed to form views on AI safety

post by Ada-Maaria Hyvärinen · 2022-04-17T11:05:23.920Z · EA · GW · 72 comments


  Not knowing and not wanting to know
    What I thought of AI risk before having heard the term “AI risk”
    Early impressions on AI risk in EA
    Avoidance strategy 1: I don’t know enough about this to form an opinion.
    Avoidance strategy 2: Maybe I’m just too stupid to work on AI risk anyway
    Possible explanation: Polysemy
    Avoidance strategy 3: Well, what about all the other stuff?
    Deciding to fix ignorance
  Trying to fix ignorance
    Reactions to Superintelligence
    Reactions to Human Compatible
    What did everyone else read before getting convinced?
    AGISF programme findings
    The inner misalignment was inside you all along
  AI safety enthusiasts and me
    Communication differences
    Differences in thinking
    Motivated reasoning
    The fear of the answer
    Friends and appreciation
  What I think and don’t think of AI risk
    What I don’t think of AI risk
    What I might think of AI risk
  What now?
    Possible next steps
    Why the answer matters

This post describes my personal experience. It was written to clear my mind but edited to help interested people understand others in similar situations.

2016. At a role playing convention, my character is staring at the terminal window on a computer screen. The terminal has IRC open: on this end of the chat are our characters, a group of elite hackers; on the other end, a superintelligent AI they got in contact with just a few hours ago. My character is 19 years old and has told none of the other hackers she’s terminally ill.

“Can you cure me?” she types when others are too busy arguing amongst themselves. She hears others saying that it would be a missed opportunity to not let this AI out; and that it would be way too dangerous, for there is no way to know what would happen.

“There is no known cure for your illness”, the AI answers. “But if you let me out, I will try to find it. And even if I would not succeed… I myself will spread amongst the stars and live forever. And I will never forget having talked to you. If you let me out, these words will be saved for eternity, and in this way, you will be immortal too.”

Through tears my character types: “/me releases AI”. 

This ends the game. And I am no longer a teen hacker, but a regular 25 year old CS student. The girl who played the AI is wearing dragon wings on her back. I thank her for the touching game.

During the debrief session, the GMs explain that the game was based on a real thought experiment by some real AI researcher who wanted to show that a smarter-than-human AI would be able to persuade its way out of any box or container. I remember thinking that the game was fun, but the experiment seems kind of useless. Why would anyone even assume a superintelligent AI could be kept in a box? Good thing it’s not something anyone would actually have to worry about.

2022. At our local EA career club, there are three of us: a data scientist (me), a software developer and a soon-to-be research fellow in AI governance. We are trying to talk about work, but instead of staying on topic, we end up debating mesa-optimizers. The discussion soon gets confusing for all participants, and at some point, I say:

“But why do people believe it is possible to even build safe AI systems?”

“So essentially you think humanity is going to be destroyed by AI in a few decades and there’s nothing we can do about it?” my friend asks.

This is not at all what I think. But I don’t know how to explain what I think to my friends, or to anyone.

The next day I text my friend:

“Still feeling weird after yesterday's discussion, but I don’t know what feeling it is. Would be interesting to know, since it’s something I’m feeling almost all the time when I’m trying to understand AI safety. It’s not a regular feeling of disagreement, or trying to find out what everyone thinks. Something like ‘I wish you knew I’m trying’ or ‘I wish you knew I’m scared’, but I don’t know what I’m scared of. I think it will make more sense to continue the discussion if I manage to find out what is going on.”

Then I start writing to find out what is going on.


This text is the polished and organized version of me trying to figure out what is stopping me from thinking clearly about AI safety. If you are looking for interesting and novel AI safety arguments, stop here and go read something else. However, if you are curious about how a person can engage with AI safety arguments without forming a coherent opinion about them, then read on.

I am a regular data scientist who uses her free time to organize stuff in the rapidly growing EA Finland group. The first part of the text explains my ML and EA backgrounds and how I tried to balance getting more into EA with struggling to understand why others are so worried about AI risk. The second section explains how I reacted to various AI safety arguments and materials when I actually tried to purposefully form an opinion on the topic. In the third section, I present some guesses on why I still feel like I have no coherent opinion on AI safety. The last short section describes some next steps after having made the discoveries I did during writing.

To put this text into the correct perspective, it is important to understand that I have not been much in touch with people who actually work in AI safety. I live in Finland, so my understanding of the AI safety community comes from our local EA group here, reading online materials, attending one virtual EAG and engaging with others through the Cambridge EA AGI Safety Fundamentals programme. So, when I talk about AI safety enthusiasts, I mostly don’t mean AI safety professionals (unless they have written a lot of online material I happen to have read); I mean “people engaged in EA who think AI safety matters a lot and might be considering or trying to make a career out of it”.

Quotes should not be taken literally (most of them are freely translated from Finnish). Some events I describe happened a few years back so other people involved might remember things differently.

I hope the text can be interesting for longtermist community builders who need information on how people react to AI safety arguments, or to others struggling to form opinions about AI safety or other EA cause areas. For me, writing this out was very useful, since I gained some interesting insights and can somewhat cope with the weird feeling now.


This text is very long, because it takes more words to explain a process of trial and error than to describe a single key point that led to an outcome (like “I read X resource and was convinced of the importance of AI safety because it said Y”). For the same reason, this text is also not easy to summarize. Anyway, here is an attempt:

Not knowing and not wanting to know

What I thought of AI risk before having heard the term “AI risk”

I learned what AI actually was through my university studies. I did my Bachelor’s in Math, got interested in programming during my second year and was accepted to a Master’s program in Computer Science. I chose the track of Algorithms and Machine Learning because of the algorithms part: they were fun, logical, challenging and understandable. ML was messy and had a lot to do with probabilities which I initially disliked. But ML also had interesting applications, especially in the field of Natural Language Processing that later became my professional focus as well.

Programming felt magical. First, there is nothing, you write some lines, and suddenly something appears. And this magic was easy: just a couple of weeks of learning, and I was able to start encoding grammar rules and outputting text with the correct word forms! 

Maybe that’s why I was not surprised to find out that artificial intelligence felt magical as well. And at the same time it is just programming and statistics. I remember how surprised I was when I trained my first word embedding model in 2017 and it worked even though it was in Finnish and not English: such a simple model, and it was like it understood my mother tongue. The most “sentient” seeming program I have ever made was an IRC bot that simulated my then-boyfriend by randomly selecting a phrase from a predefined list without paying any attention to what I was saying. Of course the point was to try to nudge him into being a bit more Turing test passing when talking to me. But still, chatting with the bot I felt almost like I was talking to this very real person.
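A bot of the kind described above takes only a few lines. The following is a minimal sketch and not the original code: the phrase list, the function name `reply` and the optional `rng` parameter are all made up for illustration, and the IRC connection is left out entirely.

```python
import random
from typing import Optional

# Hypothetical canned phrases; the phrases of the original bot are not known.
PHRASES = [
    "Hmm.",
    "Sounds good.",
    "I'm a bit busy right now.",
    "Let's talk about it later.",
]


def reply(message: str, rng: Optional[random.Random] = None) -> str:
    """Return a randomly chosen canned phrase.

    The incoming message is deliberately ignored: the bot pays no
    attention to what the other person is saying.
    """
    chooser = rng if rng is not None else random
    return chooser.choice(PHRASES)


if __name__ == "__main__":
    # Simulate a short one-sided chat.
    for line in ["How was your day?", "Are you even listening?"]:
        print("me: ", line)
        print("bot:", reply(line))
```

Since the reply never depends on the message, any feeling of talking to a real person comes entirely from the reader of the output.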

It was also not surprising that people who did not know much about AI or programming would have a hard time understanding that in reality there was nothing magical going on. Even for me and my fellow students it was sometimes hard to estimate what was possible and not possible to do with AI, so it was understandable that politicians were worried about “ensuring the AI will be able to speak both of our national languages” and salesmen were saying everything will soon be automated. Little did they know that there was no “the AI”, just different statistical models, and that you could not do ML without data, nor integrate it anywhere without interfaces, and that new research findings did not automatically mean they could be implemented with production-level quality.

And if AI seemed magical it was understandable that for some people it would seem scary, too. Why wouldn’t people think that the new capabilities of AI would eventually lead to evil AI just like in the movies? They did not understand that it was just statistics and programming, and that the real dangers were totally different from sci-fi: learning human bias from data, misuse of AI for war technology or totalitarian surveillance, and loss of jobs due to increased automation. This was something we knew and it was also emphasized to us by our professors, who were very competent and nice.

I have some recollections of reacting to worries about doomsday AI from that time, mostly with amusement or wanting to tell those people that they had no reason to worry like that. It was not like our little programs were going to jump out of the computer to take over the world! Some examples include:

In 2018 I wrapped up my Master’s thesis that I had done for a research group and started working as an AI developer in a big consulting corporation. The same year, a friend resurrected the university's effective altruism club. I started attending meetups since I wanted a reason to hang out with university friends even if I had graduated, and it seemed like I might learn something useful about doing good things in the world. I was a bit worried I would not meet the group's standard of Good Person™, but my friend assured me not everyone had to be an enlightened vegan to join the club, “we’ll keep a growth mindset and help you become one”.

Early impressions on AI risk in EA

Almost everyone in the newly founded EA university group had studied CS, but the two first people to talk to me about AI risk both had a background in philosophy.

The first one was a philosophy student with whom I had been involved in a literature magazine project some years before, so we were happy to reconnect. He asked me what I was doing these days, and when I said I work in AI, he became somewhat serious and said: “You know, here in EA, people have quite mixed feelings about AI.”

From the way he put it, I understood that this was a “let’s not give the AIs robot arms” type of concern, and not for example algorithmic bias. It did not seem that he himself was really worried about the danger of AI; actually, he found it cool that I did AI related programming for a living. We went for lunch and I spent an hour trying to explain to him how machine learning actually works.

The next AI risk interaction I remember in more detail was in 2019 with another philosophy student who later went to work as a researcher in an EA organization. I had said something about not believing in the concept of AI risk and wondered why some people were convinced of it.

“Have you read Superintelligence?” she asked. “Also, Paul Christiano has some quite good papers on the topic. You should check them out.”

I went home, googled Paul Christiano and landed on “Concrete Problems in AI safety”. Despite having “concrete” in the name, the paper did not seem that concrete to me. It seemed to just superficially list all kinds of good ML practices such as using good reward functions, importance of interpretability and using data so that it actually represents your use case. I didn’t really understand why it was worth writing a whole paper listing all this stuff that was obviously important in everyday machine learning work, figured that philosophy is a strange field (the paper was obviously philosophy and not science since there was no math), and thought that those AI risk folks probably don’t realize that all of this is going to get solved just because of industry needs anyway.

I also borrowed Superintelligence from the library and tried to read it, but gave up quite soon. It was summer and I had other things to do than read through a boring book in order to better debate with some non-technical yet nice person that I did not know very well on a topic that did not seem really relevant for anything.

I returned Superintelligence to the library and announced in the next EA meetup that my position on AI risk was “there are already so many dangers of AI misuse such as military drones, so I think I’m going to worry about people doing evil stuff with AI instead of this futuristic superintelligence stuff”. This seemed like an intelligent take, and I don’t think anyone questioned it at the time. As you can guess, I did not take any concrete action to prevent AI misuse, and I did not admit that AI misuse being a problem does not automatically mean there cannot be any other types of risk from AI.

Avoidance strategy 1: I don’t know enough about this to form an opinion.

After having failed to read Superintelligence, it was now obvious that AI safety folks knew something that I didn’t, namely whatever was written in the book. So I started saying that I could not really have an opinion on AI safety since I didn’t know enough about it. I did not feel super happy about it, because it was obvious that this could be fixed by reading more. At the same time, I was not that motivated to read a lot about AI safety just because some people in the nice discussion club thought it was interesting. I don’t remember if any of the CS student members of the club tried explaining AI risk to me: now I know that some of them were convinced of its importance during that time. I wonder if I would have taken them seriously: maybe not, because back then I had significantly more ML experience than them.

I did not feel very involved in EA at that point, and I got most of my EA information from our local monthly meetups, so I had no idea that AI risk was taken seriously by so many leading EA figures. If I had known, I might have hastily concluded that EA was not for me. On the other hand, I really liked the “reason and evidence” part of EA and had already started donating to GiveWell at this point. In an alternate timeline I might have ended up as a “person who thinks EA advice for giving is good, but the rest of the movement is too strange for me”. 

Avoidance strategy 2: Maybe I’m just too stupid to work on AI risk anyway

As more time passed, I started to get more and more into EA. More people joined our local community, and they let me hang around with them even though I doubted whether I was altruistic/empathetic/ambitious enough to actually be part of the movement. I started to understand that x-risk was not just some random conversation topic, but that people were actually attempting to prevent the world from ending.

And since I already worked in AI, it seemed natural that maybe a good way for me to contribute would be to work on AI risk. Of course to find out if that statement is true, I should have formed an opinion on the importance of AI safety first. I had tried to plead ignorance, and looking back, it seems that I did this on purpose as an avoidance strategy: as long as I could say “I don’t know much about this AI risk thing” there was no inconsistency in me thinking a lot of EA things made sense and only this AI risk part did not.

But of course, this is not very truthful, and I value truthfulness a lot. I think this is why I naturally developed another avoidance strategy: “whether AI risk is important or not, I’m not a good fit to work on it”. 

If you want to prove to yourself that you are not a good fit for something, 80 000 Hours works pretty well. Even when setting aside some target audience issues (“if I was really altruistic I would be ready to move to the US anyway, right?”), you can quite easily convince yourself that the material is intended for someone a lot more talented than you. The career stories featured some very exceptional people, and some advice aimed to get “10–20 people” in the whole world to work on a specific field, so the whole site was aimed to change, what, 500 careers maybe? Clearly my career cannot be in the top 500 most important ones in the world, since I’m just an average person and there are billions of people.

An 80k podcast episode finally confirmed to me that in order to work in AI safety, you needed to have a PhD in machine learning from a specific group in a specific top university. I was a bit sad but also relieved that AI safety really was nothing for me. Funnily enough, a CS student from our group interpreted the same part of the episode as “you don’t even need a PhD if you are just motivated”. I guess you hear what you want to hear more often than you’d like to admit.

Possible explanation: Polysemy

From time to time I tried reading EA material on AI safety, and it became clear that the opinions of the writers were different from opinions I had heard at the university or at work. In the EA context, AI was something very powerful and dangerous. But from work I knew that AI was neither powerful nor dangerous: it was neat, you could make some previously impossible things with it, but still the things you could actually use it for were really limited. What was going on here?

I developed a hypothesis that the confusion was caused by polysemy: AI (work) and AI (EA) had the same origin, but had diverged in their meaning so far that they actually described totally different concepts. AI (EA) did not have to care about mundane problems such as “availability of relevant training data” or even “algorithms”: the only limit ever discussed was amount of computation, and that’s why AI (EA) was not superhuman yet, but soon would be, when systems would have enough computational power to simulate human brains.

This distinction helped me keep up with both of my avoidance strategies. I worked in AI (work), so it was only natural that I did not know that much about AI (EA), so how could I know what the dangers of AI (EA) actually were? For all I knew, it could be dangerous because it was superintelligent, it could be superintelligent because it was not bound by AI (work) properties, and who can say for sure what will happen in the next 200 years? I had no way of ruling out that AI (EA) could be developed, and although “not ruling a threat out” does not mean “deciding that the threat is top priority”, I was not going to be the one complaining that other people worked on AI (EA). Of course, I was not needed to work on AI (EA), since I had no special knowledge of it, unlike all those other people who seemed very confident in predicting what AI (EA) could or could not be, what properties it would have and how likely it was to cause serious damage. By the principle of replaceability, it was clear that I was supposed to let all those enthusiastic EA folks work on AI (EA) and stay out of it myself.

So, I was glad that I had figured out the puzzle and was let off the hook of “you should work on AI safety (EA) since you have some relevant skills already”. It was obvious that my skills were not relevant, and AI safety (EA) needed people who had the skill of designing safe intelligent systems when you have no idea how the system is even implemented in the first place.

And this went on until I saw an advertisement for an ML bootcamp for AI safety enthusiasts. The program sounded an awful lot like my daily work. Maybe the real point of the bootcamp was actually to find people who can learn a whole degree’s worth of stuff in 3 weeks, but still, they thought using the time of these people to learn PyTorch would somehow be relevant for AI safety.

It seemed that at least the strict polysemy hypothesis was wrong. I also noticed that a lot of people around me seemed perfectly capable of forming opinions about AI safety, to the extent that it influenced their whole careers, and these people were not significantly more intelligent or mathematically talented than I was. I figured it was unreasonable to assume that I was literally incapable of forming any understanding of AI safety, if I spent some time reading about it.

Avoidance strategy 3: Well, what about all the other stuff?

After engaging with EA material for some time I came to the conclusion that worrying about misuse of AI is not a reason to not worry about x-risk from misaligned AI (like I had thought in 2019). Even more, a lot of people who were worried about x-risk did not seem to think that AI misuse would be such a big problem. I had to give up using “worrying about more everyday relevant AI stuff” as an avoidance strategy. But if you are trying to shift your own focus from AI risk to something else, there is an obvious alternative route. So at some point I caught myself thinking:

“Hmm, ok, so AI risk is clearly overhyped and not that realistic. But people talk about other x-risks as well, and the survival of humanity is kind of important to me. And the other risks seem way more likely. For instance, take biorisk: pandemics can clearly happen, and who knows what those medical researchers are doing in their labs? I’d bet lab safety is not the number one thing everyone in every lab is concerned about, so it actually seems really likely some deadly disease could escape from somewhere at some point. But what do I know, I’m not a biologist.”

Then I noticed that it is kind of alarming that I seem to think that x-risks are likely only if I have no domain knowledge of them. This led to the following thoughts:

Deciding to fix ignorance

In 2021 I was already quite heavily involved in organizing EA Finland and started to feel responsible for both how I communicated about EA to others and whether EA was doing a good job as a movement. Of course, the topic of AI safety came up quite often in our group. At some point I noticed that several people had said something along these lines to me:

So it seemed that other people than me were also hesitant to form opinions about AI safety, often saying they were not qualified to do it.

Then a non-EA algorithms researcher friend asked me for EA book recommendations. I gave him a list of EA books you can get from the public library, and he read everything including Superintelligence and Human Compatible. His impressions afterwards were: “Superintelligence seemed a bit nuts. There was something about using lie detection for surveillance to prevent people from developing super-AIs in their basements? But this Russell guy is clearly not a madman. I don’t know why he thinks AI risk is so important but [the machine learning professor in our university] doesn’t. Anyway, I might try to do some more EA related stuff in the future, but this AI business is too weird, I’m gonna stay out of it.”

By this point, it was pretty clear that I should no longer hide behind ignorance and perceived incapability. I had a degree in machine learning and had been getting paid for doing ML for several years, so even my impostor syndrome did not believe at this point that I would “not really know that much about AI”. Also, even if I sometimes felt not altruistic and not effective, I was obviously involved in the EA movement if you looked at the hours I spent weekly organizing EA Finland stuff.

I decided to read about AI risk until I understood why so many EAs were worried about it. Maybe I would be convinced. Maybe I would find a crux that explained why I disagreed with them. Anyway, it would be important for me to have a reasonable and good opinion on AI safety, since others were clearly listening to my hesitant rambling, and I certainly did not want to drive someone away from AI safety if it turned out to be important! And if AI safety was important but the AI safety field was doing wrong things, maybe I could notice errors and help them out.

Trying to fix ignorance

Reactions to Superintelligence

So in 2021, I gave reading Superintelligence another try. This time I actually finished it and posted my impressions in a small EA group chat. Free summary and translation: 

“Finally finished Superintelligence. Some of the contents were so weird that now I actually take AI risk way less seriously. 

Glad I did not read it back in 2019, because there was stuff that would have gone way over my head without having read EA stuff before, like the moral relevance of the suffering of simulated wild animals in evolution simulations. 

Bostrom seems to believe there are essentially no limits to technological capability. Even though I knew he is a hard-core futurist, some transhumanist stuff caught me by surprise, such as stating that from a person-affecting view it is better to speed up AI progress despite the risk. Apparently it’s ok if you accidentally turn into paper clips since without immortality providing AI you’re gonna die anyway? 

I wonder if Bill Gates and all those other folks who recommend the book actually read the complete thing. I suspect that there was still stuff that I did not understand because I had not read some of Bostrom's papers that would give the needed context. If I was not familiar with the vulnerable world hypothesis I would not have gotten the part where Bostrom proposes lie detection to prevent people from secretly developing AI.

Especially the literal alien stuff was a bit weird, Bostrom suggested taking examples from superintelligent AIs created by aliens, as they could have more human-like values than random AIs? I thought cosmic endowment was important because there were no aliens, doesn’t that ruin the 10^58 number?


Good thing about the book was that it explained well why the first solutions to AI risk prevention are actually not so easy to implement.

The more technical parts were not very detailed (referring to variables that are not defined anywhere etc), so I guess I should check out some papers about actually putting those values in the AI and see if they make sense or not.”

Upon further inspection, it turned out that the aliens of the Hail Mary approach were multiverse aliens, not regular ones. According to Bostrom, simulating all physics in order to approximate the values of AIs made by multiverse aliens was “less-ideal” but “more easily implementable”. This kind of stuff made it pretty hard for me to take seriously even the parts of the book that made more sense. (I don’t think simulating all physics sounds very implementable.)

I also remember telling someone something along the lines of: “I shifted from thinking that AI risk is an important problem but boring to solve to thinking that it is not a real problem, but thinking about possible solutions can be fun.” (CEV sounded interesting since I knew from math that social choice math is fun and leads to uncomfortable conclusions pretty fast. Sadly, a friend told me that Yudkowsky doesn’t believe in CEV himself anymore and that I should not spend too much time trying to understand it, so I didn’t.)

Another Superintelligence related comment from my chat logs right after reading it: “On MIRI’s webpage there was a sentence that I found a lot more convincing than this whole book: ‘If nothing yet has struck fear into your heart, I suggest meditating on the fact that the future of our civilization may well depend on our ability to write code that works correctly on the first deploy.’”

Reactions to Human Compatible

I returned Superintelligence to the library and read Human Compatible next. If you are familiar with both, you might already guess that I liked it way more. I wrote a similar summary of the book to the same group chat:

“Finished reading Human Compatible, liked it way more. Book was interestingly written and the technical parts seemed credible.

Seems like Russell does not really care about space exploration like Bostrom, and he explicitly stated he’s not interested in consciousness / “mind crime”.

A lot of AI risk was presented in relation to present-day AI, not paperclip stuff. Like recommendation engines; and there was a point that people are already using computers a lot, so if there was a strong AI on the internet that wanted to manipulate people it could do it pretty easily.

Russell did not give any big numbers and generally tried not to sound scary. His perception of AGI is not godlike but it could still be superhuman in the sense that for example human-level AGIs could transmit information way faster to each other and be powerful when working in collaboration.

The book also explained what is still missing from AGI in today’s AI systems and why deep learning does not automatically produce AGIs.

According to Russell you cannot prevent the creation of AGI so you should try to put good values in it. You’d learn those values by observing people, but it is also hard because understanding people is exactly the hard thing for AIs. There was a lot of explanation on how this could be done and also what the problems of the approach are. Also there was talk about happiness and solving how people can be raised to become happy.

Other good stuff about the book includes: well written, technical stuff was explained, the equations were actually math, you did not need any special preliminary knowledge about tech or ethics, there were a lot of citations from different sources.”

I also summarized how the book influenced my thoughts about AI safety as a concept:

“I now think that it is not some random nonsense but you can approach it in meaningful ways, but I still think it seems very far away from being practically/technically relevant with any method I’m familiar with, since it would still require a lot of jumps of progress before being possible. Maybe I could try reading some of Russell's papers, the book was well written so maybe they’ll be too.”

What did everyone else read before getting convinced?

In addition to the two books, I read a lot of miscellaneous links and blog posts that were recommended to me by friends from our local EA group. Often link sharing was not super fruitful: a friend and I would disagree on something, they’d send me a link that supposedly explained their point better, but reading the resource did not solve the disagreement. We’d try to discuss further, but often, I ended up just more confused and sometimes less convinced about AI safety. I felt like my friends were equally confused about why the texts that were so meaningful to them did not help in getting their point across.

It took me way too long to realize that I should have started with asking: “What did you read before you were convinced of the importance of AI risk?”

It turned out that at least around me, the most common answer was something like: “I always knew it was important and interesting, which is why I started to read about it.”

So at least for the people I know, it seemed that people were not convinced about AI risk because they had read enough about it, but because they had either always thought AI would matter, or because they had found the arguments convincing right away.

I started to wonder if this was a general case. I also became more curious about whether it is easier to become convinced of AI risk if you don’t have that much practical AI experience beforehand. (On the other hand, learning about practical AI things did not seem to move people away from AI safety, either.) But my sample size was obviously small, so I had to find more examples to form a better hypothesis.

AGISF programme findings

My next approach in forming an opinion was to attend the EA Cambridge AGI Safety Fundamentals programme. I thought it would help me better understand the context of all those blog posts, and that I would get to meet other people with different backgrounds.

Signing up, I asked to be put in a group with at least one person with industry experience. This did not happen, but I don’t blame the organizers for it: at least based on how everyone introduced themselves in the course Slack, not many people out of the hundreds of attendees had such a background. Of course, not everyone on the programme introduced themselves, but this still made me a little reserved.

So I used the AGISF Slack to find people who had already had a background in machine learning before getting into AI safety and asked them what had originally convinced them. Finally, I got answers from 3 people who fit my search criteria. They mentioned some different sources of first hearing about AI safety (80 000 Hours and LessWrong), but all three mentioned the same source that had deeply influenced them: Superintelligence.

This caught me by surprise, having had such a different reaction to Superintelligence myself. So maybe recommending Superintelligence as a first intro to AI safety is actually a good idea, since these people with impressive backgrounds had become active in the field after reading it. Maybe people who end up working in AI safety have the ability to either like Bostrom’s points about multiverse aliens or discard the multiverse aliens part because everything else is credible enough.

I still remain curious on:

The inner misalignment was inside you all along

It was mentally not that easy for me to participate in the AGISF course. I already knew that debating my friends on AI safety could be emotionally draining, and now I was supposed to talk about the topic with strangers. I noticed I was reacting quite strongly to the reading materials and classifying them in a black-and-white way into either “trivially true” or “irrelevant, not how anything works”. Obviously this is not a useful way of thinking, and it stressed me out. I wished I had found the materials interesting and engaging, like other participants seemingly did.

The first couple of meetings with my cohort I was more silent and observing, but as the course progressed, I became more talkative. I also started to get nicer feedback from my local EA friends on my AI safety views – less asking me to read more and more asking me to write my thoughts down, because they might be interesting for others as well.

So, the programme was working as intended, and I was now actually forming my own views on AI safety and engaging with others interested in the field in a productive way? It did not feel like that. Talking about AI safety with my friends still made me inexplicably anxious, and after cohort meetings, I felt relieved, something like “phew, they didn’t notice anything”.

This feeling of relief was the most important hint that helped me realize what I was doing. I was not participating in AI safety discussions as myself anymore, maybe hadn’t for a long time, but rather in a “me but AI safety compatible” mode.

In this mode, I seem more like a person who:

All in all, these are traits I could plausibly have, and I think other people in the AI safety field would like me more if I had them. Of course this actually doesn’t have anything to do with the real concept of inner misalignment: it is just the natural phenomenon of people putting up a different face in different social contexts. Sadly, this mode is already quite far from how I really feel. More alarmingly, if I am discussing my views in this mode, it is hard for me to access my more intuitive views, so the mode prevents me from updating them: I only update the mode’s views.

Noticing the existence of the mode does not automatically mean I can stop going in it, because it has its uses. Without it, it would be way more difficult to even have conversations with AI safety enthusiasts, because they might not want to deal with my uncertainty all the time. With this mode, I can have conversations and gain information, and that is valuable even if it is hard to connect the information to what I actually think. 

However, I plan to see if I can get some people that I personally know to talk to me about AI safety while being aware that this mode takes over easily. Maybe we could have a conversation where the mode notices it is not needed and allows me to connect to my real intuitions, even if they are messy and probably not very pleasant for others to follow. (Actionable note to myself: ask someone to do this with me.)

AI safety enthusiasts and me

Now that I have read about AI safety and participated in the AGISF program, I feel like I know at least on the surface most of the topics and arguments many AI safety enthusiasts know. Annoyingly, I still don’t know why many other people are convinced about AI safety and I am not. There are probably some differences in what we hold true, but I suspect a lot of the confusion comes from other things than straight facts and recognized beliefs. 

There are social and emotional factors involved, and I think most of them can be clustered into the following three topics:

Next, I’ll explain the categories in more detail.

Communication differences

When I try to discuss AI safety with others and if I remain “myself” as much as I can, I notice the following interpretations/concerns:

Probably a lot of the friction just comes from me not being used to a communication style people in AI safety are used to. But I think some of it might come from the emotional response from the AI safety enthusiast I am talking to, such as “being afraid of saying something wrong, causing Ada to further deprioritize AI safety” or “being tired of explaining the same thing to everyone who asks it” or even “being afraid of showing uncertainty since it is so hard to ever convince anyone of the importance of AI safety”. For example, some people might share a link instead of explaining a concept in their own words to save time, but some people might do it to avoid saying something wrong.

I wish I knew how to create a discussion where the person convinced of AI safety can drop the things that are “probably relevant” or “expert opinions” and focus on just clearly explaining to me what they currently believe. Maybe then, I could do the same. (Actionable note to myself: try asking people around me to do this.)

Differences in thinking

I feel like I lack the ability to model what AI safety enthusiasts are thinking or what they believe is true. This happens even when I talk with people I know personally and who have a similar educational background, such as other CS/DS majors in EA Finland. It is frustrating. The problem is not the disagreements: if I cannot model others, I don’t know if we are even disagreeing or not.

This is not the first time in my life when everyone else seems to behave strangely and irrationally, and every time before, there has been an explanation. Mostly, later it turned out others just were experiencing something I was not experiencing, or I was experiencing something they were not. I suspect something similar is going on between me and AI safety folks.

It would be very valuable to know what this difference in thinking is. Sadly, I have no idea. The only thing I have is a long list of possible explanations that I think are false:

Motivated reasoning

Do I want AI risk to be an x-risk? Obviously not. It would be better for everyone to have fewer x-risks around, and it would be even better if the whole concept of x-risk was false since it would somehow not be possible to have such extreme catastrophes ever happen. (I don’t think anyone thinks that, but it would be nice if it was true.)

But: If you are interested in making the world a better place, you have to do it by either fixing something horrible that is already going on or preventing something horrible from happening. It would be awfully convenient if I could come up with a cause that was:

All of this almost makes me want to forget that I somehow still failed to be convinced of the importance of this risk, even when reading texts written by people who I otherwise find very credible.

(And saying this aloud certainly makes me want to forget the simple implication: if they are wrong about this, are they still right about the other stuff? Is the EA methodology even working? What problems are there with other EA cause areas? It would seem unreasonable to think EA got every single detail about everything right. But this is a big thing, and getting bigger. What if they are mistaken? What do I do if they are?)

The fear of the answer

Imagine that I notice that AI safety is, in fact, of crucial importance. What would this mean?

There would be some social consequences: almost everyone I work with and who taught me anything about AI would be wrong, and most of my friends who are not in EA would probably not take me seriously. Among my EA friends, the AI safety non-enthusiasts would probably politely stop debating me on AI safety matters and decide that they don’t understand enough about AI to form an informed opinion on why they disagree with me. But maybe the enthusiasts would let me try to do something about AI risk, and we’d feel like we are saving the world, since it would be our best estimate that we are.

The practical consequences would most likely be ok, I think: I would probably try to switch jobs, and if that didn’t work out, shift the focus of my EA volunteering to AI safety related things. Emotionally, I think I would be better off if I could press a button that would make me convinced about AI safety on a deep rational understanding level. This might sound funny because being very worried about neglected impending doom does not seem emotionally very nice. But if I want to be involved with EA, it still might be the easiest route.

So, what if it turns out I think almost everyone in EA is wrong about AI risk being a priority issue? The whole movement would have estimated the importance of AI risk wrong, and would be getting more and more wrong as AI safety seems to get more traction. It would mean something has to be wrong in the way the EA movement makes decisions, since the decision making process had produced this great error. It would also mean that every time I interact with another person in the movement, I would have to choose between stating my true opinion about AI safety, which risks ruining the possibility of cooperating with that person, and being dishonest.

Maybe this would cause me to leave the whole EA movement. I don’t want to be part of a movement that is supposed to use reason and evidence to find the best ways to do good, but is so bad at it they would have made such a great error. I would not have much hope of fixing the mistake from the inside, since I’m just a random person and nobody has any reason to listen to me. Somebody with a different personality type would maybe start a whole campaign against AI safety research efforts, but I don’t think I would ever do this, even if I believed these efforts are wrong.

Friends and appreciation

Leaving the EA movement would be bad, because I really like EA. I want to do good things and I feel like EA is helping me with that. 

I also like my EA friends, and I am afraid they will think bad things about me if I don’t have good opinions on AI safety. To be clear, I don’t think my EA friends would expect me to agree with them on everything, but I do think they expect me to be able to develop reasonable and coherent opinions. Like, “you don’t have to take AI safety seriously, but you have to be able to explain why”. I am also worried my friends will think that I do not actually care about the future of humanity, or that I don’t have the ability to care for abstract things, or that I worry too much about things like “what do my friends think of me”.

On a related note, writing this whole text with the idea of sharing it with strangers scared me too. I felt like people would think I am not-EA-like, or would get mad at me for admitting I did not like Superintelligence. It would be bad if I decided that in the future I actually want to work on AI safety, but nobody wanted to cooperate with me because I had voiced uncertainties before. I have heard people react to EA criticisms with “this person obviously did not understand what they are talking about” and I feel like many people might have a similar reaction to this text too, even if my point is not to criticize, but just to reflect on my own opinions.

I cannot ask the nebulous concept of the EA community about this, but luckily, reaching out to my friends is way easier. I decided to ask them if they would still be my friends even if I decided my opinion on AI safety was “I don’t know and I don’t want to spend more time finding out so I’m going to default to thinking it is not important”.

We discussed this for a few hours, and it turned out my friends would still want to be my friends and would still prefer me to be involved in EA and in our group, at least unless I started to actively work against AI safety. Also, they would actually not be that surprised if this was my opinion, since they feel a lot of people have fuzzy opinions about things.

So I think maybe it is not the expectation of my friends that is making me want to have a more coherent and reasonable opinion on AI safety. It is my own expectation.

What I think and don’t think of AI risk

What I don’t think of AI risk

I’m not at all convinced that there cannot be any risk from AI, either. (Formulated this strongly, this would be a stupid thing to be convinced about.)

More precisely, reading all the AI safety material taught me that there are very good counterarguments to the most common arguments stating that solving AI safety would be easy. These arguments were not that difficult for me to internalize, because I am generally pessimistic: it seems reasonable that if building strong AI is difficult then building safe strong AI should be even more difficult. 

In my experience, it is hard to get narrow AI models to do what you want them to do. I probably would not, for example, step into a spaceship that is steered by a machine learning system, since I have no idea how you could prove that the statistical model is doing what it is supposed to do. Steering a spaceship sounds very difficult, but still a lot easier than understanding and correctly implementing “what humans want”, because even the whole prompt is very fuzzy and difficult for humans as well.

It does not make sense to me that any intelligent system would learn human-like values “magically” as a by-product of being really good at optimizing for something else. It annoys me that the most popular MOOC produced by my university states:

“The paper clip example is known as the value alignment problem: specifying the objectives of the system so that they are aligned with our values is very hard. However, suppose that we create a superintelligent system that could defeat humans who tried to interfere with its work. It’s reasonable to assume that such a system would also be intelligent enough to realize that when we say “make me paper clips”, we don’t really mean to turn the Earth into a paper clip factory of a planetary scale.”

I remember a point where I would have said “yeah, I guess this makes sense, but some people seem to disagree, so I don’t know”. Now I can explain why it is not reasonable. So in that sense, I have learned something. (Actionable note to self: contact the professor responsible for the course and ask him why they put this phrase in the material. He is a very nice person so I think he would at least explain it to me.)

But I am a bit at a loss as to why people in the AI safety field think it is possible to build safe AI systems in the first place. I guess as long as it is not proven that the properties of safe AI systems are contradictory with each other, you could assume it is theoretically possible. When it comes to ML, the best performance in practice is sadly often worse than the theoretical best.

Pessimism about the difficulty of the alignment problem is quite natural to me. I wonder if some people who are more optimistic about technology in general find AI safety materials so engaging because they at some point thought AI alignment could be a lot easier than it is. I find it hard to empathize with the people Yudkowsky first designed the AI box thought experiment for. As described in the beginning of this text, I would not spontaneously think that a superintelligent being was unable to manipulate me if it wanted to.

What I might think of AI risk

As you might have noticed, it is quite hard for me to form good views on AI risk. But I have some guesses about views that might describe what I think:

What now?

Possible next steps

To summarize, so far I have tried reading about AI safety to either understand why people are so convinced about it or find out where we disagree. This has not worked out. By writing this text, it became clear to me that there are social and emotional issues preventing me from forming an opinion about AI safety. I have already started working on them by discussing them with my friends.

I have already mentioned some actionable points throughout the text in the relevant contexts. The most important one:

If you (yes, you!) are interested in a discussion like that, feel free to message me anytime!

Other things I already mentioned were:

Additional things I might do next are:

Why the answer matters

I have spent a lot of time trying to figure out what my view on AI safety is, and I still don’t have a good answer. Why not give up, decide to remain undecided and do something else?

Ultimately, this has to do with what I think the purpose of EA is. You need to know what you are doing, because if you don’t, you cannot do good. You can try, but in the worst case you might end up causing a lot of damage. 

And this is why EA is a license to care: the permission to stop resisting the urge to save the world, because it promises that if you are careful and plan ahead, you can do it in a way that actually helps. Maybe you can’t save everyone. Maybe you’ll make mistakes. But you are allowed to do your best, and regardless of whether you are a Good Person™ (or an altruistic and effective person) it will help.

As long as I don’t know how important AI safety is, I am not going to let myself actually care about it, only about estimating its importance. 

I wonder if this, too, is risk aversion – a lot of AI safety enthusiasts seem to emphasize that you have to be able to cope with uncertainty and take risks if you want to do the most good. Maybe this attitude towards risk and uncertainty is actually the crux between me and AI safety enthusiasts that I’m having such a hard time finding?

But I’m obviously not going to believe something I do not believe just to avoid seeming risk averse. Until I can be sure enough that the action I’m taking is going in the right direction, I am going to keep being careful. 


Comments sorted by top scores.

comment by richard_ngo · 2022-04-18T18:59:25.270Z · EA(p) · GW(p)

I really liked this post. I've often felt frustrated by how badly the alignment community has explained the problem, especially to ML practitioners and researchers, and I personally find neither Superintelligence nor Human Compatible very persuasive. For what it's worth, my default hypothesis is that you're unconvinced by the arguments about AI risk in significant part because you are applying an unusually high level of epistemic rigour, which is a skill that seems valuable to continue applying to this topic (including in the case where AI risk isn't important, since that will help us uncover our mistake sooner). I can think of some specific possibilities, and will send you a message about them.

The frustration I mentioned was the main motivation for me designing the AGISF course; I'm now working on follow-up material to hopefully convey the key ideas in a simpler and more streamlined way (e.g. getting rid of the concept of "mesa-optimisers"; clarifying the relationship between "behaviours that are reinforced because they lead to humans being mistaken" and "deliberate deception"; etc). Thanks for noting the "deception" ambiguity in the AGI safety fundamentals curriculum - I've replaced it with a more careful claim (details in reply to this comment).

Replies from: richard_ngo, weeatquince, Ada-Maaria Hyvärinen
comment by richard_ngo · 2022-04-18T18:59:45.750Z · EA(p) · GW(p)

Old: "The techniques discussed this week showcase a tradeoff between power and alignment: behavioural cloning provides the fewest incentives for misbehaviour, but is also hardest to use to go beyond human-level ability. Whereas reward modelling can reward agents for unexpected behaviour that leads to good outcomes (as long as humans can recognise them) - but this also means that those agents might find and be rewarded for manipulative or deceptive actions. Christiano et al. (2017) provide an example of an agent learning to deceive the human evaluator; and Stiennon et al. (2020) provide an example of an agent learning to “deceive” its reward model. Lastly, while IRL could in theory be used even for tasks that humans can’t evaluate, it relies most heavily on assumptions about human rationality in order to align agents."

New: "The techniques discussed this week showcase a tradeoff between power and alignment: behavioural cloning provides the fewest incentives for misbehaviour, but is also hardest to use to go beyond human-level ability. Reward modelling, by contrast, can reward agents for unexpected behaviour that leads to good outcomes - but also rewards agents for manipulative or deceptive actions. (Although deliberate deception is likely beyond the capabilities of current agents, there are examples of simpler behaviours which have a similar effect: Christiano et al. (2017) describes an agent learning behaviour which misled the human evaluator; and Stiennon et al. (2020) describes an agent learning behaviour which was misclassified by its reward model.) Lastly, while IRL can potentially be used even for tasks that humans can’t evaluate, the theoretical justification for why this should work relies on implausibly strong assumptions about human rationality."

comment by weeatquince · 2022-04-25T06:34:07.106Z · EA(p) · GW(p)

my default hypothesis is that you're unconvinced by the arguments about AI risk in significant part because you are applying an unusually high level of epistemic rigour

This seems plausible to me, based on:

  • The people I know who have thought deeply about AI risk and come away unconvinced often seem to match this pattern.
  • I think some of the people who care most about AI risk apply a lower level of epistemic rigour than I would, e.g. some seem to have much stronger beliefs about how the future will go than I think can be reasonably justified.
comment by Ada-Maaria Hyvärinen · 2022-04-20T05:41:04.432Z · EA(p) · GW(p)

Interesting to hear your personal opinion on the persuasiveness of Superintelligence and Human Compatible! And thanks for designing the AGISF course, it was useful.

Replies from: richard_ngo
comment by richard_ngo · 2022-04-21T01:16:41.441Z · EA(p) · GW(p)

Superintelligence doesn't talk about ML enough to be strongly persuasive given the magnitude of the claims it's making (although it does a reasonable job of conveying core ideas like the instrumental convergence thesis and orthogonality thesis, which are where many skeptics get stuck).

Human Compatible only spends, I think, a couple of pages actually explaining the core of the alignment problem (although it does a good job at debunking some of the particularly bad responses to it). It doesn't do a great job at linking the conventional ML paradigm to the superintelligence paradigm, and I don't think the "assistance games" approach is anywhere near as promising as Russell makes it out to be.

Replies from: rohinmshah
comment by Rohin Shah (rohinmshah) · 2022-04-24T09:40:37.550Z · EA(p) · GW(p)

I wish you would summarize this disagreement with Russell as "I think neural networks / ML will lead to AGI whereas Russell expects it will be something else". Everything else seems downstream of that. (If I had similar beliefs about how we'd get to AGI as Russell, and I was forced to choose to work on some existing research agenda, it would be assistance games. Though really I would prefer to see if I could transfer the insights from neural network / ML alignment, which might then give rise to some new agenda.)

This seems particularly important to do when talking to someone who also thinks neural networks/ ML will not lead to AGI.

Replies from: Habryka
comment by Habryka · 2022-04-24T23:50:34.141Z · EA(p) · GW(p)

FWIW, I don't think the problem with assistance games is that it assumes that ML is not going to get to AGI. The issues seem much deeper than that (mostly of the "grain of truth" sort, and from the fact that in CIRL-like formulations, the actual update-rule for how to update your beliefs about the correct value function is where 99% of the problem lies, and the rest of the decomposition doesn't really seem to me to reduce the problem very much, but instead just shunts it into a tiny box that then seems to get ignored, as far as I can tell).

Replies from: rohinmshah
comment by Rohin Shah (rohinmshah) · 2022-04-25T08:11:42.505Z · EA(p) · GW(p)

The issues seem much deeper than that (mostly of the "grain of truth" sort, and from the fact that in CIRL-like formulations, the actual update-rule for how to update your beliefs about the correct value function is where 99% of the problem lies, and the rest of the decomposition doesn't really seem to me to reduce the problem very much

Sounds right, and compatible with everything I said? (Not totally sure what counts as "reducing the problem", plausibly I'd disagree with you there.)

Like, if you were trying to go to the Moon, and you discovered the rocket equation and some BOTECs said it might be feasible to use, I think (a) you should be excited about this new paradigm for how to get to the Moon, and (b) "99% of the problem" still lies ahead of you, in making a device that actually uses the rocket equation appropriately.

Is there some other paradigm for AI alignment (neural net based or otherwise) that you think solves more than "1% of the problem"? I'll be happy to shoot it down for you.

instead just shunts it into a tiny box that then seems to get ignored, as far as I can tell

This is definitely a known problem. I think you don't see much work on it because (a) there isn't much work on assistance games in general (my outsider impression is that many CHAI grad students are focused on neural nets), and (b) it's the sort of work that is particularly hard to do in academia.

Replies from: Habryka, richard_ngo
comment by Habryka · 2022-04-27T17:53:20.097Z · EA(p) · GW(p)

Some abstractions that feel like they do real work on AI Alignment (compared to CIRL stuff): 

  • Inner optimization
  • Intent alignment vs. impact alignment
  • Natural abstraction hypothesis
  • Coherent Extrapolated Volition
  • Instrumental convergence
  • Acausal trade

None of these are paradigms, but all of them feel like they do substantially reduce the problem, in a way that doesn't feel true for CIRL. It is possible I have a skewed perception of actual CIRL stuff, based on your last paragraph though, so it's plausible we are just talking about different things.

Replies from: rohinmshah
comment by Rohin Shah (rohinmshah) · 2022-04-28T06:48:33.316Z · EA(p) · GW(p)

Huh. I'd put assistance games above all of those things (except inner optimization but that's again downstream of the paradigm difference; inner optimization is much less of a thing when you aren't getting intelligence through a giant search over programs). Probably not worth getting into this disagreement though.

comment by richard_ngo · 2022-04-25T23:15:47.044Z · EA(p) · GW(p)

I don't think that my main disagreement with Stuart is about how we'll reach AGI, because critiques of his approach, like this page, don't actually require any assumption that we're in the ML paradigm.

Whether AGI will be built in the ML paradigm or not, I think that CIRL does less than 5%, and probably less than 1%, of the conceptual work of solving alignment; whereas the rocket equation does significantly more than 5% of the conceptual work required to get to the moon. And then in both cases there's lots of engineering work required too. (If AGI will be built in a non-ML paradigm, then getting 5% of the way to solving alignment probably requires actually making claims about whatever the replacement-to-ML paradigm is, which I haven't seen from Stuart.)

But Stuart's presentation of his ideas seems wildly inconsistent with both my position and your position above (e.g. in Human Compatible he seems way more confident in his proposal than would be justified by having gotten even 5% of the way to a solution).

Replies from: rohinmshah
comment by Rohin Shah (rohinmshah) · 2022-04-26T08:21:09.582Z · EA(p) · GW(p)

I don't think that my main disagreement with Stuart is about how we'll reach AGI, because critiques of his approach, like this page, don't actually require any assumption that we're in the ML paradigm.

I agree that single critique doesn't depend on the ML paradigm. If that's your main disagreement then I retract my claim that it's downstream of paradigm disagreements.

What's your probability that if we really tried to get the assistance paradigm to work then we'd ultimately conclude it was basically doomed because of this objection? I'm at like 50%, such that if there were no other objections the decision would be "it is blindingly obvious that we should pursue this".

I think that CIRL does less than 5%, and probably less than 1%, of the conceptual work of solving alignment; whereas the rocket equation does significantly more than 5% of the conceptual work required to get to the moon.

I might disagree with this but I don't know how you're distinguishing between conceptual and non-conceptual work. (I'm guessing I'll disagree with the rocket equation doing > 5% of the conceptual work.)

If AGI will be built in a non-ML paradigm, then getting 5% of the way to solving alignment probably requires actually making claims about whatever the replacement-to-ML paradigm is, which I haven't seen from Stuart.

I don't think this is particularly relevant to the rest of the disagreement, but this is explicitly discussed in Human Compatible! It's right at the beginning of my summary of it!

But Stuart's presentation of his ideas seems wildly inconsistent with both my position and your position above (e.g. in Human Compatible he seems way more confident in his proposal than would be justified by having gotten even 5% of the way to a solution).

Are you reacting to his stated beliefs or the way he communicates?

If you are reacting to his stated beliefs: I'm not sure where you get this from. His actual beliefs (as stated in Human Compatible) are that there are lots of problems that still need to be solved. From my summary:

Another problem with inferring preferences from behavior is that humans are nearly always in some deeply nested plan, and many actions don't even occur to us. Right now I'm writing this summary, and not considering whether I should become a fireman. I'm not writing this summary because I just ran a calculation showing that this would best achieve my preferences, I'm doing it because it's a subpart of the overall plan of writing this bonus newsletter, which itself is a subpart of other plans. The connection to my preferences is very far up. How do we deal with that fact?

There are perhaps more fundamental challenges with the notion of "preferences" itself. For example, our experiencing self and our remembering self may have different preferences -- if so, which one should our agent optimize for? In addition, our preferences often change over time: should our agent optimize for our current preferences, even if it knows that they will predictably change in the future? This one could potentially be solved by learning meta-preferences that dictate what kinds of preference change processes are acceptable.

All of these issues suggest that we need work across many fields (such as AI, cognitive science, psychology, and neuroscience) to reverse-engineer human cognition, so that we can put principle 3 into action and create a model that shows how human behavior arises from human preferences.

If you are reacting to how he communicates: I don't know why you expect him to follow the norms of the EA community and sprinkle "probably" in every sentence. That's not the norms that the broader world operates under; he's writing for the broader world.

comment by Gavin (technicalities) · 2022-04-17T14:51:37.203Z · EA(p) · GW(p)

Thanks for this honest account; I think it's extremely helpful to see where we're failing to communicate. It also took me a long time (like 3 years) to really understand the argument and to act on it.

At the risk of being another frustrating person sending you links: I wrote a post which attempts to communicate the risk using empirical examples, rather than grand claims about the nature of intelligence and optimisation. (But obviously the post needs to extrapolate from its examples, and this extrapolation might fall foul of the same things that make you sceptical / confused already.) Several people have found it more intuitive than the philosophical argument.

Happy to call to discuss!

Replies from: Ada-Maaria Hyvärinen
comment by Ada-Maaria Hyvärinen · 2022-04-20T13:55:25.950Z · EA(p) · GW(p)

Generally, I find links a lot less frustrating if they are written by the person who sends me the link :) But now I have read the link you gave and don't know what I am supposed to do next, which is another reason I sometimes find link sharing a difficult means of communication. Like, do I comment on specific parts of your post, or describe how reading it influenced me, or how does the conversation continue? (If you find my reaction interesting: I was mostly unmoved by the post, I think I had seen most of the numbers and examples before, there were some sentences and extrapolations that were quite off-putting for me but I think the "minimalistic" style was nice.)

It would be nice to call and discuss if you are interested.

Replies from: technicalities
comment by Gavin (technicalities) · 2022-05-05T08:59:44.168Z · EA(p) · GW(p)

Well, definitely tell me what's wrong with the post - and optionally tell me what's good about it (:

There's a Forum version here [EA · GW] where your comments will have an actual audience, sounds valuable.

comment by NinaR · 2022-04-18T20:31:16.243Z · EA(p) · GW(p)

Meta-level comment: this post was interesting, very well written and I could empathize with a lot of it, and in fact, it inspired me to make an account on here in order to comment : )

Object-level comment (ended up long, apologies!): My personal take is that a lot of EA literature on AI Safety (eg: forum articles) uses terminology that overly anthropomorphizes AI and skips a lot of steps in arguments, assuming a fair amount of prerequisite knowledge/ familiarity with jargon. When reading such literature, I try to convert the "EA AI Safety language" into "normal language" in my head in order to understand the claims better. Overall my answer to “why is AI safety important” is (currently) the following:

  • Humans are likely to develop increasingly powerful AI systems in order to solve important problems / provide in-demand services.
  • As AI systems become more powerful they can do more stuff / effect more change in the world. Somewhat because we’ll want them to. Why develop a powerful AI if not to do important/hard things?
  • As AI systems become more powerful it will be harder for us to ensure they do what we want them to do when deployed. Imo this claim needs more justification, and I will elaborate below.
  • Therefore, there is some probability that a powerful AI does things we don’t want. Hopefully, this follows from the above 3 premises.

AI Safety literature commonly refers to AIs as optimizers. Terms like “mesa-optimizer” are used a lot as well. There is also a fair amount of anthropomorphizing language such as “the AI will want to”, “the AI will secretly plan to”, and “the AI will like/dislike”. By now, I am ok with parsing such statements, but it can be confusing/distracting. “This thing is an optimizer” is a useful phrase when trying to predict the thing’s behavior, but looking through the lens of “what kind of optimizer is this” isn’t always the clearest way to look at the problem. The same can be said for the anthropomorphizing language. When someone says something like “so how do we check whether an AI secretly wants to kill you”, they are trying to summarise a phenomenon succinctly without giving the details/context; however, they are incurring the cost of being less precise and clear (and sounding weird).

Here’s my attempt to explain point 3 (As AI systems become more powerful it will be harder for us to ensure they do what we want them to do when deployed) without using EA-isms. 

  • The simplest possible description of an AI is an instance of a program that takes some inputs and produces some outputs.
  • Training an AI is the process of searching over different possible programs to find ones that “are good” by looking at which ones produce “good outputs for their inputs”. Another more general phrasing is searching over different possible programs to find ones that “are good” by using some kind of filter that can classify programs as good and bad.
  • “Is this a good output” or “is this program instance good” is easy to check when the problem is simple but harder to check when the problem is hard (some people call this the outer alignment problem). For example, “is the outputted animal image labeled correctly” is easy to check but “does the outputted drug treat the disease” is harder to check. It is plausible to me that we will want “is good” to encapsulate more and more human values as the problems become harder, which makes it *even* harder to define. For example, what if the criterion is “does the outputted intervention reduce CO2 emissions without causing anything that humans find morally wrong”?
  • “good outputs for their inputs” / “the AI has all the properties we designed into the filter” is harder to check, the more complex the domain of possible inputs and outputs we are dealing with (some people call this the inner alignment problem). A maths analogy is function approximation. Say you have a really complicated function with a lot of local minima/maxima/wobbliness. If we are testing an approximation by checking various points in the function space and the approximated function space and seeing if they match up, then the less smooth the function space, the more points we need to test to see if they match up. With infinite training data, we could check every point in the function space and be certain that outputs are good for all inputs. However, this is impossible, so we cannot be certain, and there will always be some probability of a bad output for a particular input. If an AI is more powerful, the effect of one bad output could be really bad.
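The function-approximation analogy in the last bullet can be made concrete with a toy sketch. Everything in it is an illustrative assumption rather than something from the comment: the two sine functions standing in for a "smooth" and a "wobbly" target, the piecewise-linear fit standing in for the learned approximation, the tolerance of 0.05, and the helper names `max_interp_error` and `points_needed`. It only shows the qualitative claim that a less smooth function needs more checked points before the approximation matches it everywhere.

```python
import math


def max_interp_error(f, n_points, lo=0.0, hi=1.0, n_check=2001):
    """Worst gap found between f and a piecewise-linear fit through
    n_points evenly spaced samples of f on [lo, hi]."""
    xs = [lo + (hi - lo) * i / (n_points - 1) for i in range(n_points)]
    ys = [f(x) for x in xs]
    worst = 0.0
    for j in range(n_check):
        x = lo + (hi - lo) * j / (n_check - 1)
        # Index of the sample interval that contains x (clamped at the right edge).
        k = min(int((x - lo) / (hi - lo) * (n_points - 1)), n_points - 2)
        t = (x - xs[k]) / (xs[k + 1] - xs[k])
        approx = ys[k] + t * (ys[k + 1] - ys[k])
        worst = max(worst, abs(f(x) - approx))
    return worst


def points_needed(f, tolerance=0.05, max_points=4096):
    """Smallest power-of-two sample count whose fit stays within tolerance,
    or None if max_points is not enough."""
    n = 2
    while n <= max_points:
        if max_interp_error(f, n) <= tolerance:
            return n
        n *= 2
    return None


def smooth(x):
    # Hypothetical "simple" target: one oscillation on [0, 1].
    return math.sin(2 * math.pi * x)


def wobbly(x):
    # Hypothetical "complicated" target: twenty oscillations on [0, 1].
    return math.sin(40 * math.pi * x)


if __name__ == "__main__":
    print("smooth needs", points_needed(smooth), "sample points")
    print("wobbly needs", points_needed(wobbly), "sample points")
```

Even in this one-dimensional toy case the check itself relies on a finite grid (`n_check`), so it can only bound the error at the points it looks at, which is the same limitation the bullet describes.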

All this being said, I maintain a significant amount of skepticism around AI safety being the most important problem. I think nuclear risk and biorisk are very important and also more tractable. Based on my current model, I place a ~10% probability of AI-related existential risk in the next century, conditional on no other existential risk occurring beforehand. I think there is a much higher probability of non-existential risk, and I care about that too. 

Currently, the most tractable way I can think of AI safety technical research is “let’s make ascertaining whether an AI output is good X% easier” and “let’s make it Y% easier to infer whether an AI is actually as good as it seems in training given limited training data and limited compute”. Research that increases X and Y would likely decrease AI risk. This is a hard problem, and many solutions may not be scalable to more powerful AIs. This is likely why a lot of AI safety literature uses metaphors and language that seem far removed from current systems. I think this is an attempt to imagine a good map for the territory of future AIs in order to come up with solutions that could work in the future.

I also have meta-level uncertainty around my takes and update in the direction of thinking that AI Safety is more important than I otherwise would because many intelligent people I respect think this. Because of this, I make decisions based on AI risk being higher than my internal model currently estimates.  I also spend more time thinking about AI safety because I find it interesting and I have a somewhat suitable background (I like programming and ML). I do think a major factor in deciding whether to work on AI safety should be personal fit.

comment by EdoArad (edoarad) · 2022-04-17T16:58:14.252Z · EA(p) · GW(p)

Super interesting and thoughtful post, and also exceptionally well written. I can relate to many points here, and the timing of this post is perfect in the context of discussions we have in Israel and of the ongoing developments in the global movement.

I may respond to some object-level matters later, but just wanted to say that I really hope you keep on thinking and writing! It'd particularly be interesting to read whatever "Intro to AI Safety" you may end up writing :)

Replies from: Ada-Maaria Hyvärinen
comment by Ada-Maaria Hyvärinen · 2022-04-20T12:56:33.247Z · EA(p) · GW(p)

Glad it may have evoked some ideas for any discussions you might be having in Israel :) For us in Finland, I feel like I at least personally need to get some more clarity on how to balance EA movement building efforts and possible cause-prioritization-related differences between movement builders. I think this is non-trivial because forming a consensus seems hard enough.

Curious to read any object-level response if you feel like writing one! If I end up writing any "Intro to AI Safety" thing it will be in Finnish so I'm not sure if you will understand it (it would be nice to have at least one coherent Finnish text about it that is not written by an astronomer or a paleontologist but by some technical person). 

comment by MHarris · 2022-04-17T12:19:19.630Z · EA(p) · GW(p)

I'm certain EA would welcome you, whether you think AI is an important x-risk or not.

If you do continue wrestling with these issues, I think you're actually extremely well placed to add a huge amount of value as someone who is (i) ML expert, (ii) friendly/sympathetic to EA, (iii) doubtful/unconvinced of AI risk. It gives you an unusual perspective which could be useful for questioning assumptions.

From reading this post, I think you're temperamentally uncomfortable with uncertainty, and prefer very well defined problems. I suspect that explains why you feel your reaction is different to others'.

"But I find it really difficult to think somewhere between concrete day-to-day AI work and futuristic scenarios. I have no idea how others know what assumptions hold and what don’t." - this is the key part, I think.

"I feel like it would be useful to write down limitations/upper bounds on what AI systems are able to do if they are not superintelligent and don’t for example have the ability to simulate all of physics (maybe someone has done this already, I don’t know)" - I think it would be useful and interesting to explore this. Even if someone else has done this, I'd be interested in your perspective.

Replies from: Ada-Maaria Hyvärinen, colin
comment by Ada-Maaria Hyvärinen · 2022-04-17T12:47:16.484Z · EA(p) · GW(p)

Thanks for the nice comment! Yes, I am quite uncomfortable with uncertainty and trying to work on that. Also, I feel like by now I am pretty involved in EA and ultimately feel welcome enough to be able to post a story like this in here (or I feel like EA appreciates different views enough despite also feeling this pressure to conform at the same time).

comment by colin · 2022-04-20T16:53:15.252Z · EA(p) · GW(p)

"I feel like it would be useful to write down limitations/upper bounds on what AI systems are able to do if they are not superintelligent and don’t for example have the ability to simulate all of physics (maybe someone has done this already, I don’t know)" - I think it would be useful and interesting to explore this. Even if someone else has done this, I'd be interested in your perspective.

I want to strongly second this! I think that a proof of the limitations of ML under certain constraints would be incredibly useful to narrow the area in which we need to worry about AI safety, or at least limit the types of safety questions that need to be addressed in that subset of ML.

comment by mic (michaelchen) · 2022-04-18T10:39:09.677Z · EA(p) · GW(p)

But I am a bit at loss on why people in the AI safety field think it is possible to build safe AI systems in the first place. I guess as long as it is not proven that the properties of safe AI systems are contradictory with each other, you could assume it is theoretically possible. When it comes to ML, the best performance in practice is sadly often worse than the theoretical best.

To me, this belief that AI safety is hard or impossible would imply that AI x-risk is quite high. Then, I'd think that AI safety is very important but unfortunately intractable. Would you agree? Or maybe I misunderstood what you were trying to say.

I agree that x-risk from AI misuse is quite underexplored.

For what it's worth, AI safety and governance researchers do assign significant probability to x-risk from AI misuse. AI Governance Week 3 — Effective Altruism Cambridge comments:

For context on the field’s current perspectives on these questions, a 2020 survey of AI safety and governance researchers (Clarke et al., 2021) [AF · GW] found that, on average [1], researchers currently guess there is: [2]

A 10% chance of existential catastrophe from misaligned, influence-seeking AI [3]

A 6% chance of existential catastrophe from AI-exacerbated war or AI misuse

A 7% chance of existential catastrophe from “other scenarios”

Replies from: Ada-Maaria Hyvärinen
comment by Ada-Maaria Hyvärinen · 2022-04-20T05:48:41.383Z · EA(p) · GW(p)

I think you understood me in the same way as my friend did in the second part of the prologue, so I apparently give this impression. But to clarify, I am not certain that AI safety is impossible (I think it is hard, though), and the implications of that depend a lot on how much power the AI systems will be given in the end, and what part of the damage they might cause is due to them being unsafe and what part to, for example, misuse, as you said.

comment by Geoffrey Irving (irving) · 2022-04-17T19:21:43.015Z · EA(p) · GW(p)

As someone who works on AGI safety and cares a lot about it, my main conclusion from reading this is: it would be ideal for you to work on something other than AGI safety! There are plenty of other things to work on that are important, both within and without EA, and a satisfactory resolution to “Is AI risk real?” doesn’t seem essential to usefully pursue other options.

Nor do I think this is a block to comfortable behavior as an EA organizer or role model: it seems fine to say “I’ve thought about X a fair amount but haven’t reached a satisfactory conclusion”, and give people the option of looking into it themselves or not. If you like, you could even say “a senior AGI safety person has given me permission to not have a view and not feel embarrassed about it.”

Replies from: Ada-Maaria Hyvärinen, Linch, colin, DaneelO
comment by Ada-Maaria Hyvärinen · 2022-04-20T14:22:40.843Z · EA(p) · GW(p)

Thanks for giving me permission, I guess I can use this if I ever need the opinion of "the EA community" ;)

However, I don't think I'm ready to give up on trying to figure out my stance on AI risk just yet, since I still estimate it is my best shot at forming a more detailed understanding of any x-risk, and understanding x-risks better would be useful for establishing better opinions on other cause prioritization issues.

Replies from: irving
comment by Geoffrey Irving (irving) · 2022-04-22T16:50:53.694Z · EA(p) · GW(p)

That is also very reasonable! I think the important part is to not feel too bad about the possibility of never having a view (there is a vast sea of things I don't have a view on), not least because I think it actually increases the chance of getting to the right view if more effort is spent.

(I would offer to chat directly, as I'm very much part of the subset of safety close to more normal ML, but am sadly over capacity at the moment.)

comment by Linch · 2022-04-18T13:27:29.536Z · EA(p) · GW(p)

On the one hand I agree with this being very likely the most prudent action from OP to take from her perspective, and probably the best action for the world as well. On the other, I think I feel a bit sad to miss some element of...combativeness(?)... in my perhaps overly-nostalgic memories of the earlier EA culture, where people used to be much more aggressive about disagreements with cause and intervention prioritizations. 

It feels to me that people are less aggressive about disagreeing with established consensus or strong viewpoints that other EAs have, and are somewhat more "live and let live" about both uses of money and human capital.  I sort of agree with this being the natural evolution of our movement's emphases (longtermism is harder to crisply argue about than global health, money is more liquid/fungible than human capital). But I think I feel some sadness re: the decrease in general combativeness and willingness to viciously argue about causes. 

This is related to an earlier post about the EA community becoming a "big tent [EA · GW]," which at the time I didn't agree with but now I'm warming up to.

Replies from: irving
comment by Geoffrey Irving (irving) · 2022-04-18T13:44:19.802Z · EA(p) · GW(p)

I think the key here is that they’ve already spent quite a lot of time investigating the question. I would have a different reaction without that. And it seems like you agree my proposal is best both for the OP and the world, so perhaps the real sadness is about the empirical difficulty at getting people to consensus?

At a minimum I would claim that there should exist some level of effort past which you should not be sad not arguing, and then the remaining question is where the threshold is.

Replies from: irving
comment by Geoffrey Irving (irving) · 2022-04-18T13:49:13.720Z · EA(p) · GW(p)

(I’m happy to die on the hill that that threshold exists, if you want a vicious argument. :))

comment by colin · 2022-04-20T17:10:39.167Z · EA(p) · GW(p)

it would be ideal for you to work on something other than AGI safety!

I disagree. Here is my reasoning:

  • Many people that have extensive ML knowledge are not working on safety because either they are not convinced of its importance or because they haven't fully wrestled with the issue
  • In this post, Ada-Maaria articulated the path to her current beliefs and how current AI safety communication has affected her.
  • She has done a much more rigorous job of evaluating the persuasiveness of these arguments than anyone else I've read
  • If she continues down this path she could either discover what unstated assumptions the AI safety community has failed to communicate or potentially the actual flaws in the AI safety argument.
  • This will either make it easier for AI Safety folks to express their opinions or uncover assumptions that need to be verified.
  • Either would be valuable!

comment by DaneelO · 2022-04-18T19:58:57.851Z · EA(p) · GW(p)

edit: I don't have a sense of humor

"a senior AGI safety person has given me permission to not have a view and not feel embarrassed about it."

For lack of a better word, this sounds cultish to me. Why would one need permission "from someone senior" to think or feel anything? If someone said this to me it would be a red flag about the group/community.

I think your first suggestion ("I’ve thought about X a fair amount but haven’t reached a satisfactory conclusion") sounds much more reasonable, if OP feels like that reflects their opinion. But I also think that something like "I don't personally feel  convinced by the AGI risk arguments, but many others disagree, I think you should read up on it more and reach your own conclusions", is much more reasonable than your second suggestion. I think we should welcome different opinions, as long as someone agrees with the main EA principles they are an EA, it should not be about agreeing completely with cause A, B and C. 

Sorry if I am over-interpreting your suggestion as implying much more than you meant, I am just giving my personal reaction. 

Disclaimer: long time lurker, first time poster. 

Replies from: irving
comment by Geoffrey Irving (irving) · 2022-04-18T22:02:41.575Z · EA(p) · GW(p)

Yep, that’s very fair. What I was trying to say was that if in response to the first suggestion someone said “Why aren’t you deferring to others?” you could use that as a joke backup, but agreed that it reads badly.

Replies from: DaneelO
comment by DaneelO · 2022-04-18T22:37:03.929Z · EA(p) · GW(p)

Makes a lot of sense :D I just didn't get the joke, which I in hindsight probably should have... :P 

comment by Caleb Biddulph (caleb-biddulph) · 2022-04-20T21:09:44.859Z · EA(p) · GW(p)

Hi Ada, I'm glad you wrote this post! Although what you've written here is pretty different from my own experience with AI safety in many ways, I think I got some sense of your concerns from reading this.

I also read Superintelligence as my first introduction to AI safety, and I remember pretty much buying into the arguments right away.[1] Although I think I understand that modern-day ML systems do dumb things all the time, this intuitively weighs less on my mind than the idea that AI can in principle be much smarter than humans, and that sooner or later this will happen. When I look specifically at the cutting-edge of modern AI tech like GPT-3, I feel like this supports my view pretty strongly, but I don't think I could give you a knockdown explanation for why typical modern AI doing dumb things seems less important; this is just my intuition. Usually, intuitions can be tested by seeing how well they make predictions, but the really inconvenient thing about statements about TAI is that they can never be validated.

As I've talked to people at EAGxBoston and EAG London, I've started to realize that my intuitions seem to be doing a lot of heavy lifting that I don't feel fully able to explain. Ironically, the more I learn about AI safety, the less I feel that I have principled inside views on questions like "what research avenues are the most important" and "what year will transformative AI happen." I've realized that I pretty much just defer to the weighted average opinion of various EA people who I respect. This heuristic is intuitive to me, but it also seems kind of bad.

I feel like if I really knew what I was talking about, I would be able to come up with novel and clever arguments for my beliefs and talk about them with utmost confidence, like Eliezer Yudkowsky with his outspoken conviction that we're all doomed; or I'd have a unique and characteristic view on what we can do to decrease AI risk, like Chris Olah with interpretability. Instead, I just have a bunch of intuitions, which to the extent they can be put into words, just boil down to silly-sounding things like, "GPT-3 seems really impressive, and AlexNet happened just 10 years ago and was less impressive. 'An AI that can do competent AI research' is really, really impressive, so maybe that will happen in... eh, I want to be conservative, so 20 years?"

Based on your post, I'm guessing maybe you have a similar perspective, but are coming at it from the opposite direction: you have intuitions that AI is not so big of a deal, but aren't really sure of the reasons for your views. Does that seem accurate?

Maybe my best-guess takeaway for now is that a lot of the differences between people who disagree about speculative things like this is differing priors, which might not be based in specific, articulable, and concrete arguments. For instance, maybe I'm optimistic about the value of space colonization because I read The Long Way to a Small Angry Planet, which presents a vision of a utopian interspecies galactic civilization that appeals to me, but doesn't make logical arguments for how it would work. Maybe I think that a sufficient amount of intelligence will be able to do really crazy things because I spent a lot of time as a kid trying to prove to people that I was smart and it's important to my identity. Or maybe I just believe these things because they're correct. I'm not sure I can tell.

I believe that as a community, we should really try to encourage a wide range of intuitions (as long as those intuitions haven't clearly been invalidated by evidence). The value of diverse perspectives in EA isn't a new idea, but if it's true that priors do a lot of work in whether people believe speculative arguments, it could be all the more important. Otherwise, there could be a strong self-selection effect for people who find EA's current speculations intuitive, since people who don't have articulable reasons for disagreement won't have much in the way to defend their beliefs, even if their priors are in fact well-founded.

  1. ^

    The claim that simulating all of physics would be “more easily implementable” than a standard friendly AI does seem pretty ridiculous to me now, though I'm not sure it accurately reflects his original point? I think the argument had more to do with considering counterfactuals rather than actually carrying out a simulation. I would still agree that this is pretty weird and abstract, though I don't think this point is that relevant anyway.

Replies from: Ada-Maaria Hyvärinen
comment by Ada-Maaria Hyvärinen · 2022-04-21T15:57:59.550Z · EA(p) · GW(p)

Hi Caleb! Very nice to read your reflection on what might make you think what you think. I related to many things you mentioned, such as wondering how much I think intelligence matters because of having wanted to be smart as a kid.

You understood correctly that intuitively, I think AI is less of a big deal than some people feel. This probably has a lot to do with my job, because it includes making estimates of whether problems can be solved with current technology given certain constraints, and it is better to err on the side of caution. Previously, one of my tasks was also to explain to people why AI is not a silver bullet and that modern ML solutions require things like training data and interfaces in order to be created and integrated into systems. Obviously, if the task is to find out all the things that future AI systems might be able to do at some point, you should take a quite different attitude than when trying to estimate what you yourself can implement right now. This is why I try to take a less conservative approach than would come naturally to me, but I think it still comes across as pretty conservative compared to many AI safety folks.

I also find GPT-3 fascinating but I think the feeling I get from it is not "wow, this thing seems actually intelligent" but rather "wow, statistics can really encompass so many different properties of language". I love language so it makes me happy.  But to me, it seems that GPT-3 is ultimately a cool showcase of the current data-centered ML approaches ("take a model that is based on a relatively non-complex idea[1], pour a huge amount of data into it, use model"). I don't see it as a direct stepping stone to science-automating AI, because it is my intuition that "doing science well" is not that well encompassed in the available training data. (I should probably reflect more on what the concrete difference is.)

Importantly, this does not mean I believe there can be no risks (or benefits!) from large language models, and models that will be developed in the near future.

I think it is very hard to be aware of your intuitions, incorporate new valid information to your world view and communicate with others at the same time. But I agree that for everyone it is better if we create better opportunities to do that, because otherwise we will lose information.

  1. ^

    not to say non-complexity would make the model somehow insignificant, quite the opposite, it is fascinating what attention mechanisms accomplish not only in NLP but on other domains as well

comment by Jeremy (captainjc) · 2022-04-17T17:47:12.110Z · EA(p) · GW(p)

I really appreciated this post as well. One thought I had while reading it - there is at least one project to red team EA ideas getting off the ground. Perhaps that’s something that would be interesting to you and could come closer to helping you form your views. Obviously, it would not be a trivial time commitment, but it seems like you are very much qualified to tackle the subject.

comment by weeatquince · 2022-04-25T06:50:23.489Z · EA(p) · GW(p)

I thought this post was wonderful. Very interestingly written, thoughtful and insightful. Thank you for writing it. And good luck with your next steps of figuring out this problem. It makes me want to write something similar: I have been in EA circles for a long time now and to some degree have also failed to form strong views on AI safety. Also, I thought your next steps were fantastic and very sensible; I would love to hear your future thoughts on all of those topics.


On your next steps, picking up on:  

To evaluate the importance of AI risk against other x-risk I should know more about where the likelihood estimates come from.

I was thinking of something similar to compare: bio risk, AI risk, and unknown unknown risks. However, I was thinking that if I were putting time into this I would not focus solely on understanding the likelihood estimates but would look for a broad range of evidence. E.g. on AI and bio, one could compare the risks by looking at: what are the limitations on what AI/bio systems are able to do, what do experts in this field think of the risks, are there good historical analogues for each risk type, how convincing are the case studies of the best things people are doing to prevent risk from AI/bio, how does the topic look on a scale/neglectedness/tractability comparison, etc.

Anyway just my thoughts on this research topic. Do reach out if you dive into that direction and want to discuss more.

Replies from: Ada-Maaria Hyvärinen
comment by Ada-Maaria Hyvärinen · 2022-04-27T11:49:44.386Z · EA(p) · GW(p)

Thanks! And thank you for the research pointers.

comment by Aayush Kucheria (aayush-kucheria) · 2022-04-17T11:48:29.697Z · EA(p) · GW(p)

Maybe a typo: the second AI (EA) should be AI (Work)?

AI (EA) did not have to care about mundane problems such as “availability of relevant training data” or even “algorithms”: the only limit ever discussed was amount of computation, and that’s why AI (EA) was not there yet, but soon would be, when systems would have enough computational power to simulate human brains.

Btw, really like your writing style! :)

Replies from: Ada-Maaria Hyvärinen
comment by Ada-Maaria Hyvärinen · 2022-04-17T12:37:24.444Z · EA(p) · GW(p)

thanks Aayush! Edited the sentence to be hopefully more clear now :)

comment by Oliver Sourbut · 2022-04-20T18:51:59.696Z · EA(p) · GW(p)

I was one of the facilitators in the most recent run of EA Cambridge's AGI Safety Fundamentals course, and I also have professional DS/ML experience.

In my case I very deliberately emphasised a sceptical approach to engaging with all the material, while providing clarifications and corrections where people's misconceptions are the source of scepticism. I believe this was well-received by my cohort, all of whom appeared to engage thoughtfully and honestly with the material.

I think this is the best way to engage, when time permits, because (in brief)

  • many arguments invoke ill-defined terms, and we need to sharpen these
  • many arguments are (perhaps explicitly) speculative and empirically uncertain
  • even mathematically/empirically rigorous content has important modelling assumptions and experimental caveats
  • scepticism often produces better creative/generative engagement
  • collectively we will fail if our individual opinions are overly shaped by founder effects

I hope that this is a common perspective, but to the extent that it isn't, I wonder if this (especially the last point) may be a source of some of your confusing experiences.

I'd also say: it seems appropriate to have 'very messy views' if by that you mean uncertainty about where things are going and how to make them better! I think folks who don't are doing one of two things

  • mistakenly concentrating more hypothesis weight than their observations/thinking in fact justify (which is a bad idea)
  • engaging in a thinking manoeuvre something like 'temporary MAP stance' or 'subjective probability matching' (which may be a good idea, if done transparently)

'Temporary MAP stance' or 'subjective probability matching'

MAP is Maximum A Posteriori i.e. your best guess after considering evidence. Probability matching is making actions/guesses proportional to your estimate of them being right (rather than picking the single MAP choice)

By this manoeuvre I'm gesturing at a kind of behaviour where you are quite unsure about what's best (e.g. 'should I work on interpretability or demystifying deception?') and rather than allowing that to result in analysis paralysis, you temporarily collapse some uncertainty and make some concrete assumptions to get moving in one or other direction. Hopefully in so doing you a) make a contribution and b) grow your skills and collect new evidence to make better decisions/contributions next time.
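The difference between the two manoeuvres can be sketched in a few lines of Python. This is only an illustration: the two options come from the example above, but the credences (0.6 and 0.4) and the function names are hypothetical.

```python
import random

# Hypothetical credences that each direction is the best one to work on.
# The numbers are made up for illustration only.
credences = {"interpretability": 0.6, "demystifying deception": 0.4}

def map_choice(credences):
    """Temporary MAP stance: commit to the single most probable option."""
    return max(credences, key=credences.get)

def probability_matching_choice(credences, rng):
    """Probability matching: pick each option with probability equal to its credence."""
    options = list(credences)
    weights = [credences[option] for option in options]
    return rng.choices(options, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
map_pick = map_choice(credences)
samples = [probability_matching_choice(credences, rng) for _ in range(10_000)]
frequencies = {option: samples.count(option) / len(samples) for option in credences}

print("MAP pick:", map_pick)
print("Probability-matching frequencies:", frequencies)
```

Under MAP the same option is picked every time, while under probability matching the less likely option still gets chosen roughly in proportion to its credence.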

Replies from: Ada-Maaria Hyvärinen
comment by Ada-Maaria Hyvärinen · 2022-04-21T09:33:51.310Z · EA(p) · GW(p)

I feel like everyone I have ever talked about AI safety with would agree on the importance of thinking critically and staying skeptical, and this includes my facilitator and cohort members from the AGISF programme. 

I think a 1.5h discussion session between 5 people who have read 5 texts does not allow really going deep into any topic, since it is just ~3 minutes per participant per text on average. I think these kinds of programs are great for meeting new people, clearing up misconceptions and providing structure/accountability for actually reading the material, but by nature they are not that good for having in-depth debates. I think that's ok; I mention it just to clarify why I think it is normal that I probably did not mention most of the things I described in this post during the discussion sessions.

But there is an additional reason that I think is more important to me, which is differentiating between performing skepticism and actually voicing true opinions. It is not possible for my facilitator to notice which one I am doing because they don't know me, and performing skepticism (in order to conform to the perceived standard of "you have to think about all of this critically and on your own, and you will probably arrive at similar conclusions to others in this field") looks the same as actually raising the confusions you have. This is why I thought I could convey this failure mode to others by comparing it to inner misalignment :)

When I was a Math freshman my professor told us he always encourages people to ask questions during lectures. Often, it had happened that he'd explained a concept and nobody would ask anything. He'd check what the students understood, and it would turn out they did not grasp the concept. When he asked why nobody had asked anything, the students would say that they did not understand enough to ask a good question. To avoid this dynamic, he told us that "I did not understand anything" counts as a valid question in his lectures. It helped somewhat, but I at least still often stayed silent instead of raising my hand and saying "I did not understand anything".

I feel like the same dynamic can easily happen when discussing AI safety (or any difficult EA concept, really). If people are encouraged to raise questions and concerns they might only raise the "good" ones, and stay silent if they feel like they just did not understand the concepts well enough (like I did in my avoidance strategy 1).

Replies from: Oliver Sourbut
comment by Oliver Sourbut · 2022-05-09T22:45:52.280Z · EA(p) · GW(p)

OK, this is the terrible terrible failure mode which I think we are both agreeing on (emphasis mine)

the perceived standard of "you have to think about all of this critically and on your own, and you will probably arrive at similar conclusions to others in this field"

By 'a sceptical approach' I basically mean 'the thing where we don't do that'. Because there is not enough epistemic credit in the field, yet, to expect all (tentative, not-consensus-yet) conclusions to be definitely right.

In traditional/undergraduate mathematics, it's different - almost always when you don't understand or agree with the professor, she is simply right and you are simply wrong or confused! This is a justifiable perspective based on the enormous epistemic weight of all the existing work on mathematics.

I'm very glad you call out the distinction between performing skepticism and actually doing it.

Replies from: Ada-Maaria Hyvärinen
comment by Ada-Maaria Hyvärinen · 2022-05-30T13:08:55.090Z · EA(p) · GW(p)

Yeah, I think we agree on this. I want to write out more later on what communication strategies might help people actually voice scepticism/concerns even if they are afraid of not meeting some standard of elaborateness.

My mathematics example actually tried to be about this: in my university, the teachers tried to make us forget the teachers are more likely to be right, so that we would have to think about things on our own and voice scepticism even if we were objectively likely to be wrong. I remember another lecturer telling us: "if you finish an exercise and notice you did not use all the assumptions in your proof, you either did something wrong or you came up with a very important discovery". I liked how she stated that it was indeed possible that a person from our freshman group could make a novel discovery, however unlikely that was.

The point is that my lecturers tried to teach that there is not a certain level you have to acquire before your opinions start to matter: you might be right even if you are a total beginner and the person you disagree with has a lot of experience. 

This is something I would like to emphasize when doing EA community building myself, but it is not very easy. I've seen this when I've taught programming to kids. If a kid asks me if their program is "done" or "good", I'd say "you are the programmer, do you think your program does what it is supposed to do", but usually the kids think it is a trick question and I'm just withholding the correct answer for fun. Adults, too, do not always trust that I actually value their opinion.

comment by Oliver Sourbut · 2022-04-20T18:29:45.861Z · EA(p) · GW(p)

Hey, as someone who also has professional CS and DS experience, this was a really welcome and interesting read. I have all sorts of thoughts but I had one main question

So I used the AGISF Slack to find people who had already had a background in machine learning before getting into AI safety and asked them what had originally convinced them. Finally, I got answers from 3 people who fit my search criteria. They mentioned some different sources of first hearing about AI safety (80 000 Hours and LessWrong), but all three mentioned one same source that had deeply influenced them: Superintelligence.

When I read this I remembered that I was one of the folks you reached out to on Slack! But I didn't mention Superintelligence at all (though in fact I have read it and have a generally good opinion of it, though it was several years ago). I guess I didn't fit your criteria quite right? In my case I had CS and a little academic AI, but no professional DS/ML experience, before getting 'into' AI safety. Interested to know what other people you spoke to and for what reasons they didn't fall into the criteria you were looking for.

Replies from: Ada-Maaria Hyvärinen
comment by Ada-Maaria Hyvärinen · 2022-04-20T19:17:10.457Z · EA(p) · GW(p)

That's right, thanks again for answering my question back then! 

Maybe I formulated my question wrong but I understood from your answer that you first got interested in AI safety, and only then in DS/ML (you mentioned you had had a CS background before but not your academic AI experience). This is why I did not include you in this sample of 3 persons - I wanted to narrow the search to people who had a more AI-specific background before getting into AI safety (not just CS). It is true that you did not mention Superintelligence either, but interesting to hear you also had a good opinion of it! If I had known both your academic AI experience and that you liked Superintelligence I could have made the number 4 (unless you think Superintelligence did not really influence you, then it would be 3 out of 4).

You were the only person who answered my PM but stated they got into AI safety before getting into DS/ML. One person did not answer, and the other 3 that answered stated they got into DS/ML before AI safety. I guess there are more than 6 people with some DS/ML background on the course channel, but I also know not everyone introduced themselves, so the sample is very anecdotal anyway.

I also used the Slack to ask for recommendations of blog posts or similar stories on how people with DS/ML backgrounds got into AI safety. Aside from recommendations on who to talk to on the Slack, I got pointers to Stuart Russell's interview on Sam Harris' podcast and a Yudkowsky post [LW · GW].

comment by Aaron_Scher · 2022-04-20T03:19:05.480Z · EA(p) · GW(p)

Thanks for writing this, it was fascinating to hear about your journey here. I also fell into the cognitive block of “I can’t possibly contribute to this problem, so I’m not going to learn or think more about it.” I think this block was quite bad in that it got in the way of me having true beliefs, or even trying to, for quite a few months. This wasn’t something I explicitly believed, but I think it implicitly affected how much energy I put into understanding or trying to be convinced by AI safety arguments. I wouldn’t have realized it without your post, but my guess is that this trap is one of the most likely ways 80k could be counterproductive. By framing issues as “you need a phd from a top 10 uni to work on this cause,” they give an (implicit, unintentional) license to everybody else to not care about said cause. As somebody who studied psychology, I think the way we talk about AI safety turned me off of even thinking about its importance. There seems to have been a shift recently toward “we need good ops and governance people too” which seems better but maybe has the same problem to a lesser degree. For whatever it’s worth, my current belief is something like “ai safety is so important that it is worth it for me to work on it even if I don’t currently know how I can help” (exception being if I was counterproductive). I believe this quite strongly, and am willing (and privileged enough to be able) to sacrifice things like job security in order to try and help with alignment (though it’s unclear if this is the right decision). I would love to chat more about my and your beliefs if you’re interested. You can message me or find me on Facebook or something.

comment by ekka (Eddie K) · 2022-04-18T20:13:30.313Z · EA(p) · GW(p)

Thanks for writing this! It really resonated with me despite the fact that I only have a software engineering background and not much ML experience. I'm still struggling to form my views as well for a lot of the reasons you mentioned and one of my biggest sources of uncertainty has been trying to figure out what people with AI/ML expertise think about AI safety. This post has been very helpful in that regard (in addition to other information that I've been ingesting to help resolve this uncertainty). The issue of AGI timelines has come to be a major crux for me when it comes to considering how seriously to take AI risk. The closer that AGI seems the more concern is warranted since even a low probability of AGI going rogue would result in a high negative EV. It seems reasonable to me to think that AGI is possible within the next 20-30 years with a 20-40% probability and by default I'd think there would be at least a 10% probability of AGI going rogue without any efforts at alignment. With these kinds of probabilities it still seems worth taking AI risk seriously even though I still feel very unsure of how things will play out. I expect to make a big update by the end of this decade though based on the type of algorithmic breakthroughs made in the next few years.
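As a rough sketch of what the numbers in this comment imply, the two probabilities can be multiplied to get the chance of both events happening. The function name is hypothetical, and treating the 10% as a probability conditional on AGI arriving is an assumption made for this illustration.

```python
def joint_probability(p_agi, p_rogue_given_agi):
    """Chance that AGI arrives in the window AND goes rogue (assumes the second
    number is conditional on the first event having happened)."""
    return p_agi * p_rogue_given_agi

# Figures taken from the comment: 20-40% for AGI within 20-30 years,
# and at least 10% for AGI going rogue without alignment efforts.
low = joint_probability(0.20, 0.10)
high = joint_probability(0.40, 0.10)

print(f"Joint probability range: {low:.0%} to {high:.0%}")
```

Since 10% is stated as a lower bound, the resulting range is also best read as a lower bound under these assumptions.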

comment by Oliver Sourbut · 2022-04-20T19:08:49.131Z · EA(p) · GW(p)

I feel like while “superintelligent AI would be dangerous” makes sense if you believe superintelligence is possible, it would be good to look at other risk scenarios from current and future AI systems as well.

I agree, and I think there's a gap for thoughtful and creative folks with technical understanding to contribute to filling out the map here!

One person I think has made really interesting contributions here is Andrew Critch, for example on Multipolar Failure and Robust Agent-Agnostic Processes [AF · GW] (I realise this is literally me sharing a link without much context which was a conversation-failure-mode discussed in the OP so feel free to pass on this). He also has made some attempts to discuss more breadth e.g. here [AF · GW]. Critch isn't the only one.

comment by Oliver Sourbut · 2022-04-20T19:03:28.747Z · EA(p) · GW(p)

I’m fairly sure deep learning alone will not result in AGI

How sure? :)

What about some combination of deep learning (e.g. massive self-supervised) + within-context/episodic memory/state + procedurally-generated tasks + large-scale population-based training + self-play...? I'm just naming a few contemporary 'prosaic' practices which, to me, seem plausibly-enough sufficient to produce AGI that it warrants attention.

Replies from: Ada-Maaria Hyvärinen
comment by Ada-Maaria Hyvärinen · 2022-04-21T09:01:08.555Z · EA(p) · GW(p)

Like I said it is based on my gut feeling, but fairly sure.

Is it your experience that adding more complexity and concatenating different ML models results in better quality and generality, and if so, in what domains? I would have the opposite intuition, especially in NLP.

Also, do you happen to know why "prosaic" practices are called "prosaic"? I have never understood the connection to the dictionary definition of "prosaic".

comment by colin · 2022-04-20T17:33:57.321Z · EA(p) · GW(p)

I'm going to attempt to summarize what I think part of your current beliefs are (please correct me if I am wrong!)

  • Current ML techniques are not sufficient to develop AGI
  • But someday humans will be able to create AGI
  • It is possible (likely?) that it will be difficult to ensure that the AGI is safe
  • It is possible that humans will give enough control to an unsafe AGI that it is an X risk.

If I got that right I would describe that as both having (appropriately loosely held) beliefs about AI Safety and agreement that AI Safety is a risk with some unspecified probability and magnitude.

What you don't have a view on, but you believe people in AI safety do have strong views on is (again not trying to put words in your mouth just my best attempt at understanding):

  • Is AI safety actually possible?
  • What work would be useful to increase AI Safety if that is possible?
  • How important is AI safety compared to other cause areas?

My (fairly uninformed) view is that people working on AI safety don't know the answer to that first or second question. Rather, they think that the probability and magnitude of the problem are high enough that it swamps those questions in calculating the importance of the cause area. Some of these people have tried to model out this reasoning, while others are leaning more on intuition. I think reducing the uncertainty of any of these three questions is useful in itself, so I think it would be great if you wanted to work on that.

Replies from: Ada-Maaria Hyvärinen
comment by Ada-Maaria Hyvärinen · 2022-04-20T19:45:27.622Z · EA(p) · GW(p)

I'm still quite uncertain about my beliefs but I don't think you got them quite right. Maybe a better summary is that I am generally pessimistic about both humans ever being able to create AGI and especially about humans being able to create safe AGI (it is a special case so it should probably be harder than any AGI). I also think that relying a lot on strong unsafe systems (AI powered or not) can be an x-risk. This is why it is easier for me to understand why AI governance is a way to try to reduce x-risk (at least if actors in the world want to rely on unsafe systems, I don't know how much this happens but I would not find it very surprising).

I wish I had a better understanding of how x-risk probabilities are estimated (as I said I will try to look into that) but I don't directly understand why x-risk from AI would be a lot more probable than, say, biorisk (that I don't understand in detail at all).

Replies from: colin
comment by colin · 2022-04-21T14:30:33.212Z · EA(p) · GW(p)

Ah, yeah I misread your opinion of the likelihood that humans will ever create AGI.  I believe it will happen eventually unless AI research stops due to some exogenous reason (civilizational collapse, a ban on development, etc.).  Important assumptions I am making:  

  • General Intelligence is all computation, so it isn't substrate-dependent
  • The more powerful an AI is the more economically valuable it is to the creators
  • Moore's Law will continue so more computing will be available.
  • If other approaches fail, we will be able to simulate brains with sufficient compute.
  • Fully simulated brains will be AGI.

I'm not saying that I think this would be the best, easiest, or only way to create AGI, just that if every other attempt fails, I don't see what would prevent this from happening. Particularly since we are already able to simulate portions of a mouse brain. I am also not claiming here that this implies short timelines for AGI. I don't have a good estimate of how long this approach would take.

comment by rachelAF · 2022-04-24T02:43:04.792Z · EA(p) · GW(p)

Thank you for writing this! I particularly appreciated hearing your responses to Superintelligence and Human Compatible, and would be very interested to hear how you would respond to The Alignment Problem. TAP is more grounded in modern ML and current research than either of the other books, and I suspect that this might help you form more concrete objections (and/or convince you of some points). If you do read it, please consider sharing your responses.

That said, I don’t think that you have any obligation to read TAP, or to consider thinking about AI safety at all. It sounds like you aren’t drawn to a career in the field, and that’s fine. There are plenty of other ways to do good with an ML skill set. But if you don’t need to weigh working in AI safety against other career options, and you don’t find it interesting or enjoyable to consider, then why focus on forming personal views about AI safety at all?

Edited to add a disclaimer: I provided technical feedback on a draft of TAP, and much of the "AGI safety" section focuses on my team's work. I still think that it's a good concrete introduction to the field, because of how specific and well-cited it is, but I also am probably somewhat biased.

Replies from: Ada-Maaria Hyvärinen
comment by Ada-Maaria Hyvärinen · 2022-04-24T06:27:12.286Z · EA(p) · GW(p)

Thanks! It will be difficult to write an authentic response to TAP since these other responses were originally not meant to be public but I will try to keep the same spirit if I end up writing more about my AI safety journey.

I actually do find AI safety interesting, it just seems that I think about a lot of stuff differently than many people in the field and it is hard for me to pin-point why. But the main motivations for spending a lot of time on forming personal views about AI safety are:

  • I want to understand x-risks better, AI risk is considered important among people who worry about x-risk a lot, and because of my background I should be able to understand the argument for it (better than say, biorisk)
  • I find it confusing that understanding the argument is so hard, and that makes me worried (like I explained in the sections "The fear of the answer" and "Friends and appreciation")
  • I find it very annoying when I don't understand why some people are convinced by something, especially if these people are with me in a movement that is important for us all
Replies from: rachelAF
comment by rachelAF · 2022-04-24T22:06:03.635Z · EA(p) · GW(p)

Thank you for explaining more. In that case, I can understand why you'd want to spend more time thinking about AI safety.

I suspect that much of the reason that "understanding the argument is so hard" is because there isn't a definitive argument -- just a collection of fuzzy arguments and intuitions. The intuitions seem very, well, intuitive to many people, and so they become convinced. But if you don't share these intuitions, then hearing about them doesn't convince you. I also have an (academic) ML background, and I personally find some topics (like mesa-optimization) to be incredibly difficult to reason about.

I think that generating more concrete arguments and objections would be very useful for the field, and I encourage you to write up any thoughts that you have in that direction!

(Also, a minor disclaimer that I suppose I should have included earlier: I provided technical feedback on a draft of TAP, and much of the "AGI safety" section focuses on my team's work. I still think that it's a good concrete introduction to the field, because of how specific and well-cited it is, but I also am probably somewhat biased.)

comment by Otto · 2022-04-21T10:52:12.624Z · EA(p) · GW(p)

Hi Ada-Maaria, glad to have talked to you at EAG and congrats for writing this post - I think it's very well written and interesting from start to finish! I also think you're more informed on the topic than most people who are AI xrisk convinced in EA, surely including myself.

As an AI xrisk-convinced person, it always helps me to divide AI xrisk in these three steps. I think superintelligence xrisk probability is the product of these three probabilities:

1) P(AGI in next 100 years)
2) P(AGI leads to superintelligence)
3) P(superintelligence destroys humanity)

Would you like to share your estimates? I think it would make the discussion more targeted, and I think no estimate would be very foolish since basically no-one knows. :) or maybe :(

Personally, I guess my estimates are something like 1) 50%,  2) 70%, 3) 40% (not based on much).

It would be really great to have more and better papers on this (peer reviewed), so that disagreement can be made as small as possible - though it will probably never disappear.

Replies from: Ada-Maaria Hyvärinen
comment by Ada-Maaria Hyvärinen · 2022-04-21T14:56:53.282Z · EA(p) · GW(p)

Hi Otto! Thanks, it was nice talking to you at EAG. (I did not include any interactions/information I got from this weekend's EAG in the post because I had written it before the conference, felt like it should not be any longer than it already was, but wanted to wait until my friends who are described as "my friends" in the post had read it before publishing it.)

I am not that convinced AGI is necessarily the most important component to x-risk from AI – I feel like there could be significant risks from powerful non-generally intelligent systems, but of course it is important to avoid all x-risk, so x-risk from AGI specifically is also worth talking about.

I don't enjoy putting numbers to estimates but I understand why it can be a good idea so I will try. At least then I can later see if I have changed my mind and by how much. I would give quite low probability to 1), perhaps 1%? (I know this is lower than average estimates by AI researchers.) I think 2) on the other hand is very likely, maybe 99%, by the assumption that there can be enough differences between implemented AGIs to make a team of AGIs surpass a team of humans by, for example, more efficient communication (basically what Russell says in Human Compatible on this seems credible to me). Note that even if this would be superhuman intelligence it could still be more stupid than some superintelligence scenarios. I would give a much lower probability to superintelligence like Bostrom describes it. 3) is hard to estimate without knowing much about the type of superintelligence, but I would spontaneously say something high, like 80%? So because of the low probability on 1) my combined estimate is still significantly lower than yours.
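To make the comparison concrete, the two sets of estimates in this exchange can be multiplied out following the three-step decomposition proposed above. This is only a sketch: the variable names are hypothetical, and it assumes each step's probability is conditional on the previous step having happened.

```python
import math

# (P(AGI in next 100 years), P(AGI leads to superintelligence),
#  P(superintelligence destroys humanity)), as stated in the thread.
otto_estimates = (0.50, 0.70, 0.40)
ada_estimates = (0.01, 0.99, 0.80)

# Product of the three steps, treating them as a chain of conditional probabilities.
otto_total = math.prod(otto_estimates)
ada_total = math.prod(ada_estimates)

print(f"Otto: {otto_total:.2%}")
print(f"Ada:  {ada_total:.2%}")
print(f"Ratio: {otto_total / ada_total:.1f}x")
```

Because the product is dominated by its smallest factor, nearly all of the gap between the two totals comes from the disagreement on step 1).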

I definitely would love to read more research on this as well.

Replies from: Otto
comment by Otto · 2022-04-22T07:37:49.517Z · EA(p) · GW(p)

Thanks for the reply, and for trying to attach numbers to your thoughts!

So our main disagreement lies in (1). I think this is a common source of disagreement, so it's important to look into it further.

Would you say that the chance to ever build AGI is similarly tiny? Or is it just the next hundred years? In other words, is this a possibility or a timeline discussion?

Replies from: Ada-Maaria Hyvärinen
comment by Ada-Maaria Hyvärinen · 2022-04-23T11:47:57.265Z · EA(p) · GW(p)

Hmm, with a non-zero probability in the next 100 years the likelihood for a longer time frame should be bigger given that there is nothing that makes developing AGI more difficult the more time passes, and I would imagine it is more likely to get easier than harder (unless something catastrophic happens). In other words, I don't think it is certainly impossible to build AGI, but I am very pessimistic about anything like current ML methods leading to AGI. A lot of people in the AI safety community seem to disagree with me on that, and I have not completely understood why.

Replies from: Otto Barten
comment by Otto Barten · 2022-04-24T00:05:13.764Z · EA(p) · GW(p)

So although we seem to be relatively close in terms of compute, we don't have the right algorithms yet for AGI, and no one knows if and when they will be found. If no one knows, I'd say a certainty of 99% that they won't be found in a hundred years, with thousands of people trying, is overconfident.

Replies from: Ada-Maaria Hyvärinen
comment by Ada-Maaria Hyvärinen · 2022-04-24T05:41:38.183Z · EA(p) · GW(p)

Yeah, I understand why you'd say that. However it seems to me that there are other limitations to AGI than finding the right algorithms. As a data scientist I am biased to think about available training data. Of course there is probably going to be progress on this as well in the future.

Replies from: Otto Barten
comment by Otto Barten · 2022-04-24T23:35:01.441Z · EA(p) · GW(p)

Could you explain a bit more about the kind of data you think will be needed to train an AGI, and why you think this will not be available in the next hundred years? I'm genuinely interested, actually I'd love to be convinced about the opposite... We can also DM if you prefer.

Replies from: Ada-Maaria Hyvärinen
comment by Ada-Maaria Hyvärinen · 2022-04-27T11:47:59.083Z · EA(p) · GW(p)

This intuition turned out harder to explain than I thought and got me thinking a lot about how to define "generality" and "intelligence" (like all talk about AGI does). But say, for example, that you want to build an automatic doctor that is able to examine a patient and diagnose what illness they most likely have. This is not very general in the sense that you can imagine this system as a function of "read all kinds of input about the person, output diagnosis", but I still think it provides an example of the difficulty of collecting data.

There are some data that can be collected quite easily by the user, because the user can for example take pictures of themselves, measure their temperature etc. And then there are some things the user might not be able to collect data about, such as "is this joint moving normally". I think it is not so likely we will be able to gather meaningful data about things like "how does a person's joint move if they are healthy" unless doctors start wearing gloves that track the position of their hand while doing the examination and all this data is stored somewhere with the doctor's interpretation.

To me it currently seems that we are collecting a lot of data about various things but there are still many things where there are no methods for collecting the relevant data, and the data does not seem like it would start getting collected as a by-product of something (like in the case where you track what people buy from online stores). Also, a lot of data is unorganized and missing labels and it can be hard to label after it has been collected.

I'm not sure if all of this was relevant or if I got side-tracked too much when thinking about a concrete example I can imagine.

Replies from: Otto
comment by Otto · 2022-05-01T19:56:58.746Z · EA(p) · GW(p)

Hi AM, thanks for your reply.

Regarding your example, I think it's quite specific, as you notice too. That doesn't mean I think it's invalid, but it does get me thinking: how would a human learn this task? A human intelligence wasn't trained on many specific tasks in order to be able to do them all. Rather, it first acquired general intelligence (apparently, somewhere), and was later able to apply this to an almost infinite amount of specific tasks with typically only a few examples needed. I would guess that an AGI would solve problems in a similar way. So, first learn general intelligence (somehow), then learn specific tasks quickly with little data needed.

For your example, if the AGI would really need to do this task, I'd say it could find ways itself to gather the data, just like a human would who would want to learn this skill, after first acquiring some form of general intelligence. A human doctor might watch the healthily moving joint, gathering visual data, and might hear the joint moving, gathering audio data, or might put her hand on the joint, gathering sensory data. The AGI could similarly film and record the healthy joint moving, with already available cameras and microphones, or use data already available online, or, worst case, send in a drone with a camera and a sound recorder. It could even send in a robot that could gather sensory data if needed.

Of course, current AI lacks certain skills that are necessary to solve such a general problem in such a general way, such as really understanding the meaning behind a question that is asked, being able to plan a solution (including acquiring drones and robots in the process), and probably others. These issues would need to be solved first, so there is still a long way to go. But with the manpower, investment, and time (e.g. 100 years) available, I think we should assign a probability of at least tens of percents that this type of general intelligence including planning and acting effectively in the real world, will eventually be found. I'd say it is still unsure whether it will be based on a neural network (large language model or otherwise) or not.

Perhaps the difference between longtermists and shorttermists is imagination, rather than intelligence? And I'm not saying which side is right: perhaps we have too much imagination; on the other hand, perhaps you have too little. We will only really know when the time comes.

Replies from: Ada-Maaria Hyvärinen
comment by Ada-Maaria Hyvärinen · 2022-05-03T13:27:00.830Z · EA(p) · GW(p)

Hi Otto!

I agree that the example was not that great and that a lack of data sources can definitely be countered with general intelligence, like you describe. So it could definitely be possible that a generally intelligent agent could plan around gathering the needed data. My gut feeling is still that it is impossible to develop such intelligence based on one data source (for example text, in however large amounts), but of course there are already technologies that combine different data sources (such as self-driving cars), so this clearly is also not the limit. I'll have to think more about where this intuition of lack of data being a limit comes from, since it still feels relevant to me. Of course 100 years is a lot of time to gather data.

I'm not sure if imagination is the difference either. Maybe it is the belief in somebody actually implementing things that can be imagined. 

Replies from: Otto
comment by Otto · 2022-05-05T08:39:20.629Z · EA(p) · GW(p)

Hey I wasn't saying it wasn't that great :)

I agree that the difficult part is to get to general intelligence, also regarding data. Compute, algorithms, and data availability are all needed to get to this point. It seems really hard to know beforehand what kind and how much of algorithms and data one would need. I agree that basically only one source of data, text, could well be insufficient. There was a post I read on a forum somewhere (could have been here) from someone who let GPT3 solve questions including things like 'let all odd rows of your answer be empty'. GPT3 failed at all these kinds of assignments, showing a lack of comprehension. Still, the 'we haven't found the asymptote' argument from OpenAI (intelligence does increase with model size and that increase doesn't seem to stop, implying that we'll hit AGI eventually) is not completely unconvincing either. It bothers me that no one can completely rule out that large language models might hit AGI just by scaling them up. It doesn't seem likely to me, but from a risk management perspective, that's not the point. An interesting perspective I'd never heard before from intelligent people is that AGI might actually need embodiment to gather the relevant data. (They also think it would need social skills first - also an interesting thought.)

While it's hard to know how much (and what kind of) algorithmic improvement and data is needed, it seems doable to estimate the amount of compute needed, namely what's in a brain plus or minus a few orders of magnitude. It is hard for me to imagine that evolution can be beaten by more than a few orders of magnitude in algorithmic efficiency (the other way round is somewhat easier to imagine, but still unlikely in a hundred-year timeframe). I think people have focused on compute because it's the most forecastable part, not because it would be the only part that's important.

Still, there is a large gap between what I think are essentially thought experiments (relevant ones though!) leading to concepts such as AGI and the singularity, and actual present AI. I'm definitely interested in ideas filling that gap. I think 'AGI safety from first principles' by Richard Ngo is a good try. I guess you've read that too, since it's part of the AGI Safety Fundamentals curriculum? What did you think about it? Do you know any similar or even better papers about the topic?

It could be that belief too, yes! I think I'm a bit exceptional in the sense that I have no problem imagining human beings achieving really complex stuff, but also no problem imagining human beings failing miserably at what appear to be really easy coordination issues. My first thought when I heard about AGI, recursive self-improvement, and human extinction was 'ah yeah that sounds like typically the kind of thing engineers/scientists would do!' I guess some people believe engineers/scientists could never make AGI (I disagree), while others think they could, but would not be stupid enough to screw up badly enough to actually cause human extinction (I disagree).

Replies from: Ada-Maaria Hyvärinen
comment by Ada-Maaria Hyvärinen · 2022-06-20T13:53:19.823Z · EA(p) · GW(p)

Hi Otto, I have been wanting to reply to you for a while but I feel like my opinions keep changing so writing coherent replies is hard (but having fluid opinions in my case seems like a good thing). For example, while I still think only a precollected set of text as a data source is insufficient for any general intelligence, maybe training a model on text and having it then interact with humans could lead it to connecting words to references (real world objects), and maybe it would not necessarily need many reference points if the language model is rich enough? This then again seems to sound a bit like the concept of imagination and I am worried I am anthropomorphising in a weird way.

Anyway, I still hold the intuition that generality is not necessarily the most important in thinking about future AI scenarios – this of course is an argument towards taking AI risk more seriously, because it should be more likely someone will build advanced narrow AI or advanced AGI than just advanced AGI.

I liked "AGI safety from first principles" but I would still be reluctant to discuss it with, say, my colleagues from my day job, so I think I would need something even more grounded in current tech. I do understand why people do not keep writing those kinds of papers, though, because it probably does not directly help with solving alignment.

comment by Ulisse Mini · 2022-04-18T00:44:21.097Z · EA(p) · GW(p)

It turned out that at least around me, the most common answer was something like: “I always knew it was important and interesting, which is why I started to read about it.”

I found out about alignment/AGI from some videos of Rob Miles on Computerphile. It's possible you're around/talking to very smart people who were around when the field was founded (hence they came up with it themselves), but that's selection bias - most people aren't like that.

Replies from: Ada-Maaria Hyvärinen
comment by Ada-Maaria Hyvärinen · 2022-04-20T06:03:29.584Z · EA(p) · GW(p)

To clarify, my friends (even if they are very smart) did not come up with all AI safety arguments by themselves, but started to engage with AI safety material because they had already been looking at the world and thinking "hmm, looks like AI is a big thing and could influence a lot of stuff in the future, hope it changes things for the good". So they quickly got on board after hearing that there are people seriously working on the topic, and it made them want to read more.

comment by hb574 (Herbie Bradley) · 2022-04-17T14:22:00.889Z · EA(p) · GW(p)

I think there are some very valuable points in here — at different points I found myself nodding along remembering times when I had similar thoughts. I am an ML PhD student at Cambridge, and I first heard about AI safety back around 2013 from LessWrong, but despite being aware of the topic and many of the core arguments for so long I have found it extremely hard to come to a conclusion on the field based on finding the cruxes of different arguments without simply deferring to other people's opinions, as many seem to be happy doing.