Posts

Climate Change Overview: CERI Summer Research Fellowship 2022-03-17T11:04:49.774Z

Comments

Comment by hb574 (Herbie Bradley) on Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover · 2022-08-26T19:14:24.251Z · EA · GW

Thank you for the comment - it's a fair point about the difficulty of prediction. In my post I attempted to point to some heuristics which suggest strongly to me that significant fundamental breakthroughs are needed. Other people have different heuristics. At the same time, though, your objection reads as a fully general argument that fundamental breakthroughs are never necessary at any point, which seems quite unlikely.

I also think that even the original Attention Is All You Need paper gave some indication of the future direction by testing large and small transformer variants and showing greatly improved performance with the larger one, while early RLHF work does not appear to have a similarly obvious way to scale up and tackle the big RL challenges like sparse rewards, long episode lengths, etc.

Comment by hb574 (Herbie Bradley) on Against longtermism · 2022-08-11T12:29:04.158Z · EA · GW

Concern about the threat of human extinction is not longtermism (see Scott Alexander's well-known forum post about this), which I think is the point the OP is making.

Comment by hb574 (Herbie Bradley) on Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover · 2022-07-21T15:53:06.759Z · EA · GW

The rough shape of the argument is that I think a PASTA system requires roughly human-level general intelligence, and that implies some capabilities which HFDT as described in this post cannot learn. Using Karnofsky's original PASTA post, let's look at some of the requirements:

  1. Consume research papers.
  2. Identify research gaps and open problems.
  3. Choose an open problem to focus on.
  4. Generate hypotheses to test within a problem to potentially solve it.
  5. Generate experiment ideas to test the hypotheses.
  6. Judge how good the experiment ideas are and select experiments to perform.
  7. Carry out the experiments.
  8. Select a hypothesis based on experiment results.
  9. Write up a paper.
  10. Judge how good the paper is.

A system which can do all of the above is a narrower requirement than full general intelligence, especially if each task is decomposed into separate models, but many of these points seem impractical given what we currently know about RL with human feedback. Crucially, a PASTA system requires the ability to do these tasks autonomously in order to have any chance of transformatively accelerating scientific discovery - Ajeya specifies that Alex is an end-to-end system. It's not sufficient, for example, to make a system which can generate a bunch of experiment or research ideas in text but relies on humans to evaluate them. In particular, I would identify 4, 5, 6, 8, and 10 as the key parts of the research loop which seem to require significant ML breakthroughs to achieve, and which I think may be general-intelligence-complete. One thing these tasks have in common is that good feedback is hard for humans to provide, the rewards are sparse, and the episode length is potentially extremely long.
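To make the "autonomous" point concrete, here is a minimal sketch of the research loop above decomposed into separate model calls. Every function name here is a hypothetical stand-in for a trained component, not any real API; the only thing the sketch is meant to show is that steps 4, 5, 6, 8, and 10 are model judgements, with no human evaluator anywhere in the loop.

```python
def dummy_model(task: str, payload) -> str:
    # Placeholder standing in for a trained model; in a real PASTA system each
    # task would be handled by one or more learned components.
    return "accept" if task == "review_paper" else f"<output of {task}>"

def autonomous_research_loop(papers: list, call_model) -> str:
    gaps = call_model("identify_gaps", papers)                    # step 2
    problem = call_model("choose_problem", gaps)                  # step 3
    hypotheses = call_model("generate_hypotheses", problem)       # step 4 - model judgement
    experiments = call_model("propose_experiments", hypotheses)   # step 5
    selected = call_model("rank_experiments", experiments)        # step 6 - model judgement
    results = call_model("run_experiments", selected)             # step 7
    best = call_model("select_hypothesis", results)               # step 8 - model judgement
    draft = call_model("write_paper", best)                       # step 9
    verdict = call_model("review_paper", draft)                   # step 10 - model judgement
    # No human feedback anywhere above: the system judges its own ideas and output.
    return draft if verdict == "accept" else "reject"

print(autonomous_research_loop(["paper A", "paper B"], dummy_model))
```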

Yet current work on RL from human feedback shows that these techniques work well in highly constrained environments with relatively short episodes and an accurate, frequent reward signal from the human feedback. Sparse feedback immediately decreases performance significantly (see the ReQueST paper). To me, this suggests a significant number of fundamental breakthroughs remain before PASTA is achievable, and that HFDT as described here does not have this capability, even with scaling and minor improvements.
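As a toy illustration of the sparsity point (made-up numbers, not taken from the ReQueST paper or any other cited work): with dense feedback the learner receives a signal after every step, whereas with a single end-of-episode reward it must assign credit across the entire episode.

```python
import random

def episode_rewards(horizon: int, sparse: bool) -> list:
    """Per-step reward signal a learner would see in a toy episode."""
    step_scores = [random.random() for _ in range(horizon)]  # stand-in for step quality
    if sparse:
        # One scalar at the very end: credit must be assigned across all `horizon` steps.
        return [0.0] * (horizon - 1) + [float(sum(step_scores) / horizon > 0.5)]
    # Dense feedback after every step: credit assignment is local and easy.
    return step_scores

# A constrained RLHF task (e.g. summarisation) looks like the dense, short-horizon case;
# "evaluate this research programme" looks like the sparse case with an enormous horizon.
print(sum(x > 0 for x in episode_rewards(horizon=10, sparse=False)))       # ~10 signals
print(sum(x > 0 for x in episode_rewards(horizon=100_000, sparse=True)))   # 0 or 1 signal
```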

Regarding my impression of the opinions of other ML researchers, I'm a PhD student at Cambridge and have spoken to many academics (both senior and junior), some people at DeepMind and some at Google Brain about their guesses for pathways to AGI, and the vast majority don't seem to think that scaling current RL algorithms + LLMs + human feedback gets us much further towards AGI than we currently are, and they think there are many missing pieces of the puzzle. I'm not surprised to hear that OpenAI has a very different set of opinions though - the company has bet heavily on a particular approach to AGI so would naturally attract people with a different view to the one I've described.

One way to see this is to look at timelines to human-level general intelligence - based on previous surveys, I think most ML researchers would not put this within 10 or 20 years. Yet as Ajeya describes in the post, if training PASTA with baseline HFDT is possible, it seems very likely to happen within 10 years, and I agree with her that "the sooner transformative AI is developed, the more likely it is to be developed in roughly this way". I think that if a full PASTA system can be made, then we are likely to have solved almost all of the bottlenecks to creating an AGI. Therefore I think the survey timelines conflict with the hypothesis that scaling current techniques, without new fundamental breakthroughs, is enough for PASTA.

Comment by hb574 (Herbie Bradley) on If you're unhappy, consider leaving · 2022-07-20T23:39:44.559Z · EA · GW

People who are not perfectly satisfied with EA are more likely to have some disagreements with what they might perceive as EA consensus. Therefore, recommending that they leave directly decreases the diversity of ideas in EA and makes it more homogeneous. This seems likely to lead to a worse version of EA.

Comment by hb574 (Herbie Bradley) on Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover · 2022-07-19T20:10:36.570Z · EA · GW

I'm an ML researcher, and I would put the probability of baseline HFDT leading to a PASTA-level set of capabilities at approximately 0, and my impression is that this view is shared by the majority of ML researchers.

> Baseline HFDT seems to be the single most straightforward vision that could plausibly work to train transformative AI very soon. From informal conversations, I get the impression that many ML researchers would bet on something like this working in broadly the way I described in this post, and multiple major AI companies are actively trying to scale up the capabilities of models trained with something like baseline HFDT.

In other words, I disagree with this, and therefore it seems unclear what to take away from the rest of the post, which is well-reasoned iff you agree with the starting assumptions.

Remember that PASTA is actually a very strong criterion: it requires AI to be able to do all the activities in the scientific research loop, including evaluating whether ideas are good and generating creative new ideas - skills which I think are general-intelligence-complete. I have no doubt that an HFDT system can automate some parts of the scientific discovery process, such as writing draft papers or controlling a robot to do lab experiments in constrained environments. But ultimately this subset of PASTA only makes such a system a research assistant, one which, like AlphaFold, may make some research more efficient, but which does not complete the full loop.

Comment by hb574 (Herbie Bradley) on How to Become a World Historical Figure (Péladan's Dream) · 2022-07-19T14:42:04.683Z · EA · GW

What the flying fuck is this

Comment by hb574 (Herbie Bradley) on How Could AI Governance Go Wrong? · 2022-05-27T12:49:03.781Z · EA · GW

Good post! I'm curious whether you have any thoughts on the potential conflicts or contradictions between the "AI ethics" community, which focuses on narrow AI and harms from current AI systems (members of this community include Gebru and Whittaker), and the AI governance community that has sprung out of the AI safety/alignment community (e.g. GovAI)? In my view, these two groups are quite opposed in priorities and ways of thinking about AI (take a look at Timnit Gebru's Twitter feed for a very stark example), and trying to put them under one banner doesn't really make sense. This contradiction seems to encourage some strange tactics (such as AI governance people proposing regulations on narrow AI purely to slow down timelines, rather than for any of the usual reasons given by the AI ethics community) which could lead to a significant backlash.

Comment by hb574 (Herbie Bradley) on Transcripts of interviews with AI researchers · 2022-05-09T15:22:11.978Z · EA · GW

This is great work - I think it's really valuable to get a better sense of what AI researchers think of AI safety.

Often when I ask people in AI safety what they think AI researchers think of AGI and alignment arguments, they don't have a clear idea and just default to some variation on "I'm not sure they've thought about it much". Yet as these transcripts show, many AI researchers are well aware of AI risk arguments (in my anecdotal experience, many have read at least part of Superintelligence) and have more nuanced views. So I'm worried that AI safety is insular with respect to mainstream AI researchers' thinking on AGI - these are people who in many cases have spent their working lives thinking about AGI, so their thoughts are highly valuable - and this work goes some way towards reversing that insularity.

A nice follow-up direction would be to compile a list of common arguments AI researchers give for being less worried about AI safety (or about working on capabilities, which is a separate question), counterarguments, and possible counter-counterarguments. Do you plan to touch on this kind of thing in your further work with the 86 researchers?

Comment by hb574 (Herbie Bradley) on The AI Messiah · 2022-05-06T13:19:44.970Z · EA · GW

Just my anecdotal experience, but when I ask a lot of EAs working in or interested in AGI risk why they think it's a hugely important x-risk, one of the first arguments that comes to people's minds is some variation on "a lot of smart people [working on AGI risk] are very worried about it". My model of many people in EA interested in AI safety is that they use this heuristic as a dominant factor in their reasoning — which is perfectly understandable! After all, formulating a view of the magnitude of risk from transformative AI without relying on any such heuristics is extremely hard. But I think this post is a valuable reminder that it's not particularly good epistemics for lots of people to think like this.

Comment by Herbie Bradley on [deleted post] 2022-05-03T15:58:33.344Z

The title of this post is a general claim about the long-term future, and yet nowhere in your post do you mention any x-risks other than AI. Why should we not expect other x-risks to outweigh these AGI considerations, since they may not fit into this framework of extinction, OK outcome, and utopian outcome? I am not necessarily convinced that pulling the utopia handle on actions related to AGI (like the four you suggest) has a greater effect on P(utopia) than some set of non-AGI-related interventions.

Comment by hb574 (Herbie Bradley) on Replicating and extending the grabby aliens model · 2022-04-23T22:39:32.091Z · EA · GW

Looks like great work! Do you plan to publish this in a similar venue to previous papers on this topic, such as in an astrophysics journal? I would be very happy to see more EA work published in mainstream academic venues.

Comment by hb574 (Herbie Bradley) on How I failed to form views on AI safety · 2022-04-17T14:22:00.889Z · EA · GW

I think there are some very valuable points in here - at different points I found myself nodding along, remembering times when I had similar thoughts. I am an ML PhD student at Cambridge, and I first heard about AI safety back around 2013 from LessWrong. Despite being aware of the topic and many of the core arguments for so long, I have found it extremely hard to come to a conclusion about the field by finding the cruxes of the different arguments, rather than simply deferring to other people's opinions, as many seem happy to do.

Comment by hb574 (Herbie Bradley) on FLI launches Worldbuilding Contest with $100,000 in prizes · 2022-01-18T22:49:05.543Z · EA · GW

Isn't "Technology is advancing rapidly and AI is transforming the world sector by sector" perfectly consistent with a singularity? Perhaps it would be a rather large understatement, but still basically true.

Comment by hb574 (Herbie Bradley) on A case for the effectiveness of protest · 2021-12-01T00:12:35.480Z · EA · GW

There's a lot of good work here and I don't have time to analyse it in detail, but I had a look at some of your estimates, and I think they depend a bit too heavily on subjective guesses about the counterfactual impact of XR to be all that useful. I can imagine that if you vary the parameter for how much XR might have brought forward net zero, or the chance that it directly caused net zero pledges to be made, then you end up with very large bounds on your ultimate effectiveness numbers. Personally, I don't think it's all that reasonable to suggest that, for example, making a net zero pledge one or two years earlier means a corresponding one- or two-year difference in the time to hit net zero (this seems highly non-linear, and there could reasonably be zero difference).
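As a toy sensitivity check (with entirely made-up numbers, not the post's actual model), varying just the "years net zero was brought forward" guess over a plausible range shows how widely the bottom-line cost-effectiveness figure can swing:

```python
# Hypothetical inputs, for illustration only - neither figure is from the post.
TONNES_CO2_PER_YEAR_OF_EARLIER_NET_ZERO = 100e6  # assumed emissions avoided per year brought forward
XR_BUDGET_POUNDS = 10e6                          # assumed total campaign cost

for years_brought_forward in [0.0, 0.1, 0.5, 1.0, 2.0]:
    tonnes_averted = years_brought_forward * TONNES_CO2_PER_YEAR_OF_EARLIER_NET_ZERO
    cost_per_tonne = XR_BUDGET_POUNDS / tonnes_averted if tonnes_averted else float("inf")
    print(f"{years_brought_forward:.1f} years earlier -> £{cost_per_tonne:.2f} per tonne averted")
```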

In addition, I think you discount "zeitgeist effects" - having XR gain traction at the precise time when many other climate groups and climate awareness in general were also gaining traction means that attributing specific outcomes to XR becomes very difficult, although of course XR is part of said zeitgeist. Therefore it seems possible that you could model SMOs as "riding the wave" of public sentiment - contributing to popular awareness of their cause to some extent, but acting as a manifestation of popular awareness rather than as a cause of it.