What is the role of Bayesian ML for AI alignment/safety?

post by mariushobbhahn · 2022-01-11T08:07:15.573Z · EA · GW · 6 comments


  Why I changed my mind
      I was bullish on Bayesian ML for AI safety because:
      I changed my mind because:
  Possibly relevant projects
    “Knowing what we don’t know” & Out-of-distribution detection 
    Reward uncertainty
    Learning from human preferences
    Constraining models
  I need your help

I started a Ph.D. in Bayesian ML because I thought it was a relevant approach to AI safety. Currently, I think that this is unlikely and other paths within ML are more promising. 

The main purpose of this post is to spark a discussion, get feedback and collect promising paths within Bayesian ML. Before that, I want to give a very short overview of why I changed my mind. 

If there are relevant projects in the space of Bayesian ML for AI safety I would be very keen on knowing them. Please share them in the comments.

For this post, I'll call everything Bayesian ML that uses Bayes' theorem in the context of ML or tries to estimate probability distributions rather than point estimates. 

Why I changed my mind

I was bullish on Bayesian ML for AI safety because:

  1. The Bayesian framework is very powerful in general. It can be used to describe a lot of phenomena in all kinds of disciplines, e.g. neuroscience, economics, and medicine. Thus, I thought that such a general framework might be scaled up to something AGI-like and I should understand it better.
  2. Quantified uncertainty seemed relevant for AI alignment. This might be through quantifying what a model doesn’t know or implementing constraints, e.g. through choosing specific priors.

I changed my mind because:

  1. It currently looks like TAI will come from really large NNs. Making NNs Bayesian is usually quite computationally expensive, especially for large models. Furthermore, it looks like the current Bayesian techniques are still quite flawed, i.e. the quality of their uncertainty isn’t that great---at least that’s the vibe I’m getting after working with it for about a year.
  2. As a consequence, when relevant changes in AI come around, the Bayesian version is usually years behind. This is way too long for safety-critical applications and thus not very useful.
  3. I’m not sure quantified uncertainty matters that much for alignment. Assume you have an AGI that is unaligned and you can quantify that in some way. This still doesn’t solve the underlying problem that the AGI is not aligned.

I still think that Bayesian ML might be really useful in other aspects and I hope that Bayesian techniques will be increasingly used for statistical analysis in economics, neuroscience, medicine, etc. I’m just not sure it matters for alignment. 

Possibly relevant projects

This is a short list off the top of my head. I’m not an expert on most of these topics, so I might misrepresent them. Feel free to correct me and add further ones. 

“Knowing what we don’t know” & Out-of-distribution detection 

There are a ton of papers that use Bayesian techniques to quantify predictive uncertainty in neural networks. One of my Ph.D. projects is focused on such a technique and my impression is that they work OKish---at least in classification settings. However, many simple non-Bayesian techniques such as ensembles yield results of similar or even better quality. Thus, I’m not sure the Bayesian formalism adds anything of value, given that it is usually much more expensive. 
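As a minimal sketch of the ensemble baseline mentioned above (a toy setup, not any particular paper's method): bootstrap-resampled polynomial fits stand in for independently trained networks, and member disagreement serves as the uncertainty proxy. Disagreement is low near the training data and blows up far outside it, which is exactly the "knowing what we don't know" behaviour one wants for OOD detection.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression data on [0, 1]
x = rng.uniform(0.0, 1.0, size=40)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=x.size)

# "Ensemble": bootstrap-resampled polynomial fits, a cheap stand-in
# for independently trained neural networks.
ensemble = []
for _ in range(20):
    idx = rng.integers(0, x.size, size=x.size)  # bootstrap sample
    ensemble.append(np.polyfit(x[idx], y[idx], deg=4))

def predictive_stats(x_query):
    preds = np.array([np.polyval(c, x_query) for c in ensemble])
    return preds.mean(), preds.std()  # disagreement = uncertainty proxy

_, std_in = predictive_stats(0.5)   # inside the training range
_, std_out = predictive_stats(3.0)  # far outside the training range
print(f"in-distribution std:  {std_in:.3f}")
print(f"out-of-distribution std: {std_out:.3f}")
```

The out-of-distribution standard deviation dwarfs the in-distribution one, with no Bayesian machinery involved.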

Reward uncertainty

In the context of RL, the reward is usually given as a point estimate. Using distributions for the reward instead of point estimates might make the agent more robust and thus less prone to errors. Possibly this is a path to reduce misalignment. 
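As a toy illustration of the idea (the numbers and the lower-confidence-bound rule are invented for the example): an agent that only sees point estimates picks the action with the highest mean reward, while one that sees a full belief over rewards can trade mean against uncertainty and avoid the poorly understood option.

```python
# Beliefs over per-action reward as (mean, std) pairs,
# e.g. a posterior fit to noisy feedback, instead of a point estimate.
reward_belief = {
    "action_a": (1.0, 5.0),   # slightly higher mean, very uncertain
    "action_b": (0.9, 0.1),   # slightly lower mean, well understood
}

def greedy_choice(beliefs):
    # Point-estimate agent: ignores uncertainty, picks the highest mean.
    return max(beliefs, key=lambda a: beliefs[a][0])

def cautious_choice(beliefs, k=2.0):
    # Risk-averse agent: penalises uncertainty via a lower confidence bound.
    return max(beliefs, key=lambda a: beliefs[a][0] - k * beliefs[a][1])

print(greedy_choice(reward_belief))    # action_a
print(cautious_choice(reward_belief))  # action_b
```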

Learning from human preferences

To align AI systems, it is important to efficiently give them feedback on their actions or beliefs. Many current systems that implement learning from human preferences either are conceptually Bayesian or use probabilistic models such as Gaussian Processes. 
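A minimal sketch of the Bayesian flavour of this: a Bradley-Terry preference model with a grid-based posterior over the latent utility difference between two options (the prior and the comparison data are invented for the example). Each pairwise comparison multiplies the posterior by a sigmoid likelihood.

```python
import numpy as np

# Grid over the latent utility difference d = u(A) - u(B).
d_grid = np.linspace(-3, 3, 601)
prior = np.exp(-0.5 * d_grid**2)   # standard normal prior
prior /= prior.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Bradley-Terry likelihood: P(human prefers A) = sigmoid(d).
# Observed comparisons: True = "A preferred", False = "B preferred".
comparisons = [True, True, False, True, True]

posterior = prior.copy()
for a_preferred in comparisons:
    lik = sigmoid(d_grid) if a_preferred else sigmoid(-d_grid)
    posterior *= lik
    posterior /= posterior.sum()

mean_d = float(np.sum(d_grid * posterior))
print(f"posterior mean of u(A) - u(B): {mean_d:.2f}")
```

With four of five comparisons favouring A, the posterior mean of the utility difference is positive, and its spread quantifies how much feedback is still needed.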

Constraining models

In the Bayesian framework, priors incorporate previous knowledge. We can model safety constraints as strong priors, e.g. punishing the model heavily for crossing a certain threshold. While this is nice in theory, it still doesn’t solve the problem of mapping the real world into mathematical constraints, i.e. we still need to map “don’t do the bad thing” into a probability distribution. 
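As a toy sketch of a prior acting as a soft safety constraint (threshold, penalty strength, and data are all made up): the maximum-likelihood estimate follows the data past the threshold, while the MAP estimate under a heavily penalising prior is held at it. The unsolved part is exactly the one noted above: deciding what the threshold should be in the first place.

```python
import numpy as np

# Data whose maximum-likelihood estimate sits above a "safety threshold".
data = np.array([2.3, 2.5, 2.7, 2.4, 2.6])  # mean = 2.5
threshold = 2.0

def neg_log_likelihood(theta):
    return 0.5 * np.sum((data - theta) ** 2)

def neg_log_prior(theta, strength=1e4):
    # Soft constraint: essentially free below the threshold,
    # steep quadratic penalty above it.
    return strength * max(0.0, theta - threshold) ** 2

thetas = np.linspace(0.0, 4.0, 4001)
mle = thetas[np.argmin([neg_log_likelihood(t) for t in thetas])]
map_est = thetas[np.argmin([neg_log_likelihood(t) + neg_log_prior(t)
                            for t in thetas])]

print(f"MLE:          {mle:.3f}")      # follows the data past the threshold
print(f"MAP w/ prior: {map_est:.3f}")  # pinned at the threshold by the prior
```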


Causal modeling

Some people believe that causal modeling is the missing piece for AGI. Most causal modeling approaches use techniques that are related to Bayesian ML. I’m not sure yet how realistic causal modeling is but I think there are some insights that the AI safety community is currently overlooking. A post on this topic is in the making and will be published soon. 

I need your help

I’m trying to pivot to projects more related to AI safety within my Ph.D. For this, I want to better understand which kinds of things are relevant. If you have ideas, papers, blog posts, etc. that could be helpful don’t hesitate to comment. 



Comments sorted by top scores.

comment by Sebastian_Farquhar · 2022-01-11T16:50:43.684Z · EA(p) · GW(p)

I began my PhD with a focus on Bayesian deep learning with exactly the same reasoning as you. I also share your doubts about the relevance of BDL to long-term safety. I have two clusters of thoughts: some reasons why BDL might be worth pursuing regardless, and alternative approaches.

Considerations about BDL and important safety research:

  • Don't overfit to recent trends. LLMs are very remarkable. Before them, DRL was very remarkable. I don't know what will be remarkable next. My hunch is that we won't get AGI by just doing more of what we are doing now. (People I respect disagree with that, and I am uncertain. Also, note I don't say we couldn't get AGI that way.)
  • Bayesian inference is powerful and general. The original motivation is still real. It is tempered by your (in my view, correct) observation that existing methods for approximate inference have big flaws. My view is that probability still describes the correct way to update given evidence and so it contains deep truths about reliable information processing. That means that understanding approximate Bayesian inference is still a useful guide for anyone trying to automatically process information correctly (and being aware of the necessary assumptions). And an awful lot of failure modes for AGI involve dangerous mistaken generalization. Also note that statements like "simple non-Bayesian techniques such as ensembles" are controversial, and there's considerable debate about whether ensembles are working because they perform approximate integration. Andrew Gordon Wilson has written a lot about this, and I tentatively agree with much of it.
  • Your PhD is not your career. As Mark points out, a PhD is just the first step. You'll learn how to do research. You really won't start getting that good at it until a few years in, by which point you'll write up the thesis and start working on something different. You're not even supposed to just keep doing your thesis as you continue your research. The main thing is to have a great research role model, and I think Philipp is quite good (by reputation, I don't know him personally).
  • BDL teaches valuable skills. Honestly, I just think statistics is super important for understanding modern deep learning, and it gives you a valuable lens to reason about why things are working. There are other specialisms that can develop valuable skills. But I'd be nervous about trading the opportunity to develop deep familiarity with the stats for practical experience on current SoTA systems (because stats will stay true and important, but SoTA won't stay SoTA). (People I respect disagree with that, and I am uncertain.)

Big picture, I think intellectual diversity among AGI safety researchers is good, Bayesian inference is important and fundamental, and lots of people glom on to whatever the latest hot thing is (currently LLMs), leading to rapid saturation.

So what is interesting to work on? I'm currently thinking about two main things:

  • I don't think that exact alignment is possible, in ways that are similar to how exact Bayesian inference is generally impossible. So I'm working on trying to learn from the ways in which approximate inference is well/poorly defined to get insights for how alignment can be well/poorly defined and approximated. (Here I agree 100% with Mark that most of what is hard in AGI safety remains framing the problem correctly.)
  • I think a huge problem for AGI-esque systems is about to be hunting for dangerous failures. There's a lot of BDL work on 'actively' finding informative data, but mostly for small-data in low-dimensions. I'm much more interested in huge data, high-dimensions, which creates whole new problems (e.g., you can't just compute a score function for each possible datapoint). (Note that this is almost exactly the opposite to Mark's point below! But I don't exactly disagree with him, it's just that lots of things are worth trying.)

There are other things that are important, and I agree that OOD detection is also important (and I'm working on a conceptual paper on this, rather than a detection method specifically). If you'd like to speak about any of this stuff I'm happy to talk. You can reach me at sebastian.farquhar@cs.ox.ac.uk

Replies from: mariushobbhahn
comment by mariushobbhahn · 2022-01-11T17:33:54.193Z · EA(p) · GW(p)

Wow. That was really insightful. 

I can confirm that Philipp is a great supervisor! I also don't plan on chasing the next best thing but want to understand ways to combine Bayesian ML with AI safety/alignment relevant things. 

I'll write you a mail soon!

comment by markvdw · 2022-01-11T10:01:09.539Z · EA(p) · GW(p)

I totally see where you're coming from. I would tend to agree that Bayesian inference doesn't seem to be that useful right now. Visible and exciting leaps are being made by large neural networks, like you say. Bayesian inference on the other hand just works OKish, and faces stiff competition in benchmarks from non-Bayesian methods.

In my opinion, the biggest reasons why Bayesian inference isn't making much of an impact at the moment are:

  • Bayes is being applied in simple settings where deep learning works well, and uncertainty is not really needed: large-data, batch training, prediction only tasks. Bayesian inference becomes more important when you have small data, need to update your belief (from heterogeneous sources), and if you need to make decisions on how to act.
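The small-data, sequential-updating regime described in the first bullet is where conjugate Bayesian models shine; a minimal sketch (the observation stream is invented), using a Beta-Bernoulli model that folds in evidence one observation at a time:

```python
# Beta-Bernoulli: belief about a success probability p,
# updated one observation at a time.
alpha, beta = 1.0, 1.0          # Beta(1, 1) = uniform prior
observations = [1, 1, 0, 1]     # small, incrementally arriving evidence

for obs in observations:
    # Conjugate update: successes increment alpha, failures increment beta.
    alpha += obs
    beta += 1 - obs

posterior_mean = alpha / (alpha + beta)
print(f"posterior mean: {posterior_mean:.2f}")  # (1+3)/(2+4) = 0.67
```

The same loop works whether the evidence arrives in one batch or drip-fed from heterogeneous sources, which is precisely the belief-updating setting where point estimates fall short.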
  • Since we have large datasets, the biggest bottleneck in AI/ML is setting up the problem correctly (i.e. asking the right question). OpenAI models like CLIP and GPT-3, and research directions like multitask/metalearning illustrate this nicely. By setting the problem up in a different way, they can leverage huge amounts of new data. Once you introduce a new problem, you reach for the easiest tool to start to solve it. Bayesian inference is not the easiest tool, so it doesn't contribute much here.
  • Current approximate Bayesian training tools don't work very well. (Some disagree with me on this, but I do believe it.)

I also think that in AI alignment, the biggest problem is figuring out the right question to ask. Currently, observable failure cases in the current simple test settings are rare*, which makes it hard to do concrete technical work regardless of whether you choose to use the Bayesian paradigm.

The thought experiments that motivate AI/ML safety are often longer-term, and embedded in larger systems (like society). One place where I do think people have a concrete idea of problems that need solving in AI/ML is social science! I have seen interesting points being made e.g. in FAccT [1] about how you should set up your AI/ML system if you want to avoid certain consequences when deploying in society.

This is where I would focus, if I were to want to work on good outcomes of AI/ML that have impact right now.

Now, I still work on Bayesian ML (although not directly on AI safety). Why do I do this when I agree that Bayesian inference doesn't have much to offer right now? Well, because I think there are reasons to believe that Bayesian inference will have new abilities to offer deep learning (not just uncertainty) in the long term. We just need to work on them! I may turn out to be wrong, but I do still believe that my research direction should be explored in case it turns out to be right. Big companies seem to have large model design under control anyway.

What should you work on? This is difficult. Large model research is difficult to do outside of a few organisations, and requires a lot of engineering effort. You could try to join such an organisation, you could try to investigate properties of trained models, you could try and make these models more widely available, or try to find smaller scale test set-ups where you can probe properties (this is difficult if scale is intrinsically necessary).

I do believe that a great way to get impact right now is to work on questions about deploying ML, and where we want to apply it in society.

You could also remember that your career is longer than a PhD, and work on technical problems now, even if they're not related to AI safety. If you want to work on Bayesian ML, or just ML that quantifies uncertainty in a way that matters (right now), I would work on problems with a decision making element in them. Places where the data-gathering<->decision loop is closed.

Self-driving cars are a cool example, but again difficult to work on outside specific organisations. Bayesian Optimisation or experimental design in industrial settings is a cool small-scale example of where uncertainty modelling really matters. I think smaller scale and lower dimensional applied ML problems are undervalued in the ML community. They're hard to find (it may require talking to people in industry), but I think they are numerous. And what's interesting is that research can make the difference between a poor solution and a great solution.
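A toy version of such a closed data-gathering/decision loop, where quantified uncertainty directly drives behaviour (the arm probabilities are made up for the example): Thompson sampling on a Bernoulli bandit, with a Beta posterior per arm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Closed loop: the agent's beliefs decide which data it gathers next.
true_rates = [0.3, 0.5, 0.7]       # hidden per-arm success rates
alpha = np.ones(3)                  # Beta posterior parameters per arm
beta = np.ones(3)

for _ in range(2000):
    samples = rng.beta(alpha, beta)  # Thompson sampling: draw from beliefs
    arm = int(np.argmax(samples))    # act greedily on the sampled beliefs
    reward = rng.random() < true_rates[arm]
    alpha[arm] += reward             # posterior update
    beta[arm] += 1 - reward

pulls = alpha + beta - 2
print("pulls per arm:", pulls.astype(int))
```

The posterior spread controls exploration, so the loop typically concentrates its pulls on the best arm; with only point estimates there is no principled way to decide when to stop exploring.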

I'm curious to hear other people's thoughts on this as well.


[1] https://facctconference.org/


* Correct me if I'm wrong on this. If the failure cases are not rare, then my advice would be to pick one, and try to solve it in a tool agnostic way.

comment by Vanessa · 2022-01-17T14:38:01.110Z · EA(p) · GW(p)

Quantified uncertainty might be fairly important for alignment, since there is a class of approaches that rely on confidence thresholds to avoid catastrophic errors (1 [AF · GW], 2, 3 [AF · GW]). What might also be important is the ability to explicitly control your prior in order to encode assumptions such as those needed for value learning (but maybe there are ways to do it with other methods).

comment by PabloAMC · 2022-01-11T11:42:29.367Z · EA(p) · GW(p)

A similar idea is by the way discussed in a post by Jaime Sevilla on the limits of causal discovery: https://towardsdatascience.com/the-limits-of-graphical-causal-discovery-92d92aed54d6

Related to your causality comment above, two days ago I submitted a research proposal on Causal Representation Learning for AI Safety. You may want to see it here: https://www.lesswrong.com/posts/5BkEoJFEqQEWy9GcL/an-open-philanthropy-grant-proposal-causal-representation [LW · GW]

comment by MaxRa · 2022-01-11T11:04:01.679Z · EA(p) · GW(p)

Yeah, difficult question... some random thoughts, not sure how helpful:

  • for paths where a computational understanding of human cognition is especially useful, expertise with Bayesian modeling of cognition might be promising (and seems already popular)
  • relatedly, I have the impression that human cognition does things that would require current approaches to become more Bayesian... e.g. like the things that are worked on under active learning in ML, and models of perception related to active inference in neuroscience
    • this might imply that we'll (want to) build AI systems that go more into this direction
    • this might also imply that there are approximations of Bayesian cognition that scale? 
  • something something developing AI systems that make explicit predictions about explicit future scenarios might be one of the key applications of highly advanced AI systems and the more "formal" this is, the easier to align?
  • you likely talked to him already, but I understood that David Lindner at ETH Zürich does something related to value learning using approaches analogous to Bayesian optimization(?)
  • I also wondered how dependent the final design of TAI will be on the current architectures we have today. Even assuming we'll see TAI in less than 20 years, maybe what ends up happening is that TAI will already be largely designed by AI systems? And then maybe more Bayesian architectures are totally on the table if we think it would be helpful?
    • maybe a somewhat useful question to ask: In case AI Safety researchers had a head start that amounts to multiple years, would making the AI more Bayesian be among the most promising ways to use the head start?