# Consciousness, counterfactual robustness and absurdity

post by MichaelStJules · 2022-04-12T09:19:07.033Z · EA · GW · 9 comments

## Contents

  What to do?
Helpful stuff I've read on this and was too lazy to cite properly


The more I think about "counterfactual robustness", the more I think consciousness is absurd.

Counterfactual robustness implies, for animal (including human) brains, that even if the presence of a neuron (or specific connection) during a given sequence of brain activity didn't affect that sequence (i.e. the neuron didn't fire, and if it had disappeared, the activity would have been the same), the presence of that neuron can still matter for what exactly was experienced and whether anything was experienced at all, because it could have made a difference in counterfactual sequences that didn't happen. That seems unphysical, since we're saying that even if something made no actual physical difference, it can still make a difference for subjective experience. And, of course, since those neurons don't affect neural activity by hypothesis, their disappearance during those sequences where they have no influence wouldn't affect reports, either! So, no physical influence and no difference in reports. How could those missing neurons possibly matter during that sequence? Einstein derided quantum entanglement as "spooky action at a distance". Counterfactual robustness seems like "spooky action from alternate histories".

To really drive the point home: if you were being tortured, would it stop feeling bad if those temporarily unused neurons (had) just disappeared, but you acted no differently? Would you be screaming and begging, but unconscious?

So, surely we must reject counterfactual robustness. Then, it seems that what we're left with is that your experiences are reducible to just the patterns of actual physical events in your brain, probably roughly reducible to your neurons that actually fired and the actual signals sent between them. So, we should be some kind of identity theorist.

But neurons don't seem special, and if you reject counterfactual robustness, then it's hard to see how we wouldn't find consciousness everywhere, and not only that, but maybe even human-like experiences, like the feeling of being tortured, could be widespread in mundane places, like in the interactions between particles in walls. That seems very weird and unsettling.

Maybe we're lucky and the brain activity patterns responsible for morally relevant experiences are complex enough that they're rare in practice, though. That would be somewhat reassuring if it turns out to be true, but I'd also like whether or not my walls are being tortured to not depend too much on how many particles there are and how much they interact in basically random ways. My understanding is that the ways out (without accepting the unphysical) that don't depend on this kind of empirical luck require pretty hard physical assumptions (or even worse, biological substrationist assumptions 🤮) that would prevent us from recognizing beings functionally equivalent and overall very similar to humans as conscious, which also seems wrong. Type identity theory with detailed types (e.g. involving actual biological neurons) is one such approach.

But if we really want to be able to abstract away almost all of the physical details and just look at a causal chain of events and interactions, then we should go with a fairly abstract/substrate-independent type identity theory, or go with something like token identity theory/anomalous monism, and accept the possibility of tortured walls. Or, we could accept a huge and probably ad hoc disjunction of physical details: "mental states such as pain could eventually be identified with the (potentially infinite) disjunctive physical state of, say, c-fiber excitation (in humans), d-fiber excitation (in mollusks), and e-network state (in a robot)" (Schneider, 2010?). But how could we possibly know what to include and exclude in this big disjunction?

Since it looks like every possible theory of consciousness either has to accept or reject counterfactual robustness, and there are absurd consequences either way, every theory of consciousness will have absurd consequences. So, it looks like consciousness is absurd.

### What to do?

I'm still leaning towards some kind of token identity theory, since counterfactual robustness seems to just imply pretty obviously false predictions about experiences when you get rid of stuff that wouldn't have made any physical difference anyway, and substrationism will be the new speciesism, whereas tortured walls just seem very weird and unsettling. But maybe I'm just clinging to physical intuitions, and I should let them go and accept counterfactual robustness, and that getting rid of the unused neurons during torture can turn off the lights. "Spooky action at a distance" ended up being real, after all.

What do you think?

### Helpful stuff I've read on this and was too lazy to cite properly

Counterfactual robustness

1. "2.2  Is a Wall a Computer?" in https://www.nyu.edu/gsas/dept/philo/faculty/block/papers/msb.html
2. "Objection 3" and "Counterfactuals Can't Count" in http://www.doc.gold.ac.uk/~mas02mb/Selected%20Papers/2004%20BICS.pdf
3. "Objection 6: Mapping to reality" https://opentheory.net/2017/07/why-i-think-the-foundational-research-institute-should-rethink-its-approach/ (also on the EA Forum here [EA · GW])

Identity theory

comment by ML · 2022-04-13T00:54:12.395Z · EA(p) · GW(p)

This is a subject I've thought about a lot, so I'm pretty happy to have seen this post :).

I'm not convinced by counterfactual robustness either. For one, I don't think humans are very robust either, since we rely on a relatively specific environment to live. And where to draw the line between robust and non-robust seems arbitrary.

Plus, whether a person is counterfactually robust can be changed without modifying them, and only by modifying their surroundings. For example, if you could perfectly predict a person's actions, you could "trap" their environment, adding some hidden cameras that check that the person doesn't deviate from your predictions, and trigger a bomb if they do deviate. Then that person is no longer counterfactually robust, since any slight change will trigger the bomb and destroy them. But we didn't touch them at all, only some hidden surroundings!

---

I also suspect that we can't just bite the bullet about consciousness and Turing machines appearing everywhere, since I think it would have anthropic implications that don't match reality. Anthropic arguments are not on very solid footing, so I'm not totally confident about that, but nonetheless I think there's probably just something we don't understand yet.

I also think this absurdity you've noticed is an instance of a more general problem, since it applies to pretty much any emergent pattern. The same way you can find consciousness everywhere, you can find all sorts of Turing machines everywhere. So I view this as the problem of trying to characterize emergent phenomena.

---

Investigating causality was the lead I followed for a while as well, but every attempt I've made with it has ended up too strong, capable of seeing imaginary Turing machines everywhere. So lately I've been investigating the possibility that emergence might be about *information* in addition to causality.

One intuition I have for this is that the problem might happen because we add information in the process of pointing to the emergent phenomena. Given a bunch of particles randomly interacting with each other, you can probably point to a path of causality and make a correspondence to a person. But pointing out that path takes a lot of information which might only be present inside the pointer, so I think it's possible that we're effectively "sneaking in" the person via our pointer.

I often also use Conway's Game of Life when I think about this issue. In the Game of Life, bits are often encoded as the presence or absence of a glider. This means that causality has to be able to travel the void of dead cells, so that the absence of a glider can be causal. This gives a pretty good argument that every cell has some causal effect on its neighbours, even dead ones.

But if we allow that, we can suddenly draw effectively arbitrary causal arrows inside a completely dead board! So I don't think that can be right, either. My current lead for solving this is that the dead board has effectively no information; it's trivial to write a proof that every future cell is also dead. On the other hand, for a complex board, proving its future state can be very difficult and might require simulating every step. This seems to point to a difference in *informational* content, even in two places where we have similar causal arrows.
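The contrast between a dead board and an active one can be sketched in a few lines. This is only an illustration, assuming the standard Game of Life rules (birth on 3 neighbours, survival on 2 or 3) and an unbounded board stored as a set of live cells; the names `step`, `run`, `dead_board` and `glider` are made up for the example and are not from the comment.

```python
from collections import Counter

def step(live):
    """Advance one generation; `live` is a set of (row, col) live cells."""
    # Count live neighbours for every cell adjacent to at least one live cell.
    counts = Counter(
        (r + dr, c + dc)
        for r, c in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # Birth on exactly 3 neighbours, survival on 2 or 3.
    return {cell for cell, n in counts.items() if n == 3 or (n == 2 and cell in live)}

def run(live, generations):
    """Return the list of board states from generation 0 onwards."""
    history = [frozenset(live)]
    for _ in range(generations):
        live = step(live)
        history.append(frozenset(live))
    return history

dead_board = set()
glider = {(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)}

dead_history = run(dead_board, 8)      # every state is the empty board
glider_history = run(glider, 8)        # the pattern keeps changing and moving
```

The dead board's future is known without simulating anything, while the glider's state at a given generation is found here by stepping through the rules, which is one rough way to picture the difference in informational content.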

So my suspicion is that random interactions inside walls might not contain the right information to encode a person. Unfortunately I don't know much information theory yet, so my progress in figuring this out is slow.

Replies from: MichaelStJules
comment by MichaelStJules · 2022-04-13T04:34:01.883Z · EA(p) · GW(p)

The bomb trap example is very interesting! Can't be counterfactually robust if you're dead. Instead of bombs, we could also just use sudden overwhelming sensory inputs in the modality they're fastest in to interrupt other processing. However, one objection could be that there exist some counterfactuals (for the same unmodified brain) where the person does what they're supposed to. Objects we normally think of as unconscious don't even have this weaker kind of counterfactual robustness: they need to be altered into different systems to do what they would have to do to be conscious.

> But pointing out that path takes a lot of information which might only be present inside the pointer, so I think it's possible that we're effectively "sneaking in" the person via our pointer.

Interesting. Do you think if someone kept the mapping between the states and "firing" and "non-firing" neurons, and translated the events as they were happening (on paper, automatically on a computer, in their own heads), this would generate (further) consciousness?

> I often also use Conway's Game of Life when I think about this issue. In the Game of Life, bits are often encoded as the presence or absence of a glider. This means that causality has to be able to travel the void of dead cells, so that the absence of a glider can be causal. This gives a pretty good argument that every cell has some causal effect on its neighbours, even dead ones.
>
> But if we allow that, we can suddenly draw effectively arbitrary causal arrows inside a completely dead board! So I don't think that can be right, either.

Doesn't counting the causal effects of dead cells on dead cells, especially on a totally dead board, bring us back to counterfactual robustness, though?

To expand a bit on the OP, the way I've tentatively been thinking about causality as the basis for consciousness is more like active physical signalling than like full counterfactuals, to avoid counterfactual robustness (and counting static objects as conscious, but there are probably plenty of ways to avoid that). On this view, dead cells don't send signals to other cells, and there's no signalling in a dead board or a dead brain, so there's no consciousness in them (at the cell-level) either. What I care about for a neuron (and I'm not sure how well this translates to Conway's Game of Life) is whether it actually just received a signal, whether it actually just "fired", and whether removing/killing it would have prevented it from sending a signal to another that it did actually send. In this way, its presence had to actually make a non-trivial difference compared just to the counterfactual where it's manipulated to be gone/dead.

Another related point is that while shadows (voids) and light spots can "move" faster than light, no actual particles are moving faster than light, and information can still only travel at most at the speed of light.

On your approach and examples, it does certainly seem like information is correlated with the stuff that matters in some way. It would be interesting to see this explored further. Have you found any similar theories in the literature?

comment by Derek Shiller · 2022-04-12T12:42:20.365Z · EA(p) · GW(p)

> That seems unphysical, since we're saying that even if something made no actual physical difference, it can still make a difference for subjective experience.

The neuron is still there, so its existing-but-not-firing makes a physical difference, right? Not firing is as much a thing a neuron can do as firing. (Also, for what it's worth, my impression is that cognition is less about which neurons are firing and more about what rate they are firing at and how their firing is coordinated with that of other neurons.)

> But neurons don't seem special, and if you reject counterfactual robustness, then it's hard to see how we wouldn't find consciousness everywhere, and not only that, but maybe even human-like experiences, like the feeling of being tortured, could be widespread in mundane places, like in the interactions between particles in walls.

The patterns of neural firing involved in conscious experiences are surely quite complicated. Why think that we would find similar patterns anywhere outside of brains?

Replies from: MichaelStJules
comment by MichaelStJules · 2022-04-12T16:51:07.555Z · EA(p) · GW(p)

Thanks for the comment!

Yes, it's literally a physical difference, but, by hypothesis, it had no influence on anything else in the brain at the time, and your behaviour and reports would be the same. Empty space (or a disconnected or differently connected neuron) could play the same non-firing neuron role in the actual sequence of events. Of course, empty space couldn't also play the firing neuron role in counterfactuals (and a differently connected neuron wouldn't play identical roles across counterfactuals), but why would what didn't happen matter?

Do you expect that those temporarily inactive neurons disappearing temporarily (or slightly more realistically, being temporarily and artificially suppressed from firing) would make a difference to your experiences?

(Firing rates would still be captured with sequences of neurons firing, since the same neuron can fire multiple times in a sequence. If it turns out basically every neuron has a nonzero firing rate during every interval of time long enough to generate an experience, if that even makes sense, then tortured walls could be much rarer. OTOH, we could just make all the neurons only be present exactly when they need to be to preserve the pattern of firing, so they might disappear between firing.)

On finding similar patterns elsewhere, it's because of the huge number of particles and interactions between them going on and relatively small number of interactions in a morally relevant pattern of activity. A human brain has fewer than 100 billion neurons, and the maximum neuron firing rate in many morally relevant experiences is probably less than 1000 per second. So we're only talking about at most around 100 trillion events and their connections in a second, which is long enough for a morally relevant experience. But there are many ordered subsets of merely trillions of interacting particles we can find, effectively signaling each other with forces and small changes to their positions. There are orders of magnitude more particles in your head and similar volumes of organic matter, like wood, or in water, which might be more "active", being totally liquid. There are at least 10^25 atoms in a liter of water, but at most 10^14 neuron firing events in a second in a human brain.
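The two orders of magnitude in the paragraph above can be checked with a quick back-of-the-envelope calculation. This is a rough sketch, assuming a liter of water weighs about 1000 g and using an approximate molar mass for water; the variable names are illustrative only.

```python
AVOGADRO = 6.02214076e23          # particles per mole (exact SI value)
WATER_MOLAR_MASS_G = 18.015       # grams per mole of H2O (approximate)

neurons = 100e9                   # upper bound on neuron count used in the comment
max_rate_hz = 1000                # assumed upper bound on firings per neuron per second
firing_events_per_second = neurons * max_rate_hz

water_mass_g = 1000.0             # assumption: one liter of water is about 1000 g
molecules = water_mass_g / WATER_MOLAR_MASS_G * AVOGADRO
atoms_in_liter = 3 * molecules    # two hydrogen atoms and one oxygen atom per molecule

# How many atoms there are per firing event under these assumptions.
ratio = atoms_in_liter / firing_events_per_second
```

Under these assumptions there are many orders of magnitude more atoms in a liter of water than firing events in a second, which is the gap the argument relies on.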

Hmm, but maybe because the particles are also influenced by other things outside of the selected neuron substitutes and plausibly by each other in the wrong ways, appropriate subsets and mappings are hard to find. Direct influence between more distant particles is more likely to be messed up by local events. And maybe we should count the number of (fairly) sparse directed acyclic graphs with n vertices (for neurons or particles), but at most thousands of connections per vertex on average (the average for humans).

One simplifying assumption I'd probably make first is that any neuron used at least twice in a sequence could be split into multiple neurons each used exactly once, so we can represent the actual events with a topologically sorted directed acyclic graph, plotted with time on the x-axis. If you were to draw each neuron when it fires and the signal paths with time as the x-axis (ignoring actual spatial distances and orientations), you'd see the same neuron at different points, and then you'd just treat those as if they were entirely different neurons. Even with counterfactual robustness, you can unfold a recurrent neural network and in principle preserve all functionality (but in practice, timing would probably be hard to maintain, and functionality would likely be lost if the timings are off). This simplifying assumption rules out IIT and Recurrent Processing Theory as too strict in one way.
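The unfolding step can be sketched as follows. This is a toy illustration, not a model of real neural data: the function name `unfold`, the event format, and the example sequence are all assumptions made for the sketch.

```python
from collections import defaultdict

def unfold(firings, signals):
    """Turn repeated firings of the same neuron into distinct DAG nodes.

    firings: list of (time, neuron) pairs, one per firing event.
    signals: list of ((t_src, src), (t_dst, dst)) pairs, meaning the firing of
             `src` at `t_src` sent a signal that contributed to `dst` firing
             at `t_dst`.
    Returns (nodes, edges): nodes are sorted by time, and every edge points
    from an earlier node index to a later one.
    """
    nodes = sorted(set(firings))                 # each (time, neuron) is its own node
    index = {node: i for i, node in enumerate(nodes)}
    edges = defaultdict(list)
    for src, dst in signals:
        if src[0] >= dst[0]:
            raise ValueError("a signal must go forward in time")
        edges[index[src]].append(index[dst])
    return nodes, dict(edges)

# Hypothetical toy sequence: neuron "a" fires twice, so it becomes two nodes.
firings = [(0, "a"), (1, "b"), (2, "a"), (3, "c")]
signals = [((0, "a"), (1, "b")), ((1, "b"), (2, "a")), ((2, "a"), (3, "c"))]
nodes, edges = unfold(firings, signals)
```

Because nodes are sorted by firing time and signals only go forward in time, the node order is already a topological order, and the recurrent loop from "a" to "b" and back to "a" becomes a simple chain.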

We could also reduce the number from trillions by using much smaller conscious animal brains. With some mammals (small rodents or bats), it could be in the billions, and with bees, in the billions or millions, and with fruit flies, in the millions, if these animals are conscious. See this wiki page for neuron count estimates across species. Their firing rates might be higher (although apparently invertebrates use continuous rather than binary neuron potentials, and mouse neuron firing rates still seem to be below 1000 per second), but then it's also plausible (although not obvious) that their shortest morally relevant time intervals are roughly inversely proportional to these firing rates, so smaller.

Replies from: MichaelStJules
comment by MichaelStJules · 2022-04-12T18:10:54.710Z · EA(p) · GW(p)

For given $n$ and $k$, the number of directed graphs with $n$ vertices labelled $1, \dots, n$ and at most $k$ directed edges from any vertex (and no multiple edges going the same way between the same pair of vertices) has an upper bound of

The number of directed acyclic graphs assuming the vertices are topologically sorted by their labels is smaller, though, with an upper bound like the following, since we only back-connect each vertex $i$ to at most $k$ of the previous $i-1$ vertices.
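One way to write bounds of this kind is sketched below. This is a reconstruction under assumptions (the symbols $n$ for the number of vertices and $k$ for the maximum number of edges per vertex are assumed, and self-loops are excluded), not necessarily the formulas originally posted.

```latex
% Assumed notation: n vertices labelled 1..n, at most k outgoing edges per vertex.
% Directed graphs: each vertex independently picks at most k targets among the other n-1.
\[
  \#\{\text{directed graphs}\} \;\le\; \left( \sum_{j=0}^{k} \binom{n-1}{j} \right)^{n}
  \;\le\; \left( (k+1) \binom{n-1}{k} \right)^{n}
  \quad \text{for } k \le \tfrac{n-1}{2}.
\]
% Topologically sorted DAGs: vertex i only connects back to at most k of the i-1 earlier vertices.
\[
  \#\{\text{sorted DAGs}\} \;\le\; \prod_{i=1}^{n} \sum_{j=0}^{\min(k,\, i-1)} \binom{i-1}{j}.
\]
```

The second bound is smaller because early vertices have few earlier vertices to connect back to, whereas in the first bound every vertex chooses among all of the others.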

But even 1 million choose 1000 is a huge huge number, 10^3432, and the number of atoms in the observable universe is only within a few orders of magnitude of 10^80, far far smaller. A very loose upper bound is 10^202681, for at most 100 trillion neuron firings (1000 firings in a second per neuron x 100 billion neurons in the human brain) and at most 20,000 connections per neuron (the average in the human brain is 1000-7000 according to this page and up to 15,000 for a given neuron here).
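The size of "1 million choose 1000" can be checked without building the full integer, by working with logarithms. This is a small sketch; the helper name `log10_binomial` is made up for the example.

```python
import math

def log10_binomial(n, k):
    """Base-10 logarithm of C(n, k), computed via log-gamma to avoid huge integers."""
    ln_value = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return ln_value / math.log(10)

# Exponent of 10 for "1 million choose 1000".
log10_choose = log10_binomial(10**6, 1000)
```

The result lies between 3432 and 3433, in line with the 10^3432 figure, and far above the exponent of roughly 80 for the number of atoms in the observable universe.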

I think the question is whether or not we can find huge degrees of freedom and numbers of events in mundane places and with the flexibility we have for interpreting different events like particle movements as neurons firing and sending signals (via elementary forces between particles).

For example, if all the particles in a group of $n$ continuously move and continuously exert force on one another, there are $n!$ ways to order those $n$ particles, with one movement per particle representing a neuron firing and (the changes in) exertions of force between particles representing signals between neurons. 1 million! is about 10^5565709. Maybe these numbers don't actually matter much, and we can pick any chronological ordering of particle movements for at most 100 trillion mutually interacting particles to represent each time a neuron fired, and map each signal from one neuron to the next to (a change in) the exertion of force between the corresponding particles.

However, this ignores all but one force exerted on each particle at a time (and there are at least $n-1$, by hypothesis) and so the net forces and net movements of particles aren't explained this way. And maybe this is too classical (non-quantum) of a picture, anyway.
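The factorial figure quoted for 1 million particles can be checked the same way, using the log-gamma function rather than computing the factorial itself. This is a minimal sketch; `log10_factorial` is an illustrative name.

```python
import math

def log10_factorial(n):
    """Base-10 logarithm of n!, using lgamma(n + 1) = ln(n!)."""
    return math.lgamma(n + 1) / math.log(10)

# Exponent of 10 for the number of orderings of 1 million particles.
log10_orderings = log10_factorial(10**6)
```

Rounded to the nearest integer, the exponent is 5565709, matching the figure given for 1 million!.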

Replies from: Derek Shiller, Derek Shiller
comment by Derek Shiller · 2022-04-12T23:58:15.740Z · EA(p) · GW(p)

> But there are many ordered subsets of merely trillions of interacting particles we can find, effectively signaling each other with forces and small changes to their positions.

In brains, patterns of neural activity stimulate further patterns of neural activity. We can abstract this out into a system of state changes and treat conscious episodes as patterns of state changes. Then if we can find similar causal networks of state changes in the wall, we might have reason to think they are conscious as well. Is this the idea? If so, what sort of states are you imagining to change in the wall? Is it the precise configurations of particles? I expect a lot of the states you'll identify to fulfill the relevant patterns will be arbitrary or gerrymandered. That might be an important difference that should make us hesitate before ascribing conscious experiences to walls.

Replies from: MichaelStJules
comment by MichaelStJules · 2022-04-13T01:50:45.986Z · EA(p) · GW(p)

I need to think about this in more detail, but here are some rough ideas, mostly thinking out loud (and perhaps not worth your time to go through these):

1. One possibility is that because we only care about when the neurons are firing if we reject counterfactual robustness anyway, we don't even need to represent when they're not firing with particle properties. Then the signals from one neuron to the next can just be represented by the force exerted by the corresponding particle to the next corresponding particle. However, this way, the force doesn't seem responsible for the "firing" state (i.e. that Y exerts a force on Z is not because of some X that exerted a force on Y before that), so this probably doesn't work.
2. We can just pick any specific property, and pick a threshold between firing and non-firing that puts every particle well above the threshold into firing. But again, the force wouldn't be responsible for the state being above the threshold.
3. We can use a particle's position, velocity, acceleration, energy, net force, whatever as encoding whether or not a neuron is firing, but then we only care about when the neurons are firing anyway, and we could have independent freedom for each individual particle to decide which quantity or vector to use, which threshold to use, which side of the threshold counts as a neuron firing, etc. If we use all of those independent degrees of freedom or even just one independent degree of freedom per particle, then this does seem pretty arbitrary and gerrymandered. But we can also imagine replacing individual neurons in a full typical human brain each with a different kind of artificial neuron (or particle) whose firing is replaced by a different kind of degree of freedom, and still preserve counterfactual robustness, and it could (I'm not sure) look the same once we get rid of all of the inactive neurons, so is it really gerrymandered?
4. If we only have a number of degrees of freedom much smaller than the number of times neurons fired, so we need to pick things for all the particles at once (a quantity or vector, a uniform threshold to separate firing from non-firing, the same side of the threshold for all), and not independently, then it doesn't seem very gerrymandered, but you can still get a huge number of degrees of freedom by considering degrees of freedom that neurons should probably be allowed in our interpreting their activity as conscious:
   1. which particles to use (given $n$ particles, we have $\binom{n}{k}$ subsets of $k$ particles to choose from)
   2. which moment each particle counts as "firing", and exactly which neuron firing event it gets mapped to.

But for 4, I still don't know what "signals" to use, so that they are "responsible" for the states. Maybe any reasonable signal that relates to states in the right way will make it incredibly unlikely for walls to be conscious.

comment by Derek Shiller · 2022-04-12T23:44:51.753Z · EA(p) · GW(p)

> Yes, it's literally a physical difference, but, by hypothesis, it had no influence on anything else in the brain at the time, and your behaviour and reports would be the same. Empty space (or a disconnected or differently connected neuron) could play the same non-firing neuron role in the actual sequence of events. Of course, empty space couldn't also play the firing neuron role in counterfactuals (and a differently connected neuron wouldn't play identical roles across counterfactuals), but why would what didn't happen matter?

I can get your intuition about your case. Here is another with the same logic in which I don't have the corresponding intuition:

Suppose that instead of just removing all non-firing neurons, we also remove all neurons both before they are triggered and after they trigger the next neurons in the sequence. E.g. your brain consists of neurons that magically pop into existence just in time to have the right effect on the next neurons that pop into existence in the sequence, and then they disappear back into nothing. We could also go a level down and have your brain consist only in atoms that briefly pop into existence in time to interact with the next atoms.

Your behavior and introspective reports wouldn't change -- do you think you'd still be conscious?

Replies from: MichaelStJules
comment by MichaelStJules · 2022-04-13T00:55:11.622Z · EA(p) · GW(p)

If the signals are still there to ensure causal influence, I think I would still be conscious like normal. The argument is exactly the same: whenever something is inactive and not affecting other things, it doesn't need to be there at all.

> We could also go a level down and have your brain consist only in atoms that briefly pop into existence in time to interact with the next atoms.

This is getting close to the problem I'm grappling with, once we step away from neurons and look at individual particles (or atoms). First, I could imagine individual atoms acting like neurons to implement a human-like neural network in a counterfactually robust way, too, and that would very likely be conscious. The atoms could literally pass photons or electrons to one another. Or maybe the signals would be their (changes in the) exertion of elementary forces (or gravity?). If during a particular sequence of events, whenever something happened to be inactive, it happened to disappear, then this shouldn't make a difference.

But if you start from something that was never counterfactually robust in the first place, which I think is your intention, and its events just happen to match a conscious sequence of activity in a human brain, then it seems like it probably wouldn't be conscious (although this is less unintuitive to me than is accepting counterfactual robustness mattering in a system that is usually counterfactually robust). Rejecting counterfactual robustness (together with my other views, and assuming things are arranged and mapped correctly) seems to imply that this should be conscious, and the consequences seem crazy if this turns out to be morally relevant.

It seems like counterfactual robustness might matter for consciousness in systems that aren't normally conscious but very likely doesn't matter in systems that are normally conscious, which doesn't make much sense to me.