Integrity and accountability are core parts of rationality [LW-Crosspost]

post by Habryka · 2019-07-23T00:14:56.417Z · EA · GW · 2 comments


  Who Should You Be Accountable To?
  In summary:

This was originally a post I wrote for LessWrong [LW · GW], but it felt more relevant than usual to EA interests, so I figured I would crosspost it here. Note that this post tries to mostly talk about how to have sane thoughts yourself, and not about how to get the people around you to have sane thoughts. See the discussion on LessWrong [LW · GW] for some clarifications.

Epistemic Status: Pointing at early stage concepts, but with high confidence that something real is here. Hopefully not the final version of this post.

When I started studying rationality and philosophy, I had the perspective that people who were in positions of power and influence should primarily focus on how to make good decisions in general and that we should generally give power to people who have demonstrated a good track record of general rationality. I also thought of power as this mostly unconstrained resource, similar to having money in your bank account, and that we should make sure to primarily allocate power to the people who are good at thinking and making decisions.

That picture has changed a lot over the years. While I think there is still a lot of value in the idea of "philosopher kings", I've made a variety of updates that significantly changed my relationship to allocating power in this way:

A lot of these updates have also shaped my thinking while working at CEA, LessWrong and the LTF-Fund over the past 4 years. I've been in various positions of power, and have interacted with many people who had lots of power over the EA and Rationality communities, and I've become a lot more convinced that there is a lot of low-hanging fruit and important experimentation to be done to ensure better levels of accountability and incentive-design for the institutions that guide our community.

I also generally have broadly libertarian intuitions, and a lot of my ideas about how to build functional organizations are based on a more start-up like approach that is favored here in Silicon Valley. Initially these intuitions seemed at conflict with the intuitions for more emphasis on accountability structures, with broken legal systems, ad-hoc legislation, dysfunctional boards and dysfunctional institutions all coming to mind immediately as accountability-systems run wild. I've since then reconciled my thoughts on these topics a good bit.


Somewhat surprisingly, "integrity" has not been much discussed as a concept handle on LessWrong. But I've found it to be a pretty valuable virtue to meditate and reflect on.

I think of integrity as a more advanced form of honesty – when I say “integrity” I mean something like “acting in accordance with your stated beliefs.” Where honesty is the commitment to not speak direct falsehoods, integrity is the commitment to speak truths that actually ring true to yourself, not ones that are just abstractly defensible to other people. It is also a commitment to act on the truths that you do believe, and to communicate to others what your true beliefs are.

Integrity can be a double-edged sword. While it is good to judge people by the standards they expressed, it is also a surefire way to make people overly hesitant to update. If you get punished every time you change your mind because your new actions are now incongruent with the principles you explained to others before you changed your mind, then you are likely to stick with your principles for far longer than you would otherwise, even when evidence against your position is mounting.

The great benefit that I experienced from thinking of integrity as a virtue, is that it encourages me to build accurate models of my own mind and motivations. I can only act in line with ethical principles that are actually related to the real motivators of my actions. If I pretend to hold ethical principles that do not correspond to my motivators, then sooner or later my actions will diverge from my principles. I've come to think of a key part of integrity being the art of making accurate predictions about my own actions and communicating those as clearly as possible.

There are two natural ways to ensure that your stated principles are in line with your actions. You either adjust your stated principles until they match up with your actions, or you adjust your behavior to be in line with your stated principles. Both of those can backfire, and both of those can have significant positive effects.

Who Should You Be Accountable To?

In the context of incentive design, I find thinking about integrity valuable because it feels to me like the natural complement to accountability. The purpose of accountability is to ensure that you do what you say you are going to do, and integrity is the corresponding virtue of holding up well under high levels of accountability.

Highlighting accountability as a variable also highlights one of the biggest error modes of accountability and integrity – choosing too broad of an audience to hold yourself accountable to.

There is tradeoff between the size of the group that you are being held accountable by, and the complexity of the ethical principles you can act under. Too large of an audience, and you will be held accountable by the lowest common denominator of your values, which will rarely align well with what you actually think is moral (if you've done any kind of real reflection on moral principles).

Too small or too memetically close of an audience, and you risk not enough people paying attention to what you do, to actually help you notice inconsistencies in your stated beliefs and actions. And, the smaller the group that is holding you accountable is, the smaller your inner circle of trust, which reduces the amount of total resources that can be coordinated under your shared principles.

I think a major mistake that even many well-intentioned organizations make is to try to be held accountable by some vague conception of "the public". As they make public statements, someone in the public will misunderstand them, causing a spiral of less communication, resulting in more misunderstandings, resulting in even less communication, culminating into an organization that is completely opaque about any of its actions and intentions, with the only communication being filtered by a PR department that has little interest in the observers acquiring any beliefs that resemble reality.

I think a generally better setup is to choose a much smaller group of people that you trust to evaluate your actions very closely, and ideally do so in a way that is itself transparent to a broader audience. Common versions of this are auditors, as well as nonprofit boards that try to ensure the integrity of an organization.

This is all part of a broader reflection on trying to create good incentives for myself and the LessWrong team. I will try to follow this up with a post that more concretely summarizes my thoughts on how all of this applies to LessWrong concretely.

In summary:

[This post was originally posted on my shortform feed [? · GW]]


Comments sorted by top scores.

comment by Milan_Griffes · 2019-07-23T13:50:23.336Z · EA(p) · GW(p)
One lens to view integrity through is as an advanced form of honesty – “acting in accordance with your stated beliefs.”

How do you think this definition of integrity interacts with the appeal-to-consequences concern that's being discussed on LW these days? (1 [LW · GW], 2 [LW · GW])

I haven't thought about this rigorously, but it seems like this definition could be entirely compatible with doing a lot of appeal-to-consequences reasoning (which seems to miss some important part of what we're gesturing at when we talk about integrity).

Replies from: Milan_Griffes
comment by Milan_Griffes · 2019-07-23T14:04:14.084Z · EA(p) · GW(p)

Hmm... now I'm worried that I'm not parsing you correctly.

Are you intending something closer to (1) or (2)?

(1) "stated beliefs" just means beliefs about what is true about physical reality. You're saying that acting with integrity means doing what accords with what one thinks is true, regardless of the consequences. (Sorta like fiat justitia ruat caelum, incompatible with appeal to consequences.)

(2) "stated beliefs" means beliefs about what is true about physical reality and social reality. You're saying that acting with integrity means doing what seems best / will result in the best outcomes given one's current understanding of social reality. (Compatible with appeal to consequences.)