Success without dignity: a nearcasting story of avoiding catastrophe by luck 2023-03-15T20:17:34.922Z
What does Bing Chat tell us about AI risk? 2023-02-28T18:47:12.198Z
How major governments can help with the most important century 2023-02-24T19:37:18.985Z
Taking a leave of absence from Open Philanthropy to work on AI safety 2023-02-23T19:05:43.755Z
What AI companies can do today to help with the most important century 2023-02-20T17:40:32.276Z
Jobs that can help with the most important century 2023-02-12T18:19:00.000Z
We're no longer "pausing most new longtermist funding commitments" 2023-01-30T19:29:17.722Z
Spreading messages to help with the most important century 2023-01-25T20:35:25.026Z
How we could stumble into AI catastrophe 2023-01-16T14:52:50.648Z
Transformative AI issues (not just misalignment): an overview 2023-01-06T02:19:41.816Z
Racing through a minefield: the AI deployment problem 2022-12-31T21:44:56.039Z
High-level hopes for AI alignment 2022-12-20T02:11:19.222Z
AI Safety Seems Hard to Measure 2022-12-11T01:31:39.092Z
Why Would AI "Aim" To Defeat Humanity? 2022-11-29T18:59:25.845Z
My takes on the FTX situation will (mostly) be cold, not hot 2022-11-18T23:57:13.341Z
Some comments on recent FTX-related events 2022-11-10T22:23:53.742Z
“Technological unemployment” AI vs. “most important century” AI: how far apart? 2022-10-11T04:50:54.591Z
EA is about maximization, and maximization is perilous 2022-09-02T17:13:52.226Z
How might we align transformative AI if it’s developed very soon? 2022-08-29T15:48:13.744Z
AI strategy nearcasting 2022-08-26T16:25:24.411Z
The Track Record of Futurists Seems ... Fine 2022-07-04T15:47:16.231Z
Nonprofit Boards are Weird 2022-06-28T10:17:52.343Z
AI Could Defeat All Of Us Combined 2022-06-10T23:25:51.238Z
Useful Vices for Wicked Problems 2022-04-12T19:19:42.902Z
Ideal governance (for companies, countries and more) 2022-04-07T16:54:02.748Z
The Wicked Problem Experience 2022-03-02T17:43:00.747Z
Important, actionable research questions for the most important century 2022-02-24T16:34:29.061Z
Learning By Writing 2022-02-22T15:39:53.803Z
Future-proof ethics 2022-02-02T19:40:24.811Z
Other-centered ethics and Harsanyi's Aggregation Theorem 2022-02-02T03:21:46.054Z
AI alignment research links 2022-01-06T05:52:51.207Z
Consider trying the ELK contest (I am) 2022-01-05T19:42:11.106Z
Bayesian Mindset 2021-12-21T19:54:28.391Z
Rowing, Steering, Anchoring, Equity, Mutiny 2021-11-30T21:11:46.378Z
Minimal-trust investigations 2021-11-23T18:02:46.511Z
“Biological anchors” is about bounding, not pinpointing, AI timelines 2021-11-18T21:03:56.695Z
Weak point in "most important century": lock-in 2021-11-11T22:02:44.392Z
Comments for shorter Cold Takes pieces 2021-11-03T12:48:41.645Z
Has Life Gotten Better? 2021-10-05T08:31:40.696Z
Summary of history (empowerment and well-being lens) 2021-09-28T17:48:19.417Z
Call to Vigilance 2021-09-15T18:46:09.068Z
How to make the best of the most important century? 2021-09-14T21:05:57.096Z
AI Timelines: Where the Arguments, and the "Experts," Stand 2021-09-07T17:35:12.431Z
The Most Important Century: Sequence Introduction 2021-09-03T08:10:53.657Z
Forecasting transformative AI: the "biological anchors" method in a nutshell 2021-08-31T18:17:03.013Z
Forecasting Transformative AI: Are we "trending toward" transformative AI? (How would we know?) 2021-08-24T17:15:18.742Z
Forecasting transformative AI: what's the burden of proof? 2021-08-17T17:14:37.482Z
Forecasting Transformative AI: What Kind of AI? 2021-08-10T21:38:46.178Z
This Can't Go On 2021-08-03T15:53:33.837Z
Digital People FAQ 2021-07-27T17:19:59.605Z


Comment by Holden Karnofsky (HoldenKarnofsky) on My takes on the FTX situation will (mostly) be cold, not hot · 2023-03-23T21:44:27.861Z · EA · GW

To give a rough idea, I basically mean anyone who is likely to harm those around them (using a common-sense idea of doing harm) and/or "pollute the commons" by having an outsized and non-consultative negative impact on community dynamics. It's debatable what the best warning signs are and how reliable they are.

Comment by Holden Karnofsky (HoldenKarnofsky) on Time Article Discussion - "Effective Altruist Leaders Were Repeatedly Warned About Sam Bankman-Fried Years Before FTX Collapsed" · 2023-03-22T22:44:40.700Z · EA · GW

Re: "In the weeks leading up to that April 2018 confrontation with Bankman-Fried and in the months that followed, Mac Aulay and others warned MacAskill, Beckstead and Karnofsky about her co-founder’s alleged duplicity and unscrupulous business ethics" -

I don't remember Tara reaching out about this, and I just searched my email for signs of this and didn’t see any. I'm not confident this didn't happen, just noting that I can't remember or easily find signs of it.

In terms of what I knew/learned in 2018 more generally, I discuss that here.

Comment by Holden Karnofsky (HoldenKarnofsky) on Taking a leave of absence from Open Philanthropy to work on AI safety · 2023-03-22T22:40:28.474Z · EA · GW

For context, my wife is the President and co-founder of Anthropic, and formerly worked at OpenAI.

80% of her equity in Anthropic is (not legally bindingly) pledged for donation. None of her equity in OpenAI is. She may pledge more in the future if there is a tangible compelling reason to do so.

I plan to be highly transparent about my conflict of interest, e.g. I regularly open meetings by disclosing it if I’m not sure the other person already knows about it, and I’ve often mentioned it when discussing related topics on Cold Takes.

I also plan to discuss the implications of my conflict of interest for any formal role I might take. It’s possible that my role in helping with safety standards will be limited to advising with no formal powers (it’s even possible that I’ll decide I simply can’t work in this area due to the conflict of interest, and will pursue one of the other interventions I’ve thought about).

But right now I’m just exploring options and giving non-authoritative advice, and that seems appropriate. (I’ll also note that I expect a lot of advice and opinions on standards to come from people who are directly employed by AI companies; while this does present a conflict of interest, and a more direct one than mine, I think it doesn’t and can’t mean they are excluded from relevant conversations.)

Comment by Holden Karnofsky (HoldenKarnofsky) on Some comments on recent FTX-related events · 2023-03-22T22:37:49.953Z · EA · GW

There was no one with official responsibility for the relationship between FTX and the EA community. I think the main reason the two were associated was that FTX/Sam had a high profile and talked a lot about EA - that’s not something anyone else was able to control. (Some folks did ask him to do less of this.)

It’s also worth noting that we generally try to be cautious about power dynamics as a funder, which means we are hesitant to be pushy about most matters. In particular, I think one of two major funders in this space attacking the other, nudging grantees to avoid association and funding from it, etc. would’ve been seen as strangely territorial behavior absent very strong evidence of misconduct.

That said: as mentioned in another comment, with the benefit of hindsight, I wish I’d reasoned more like this: “This person is becoming very associated with effective altruism, so whether or not that’s due to anything I’ve done, it’s important to figure out whether that’s a bad thing and whether proactive distancing is needed.”

Comment by Holden Karnofsky (HoldenKarnofsky) on Some comments on recent FTX-related events · 2023-03-22T22:35:56.977Z · EA · GW

In 2018, I heard accusations that Sam had communicated in ways that left people confused or misled, though often with some ambiguity about whether Sam had been confused himself, had been inadvertently misleading while factually accurate, etc. I put some effort into understanding these concerns (but didn’t spend a ton of time on it; Open Phil didn’t have a relationship with Sam or Alameda).

I didn’t hear anything that sounded anywhere near as bad as what has since come out about his behavior at FTX. At the time I didn’t feel my concerns rose to the level where it would be appropriate or fair to publicly attack or condemn him. The whole situation did make me vaguely nervous, and I spoke with some people about it privately, but I never concluded that any (public) action was clearly warranted.

Comment by Holden Karnofsky (HoldenKarnofsky) on My takes on the FTX situation will (mostly) be cold, not hot · 2023-03-21T16:37:55.595Z · EA · GW

I don’t believe #1 is correct. The Open Philanthropy grant is a small fraction of the funding OpenAI has received, and I don’t think it was crucial for OpenAI at any point.

I think #2 is fair insofar as running a scaling lab poses big risks to the world. I hope that OpenAI will avoid training or deploying directly dangerous systems; I think that even the deployments it’s done so far pose risks via hype and acceleration. (Considering the latter a risk to society is an unusual standard to hold a company to, but I think it’s appropriate here.)

#3 seems off to me - “regulatory capture” does not describe what’s at the link you gave (where’s the regulator?). At best it seems like a strained analogy, and even there it doesn’t seem right to me - I don’t know of any sense in which I or anyone else was “captured” by OpenAI.

I can’t comment on #4.

#5 seems off to me. I don’t know whether OpenAI uses nondisparagement agreements; I haven’t signed one. The reason I am careful with public statements about OpenAI is (a) it seems generally unproductive for me to talk carelessly in public about important organizations (likely to cause drama and drain the time and energy of me and others); (b) I am bound by confidentiality requirements, which are not the same as nondisparagement requirements. Information I have access to via having been on the board, or via being married to a former employee, is not mine to freely share.

Comment by Holden Karnofsky (HoldenKarnofsky) on Spreading messages to help with the most important century · 2023-03-21T02:15:25.865Z · EA · GW

Just noting that many of the “this concept is properly explained elsewhere” links are also accompanied by expandable boxes that you can click to expand for the gist. I do think that understanding where I’m coming from in this piece requires a bunch of background, but I’ve tried to make it as easy on readers as I could, e.g. explaining each concept in brief and providing a link if the brief explanation isn’t clear enough or doesn’t address particular objections.

Comment by Holden Karnofsky (HoldenKarnofsky) on Spreading messages to help with the most important century · 2023-03-21T02:14:30.694Z · EA · GW

Noting that I’m now going back through posts responding to comments, after putting off doing so for months - I generally find it easier to do this in bulk to avoid being distracted from my core priorities, though this time I think I put it off longer than I should’ve.

It is generally true that my participation in comments is extremely sporadic/sparse, and folks should factor that into curation decisions.

Comment by Holden Karnofsky (HoldenKarnofsky) on Taking a leave of absence from Open Philanthropy to work on AI safety · 2023-03-21T02:12:44.392Z · EA · GW

I wouldn’t say I’m in “sprinting” mode - I don’t expect my work hours to go up (and I generally work less than I did a few years ago, basically because I’m a dad now).

The move is partly about AI timelines, partly about the opportunities I see and partly about Open Philanthropy’s stage of development.

Comment by Holden Karnofsky (HoldenKarnofsky) on Taking a leave of absence from Open Philanthropy to work on AI safety · 2023-03-21T02:11:28.716Z · EA · GW

I threw that book together for people who want to read it on Kindle, but it’s quite half-baked. If I had the time, I’d want to rework the series (and a more recent followup series) into a proper book, but I’m not sure when or whether I’ll do this.

Comment by Holden Karnofsky (HoldenKarnofsky) on We're no longer "pausing most new longtermist funding commitments" · 2023-03-21T02:11:14.197Z · EA · GW

I expect more funding discontinuations than usual, but we generally try to discontinue funding in a way that gives organizations time to plan around the change.

I’m not leading the longer-term process. I expect Open Philanthropy will publish content about it, but I’m not sure when.

Comment by Holden Karnofsky (HoldenKarnofsky) on We're no longer "pausing most new longtermist funding commitments" · 2023-03-21T02:10:44.286Z · EA · GW

I don’t have a good answer, sorry. The difficulty of getting cardinal estimates for longtermist grants is a lot of what drove our decision to go with an ordinal approach instead.

Comment by Holden Karnofsky (HoldenKarnofsky) on We're no longer "pausing most new longtermist funding commitments" · 2023-03-21T02:10:28.278Z · EA · GW

Aiming to spend down in less than 20 years would not obviously be justified even if one’s median for transformative AI timelines were well under 20 years. This is because we may want extra capital in a “crunch time” where we’re close enough to transformative AI for the strategic picture to have become a lot clearer, and because even a 10-25% chance of longer timelines would provide some justification for not spending down on short time frames.

This move could be justified if the existing giving opportunities were strong enough even with a lower bar. That may end up being the case in the future. But we don’t feel it’s the case today, having eyeballed the stack rank.

Comment by Holden Karnofsky (HoldenKarnofsky) on My takes on the FTX situation will (mostly) be cold, not hot · 2023-03-21T02:05:15.174Z · EA · GW

Here’s a followup with some reflections.

Note that I discuss some takeaways and potential lessons learned in this interview.

Here are some (somewhat redundant with the interview) things I feel like I’ve updated on in light of the FTX collapse and aftermath:

  • The most obvious thing that’s changed is a tighter funding situation, which I addressed here.
  • I’m generally more concerned about the dynamics I wrote about in EA is about maximization, and maximization is perilous. If I wrote that piece today, most of it would be the same, but the “Avoiding the pitfalls” section would be quite different (less reassuring/reassured). I’m not really sure what to do about these dynamics, i.e., how to reduce the risk that EA will encourage and attract perilous maximization, but a couple of possibilities:
    • It looks to me like the community needs to beef up and improve investments in activities like “identifying and warning about bad actors in the community,” and I regret not taking a stronger hand in doing so to date. (Recent sexual harassment developments reinforce this point.)
    • I’ve long wanted to try to write up a detailed intellectual case against what one might call “hard-core utilitarianism.” I think arguing about this sort of thing on the merits is probably the most promising way to reduce associated risks; EA isn’t (and I don’t want it to be) the kind of community where you can change what people operationally value just by saying you want it to change, and I think the intellectual case has to be made. I think there is a good substantive case for pluralism and moderation that could be better-explained and easier to find, and I’m thinking about how to make that happen (though I can’t promise to do so soon).
  • I had some concerns about SBF and FTX, but I largely thought of the situation as not being my responsibility, as Open Philanthropy had no formal relationship to either. In hindsight, I wish I’d reasoned more like this: “This person is becoming very associated with effective altruism, so whether or not that’s due to anything I’ve done, it’s important to figure out whether that’s a bad thing and whether proactive distancing is needed.”
  • I’m not surprised there are some bad actors in the EA community (I think bad actors exist in any community), but I’ve increased my estimate of how much harm a small set of them can do, and hence I think it could be good for Open Philanthropy to become more conservative about funding and associating with people who might end up being bad actors (while recognizing that it won’t be able to predict perfectly on this front).
  • Prior to the FTX collapse, I had been gradually updating toward feeling like Open Philanthropy should be less cautious with funding and other actions; quicker to trust our own intuitions and people who intuitively seemed to share our values; and generally less cautious. Some of this update was based on thinking that some folks associated with FTX were being successful with more self-trusting, less-cautious attitudes; some of it was based on seeing few immediate negative consequences of things like the Future Fund regranting program; some of it was probably a less rational response to peer pressure. I now feel the case for caution and deliberation in most actions is quite strong - partly because the substantive situation has changed (effective altruism is now enough in the spotlight, and controversial enough, that the costs of further problems seem higher than they did before).
    • On this front, I’ve updated a bit toward my previous self, and more so toward Alexander’s style, in terms of wanting to weigh both explicit risks and vague misgivings significantly before taking notable actions. That said, I think balance is needed and this is only a fairly moderate update, partly because I didn’t update enormously in the other direction before. I think I’m still overall more in favor of moving quickly than I was ~5 years ago, for a number of reasons. In any case I don’t expect there to be a dramatic visible change on this front in terms of Open Philanthropy’s grantmaking, though it might be investing more effort in improving functions like community health.
  • Having seen the EA brand under the spotlight, I now think it isn’t a great brand for wide public outreach. It throws together a lot of very different things (global health giving, global catastrophic risk reduction, longtermism) in a way that makes sense to me but seems highly confusing to many, and puts them all under a wrapper that seems self-righteous and, for lack of a better term, punchable? I still think of myself as an effective altruist and think we should continue to have an EA brand for attracting the sort of people (like myself) who want to put a lot of dedicated, intensive time into thinking about what issues they can work on to do the most good; but I’m not sure this is the brand that will or should attract most of the people who can be helpful on key causes. I think it’s probably good to focus more on building communities and professional networks around specific causes (e.g., AI risk, biorisk, animal welfare, global health) relative to building them around “EA.”
  • I think we should see “EA community building” as less valuable than before, if only because one of the biggest seeming success stories now seems to be a harm story. I think this concern applies to community building for specific issues as well. It’s hard to make a clean quantitative statement about how this will change Open Philanthropy's actions, but it’s a factor in how we recently ranked grants. I think it'll be important to do quite a bit more thinking about this (and in particular, to gather more data along these lines) in the longer run.

Comment by Holden Karnofsky (HoldenKarnofsky) on High-level hopes for AI alignment · 2023-03-21T01:55:47.691Z · EA · GW

My point with the observation you quoted wasn't "This would be unprecedented, therefore there's a very low prior probability." It was more like: "It's very hard to justify >90% confidence on anything without some strong base rate to go off of. In this case, we have no base rate to go off of; we're pretty wildly guessing." I agree something weird has to happen fairly "soon" by zoomed-out historical standards, but there are many possible candidates for what the weird thing is (I also endorse dsj's comment below).

Comment by Holden Karnofsky (HoldenKarnofsky) on What AI companies can do today to help with the most important century · 2023-03-18T03:50:13.154Z · EA · GW

If I saw a path to slowing down or stopping AI development, reliably and worldwide, I think it’d be worth considering.

But I don’t think advising particular AI companies to essentially shut down (or radically change their mission) is a promising step toward that goal.

And I think partial progress toward that goal is worse than none, if it slows down relatively caution-oriented players without slowing down others.

Comment by Holden Karnofsky (HoldenKarnofsky) on Jobs that can help with the most important century · 2023-03-18T01:14:26.524Z · EA · GW

Not easily - I skimmed it before linking to it and thought "Eh, I would maybe reframe some of these if I were writing the post today," but found it easier to simply note that point than to do a rewrite or even a list of specific changes, given that I don't think the picture has radically changed.

Comment by Holden Karnofsky (HoldenKarnofsky) on Spreading messages to help with the most important century · 2023-03-18T01:10:40.647Z · EA · GW


Comment by Holden Karnofsky (HoldenKarnofsky) on High-level hopes for AI alignment · 2023-03-18T01:00:50.401Z · EA · GW

Civilizational collapse would be a historically unprecedented event, and the future is very hard to predict; on those grounds alone, putting the odds of civilizational collapse above 90% seems like it requires a large burden of proof/argumentation. I don't think "We can't name a specific, likely-seeming path to success now" is enough to get there - I think there are many past risks to civilization that people worried about in advance, didn't see clear paths to dealing with, and yet didn't end up being catastrophic. Furthermore, I do envision some possible paths to success, e.g.

Comment by Holden Karnofsky (HoldenKarnofsky) on How might we align transformative AI if it’s developed very soon? · 2023-03-18T00:55:29.438Z · EA · GW

Very belatedly fixed - thanks!

Comment by Holden Karnofsky (HoldenKarnofsky) on Nonprofit Boards are Weird · 2023-03-18T00:49:08.423Z · EA · GW

This sounds like it could be good for some organizations (e.g., membership organizations), though it's less clear how to make it work (who gets a vote?) for many other types of organizations.

Comment by Holden Karnofsky (HoldenKarnofsky) on Nonprofit Boards are Weird · 2023-03-18T00:48:11.605Z · EA · GW

I broadly agree with these recommendations. I think they are partial but not full mitigations to the "weird" properties I mention, and often raise challenges of their own (though I think they're often worth it on balance).

I haven't seen much in the way of nonprofit boards with limited powers / outside-the-board accountability. (I mostly haven't dealt with membership organizations.) It definitely sounds interesting, but I don't have solid examples of how it's done in practice and what other issues it raises.

Comment by Holden Karnofsky (HoldenKarnofsky) on AI Could Defeat All Of Us Combined · 2023-03-18T00:42:54.203Z · EA · GW

These questions are outside the scope of this post, which is about what would happen if AIs were pointed at defeating humanity.

I don't think there's a clear answer to whether AIs would have a lot of their goals in common, or find it easier to coordinate with each other than with humans, but the probability of each seems at least reasonably high if they are all developed using highly similar processes (making them all likely more similar to each other in many ways than to humans).

Comment by Holden Karnofsky (HoldenKarnofsky) on AI Could Defeat All Of Us Combined · 2023-03-18T00:40:29.444Z · EA · GW

Sorry for chiming in so late! The basic idea here is that if you have 2x the resources it would take to train a transformative model, then you have enough to run a huge number of them.

It's true that the first transformative model might eat all the resources its developer has at the time. But it seems likely that (a) since they've raised $X to train it as a reasonably speculative project, at least a further $X will be available to pay for running copies once it turns out to be transformative; and (b) not too long after, as compute continues to get more efficient, someone will have 2x the resources needed to train the model.

Comment by Holden Karnofsky (HoldenKarnofsky) on The Wicked Problem Experience · 2023-03-17T23:50:06.494Z · EA · GW

Apologies for chiming in so late!

I believe GWWC's recommendation of Against Malaria Foundation was based on GiveWell's (otherwise they might've recommended another bednet charity). And Peter Singer generally did not recommend the charities that GiveWell ranks highly, before GiveWell ranked them highly.

I don't want to deny, though, that for any given research project you might undertake, there's often a much quicker approach that gets you part of the way there. I think the process you described is a fine way to generate some good initial leads (I think GWWC independently recommended Schistosomiasis Control Initiative before GiveWell did, for example). As the stakes of the research rise, though, I think it becomes more valuable and important to get a lot of the details right - partly because so much money rides on it, partly because quicker approaches seem more vulnerable to adversarial behavior/Goodharting of the process.

Comment by Holden Karnofsky (HoldenKarnofsky) on Doing EA Better · 2023-02-10T04:25:59.723Z · EA · GW

Thanks for the time you’ve put into trying to improve EA, and it’s unfortunate that you feel the need to do so anonymously!

Below are some reactions, focused on points that you highlighted to me over email as sections you’d particularly appreciate my thoughts on.

On anonymity - as a funder, we need to make judgments about potential grantees, but want to do so in a way that doesn't create perverse incentives. This section of an old Forum post summarizes how I try to reconcile these goals, and how I encourage others to. When evaluating potential grantees, we try to focus on what they've accomplished and what they're proposing, without penalizing them for holding beliefs we don't agree with.

  • I understand that it’s hard to trust someone to operate this way and not hold your beliefs against you; generally, if one wants to do work that’s only a fit for one source of funds (even if those funds run through a variety of mechanisms!), I’m (regretfully) sympathetic to feeling like the situation is quite fragile and calls for a lot of carefulness.
  • That said, for whatever it’s worth, I believe this sort of thing shouldn’t be a major concern w/r/t Open Philanthropy funding; “lack of output or proposals that fit our goals” seems like a much more likely reason not to be funded than “expressed opinions we disagree with.”

On conflicts of interest: with a relatively small number of people interested in EA overall, it doesn’t feel particularly surprising to me that there are a relatively small number of particularly prominent folks who are or have been involved in several of the top organizations. More specifically:

  • Since Open Philanthropy funds a large % of the orgs focused on our priority issues, it doesn’t seem surprising or concerning that many of the people who’ve spent some time working for Open Philanthropy have also spent some time working for Open Philanthropy grantees. I think it is generally common for funders to hire people who previously worked at their grantees, and in turn for ex-employees of funders to leave for jobs at grantees.
  • It doesn’t seem surprising or concerning that people who have written prominent books on EA-connected ideas have also helped build community infrastructure organizations such as the Centre for Effective Altruism.

To be clear, I think it’s important for conflicts of interest to be disclosed and handled appropriately, and there are some conflicts of interest that concern me for sure - I don’t at all mean to minimize the importance of conflicts of interest or potential concerns around them. I still thought it was worth sharing those reactions to the specific takes given in that section of the post.

On our concentration on a couple of existential risks: here I think we disagree. OP works on a wide variety of causes, but I don't think we should be diversifying more than we are *within* existential risk given our picture of the size and neglectedness of the different risks.

On being in line with the interests of billionaires: I understand the misgivings that people have about EA being so reliant on a small number of funders, and address that point below. And I understand skepticism that funders who have made their wealth in the technology industry have only global impact in mind when they focus their philanthropy on technology issues. For what it’s worth, in the case of the particular billionaires I know best, Cari and Dustin were pretty emotionally reluctant to work on x-risks (and I was as well) - this felt at least to me like a case of them reluctantly concluding that these are important issues rather than coming in with pet causes.

On centralization of funding: I’m having trouble operationalizing the calls for less centralization of funding decision-making, which seems to be the main driver of many of your concerns. I agree that heavy concentration of funding for a given area brings some concerns that would be reduced if the same amount of funding were more spread out among funders; but I haven't seen an alternative funding mechanism proposed that seems terribly promising.

I was broadly in sync with Dustin's thoughts here, though I wouldn't necessarily endorse every word. I don't see a good way to define the "members" of EA without keeping a lot of judgment/discretion over what counts (and thus keeping the concentration of power around), or eroding the line between EA and the broader world with its very different priorities. To me, it looks like EA is fundamentally a self-applied label for a bunch of individuals making decisions using an intellectual framework that’s both unusual and highly judgment-laden; I think there are good and bad things about this, but I haven't seen a way to translate it into more systematic or democratic formal structures without losing those qualities.

I’m not confident here and don’t pretend to have thought it fully through. I remain interested in suggestions for approaches to spending Cari’s and Dustin’s capital that could improve how it’s spent - the more specific and mechanistic, the better.

Comment by Holden Karnofsky (HoldenKarnofsky) on Why did CEA buy Wytham Abbey? · 2022-12-16T19:20:05.426Z · EA · GW

My take is about 90% in agreement with this. 

The other 10% is something like: "But sometimes adding time and care to how, when, and whether you say something can be a big deal. It could have real effects on the first impressions you, and the ideas and communities and memes you care about, make on people who (a) could have a lot to contribute on goals you care about; (b) are the sort of folks for whom first impressions matter."

10% is maybe an average. I think it should be lower (5%?) for an early-career person who's prioritizing exploration, experimentation and learning. I think it should be higher (20%?) for someone who's in a high-stakes position, has a lot of people scrutinizing what they say, and would lose the opportunity to do a lot of valuable things if they substantially increased the time they spent clearing up misunderstandings.

I wish it could be 0% instead of 5-20%, and this emphatically includes what I wish for myself. I deeply wish I could constantly express myself in exploratory, incautious ways - including saying things colorfully and vividly, saying things I'm not even sure I believe, and generally 'trying on' all kinds of ideas and messages. This is my natural way of being; but I feel like I’ve got pretty unambiguous reasons to think it’s a bad idea.

If you want to defend 0%, can you give me something here beyond your intuition? The stakes are high (and I think "Heuristics are almost never >90% right" is a pretty good prior).

Comment by Holden Karnofsky (HoldenKarnofsky) on Future-proof ethics · 2022-04-01T03:03:44.030Z · EA · GW

Hm. I contacted Nick and replaced it with another link - does that work?

Comment by Holden Karnofsky (HoldenKarnofsky) on Comments for shorter Cold Takes pieces · 2022-03-31T23:10:08.822Z · EA · GW

I didn't make a claim that constant replacement occurs "empirically." As far as I can tell, it's not possible to empirically test whether it does or not. I think we are left deciding whether we choose to think of ourselves as being constantly replaced, or not - either choice won't contradict any empirical observations. My post was pointing out that if one does choose to think of things that way, a lot of other paradoxes seem to go away.

Comment by Holden Karnofsky (HoldenKarnofsky) on Comments for shorter Cold Takes pieces · 2022-03-31T23:09:52.574Z · EA · GW

I personally like Radiohead a lot, but I don't feel like my subjective opinions are generally important here; with Pet Sounds I tried to focus on what seemed like an unusually clear-cut case (not that the album has nothing interesting going on, but that it's an odd choice for #1 of all time, especially in light of coming out a year after A Love Supreme).

Comment by Holden Karnofsky (HoldenKarnofsky) on Comments for shorter Cold Takes pieces · 2022-03-31T23:09:29.469Z · EA · GW

I think this is interesting and plausible, but I'm somewhat skeptical in light of the fact that there doesn't seem to have been much (or at least very effective) outcry over the rollback of net neutrality.

Comment by Holden Karnofsky (HoldenKarnofsky) on The Wicked Problem Experience · 2022-03-31T23:09:09.905Z · EA · GW

I think this is often a good approach!

Comment by Holden Karnofsky (HoldenKarnofsky) on Important, actionable research questions for the most important century · 2022-03-31T23:08:51.228Z · EA · GW

I think "people can test their fit without much experience, but would get lots of value out of that experience for actually doing this work" is pretty valid, though I'll also comment that I think there are diminishing returns to direct experience - I think getting some experience (or at least exposure, e.g. via conversation with insiders) is important, but I don't think one necessarily needs several years inside key institutions in order to be helpful on problems like these.

Comment by Holden Karnofsky (HoldenKarnofsky) on Important, actionable research questions for the most important century · 2022-03-31T23:08:28.215Z · EA · GW

I don't have anything available for this offhand - I'd have to put serious thought into what questions are at the most productive intersection of "resolvable", "a good fit for Metaculus" and "capturing something important." Something about warning signs ("will an AI system steal at least $10 million?") could be good.

Comment by Holden Karnofsky (HoldenKarnofsky) on Consider trying the ELK contest (I am) · 2022-03-31T22:59:22.063Z · EA · GW

Thanks! I'd estimate another 10-15 hours on top of the above, so 20-30 hours total. A good amount of this felt like leisure time and could be done while not in front of a computer, which was nice. I didn't end up with "solutions" I'd be actually excited about for substantive progress on alignment, but I think I accomplished my goal of understanding the ELK writeup well enough to nitpick it.

Comment by Holden Karnofsky (HoldenKarnofsky) on Future-proof ethics · 2022-03-31T22:37:59.613Z · EA · GW

The link works for me in incognito mode (it is a Google Drive file).

Comment by Holden Karnofsky (HoldenKarnofsky) on Future-proof ethics · 2022-03-31T22:37:42.029Z · EA · GW

Thanks, this is helpful! I wasn't aware of that usage of "moral quasi-realism."

Personally, I find the question of whether principles can be described as "true" unimportant, and don't have much of a take on it. My default take is that it's convenient to sometimes use "true" in this way, so I sometimes do, while being happy to taboo it anytime someone wants me to or I otherwise think it would be helpful to.


Comment by Holden Karnofsky (HoldenKarnofsky) on Future-proof ethics · 2022-03-31T22:34:49.991Z · EA · GW

I share a number of your intuitions as a starting point, but this dialogue (and previous ones) is intended to pose challenges to those intuitions. To follow up on those:

On Challenge 1A (and as a more general point) - if we take action against climate change, that presumably means making some sort of sacrifice today for the sake of future generations. Does your position imply that this is "simply better for some and worse for others, and not better or worse on the whole?" Does that imply that it is not particularly good or bad to take action on climate change, such that we may as well do what's best for our own generation?

Also on Challenge 1A - under your model, who specifically are the people it is "better for" to take action on climate change, if we presume that the set of people that exists conditional on taking action is completely distinct from the set of people that exists conditional on not taking action (due to chaotic effects as discussed in the dialogue)?

On Challenge 1B, are you saying there is no answer to how to ethically choose between those two worlds, if one is simply presented with a choice?

On Challenge 2, does your position imply that it is wrong to bring someone into existence, because there is a risk that they will suffer greatly (which will mean they've been wronged), and no way to "offset" this potential wrong?

Non-utilitarian Holden has a lot of consequentialist intuitions that he ideally would like to accommodate, but is not all-in on consequentialism.

Comment by Holden Karnofsky (HoldenKarnofsky) on Future-proof ethics · 2022-03-31T22:34:24.690Z · EA · GW

I think that's a fair point. These positions just pretty much end up in the same place when it comes to valuing existential risk.

Comment by Holden Karnofsky (HoldenKarnofsky) on Future-proof ethics · 2022-03-31T22:34:10.331Z · EA · GW

That seems reasonable re: sentientism. I agree that there's no knockdown argument against lexicographic preferences, though I find them unappealing for reasons gestured at in this dialogue.

Comment by Holden Karnofsky (HoldenKarnofsky) on Future-proof ethics · 2022-03-31T22:33:28.195Z · EA · GW

It's interesting that you have that intuition! I don't share it, and I think the intuition somewhat implies some of the "You shouldn't leave your house" type things alluded to in the dialogue.

Comment by Holden Karnofsky (HoldenKarnofsky) on Future-proof ethics · 2022-03-31T22:33:11.298Z · EA · GW

I agree with this argument for discount rates, but I think it is a practical rather than philosophical argument. That is, I don't think it undermines the idea that if we were to avert extinction, all of the future lives thereby enabled should be given "full weight."

Comment by Holden Karnofsky (HoldenKarnofsky) on Future-proof ethics · 2022-03-31T22:32:55.521Z · EA · GW

You're right that I haven't comprehensively addressed risk aversion in this piece. I've just tried to give an intuition for why the pro-risk-aversion intuition might be misleading.

Comment by Holden Karnofsky (HoldenKarnofsky) on Future-proof ethics · 2022-03-31T22:32:40.241Z · EA · GW

I appreciate Kenny's comments pointing toward potentially relevant literature, and agree that you could be a utilitarian without fully biting this bullet ... but as far as I can tell, attempts to do so have enough weird consequences of their own that I'd rather just bite the bullet. This dialogue gives some of the intuition for being skeptical of some things being infinitely more valuable than others.

Comment by Holden Karnofsky (HoldenKarnofsky) on Future-proof ethics · 2022-03-31T22:32:19.110Z · EA · GW

I think you lose a lot when you give up additivity, as discussed here and here.

Comment by Holden Karnofsky (HoldenKarnofsky) on Future-proof ethics · 2022-03-31T22:31:56.805Z · EA · GW

I think I agree with Tyler. Also see this follow-up piece - "future-proof" is supposed to mean "would still look good if we made progress, whatever that is." This is largely supposed to be a somewhat moral-realism-agnostic operationalization of what it means for object-level arguments to be right.

Comment by Holden Karnofsky (HoldenKarnofsky) on Future-proof ethics · 2022-03-31T22:31:29.534Z · EA · GW

I don't think we should assume future ethics are better than ours, and that's not the intent of the term. I discuss what I was trying to do more here.

Comment by Holden Karnofsky (HoldenKarnofsky) on Other-centered ethics and Harsanyi's Aggregation Theorem · 2022-03-31T22:30:58.277Z · EA · GW

Good point, thanks! Edited.

Comment by Holden Karnofsky (HoldenKarnofsky) on Other-centered ethics and Harsanyi's Aggregation Theorem · 2022-03-31T22:30:25.286Z · EA · GW

Thanks, this is appreciated!

Comment by Holden Karnofsky (HoldenKarnofsky) on “Biological anchors” is about bounding, not pinpointing, AI timelines · 2022-03-31T21:04:25.947Z · EA · GW

Sorry for the long delay, I let a lot of comments to respond to pile up!

APS seems like a category of systems that includes some of the others you listed (“Advanced capability: they outperform the best humans on some set of tasks which when performed at advanced levels grant significant power in today’s world (tasks like scientific research, business/military/political strategy, engineering, and persuasion/manipulation) … “). I still don’t feel clear on what you have in mind here in terms of specific transformative capabilities. If we condition on not having extreme capabilities for persuasion or research/engineering, I’m quite skeptical that something in the "business/military/political strategy" category is a great candidate to have transformative impact on its own.

Thanks for the links re: persuasion! This seems like a major theme for you and a big place where we currently disagree. I'm not sure what to make of your take, and I think I'd have to think a lot more to have stable views on it, but here are quick reactions:

  • If we made a chart of some number capturing "how easy it is to convince key parts of society to recognize and navigate a tricky novel problem" (which I'll abbreviate as "epistemic responsiveness") since the dawn of civilization, what would that chart look like? My guess is that it would be pretty chaotic; that it would sometimes go quite low and sometimes go quite high; and that it would be very hard to predict the impact of a given technology or other development on epistemic responsiveness. Maybe there have been one-off points in history when epistemic responsiveness was very high; maybe it is much lower today than at its peak, such that someone could already claim we have passed the "point of no return"; maybe "persuasion AI" will drive it lower or higher, depending partly on who you think will have access to the biggest and best persuasion AIs and how they will use them. So I think even if we grant a lot of your views about how much AI could change the "memetic environment," it's not clear how this relates to the "point of no return."
  • I think I feel a lot less impressed/scared than you with respect to today's "persuasion techniques."
    • I'd be interested in seeing literature on how big an effect size you can get out of things like focus groups and A/B testing. My guess is that going from completely incompetent at persuasion (e.g., basically modeling your audience as yourself, which is where most people start) to "empirically understanding and incorporating your audience's different-from-you characteristics" causes a big jump from a very low level of effectiveness, but that things flatten out quickly after that, and that pouring more effort into focus groups and testing leads to only moderate effects, such that "doubling effectiveness" on the margin shouldn't be a very impressive/scary idea.
    • I think most media is optimizing for engagement rather than persuasion, and that it's natural for things to continue this way as AI advances. Engagement is dramatically easier to measure than persuasion, so data-hungry AI should help more with engagement than persuasion; targeting engagement is in some sense "self-reinforcing" and "self-funding" in a way that targeting persuasion isn't (so persuasion targeters need some sort of subsidy to compete with engagement targeters); and there are norms against targeting persuasion as well. I do expect some people and institutions to invest a lot in persuasion targeting (as they do today), but my modal expectation does not involve it becoming pervasive on nearly all websites, the way yours seems to.
    • I feel like a lot of today's "persuasion" is either (a) extremely immersive (someone is raised in a social setting that is very committed to some set of views or practices); or (b) involves persuading previously-close-to-indifferent people to believe things that call for low-cost actions (in many cases this means voting and social media posting; in some cases it can mean more consequential, but still ultimately not-super-high-personal-cost, actions). (b) can lead over time to shifting coalitions and identities, but the transition from (b) to (a) seems long.
    • I particularly don't feel that today's "persuaders" have much ability to accomplish the things that you're pointing to with "chatbots," "coaches," "Imperius curses" and "drugs." (Are there cases of drugs being used to systematically cause people to make durable, sustained, action-relevant changes to their views, especially when not accompanied by broader social immersion?)
  • I'm not really all that sure what the special role of AI is here, if we assume (for the sake of your argument that AI need not do other things to be transformative or PONR-y) a lack of scientific/engineering ability. What has/had higher ex ante probability of leading to a dramatic change in the memetic environment: further development of AI language models that could be used to write more propaganda, or the recent (last 20 years) explosion in communication channels and data, or many other changes over the last few hundred years such as the advent of radio and television, or the change in business models for media that we're living through now? This comparison is intended to be an argument both that "your kind of reasoning would've led us to expect many previous persuasion-related PONRs without needing special AI advances" and that "if we condition on persuasion-related PONRs being the big thing to think about, we shouldn't necessarily be all that focused on AI."

I liked the story you wrote! A lot of it seems reasonably likely to be reasonably on point to me - I especially liked your bits about AIs confusing people when asked about their internal lives. However:

  • I think the story is missing a kind of quantification or "quantified attitude" that seems important if we want to be talking about whether this story playing out "would mean we're probably looking at transformative/PONR-AI in the following five years." For example, I do expect progress in digital assistants, but it matters an awful lot how much progress and economic impact there is. Same goes for just how effective the "pervasive persuasion targeting" is. I think this story could be consistent with worlds in which I've updated a lot toward shorter transformative AI timelines, and with worlds in which I haven't at all (or have updated toward longer ones).
  • As my comments probably indicate, I'm not sold on this section.
    • I'll be pretty surprised if e.g. the NYT is using a lot of persuasion targeting, as opposed to engagement targeting.
    • I do expect "People who still remember 2021 think of it as the golden days, when conformism and censorship and polarization were noticeably less than they are now" will be true, but that's primarily because (a) I think people are just really quick to hallucinate declinist dynamics and call past times "golden ages"; (b) 2021 does seem to have extremely little conformism and censorship (and basically normal polarization) by historical standards, and actually does kinda seem like a sort of epistemic golden age to me.
      • For people who are strongly and genuinely interested in understanding the world, I think we are in the midst of an explosion in useful websites, tools, and blogs that will someday be seen nostalgically;* a number of these websites/tools/blogs are remarkably influential among powerful people; and while most people are taking a lot less advantage than they could and seem to have pretty poorly epistemically grounded views, I'm extremely unconvinced that things looked better on this front in the past - here's one post on that topic.

I do generally think that persuasion is an underexplored topic, and could have many implications for transformative AI strategy. Such implications could include something like "Today's data explosion is already causing dramatic improvements in the ability of websites and other media to convince people of arbitrary things; we should assign a reasonably high probability that language models will further speed this in a way that transforms the world." That just isn't my guess at the moment.

*To be clear, I don't think this will be because websites/tools/blogs will be less useful in the future. I just think people will be more impressed with those of our time, which are picking a lot of low-hanging fruit in terms of improving on the status quo, so they'll seem impressive to readers who know that the points being made were novel at the time.